About the Role
We are seeking an MLOps Engineer who can deploy, optimize, and maintain AI models, particularly LLMs and Voice AI systems, in real-world environments.
The main focus of this role is deploying language models (such as DeepSeek) and integrating hosted model APIs (such as OpenAI's), working with local GPUs or dedicated servers, and managing real-time communication with language and voice-to-text services.
Responsibilities
- Work with the APIs of large language models (OpenAI, DeepSeek, and similar), including managing API keys and rate limits and maintaining stable connections.
- Install, configure, and deploy LLMs (e.g., DeepSeek, Mistral, Llama) on GPUs.
- Implement and integrate Voice-to-Text solutions (such as Whisper or the Google Speech-to-Text API).
- Create and maintain streaming connections to LLMs for live, real-time responses (see the streaming sketch after this list).
- Monitor GPU utilization, RAM consumption, and workloads, and optimize system performance.
- Write simple automation scripts for deployment and monitoring (in Python or Bash).
- Collaborate closely with the backend and model teams to ensure smooth and stable system performance.
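As a concrete illustration of the streaming responsibility above, here is a minimal sketch of consuming a streamed chat completion with the OpenAI Python SDK (v1.x). The model name and prompt are illustrative placeholders, and an OPENAI_API_KEY environment variable is assumed to be set.

```python
# Minimal streaming sketch (assumptions: openai>=1.0 installed,
# OPENAI_API_KEY set, model name is a placeholder).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

stream = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize MLOps in one sentence."}],
    stream=True,  # request incremental chunks instead of one final response
)

# Print tokens as they arrive for a live, real-time response.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```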
Required Skills and Experience
- Proficiency in Python and basic ML libraries (PyTorch or TensorFlow), at least for running and configuring models.
- Hands-on experience deploying LLMs on GPUs.
- Familiarity with APIs such as OpenAI, DeepSeek, and similar services.
- Good understanding of GPU operations (nvidia-smi, memory usage, batching, etc.; see the monitoring sketch after this list).
- Experience with lightweight monitoring tools such as Prometheus or Grafana (basic level).
- Ability to work in Linux environments and familiarity with Docker for simple deployments.
- Knowledge of Voice-to-Text frameworks such as Whisper, Vosk, or SpeechRecognition (see the transcription sketch after this list).
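For the GPU-operations point, here is a minimal monitoring sketch that polls nvidia-smi from Python; it assumes the NVIDIA driver is installed so the nvidia-smi binary is on PATH, and the 80% alert threshold is an arbitrary example value.

```python
# Minimal GPU-memory check via nvidia-smi (assumption: nvidia-smi is
# on PATH; the 80% threshold is an arbitrary example value).
import subprocess

def gpu_memory_usage():
    """Return a list of (used_mib, total_mib) tuples, one per GPU."""
    out = subprocess.check_output(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        text=True,
    )
    return [
        tuple(int(field) for field in line.split(","))
        for line in out.strip().splitlines()
    ]

if __name__ == "__main__":
    for idx, (used, total) in enumerate(gpu_memory_usage()):
        pct = 100 * used / total
        flag = "  <-- high usage" if pct > 80 else ""
        print(f"GPU {idx}: {used}/{total} MiB ({pct:.0f}%){flag}")
```

And for the Voice-to-Text point, a sketch of offline transcription with the openai-whisper package; the audio path is a placeholder, and ffmpeg is assumed to be installed.

```python
# Minimal offline transcription sketch (assumptions: openai-whisper
# and ffmpeg installed; "audio.wav" is a placeholder path).
import whisper

model = whisper.load_model("base")      # downloads weights on first use
result = model.transcribe("audio.wav")  # language is auto-detected
print(result["text"])
```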
Nice to Have
- Experience with streaming or WebSocket connections to models.
- Familiarity with LLM quantization or optimization techniques (see the 4-bit loading sketch below).
- Interest in inference optimization and latency reduction.
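As one example of the quantization techniques mentioned above, here is a minimal sketch of loading a model in 4-bit with Hugging Face transformers and bitsandbytes. The model ID is a placeholder, and a CUDA GPU plus the transformers, accelerate, and bitsandbytes packages are assumed.

```python
# Minimal 4-bit loading sketch (assumptions: CUDA GPU available;
# transformers, accelerate, and bitsandbytes installed; the model ID
# is a placeholder).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # placeholder model ID

# NF4 4-bit weights roughly quarter memory use versus fp16, at some
# accuracy cost, which often lets a 7B model fit on one consumer GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # let accelerate place layers on available GPUs
)

inputs = tokenizer("Hello, world", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```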