# Machine Learning Specialist (ASR, TTS, Multimodal LLM Systems)
## Position Summary
We are seeking an experienced Machine Learning Specialist to lead the design and development of enterprise-grade audio AI solutions, including Automatic Speech Recognition (ASR), Text-to-Speech (TTS), and audio-enabled Large Language Models.
---
## Key Responsibilities
### Speech Recognition (ASR)
- Architect and maintain production ASR systems based on state-of-the-art frameworks such as:
  - OpenAI Whisper
  - Meta AI wav2vec 2.0
  - Conformer-based architectures from Google Research
- Lead multilingual and domain-adapted fine-tuning initiatives.
- Implement streaming, low-latency, and high-availability ASR services.
- Define enterprise evaluation benchmarks (WER, latency SLAs, robustness).
---
### Enterprise Text-to-Speech (TTS)
- Design scalable neural TTS systems leveraging technologies inspired by:
  - Google Tacotron
  - NVIDIA FastPitch
  - Microsoft VALL-E
- Ensure compliance with voice safety, identity protection, and brand voice policies.
- Optimize inference pipelines for GPU clusters and edge devices.
---
### Multimodal & Audio-Enabled LLM Systems
- Integrate ASR and TTS pipelines with enterprise LLM ecosystems, including:
  - OpenAI GPT-based systems
  - Meta LLaMA variants
- Design speech-to-speech conversational platforms.
- Develop cross-modal embedding alignment strategies.
- Implement guardrails, safety filters, and compliance monitoring.
---
### Architecture & Infrastructure
- Deploy models using containerized environments (Docker, Kubernetes).
- Implement MLOps best practices (CI/CD, monitoring, drift detection).
- Collaborate with DevOps, Security, Legal, and Data Governance teams.
---
## Required Qualifications
- MSc or PhD in Computer Science, Machine Learning, Electrical Engineering, or a related field.
- 5+ years of experience in ML engineering or speech AI.
- Strong expertise in:
  - Transformer-based architectures
  - Speech signal processing
  - PyTorch / TensorFlow
  - Distributed training and inference
---
## Preferred Qualifications
- Familiarity with HPC and CUDA programming.
- Experience with real-time streaming architectures.
- Publications or patents in speech or multimodal AI.
- Knowledge of model quantization and hardware optimization.