Key Responsibilities
Architecture & Pipeline Development
Design, develop, and optimize scalable data management systems for both batch and real-time analytics.
Build robust, automated data pipelines to ingest, transform, and route data across diverse sources.
Maintain and manage databases (both SQL and NoSQL) that serve as the backbone of our data ecosystem.
Experiment Tracking & Service Maintenance
Administer and optimize MLflow (or similar experiment tracking tools) to support model tracking, versioning, and deployment.
Oversee other critical data services to ensure smooth, continuous operation of end-to-end MLOps workflows.
Advanced Analytics & Distributed Processing
Implement real-time analytics systems using time-series databases (e.g., TimescaleDB, InfluxDB) alongside traditional SQL solutions.
Utilize distributed computing frameworks such as Apache Spark (or alternatives like Apache Flink) for large-scale data processing.
Cloud & Storage Technologies
Utilize cloud storage solutions such as AWS S3; evaluate and deploy open-source alternatives (e.g., MinIO) when appropriate.
Set up and manage hybrid OTAP (Development, Test, Acceptance, Production) environments to support continuous development and deployment.
Vector Databases & AI Integration
Integrate advanced data technologies, including vector databases (e.g., Qdrant), to support high-performance vector search and similarity-based queries.
Collaborate with data science and AI teams to embed machine learning models into production workflows and optimize MLOps pipelines (CI/CD, automated testing, model deployment, and monitoring).
Software Development & DevOps
Develop and maintain robust codebases in both Python and Java, ensuring high standards for scalability and maintainability.
Implement CI/CD pipelines using tools such as Jenkins or GitLab CI/CD to support continuous integration and deployment practices.
System Administration & Linux Proficiency
Manage Linux-based servers and services, ensuring optimal performance and security of our data infrastructure.
Configure and troubleshoot system-level issues to support smooth operation of data engineering services.
Collaboration & Mentorship
Work closely with cross-functional teams—including data scientists, ML engineers, and business analysts—to align technical solutions with business requirements.
Mentor junior engineers on data engineering best practices, MLOps strategies, and emerging technologies.
Core Technical Skills:
Programming:
Proficiency in Python and Java for data processing and application development.
Database Management:
Strong experience with SQL databases; familiarity with NoSQL solutions.
Ability to administer and maintain databases and data services.
Experiment Tracking & MLOps Tools:
Hands-on experience with MLflow (or equivalent) for experiment tracking, model registry, and deployment.
Distributed Computing:
Experience with Apache Spark or similar frameworks (e.g., Apache Flink) for scalable data processing.
Cloud & Storage:
Experience with AWS S3 and knowledge of open-source S3 alternatives such as MinIO.
Operating Systems:
Advanced proficiency with Linux, including system administration and troubleshooting.
DevOps & CI/CD:
Familiarity with CI/CD practices and tools (e.g., Jenkins, GitLab CI/CD).
MLOps Integration:
Understanding of AI/ML concepts and integration of machine learning models into production data pipelines.
Agile Methodologies:
Experience working in Agile environments with effective version control practices.