We are seeking a Data Engineer (Mid-Senior) to develop data pipelines, support our ML feature engineering workflows, and contribute to the ongoing evolution of our data platform built on Iceberg, MinIO, Trino, and SQL databases. You will work closely with senior engineers, ML engineers, and data scientists to deliver reliable, well-tested data flows.
Responsibilities:
- Develop ETL/ELT pipelines using Apache Airflow, integrating SQL Server and other sources into PostgreSQL, ClickHouse, and Iceberg.
- Build and maintain ingestion scripts and feature extraction jobs using Python, pandas, Ray, and SQL.
- Maintain data assets stored in MinIO and queried via Trino.
- Contribute to the design and improvement of data warehouse domains and staging layers.
- Collaborate with ML engineers to prepare feature sets for ML models.
- Help maintain FastAPI-based ML model serving pipelines (I/O schemas, data validation, transformations).
- Implement data quality tests, anomaly monitoring, and automated alerts.
- Contribute to DataHub lineage by annotating inlets/outlets in Airflow DAGs.
- Participate in code reviews, write documentation, and support platform reliability.
- Troubleshoot production issues related to data ingestion, storage, or compute.
Requirements:
- 3–5 years of experience in Data Engineering or related roles.
- Strong Python skills, including pandas; some Ray or Spark experience is a plus.
- Good SQL knowledge; ability to work with PostgreSQL and SQL Server.
- Experience working with Airflow DAGs (Docker setup is a plus).
- Understanding of object storage (S3/MinIO) and data lake concepts.
- Familiarity with data modeling, staging layers, and transformation patterns.
- Knowledge of Iceberg/Delta/Parquet-based workflows (nice to have).
- Bonus: Experience with FastAPI or ML feature engineering.
- Strong analytical mindset and willingness to learn distributed systems.