Job Description:
As a Senior Data Engineer, you will be responsible for designing, building, and maintaining robust data pipelines and infrastructure. The ideal candidate will have extensive experience with SQL, Kafka, Airflow, and modern data lake technologies, with a primary focus on Python development.
Key Responsibilities:
- Design, develop, and maintain scalable data pipelines and ETL processes.
- Implement real-time data processing solutions using Kafka combined with Spark Streaming, Flink, or Kafka Streams.
- Optimize and manage complex data workflows using Apache Airflow.
- Write efficient, reusable, and reliable code, with a strong emphasis on Python.
- Architect storage solutions using MinIO and implement open table formats such as Apache Iceberg or Delta Lake.
- Perform data modeling and database design to support business needs.
- Monitor and troubleshoot data pipelines to ensure data quality and reliability.
Qualifications:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- 5+ years of experience in data engineering or a similar role.
- Strong proficiency in SQL and experience with databases (e.g., ClickHouse, MS SQL Server).
- Expert proficiency in Python for data processing and pipeline development, including workflow orchestration with Apache Airflow.
- Strong experience with Apache Kafka.
- Experience with at least one of the following streaming frameworks: Spark Streaming, Apache Flink, or Kafka Streams.
- Experience with object storage systems (specifically MinIO) and open table formats (Apache Iceberg or Delta Lake).
- Strong understanding of data modeling, ETL processes, and data warehousing concepts.
- Excellent problem-solving skills and attention to detail.
- Strong communication and collaboration skills, with the ability to work effectively in a team environment.
- Experience with Java is preferred but not required.