We are looking for a Senior Data Engineer to join and help lead our Big Data team. In this role, you will take ownership of end-to-end data pipelines, from ingestion to analytics, and help architect and evolve our data platform to handle large-scale, production-grade workloads.
Responsibilities:
- Design, build, and optimize scalable batch and streaming data pipelines using Apache Spark and Kafka
- Define and evolve the architecture of our data platform based on business and technical needs
- Own data modeling and schema design for analytics in ClickHouse and other OLAP systems
- Lead the implementation of data quality, monitoring, and governance practices
- Contribute to the setup and tuning of data lake storage (e.g., MinIO/S3)
- Orchestrate workflows using Apache Airflow, ensuring reliability and maintainability
- Work with analysts and business users to ensure data accessibility and usability via BI tools like Superset
- Review code, mentor team members, and contribute to documentation and technical standards
- Stay up to date with trends in data infrastructure and recommend improvements
- Help ensure the platform reliably handles multi-terabyte-scale data ingestion and processing
Requirements:
- 5+ years of experience in data engineering, with a strong portfolio of production systems
- Deep understanding of distributed data processing (preferably using Apache Spark)
- Production experience with Apache Kafka for real-time or near-real-time data ingestion
- Strong proficiency in Python or Scala, plus advanced SQL
- Experience with columnar databases such as ClickHouse or similar OLAP solutions
- Familiarity with S3-compatible object storage (e.g., MinIO) and file formats like Parquet
- Comfortable designing data pipelines in self-hosted and Linux-based environments
- Experience with workflow orchestration tools like Apache Airflow
- Strong communication skills and a collaborative mindset