We’re building a next-generation ride-hailing and intercity mobility platform that connects drivers and passengers through transparency, trust, and real-time technology.
Every trip, every route, and every decision on our platform generates data — and we’re looking for a Data Engineer who can turn that data into a powerful engine for insight and innovation.
You’ll be part of the core team responsible for building a scalable, reliable data infrastructure that supports analytics, pricing models, fraud detection, and AI-driven recommendations.
Responsibilities:
- Design, develop, and maintain robust data pipelines for collecting, processing, and storing large-scale data from multiple sources (rides, routes, users, transactions).
- Build and optimize ETL/ELT workflows to ensure accurate, real-time data availability.
- Collaborate with data scientists and backend engineers to design data models, schemas, and APIs for analytics and ML pipelines.
- Implement streaming data architectures using Kafka, Spark, or similar tools.
- Manage and optimize our data warehouse/lakehouse (e.g., BigQuery, Snowflake, Databricks, or Azure Synapse).
- Ensure data quality, reliability, and observability through monitoring and validation systems.
- Contribute to data governance, security, and compliance best practices.
- Continuously improve performance, scalability, and cost efficiency of our data infrastructure.
Requirements:
- Strong programming skills in Python (Pandas, PySpark, etc.).
- Experience with workflow orchestration frameworks (e.g., Airflow, Luigi, Prefect).
- Hands-on experience with SQL and NoSQL databases (e.g., PostgreSQL, MongoDB, Elasticsearch).
- Familiarity with Spark NLP or other large-scale NLP frameworks.
- Solid understanding of text data preprocessing (tokenization, normalization, cleaning).
- Familiarity with NLP datasets and annotation workflows.
- Knowledge of data security best practices and experience handling sensitive information.