Job Overview:
We are seeking an experienced Senior Data Engineer to join our team. The ideal candidate will have strong expertise in building and optimizing data pipelines, data architectures, and large-scale data processing systems. You will work closely with data scientists, analysts, and other stakeholders to deliver reliable, efficient data processing and storage solutions.
Key Responsibilities:
- Design, develop, and maintain scalable ETL pipelines to support various data sources and downstream applications.
- Optimize and maintain large-scale, on-premises data systems for high-performance processing.
- Collaborate with data scientists and analysts to support their data infrastructure needs.
- Manage and maintain data warehouses, relational databases, and distributed data processing frameworks.
- Implement data quality and governance practices to ensure data integrity and reliability.
- Analyze and resolve performance bottlenecks in data systems.
- Develop processes for automating and monitoring data flows.
- Design and maintain high-availability and disaster recovery strategies for critical data systems.
Required Skills & Experience:
- 3+ years of experience as a Data Engineer.
- Strong proficiency in SQL and database technologies (e.g., PostgreSQL, MySQL, Oracle).
- Expertise in distributed data processing frameworks (e.g., Apache Hadoop, Spark).
- Solid understanding of ETL processes, data modeling, and data architecture.
- Experience with batch processing and stream processing technologies.
- Knowledge of scripting and programming languages such as Python, Scala, or Java.
- Familiarity with Linux/Unix environments and command-line tools.
- Experience with version control systems (e.g., Git) and CI/CD pipelines.
- Strong problem-solving skills and attention to detail.
Preferred Qualifications:
- Experience with Apache Hive, Apache Kafka, or other messaging and data management tools.
- Familiarity with big data ecosystems and tools.
- Experience working in a DevOps environment and managing on-premises infrastructure.
- Knowledge of security best practices for data handling and processing.