Job Description:
We are seeking a highly skilled Infrastructure Engineer to join our infrastructure team in a dynamic, fast-paced environment. In this pivotal role, you will manage and optimize mission-critical systems. Your primary focus will be overseeing large-scale infrastructure, automating processes, and ensuring the stability and scalability of our systems. You will help keep the services that support our key business operations reliable and efficient.
Key Responsibilities:
- System Monitoring & Alerts: Configure, manage, and optimize monitoring and logging tools (Grafana, Prometheus, ELK Stack) to proactively detect and resolve issues, ensuring maximum uptime and performance.
- Automation & Scripting: Develop and maintain automation scripts using shell scripting, Python, and Ansible to streamline system management tasks, including deployments, monitoring, and backups.
- Database Management: Administer and maintain databases (PostgreSQL, Redis, MongoDB with replica sets, and ClickHouse) to ensure high availability, performance, and scalability across all platforms.
- Message Queue Management: Oversee the setup, maintenance, and optimization of message queues (replicated RabbitMQ and Kafka) to ensure efficient and reliable messaging within our distributed systems.
- VMware Stack & Virtualization: Manage 200+ virtual machines on the VMware stack, ensuring scalability, resource optimization, and reliability in a mission-critical environment.
- Network & Security: Oversee network devices, firewalls (pfSense, FortiGate), and security protocols to maintain secure and stable connections throughout our infrastructure.
- Backup & Documentation: Implement and manage backup solutions (Veeam) to ensure system integrity and recoverability, and maintain comprehensive documentation and credential records using tools such as BookStack and Passbolt.
Qualifications:
Education:
Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent experience).
Experience:
- Proven experience in infrastructure engineering or systems administration in mission-critical environments.
- Strong expertise in managing large-scale virtualized environments, particularly with VMware stacks.
- Proficiency in Shell scripting, Python, and Ansible for automating and optimizing infrastructure tasks.
- Deep understanding of Git for version control and working knowledge of CI/CD pipeline integration.
- Hands-on experience managing monitoring systems (Grafana, Prometheus) and logging solutions (ELK Stack).
- Experience with databases such as PostgreSQL, Redis, MongoDB, and ClickHouse.
- Familiarity with message queues like RabbitMQ and Kafka in distributed environments.
- Solid knowledge of networking principles and firewall management (pfSense, FortiGate).
Skills:
- Strong troubleshooting and problem-solving abilities in a mission-critical context.
- Excellent communication skills and a collaborative, team-oriented mindset.
- Ability to work in high-pressure environments with a focus on system reliability and performance.
- Eagerness to learn and adapt to new technologies and methodologies.
Preferred but Not Required:
- Familiarity with HP servers (DL360, DL380 series), EMC storage solutions, TrueNAS, Cisco switches (fiber and copper), and passive networking equipment.
- Experience with Nginx, Apache, and reverse proxy setups.
- Familiarity with high-availability tools like Patroni and HAProxy.
- Exposure to DevOps practices and infrastructure as code principles.
Why Join Us?
- Play a key role in maintaining mission-critical systems within a dynamic, innovative environment.
- Opportunities for growth, learning, and career advancement.
- Collaborate with talented professionals in a fast-paced, challenging work environment.
If you're passionate about infrastructure engineering and looking for a new opportunity to showcase your skills, apply now to become a valued member of our team!