Role Overview: Responsible for the design, deployment, and stability of the underlying hardware and virtualization layers that power our cloud services.
Key Responsibilities:
- Automate the lifecycle of bare-metal servers, from initial provisioning to retirement.
- Maintain and scale OpenStack environments using tools like Ansible, ensuring high availability of compute and image services.
- Architect and manage distributed storage systems (Ceph), focusing on performance tuning for high-I/O AI workloads.
- Perform kernel-level troubleshooting and performance optimization for Linux-based hypervisors.
Requirements:
- Deep knowledge of KVM/QEMU, Linux systems administration, and software-defined storage.
- Experience with infrastructure-as-code (IaC), provisioning tools such as Ansible, and scripting is essential.