Full job posting:
We are looking for a passionate and experienced Senior Site Reliability Engineer (SRE) to join our infrastructure team. In this role, you will be responsible for building and maintaining scalable, resilient systems that power our core services. You will work closely with the engineering and DevOps teams to ensure high availability, performance, and operational efficiency across all systems.
Key Responsibilities:
Design and implement scalable and highly available infrastructure solutions
Manage monitoring, logging, and alerting systems (e.g., Prometheus, Grafana, EFK Stack)
Collaborate with DevOps, backend, and data teams to improve CI/CD and automation workflows
Define and track SLOs, SLAs, and SLIs for critical services
Conduct incident response, root cause analysis, and performance tuning
Document infrastructure processes, configurations, and architectural decisions
Requirements:
3+ years of experience as an SRE, DevOps Engineer, or related role
Strong Linux system administration background
Hands-on experience with CI/CD tools (GitLab CI preferred)
Deep knowledge of Kubernetes and related tools (Helm, ArgoCD, etc.)
Proficiency in building and maintaining observability tools
Solid understanding of infrastructure and Infrastructure as Code (e.g., Ansible)
Strong problem-solving skills and a collaborative mindset
What You’ll Get:
The chance to work on real scaling challenges
A team that values transparency, curiosity, and learning
Budget for learning, courses, and conferences
Benefits:
Join our friendly and dynamic team and enjoy a range of perks, such as:
Monthly social events and gatherings
Breakfast
Lunch subsidies
Transportation budget
On-site medical care
Comprehensive health insurance
Parking space
Seasonal and special credits and discounts from Okala
Occasional gifts