Manager-Site Reliability Eng

American Express

Posted 23 Jun 2026

Bengaluru | High pay | GCC | Great Place to Work

The Manager, Site Reliability Engineering, leads and mentors Site Reliability Engineering (SRE) teams, fostering a culture of continuous improvement and inclusivity, and collaborates across the organization to enhance system resilience, scalability, and alignment with business objectives.

Responsibilities

    • Manages and leads a team of Site Reliability Engineering colleagues, fostering a culture of continuous learning, growth, and inclusivity for all colleagues and teams
    • Provides leadership, guidance, and coaching to Site Reliability Engineering teams, supporting training and development of best practices in software development, resiliency, and non-functional system requirements
    • Recruits and develops a high-performing team, recognizing and rewarding achievements, and creating an environment that motivates and energizes colleagues to achieve business objectives
    • Oversees and facilitates collaboration with Software Engineering teams to design and implement features that improve system resilience, scalability, and performance
    • Collaborates with executives, product managers, and other stakeholders to ensure SRE principles are embedded throughout the organization
    • Leads comprehensive chaos engineering experiments and resiliency tests, driving the analysis of outcomes and the implementation of improvements that enhance system robustness and recovery capabilities
    • Conducts regular drills and strategic planning to ensure the organization is prepared for, and can swiftly recover from, complex and unexpected disruptions
    • Collaborates and co-creates effectively with teams in product and the business to align technology initiatives with business objectives

Qualifications

  • Education Qualifications:

    • Bachelor’s degree in Computer Science, Information Technology, Engineering, or comparable experience; advanced degree preferred
    • Knowledge of the modern observability stack (e.g., Splunk, Elasticsearch, Prometheus, Grafana)
    • Knowledge of containerization technologies (e.g., Kubernetes, Docker) and microservices architecture
    • Knowledge of observability tools and methodologies, including experience with logging, monitoring, tracing, and performance analysis platforms
    • Knowledge of cloud-based Site Reliability Engineering (SRE) practices and experience with public cloud platforms such as AWS, Azure, or Google Cloud
  • Work Experience:
    • Experience in software development, or technology operations, with a focus on Site Reliability Engineering
    • Experience in Linux/Unix systems, object-oriented programming languages (e.g., Java), scripting languages (e.g., Python, Bash), and cloud platforms (e.g., AWS, Azure, GCP)

  • Licenses and Certifications:

    • Advanced certification in Site Reliability Engineering (SRE) or a related field is a plus