
Senior Site Reliability Engineering

Microsoft

Listed 26 Jun 2026

Bengaluru · Site Reliability Engineering · Top pay · GCC

Incident triage and first-line response: Provide on-call coverage for incoming incidents across CDI services. Perform initial investigation, severity assessment, and routing to owning engineering teams.

Agentic triage system development: Build and extend AI-driven agents that ingest ICM alerts, correlate them with recent deployments and feature flag rollouts, check known-issue databases, and produce initial assessments with a suggested severity and owning team.

TSG and known-issue matching: Develop automation that matches incoming incidents to relevant Troubleshooting Guides (TSGs) and known issues across Fabric and Power Platform, reducing investigation time and enabling faster resolution.

Auto-routing and classification: Configure and extend ICM routing rules and build intelligent classification systems based on service tree, alert signatures, and historical patterns.

Incident lifecycle automation: Build agents for incident summarization, customer communications drafting, postmortem generation, and reporting, replacing manual authoring with AI-assisted workflows that require human judgment only for high-severity incidents.

Embody our culture and values.

Master's or Bachelor's degree in Computer Science, Information Technology, or a related field AND 7+ years of technical experience in software engineering, network engineering, or systems administration.

4+ years of software engineering experience in site reliability, live-site operations, or incident management for cloud services.

Experience with incident management systems and workflows (ICM, PagerDuty, ServiceNow, or similar).

Experience with monitoring, alerting, and observability systems (Kusto, Geneva, Grafana, or similar).

Strong programming skills in one or more of: C#, PowerShell, Python, KQL/Kusto.

Ability to work in an on-call rotation across time zones in a geographically distributed team.

Strong communication skills to interface with engineers, leadership, support, and customers.

Equal Opportunity Employer (EOE)