Site Reliability Developer 4

Noida | High pay | GCC

As a Site Reliability Engineer, you will work with the Production Engineering and SRE teams to own, run, and improve critical healthcare services. You will help keep our cloud-native EHR platforms reliable, secure, scalable, and easy to operate.

You will understand how services are built, deployed, monitored, and supported in production. You will work closely with development teams to improve service design, reduce failures, automate manual work, and improve performance.

You will also help develop and use AI and AIOps to improve operations, including smarter alerting, faster incident detection, automated troubleshooting, and better root cause analysis.

Responsibilities

Key Responsibilities
- Own the reliability, availability, performance, and operations of production services.
- Support cloud-native EHR platforms built with microservices, Kubernetes, and OCI.
- Understand service architecture, dependencies, capacity, security, and failure points.
- Improve monitoring, alerting, observability, and incident response.
- Use AI, automation, and AIOps to reduce manual work and improve system health.
- Build tools and scripts for deployment, monitoring, recovery, and operational tasks.
- Troubleshoot complex production issues and drive them to resolution.
- Lead root cause analysis for major incidents and help prevent repeat issues.
- Partner with development teams to improve service design and operability.
- Create and maintain SOPs, runbooks, dashboards, and knowledge articles.
- Support migration and modernization of existing hosting environments to OCI.
- Review code, improve engineering practices, and mentor team members.
- Work with product, development, support, and cloud teams to deliver reliable healthcare solutions.
- Participate in 24x7 on-call rotation for critical services.

AI and Automation Focus
- Design and support AI-driven operational automation.
- Use AI/AIOps for anomaly detection, alert correlation, and incident insights.
- Help build self-healing and auto-remediation capabilities.
- Apply AI safely to improve reliability, supportability, and customer experience.
- Work with engineering teams to bring applied AI into production operations.

What You Bring
- 6 to 10+ years of experience with production systems or distributed platforms.
- Strong experience with Java and scripting using Python or Shell.
- Good knowledge of microservices, Kubernetes, and cloud platforms.
- Experience with OCI, AWS, Azure, or GCP.
- Strong troubleshooting and debugging skills.
- Experience with monitoring, logging, alerting, and observability tools.
- Knowledge of REST APIs, JSON/XML, SQL, and secure data handling.
- Experience with automation, CI/CD, and production deployment.
- Ability to handle customer-impacting issues and technical escalations.
- Experience with AI/ML, AIOps, or automation in production is a plus.

Nice to Have
- Experience with EHR or healthcare platforms.
- Knowledge of HL7 or FHIR.
- Oracle Health or New Millennium experience.
- Oracle Database experience.
- Strong Kubernetes and OCI experience.

Qualifications

Career Level - IC4