Site Reliability Engineering (SRE) is a relatively new term in the software industry. It is a software engineering approach designed for improved system management and problem solving. Think of it as a new form of system administration.
In SRE, a software engineer is in charge of tasks that are usually performed by the operations team. The engineers themselves are responsible for the availability, latency, performance, capacity, scalability, and deployment of software systems.
In this approach, software meets operations. Companies using SRE hire people with software development experience in order to solve infrastructure and operational problems.
Want to know the difference between an SRE and a DevOps engineer? Read our blog here: https://www.opsera.io/learn/sre-vs-devops-responsibilities-differences-salaries
A site reliability engineer excels at the production side of software. They are expected to ensure that software is delivered and deployed flawlessly. Additionally, SREs are responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning.
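To make two of these responsibilities concrete, here is a minimal sketch of how availability and latency might be measured from a batch of request records. It is an illustration under stated assumptions, not a standard SRE tool: the `Request` record, the function names, and the sample values are all hypothetical, and the percentile uses the simple nearest-rank method.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """One served request (hypothetical record shape, for illustration only)."""
    latency_ms: float
    ok: bool


def availability(requests):
    """Return the fraction of requests that succeeded, between 0.0 and 1.0."""
    if not requests:
        raise ValueError("no requests to measure")
    return sum(1 for r in requests if r.ok) / len(requests)


def latency_percentile(requests, pct):
    """Return the nearest-rank latency percentile in milliseconds."""
    if not requests:
        raise ValueError("no requests to measure")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(r.latency_ms for r in requests)
    # Nearest-rank: smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Made-up sample data: three successful requests and one slow failure.
sample = [
    Request(latency_ms=120, ok=True),
    Request(latency_ms=95, ok=True),
    Request(latency_ms=480, ok=False),
    Request(latency_ms=110, ok=True),
]

print(f"availability: {availability(sample):.2%}")
print(f"p50 latency: {latency_percentile(sample, 50)} ms")
print(f"p95 latency: {latency_percentile(sample, 95)} ms")
```

In practice these figures would come from a monitoring system rather than an in-memory list, but the calculation shows what "being responsible for availability and latency" can look like in numbers.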
The SRE model hinges on effective standardization and automation. Engineers are tasked with ideating and implementing methods to enhance and automate operational tasks, thus streamlining development and deployment processes.
Like system administrators, SREs must have some software development experience, but their primary strengths are network engineering, troubleshooting, deployment, and configuration. They must also be effective multitaskers, because they have to ensure that multiple system components work together and deliver results consistently.
For greater clarity, let’s look at the average day of a site reliability engineer:
Bear in mind that due to its relatively recent origin, the SRE role is highly subjective when it comes to specific responsibilities. At some companies, SREs play a key role in software development and programming, while at others they might be expected to focus specifically on the operations side.
To land a site reliability engineering job, study the questions listed below. Prepare for a wide range of topics, as SRE interviews usually cover multiple disciplines, testing candidates on programming, incident response, support, architecture, networking, problem solving, and general behavior.
Bear in mind that these questions provide a guide and structure around which interviewees can educate themselves. They are a starting point from which to approach your preparation for bagging a coveted SRE job. Since the SRE role is new and requires specialized capabilities, expect to spend a couple of months brushing up on old lessons, studying newer facets of domain knowledge, and developing the technical and people skills required to thrive in this position. Put in the requisite effort, and the rewards will be worthwhile.