CONCEPTS
Service Level Objective
A target reliability threshold for a service, typically expressed as a percentage over a time period.
Measured reliability (the SLI) = (Successful Requests / Total Requests) × 100%; the SLO is the target this value must meet.
The "Reliability Goal"
An SLO (Service Level Objective) is the reliability target you aim for. It is the dividing line between "Happy Users" and "Sad Users."
The Three S's of SRE
- SLI (Indicator): The number you measure (e.g., Latency).
- SLO (Objective): The target you set (e.g., 99% of requests with Latency < 200ms).
- SLA (Agreement): The punishment if you miss it (e.g., Refund).
Choosing an SLO
Do not aim for 100%.
- 100% is impossible (physics, ISP outages).
- 100% is too expensive.
- 100% freezes innovation.
Aim for 99.9% (the "Three Nines"), which allows for ~43 minutes of downtime per month.
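The downtime figure above follows directly from the percentage. A minimal sketch of that arithmetic, assuming a 30-day month and treating every failed minute as full downtime (the function name is illustrative, not from any library):

```python
def allowed_downtime_minutes(slo_percent: float, period_days: float = 30.0) -> float:
    """Minutes of full downtime an availability SLO permits over the period."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (1 - slo_percent / 100)


# Compare a few common targets over an assumed 30-day month.
for slo in (99.0, 99.9, 99.95, 99.99):
    print(f"{slo}% -> {allowed_downtime_minutes(slo):.1f} min per 30 days")
```

For 99.9% this gives about 43.2 minutes, which matches the "Three Nines" figure. Each extra nine cuts the allowance by a factor of ten.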
Example: The Checkout Service
"An e-commerce site set an SLO of 99.95% availability for their Checkout API."
Impact
Using an Error Budget, they ignored minor errors until they burned 50% of the budget.
Resolution
When they hit 0% budget, they froze feature launches for 2 weeks to stabilize the database.
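The checkout story can be sketched as a small error budget calculation. This is an illustration only: the request counts are hypothetical, the 50% and 100% thresholds come from the example above, and the class, function, and action names are assumptions rather than part of any real tool.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    slo_percent: float      # availability target, e.g. 99.95
    total_requests: int     # requests served so far in the SLO window
    failed_requests: int    # requests that counted as failures

    @property
    def allowed_failures(self) -> float:
        """Failures the SLO tolerates for this request volume."""
        return self.total_requests * (100 - self.slo_percent) / 100

    @property
    def consumed(self) -> float:
        """Fraction of the budget already burned (1.0 means fully spent)."""
        if self.allowed_failures == 0:
            return 0.0 if self.failed_requests == 0 else float("inf")
        return self.failed_requests / self.allowed_failures


def policy_action(budget: ErrorBudget, attention_at: float = 0.5) -> str:
    """Map budget consumption to an action; the thresholds are illustrative."""
    if budget.consumed >= 1.0:
        return "freeze feature launches"
    if budget.consumed >= attention_at:
        return "investigate errors"
    return "ship normally"


# Hypothetical traffic: 1,000,000 checkout requests, 300 of them failed.
budget = ErrorBudget(slo_percent=99.95, total_requests=1_000_000, failed_requests=300)
print(f"{budget.consumed:.0%} of budget used -> {policy_action(budget)}")
```

With a 99.95% target, one million requests leave room for roughly 500 failures, so 300 failures put the team past the halfway mark but short of a freeze.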
Why SLO Matters
SLOs align engineering and business on what "good enough" reliability means.
Without SLOs, you can't have error budgets. Every incident becomes an emergency.
The Formula
Measured reliability = (Successful Requests / Total Requests) × 100%
Successful: 99,900
Total: 100,000
Result: 99.9%, which exactly meets a 99.9% SLO
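The arithmetic above can be sketched in Python. The function names are illustrative. The sketch uses exact fractions rather than floats so that a result sitting exactly on the target, as in the numbers above, is not misjudged through rounding; the target is passed as text for the same reason.

```python
from fractions import Fraction


def success_rate_percent(successful: int, total: int) -> Fraction:
    """Share of successful requests, as an exact percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return Fraction(successful, total) * 100


def meets_slo(successful: int, total: int, slo_percent: str) -> bool:
    """True when the measured rate is at or above the target, e.g. '99.9'."""
    return success_rate_percent(successful, total) >= Fraction(slo_percent)


rate = success_rate_percent(99_900, 100_000)
print(f"measured: {float(rate):.2f}%  meets 99.9% target: {meets_slo(99_900, 100_000, '99.9')}")
```

One more failed request (99,899 successes out of 100,000) would put the measured rate below the 99.9% target.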
SLO vs. Other Metrics
- SLO: Target reliability
- SLI: Measured metric
- SLA: Customer promise
- Error Budget: Allowed failure
Common Pitfalls
Dream Numbers
Setting a 99.999% SLO when your cloud provider only guarantees 99.99%.
Silent SLOs
Defining an SLO in a spreadsheet but never alerting on it.
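Both pitfalls can be caught with a simple check on the SLO definition itself. The sketch below is hypothetical: `SloDefinition`, its fields, and the `alert_rule` identifier are invented for illustration and do not correspond to any particular monitoring product.

```python
from dataclasses import dataclass, field
from fractions import Fraction
from typing import List, Optional


@dataclass
class SloDefinition:
    name: str
    target_percent: str                      # kept as text for exact comparison
    dependency_guarantees: List[str] = field(default_factory=list)  # provider guarantees, in percent
    alert_rule: Optional[str] = None         # hypothetical ID of the alert wired to this SLO


def lint_slo(slo: SloDefinition) -> List[str]:
    """Return a list of problems; an empty list means no pitfall was detected."""
    problems = []
    target = Fraction(slo.target_percent)
    for guarantee in slo.dependency_guarantees:
        # Dream Numbers: the target is stricter than what a dependency promises.
        if target > Fraction(guarantee):
            problems.append(
                f"{slo.name}: target {slo.target_percent}% exceeds a dependency guarantee of {guarantee}%"
            )
    # Silent SLOs: the objective exists but nothing alerts on it.
    if not slo.alert_rule:
        problems.append(f"{slo.name}: no alert is attached to this SLO")
    return problems


for problem in lint_slo(SloDefinition("checkout", "99.999", ["99.99"])):
    print(problem)
```

A check like this could run wherever SLO definitions are stored, so a target that cannot be met or is never monitored is flagged before anyone relies on it.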
Frequently Asked Questions
Who sets the SLO?
The Product Manager (representing the user) and Engineering Lead (representing reality) must agree.
What happens if we miss the SLO?
You stop shipping new features and focus on reliability (see "Error Budget Policy").