Error Budget
The amount of unreliability a service can have before violating its SLO.
"Permission to Fail"
An Error Budget is the amount of downtime you can allow over a given period, such as a month, without making your users unhappy.
The Philosophy
100% reliability is expensive and slows you down. If your users are happy with 99.9% reliability, then aim for 99.9%. The remaining 0.1% is your budget.
You can spend this budget on:
- Risky feature launches.
- System experiments.
- Chaos engineering.
The Error Budget Policy
The real power comes from the policy: what happens when you run out of budget?
- Budget > 0: Ship features fast.
- Budget <= 0: Stop feature work. Focus on reliability (reliability sprints, feature freezes) until the budget refills.
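The policy above can be sketched as a small decision function. This is only an illustration: the names budget_remaining_minutes and policy_action are hypothetical, the budget is measured in minutes of downtime, and treating an exactly exhausted budget as a freeze is an assumption rather than something a real policy must do.

```python
def budget_remaining_minutes(slo_percent: float,
                             window_minutes: float,
                             downtime_minutes: float) -> float:
    """Minutes of error budget left in the window (negative when overspent)."""
    # Total budget is the share of the window the SLO allows to be "bad".
    budget_minutes = window_minutes * (100.0 - slo_percent) / 100.0
    return budget_minutes - downtime_minutes


def policy_action(remaining_minutes: float) -> str:
    """Map remaining budget to the policy decision (hypothetical labels)."""
    if remaining_minutes > 0:
        return "ship features"
    # Assumption: a budget of exactly zero counts as exhausted.
    return "freeze features, focus on reliability"


if __name__ == "__main__":
    window = 30 * 24 * 60  # a 30-day window, in minutes
    for downtime in (10, 60):
        left = budget_remaining_minutes(99.9, window, downtime)
        print(f"downtime={downtime} min -> {policy_action(left)}")
```

With a 99.9% SLO over 30 days, 10 minutes of downtime leaves budget to spend, while 60 minutes overspends it and triggers the freeze.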
Example: The Feature Freeze
"A team pushed bad code and caused a 4-hour outage, burning 100% of their quarterly Error Budget."
Why Error Budget Matters
Error budgets prevent perfectionism. If you have budget left, you can take risks. If not, stabilize.
Error budgets turn reliability into a strategic tradeoff, not a binary goal.
The Formula
Error Budget = 100% - SLO
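To make the formula concrete, here is a minimal sketch that converts an SLO into allowed downtime for a chosen window. The function names and the 30-day window are assumptions for illustration; use whatever measurement window your SLO is actually defined over.

```python
from datetime import timedelta


def error_budget_fraction(slo_percent: float) -> float:
    """Error Budget = 100% - SLO, returned as a fraction of the window."""
    if not 0 < slo_percent <= 100:
        raise ValueError("SLO must be greater than 0 and at most 100")
    return (100.0 - slo_percent) / 100.0


def allowed_downtime(slo_percent: float, window: timedelta) -> timedelta:
    """Downtime the budget permits over the given window."""
    return window * error_budget_fraction(slo_percent)


if __name__ == "__main__":
    # Assumed example: a 99.9% SLO measured over 30 days.
    print(allowed_downtime(99.9, timedelta(days=30)))
```

For a 99.9% SLO over 30 days, the budget works out to roughly 43 minutes of downtime (0.1% of 43,200 minutes).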