The Four Golden Signals
The four key metrics that represent the health of a system: Latency, Traffic, Errors, and Saturation.
"The Vital Signs"
Just as a doctor checks heart rate and blood pressure, an SRE checks the Four Golden Signals.
1. Latency
The time it takes to service a request.
- Tip: Distinguish between latency for successful requests and latency for failed requests; mixing them can hide what users actually experience.
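To illustrate the tip above, here is a minimal sketch that reports latency separately for successful and failed requests. The record fields (`status`, `latency_ms`) and the rule that a 5xx status marks a failure are assumptions made for this example, not part of any particular monitoring tool.

```python
from statistics import median


def latency_by_outcome(requests):
    """Return the median latency (ms) of successful and failed requests separately.

    Each request is a dict with hypothetical keys "status" and "latency_ms".
    A status of 500 or above is treated as a failure (an assumption).
    """
    ok = [r["latency_ms"] for r in requests if r["status"] < 500]
    failed = [r["latency_ms"] for r in requests if r["status"] >= 500]
    return {
        "success_median_ms": median(ok) if ok else None,
        "failure_median_ms": median(failed) if failed else None,
    }


# Made-up sample: fast failures sit next to slower successes.
sample = [
    {"status": 200, "latency_ms": 120},
    {"status": 200, "latency_ms": 140},
    {"status": 500, "latency_ms": 5},
    {"status": 500, "latency_ms": 7},
]
result = latency_by_outcome(sample)
combined = median(r["latency_ms"] for r in sample)
```

In this sample the combined median lands well below what successful requests actually take, because the failures return almost immediately. That is the distortion the tip warns about.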
2. Traffic
A measure of how much demand is being placed on your system.
- Web: Requests per second (RPS).
- Audio: Concurrent streams.
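As a rough sketch of the web case, requests per second can be derived by counting request timestamps that fall inside a fixed window and dividing by the window length. The function name and the list-of-timestamps input are hypothetical; a real system would normally read this from its metrics pipeline.

```python
def requests_per_second(timestamps, window_start, window_seconds):
    """Average request rate over [window_start, window_start + window_seconds).

    `timestamps` holds request arrival times in seconds (hypothetical input).
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    window_end = window_start + window_seconds
    count = sum(1 for t in timestamps if window_start <= t < window_end)
    return count / window_seconds


# Five requests arrive inside the 5-second window, one arrives later.
arrivals = [0.1, 0.5, 1.2, 1.9, 2.5, 10.0]
rps = requests_per_second(arrivals, window_start=0, window_seconds=5)
```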
3. Errors
The rate of requests that fail.
- Explicit: HTTP 500s.
- Implicit: HTTP 200s with "Success: False" body (content errors).
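The two kinds of error above can be counted together, as in the following sketch. It assumes that implicit failures arrive as a JSON body with a `success` field set to false; that field name and the response shape are illustrative assumptions, so adapt the check to whatever your service actually returns.

```python
import json


def is_error(status_code, body):
    """Classify a response as an error, explicitly or implicitly."""
    if status_code >= 500:
        return True  # explicit: the server reported a failure
    try:
        payload = json.loads(body)
    except (ValueError, TypeError):
        return False  # body is not JSON, so no implicit failure flag to read
    # implicit: HTTP success, but the body reports failure (assumed field name)
    return isinstance(payload, dict) and payload.get("success") is False


def error_rate(responses):
    """Fraction of (status_code, body) pairs classified as errors."""
    if not responses:
        return 0.0
    return sum(is_error(code, body) for code, body in responses) / len(responses)


responses = [
    (200, '{"success": true}'),
    (500, ""),
    (200, '{"success": false}'),
    (200, "OK"),
]
rate = error_rate(responses)
```

Counting only status codes would report one error in four here and miss the 200 response whose body reports a failure.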
4. Saturation
How "full" your service is.
- CPU usage, Memory, Disk I/O.
- Many systems degrade before they reach 100% utilization, so watch constrained resources and set utilization targets before latency spikes.
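One way to act on utilization targets is to compare each resource against its own threshold and flag the ones at or above it. The sketch below does that; the resource names and the target values are made up for illustration and should come from load testing of your own system.

```python
def over_target(utilization, targets):
    """Return the sorted names of resources at or above their utilization target.

    Both arguments map a resource name to a fraction between 0 and 1.
    Resources without a configured target are ignored.
    """
    return sorted(
        name
        for name, used in utilization.items()
        if name in targets and used >= targets[name]
    )


# Illustrative targets, deliberately set below 100%.
targets = {"cpu": 0.80, "memory": 0.85, "disk_io": 0.80}
current = {"cpu": 0.35, "memory": 0.40, "disk_io": 0.92}
saturated = over_target(current, targets)
```

With these sample numbers only disk I/O is flagged, even though CPU and memory look healthy.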
Example: The Slow Disk
“A service was slow, but CPU and Memory were low. No errors were firing.”
Why the Four Golden Signals Matter
Popularized by Google's Site Reliability Engineering book, these signals give you a high-level view of a system's health.
Monitoring these four signals gives broad coverage for user-facing incidents, especially when alerts focus on symptoms and imminent saturation.