
Observability

The measure of how well internal states of a system can be inferred from knowledge of its external outputs.

"Why is it broken?"

Observability (o11y) is the ability to answer new questions about your system without shipping new code.

Monitoring vs. Observability

  • Monitoring: Tells you when something is wrong. ("Alert: CPU is at 99%").
  • Observability: Tells you why it is wrong. ("It's because User 123 sent a malformed JSON payload that triggered a regex loop in the Search Service").

The Three Pillars

  1. Metrics: Aggregates. "Are we slow?" (Cheap, fast).
  2. Logs: Events. "What happened?" (Detailed, expensive).
  3. Traces: Context. "Where did the time go?" (Connects the dots).
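To make the three pillars concrete, here is a minimal sketch of one request emitting all three signals. It uses only the Python standard library; the in-memory sinks, the handle_checkout function, and the field names are hypothetical stand-ins for a real metrics, logging, and tracing backend.

```python
import json
import time
import uuid
from collections import Counter

# Hypothetical in-memory sinks standing in for real backends.
request_count = Counter()  # metric: aggregate count per (route, status)
log_lines = []             # logs: one JSON event per line
spans = []                 # traces: timed operations that share a trace_id


def handle_checkout(user_id):
    """Handle one pretend request and emit a metric, a log, and a span."""
    trace_id = uuid.uuid4().hex
    start = time.perf_counter()
    status = 200  # assumption: the (omitted) work succeeded
    duration_ms = (time.perf_counter() - start) * 1000

    # Metric: cheap aggregate, answers "are we slow or failing?"
    request_count[("/checkout", status)] += 1

    # Log: detailed event, answers "what happened?"
    log_lines.append(json.dumps({
        "event": "checkout_done",
        "user_id": user_id,
        "status": status,
        "trace_id": trace_id,
    }))

    # Trace: timed span, answers "where did the time go?"
    spans.append({
        "trace_id": trace_id,
        "name": "checkout",
        "duration_ms": duration_ms,
    })
    return trace_id
```

The shared trace_id is what lets you jump from an aggregate metric to the specific log events and spans behind it.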

Example: The Mystery Latency

A checkout API was intermittently slow (5s response time) for 1% of requests. Metrics showed "High Average Latency" but not why.

Impact
Lost revenue from frustrated users.
Resolution
Using distributed tracing, the team saw that every slow request was hitting a specific legacy fraud-check service that was timing out. Monitoring did not cover that legacy service, but the traces exposed it immediately.
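The kind of analysis a tracing tool performs here can be sketched roughly as follows. The span records, the service names other than the fraud check, and the 5000 ms threshold are invented for illustration; a real tracer would export far richer data.

```python
from collections import defaultdict

# Hypothetical span records; all values are made up for this sketch.
SPANS = [
    {"trace_id": "t1", "service": "checkout-api", "duration_ms": 120},
    {"trace_id": "t1", "service": "payment", "duration_ms": 80},
    {"trace_id": "t2", "service": "checkout-api", "duration_ms": 5100},
    {"trace_id": "t2", "service": "fraud-check", "duration_ms": 5000},
    {"trace_id": "t2", "service": "payment", "duration_ms": 70},
    {"trace_id": "t3", "service": "checkout-api", "duration_ms": 5050},
    {"trace_id": "t3", "service": "fraud-check", "duration_ms": 5000},
]


def slowest_child_per_slow_trace(spans, root_service, threshold_ms):
    """For each trace whose root span is slow, name its slowest downstream service."""
    by_trace = defaultdict(list)
    for span in spans:
        by_trace[span["trace_id"]].append(span)

    culprits = {}
    for trace_id, trace_spans in by_trace.items():
        root = next((s for s in trace_spans if s["service"] == root_service), None)
        if root is None or root["duration_ms"] < threshold_ms:
            continue  # fast trace, or no root span recorded
        children = [s for s in trace_spans if s is not root]
        if children:
            slowest = max(children, key=lambda s: s["duration_ms"])
            culprits[trace_id] = slowest["service"]
    return culprits


culprits = slowest_child_per_slow_trace(SPANS, "checkout-api", threshold_ms=5000)
```

With this sample data, every slow trace points at the same downstream service, which is the pattern the team spotted.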

Why Observability Matters

Observability goes beyond monitoring. It helps you ask questions you didn't know to ask.

Good observability means a shorter mean time to detect (MTTD) because you can see what is happening in your systems in real time.

Common Pitfalls

Hoarding Data
Storing terabytes of logs that no one ever reads. Observability is about *queryability*, not storage.
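As a rough illustration of queryability, the sketch below filters and groups structured log events by arbitrary fields. The sample log lines and the query and count_by helpers are hypothetical; in practice a log backend does this at scale.

```python
import json
from collections import Counter

# Hypothetical JSON log lines, one event per line.
RAW_LOGS = [
    '{"level": "error", "service": "search", "user_id": "123", "msg": "regex timeout"}',
    '{"level": "info", "service": "search", "user_id": "456", "msg": "ok"}',
    '{"level": "error", "service": "search", "user_id": "123", "msg": "regex timeout"}',
    '{"level": "error", "service": "checkout", "user_id": "789", "msg": "card declined"}',
]


def query(lines, **filters):
    """Return parsed events whose fields equal every given filter value."""
    events = (json.loads(line) for line in lines)
    return [e for e in events if all(e.get(k) == v for k, v in filters.items())]


def count_by(events, field):
    """Group events by one field and count each distinct value."""
    return Counter(e.get(field) for e in events)


search_errors = query(RAW_LOGS, level="error", service="search")
errors_per_user = count_by(search_errors, "user_id")
```

Because each event is a set of named fields rather than free text, a question nobody anticipated ("which user triggers most search errors?") needs no new code in the service itself.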

How to Use Observability

  • Logs: Structured logs with trace IDs.
  • Metrics: Time-series data for trends.
  • Traces: End-to-end request flow visualization.
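One possible way to get structured logs that carry a trace ID, using only Python's standard logging and contextvars modules, is sketched below. The JsonFormatter class, the build_logger and handle_request helpers, the logger name, and the JSON field names are assumptions made for this example, not a prescribed schema.

```python
import contextvars
import io
import json
import logging
import uuid

# Holds the current request's trace ID; None outside of a request.
trace_id_var = contextvars.ContextVar("trace_id", default=None)


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object including the trace ID."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "trace_id": trace_id_var.get(),
        })


def build_logger(stream):
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("checkout")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


def handle_request(logger):
    """Set a trace ID for the duration of one request, then log within it."""
    trace_id = uuid.uuid4().hex
    token = trace_id_var.set(trace_id)
    try:
        logger.info("payment authorized")
    finally:
        trace_id_var.reset(token)  # restore the previous (empty) context
    return trace_id


buffer = io.StringIO()
request_trace_id = handle_request(build_logger(buffer))
```

Storing the ID in a context variable means code deep in the call stack can log without having the trace ID passed to it explicitly.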

Frequently Asked Questions

Do I need a fancy tool?
Tools like Honeycomb, Datadog, or Jaeger help, but Observability is a practice, not a tool. Start with structured logs.
