CORE METRICS

System Downtime

The period of time during which a system or service is unavailable or failing to perform its primary function.

Downtime = Time Restored - Time Failure Began
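A minimal sketch, assuming each incident record carries three timestamps (the function names and sample times here are hypothetical): when the failure began, when it was detected, and when service was restored. Downtime is counted from the start of the failure, so the detection delay is part of the outage rather than excluded from it.

```python
from datetime import datetime, timedelta

def downtime(started: datetime, restored: datetime) -> timedelta:
    """Outage duration, counted from failure start to restoration."""
    if restored < started:
        raise ValueError("restored must not be earlier than started")
    return restored - started

def detection_delay(started: datetime, detected: datetime) -> timedelta:
    """Part of the outage that elapsed before anyone (or anything) noticed."""
    return detected - started

# Hypothetical incident timestamps.
started = datetime(2024, 11, 29, 14, 0)
detected = datetime(2024, 11, 29, 14, 9)
restored = datetime(2024, 11, 29, 14, 47)

print(downtime(started, restored))         # 0:47:00
print(detection_delay(started, detected))  # 0:09:00
```

Measuring from detection instead would hide those first nine minutes, even though users were already affected.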

The Cost of Downtime

Downtime is expensive. By one widely cited estimate, a single minute of downtime costs Amazon over $220,000. For a startup, it costs credibility.
Downtime can be:

  • Hard downtime: The site returns a 500 error.
  • Soft downtime: The site loads, but checkout fails. This is often worse, because it is harder to detect.

Example: E-commerce Platform Outage

"Database corruption caused 47 minutes of downtime during Black Friday sale. Cost: $235,000 in lost sales. Root cause: Lack of automated backups. Post-incident: Implemented multi-region failover."

Impact: $235K loss, 15% increase in customer churn
Resolution: Implemented automated backups and a multi-region database setup

Why Downtime Matters

Downtime directly impacts revenue and customer trust.

Downtime is the inverse of uptime: over any measurement period, uptime + downtime = total time.
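Because the two add up to the whole period, an availability percentage converts directly into a downtime budget. A minimal sketch, assuming a 30-day month (43,200 minutes); the function names are hypothetical.

```python
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200; assumes a 30-day month

def availability(downtime_min: float,
                 period_min: float = MINUTES_PER_30_DAY_MONTH) -> float:
    """Uptime as a percentage of the period (the complement of downtime)."""
    return 100.0 * (period_min - downtime_min) / period_min

def downtime_budget(availability_pct: float,
                    period_min: float = MINUTES_PER_30_DAY_MONTH) -> float:
    """Minutes of downtime allowed in the period at a given availability target."""
    return period_min * (100.0 - availability_pct) / 100.0

print(round(downtime_budget(99.9), 1))   # 43.2
print(round(downtime_budget(99.99), 2))  # 4.32
print(round(availability(15), 3))        # 99.965
```

Each extra "nine" of availability shrinks the monthly budget by a factor of ten.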

Common Pitfalls

  • Not tracking partial downtime: Monitor for degraded performance, not just complete outages. Slow response times are a form of downtime.
  • Ignoring "soft downtime": Track critical user flows. If checkout works but search is broken, you have partial downtime affecting revenue.
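One way to avoid both pitfalls is to classify every synthetic check instead of treating it as simple pass/fail. This is a hypothetical sketch: the `CheckResult` fields, the 2,000 ms slowness threshold, and the flow names are assumptions to adapt to your own monitoring.

```python
from dataclasses import dataclass, field
from typing import Dict

# Assumed threshold: responses slower than this count as degraded (partial downtime).
SLOW_THRESHOLD_MS = 2000

@dataclass
class CheckResult:
    status_code: int    # HTTP status of the main page probe
    latency_ms: float   # response time of that probe
    # Outcome of each critical user flow, e.g. {"checkout": True, "search": False}
    critical_flows_ok: Dict[str, bool] = field(default_factory=dict)

def classify(result: CheckResult) -> str:
    """Label one check as 'hard_down', 'soft_down', 'degraded', or 'up'."""
    if result.status_code >= 500:
        return "hard_down"   # the site itself returns server errors
    if not all(result.critical_flows_ok.values()):
        return "soft_down"   # pages load, but a critical user flow is broken
    if result.latency_ms > SLOW_THRESHOLD_MS:
        return "degraded"    # everything works, but too slowly
    return "up"
```

For example, `classify(CheckResult(200, 350, {"checkout": False, "search": True}))` returns `"soft_down"`: the page answers quickly with a 200, yet revenue is still being lost.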

Industry Benchmarks

  • Excellent (top 5%): < 5 min/month
  • Good (top 25%): 5-15 min/month
  • Industry average (top 50%): 15-60 min/month
  • Needs improvement (bottom 25%): > 60 min/month
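A small helper can sum a month's outages and map the total onto these benchmark bands. The bands share 15 and 60 minutes as boundaries, so assigning a boundary value to the better band is an assumption, as is the function name.

```python
from typing import Iterable

def benchmark_tier(outage_minutes: Iterable[float]) -> str:
    """Sum one month's outage durations and map the total onto the benchmark bands."""
    total = sum(outage_minutes)
    if total < 5:
        return "Excellent (top 5%)"
    if total <= 15:  # assumption: exactly 15 minutes still counts as Good
        return "Good (top 25%)"
    if total <= 60:  # assumption: exactly 60 minutes still counts as average
        return "Industry average (top 50%)"
    return "Needs improvement (bottom 25%)"
```

For instance, `benchmark_tier([3, 12.5])` totals 15.5 minutes and returns `"Industry average (top 50%)"`. Two short incidents are enough to drop a service out of the "Good" band.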

Frequently Asked Questions

How do I calculate downtime cost?
Downtime cost = (Revenue per hour) × (Hours down) + (Lost productivity cost) + (Customer support cost) + (Brand/reputation damage estimate). For e-commerce, estimate revenue per hour as orders per hour × average order value.
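The cost formula translates directly into code. In this sketch the parameter names are hypothetical, the non-revenue components default to zero, and the reputation term is whatever estimate your team agrees on. Estimating e-commerce revenue from order rate and average order value is an assumption, not the only option.

```python
def downtime_cost(revenue_per_hour: float, hours_down: float,
                  lost_productivity: float = 0.0, support_cost: float = 0.0,
                  reputation_estimate: float = 0.0) -> float:
    """Revenue per hour x hours down, plus productivity, support, and reputation costs."""
    lost_revenue = revenue_per_hour * hours_down
    return lost_revenue + lost_productivity + support_cost + reputation_estimate

def ecommerce_revenue_per_hour(orders_per_hour: float,
                               average_order_value: float) -> float:
    """Assumed estimate of hourly revenue for an online store."""
    return orders_per_hour * average_order_value
```

As a consistency check against the example outage: $235,000 lost over 47 minutes implies about $5,000 per minute, or $300,000 per hour. `downtime_cost(300_000, 47 / 60)` therefore returns approximately 235,000 (lost sales only). A hypothetical store taking 2,000 orders per hour at a $150 average order value has the same hourly revenue: `ecommerce_revenue_per_hour(2_000, 150)` returns 300,000.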
What is the difference between planned and unplanned downtime?
Planned downtime is scheduled maintenance (deployments, upgrades). Unplanned downtime is unexpected outages (bugs, crashes). SLAs typically exclude planned downtime from calculations.
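Excluding planned downtime can change the reported number considerably. This sketch assumes one common convention: planned minutes are removed from the measured period, and only unplanned minutes count against what remains. Contracts differ, so check your SLA's exact wording. The function names are hypothetical, and a 30-day month is assumed.

```python
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200; assumes a 30-day month

def sla_availability(unplanned_min: float, planned_min: float,
                     period_min: float = MINUTES_PER_30_DAY_MONTH) -> float:
    """Availability % with planned maintenance excluded from the period."""
    eligible = period_min - planned_min
    if eligible <= 0:
        raise ValueError("planned downtime covers the whole period")
    return 100.0 * (eligible - unplanned_min) / eligible

def raw_availability(unplanned_min: float, planned_min: float,
                     period_min: float = MINUTES_PER_30_DAY_MONTH) -> float:
    """Availability % if every outage, planned or not, counts."""
    return 100.0 * (period_min - unplanned_min - planned_min) / period_min
```

With 120 minutes of planned maintenance and 20 minutes of unplanned outage in a month, `sla_availability(20, 120)` is roughly 99.95%. `raw_availability(20, 120)` is roughly 99.68%, which is closer to what users actually experienced.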
How can I reduce downtime?
Focus on MTTD (mean time to detect) and MTTA (mean time to acknowledge): faster detection and faster response both shorten outages. Implement automated failover, use circuit breakers, and practice incident response with game days.
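Tracking these averages makes the effect of each improvement visible. A minimal sketch, assuming an incident log with four timestamps per incident. The `Incident` fields and function names are hypothetical, and many teams measure MTTA from the moment an alert fires rather than from detection.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import List

@dataclass
class Incident:
    started: datetime       # failure began
    detected: datetime      # monitoring (or a customer) flagged it
    acknowledged: datetime  # a responder took ownership
    restored: datetime      # service was working again

def _minutes(start: datetime, end: datetime) -> float:
    # Dividing two timedeltas yields a float ratio.
    return (end - start) / timedelta(minutes=1)

def mttd(incidents: List[Incident]) -> float:
    """Mean time to detect, in minutes (failure start to detection)."""
    return mean(_minutes(i.started, i.detected) for i in incidents)

def mtta(incidents: List[Incident]) -> float:
    """Mean time to acknowledge, in minutes (detection to acknowledgment)."""
    return mean(_minutes(i.detected, i.acknowledged) for i in incidents)

def mean_downtime(incidents: List[Incident]) -> float:
    """Average outage length, in minutes (failure start to restoration)."""
    return mean(_minutes(i.started, i.restored) for i in incidents)
```

Because detection and acknowledgment both fall inside the outage window, every minute cut from MTTD or MTTA comes straight off `mean_downtime`.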
