Observability Overhaul
Replace 200 alerts and 50 dashboards with SLOs, traces, and runbooks people actually use.
- 70–90% alert noise reduction
- MTTR cut by half on typical engagements
- Distributed tracing covers your top user journeys
Most observability stacks are graveyards. Dashboards built once and never reopened, alerts firing into channels nobody reads, $40k/month log bills nobody questions.
What I rebuild
- User-journey SLOs in place of vanity-metric targets
- Error-budget burn-rate alerting that pages only once or twice per shift, and only for something actionable
- Distributed tracing with sampling that captures the slow tail
- A log strategy with retention set by tier
- Runbooks linked from every alert
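Error-budget burn-rate alerting pages on how fast the budget left by an SLO is being spent, not on raw thresholds. Here is a minimal sketch of the idea. It assumes a 99.9% availability SLO over a 30-day window and the multiwindow threshold of 14.4x described in Google's SRE Workbook, which spends 2% of a 30-day budget in one hour. The function names are hypothetical, and real thresholds are tuned per service.

```python
# Hypothetical sketch: multiwindow burn-rate check for an availability SLO.
# The 14.4x threshold follows the commonly cited SRE Workbook example for a
# 30-day window (1h and 5m windows); real values depend on your SLO.


def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than 'sustainable' the error budget is being spent."""
    budget = 1.0 - slo_target  # e.g. 0.001 for a 99.9% SLO
    return error_ratio / budget


def should_page(long_window_errors: float, short_window_errors: float,
                slo_target: float, threshold: float = 14.4) -> bool:
    """Page only if both the long (e.g. 1h) and short (e.g. 5m) windows burn fast.

    The long window filters out brief blips; the short window lets the alert
    clear quickly once the error rate drops.
    """
    return (burn_rate(long_window_errors, slo_target) >= threshold
            and burn_rate(short_window_errors, slo_target) >= threshold)


if __name__ == "__main__":
    slo = 0.999
    # 2% errors over the last hour and 3% over the last 5 minutes (burn ~20x and ~30x).
    print(should_page(0.02, 0.03, slo))   # True
    # A 5-minute spike that has barely moved the hourly ratio (burn ~5x and ~30x).
    print(should_page(0.005, 0.03, slo))  # False
```

Requiring both windows keeps pages rare. A short spike alone doesn't page, and once the error rate falls, the short window drops below the threshold within minutes and the alert clears.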
What stays
Whatever vendor you're already paying — Datadog, Grafana Cloud, New Relic, Honeycomb, native Prometheus. I optimise the stack you have unless there's a strong case to switch.
Talk through your observability overhaul.
A 30-minute call to understand the shape, the constraints, and whether I'm the right person for it.