Getting an e-commerce platform through Black Friday without a war room
Handled 7.2x the previous year's peak with a single sub-five-minute degradation, no all-hands incident, and a smaller bill than the prior year.
- Peak RPS handled: 7.2x YoY
- Sev-1 incidents during peak: 0
- Total peak-week cloud spend: -18% YoY
- p99 cart latency at peak: <340ms
Black Friday 2024 had been a 36-hour war room with three near-misses. Leadership wanted 2025 to be boring, and the team had three different opinions on what had actually saved them last time.
- 01. Reconstructed the 2024 incident timeline from logs and pages, then ran a blameless retro to separate what worked from what we got lucky on.
- 02. Built a representative load test with k6 that drove real user journeys, not synthetic RPS. Calibrated it against last year's traffic shape.
- 03. Tuned Karpenter consolidation and per-service HPA target utilisation, and pre-warmed the stateful tier ahead of the campaign window. Documented why each number was the number.
- 04. Replaced threshold alerts on CPU with burn-rate alerts on the four user journeys that mattered. The on-call team agreed up front what would and wouldn't get them out of bed.
- 05. Ran two full-scale game days in production (with marketing's blessing) two and four weeks before peak. Found and fixed a Redis connection storm both times.
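For readers unfamiliar with the burn-rate alerting in step 04, here is a minimal sketch of the idea in Python. It is an illustration, not the rules used on this engagement. The 99.9% SLO target, the 14.4 threshold, the one-hour and five-minute windows, and the checkout request counts are all assumptions, taken from the commonly cited multi-window pattern rather than from this project. A burn rate of 14.4 sustained for one hour spends about 2% of a 30-day error budget.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Request counts observed for one user journey over one time window."""
    total: int
    failed: int

    @property
    def error_ratio(self) -> float:
        # An empty window carries no evidence of failure, so treat it as healthy.
        return self.failed / self.total if self.total else 0.0


def burn_rate(window: Window, slo_target: float) -> float:
    """How fast the error budget is being spent; 1.0 means exactly on budget."""
    error_budget = 1.0 - slo_target
    return window.error_ratio / error_budget


def should_page(long_window: Window, short_window: Window,
                slo_target: float = 0.999, threshold: float = 14.4) -> bool:
    """Page only if both windows burn faster than the threshold.

    The long window shows the burn is significant; the short window shows
    it is still happening, so a blip that already recovered wakes nobody up.
    Default values are illustrative assumptions, not the project's settings.
    """
    return (burn_rate(long_window, slo_target) >= threshold
            and burn_rate(short_window, slo_target) >= threshold)


# Hypothetical checkout-journey counts: 2% of requests failing in both windows.
checkout_1h = Window(total=120_000, failed=2_400)
checkout_5m = Window(total=10_000, failed=200)
print(should_page(checkout_1h, checkout_5m))
```

Unlike a CPU threshold, this check is driven only by what users experience on a journey. A node running hot while requests still succeed does not page anyone, and a journey failing at normal CPU does.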
Peak weekend ran on autopilot. The incident channel had two messages, both informational. The CTO kept the same playbook for 2026 with marginal tweaks, and the team ran their first holiday on-call rotation that didn't burn anyone out.
Other engagements.
Rebuilding the platform under a payments company without slowing the roadmap
Cut deploy time from 38 minutes to under 9, reduced cluster spend by 31%, and got the team out of a quarterly upgrade panic.
Standing up a platform team where there wasn't one
Delivered a working internal developer platform, paved-path service template, and hired the two engineers who own it now.
HIPAA-aligned cloud foundation for a clinical data startup
Cleared the HIPAA technical safeguards review with their first enterprise customer's security team on the first pass.