Real multi-region failover for a two-sided marketplace
Delivered a tested cross-region failover with a documented RTO of 12 minutes and an RPO under 30 seconds, and proved it in live game days.
- Tested RTO: 12 minutes
- Measured RPO: < 30 seconds
- Failover game days: 4 (all successful)
- Cross-region data sync lag (p99): < 2.1 s
A 'multi-region' setup that was a passive copy nobody had ever cut over to. Leadership had been telling enterprise customers it existed; the engineering team knew it didn't really.
- 01. Started with the truth: wrote a one-page document describing the actual state of cross-region readiness and shared it with leadership before promising anything.
- 02. Built logical replication for the Postgres tier using pglogical, with monitored lag, alerts on drift, and a documented promotion procedure.
- 03. Made the application stateless by default at the request layer, with idempotency keys on writes that crossed regions and explicit conflict policies on the few writes that could still conflict.
- 04. Wired Route 53 health-checked failover with sane TTLs, and rehearsed DNS propagation with a partner CDN to understand real-world cutover times.
- 05. Ran four game days at increasing severity: staging-only, prod with synthetic traffic, prod with 5% real traffic, and a full prod cutover. Wrote postmortems for each. Fixed three real bugs that only the game days surfaced.
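To illustrate the idempotency keys mentioned in step 03, here is a minimal Python sketch. It is not the marketplace's actual code: `IdempotentWriter`, `create_order`, the key `req-123`, and the in-memory dictionary are all hypothetical stand-ins, and a real system would presumably keep the key-to-result mapping in a durable store that is itself replicated across regions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class IdempotentWriter:
    """Applies each write at most once per idempotency key (in-memory sketch)."""

    # Assumption: a plain dict stands in for a durable, replicated key table.
    _results: Dict[str, dict] = field(default_factory=dict)

    def apply(self, key: str, write: Callable[[], dict]) -> dict:
        # A retried or replayed request with a known key gets the stored
        # result back; the write itself is not executed a second time.
        if key in self._results:
            return self._results[key]
        result = write()
        self._results[key] = result
        return result


# Hypothetical side effect so the example can show how often the write ran.
ledger: List[str] = []


def create_order() -> dict:
    ledger.append("order-created")
    return {"order_id": len(ledger)}


writer = IdempotentWriter()
first = writer.apply("req-123", create_order)
retry = writer.apply("req-123", create_order)  # e.g. a client retry after cutover
other = writer.apply("req-456", create_order)  # a new key is a new write
```

In this sketch the retry returns the same result as the first call and the ledger records only one order for `req-123`, which is the property that makes replaying in-flight requests during a cutover safe.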
The marketplace now has a failover capability it can actually demonstrate. Two enterprise customers signed contracts that were blocked on it. The team runs a full game day quarterly and treats it as routine, not a project.
Other engagements.
Rebuilding the platform under a payments company without slowing the roadmap
Cut deploy time from 38 minutes to under 9, reduced cluster spend by 31%, and got the team out of a quarterly upgrade panic.
Standing up a platform team where there wasn't one
Delivered a working internal developer platform, paved-path service template, and hired the two engineers who own it now.
HIPAA-aligned cloud foundation for a clinical data startup
Cleared the HIPAA technical safeguards review by their first enterprise customer's security team on the first pass.