AI Cascade & Agent Systems
Production multi-step AI systems — agents that use tools, cascades that route easy queries to small models and hard ones to larger ones, MCP servers that expose your data to AI clients. With observability, cost guardrails, and the discipline to keep them honest.
The honest version of "AI agents in production" in 2026: most work is still cascades, not autonomous loops. A classifier decides whether the query is simple, a small fast model handles 80% of cases, a larger model handles the remaining 20%, and a human gets routed the 1% that need judgement. The agent isn't out-thinking you — it's plumbing that handles cost and quality together.
What I build
Cost-aware cascades — classification step routes to model tiers (Haiku → Sonnet → Opus, or Flash → Pro, or local-llama → frontier). Cascade decisions cached. Cost-per-query budgets enforced.
Tool-using agents — agent loops with a tool registry, retry semantics, max-step termination, and audit logs of every tool call. Built for use-cases that genuinely need multi-step reasoning, not bolted onto things that don't.
MCP servers — Model Context Protocol servers exposing your databases, APIs, or workflows as tools that any MCP-compatible AI client can use. Authenticated, rate-limited, observable.
Observability — every step traced with OpenTelemetry, LLM spans showing prompt/output/cost/latency, sampled traces drilled into Honeycomb / Langfuse. The "why is this slow?" question becomes a 5-minute investigation.
Operational discipline — eval suites covering the cascade end-to-end, regression detection on routing decisions, kill-switches and circuit breakers, gradual rollout via feature flags.
What I steer you away from
Wiring a 12-tool agent loop onto a problem that's a 200-line script. Multi-agent "swarm" architectures because someone read a paper. Autonomous agents that touch production data without human-in-the-loop checkpoints. Cascades fail less spectacularly than autonomous agents, and ship years sooner.
Adjacent services.
Cloud & DevOps Engineering
Production cloud environments designed deliberately — resilient, cost-aware, and ready for the day you actually need them.
Internal developer platformsPlatform Engineering
Self-service platforms that turn 'open a ticket and wait three days' into 'open a PR and ship in fifteen minutes'.
EKS · GKE · AKS · self-hostedKubernetes & Container Orchestration
Production-grade Kubernetes — clusters that scale, upgrade cleanly, and don't wake people up.