DeepSeek V3: The First Open Model That Made Me Rethink My Stack
DeepSeek V3 dropped in late December 2024 with frontier-class benchmarks at a fraction of the training cost. It is the first open release that genuinely shifts the cost curve.
DeepSeek shipped V3 in the last week of December 2024. The benchmark numbers were the kind of thing that makes you re-read the paper to check you parsed it right: a 671B-parameter mixture-of-experts model (roughly 37B active per token) with frontier-adjacent quality on coding and math, released with open weights, and reportedly trained for under $6M of compute.
I have been running it locally and via API for the last few months. The hype is mostly justified. Here is the practical picture.
The numbers that matter
The headline numbers, sanity-checked against my own usage:
- Coding tasks (HumanEval-style and harder): roughly comparable to Claude 3.5 Sonnet on the workloads I have tried. Slightly worse on some, slightly better on others.
- Reasoning and math: surprisingly strong. Better than I would have predicted from any open model in 2024.
- General chat: solid but not the strongest. The personality is flatter than the frontier-model competitors.
- Multilingual: very strong on Chinese, decent on European languages, weaker on long-tail languages.
The reported $6M training cost is the part the industry will spend the next year arguing about. There is plenty of unbundled cost (research salaries, prior model iterations, the GPU infrastructure they were not amortising explicitly). But even if the true number is 3-5x higher, that is $18-30M against the hundreds of millions Western frontier labs are reportedly spending per training run. That is a real signal about where the cost curve is headed.
What changes operationally
For my stack, V3 has caused three concrete changes:
A second open model in the routing layer. I had Llama 3 70B for self-hosted bulk work. DeepSeek V3 is a different size (the MoE makes it heavier in memory but cheaper per token at inference) and a different quality profile. For coding-heavy internal tooling, V3 has displaced Llama; for general-purpose bulk work, Llama 3 still wins on operational simplicity. The routing decision looks roughly like the sketch below.
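A minimal version, with illustrative endpoints and workload tags rather than my actual config; both servers expose OpenAI-compatible APIs, so the routing layer only needs to pick a base URL:

```python
# Minimal routing sketch. Endpoints and workload tags are illustrative,
# not my actual config.

from dataclasses import dataclass


@dataclass
class ModelEndpoint:
    name: str
    base_url: str  # OpenAI-compatible endpoint on the self-hosted server


ENDPOINTS = {
    "deepseek-v3": ModelEndpoint("deepseek-v3", "http://moe-cluster:8000/v1"),
    "llama-3-70b": ModelEndpoint("llama-3-70b", "http://gpu-node:8001/v1"),
}

# Coding-heavy internal tooling goes to V3; general bulk work stays on Llama.
CODING_WORKLOADS = {"code-gen", "code-review", "sql-synthesis"}


def route(workload: str) -> ModelEndpoint:
    if workload in CODING_WORKLOADS:
        return ENDPOINTS["deepseek-v3"]
    return ENDPOINTS["llama-3-70b"]
```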
Renewed pricing pressure on frontier APIs. DeepSeek's hosted API is dramatically cheaper than OpenAI or Anthropic for comparable quality on many tasks. I do not move client production traffic to a Chinese-hosted API for compliance reasons (more on that below), but the pricing gap is forcing me to renegotiate budgets and forcing the frontier labs to defend their pricing publicly.
Eval suite expansion. Adding a new model to a routing layer is a week of evaluation work, not a meeting. I have updated my eval harness to include V3 and re-run the regression suite for two clients. One workload moved. Two did not. That is the right shape; a stripped-down version of the check is below.
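The check itself is nothing clever: run a frozen prompt set through the incumbent and the candidate, score both, and only move a workload on a clear win. The scoring function is task-specific and left abstract here:

```python
# Stripped-down regression check: does the candidate beat the incumbent
# on a frozen eval set by a real margin? `score` is task-specific.

from statistics import mean
from typing import Callable


def regression_check(
    eval_set: list[dict],                 # [{"prompt": ..., "reference": ...}, ...]
    incumbent: Callable[[str], str],      # wraps the current model
    candidate: Callable[[str], str],      # wraps the new model
    score: Callable[[str, dict], float],  # returns 0.0-1.0, task-specific
    margin: float = 0.02,                 # demand a clear win, not noise
) -> bool:
    inc = mean(score(incumbent(ex["prompt"]), ex) for ex in eval_set)
    cand = mean(score(candidate(ex["prompt"]), ex) for ex in eval_set)
    print(f"incumbent={inc:.3f} candidate={cand:.3f}")
    return cand > inc + margin
```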
The compliance and geopolitical question
I cannot write this post without addressing it. DeepSeek is a Chinese company. The hosted API is operated under Chinese jurisdiction. For UK and EU clients, that has data residency and regulatory implications that range from "fine, with a DPA" to "absolutely not, ever".
The open weights are a different matter. Weights downloaded and run on my own infrastructure send no data to anyone else. The licence is permissive. Compliance there is the same as any self-hosted model.
My pragmatic split, enforced as a gate in the routing layer (sketched after the list):
- Self-hosted V3: fine for most use cases, treated like any other model.
- Hosted DeepSeek API: only for non-sensitive evaluation work, or for clients who have explicitly approved Chinese-hosted infrastructure. That second group is small.
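A minimal version of that gate, with illustrative field names; the point is that the policy lives in code, not in a wiki:

```python
# Policy gate for hosted DeepSeek traffic. Field names are illustrative;
# what matters is that the compliance rule is enforced in the routing layer.

from dataclasses import dataclass


@dataclass
class ClientPolicy:
    client_id: str
    approved_cn_hosted: bool = False  # explicit, written approval only


def allow_hosted_deepseek(policy: ClientPolicy, sensitive: bool, workload: str) -> bool:
    if sensitive:
        return False                   # sensitive data never leaves our infra
    if workload == "evaluation":
        return True                    # non-sensitive eval traffic is fine
    return policy.approved_cn_hosted   # everything else needs explicit sign-off
```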
This is going to remain a friction point for Western enterprise adoption regardless of how good the model gets. Nothing in the model itself fixes it.
Why the cost number matters
The $6M training claim, even if the true figure is 3x that, is a statement that frontier-class capability does not require frontier-class capital. That statement undermines a lot of strategic assumptions:
- The "no startup can compete with the labs" argument gets weaker. If a focused team can produce a competitive open model on a fraction of the budget, the moat of the frontier labs is engineering excellence, distribution, and trust, not capital.
- The "GPU shortage will protect the incumbents" argument gets weaker. DeepSeek did this with H800s, the export-restricted version of the H100. Constraints drove efficiency gains.
- The "open will always trail closed by 18 months" assumption is stale. The gap on specific benchmarks is now small. On some workloads, open leads.
If V3 is the new floor for what an open model looks like, the next year of the industry will be wild.
What I am telling clients
Three things, depending on the client:
For a client doing internal tooling at scale: re-evaluate self-hosted options. V3 plus the right eval harness might displace a six-figure API line item. The engineering to operate a 671B MoE model is non-trivial but not exotic, and the scale is easy to put numbers on; see the back-of-envelope below.
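A rough calculation for the weights alone, assuming the released FP8 checkpoint; KV cache, activations, and serving overhead all come on top:

```python
# Back-of-envelope VRAM for self-hosting V3. Weights only: KV cache,
# activations, and serving overhead add to this.

TOTAL_PARAMS = 671e9      # total MoE parameters (~37B active per token)
BYTES_PER_PARAM = 1       # the checkpoint ships in FP8
GPU_VRAM_GB = 80          # e.g. an H100 80GB

weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9  # ~671 GB
gpus_needed = weights_gb / GPU_VRAM_GB             # ~8.4, before any overhead

print(f"weights alone: ~{weights_gb:.0f} GB, i.e. more than {gpus_needed:.0f} "
      "x 80GB GPUs -> realistically a multi-node deployment or 141GB-class cards")
```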
For a client whose product depends on frontier API access: nothing changes today, but build optionality into your architecture. The next year will see further price compression and more open-weight competitors. If your product is locked to a single vendor's specific API, you are leaving leverage on the table.
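That optionality is mostly a thin seam in the code: one interface, providers as adapters behind it. Since DeepSeek's hosted API and common self-hosting servers such as vLLM all speak the OpenAI-compatible chat format, a single adapter covers most of them. A sketch, with illustrative class names:

```python
# One interface, many providers. Names here are illustrative; the only real
# assumption is an OpenAI-compatible /chat/completions endpoint.

from typing import Protocol

import httpx


class ChatModel(Protocol):
    def complete(self, system: str, user: str) -> str: ...


class OpenAICompatModel:
    def __init__(self, base_url: str, model: str, api_key: str):
        self.base_url, self.model, self.api_key = base_url, model, api_key

    def complete(self, system: str, user: str) -> str:
        resp = httpx.post(
            f"{self.base_url}/chat/completions",
            headers={"Authorization": f"Bearer {self.api_key}"},
            json={
                "model": self.model,
                "messages": [
                    {"role": "system", "content": system},
                    {"role": "user", "content": user},
                ],
            },
            timeout=60.0,
        )
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]


# Swapping vendors is then a config change, not a rewrite:
v3 = OpenAICompatModel("https://api.deepseek.com", "deepseek-chat", "sk-...")
```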
For a client worried about geopolitical risk: separate the model from the host. Self-hosted weights of any provenance are operationally fine. Hosted APIs sit in jurisdictional reality and the choices there are not technical.
The honest closing
I am not certain V3 will still be the best open model six months from now. Llama 4 is rumoured. Mistral has a pipeline. Anthropic and OpenAI both have closed releases coming that will reset the bar. The picture changes every quarter.
What I am certain of is that the cost curve has moved decisively. Frontier-class capability is getting cheaper, faster, more open, and harder to monopolise. That is good for users, hard on incumbent margins, and the most interesting structural shift in the AI industry since GPT-3.
If you have not added an open frontier-class model to your routing layer yet, late April 2025 is a fine time to start.