The 'ChatGPT is outdated' narrative is half right
GPT-4 is no longer SOTA on most evals. ChatGPT-the-product still wins distribution. These are different facts and the takes online keep conflating them.
Roughly twice a week in April 2026 someone posts a thread arguing that ChatGPT is dead, the GPT-4 family is obsolete, and OpenAI is finished. They cite Claude 4 winning SWE-Bench, Gemini 2 Pro winning long-context retrieval, and DeepSeek R2 winning math. They are not wrong about the evals. They are wrong about the conclusion.
The model is not the product. Both things are true at once.
The eval picture in April 2026
Numbers below are publicly reported scores as of April 2026. Eval contamination is a real concern; treat any single number with skepticism, especially on benchmarks released before the model's training cutoff. The directional ranking is what matters.
| Eval | GPT-4o / 4.5 | Claude 4 Sonnet | Gemini 2 Pro | DeepSeek R2 |
| ------------------ | ------------ | --------------- | ------------ | ----------- |
| MMLU-Pro | ~76 | ~80 | ~79 | ~78 |
| GPQA Diamond | ~54 | ~62 | ~60 | ~63 |
| SWE-Bench Verified | ~52 | ~67 | ~58 | ~55 |
| MATH | ~76 | ~82 | ~84 | ~88 |
Read this as: GPT-4 class models are no longer best on any major public eval. They are usually third or fourth. That is a real change from 2024.
The harder question — which model is best for your application — is not answered by any of these. I have shipped products where GPT-4o beat Claude 4 on production traces despite Claude winning every public eval, because the prompt format and tool-call patterns were tuned to GPT's behavior over two years.
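What that looks like in practice is a small harness that replays your own traces through each candidate model and grades with a task-specific checker, instead of trusting public leaderboards. A minimal sketch: `run_eval`, the stub `model_a`/`model_b` callables, and the exact-match grader are all hypothetical placeholders here; in a real harness the callables would wrap your API clients and the grader would encode your task's pass criteria.

```python
# Head-to-head eval on your own production traces, not public benchmarks.
# The model callables below are stubs; swap in real API clients.

from typing import Callable

def run_eval(
    models: dict[str, Callable[[str], str]],
    cases: list[tuple[str, str]],        # (prompt, expected) pairs from real traces
    grade: Callable[[str, str], bool],   # task-specific pass/fail check
) -> dict[str, float]:
    """Return each model's pass rate on your own data."""
    scores = {}
    for name, call in models.items():
        passed = sum(grade(call(prompt), expected) for prompt, expected in cases)
        scores[name] = passed / len(cases)
    return scores

# Stub "models" with fixed, illustrative behavior.
def model_a(prompt: str) -> str:
    return prompt.upper()

def model_b(prompt: str) -> str:
    return prompt

if __name__ == "__main__":
    cases = [("refund policy?", "REFUND POLICY?"), ("hi", "HI")]
    scores = run_eval(
        {"model_a": model_a, "model_b": model_b},
        cases,
        grade=lambda out, exp: out == exp,  # exact match; use a rubric in practice
    )
    print(scores)
```

The ranking this produces on your traces can disagree with every public eval, which is exactly the point: the grader encodes what *your* product needs, including prompt-format and tool-call quirks no benchmark measures.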
Why ChatGPT keeps winning anyway
OpenAI reports 800M+ weekly active users on ChatGPT in 2026. That is bigger than most consumer products on the internet. The reasons are not about model quality:
- Distribution. ChatGPT is the brand. My non-technical relatives say "ChatGPT" the way they say "Google."
- Product surface. Voice mode, canvas, custom GPTs, the macOS app integration — these are product features, not model features. Claude has comparable underlying capabilities and a worse product story around them.
- Memory. ChatGPT's memory feature, however imperfect, is sticky. Switching costs are real once it knows you.
- Embedded distribution. Apple Intelligence's ChatGPT integration alone touches more users than Anthropic's total reach.
The "ChatGPT is dead" thread author usually has 50K Twitter followers and runs a Cursor-pilled developer workflow. Their experience is not the median user's.
What is actually true
Three things, separately:
- For frontier-eval workloads — hard reasoning, long-context retrieval, top-tier code generation — OpenAI no longer leads. Claude 4 and Gemini 2 Pro are usually ahead.
- For builder workloads via API — most companies shipping AI features — the right model depends on the task. OpenAI is competitive but rarely the obvious choice. See livebench.ai and other independent leaderboards for current comparisons; do not trust any single source.
- For consumer chat — the actual ChatGPT product — OpenAI is winning by a wide margin and the gap is growing, not shrinking.
The narrative collapse happens because Twitter and Hacker News are dominated by builders, who experience the API reality. Builders project that experience onto the consumer product, and it does not transfer.
The interesting question
The interesting question is not "is ChatGPT dead." It is "does product moat outrun eval gap?" My honest answer: probably yes, for at least 18 more months. Reasons:
- The Apple Intelligence partnership locks in distribution at a scale none of the competitors can match in the near term.
- ChatGPT has the only consumer subscription product in this category that is generating real revenue. Anthropic and Google's consumer chat products are not on the same scale.
- OpenAI's o-series reasoning models, while not always top of leaderboards, are good enough for the consumer use cases that drive retention. Most users do not care if Claude 4 is 8 points better on GPQA.
If a competitor does close the product gap — Meta integrating Llama 4 deeply into WhatsApp, or Google making Gemini the default Pixel/Android assistant in a way that finally works — the consumer story changes. Until then, the API leaderboard and the consumer winner can be different companies, and they are.
Takeaways
- Stop reading "ChatGPT is outdated" threads. They are right about evals and wrong about everything that matters for the actual product.
- For your own builder decisions, run your own evals on your own data. Public leaderboards tell you almost nothing about performance on your specific workload.
- If you are building a consumer chat product to compete with ChatGPT directly, you are not going to win on model quality. Find a different angle.
- OpenAI losing the eval crown is a real story for builders. It is barely a story for users, and the discourse keeps mixing them up.
- Watch the Apple Intelligence integration metrics if any leak. That is the single largest distribution variable in this market.
The "ChatGPT is dead" narrative is a builder-shaped take dressed up as a market-shaped one. Both halves of that distinction are interesting. Treating them as one story is how you make bad architecture and product decisions.