RAG vs Fine-Tuning: The Adult Conversation Nobody Is Having
Half the AI projects I see are fine-tuning when they should be RAG-ing. The other half are RAG-ing when they should be fine-tuning. Here is the actual decision.
If there is one consulting conversation I have had more than any other in 2024, it is this one:
"We want to fine-tune a model on our data." "What is the actual problem?" "The model doesn't know things about our company."
The answer is RAG. Not fine-tuning. The conversation goes the same way every time and most of the industry is still getting it wrong.
The decision in one sentence
Fine-tuning teaches a model how to behave. RAG teaches a model what to know. Most enterprise problems are about knowledge, not behaviour. Therefore most enterprise problems are RAG problems.
That is the whole post in one paragraph. The rest is the wreckage I see when teams ignore it.
What fine-tuning is good for
There is a real, narrow set of problems where fine-tuning is the right answer:
- Output format. Always returning JSON in a specific schema. Always answering in a specific tone. Always using a specific glossary.
- Behaviour. Refusing certain topics. Always thinking step-by-step. Following an unusual response protocol.
- Compression. Distilling a frontier model's behaviour into a smaller, cheaper model that runs on your infrastructure. This is the underrated one and it is huge if you have the volume.
- Domain-specific reasoning patterns the base model genuinely does not have. Legal-specific argument structures. Medical differential diagnosis flow. Code-specific refactoring patterns for an obscure language.
That is roughly it. None of those involve "knowing things about your company".
What RAG is good for
Everything else. The shape of an enterprise AI use case is almost always:
- The user asks a question.
- The answer is in our documents.
- The model needs to find the right document and answer from it.
That is RAG. Embed your documents. Index them. Retrieve at query time. Stuff the relevant chunks into the prompt. Let the model reason.
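The loop above is short enough to sketch in full. This is a toy, not a production system: a term-frequency "embedding" stands in for a real embedding model and vector database, and the document list is invented. It only illustrates the shape: embed, index, retrieve, stuff the prompt.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: bag-of-words term counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank every chunk against the query and keep the top k.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    # Stuff the retrieved chunks into the prompt and let the model reason.
    context = "\n---\n".join(retrieve(query, chunks))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

docs = [
    "Refund policy: customers may request a refund within 30 days of purchase.",
    "Shipping: orders ship within two business days of payment.",
    "Security: customer data is encrypted at rest and in transit.",
]
print(build_prompt("How long do customers have to ask for a refund?", docs))
```

Every real implementation swaps `embed` for an embedding model and `sorted` for a vector index, but the contract is identical: the knowledge lives in the documents, not in the weights.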
Done well, RAG gets you 80% of the way to "the model knows about our company" with two weeks of engineering and a vector database. Done badly, RAG produces confident hallucinations that reference documents that do not exist. The difference between the two is mostly retrieval quality, which is mostly chunking, which is the part everyone underinvests in.
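Since chunking is the underinvested part, here is the baseline it should beat: a sliding window with overlap. The sizes are illustrative, not recommendations; real values depend on your embedding model and your documents.

```python
def chunk_text(text: str, size: int = 400, overlap: int = 80) -> list[str]:
    # Sliding window over words with overlap, so a fact that straddles a
    # boundary still appears whole in at least one chunk.
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, max(len(words) - overlap, 1), step):
        chunks.append(" ".join(words[start:start + size]))
    return chunks
```

Anything smarter (splitting on headings, keeping tables intact, attaching document titles to each chunk) tends to pay for itself faster than a fancier retriever does.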
Why teams reach for fine-tuning anyway
Three reasons, all bad:
- It sounds more sophisticated. Fine-tuning has a research aura. RAG sounds like search. Engineers and execs both prefer the prestigious option.
- It feels more "ours". Fine-tuning produces a model artifact you own. RAG produces a pipeline. Ownership feels better even when the outcome is worse.
- The vendor encouraged it. Some vendors sell training infrastructure. Of course they want you fine-tuning. Watch the incentives.
The cost of getting this wrong is not subtle. A fine-tune for "knowledge injection" produces a model that:
- Confabulates the data more confidently than the base model would have.
- Cannot be updated without retraining when the underlying data changes.
- Is impossible to audit. You cannot ask "where did that fact come from?"
- Is locked to one model version. When the next frontier model lands, you fine-tune again.
A RAG system has none of those problems by construction.
When you do both
Sometimes the right answer is both. You fine-tune for behaviour and tone. You RAG for knowledge. Customer support is the canonical example: fine-tune so the model writes in your brand voice and follows your escalation rules, then RAG so it cites the actual current product documentation rather than the version it saw during training.
In that hybrid, fine-tuning is the small layer (formatting, tone, guardrails) and RAG is the large layer (the substance of the answer). If your fine-tune is doing knowledge work, you have the layers inverted.
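The layering can be made concrete. In this sketch, `retrieve` and `call_model` are placeholders for your retrieval pipeline and inference API, and `"support-voice-ft"` is a hypothetical fine-tuned model name; the point is only which layer carries which job.

```python
def answer(query, retrieve, call_model):
    # RAG layer (the large layer): the substance comes from current docs.
    chunks = retrieve(query)
    prompt = (
        "Cite only the documentation below. "
        "If it does not cover the question, escalate.\n\n"
        + "\n---\n".join(chunks)
        + f"\n\nCustomer question: {query}"
    )
    # Behaviour layer (the small layer): a model fine-tuned on brand voice
    # and escalation rules, not on product facts.
    return call_model(model="support-voice-ft", prompt=prompt)
```

If you find yourself retraining `support-voice-ft` because the product docs changed, the layers are inverted: facts belong in `retrieve`, not in the weights.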
A practical decision tree
When a client asks me "should we fine-tune", I run them through this:
- Does the model already know how to do the task in principle, but you want it to do it with your specific information? RAG.
- Does the model fail to do the task at all, regardless of context? Fine-tune, but only after you have tried prompting hard.
- Does the task always produce a specific structured output that prompting cannot reliably enforce? Fine-tune.
- Are you trying to make a smaller cheaper model behave like a bigger one on a narrow workload? Fine-tune.
- Do you want the model to "be an expert in our domain"? RAG. Always RAG. This is never fine-tuning, no matter what the vendor said.
That tree handles 95% of the cases I see. The 5% that genuinely need both are obvious when you get there.
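For the literal-minded, the tree encodes as a function. The parameter names are mine, not a formal taxonomy; the ordering matches the checklist above, with the knowledge questions deliberately checked first.

```python
def fine_tune_or_rag(
    task_works_but_needs_your_info: bool = False,
    wants_domain_expertise: bool = False,
    fails_task_regardless_of_context: bool = False,
    needs_structure_prompting_cannot_enforce: bool = False,
    distilling_to_smaller_model: bool = False,
) -> str:
    # Knowledge questions come first: if the problem is "knowing things",
    # the answer is RAG no matter what else is true.
    if task_works_but_needs_your_info or wants_domain_expertise:
        return "RAG"
    if fails_task_regardless_of_context:
        return "fine-tune (but only after trying prompting hard)"
    if needs_structure_prompting_cannot_enforce or distilling_to_smaller_model:
        return "fine-tune"
    return "neither -- re-examine the problem"
```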
Closing thought
The fine-tuning industry will not love this post. That is fine. The industry has spent two years selling training as the answer to problems that were really retrieval problems. Some of that was honest enthusiasm. Some of it was upselling. The 2025 conversation needs to be more disciplined, and it starts with engineers being able to tell their executives "we don't need to fine-tune, we need to retrieve better".