Every week now somebody asks me a version of the same question: “Can we have a ChatGPT, but for our own data?” A leader wants a bot that answers questions about internal policies. A product manager wants customers to ask questions about their own accounts. The acronym behind almost all of these conversations is RAG, retrieval-augmented generation, and right now the hype around it is everywhere.
So this post is my attempt to explain RAG with no math and no jargon, the same way I explain it in meetings. If you make business decisions about AI, this is for you.
The open book exam
Think about two kinds of exams.
A closed book exam tests what the student memorized. If the question covers something they never studied, a good student says “I do not know.” A bad student writes something that sounds right anyway.
A large language model, the thing behind ChatGPT, took a closed book exam on the whole internet. It memorized an enormous amount, but it knows nothing about your company. It has never seen your price list, your vacation policy, your contracts, your customer tickets. Ask it about those things and it behaves like the bad student. It writes a confident, fluent, well formatted answer that it basically invented. People call this hallucination, and it is the single biggest risk of using these models in business.
RAG turns the closed book exam into an open book exam. Before the model answers, a search step runs against your documents and finds the most relevant pages. Those pages get placed in front of the model together with the question, with an instruction like “answer using only this material.” The model is not remembering anymore. It is reading.
That is the whole trick. Retrieval: find the right pages. Augmented: attach them to the question. Generation: write the answer. Your data stays in your storage and gets searched at question time, and the model writes a nice answer grounded in what it just read.
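For the engineers in the room, the three steps can be sketched in a few lines of Python. This is a toy under loud assumptions, not how a production system works: the documents, the function names, and the word-overlap scoring are all invented for illustration (real systems usually use a proper search index), and the generation step is a placeholder because no real model is called here.

```python
import re

# Hypothetical stand-in for a company document store.
DOCUMENTS = {
    "vacation-policy": "Employees receive 25 vacation days per year. Unused days expire in March.",
    "expense-policy": "Expenses above 500 euros need manager approval before purchase.",
    "price-list": "The standard plan costs 40 euros per user per month.",
}

def words(text):
    """Lowercase a text and return its set of words and numbers."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, top_k=2):
    """Retrieval: rank documents by how many words they share with the question.
    Word overlap is a toy scoring rule chosen for this sketch only."""
    q = words(question)
    scored = [(len(q & words(text)), name) for name, text in documents.items()]
    scored = [(score, name) for score, name in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]

def augment(question, names, documents):
    """Augmented: put the retrieved pages in front of the question."""
    pages = "\n".join(f"[{name}] {documents[name]}" for name in names)
    return (
        "Answer using only this material. "
        "If the answer is not there, say you do not know.\n\n"
        f"{pages}\n\nQuestion: {question}"
    )

def generate(prompt):
    """Generation: placeholder. A real system would send the prompt to a
    language model here; this sketch deliberately does not."""
    return "(the model's answer would go here)"

question = "How many vacation days do employees receive?"
found = retrieve(question, DOCUMENTS)
prompt = augment(question, found, DOCUMENTS)
answer = generate(prompt)
```

Notice that the model never sees the whole document store, only the pages that the search step picked. That is why the quality of the search matters as much as the quality of the model.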
Why companies want this so much
First, the model finally knows your business. The generic model is impressive but generic. With RAG it answers from your policies, your manuals, your data. That is the difference between a toy and a tool.
Second, freshness without retraining. Training or fine tuning a model on your data is expensive, slow, and goes stale the moment your documents change. With RAG, you update the document and the next answer uses the new version. The knowledge lives in your documents, where your team already maintains it.
Third, you can show sources. A good RAG system answers and then says “based on these three documents, here are the links.” A human can verify. For anything serious, compliance, HR, finance, customer support, this ability to check is not a nice extra. It is the feature.
Fourth, access control is possible. The retrieval step can respect permissions, so the intern’s question does not get answered from the executive compensation folder. Possible does not mean automatic. You have to build it, but at least the architecture allows it.
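One way to picture the permission idea, as a minimal sketch: filter the documents by the asker's groups before the search ever runs, so forbidden pages cannot reach the model at all. The `Document` class, the group names, and `visible_documents` are hypothetical names made up for this example, and a real system would take permissions from your identity provider rather than hard-code them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Document:
    name: str
    text: str
    allowed_groups: frozenset  # groups permitted to read this document

def visible_documents(documents, user_groups):
    """Keep only documents that share at least one group with the user.
    Run this BEFORE retrieval, so restricted pages are never searched."""
    user_groups = set(user_groups)
    return [doc for doc in documents if doc.allowed_groups & user_groups]

# Hypothetical example data.
library = [
    Document("employee-handbook", "Office hours are 9 to 5.", frozenset({"all-staff"})),
    Document("executive-compensation", "Confidential salary bands.", frozenset({"executives"})),
]

intern_view = visible_documents(library, {"all-staff"})
executive_view = visible_documents(library, {"all-staff", "executives"})
```

Filtering before the search, rather than hiding results afterwards, is the safer design choice in this sketch: a page the model never reads cannot leak into an answer.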
The honest risk section
Now the part the vendor demo will not emphasize.
RAG reduces hallucination, but it does not eliminate it. If the search step brings back the wrong pages, the model will write a beautiful answer based on wrong material. If the right answer exists in no document, some models still improvise. Whoever owns the project must measure wrong answers, not assume zero. My rule for anything involving money, legal text, or customer commitments is simple: a human checks before it leaves the building.
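Measuring wrong answers does not need to be fancy. A minimal sketch, assuming a human reviewer has marked a sample of answers as correct or not; the function names and the topic labels are hypothetical, chosen to mirror my rule above.

```python
def wrong_answer_rate(reviews):
    """reviews: list of booleans, True when a human judged the answer correct.
    Returns the share of wrong answers in the reviewed sample."""
    if not reviews:
        # No reviews means the rate is unknown, which is not the same as zero.
        raise ValueError("no reviewed answers yet")
    wrong = sum(1 for is_correct in reviews if not is_correct)
    return wrong / len(reviews)

# Hypothetical labels for the topics where a human must check first.
HIGH_STAKES_TOPICS = {"money", "legal", "customer-commitment"}

def needs_human_check(topic):
    """True when an answer on this topic should be reviewed before it is sent."""
    return topic in HIGH_STAKES_TOPICS

sample = [True] * 9 + [False]     # 10 reviewed answers, 1 judged wrong
rate = wrong_answer_rate(sample)  # share of wrong answers in the sample
```

The empty-sample error is deliberate. A project with no reviewed answers has an unknown error rate, and the code should refuse to report that as zero.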
Garbage in, garbage out, at high speed. RAG is a mirror of your documents. If your internal wiki has three contradictory versions of the expense policy, the bot will confidently quote one of them, maybe a different one each time. Many companies discover during a RAG project that their real problem is not AI but years of messy documentation. The cleanup is often the biggest line on the bill, and also the most lasting benefit.
People risk. If employees or customers learn the bot is wrong one time in ten, trust dies fast and the project dies with it. Launch narrow, one domain of questions, measured well, then grow. A small bot that is right beats a big bot that is sometimes right.
A simple cost view
Compared to training your own model, RAG is cheap. That is one reason for the hype. But it is not free, so here is the realistic shape of the bill.
- Build cost. A demo takes a few weeks. A production system with permissions, monitoring, and evaluation takes months of engineering time. The demo is 20% of the work and 80% of the applause.
- Run cost. Every question costs a few cents in model API calls, plus the search infrastructure. Cents sound small, so multiply by your expected questions per month before promising it to everyone.
- Content cost. Somebody must own document quality forever. This is a process cost, not a technology cost, and it is the one most often forgotten.
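The run cost multiplication from the list above fits in one small function. Every number here is a made-up placeholder, not a quote from any vendor, and `monthly_run_cost` is a hypothetical helper; put in your own figures.

```python
def monthly_run_cost(questions_per_month, cents_per_question, search_infra_per_month):
    """Monthly run cost in whole currency units.
    The per-question price is given in cents to keep the arithmetic exact."""
    model_calls = questions_per_month * cents_per_question / 100
    return model_calls + search_infra_per_month

# Hypothetical inputs: 50,000 questions a month at 3 cents each,
# plus 400 a month for the search infrastructure.
run_cost = monthly_run_cost(50_000, 3, 400)
print(run_cost)  # prints 1900.0
```

Three cents per question sounds like nothing. In this invented example it is 1,500 a month before the infrastructure is counted.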
Against that, put the return. Count the hours your support team or HR team spends answering the same fifty questions, and multiply by the loaded cost of those hours. In many companies that math turns positive quickly, which is exactly why every board is asking about this in early 2024.
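Here is that return calculation as a small sketch. All the inputs are invented for illustration, and the function names are hypothetical. The one assumption worth saying out loud is that the hours saved are really saved, meaning the team does something else valuable with them.

```python
import math

def monthly_net_saving(hours_saved, loaded_hourly_cost, run_cost, content_cost):
    """Value of the hours saved per month, minus the monthly run and content costs."""
    return hours_saved * loaded_hourly_cost - run_cost - content_cost

def payback_months(build_cost, net_saving_per_month):
    """Whole months until the build cost is recovered, or None if it never is."""
    if net_saving_per_month <= 0:
        return None
    return math.ceil(build_cost / net_saving_per_month)

# Hypothetical inputs: 300 hours saved a month at a loaded cost of 45 an hour,
# 1,900 a month to run, 3,000 a month for someone to own document quality,
# and 120,000 to build the production system.
net = monthly_net_saving(300, 45, 1_900, 3_000)
months = payback_months(120_000, net)
print(net, months)  # prints 8600 14
```

If the net saving comes out at zero or below, the function returns None instead of a number. A project that never pays back should not be hidden behind a very large one.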
My short version
When I think about where to begin, I start with one painful, repetitive, well documented question domain. Build the smallest useful version, show sources on every answer, and measure wrong answers honestly with real users before any big announcement.
RAG is real. I have watched it work. It is the open book exam, and like any open book exam, the quality of the answer depends on the quality of the book. The model is rented. The book, your data, is the part you actually own. Take care of the book first.
One last honest note. Everything in AI moves fast right now. By the time you read this, RAG might already be the old way, replaced by something with a longer acronym and a better demo. Maybe by next year I will be writing the post about why RAG is dead. But the lesson under it survives any technique. Your data is the asset, and the model is just the tool that reads it today.
Pax et bonum.