
AI

RAG in production: what nobody tells you before the first deploy

By Beto Coelho · 2 min read

Almost every AI project that reaches Conddiz starts the same way: someone built a RAG prototype over a weekend, got impressed, and now wants to ship it "to production." The prototype answers beautifully on the five questions they tested. The trouble shows up on question number 50, asked by a real customer against data that has changed since yesterday.

Retrieval is 80% of the problem

Most of the energy spent on "prompt engineering" should go into retrieval instead. If the right passage never reaches the context window, no prompt will save the answer. In practice that means:

  • Chunking that respects document structure. Splitting every 500 tokens cuts a table or a clause in half, so the model only ever sees a fragment of it. Split by section, not by token count.
  • Reranking after the vector search. Vector search returns plausible candidates; a reranker decides which ones actually answer the question.
  • Metadata as a first-class citizen. Filtering by date, source and permission before retrieval keeps the model from citing a document the user shouldn't even see.
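The three points above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: it assumes section headings start with `#`, that each chunk carries `updated` and `allowed_groups` metadata, and it uses a toy `term_overlap` scorer as a stand-in for a real reranking model. All names here (`Chunk`, `chunk_by_section`, `prefilter`, `rerank`) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass
class Chunk:
    text: str
    section: str
    source: str
    updated: date
    allowed_groups: frozenset[str]


def chunk_by_section(doc: str, source: str, updated: date,
                     allowed_groups: frozenset[str]) -> list[Chunk]:
    """Split on heading lines (assumed to start with '#') so a table or clause stays whole."""
    chunks: list[Chunk] = []
    title, body = "preamble", []

    def flush() -> None:
        # Only emit a chunk when the section has non-blank content.
        if any(ln.strip() for ln in body):
            chunks.append(Chunk("\n".join(body).strip(), title, source, updated, allowed_groups))

    for line in doc.splitlines():
        if line.startswith("#"):
            flush()
            title, body = line.lstrip("#").strip(), []
        else:
            body.append(line)
    flush()
    return chunks


def prefilter(chunks: list[Chunk], user_groups: frozenset[str], not_before: date) -> list[Chunk]:
    """Drop chunks the user may not see, or that are too old, before any similarity search."""
    return [c for c in chunks if c.allowed_groups & user_groups and c.updated >= not_before]


def term_overlap(query: str, text: str) -> float:
    """Toy relevance score: fraction of query terms found in the text (stand-in for a reranker)."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def rerank(query: str, candidates: list[Chunk],
           score: Callable[[str, str], float], top_k: int = 3) -> list[Chunk]:
    """Re-order the candidates from the first-stage search by a second, more precise score."""
    return sorted(candidates, key=lambda c: score(query, c.text), reverse=True)[:top_k]
```

The order matters: the permission and date filter runs before scoring, so a document the user shouldn't see never becomes a candidate in the first place.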

Changing data breaks silently

In a POC the index is static. In production documents change — and the index has to keep up. Without a reindexing pipeline, the system will confidently answer using last week's version of a policy. That's the kind of error that destroys trust, because the answer looks right.
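One common way to keep the index in step with its source is to compare content hashes. The sketch below assumes the indexer stores a SHA-256 hash per document at indexing time; `plan_reindex` and the two dictionaries are hypothetical names, and the actual upsert and delete calls depend on whichever vector store is in use.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's current content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(source: dict[str, str], indexed: dict[str, str]) -> tuple[list[str], list[str]]:
    """Compare the source of truth with what the index last saw.

    source:  doc_id -> current text
    indexed: doc_id -> hash stored when the document was last indexed
    Returns (ids to re-embed and upsert, ids to delete from the index).
    """
    to_upsert = sorted(d for d, text in source.items() if indexed.get(d) != content_hash(text))
    to_delete = sorted(d for d in indexed if d not in source)
    return to_upsert, to_delete
```

Running this on a schedule, or on a change event from the source system, means only new and modified documents are re-embedded, and removed documents stop being cited.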

A wrong answer delivered confidently is worse than "I don't know." Calibrate the system to admit uncertainty.
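A simple way to give the system that option is to refuse to generate when retrieval comes back weak. The sketch below is one possible approach, assuming retrieval scores are similarities in [0, 1]; the `min_score` threshold of 0.6 is an arbitrary placeholder to be tuned on real data, and `generate` stands in for whatever function calls the model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Retrieved:
    text: str
    score: float  # assumed: similarity in [0, 1], higher is better


FALLBACK = "I don't know based on the documents I have access to."


def answer_or_abstain(question: str, hits: list[Retrieved],
                      generate: Callable[[str, list[str]], str],
                      min_score: float = 0.6) -> str:
    """Only call the model when at least one passage clears the threshold."""
    confident = [h for h in hits if h.score >= min_score]
    if not confident:
        # Nothing relevant enough was retrieved: abstain instead of guessing.
        return FALLBACK
    return generate(question, [h.text for h in confident])
```

A score threshold is only one signal. It catches "nothing relevant was found," not "the passage was relevant but the model misread it."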

Continuous evaluation, not a manual spot-check

"Is it working?" can't be answered by eyeballing five examples. We build an evaluation set with real questions and expected answers, and run it on every prompt, model or index change. When accuracy drops, we know before the customer complains.
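In its simplest form, such a harness is a loop over question and expected-answer pairs plus a gate that fails the build when accuracy falls below a stored baseline. The sketch below is a hypothetical minimal version: `contains_expected` is a deliberately crude correctness check (real projects often need a stricter or model-graded one), and the `tolerance` of 0.02 is an assumed value.

```python
from typing import Callable


def contains_expected(answer: str, expected: str) -> bool:
    """Crude check: the expected answer appears in the system's answer, ignoring case."""
    return expected.lower() in answer.lower()


def evaluate(cases: list[tuple[str, str]],
             answer_fn: Callable[[str], str],
             is_correct: Callable[[str, str], bool] = contains_expected) -> float:
    """Return accuracy over (question, expected_answer) pairs."""
    if not cases:
        raise ValueError("evaluation set is empty")
    passed = sum(is_correct(answer_fn(question), expected) for question, expected in cases)
    return passed / len(cases)


def passes_gate(accuracy: float, baseline: float, tolerance: float = 0.02) -> bool:
    """True when accuracy has not dropped more than `tolerance` below the baseline."""
    return accuracy >= baseline - tolerance
```

In CI, a script would call `evaluate` against the current prompt, model and index, then exit non-zero when `passes_gate` returns `False`, so a regression blocks the change instead of reaching a customer.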

What we bring to every project

  1. Retrieval first, prompt second.
  2. Automatic reindexing tied to the source of truth.
  3. An evaluation harness that runs in CI.
  4. An explicit path for "I don't know."

RAG in production isn't magic — it's engineering. And good engineering is exactly what separates the pretty demo from the system a customer relies on every day.