Definition

Retrieval-Augmented Generation

Retrieval-augmented generation connects a language model with external knowledge so answers can be grounded in specific documents or data.

Updated May 3, 2026

Also known as: RAG

Short definition

Retrieval-augmented generation, usually called RAG, is a pattern for giving a large language model access to relevant external information before it answers.

Instead of relying only on what the model learned during training, a RAG system retrieves documents, snippets or database records and includes them as context for the model.

How it works

A basic RAG pipeline has three steps. First, source documents are indexed so they can be searched. Second, a user question is matched against that index. Third, the retrieved context is passed to the model with instructions to answer using that material.
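The three steps can be sketched in a few lines of Python. This is a minimal illustration, not a production design: it assumes a simple bag-of-words index with cosine similarity in place of the embedding-based search that real systems typically use, the sample documents are invented, and the function names (build_index, retrieve, build_prompt) are hypothetical. The call to the language model itself is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Step 1: index each document as a bag-of-words vector."""
    return [(doc, Counter(tokenize(doc))) for doc in documents]


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(index, question, k=2):
    """Step 2: rank indexed documents against the question, keep the top k."""
    q = Counter(tokenize(question))
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def build_prompt(question, passages):
    """Step 3: combine the retrieved context with instructions for the model."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Invented sample documents, for illustration only.
documents = [
    "Employees accrue 1.5 vacation days per month of service.",
    "Support tickets must be acknowledged within four business hours.",
    "The product API uses token-based authentication.",
]

index = build_index(documents)
question = "How many vacation days do employees accrue?"
passages = retrieve(index, question, k=1)
prompt = build_prompt(question, passages)  # this string would be sent to the model
```

Swapping the bag-of-words index for a vector store changes only build_index and retrieve; the prompt-assembly step stays the same.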

This does not make the model infallible, but it usually improves freshness (answers reflect the current sources), traceability (answers can be linked to the passages they came from) and domain accuracy.

Example

A company can build an internal assistant over HR policies, product documentation and support procedures. When an employee asks a question, the system retrieves the most relevant passages and asks the model to answer from them.
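One way to keep such an assistant traceable is to carry a source label with every retrieved passage, so the answer can point back to the document it came from. The sketch below assumes a hypothetical Passage record and invented source labels, and the answer string is a stand-in for a model response, not real output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    source: str  # hypothetical document label, e.g. a policy or runbook name
    text: str


def format_context(passages):
    """Prefix each passage with its source label so the model can cite it."""
    return "\n".join(f"[{p.source}] {p.text}" for p in passages)


def cited_sources(answer, passages):
    """Return the sources whose bracketed label appears in the answer."""
    return [p.source for p in passages if f"[{p.source}]" in answer]


# Invented passages, for illustration only.
passages = [
    Passage("HR-Policy-Leave", "Leave requests need manager approval."),
    Passage("Support-Runbook", "Escalate outages to the on-call engineer."),
]

context = format_context(passages)

# Stand-in for a model response that cites its source.
answer = "Leave requests need manager approval [HR-Policy-Leave]."
sources = cited_sources(answer, passages)
```

An answer that cites no retrieved source is a useful signal that the response may not be grounded in the approved material.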

Why it matters

RAG is one of the most practical ways to use language models in business. It can reduce hallucinations, keeps answers closer to approved sources and lets teams update knowledge by re-indexing documents instead of retraining the model.

The quality of a RAG system depends on document quality, chunking, retrieval accuracy, prompt design and evaluation. Poor retrieval can still produce poor answers, even with a strong model.
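Two of these factors, chunking and retrieval accuracy, are easy to illustrate. The sketch below assumes word-based chunks with a fixed overlap and a simple hit rate at k as the retrieval metric; both are common choices rather than the only ones, and the function names and document IDs are hypothetical.

```python
def chunk_words(text, size=50, overlap=10):
    """Split text into chunks of `size` words, each sharing `overlap` words
    with the previous chunk so that context is not cut at a hard boundary."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def hit_rate_at_k(ranked_results, expected_ids, k):
    """Fraction of questions whose expected document ID appears in the top k
    retrieved IDs. `ranked_results` holds one ranked ID list per question."""
    if not expected_ids:
        return 0.0
    hits = sum(
        1 for ranked, expected in zip(ranked_results, expected_ids)
        if expected in ranked[:k]
    )
    return hits / len(expected_ids)


chunks = chunk_words("a b c d e f g h i j", size=4, overlap=1)

# Invented evaluation data: two questions, each with one expected document ID.
ranked_results = [["d1", "d2"], ["d3", "d4"]]
expected_ids = ["d2", "d9"]
score = hit_rate_at_k(ranked_results, expected_ids, k=2)
```

Tracking a metric like this on a fixed question set makes it possible to see whether a change to chunk size or retrieval method actually helps before the model's answers are evaluated.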