Retrieval Augmented Generation 101

#machinelearning #llm #nlp

I recently came across RAG (Retrieval Augmented Generation). At first it sounded like fancy tech jargon, but after learning more about it I found it to be an interesting and clever technique: it supplements an LLM so that it performs well on your custom data, reducing hallucinations without having to go through the process of fine-tuning the model.

What Is Hallucination

Hallucination is a behavior exhibited by Large Language Models (LLMs) in which the model generates an incorrect response by making up information. You might have observed this with ChatGPT: when asked for very specific details about a library, it can end up inventing documentation, GitHub links, and ways of doing things that the library does not actually support. For instance, when I searched Google for how to extend the Express request object type in TypeScript, it told me about a made-up extend method that it then didn't even use.

How To Avoid Hallucination

There are a couple of techniques for reducing hallucinations when working with LLMs. For the sake of this post, we will look at two of them:

Fine Tuning
Retrieval Augmented Generation (RAG)

Fine Tuning

When fine-tuning, you start from a pre-trained model and re-train its last layers (or additional layers added at the end) on the target data. The pre-trained weights are kept, so the model's general understanding is adapted to the custom dataset rather than learned from scratch. However, this is computationally expensive, time-consuming, and requires more expertise in machine learning.

Steps Involved In Fine Tuning

Select a pre-trained model
Prepare your custom data
Tune hyperparameters
Fine-tune model using transfer learning
Evaluate the model
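To make these five steps a bit more concrete, here is a toy sketch in Python. It is not how you would fine-tune an actual LLM: the "pre-trained model" is a hypothetical stand-in (PRETRAINED_WEIGHTS and extract_features) with made-up weights, the dataset is synthetic, and only a small new output layer (a logistic-regression head) is trained while the pre-trained weights stay frozen. All function names are illustrative, not from any library.

```python
import math
import random

# Step 1: select a pre-trained model. Hypothetical stand-in: a frozen
# feature extractor with made-up weights that are never updated.
PRETRAINED_WEIGHTS = [[0.9, -0.2], [0.1, 0.8]]

def extract_features(x):
    """Frozen pre-trained layers: a fixed linear map followed by tanh."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in PRETRAINED_WEIGHTS]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(head, x):
    """Probability of class 1: frozen features fed into the trainable head."""
    w, b = head
    feats = extract_features(x)
    return sigmoid(sum(wi * fi for wi, fi in zip(w, feats)) + b)

def fine_tune(data, lr, epochs=100):
    """Step 4: train only the new output head with SGD; PRETRAINED_WEIGHTS stay fixed."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            feats = extract_features(x)
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, feats)) + b)
            err = p - y  # gradient of binary cross-entropy w.r.t. the logit
            w = [wi - lr * err * fi for wi, fi in zip(w, feats)]
            b -= lr * err
    return w, b

def accuracy(head, data):
    """Fraction of examples whose predicted class matches the label."""
    return sum((predict(head, x) >= 0.5) == bool(y) for x, y in data) / len(data)

# Step 2: prepare custom data (synthetic toy task: label is 1 when x[0] > x[1]).
random.seed(0)
points = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(200)]
dataset = [(p, int(p[0] > p[1])) for p in points]
train, val, test = dataset[:120], dataset[120:160], dataset[160:]

# Step 3: tune a hyperparameter (the learning rate) on the validation split.
best_lr = max([0.01, 0.1, 0.5], key=lambda lr: accuracy(fine_tune(train, lr), val))

# Steps 4 and 5: fine-tune with the chosen learning rate, then evaluate on held-out data.
head = fine_tune(train, best_lr)
print(f"learning rate={best_lr}, test accuracy={accuracy(head, test):.2f}")
```

With a real LLM the same steps apply, but the frozen part is a large transformer and the training loop runs on GPUs, which is where the cost and expertise mentioned above come in.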

RAG Approach

RAG takes a different approach: it augments the generative model's output by adding a retrieval system to the process. Since LLMs are already trained on a huge corpus of text, they already have a very good understanding of the contextual and semantic meaning of words. What they lack is the additional information from your custom knowledge base that they need to come up with an informed response.

This is where RAG comes in: it augments the LLM with the information it needs to answer the query, which helps prevent hallucinations. Instead of passing the user query directly to the LLM, the query first goes through a retrieval system that fetches the documents relevant to it. These documents are then added to the user's prompt and passed to the LLM. This makes it much more dynamic and flexible to provide the model with external information it wasn't aware of before, so that it can produce a much better response.
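Here is a minimal sketch of that retrieve, augment, and generate flow in Python, assuming a tiny hypothetical knowledge base. For retrieval it uses plain bag-of-words cosine similarity as a simple stand-in for a real search system, and it stops at building the augmented prompt, since the actual LLM call depends on which model and client library you use. The helper names (retrieve, build_prompt) are made up for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base standing in for your custom documents.
DOCUMENTS = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday to Friday, 9am to 5pm.",
    "Premium accounts include priority support and extra storage.",
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing words
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, docs, k=1):
    """Retrieval step: rank documents by similarity to the query, keep the top k."""
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]

def build_prompt(query, context_docs):
    """Augmentation step: put the retrieved documents in front of the user's question."""
    context = "\n".join(f"- {d}" for d in context_docs)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

query = "What is the refund policy for returns?"
prompt = build_prompt(query, retrieve(query, DOCUMENTS, k=1))
print(prompt)
# Generation step: `prompt` (not the raw `query`) would now be sent to the LLM.
```

The refund-policy document ends up inside the prompt, so the model can answer from it instead of guessing.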

This approach also lets you update your repository of documents without retraining the model, as fine-tuning would require. You can also use modern semantic search techniques and a vector database (vector DB) to improve the retrieval system, so that the best results from your knowledge base are the ones passed to the LLM. A high-level overview of a RAG system is shown in the accompanying diagram.
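As a rough illustration of the vector DB idea, and of updating the knowledge base without retraining, here is a toy in-memory vector store in Python. The embed function is a hypothetical stand-in that hashes words into a fixed-size vector; a real setup would use an embedding model and an actual vector database instead, and InMemoryVectorStore is not any particular product's API.

```python
import hashlib
import math
import re

DIM = 1024  # size of the toy embedding vectors (arbitrary choice)

def embed(text):
    """Hypothetical embedding: a hashed bag of words, scaled to unit length.
    A real system would call an embedding model here instead."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # sha256 gives a stable bucket across runs (unlike Python's built-in hash()).
        idx = int(hashlib.sha256(token.encode()).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

class InMemoryVectorStore:
    """Minimal stand-in for a vector DB: keeps (embedding, text) pairs in a list."""

    def __init__(self):
        self._rows = []

    def add(self, text):
        # Updating the knowledge base only means embedding and storing the new text;
        # no model weights are touched.
        self._rows.append((embed(text), text))

    def search(self, query, k=2):
        q = embed(query)
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, vec)), text) for vec, text in self._rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]

store = InMemoryVectorStore()
store.add("The office is closed on public holidays.")
store.add("Invoices are emailed at the end of each month.")
# New information arrives later: just add it, no retraining required.
store.add("New pricing plans launch in March.")
print(store.search("When do the new pricing plans launch?", k=1))
```

Adding the March pricing document is a single add call and nothing about the LLM changes, which is exactly the advantage over fine-tuning.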

If you have any questions or feedback, feel free to reach out!

Connect with me on Twitter, GitHub, and LinkedIn

DEV Community