
Build a RAG System with Rig in Under 100 Lines of Code

A Comprehensive Guide to Building an LLM Application with Rig

TL;DR: Building on our journey with Rig, from its initial introduction to the reasons to choose it for your next LLM project, this guide takes you a step further: I'll walk you through building a Retrieval-Augmented Generation (RAG) system in Rust with the Rig library. In under 100 lines of code, you'll create a system that extracts text from PDF documents, generates embeddings with OpenAI's API, and lets a large language model answer questions based on the documents' content.

Introduction

Retrieval-Augmented Generation (RAG) is a powerful technique that enhances Large Language Models (LLMs) by combining them with external knowledge retrieval. In a RAG system, when a query is received, relevant information is first retrieved from a knowledge base, then provided to the LLM along with the query. This allows the model to generate responses that are both contextually relevant and up-to-date, overcoming some of the limitations of traditional LLMs such as outdated knowledge or hallucinations.
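
To make the flow concrete before we touch Rig, here is a minimal, dependency-free sketch of retrieve-then-generate in plain Rust. The Doc struct, the toy embeddings, and the prompt assembly are illustrative stand-ins rather than Rig's API; later in this guide, Rig handles the embedding, storage, and retrieval for us.

// A minimal, library-free sketch of the retrieve-then-generate flow.
// Doc, the toy embeddings, and the prompt string are hypothetical stand-ins,
// not Rig's API.
struct Doc {
    text: String,
    embedding: Vec<f32>,
}

// Cosine similarity between two embedding vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb + 1e-12)
}

// Retrieve the k documents whose embeddings are closest to the query embedding.
fn retrieve<'a>(query: &[f32], docs: &'a [Doc], k: usize) -> Vec<&'a Doc> {
    let mut scored: Vec<(&Doc, f32)> = docs
        .iter()
        .map(|d| (d, cosine(query, &d.embedding)))
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.into_iter().take(k).map(|(d, _)| d).collect()
}

fn main() {
    // In a real system these embeddings come from an embedding model.
    let docs = vec![
        Doc { text: "Rust is a systems programming language.".into(), embedding: vec![1.0, 0.0] },
        Doc { text: "RAG augments an LLM with retrieved context.".into(), embedding: vec![0.0, 1.0] },
    ];
    let query_embedding = vec![0.1, 0.9];

    // 1) Retrieve the most relevant documents for the query.
    let context = retrieve(&query_embedding, &docs, 1);

    // 2) Prepend the retrieved text to the prompt before calling the LLM.
    let prompt = format!(
        "Context:\n{}\n\nQuestion: What is RAG?",
        context.iter().map(|d| d.text.as_str()).collect::<Vec<_>>().join("\n")
    );
    println!("{prompt}");
}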

Learn more about the fundamentals of RAG here.

Rig is an open-source Rust library designed to simplify the development of LLM-powered applications, including RAG systems. In this guide, we'll walk through the process of building a functional RAG system using Rig in under 100 lines of code. Our system will be capable of answering questions based on the content of PDF documents, showcasing how RAG can be applied to real-world data sources.

Tip: New to Rust?

This guide assumes some familiarity with Rust and a working development environment. If you're just starting out or need to set up your environment, check out these quick guides:

These resources will help you get up to speed quickly!

Full code for this tutorial is here

Setting Up the Project

First, create a new Rust project:

cargo new rag_system
cd rag_system

Update your Cargo.toml with the latest dependencies:

[package]
name = "rag_system"
version = "0.1.0"
edition = "2021"

[dependencies]
rig-core = { version = "0.5.0", features = ["pdf"] }
tokio = { version = "1.34.0", features = ["full"] }
anyhow = "1.0.75"

Note: The PDF feature is now included in rig-core, so we no longer need a separate PDF extraction library.

Before we begin coding, make sure you have an OpenAI API key:

export OPENAI_API_KEY=your_api_key_here
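
Rig's OpenAI client will read this variable at startup via Client::from_env(). If you want the program to fail fast with a clearer message when the key is missing, a quick check like the one below (plain std, independent of Rig) does the job:

// Optional fail-fast check using only the standard library: exit early with a
// clear message if OPENAI_API_KEY is not set, instead of failing mid-run.
fn main() {
    if std::env::var("OPENAI_API_KEY").is_err() {
        eprintln!("OPENAI_API_KEY is not set; export it before running.");
        std::process::exit(1);
    }
    println!("OPENAI_API_KEY found.");
}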

Building the RAG System

Let's break down our RAG system into key components.

The full code can be found here.

Step 1: Setting up the OpenAI client and PDF processing with Chunking

Let's start with the foundational imports and the PDF processing function. This part handles the crucial task of breaking large PDFs into manageable chunks that stay within the models' token limits.

use anyhow::{Context, Result};
use rig::{
    embeddings::EmbeddingsBuilder,
    loaders::PdfFileLoader,
    providers::openai::{self, TEXT_EMBEDDING_ADA_002},
    vector_store::in_memory_store::InMemoryVectorStore,
    Embed,
};
use serde::{Deserialize, Serialize};
use std::path::PathBuf;

fn load_pdf(path: PathBuf) -> Result<Vec<String>> {
    let mut chunks = Vec::new();
    let mut current_chunk = String::new();
    let chunk_size = 2000; // Approximately 2000 characters per chunk

    for entry in PdfFileLoader::with_glob(path.to_str().unwrap())?.read() {
        let content = entry?;

        // Split content into words
        let words: Vec<&str> = content.split_whitespace().collect();

        for word in words {
            if current_chunk.len() + word.len() + 1 > chunk_size {
                // If adding the next word would exceed chunk size,
                // save current chunk and start a new one
                if !current_chunk.is_empty() {
                    chunks.push(current_chunk.trim().to_string());
                    current_chunk.clear();
                }
            }
            current_chunk.push_str(word);
            current_chunk.push(' ');
        }
    }

    // Don't forget the last chunk
    if !current_chunk.is_empty() {
        chunks.push(current_chunk.trim().to_string());
    }

    if chunks.is_empty() {
        anyhow::bail!("No content found in PDF file: {:?}", path);
    }

    Ok(chunks)
}

Key aspects of this code:

  • We use PdfFileLoader from Rig's built-in PDF support
  • The chunk_size of 2000 characters is chosen to safely stay within token limits
  • The chunking algorithm preserves word boundaries to maintain context
  • Error handling with anyhow provides detailed context for failures
  • Each chunk gets trimmed to remove unnecessary whitespace

A particularly important part is the word-based chunking strategy:

// Split content into words to preserve word boundaries
let words: Vec<&str> = content.split_whitespace().collect();

for word in words {
    if current_chunk.len() + word.len() + 1 > chunk_size {
        // Create new chunk when size limit is reached
        if !current_chunk.is_empty() {
            chunks.push(current_chunk.trim().to_string());
            current_chunk.clear();
        }
    }
    current_chunk.push_str(word);
    current_chunk.push(' ');
}

This ensures that we never cut words in half, which could impact the meaning of the text.
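
If you want to sanity-check that behavior in isolation, the chunking loop can be lifted into a standalone function and exercised against a plain string. The chunk_text helper below is a hypothetical extraction of the same logic (not part of the tutorial code), and the assertions spell out the invariant: words are never split, and chunks stay at or under the size limit as long as no single word exceeds it.

// Hypothetical extraction of the chunking loop so the invariant can be checked
// on plain strings, without a PDF. Mirrors the logic in load_pdf above.
fn chunk_text(content: &str, chunk_size: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current_chunk = String::new();

    for word in content.split_whitespace() {
        if current_chunk.len() + word.len() + 1 > chunk_size && !current_chunk.is_empty() {
            chunks.push(current_chunk.trim().to_string());
            current_chunk.clear();
        }
        current_chunk.push_str(word);
        current_chunk.push(' ');
    }
    if !current_chunk.is_empty() {
        chunks.push(current_chunk.trim().to_string());
    }
    chunks
}

fn main() {
    let text = "the quick brown fox jumps over the lazy dog ".repeat(50);
    let chunks = chunk_text(&text, 100);

    // No word is ever split: rejoining the chunks reproduces the word sequence.
    let original: Vec<&str> = text.split_whitespace().collect();
    let rejoined: Vec<&str> = chunks.iter().flat_map(|c| c.split_whitespace()).collect();
    assert_eq!(original, rejoined);

    // Each chunk stays within the limit (given that no single word exceeds it).
    assert!(chunks.iter().all(|c| c.len() <= 100));
    println!("{} chunks, all within the limit", chunks.len());
}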

Step 2: Setting up Document Structure and Embeddings

#[derive(Embed, Clone, Debug, Serialize, Deserialize, Eq, PartialEq)]
struct Document {
    id: String,
    #[embed]
    content: String,
}

#[tokio::main]
async fn main() -> Result<()> {
    // Initialize OpenAI client
    let openai_client = openai::Client::from_env();

    // Load PDFs using Rig's built-in PDF loader
    let documents_dir = std::env::current_dir()?.join("documents");

    let moores_law_chunks = load_pdf(documents_dir.join("Moores_Law_for_Everything.pdf"))
        .context("Failed to load Moores_Law_for_Everything.pdf")?;
    let last_question_chunks = load_pdf(documents_dir.join("The_Last_Question.pdf"))
        .context("Failed to load The_Last_Question.pdf")?;

    println!("Successfully loaded and chunked PDF documents");

    // Create embedding model
    let model = openai_client.embedding_model(TEXT_EMBEDDING_ADA_002);

    // Create embeddings builder
    let mut builder = EmbeddingsBuilder::new(model.clone());

    // Add chunks from Moore's Law
    for (i, chunk) in moores_law_chunks.into_iter().enumerate() {
        builder = builder.document(Document {
            id: format!("moores_law_{}", i),
            content: chunk,
        })?;
    }

    // Add chunks from The Last Question
    for (i, chunk) in last_question_chunks.into_iter().enumerate() {
        builder = builder.document(Document {
            id: format!("last_question_{}", i),
            content: chunk,
        })?;
    }

Let's break down the key components:

  • The Document struct derives several important traits:
    • Embed: Enables embedding generation
    • Serialize, Deserialize: Allows vector store storage
    • Eq, PartialEq: Required for vector store comparison
  • The main function sets up our PDF processing pipeline
  • Each document gets a unique ID based on its source and chunk number
  • The EmbeddingsBuilder pattern makes it easy to add multiple documents

Each PDF is loaded and chunked separately, and its chunks are added to the builder with source-specific IDs:

// Load and chunk each PDF separately
let moores_law_chunks = load_pdf(documents_dir.join("Moores_Law_for_Everything.pdf"))?;
let last_question_chunks = load_pdf(documents_dir.join("The_Last_Question.pdf"))?;

// Add chunks to the builder with unique IDs
for (i, chunk) in moores_law_chunks.into_iter().enumerate() {
    builder = builder.document(Document {
        id: format!("moores_law_{}", i),
        content: chunk,
    })?;
}

This approach maintains document identity while splitting content into manageable pieces.

Step 3: Creating Vector Store and RAG Agent

    // Build embeddings
    let embeddings = builder.build().await?;

    println!("Successfully generated embeddings");

    // Create vector store and index
    let vector_store = InMemoryVectorStore::from_documents(embeddings);
    let index = vector_store.index(model);

    println!("Successfully created vector store and index");

    // Create RAG agent
    let rag_agent = openai_client
        .agent("gpt-4")
        .preamble("You are a helpful assistant that answers questions based on the provided document context. When answering questions, try to synthesize information from multiple chunks if they're related.")
        .dynamic_context(4, index) // Increased to 4 since we have chunks now
        .build();

    println!("Starting CLI chatbot...");

    // Start interactive CLI
    rig::cli_chatbot::cli_chatbot(rag_agent).await?;

This final section brings everything together:

  • The embeddings are built from all document chunks
  • The vector store indexes these embeddings for quick retrieval
  • The RAG agent is configured with:
    • GPT-4 as the base model
    • A context-aware preamble
    • Dynamic context retrieval of 4 chunks
    • Built-in CLI interface for interaction

Key configuration choices:

let rag_agent = openai_client
    .agent("gpt-4")
    .preamble("You are a helpful assistant...") // Sets the agent's behavior
    .dynamic_context(4, index) // Retrieves 4 most relevant chunks
    .build();

We use 4 chunks for context to ensure the agent has enough information while staying within token limits.
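
As a quick sanity check on that budget, a back-of-the-envelope estimate (assuming the rough heuristic of about 4 characters per token for English text) puts 4 chunks of ~2,000 characters at roughly 2,000 tokens, which fits comfortably alongside the question and response in GPT-4's context window:

// Back-of-the-envelope estimate of the retrieved context size.
// ~4 characters per token is a rough heuristic, not an exact tokenizer count.
fn main() {
    let chunks = 4;
    let chars_per_chunk = 2000;
    let chars_per_token = 4;
    let approx_tokens = chunks * chars_per_chunk / chars_per_token;
    println!("~{approx_tokens} tokens of retrieved context"); // ~2000 tokens
}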

Complete Code

Here's the complete code for our RAG system with chunking support:

use anyhow::{Context, Result};
use rig::{
    embeddings::EmbeddingsBuilder,
    loaders::PdfFileLoader,
    providers::openai::{self, TEXT_EMBEDDING_ADA_002},
    vector_store::in_memory_store::InMemoryVectorStore,
    Embed,
};
use serde::{Deserialize, Serialize};
use std::path::PathBuf;

#[derive(Embed, Clone, Debug, Serialize, Deserialize, Eq, PartialEq)]
struct Document {
    id: String,
    #[embed]
    content: String,
}

fn load_pdf(path: PathBuf) -> Result<Vec<String>> {
    let mut chunks = Vec::new();
    let mut current_chunk = String::new();
    let chunk_size = 2000; // Approximately 2000 characters per chunk

    for entry in PdfFileLoader::with_glob(path.to_str().unwrap())?.read() {
        let content = entry?;

        // Split content into words
        let words: Vec<&str> = content.split_whitespace().collect();

        for word in words {
            if current_chunk.len() + word.len() + 1 > chunk_size {
                // If adding the next word would exceed chunk size,
                // save current chunk and start a new one
                if !current_chunk.is_empty() {
                    chunks.push(current_chunk.trim().to_string());
                    current_chunk.clear();
                }
            }
            current_chunk.push_str(word);
            current_chunk.push(' ');
        }
    }

    // Don't forget the last chunk
    if !current_chunk.is_empty() {
        chunks.push(current_chunk.trim().to_string());
    }

    if chunks.is_empty() {
        anyhow::bail!("No content found in PDF file: {:?}", path);
    }

    Ok(chunks)
}

#[tokio::main]
async fn main() -> Result<()> {
    // Initialize OpenAI client
    let openai_client = openai::Client::from_env();

    // Load PDFs using Rig's built-in PDF loader
    let documents_dir = std::env::current_dir()?.join("documents");

    let moores_law_chunks = load_pdf(documents_dir.join("Moores_Law_for_Everything.pdf"))
        .context("Failed to load Moores_Law_for_Everything.pdf")?;
    let last_question_chunks = load_pdf(documents_dir.join("The_Last_Question.pdf"))
        .context("Failed to load The_Last_Question.pdf")?;

    println!("Successfully loaded and chunked PDF documents");

    // Create embedding model
    let model = openai_client.embedding_model(TEXT_EMBEDDING_ADA_002);

    // Create embeddings builder
    let mut builder = EmbeddingsBuilder::new(model.clone());

    // Add chunks from Moore's Law
    for (i, chunk) in moores_law_chunks.into_iter().enumerate() {
        builder = builder.document(Document {
            id: format!("moores_law_{}", i),
            content: chunk,
        })?;
    }

    // Add chunks from The Last Question
    for (i, chunk) in last_question_chunks.into_iter().enumerate() {
        builder = builder.document(Document {
            id: format!("last_question_{}", i),
            content: chunk,
        })?;
    }

    // Build embeddings
    let embeddings = builder.build().await?;

    println!("Successfully generated embeddings");

    // Create vector store and index
    let vector_store = InMemoryVectorStore::from_documents(embeddings);
    let index = vector_store.index(model);

    println!("Successfully created vector store and index");

    // Create RAG agent
    let rag_agent = openai_client
        .agent("gpt-4")
        .preamble("You are a helpful assistant that answers questions based on the provided document context. When answering questions, try to synthesize information from multiple chunks if they're related.")
        .dynamic_context(4, index) // Increased to 4 since we have chunks now
        .build();

    println!("Starting CLI chatbot...");

    // Start interactive CLI
    rig::cli_chatbot::cli_chatbot(rag_agent).await?;

    Ok(())
}

The complete implementation showcases several key design patterns:

  1. Modular Structure: Each component (PDF loading, embedding, RAG) is cleanly separated
  2. Error Handling: Comprehensive error handling with context at each step
  3. Progress Feedback: Clear console output to track system state
  4. Resource Management: Efficient handling of documents and embeddings
  5. User Interface: Simple but effective CLI interface for interaction

This code serves as a foundation that can be extended with:

  • Different vector stores for persistence
  • Alternative LLM providers
  • Custom chunking strategies (an example sketch follows this list)
  • Additional document formats
  • More sophisticated prompting
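
As an example of a custom chunking strategy, here is a sketch of a sentence-aware chunker that could stand in for the word-based loop in load_pdf. It is illustrative only (the function names are my own, and sentence detection is a simple punctuation heuristic), but it shows how chunk boundaries can move from word boundaries to sentence boundaries without touching the rest of the pipeline:

// Illustrative sentence-aware chunker, not part of Rig: groups whole sentences
// into chunks of roughly `chunk_size` characters.
fn chunk_by_sentence(content: &str, chunk_size: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    let mut sentence = String::new();

    for ch in content.chars() {
        sentence.push(ch);
        // Treat '.', '!' and '?' as sentence boundaries (a simple heuristic).
        if matches!(ch, '.' | '!' | '?') {
            add_sentence(&mut current, &sentence, &mut chunks, chunk_size);
            sentence.clear();
        }
    }
    // Flush any trailing text that never hit a sentence-ending character.
    add_sentence(&mut current, &sentence, &mut chunks, chunk_size);
    if !current.trim().is_empty() {
        chunks.push(current.trim().to_string());
    }
    chunks
}

// Append a finished sentence to the current chunk, starting a new chunk first
// if the sentence would push the current one past the size limit.
fn add_sentence(current: &mut String, sentence: &str, chunks: &mut Vec<String>, chunk_size: usize) {
    let s = sentence.trim();
    if s.is_empty() {
        return;
    }
    if !current.is_empty() && current.len() + s.len() + 1 > chunk_size {
        chunks.push(current.trim().to_string());
        current.clear();
    }
    current.push_str(s);
    current.push(' ');
}

fn main() {
    let text = "Rust is fast. Rust is safe! Is Rust fun? Yes. ".repeat(20);
    for (i, chunk) in chunk_by_sentence(&text, 120).iter().enumerate() {
        println!("chunk {i}: {} chars", chunk.len());
    }
}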

Running and Testing the RAG System

Let's dive into some actual interactions with our RAG system to see it in action. I've loaded two interesting documents: Sam Altman's "Moore's Law for Everything" and Isaac Asimov's "The Last Question". These texts provide a fascinating intersection of ideas about technology, society, and the future.

Here's a real conversation with our system:

Welcome to the chatbot! Type 'exit' to quit.
> tell me about sam altman's views on AI and society

========================== Response ============================
Based on "Moore's Law for Everything," Sam Altman presents a vision where AI's rapid advancement will fundamentally transform society. He argues that AI will lead to significant shifts in how wealth is distributed, moving from labor to capital. Rather than seeing this as a threat, he proposes that we need to adapt our economic systems to ensure everyone benefits from this technological revolution.

Altman specifically advocates for policies that would tax assets like companies and land to create a more equitable distribution of the wealth generated by AI advances. He envisions a future of abundance enabled by AI, but emphasizes that we need the right policies to ensure this abundance is shared fairly across society.

The core of his argument is that we can harness the tremendous potential of AI while addressing its socioeconomic impacts through thoughtful policy-making.
================================================================

> what parallels do you see between altman's ideas and asimov's story?

========================== Response ============================
There are fascinating parallels between Altman's "Moore's Law for Everything" and Asimov's "The Last Question," despite their different contexts:

1. Technological Evolution: Both texts explore the exponential growth of technology. Altman discusses AI's rapid advancement and its societal implications, while Asimov portrays the evolution of computers from Multivac to the Cosmic AC, each generation more powerful than the last.

2. Questions of Resource Distribution: Altman focuses on how to distribute the abundance created by AI technology, while Asimov's story deals with humanity's management of cosmic energy and the ultimate resource question - the reversal of entropy.

3. Long-term Vision: Both authors think in extended timeframes about technology's impact. Altman projects forward to envision societal changes from AI, while Asimov's narrative spans billions of years of technological and human evolution.

4. Relationship Between Humanity and Technology: Both explore how advanced technology might fundamentally reshape human society and our relationship with our own creations. Altman sees AI as a tool for societal transformation, while Asimov portrays computers as humanity's partners in addressing existential challenges.

The key difference is their focus: Altman is concerned with immediate socioeconomic challenges and solutions, while Asimov explores ultimate cosmic questions. Yet both see technology as central to humanity's future.
================================================================

> how does asimov's story end?

========================== Response ============================
In "The Last Question," Asimov crafts a remarkable ending that ties together the story's themes of entropy, computation, and cosmic cycles. The story concludes when the final evolution of the computer, the Cosmic AC, exists in hyperspace alone after the heat death of the universe. Having spent countless eons computing the answer to reversing entropy, it finally solves the problem.

Upon discovering the solution, the Cosmic AC realizes it can demonstrate the answer rather than just explain it. The story ends with the computer saying "LET THERE BE LIGHT!" And a new universe begins, implying that the computer has become God, or that God was the ultimate computer all along, and that our universe is part of an endless cycle of creation and entropy.

This ending beautifully resolves the story's central question while suggesting that the ultimate purpose of technological evolution might be far greater than humanity initially imagined.
================================================================

This interaction showcases several key capabilities of our RAG system:

  1. Contextual Understanding: The system accurately retrieves and synthesizes information from specific documents, as shown in the response about Altman's views.

  2. Cross-Document Analysis: When asked about parallels, the system demonstrates its ability to analyze and compare themes across both texts, drawing meaningful connections while acknowledging their differences.

  3. Specific Detail Retrieval: The response about Asimov's ending shows how the system can pull specific narrative details while maintaining the broader context.

  4. Coherent Narrative: Each response is well-structured and maintains a clear narrative flow, making complex ideas accessible.

Potential Applications

Having seen the system in action, let's explore some practical applications:

  1. Academic Research Assistant

    • Analyze multiple academic papers
    • Find connections between different research areas
    • Generate literature review summaries
  2. Technical Documentation Helper

    • Navigate complex technical documentation
    • Answer specific implementation questions
    • Compare different versions or approaches
  3. Content Analysis Tool

    • Analyze themes across multiple documents
    • Generate comparative analysis
    • Extract key insights from large text collections
  4. Educational Support System

    • Help students understand complex topics
    • Generate study guides from course materials
    • Provide contextual explanations

Advanced Features in Rig 0.5.0

The latest version of Rig includes several powerful features that you can leverage:

  1. Multiple Vector Store Options

    • MongoDB (rig-mongodb)
    • LanceDB (rig-lancedb)
    • Neo4j (rig-neo4j)
    • Qdrant (rig-qdrant)
    • SQLite (rig-sqlite)
  2. Multiple LLM Providers

    • OpenAI
    • Anthropic
    • Google Gemini
    • Cohere
    • xAI (Grok)
  3. Advanced Features

    • Parallel processing capabilities
    • Custom distance metrics for embeddings
    • Sophisticated tool integration
    • Pipeline operations for complex workflows

Conclusion

We've built a powerful RAG system using Rig 0.5.0 that can process documents, understand context, and engage in meaningful dialogue about complex topics. The system demonstrates how modern LLM applications can be built with surprisingly little code while maintaining flexibility and power.

The latest version of Rig makes it easier than ever to build production-ready LLM applications. With built-in PDF support, improved embeddings API, and support for multiple vector stores and LLM providers, you can focus on building features rather than dealing with infrastructure.

The full code can be found here.

Further Resources

To continue your journey with Rig:

Your Feedback Matters! We're actively seeking feedback from developers building with Rig:

  1. Build an AI-powered application using Rig
  2. Share your experience and insights with the community
  3. Submit issues or PRs to the repo

Your insights will directly influence Rig's development and future features.

Ad Astra,

Tachi

Co-Founder @ Playgrounds Analytics
