A Retrieval-Augmented Generation (RAG) system built in Rust using the Rig framework. This system processes PDF documents, generates embeddings, and enables interactive Q&A based on the document content.
- PDF document processing with automatic chunking
- OpenAI embeddings generation
- In-memory vector store for document retrieval
- Interactive CLI interface for Q&A
- Context-aware responses using RAG
- Rust (latest stable version)
- An OpenAI API key
- PDF documents in the `documents` directory
- Create a new Rust project:

```bash
cargo new rag_system
cd rag_system
```
- Add the following dependencies to your `Cargo.toml`:
```toml
[dependencies]
rig-core = { version = "0.5.0", features = ["pdf", "derive"] }
tokio = { version = "1.34.0", features = ["full"] }
anyhow = "1.0.75"
serde = { version = "1.0", features = ["derive"] }
```
- Set up your OpenAI API key:

```bash
export OPENAI_API_KEY=your-api-key-here
```
- Create a `documents` directory and add your PDF files:

```bash
mkdir documents
# Add your PDF files to the documents directory
```
```
rag_system/
├── Cargo.toml
├── Cargo.lock
├── documents/
│   ├── document1.pdf
│   └── document2.pdf
└── src/
    └── main.rs
```
The system consists of several key components:
```rust
#[derive(Embed, Clone, Debug, Serialize, Deserialize, Eq, PartialEq)]
struct Document {
    id: String,
    #[embed]
    content: String,
}
```
Represents a document chunk with a unique ID and content.
The `load_pdf` function:
- Loads PDF content using Rig's built-in PDF loader
- Splits content into manageable chunks (2000 characters each)
- Maintains word boundaries while chunking
- Handles errors gracefully
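The chunking step described above can be sketched as a standalone helper (a minimal sketch; `chunk_text` is a hypothetical name, and the real `load_pdf` also calls Rig's PDF loader before chunking):

```rust
/// Split `content` into chunks of at most `chunk_size` characters,
/// breaking only at whitespace so words are never cut in half.
fn chunk_text(content: &str, chunk_size: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    for word in content.split_whitespace() {
        // Start a new chunk if adding this word (plus a space) would exceed the limit.
        if !current.is_empty() && current.len() + word.len() + 1 > chunk_size {
            chunks.push(current.clone());
            current.clear();
        }
        if !current.is_empty() {
            current.push(' ');
        }
        current.push_str(word);
    }
    // Flush the final partial chunk.
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

Because chunks never split mid-word, each chunk may come in slightly under `chunk_size`, which keeps every piece safely below the embedding model's token limit.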
The main pipeline:
- Loads and chunks PDF documents
- Generates embeddings using OpenAI's text-embedding-ada-002 model
- Stores embeddings in an in-memory vector store
- Creates a RAG agent with dynamic context retrieval
- Provides an interactive CLI interface
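Assuming Rig 0.5's OpenAI provider, the pipeline above might look roughly like this (a sketch, not the exact implementation — module paths and signatures should be checked against the Rig docs, and `load_pdf` is the loader described earlier):

```rust
use rig::{
    cli_chatbot::cli_chatbot,
    embeddings::EmbeddingsBuilder,
    providers::openai::{Client, GPT_4, TEXT_EMBEDDING_ADA_002},
    vector_store::in_memory_store::InMemoryVectorStore,
};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Reads OPENAI_API_KEY from the environment.
    let client = Client::from_env();
    let embedding_model = client.embedding_model(TEXT_EMBEDDING_ADA_002);

    // 1. Load and chunk the PDF documents.
    let documents = load_pdf("documents/document1.pdf")?;

    // 2. Generate embeddings for every chunk.
    let embeddings = EmbeddingsBuilder::new(embedding_model.clone())
        .documents(documents)?
        .build()
        .await?;

    // 3. Store embeddings in the in-memory vector store and build an index.
    let store = InMemoryVectorStore::from_documents(embeddings);
    let index = store.index(embedding_model);

    // 4. RAG agent: retrieve the top 4 chunks as dynamic context per query.
    let agent = client
        .agent(GPT_4)
        .preamble("Answer questions using the provided document context.")
        .dynamic_context(4, index)
        .build();

    // 5. Interactive CLI loop.
    cli_chatbot(agent).await?;
    Ok(())
}
```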
- Build and run the project:

```bash
cargo run
```
- Interact with the system through the CLI:

```
Welcome to the chatbot! Type 'exit' to quit.
> Tell me about the main themes in the documents
```
- Type 'exit' to quit the chatbot.
Adjust the chunk size by modifying the `chunk_size` variable in the `load_pdf` function:

```rust
let chunk_size = 2000;
```
Change the number of chunks used for context by modifying the `dynamic_context` parameter:

```rust
.dynamic_context(4, index)
```
Change the OpenAI model by modifying the model selection:

```rust
.agent("gpt-4")
```
The system includes comprehensive error handling:
- PDF loading errors
- OpenAI API errors
- Document processing errors
- Embedding generation errors
Errors are handled using the `anyhow` crate and include context for easier debugging.
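For instance, attaching context with `anyhow` might look like this (a small sketch; the function body and path are illustrative, not the project's actual code):

```rust
use anyhow::{Context, Result};

fn load_pdf(path: &str) -> Result<Vec<String>> {
    // `with_context` wraps the underlying I/O error with a message
    // identifying which file failed, so errors are easy to trace.
    let bytes = std::fs::read(path)
        .with_context(|| format!("Failed to read PDF file: {}", path))?;
    // ... parse `bytes` with Rig's PDF loader and chunk the text ...
    Ok(Vec::new())
}
```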
The system implements smart document chunking:
- Preserves word boundaries
- Prevents token limit issues
- Enables processing of large documents
The RAG agent:
- Retrieves relevant chunks based on query similarity
- Synthesizes information from multiple chunks
- Maintains context across questions