Ask Questions From PDF with LangChain and FAISS

This repository implements a pipeline for extracting, processing, and querying information from PDF documents using LangChain, FAISS, and Llama-based models. The pipeline supports querying document content with similarity-based retrieval and a retrieval-augmented generation (RAG) approach.
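Similarity-based retrieval ranks chunk embeddings by their distance to the query embedding. The sketch below shows in plain Python what an exact (brute-force) L2 search computes. It assumes the FAISS store is using a flat L2 index, which is LangChain's usual default, and the three-dimensional vectors are made-up toy values rather than real Ollama embeddings. FAISS performs the same ranking in optimized native code and reports squared distances, which yields the same order.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k_by_l2(query, vectors, k=2):
    """Return indices of the k vectors closest to the query, nearest first."""
    ranked = sorted(range(len(vectors)), key=lambda i: l2_distance(query, vectors[i]))
    return ranked[:k]

# Toy 3-D "embeddings" standing in for chunk vectors (hypothetical values).
chunk_vectors = [
    [0.9, 0.1, 0.0],  # chunk 0
    [0.0, 1.0, 0.0],  # chunk 1
    [0.8, 0.2, 0.1],  # chunk 2
]
query_vector = [1.0, 0.0, 0.0]
print(top_k_by_l2(query_vector, chunk_vectors, k=2))  # → [0, 2]
```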

Features

Load and Process PDFs: Automatically extracts content from PDFs and splits it into manageable chunks.
Embeddings Generation: Uses Ollama embedding models to turn each chunk into a vector that can be compared for similarity.
Vector Store Integration: Uses FAISS for similarity search and Maximum Marginal Relevance (MMR)-based retrieval.
Customizable RAG Pipeline: Combines document retrieval with Llama-based models to answer questions from the retrieved text.
Dynamic Prompting: Uses a concise chat prompt that supplies the retrieved context together with the question, so answers stay grounded in the document.
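Maximum Marginal Relevance trades off relevance to the question against redundancy among the chunks already chosen, so near-duplicate passages do not crowd out the context. Below is a minimal greedy sketch, assuming cosine similarity and a weight `lambda_mult` of 0.5. The parameter name mirrors LangChain's MMR setting, but the value used in this project is not stated, and the vectors are toy values.

```python
import math

def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def mmr(query, candidates, k=2, lambda_mult=0.5):
    """Greedy Maximum Marginal Relevance over candidate vectors.

    Each round picks the unselected candidate d that maximises
        lambda_mult * sim(query, d) - (1 - lambda_mult) * max(sim(d, s) for s in selected)
    and returns the chosen indices in selection order.
    """
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query, candidates[i])
            redundancy = max((cosine(candidates[i], candidates[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy 2-D embeddings: chunks 0 and 1 are near-duplicates, chunk 2 points elsewhere.
query = [1.0, 0.5]
candidates = [
    [1.0, 0.40],  # chunk 0
    [1.0, 0.45],  # chunk 1
    [0.3, 1.00],  # chunk 2
]
print(mmr(query, candidates, k=2))                   # → [1, 2]
print(mmr(query, candidates, k=2, lambda_mult=1.0))  # → [1, 0] (pure relevance)
```

With `lambda_mult=1.0` the redundancy term vanishes and the function reduces to plain top-k by cosine similarity, which picks the near-duplicate chunk 0 instead of chunk 2. In LangChain the same behaviour is normally requested through the vector store's retriever with `search_type="mmr"`.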

Setup Instructions

Prerequisites

Ensure you have Python installed on your system. Install the required Python libraries:

pip install -r requirements.txt

Environment Configuration

Clone the repository:

git clone https://github.com/Sawanmahna/Ask-Questions-from-PDF-using-LLM.git
cd Ask-Questions-from-PDF-using-LLM

Create a .env file in the project root and set the LangSmith tracing variables:

LANGCHAIN_API_KEY="your_api_key"
LANGCHAIN_PROJECT="pdfchatnow"
LANGCHAIN_ENDPOINT="https://api.smith.langchain.com"
LANGCHAIN_TRACING_V2=true
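python-dotenv's `load_dotenv()` is the usual way to pull these values into the process environment before LangChain reads them. To show what that step does without the extra dependency, here is a hypothetical stdlib-only reader. `parse_env_lines` and `load_env_file` are illustrative names, not part of this repository. The reader tolerates optional spaces around `=` and quoted values, and, like `load_dotenv()` by default, it does not overwrite variables that are already set.

```python
import os

def parse_env_lines(lines):
    """Parse KEY=VALUE lines; skip blanks, '#' comments, and lines without '='."""
    parsed = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop surrounding spaces, then one layer of matching quotes.
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        parsed[key.strip()] = value
    return parsed

def load_env_file(path=".env"):
    """Read a .env file and export its values without overriding existing ones."""
    with open(path, encoding="utf-8") as fh:
        values = parse_env_lines(fh)
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values

example = parse_env_lines(['LANGCHAIN_PROJECT = "pdfchatnow"', "LANGCHAIN_TRACING_V2=true"])
print(example)  # → {'LANGCHAIN_PROJECT': 'pdfchatnow', 'LANGCHAIN_TRACING_V2': 'true'}
```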

Suppress warnings (optional):

import warnings
warnings.filterwarnings("ignore")

Usage

1. Load PDFs

Place your PDF files in the Data/ directory. The script will automatically load and process them.
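A sketch of the loading step, assuming PyMuPDF is installed and that every PDF sits directly in Data/ (subfolders are ignored). `find_pdfs` and `extract_pages` are hypothetical helper names. The project may instead use a LangChain document loader such as `PyMuPDFLoader`, which wraps the same library.

```python
from pathlib import Path

def find_pdfs(data_dir="Data"):
    """Return the .pdf files directly inside data_dir, sorted by path.

    The suffix check is case-insensitive; a missing folder yields an empty list.
    """
    folder = Path(data_dir)
    if not folder.is_dir():
        return []
    return sorted(p for p in folder.iterdir() if p.is_file() and p.suffix.lower() == ".pdf")

def extract_pages(pdf_path):
    """Return (page_number, text) tuples for each page, using PyMuPDF.

    PyMuPDF is imported lazily so find_pdfs works even when it is not installed.
    Page numbers are 0-based, as PyMuPDF reports them.
    """
    import fitz  # PyMuPDF
    with fitz.open(pdf_path) as doc:
        return [(page.number, page.get_text()) for page in doc]

pdf_paths = find_pdfs("Data")
for path in pdf_paths:
    pages = extract_pages(path)
    print(path.name, "->", len(pages), "pages")
```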

2. Process Documents

Run the script to extract and split the PDF content into manageable chunks for querying.
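Chunking keeps each piece small enough to embed and retrieve precisely, while overlap preserves sentences that straddle a boundary. The sketch below is a plain fixed-window splitter. LangChain's `RecursiveCharacterTextSplitter`, which takes the same `chunk_size` and `chunk_overlap` arguments but prefers to break on paragraph and sentence boundaries, is the more likely choice in practice. The sizes shown are illustrative, not the repository's settings.

```python
def split_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into character windows of chunk_size that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks

print(split_text("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3))
# → ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```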

3. Ask Questions

You can query the processed PDFs by asking specific questions. For example:

# rag_chain is the retrieval-augmented chain assembled in the earlier steps of the script
question = "What is the invoice number?"
output = rag_chain.invoke(question)
print("Answer:", output)
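Conceptually, `rag_chain` retrieves relevant chunks, inserts them into the chat prompt as context, and passes the result to the model. The stand-in below mirrors only that data flow. `fake_retrieve` and `fake_llm` are hypothetical placeholders for the FAISS retriever and the Ollama chat model, and `PROMPT_TEMPLATE` is an illustrative prompt, not the one used in this project.

```python
from typing import Callable, List

# Hypothetical prompt wording; the project's actual chat prompt is not reproduced here.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below. "
    "If the answer is not in the context, say you don't know.\n\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)

def format_docs(chunks: List[str]) -> str:
    """Join retrieved chunks into one context block, separated by blank lines."""
    return "\n\n".join(chunks)

def answer(question: str,
           retrieve: Callable[[str], List[str]],
           llm: Callable[[str], str]) -> str:
    """Retrieve context for the question, fill the prompt, and ask the model."""
    context = format_docs(retrieve(question))
    prompt = PROMPT_TEMPLATE.format(context=context, question=question)
    return llm(prompt)

# Placeholder components for illustration only.
def fake_retrieve(question: str) -> List[str]:
    return ["chunk one", "chunk two"]

def fake_llm(prompt: str) -> str:
    # Echo the first line of the context so the data flow is visible.
    context = prompt.split("Context:\n", 1)[1]
    return context.split("\n", 1)[0]

print("Answer:", answer("What is the invoice number?", fake_retrieve, fake_llm))
# → Answer: chunk one
```

Keeping the retriever and model as plain callables makes the prompt assembly easy to test in isolation; the real pipeline swaps in the LangChain components.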

4. Sample Questions

"What is the price of Web Design?"
"What is in the PDF?"
"What is the invoice date?"

Acknowledgments

LangChain: For the framework used to build the language-model pipeline.
FAISS: For efficient similarity search and retrieval.
Ollama: For embedding and question-answering models.
PyMuPDF: For efficient PDF processing.
