Sanjeeb Kumar Sahoo

Building a PDF AI app with LangChain & OpenAI

Hello Dev...

Let's cover the essential basics before we begin our project.

"What is Generative AI ? "

Generative AI is a technology that uses algorithms to create content, like text, images, or even music, on its own. It's like a creative AI that can produce things by learning from existing data. It's like a smart copycat that can produce new things.

"What is LLM? "

Certainly! LLM stands for "Large Language Model." It's like a super-smart computer program that can understand and generate human-like text. Think of it as a virtual assistant that can chat with you, write articles, translate languages, and answer questions by learning from lots of text it has read. It's a powerful tool that makes computers understand and use language more like humans do.

Here are a few examples:

  • GPT-3.5 / GPT-4: Developed by OpenAI, GPT-3.5 is a powerful language model (GPT-4 is even more powerful) that can generate human-like text. It's used in chatbots, content generation, and more.
  • Claude 2: Created by Anthropic, an alternative language model to GPT.
  • PaLM 2: Pathways Language Model 2 (PaLM 2) is a language model developed by Google.
  • LLaMA: This is the LLM family from Meta AI. Meta recently released an open-source version, Llama 2. Other open-source LLMs include GPT-NeoX-20B, GPT-J, OPT-175B, BLOOM, and MPT-30B.

You can check out more open-source models here - https://huggingface.co/blog/os-llms

In this app, we'll work with the OpenAI model.

OpenAI offers an API that allows us to interact with any of their models. You can find more information here - https://platform.openai.com/examples

To use the OpenAI API, you need to sign up for an account on OpenAI's platform. It is not free, but you will receive a $5 signup bonus to experiment with the API when you first sign up. If you want to use it more extensively, you will need to add funds to your account. (Please note that the signup bonus is limited to certain models, such as GPT-3.5, but not GPT-4.)

Investing just $5 in the OpenAI API opens the door to numerous experiments; you can learn and build a lot with it, making it fantastic value for the money.

If you want to implement basic functionality, you can use the OpenAI API directly, as in the sketch below.

Explore what's possible with some example applications - https://platform.openai.com/examples
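
For instance, here is a minimal sketch (not from the original app) of calling the OpenAI Chat Completions endpoint directly with fetch; it assumes OPENAI_API_KEY is set in your environment and Node 18+ for the built-in fetch:

// A minimal sketch of calling the OpenAI Chat Completions API directly,
// without LangChain. Assumes OPENAI_API_KEY is set and Node 18+.
const response = await fetch("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: "Summarize what a vector database is in one sentence." }],
  }),
});

const data = await response.json();
console.log(data.choices[0].message.content);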

When it comes to applying Large Language Models (LLMs) to custom data, the raw OpenAI API can be a bit challenging to work with. That's where LangChain comes to the rescue.

"What is Langchain ?"

LangChain is a framework that makes it easy to build AI-powered applications using large language models (LLMs).It's not only restricted to OpenAI; you can use any of the LLMs.
It provides a number of features that simplify the development process, such as:

  • Chaining LLMs: LangChain allows you to chain multiple LLMs together to create more complex and sophisticated applications. For example, you could chain one LLM to translate a text from one language to another, and then chain another LLM to summarize the translated text (a sketch follows this list).

  • Using tools: LangChain can be used with other tools and resources, such as Wikipedia and Zapier. This makes it possible to build AI-powered applications that can interact with the real world in more meaningful ways. For example, you could build an application that uses LangChain to generate a list of restaurants near the user, and then uses Zapier to book a table at the user's chosen restaurant.

  • Conversational API: LangChain provides a conversational interface to its API. This makes it easier to interact with the API and to develop AI-powered applications that can have more natural conversations with users. For example, you could build an application that uses LangChain to answer customer questions in a more natural and engaging way.
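
As a taste of chaining, here is a minimal sketch (not part of this app) of the translate-then-summarize idea: two LLMChains composed with LangChain's SimpleSequentialChain. It assumes OPENAI_API_KEY is set in your environment.

// Two LLMChains composed with SimpleSequentialChain: translate, then summarize.
import { OpenAI } from "langchain/llms/openai";
import { PromptTemplate } from "langchain/prompts";
import { LLMChain, SimpleSequentialChain } from "langchain/chains";

const model = new OpenAI({ temperature: 0 });

// Chain 1: translate the input text to English.
const translate = new LLMChain({
  llm: model,
  prompt: PromptTemplate.fromTemplate("Translate this text to English:\n{text}"),
});

// Chain 2: summarize whatever the first chain produced.
const summarize = new LLMChain({
  llm: model,
  prompt: PromptTemplate.fromTemplate("Summarize this text in one sentence:\n{text}"),
});

// The output of `translate` is fed as the input of `summarize`.
const chain = new SimpleSequentialChain({ chains: [translate, summarize] });
const result = await chain.run("Bonjour! LangChain facilite la création d'applications IA.");
console.log(result);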

Here is a simple analogy to help you understand LangChain:
Imagine that you are building a kitchen. You need to use a variety of different appliances and tools to cook a meal, such as a stove, a refrigerator, and a knife. LangChain is like a kitchen for AI developers. It provides a set of tools and resources that make it easier to build AI-powered applications.

"What is VectorDB ?"

A vector database is a type of database that stores and retrieves data in the form of vectors. Vectors are mathematical representations of data points, and they can be used to represent a wide variety of different types of data, including text, images, audio, and video.

Vector databases are particularly well-suited for applications that involve similarity search. Similarity search is the task of finding the most similar data points to a given query data point. For example, a vector database could be used to find the most similar images to a given query image, or the most similar text documents to a given query text document.

Here is an explanation of vector databases in terms of text with an example:
Imagine you have a database of text documents, such as news articles, blog posts, or product descriptions. You can use a vector database to represent each document as a vector. This vector can contain information about the document's content, such as the words that appear in the document, the frequency of those words, and the relationships between those words.
Once you have represented your documents as vectors, you can use a vector database to perform similarity search. This means that you can find the most similar documents to a given query document.
For example, let's say you have a vector database of news articles. You want to find the most similar articles to a query article about the latest iPhone release. You can use the vector database to perform a similarity search, and the database will return the articles with the most similar vectors.

Here is a simplified, toy example of how a text document might be represented as a vector:
Document: "The latest iPhone release is rumored to have a new triple-lens camera system and a longer battery life."
Vector: [0.5, 0.3, 0.2, 0.1, 0.05]
In this toy scheme, each element represents the frequency of a word in the document:

  • 0.5: the frequency of the word "iPhone"
  • 0.3: the frequency of the word "camera"
  • 0.2: the frequency of the word "battery"
  • 0.1: the frequency of the word "release"
  • 0.05: the frequency of the word "latest"

(In our case we are using LanceDB; you can use any VectorDB.)
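
To make similarity search concrete, here is a toy sketch in plain JavaScript (not tied to LanceDB; the vectors are the made-up frequencies above) that ranks documents against a query vector using cosine similarity:

// Toy similarity search: rank documents by cosine similarity to a query vector.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const documents = [
  { id: "iphone-article", vector: [0.5, 0.3, 0.2, 0.1, 0.05] },
  { id: "recipe-post",    vector: [0.0, 0.1, 0.0, 0.0, 0.9] },
];
const queryVector = [0.4, 0.4, 0.1, 0.1, 0.0];

// The most similar document comes first.
const ranked = documents
  .map((doc) => ({ ...doc, score: cosineSimilarity(doc.vector, queryVector) }))
  .sort((a, b) => b.score - a.score);

console.log(ranked[0].id); // "iphone-article"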

"What is Embeddings?"

Embeddings are like special number lists that represent words or data. These numbers help computers understand what words mean and how they're similar to each other.
Embeddings are fundamentally vectors, which are numerical representations of data, so to store that we need a VectorDB.

"What is OpenAI Embeddings?"

OpenAI embeddings are a specific type of embedding trained on an extensive dataset of text and code, enabling OpenAI to better understand both natural language and programming.
For example, you could use OpenAI embeddings to build a search engine that finds the most similar text documents to a given query document. You could also use OpenAI embeddings to build a recommendation system that recommends products or content to users based on their past behavior and preferences.
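
For instance, here is a minimal sketch (assuming OPENAI_API_KEY is set) of generating OpenAI embedding vectors with LangChain's OpenAIEmbeddings, the same class we'll use later in the upload API:

// Turn text into OpenAI embedding vectors with LangChain's OpenAIEmbeddings.
import { OpenAIEmbeddings } from "langchain/embeddings/openai";

const embeddings = new OpenAIEmbeddings();

// Embed a single query string -> one vector (1536 numbers for the default
// text-embedding-ada-002 model, matching the Array(1536) schema used later).
const queryVector = await embeddings.embedQuery("What is the latest iPhone camera?");

// Embed several documents at once -> one vector per document.
const docVectors = await embeddings.embedDocuments([
  "The latest iPhone has a triple-lens camera.",
  "This blog post is about cooking pasta.",
]);

console.log(queryVector.length, docVectors.length); // 1536, 2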


Let's build it.
Tech Stack -

1. Next.js (full-stack JavaScript framework, easily scalable, with SSR and many other features)
2. LangChain (LLM AI JavaScript/Python framework, supporting chaining, multiple LLMs, text search, embeddings, and many other features)
3. LanceDB (vector database to store embeddings, which can then be used by LLMs)

Instead of LanceDB, you can also opt for the Pinecone cloud DB for testing; it offers some free storage.

Creating the Next.js project

npx create-next-app@latest your-app-name

Installing Dependencies

First, install the JavaScript version of LanceDB:

npm install vectordb

Similarly, install the JavaScript version of LangChain, since we'll be using Next.js as the backend. (If your application is more AI-focused, you may prefer Python over JavaScript.)

npm install -S langchain

You can read the LangChain documentation for more details -
https://js.langchain.com/docs/get_started

Now run the app using

npm run dev

Use Node.js v16 or above, preferably the latest version.

Now it's time to do some coding...

Or you can clone the project from GitHub: https://github.com/ksanjeeb/PDF-AI (PDF Chat AI with LangChain and OpenAI)

For this app, let's consider four pages:

1) Landing Page
2) Selecting the Type of Content
3) Uploading a PDF File
4) Chat with AI Page

Let's code it, or you can clone my project: https://github.com/ksanjeeb/PDF-AI

File Structure: see the repository linked above for the full file tree.

On the UI side, we have three components.
I'm using Tailwind CSS for styling; you can use any alternative.

UI Pages/Components:

1) index.js (Landing Page)

Click here: https://github.com/ksanjeeb/PDF-AI/blob/master/src/pages/index.js

2) upload.js (Selecting the type and uploading the PDF)

Click here: https://github.com/ksanjeeb/PDF-AI/blob/master/src/pages/upload.js

3) chat.js (Chat with the AI)

Click here: https://github.com/ksanjeeb/PDF-AI/blob/master/src/pages/chat.js
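
As a rough idea of how chat.js talks to the backend, here is a hedged sketch (the helper name askPdf is hypothetical; the endpoint and body shape match the query API shown later):

// Hypothetical helper: how the chat page might call the Chat API.
async function askPdf(prompt, index) {
  const res = await fetch("/api/v1/query", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt, index }),
  });
  const data = await res.json();
  if (!data.success) throw new Error(data.error);
  return data.message; // the chain result: answer text plus source documents
}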

Let's write the code for the brain (Backend)

Backend/API:

For the basic version of the app, we need four APIs.

  • Upload API for PDF Embedding:
    This API allows users to upload a PDF file. The system will process the file, generate embeddings, and store them in the database.

  • List API for File Metadata:
    Users can access this API to view the metadata of the uploaded PDF file. It provides details such as the file name and associated metadata, making it ideal for displaying in the user interface.
    Note: this project supports one file at a time. If you wish to support multiple files, you can do so with a more robust database layer that stores metadata for each file.

  • Delete API for File Embedding:
    For the scenario where we deal with one file at a time, this API allows users to delete the embedding associated with that file in the database. This step is necessary before uploading a new file to ensure no interference with previous embeddings.

  • Chat API with Prompt-Based Responses:
    This API responds to user prompts. Users provide a prompt, and the system generates a response using embedding-vector similarity search plus an LLM (GPT-3.5 or GPT-4).

In Next.js, all backend code is written inside the /api directory (under /pages, as API routes).

Make sure to create a .env.local file in the root folder and add the OpenAI API key to it.

.env.local

# Add all the secret keys here
OPENAI_API_KEY="Add your OpenAI key"
# A folder called lanceDB will store the vector data
lanceDB_URI=lanceDB/vector-data

You can get the OpenAI API key from here by logging into your account.

Account -> View API Keys -> Generate New Key

https://platform.openai.com

  • /api/v1/uploadData.js (for uploading a file)
// Import necessary modules and components
import { PDFLoader } from "langchain/document_loaders/fs/pdf";
import { CSVLoader } from "langchain/document_loaders/fs/csv";
import { connect } from "vectordb";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { LanceDB } from "langchain/vectorstores/lancedb";
import { TextLoader } from "langchain/document_loaders/fs/text";

// Define the configuration for the API, specifying a request body size limit of 4MB
export const config = {
    api: {
        bodyParser: {
            sizeLimit: '4mb' 
        }
    }
}

// Helper function to create a valid string from the input (cleaning and formatting)
function makeValidString(inputString) {
    const pattern = /^[a-z0-9-]+$/;
    const lowerCaseString = inputString.toLowerCase();
    const cleanedString = lowerCaseString.replace('.pdf', '');
    const validCharacters = [];
    for (const char of cleanedString) {
        if (pattern.test(char)) {
            validCharacters.push(char);
        } else if (char === ' ') {
            validCharacters.push('-');
        }
    }
    const validString = validCharacters.join('');
    return validString;
}

// Helper function to determine the appropriate loader based on the file type
function determineLoader(type, context, res) {
    let file;
    switch (type) {
        case 'application/pdf':
            file = new Blob([context], { type: 'application/pdf' });
            return new PDFLoader(file);
        case 'application/csv':
            file = new Blob([context], { type: 'application/csv' });
            return new CSVLoader(file);
        case 'text/plain':
            file = new Blob([context], { type: 'text/plain' });
            return new TextLoader(file);
        case 'application/raw-text':
            return new TextLoader(context);
        default:
            // Handle unsupported file types by sending a response and returning null
            res.json({ success: false, error: "Unsupported file type" });
            return null;
    }
}

// Define the main function to handle POST requests
export default async function POST(req, res) {
    try {
        let base64FileString, fileName, fileType, tableName, buffer;

        if (req.body.isFile) {
            // Extract relevant information from the request body
            base64FileString = req.body.file;
            fileName = req.body.fileName;
            buffer = Buffer.from(base64FileString, 'base64');
        }

        // Generate a table name based on the file name or input data
        tableName = req.body.isFile ? makeValidString(fileName) : fileName;
        fileType = req.body.fileType;

        // Determine the content source (file or input data) and create the appropriate loader
        const context = req.body.isFile ? buffer : req.body.input;
        const loader = await determineLoader(fileType, context, res);
        if (!loader) return; // unsupported file type: a response was already sent

        // Load and split the content using the chosen loader
        const splitDocuments = await loader.loadAndSplit();
        const pageContentList = [];
        const metaDataList = [];

        if (splitDocuments.length > 0) {
            // Extract page content and generate metadata for each split document
            splitDocuments?.forEach((item, index) => {
                pageContentList.push(item.pageContent);
                metaDataList.push({
                    id: index
                });
            });
        }

        // Connect to the database and create a data schema
        const db = await connect(process.env.lanceDB_URI);
        const dataSchema = [
            { vector: Array(1536), text: fileType, id: 1 }
        ];

        // Create a table in the database
        const table = await db.createTable(tableName, dataSchema);

        // Convert the text to OpenAI embedding vectors and store them in the database table using LanceDB.fromTexts()
        await LanceDB.fromTexts(
            [...pageContentList],
            [...metaDataList],
            new OpenAIEmbeddings(),
            { table }
        );

        // Send a success response
        res.json({ success: true });
    } catch (err) {
        // Handle errors by sending a failure response
        res.json({ success: false, error: "" + err });
    }
}


In the provided code, we establish a connection to a database and create a table for storing OpenAI embeddings...

       const db = await connect(process.env.lanceDB_URI);
        const dataSchema = [
            { vector: Array(1536), text: fileType, id: 1 }
        ];

        const table = await db.createTable(tableName, dataSchema)

Then we split the content, create OpenAI embedding vectors, and store them in the table.

       await LanceDB.fromTexts(
            [...pageContentList],
            [...metaDataList],
            new OpenAIEmbeddings(),
            { table }
        );


https://github.com/ksanjeeb/PDF-AI/blob/master/src/pages/api/v1/uploadData.js

  • /api/v1/query.js (Chat API)
// Import necessary modules and components
import { LanceDB } from "langchain/vectorstores/lancedb";
import { OpenAI } from "langchain/llms/openai";
import { VectorDBQAChain } from "langchain/chains";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { connect } from "vectordb";

// Define the main function that handles POST requests
export default async function POST(request, response) {
  try {
    // Read the request body, which contains the data from the client
    const body = request.body;

    // Connect to the LanceDB database using the provided URI
    const db = await connect(process.env.lanceDB_URI);

    // Open a table in the database based on the index specified in the request body
    const table = await db.openTable(body.index);

    // Create a LanceDB instance with OpenAI embeddings and the selected table
    const vectorStore = new LanceDB(new OpenAIEmbeddings(), { table });

    // Create an OpenAI model instance with the specified parameters
    const model = new OpenAI({
      modelName: "gpt-3.5-turbo",
      // Additional options can be added here
    });

    // Create a VectorDBQAChain instance from the model, vectorStore, and configuration
    const chain = VectorDBQAChain.fromLLM(model, vectorStore, {
      k: 1,
      returnSourceDocuments: true,
    });

    // Call the chain with the provided query (prompt) and send a single response.
    // (Responding once here avoids the double-response bug caused by calling
    // response.json() in .then/.catch and again in .finally.)
    const result = await chain.call({ query: body.prompt });
    response.json({ success: true, message: result });

  } catch (e) {
    // Handle any errors that occur during the process and respond with an error message
    response.json({ success: false, error: "Error :" + e });
  }
}


The client sends a POST request to our Next.js API route with the prompt to be answered. The handler then uses the VectorDBQAChain instance to perform the following steps:

  • It searches the LanceDB vector store for the most similar documents to the prompt.

  • It sends the top K documents to the OpenAI LLM for QA.

  • It returns the results of the QA query to the client.

(The variable k is the number of documents sent to the OpenAI LLM for question answering. Here k is set to 1, which means only the single most similar document is passed to the model, so it generates its answer from the content of that one document.)

https://github.com/ksanjeeb/PDF-AI/blob/master/src/pages/api/v1/query.js

  • /api/v1/listIndex.js (listing existing vector tables/files)
import { connect } from "vectordb";

export default async function GET(request, response) {
    try {
        const db = await connect(process.env.lanceDB_URI);
        // List all vector tables (one table per uploaded file)
        const tables = await db.tableNames();
        response.json({ success: true, data: tables });
    } catch (e) {
        response.json({ success: false, error: "Error :" + e });
    }
}

https://github.com/ksanjeeb/PDF-AI/blob/master/src/pages/api/v1/listIndex.js

  • /api/v1/deleteIndex.js (deleting the vector table)
import { connect } from "vectordb";

export default async function POST(request, response) {
    try {
        const indexName = request.body.index;
        const db = await connect(process.env.lanceDB_URI);
        await db.dropTable(indexName)
        response.json({ success: true })
    } catch (e) {
        response.json({ success: false, error: "Error :" + e })
    }
}

Hooray! Your app is ready! Congratulations!

Similarly, you can implement text-content-based AI chat, reading URL content, transcribing YouTube content, and much more.
Check here to learn more about the different sources: https://js.langchain.com/docs/modules/data_connection/
https://js.langchain.com/docs/integrations/document_loaders/file_loaders/
https://js.langchain.com/docs/integrations/document_loaders/web_loaders/
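
For example, here is a hedged sketch (assuming the cheerio peer dependency is installed) of loading URL content with LangChain's CheerioWebBaseLoader; the resulting documents could be fed into the same embedding flow as the PDF above:

// Load a web page and split it into chunks, just like loader.loadAndSplit()
// did for PDFs in uploadData.js.
import { CheerioWebBaseLoader } from "langchain/document_loaders/web/cheerio";

const loader = new CheerioWebBaseLoader("https://js.langchain.com/docs/get_started");
const docs = await loader.loadAndSplit();
console.log(docs.length, "chunks ready for embedding");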

You can also chain LLMs for complex apps.
https://js.langchain.com/docs/modules/chains/

Integrating with other vector DBs:
https://js.langchain.com/docs/integrations/vectorstores

Integrating with different LLMs and chat models:
https://js.langchain.com/docs/integrations/llms/
https://js.langchain.com/docs/integrations/chat/

Thank you so much.
If you have any doubts, add them in the comments section.
Happy Hacking.....
