Jones Ndzenyuy

Posted on Nov 2, 2024 • Edited on Jan 6

Harnessing AWS Bedrock: Create a Generative AI PDF Chatbot

#devops #aws #amazonbedrock

Have you started your AI journey and want a project that gives you hands-on experience with generative AI on Amazon Bedrock? In this post, I will guide you through an end-to-end project to build a chatbot that lets users interact with PDF documents. A user asks a question and the bot answers from the uploaded file; if it can't find an answer in the document, it says so. Users can upload files of up to 200MB (Streamlit's default upload limit), and a Bedrock LLM generates the answers.

I will walk you through the tools and technologies used to build this application, chosen for reliable performance and an easy-to-use interface:

Amazon Bedrock
AWS S3
AWS EC2
Docker
LangChain
Streamlit

Architecture

Principle

The app is designed so that when a user visits the web page, the first thing they are asked to do is upload a PDF file. The file is read with PyPDF and split into chunks. Each chunk is then converted into a vector (an embedding), a numerical representation of its content. The generated vectors are stored in an S3 bucket so they can be retrieved when the user asks a question.
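To make the chunking step concrete, here is a minimal sketch in plain Python. It is not the code from application.py: the helper name `split_into_chunks` and the chunk sizes are assumptions chosen for illustration. LangChain's text splitters apply the same idea more carefully, typically with some overlap between chunks so that a sentence cut at a boundary still appears whole in one of them.

```python
def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration; the default sizes are arbitrary.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With a 4-character window and an overlap of 1, the last character of each chunk is repeated at the start of the next one, which is the property that keeps boundary text searchable.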
When the user submits a query, the application loads the vectors from S3 and searches them for the chunks most similar to the question. It then builds a prompt from the query and that context, which is passed to our LLM (Jurassic-2 Mid) to generate the answer. The application runs in a Docker container and uses Streamlit to provide a visually appealing UI.
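The query flow can be sketched the same way. This is a toy stand-in, not the app's implementation: in the real application the vectors come from an embedding model and the search is handled by a vector index, whereas here the vectors are passed in by hand and the function names (`cosine_similarity`, `top_k_chunks`, `build_prompt`) and the prompt wording are assumptions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec: list[float],
                 indexed_chunks: list[tuple[str, list[float]]],
                 k: int = 2) -> list[str]:
    """Return the text of the k chunks whose vectors are closest to the query."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Combine the retrieved chunks and the question into one LLM prompt."""
    context = "\n\n".join(context_chunks)
    return (
        "Use the context below to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The instruction to admit when the context has no answer is what lets the bot report that it found nothing, instead of guessing.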

How to build it

Launch an EC2 instance

Name: pdf-Chat-Bot
Instance type: t2.micro
AMI: Ubuntu (latest)
Volume: 8GiB
Security group: Create new
Inbound rules:
=> allow 8083 from everywhere
=> allow SSH from my IP
User data script:

#!/bin/bash
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install -y ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add the repository to Apt sources:
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin -y

# Let the default ubuntu user run docker commands without sudo:
sudo usermod -aG docker ubuntu
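If you prefer to script the security group instead of clicking through the console, the two inbound rules can be expressed as the `IpPermissions` structure that boto3's `authorize_security_group_ingress` accepts. This is a sketch under assumptions: the helper names are hypothetical, the group ID and region are yours to supply, and boto3 is imported lazily so the rule builder can be inspected without AWS credentials.

```python
import ipaddress


def build_ingress_rules(my_ip_cidr: str, app_port: int = 8083) -> list[dict]:
    """Build the inbound rules: app port open to everyone, SSH only from one address."""
    # Normalise and validate the caller's address, e.g. "203.0.113.7" -> "203.0.113.7/32".
    ssh_cidr = str(ipaddress.ip_network(my_ip_cidr))
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": app_port,
            "ToPort": app_port,
            "IpRanges": [{"CidrIp": "0.0.0.0/0", "Description": "Chatbot web UI"}],
        },
        {
            "IpProtocol": "tcp",
            "FromPort": 22,
            "ToPort": 22,
            "IpRanges": [{"CidrIp": ssh_cidr, "Description": "SSH from my IP"}],
        },
    ]


def apply_ingress_rules(group_id: str, rules: list[dict], region: str) -> None:
    """Attach the rules to an existing security group (needs AWS credentials)."""
    import boto3  # imported here so build_ingress_rules works without boto3 installed

    ec2 = boto3.client("ec2", region_name=region)
    ec2.authorize_security_group_ingress(GroupId=group_id, IpPermissions=rules)
```

Keeping SSH restricted to a single address while leaving only the application port public matches the rules listed in the instance settings.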

Create an IAM Role for the EC2 instance to access Bedrock and S3

Go to AWS console -> IAM -> Roles -> Create role

Name: pdfBotRole
Attach policies:
- AmazonBedrockFullAccess
- AmazonS3FullAccess

Attach role to EC2 instance

Go to the EC2 console, select the instance then go to Actions -> Security -> Modify IAM role
Select the IAM role previously created "pdfBotRole" and click Apply
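Behind the console wizard, a role that an EC2 instance can use consists of a trust policy naming the EC2 service, the attached managed policies, and an instance profile that wraps the role. The sketch below shows those pieces with boto3. The function name `create_pdf_bot_role` is hypothetical, reusing the role name for the instance profile is an assumption, and the code needs credentials with IAM permissions to run.

```python
import json

# Trust policy: lets the EC2 service assume the role on the instance's behalf.
EC2_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# AWS managed policies named in this guide.
MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/AmazonBedrockFullAccess",
    "arn:aws:iam::aws:policy/AmazonS3FullAccess",
]


def create_pdf_bot_role(role_name: str = "pdfBotRole") -> None:
    """Create the role, attach the policies and wrap it in an instance profile."""
    import boto3  # imported lazily; the constants above need no AWS access

    iam = boto3.client("iam")
    iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(EC2_TRUST_POLICY),
    )
    for arn in MANAGED_POLICY_ARNS:
        iam.attach_role_policy(RoleName=role_name, PolicyArn=arn)
    # The console creates an instance profile automatically; the API does not.
    iam.create_instance_profile(InstanceProfileName=role_name)
    iam.add_role_to_instance_profile(InstanceProfileName=role_name, RoleName=role_name)
```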

Create S3 bucket

On the console, search for S3, then create a bucket with the following settings (xxx are random numbers that make the bucket name unique):

Name: bedrock-chatpdf-xxx
Region: us-east-1
Keep the remaining defaults and choose Create.
Copy the bucket name, as we'll use it in the next steps.
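Bucket names are global, so the random suffix matters. Here is a small sketch of how such a name could be generated and sanity-checked before you type it into the console. The helper `make_bucket_name` is hypothetical, and the regular expression covers only the basic S3 naming rules (3 to 63 characters; lowercase letters, digits, dots and hyphens; starting and ending with a letter or digit), not every restriction.

```python
import re
import secrets

# Basic S3 naming rules only; some rarer restrictions are not checked here.
_BUCKET_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def make_bucket_name(prefix: str = "bedrock-chatpdf", digits: int = 6) -> str:
    """Return '<prefix>-<random digits>', raising ValueError if it looks invalid."""
    suffix = "".join(secrets.choice("0123456789") for _ in range(digits))
    name = f"{prefix}-{suffix}"
    if not _BUCKET_NAME_RE.match(name):
        raise ValueError(f"{name!r} is not a valid S3 bucket name")
    return name
```

Calling `make_bucket_name()` returns something of the form `bedrock-chatpdf-` followed by six digits; uppercase letters or underscores in the prefix are rejected.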

SSH into the instance and clone the source code

Copy the public IP address of the instance, open a terminal and run:
ssh -i "path/to/.pem-file" ubuntu@public-ip-address
Verify the Docker installation

docker ps

If Docker is installed, you will see a table of existing containers, which will of course be empty.

Clone source

In the SSH session, run the following:
git clone https://github.com/Ndzenyuy/chatPdf.git
cd chatPdf

In the cloned source code, we have the following files/folders:
Dockerfile
application.py
requirements.txt
/images

Access for LLM models in Amazon Bedrock

In the console, go to Amazon Bedrock -> Base models -> Model access.
Make sure you have access to Jurassic-2 Ultra and Titan Embeddings G1 - Text; if not, request access.
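One way to confirm that access has actually been granted is to request a single embedding from the instance. The sketch below assumes the model ID `amazon.titan-embed-text-v1` for Titan Embeddings G1 - Text and the `us-east-1` region; check both against the Bedrock console for your account. The helper names are hypothetical, and the call needs the IAM role created earlier (or other credentials) to succeed.

```python
import json

# Assumed model ID for Titan Embeddings G1 - Text; verify it in the Bedrock console.
TITAN_EMBED_MODEL_ID = "amazon.titan-embed-text-v1"


def titan_embedding_request(text: str) -> str:
    """Build the JSON request body for the Titan text embedding model."""
    return json.dumps({"inputText": text})


def embed(text: str, region: str = "us-east-1") -> list[float]:
    """Call Bedrock and return the embedding vector for the given text."""
    import boto3  # imported lazily; building the request body needs no AWS access

    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.invoke_model(
        modelId=TITAN_EMBED_MODEL_ID,
        body=titan_embedding_request(text),
        contentType="application/json",
        accept="application/json",
    )
    payload = json.loads(response["body"].read())
    return payload["embedding"]
```

If model access has not been granted, the `invoke_model` call raises an access error instead of returning a vector, which makes this a quick check before building the image.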

Build and Run App docker image

Make sure you are inside the chatPdf folder and run the following command:

docker build -t chatpdf-app .

Once the image is built, run it with the following (Docker image names must be lowercase, and the name must match the tag used in the build):

docker run -d -e BUCKET_NAME="yourBucketName" -p 8083:8083 chatpdf-app
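The `-e BUCKET_NAME=...` flag passes the bucket name into the container as an environment variable. How application.py reads it is not shown here, so the following is only a sketch of the usual pattern, with a hypothetical helper `get_bucket_name` that fails fast when the variable is missing rather than erroring later during an S3 call.

```python
import os


def get_bucket_name(env=None) -> str:
    """Return the S3 bucket name from BUCKET_NAME, or raise if it is unset."""
    # An explicit mapping can be passed in for testing; default to the real environment.
    env = os.environ if env is None else env
    name = env.get("BUCKET_NAME", "").strip()
    if not name:
        raise RuntimeError(
            "BUCKET_NAME is not set; pass it with `docker run -e BUCKET_NAME=...`"
        )
    return name
```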

Now copy the public IP of the EC2 instance and enter it in the browser, followed by the port number 8083. For instance:

XX.XX.XX.XX:8083
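If the page does not load, it helps to know whether the port is reachable at all. The sketch below, using a hypothetical helper `is_port_open`, attempts a plain TCP connection from your own machine. A `False` result usually points at the security group's inbound rule or at a container that is not running, while `True` means the problem lies in the app itself.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, call `is_port_open("XX.XX.XX.XX", 8083)` with the instance's public IP in place of the placeholder.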

How to use the App

The landing page first asks the user to upload a PDF document.

Either drag and drop the file or click the "Browse files" button. Once the PDF document has loaded, ask questions based on its content.

Conclusion

This project is a PDF chatbot application that can significantly reduce the time researchers spend reading articles and books in PDF form. Instead of hours of page-by-page reading through material that is irrelevant to their current needs, a few prompts let users efficiently get at the content, authorship, summaries and in-depth knowledge of PDF documents.

The app can also serve as a valuable tool for students, improving their ability to interact with academic articles and literature. By breaking complex texts down further, it not only saves time but also encourages critical thinking and asking relevant questions.

DEV Community