Build a chatbot to interact with your Pandas DataFrame using Reflex

#reflex #machinelearning #webdev #chatgpt

This article will guide you in creating a chatbot that allows you to upload a CSV dataset. You can then ask questions about the data, and the system, powered by a language model, will provide answers based on the uploaded CSV data.

The following is a sample of the chatbot:

We will use Reflex to build this chatbot.

Outline

Get an OpenAI API Key
Create a new folder, open it with a code editor
Create a virtual environment and activate
Install requirements
Reflex setup
my_dataframe_chatbot.py
state.py
style.py
.gitignore
Run app
Conclusion

Get an OpenAI API Key

First, get your own OpenAI API key:

Go to https://platform.openai.com/account/api-keys.
Click on the + Create new secret key button.
Enter an identifier name (optional) and click on the Create secret key button.
Copy the API key; you will use it later in this tutorial.

Create a new folder, open it with a code editor

Create a new folder and name it my_dataframe_chatbot then open it with a code editor like VS Code.

Create a virtual environment and activate

Open the terminal. Use the following command to create a virtual environment .venv and activate it:



python3 -m venv .venv



source .venv/bin/activate
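
To confirm that the environment is really active before installing anything, a quick check from Python can help. The following is a minimal sketch that relies only on the standard library; the helper name in_virtualenv is hypothetical and not part of the tutorial's app.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

If the first line reports False, the activate command was probably run in a different terminal session.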

Install requirements

We need reflex to build the app, pandas to read the CSV file, and openai, langchain, and langchain-experimental to initialize an agent that answers a user's questions about an uploaded CSV file.
Run the following command in the terminal:



pip install reflex==0.3.1 pandas==2.1.1 openai==0.28.1 langchain==0.0.326 langchain-experimental==0.0.36
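
Because this tutorial pins exact versions, it can be useful to verify what actually got installed. The sketch below uses the standard-library importlib.metadata module; the helper names installed_version and check_pins are hypothetical, and the EXPECTED mapping simply repeats the pins from the pip command above.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Pins copied from the pip command in this tutorial.
EXPECTED = {
    "reflex": "0.3.1",
    "pandas": "2.1.1",
    "openai": "0.28.1",
    "langchain": "0.0.326",
    "langchain-experimental": "0.0.36",
}


def installed_version(package: str) -> str | None:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_pins(expected: dict[str, str]) -> dict[str, tuple[str, str | None]]:
    """Return {package: (expected, installed)} for every package that differs."""
    mismatches = {}
    for package, wanted in expected.items():
        found = installed_version(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in check_pins(EXPECTED).items():
        print(f"{package}: expected {wanted}, found {found}")
```

An empty result means every pinned package matches. Newer releases of these libraries may have changed the APIs used later in this article, so a mismatch is worth fixing before continuing.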

Reflex setup

Now we need to create the project with Reflex. Run the following command inside the my_dataframe_chatbot directory to initialize the blank template app:



reflex init --template blank

The above command will create the following file structure in my_dataframe_chatbot directory:

You can run the app with the following command in your terminal. A welcome page appears when you go to http://localhost:3000/ in your browser:



reflex run

my_dataframe_chatbot.py

Next, we build the structure and interface of the app. Go to the my_dataframe_chatbot subdirectory and open the my_dataframe_chatbot.py file. This is where the UI components live. Replace its contents with the following code:



import reflex as rx

from my_dataframe_chatbot import style
from my_dataframe_chatbot.state import State


def error_text() -> rx.Component:
    """return a text component to show error."""
    return rx.text(State.error_texts, text_align="center", font_weight="bold", color="red",)  


def head_text() -> rx.Component:
    """The header: return a text, text, divider"""
    return rx.vstack(
        rx.text("Chat with your data", font_size="2em", text_align="center", font_weight="bold", color="white",),
        rx.text("(Note: input your openai api key, upload your csv file then click submit to start chat)", 
                  text_align="center", color="white",),
        rx.divider(border_color="white"),
    )



def openai_key_input() -> rx.Component:
    """return a password component"""
    return rx.password(
            value=State.openai_api_key,
            placeholder="Enter your openai key",
            on_change=State.set_openai_api_key,
            style=style.openai_input_style,
    )


color = "rgb(107,99,246)"


def upload_csv() -> rx.Component:
    """The upload component."""
    return rx.vstack(
        rx.upload(
            rx.vstack(
                rx.button(
                    "Select File",
                    color=color,
                    bg="white",
                    border=f"1px solid {color}",
                ),
                rx.text("Drag and drop files here or click to select files"),
            ),
            multiple=False,
            accept={
                "text/csv": [".csv"],  # CSV format
            },
            max_files=1,
            border=f"1px dotted {color}",
            padding="2em",
        ),
        rx.hstack(rx.foreach(rx.selected_files, rx.text)),
        rx.button(
            "Submit to start chat",
            on_click=lambda: State.handle_upload(rx.upload_files()),
        ),
        padding="2em",
    )


def confirm_upload() -> rx.Component:
    """text component to show upload confirmation."""
    return rx.text(State.upload_confirmation, text_align="center", font_weight="bold", color="green",)  


def qa(question: str, answer: str) -> rx.Component:
    """return the chat component."""
    return rx.box(
        rx.box(
            rx.text(question, text_align="right", color="black"),
            style=style.question_style,
        ),
        rx.box(
                rx.text(answer, text_align="left", color="black"),
                style=style.answer_style,
        ),
        margin_y="1em",
    )


def chat() -> rx.Component:
    """iterate over chat_history."""
    return rx.box(
        rx.foreach(
            State.chat_history,
            lambda messages: qa(messages[0], messages[1]),
        )
    )


def loading_skeleton() -> rx.Component:
    """return the skeleton component."""
    return rx.container(
        rx.skeleton_circle(
            size="30px",
            is_loaded=State.is_loaded_skeleton,
            speed=1.5,
            text_align="center",
        ),
        display="flex",
        justify_content="center",
        align_items="center",
    )



def action_bar() -> rx.Component:
    """return the chat input and ask button."""
    return rx.hstack(
        rx.input(
            value=State.question,
            placeholder="Ask a question about your data",
            on_change=State.set_question,
            style=style.input_style,
        ),
        rx.button(
            "Ask",
            on_click=State.answer,
            style=style.button_style,
        ),
        margin_top="3rem",
    )


def index() -> rx.Component:
    return rx.container(
        error_text(),
        head_text(),
        openai_key_input(),
        upload_csv(),
        confirm_upload(),
        chat(),
        loading_skeleton(),
        action_bar(),
    )


app = rx.App()
app.add_page(index)
app.compile()

The above code will render the text heading, an input field to enter your openai api key, a component to upload your CSV file, the chat component, and a component to input your questions to get answers.

state.py

Create a new file state.py in the my_dataframe_chatbot subdirectory and add the following code:



# import reflex
import reflex as rx

from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent
from langchain.chat_models import ChatOpenAI
from langchain.agents.agent_types import AgentType

import pandas as pd

import os


class State(rx.State):

    # The current question being asked.
    question: str

    error_texts: str

    # Keep track of the chat history as a list of (question, answer) tuples.
    chat_history: list[tuple[str, str]]

    openai_api_key: str

    # The files to show.
    csv_file: list[str]

    upload_confirmation: str = ""

    file_path: str

    is_loaded_skeleton: bool = True


    async def handle_upload(
        self, files: list[rx.UploadFile]
    ):
        """Handle the upload of file(s).

        Args:
            files: The uploaded files.
        """
        for file in files:
            upload_data = await file.read()
            outfile = rx.get_asset_path(file.filename)
            self.file_path = outfile

            # Save the file.
            with open(outfile, "wb") as file_object:
                file_object.write(upload_data)

            # Update the csv_file var.
            self.csv_file.append(file.filename)

            self.upload_confirmation = "csv file uploaded successfully, you can now interact with your data"



    def answer(self):
        # turn loading state of the skeleton component to False and clear any old error
        self.is_loaded_skeleton = False
        self.error_texts = ""
        yield


        # check if openai_api_key is empty to return an error
        if self.openai_api_key == "":
            self.error_texts = "enter your openai api key"
            self.is_loaded_skeleton = True  # stop the loading indicator
            return

        # check if csv_file is empty to return an error
        if not self.csv_file:
            self.error_texts = "ensure you upload a csv file and enter your openai api key"
            self.is_loaded_skeleton = True  # stop the loading indicator
            return


        if os.path.exists(self.file_path):
            df = pd.read_csv(self.file_path)
        else:
            self.error_texts = "ensure you upload a csv file"
            self.is_loaded_skeleton = True  # stop the loading indicator
            return

        # initializes an agent for working with a chatbot and integrates it with a Pandas DataFrame
        agent = create_pandas_dataframe_agent(
                    ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613", openai_api_key=self.openai_api_key),
                    df,
                    verbose=True,
                    agent_type=AgentType.OPENAI_FUNCTIONS,
                )


        self.upload_confirmation = ""

        # Add to the answer as the chatbot responds.
        answer = ""
        self.chat_history.append((self.question, answer))
        yield

        # run the agent against a question
        output = agent.run(self.question)

        self.is_loaded_skeleton = True

        # Clear the question input.
        self.question = ""

        # Yield here to clear the frontend input before continuing.
        yield

        # update answer from output
        for item in output:
            answer += item
            self.chat_history[-1] = (
                self.chat_history[-1][0],
                answer,
            )
            yield

The above code handles file uploads, takes in questions, and generates answers.

The handle_upload function manages the asynchronous upload of file(s) provided as a list of rx.UploadFile objects. It reads the uploaded data, builds an output file path outfile, and saves the uploaded file there. It also appends the uploaded file's name to self.csv_file and sets self.upload_confirmation to a message confirming that the CSV file was uploaded.
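
The save step can be tried outside Reflex. The following is a minimal sketch, not the app's actual handler: save_upload is a hypothetical helper, and the basename and .csv checks are extra safeguards that the tutorial code does not perform.

```python
import os
import tempfile


def save_upload(upload_dir: str, filename: str, data: bytes) -> str:
    """Write uploaded bytes into upload_dir and return the saved path."""
    # Keep only the final path component so that a name such as
    # "../people.csv" cannot escape the upload directory.
    safe_name = os.path.basename(filename)
    if not safe_name.lower().endswith(".csv"):
        raise ValueError(f"expected a .csv file, got {filename!r}")
    path = os.path.join(upload_dir, safe_name)
    with open(path, "wb") as file_object:
        file_object.write(data)
    return path


if __name__ == "__main__":
    # Hypothetical sample data, used only to exercise the helper.
    with tempfile.TemporaryDirectory() as tmp:
        saved = save_upload(tmp, "../people.csv", b"name,age\nAda,36\n")
        print("saved to", saved)
```

In the app itself, the directory comes from rx.get_asset_path and the bytes come from await file.read(), as shown in state.py.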

The answer function interacts with OpenAI's GPT-3.5 Turbo model. It first sets loading state indicators and performs error checks, ensuring that the OpenAI API key is provided and a CSV file is uploaded. If the CSV file exists, it reads the data into a Pandas DataFrame df. The function initializes a chatbot agent and runs it, updating the conversation history as responses are received.
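
In Reflex, each yield inside an event handler sends the current state to the frontend, which is why answer yields after every change it wants the user to see. The plain-Python sketch below mimics only that pattern; stream_answer is a hypothetical stand-in for State.answer, and the question and output strings are made up for illustration.

```python
from __future__ import annotations

from typing import Iterator


def stream_answer(
    chat_history: list[tuple[str, str]], question: str, output: str
) -> Iterator[list[tuple[str, str]]]:
    """Append the question, then grow the answer one character at a time."""
    answer = ""
    chat_history.append((question, answer))
    # First snapshot: the question is visible with an empty answer.
    yield list(chat_history)

    # Iterating over a str yields single characters, so the answer
    # appears to be typed out even though `output` is already complete.
    for item in output:
        answer += item
        chat_history[-1] = (chat_history[-1][0], answer)
        yield list(chat_history)


if __name__ == "__main__":
    history: list[tuple[str, str]] = []
    for snapshot in stream_answer(history, "How many rows?", "42"):
        print(snapshot)
```

Note that agent.run returns the full answer before the loop starts, so the character-by-character loop in state.py is a typing effect rather than true token streaming from the model.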

style.py

Create a new file style.py in the my_dataframe_chatbot subdirectory and add the following code. This will add styling to the page and components:



shadow = "rgba(0, 0, 0, 0.15) 0px 2px 8px"
chat_margin = "20%"
message_style = dict(
    padding="1em",
    border_radius="5px",
    margin_y="0.5em",
    box_shadow=shadow,
)

# Set specific styles for questions and answers.
question_style = message_style | dict(
    bg="#F5EFFE", margin_left=chat_margin
)
answer_style = message_style | dict(
    bg="#DEEAFD", margin_right=chat_margin
)

# Styles for the action bar.
input_style = dict(
    border_width="1px", padding="1em", box_shadow=shadow
)
button_style = dict(box_shadow=shadow)

# style for openai input
openai_input_style = {
    "color": "white",
    "margin-top": "3rem",
    "margin-bottom": "0.5rem",
}
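
One detail worth knowing: the | operator used above to combine dictionaries requires Python 3.9 or newer. The short sketch below, using a trimmed-down copy of the style values, shows that the merge creates a new dictionary and leaves the base style untouched. It also shows an equivalent spelling for older interpreters; the name question_style_compat is hypothetical.

```python
message_style = dict(padding="1em", border_radius="5px")

# Python 3.9+: `|` builds a new dict; keys on the right win on conflict.
question_style = message_style | dict(bg="#F5EFFE", margin_left="20%")

# Equivalent spelling that also works on Python 3.5 to 3.8.
question_style_compat = {**message_style, **dict(bg="#F5EFFE", margin_left="20%")}

if __name__ == "__main__":
    print(question_style == question_style_compat)
    print("bg" in message_style)  # the base style is not modified
```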

.gitignore

You can add the .venv directory to the .gitignore file to get the following:



*.db
*.py[cod]
.web
__pycache__/
.venv/

Run app

Run the following in the terminal to start the app:



reflex run

You should see the following interface when you go to http://localhost:3000/:

First, enter your OpenAI API key. Then upload a CSV file and click the "Submit to start chat" button. After that, you can ask the chatbot questions about your dataset, and it will answer them.

I tested the app with a CSV file that includes an age column. The chat below shows the result; the chatbot responded correctly to the question I asked: