Abstract
Recently, SingleStoreDB has been integrated with LangChain. In this short article, we'll walk through a quick example to demonstrate the integration and how easy it is to use these two technologies together.
The notebook file used in this article is available on GitHub.
Introduction
LangChain is a software development framework designed to simplify the creation of applications using Large Language Models (LLMs). In this short article, we'll streamline the example described in a previous article, which was written before the SingleStoreDB LangChain integration was announced, and show how easy it is to use SingleStoreDB and LangChain together.
As described in the previous article, we'll follow the instructions to create a SingleStoreDB Cloud account, Workspace Group, Workspace, and Notebook.
Fill out the Notebook
First, we'll install some libraries:
!pip install langchain --quiet
!pip install langchain-community --quiet
!pip install langchain-openai --quiet
!pip install nltk --quiet
!pip install openai --quiet
!pip install pdf2image --quiet
!pip install pdfminer.six --quiet
!pip install unstructured==0.10.14 --quiet
Next, we'll read in a PDF document. This is an article by Neal Leavitt titled "Whatever Happened to Object-Oriented Databases?" Object-oriented databases (OODBs) were an emerging technology during the late 1980s and early 1990s. We'll add leavcom.com to the firewall when prompted. Once the address has been added to the firewall, we'll read the PDF file:
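from langchain_community.document_loaders import OnlinePDFLoader

# Download the PDF and extract its text into LangChain Document objects
loader = OnlinePDFLoader("http://leavcom.com/pdf/DBpdf.pdf")
data = loader.load()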
We use LangChain's OnlinePDFLoader, which makes reading a PDF file easier.
Next, we'll get some data on the document:
print(f"You have {len(data)} document(s) in your data")
print(f"There are {len(data[0].page_content)} characters in your document")
The output should be:
You have 1 document(s) in your data
There are 13040 characters in your document
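We'll now split the document into chunks ("pages") of at most 2,000 characters each, with a 20-character overlap between consecutive chunks:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 2000,   # maximum characters per chunk
    chunk_overlap = 20   # characters shared by neighbouring chunks
)
texts = text_splitter.split_documents(data)
print(f"You have {len(texts)} pages")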
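To make the effect of chunk_size and chunk_overlap concrete, here is a minimal sketch of a fixed-window splitter in plain Python. It is a simplified stand-in and not LangChain's algorithm: RecursiveCharacterTextSplitter prefers to break on separators such as paragraphs and sentences, so its chunk count and boundaries will differ. The function name chunk_text and the sample string are hypothetical.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters. Simplified
    illustration only; it ignores paragraph and sentence boundaries.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


# Hypothetical sample: a string with the same length as the article's PDF text
sample = "x" * 13040
chunks = chunk_text(sample, chunk_size=2000, chunk_overlap=20)
print(len(chunks), [len(c) for c in chunks])
```

With these settings each window advances by 1,980 characters, so a larger overlap produces more chunks but keeps more context around each boundary.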
Next, we'll set our OpenAI API Key:
import getpass
import os

# Prompt for the key so that it is not stored in the notebook
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
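We'll then use LangChain's OpenAIEmbeddings to generate vector embeddings and store the text with its embeddings in the database system. This is much simpler using the LangChain integration:
from langchain_community.vectorstores import SingleStoreDB
from langchain_openai import OpenAIEmbeddings

embedding = OpenAIEmbeddings(model = "text-embedding-3-small")

# Embed each chunk and write the text and vectors to the pdf_docs table
docsearch = SingleStoreDB.from_documents(
    texts,
    embedding,
    table_name = "pdf_docs",
    distance_strategy = "DOT_PRODUCT",
)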
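The DOT_PRODUCT distance strategy ranks stored chunks by the dot product between each chunk's embedding and the query's embedding, with higher scores meaning closer matches. The following sketch shows that ranking in plain Python under stated assumptions: the three-dimensional vectors, the sample texts, and the helper names dot and top_k are all made up for illustration, and the real search runs inside SingleStoreDB over much longer vectors.

```python
def dot(a: list[float], b: list[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query_vec: list[float], rows: list[tuple[str, list[float]]], k: int = 1):
    """Return the k (score, text) pairs with the highest dot product."""
    scored = [(dot(query_vec, vec), text) for text, vec in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical unit-length vectors standing in for real embeddings
rows = [
    ("OODBs have niche markets", [0.6, 0.8, 0.0]),
    ("Relational databases dominate", [0.0, 0.6, 0.8]),
    ("Unrelated text", [1.0, 0.0, 0.0]),
]
query = [0.0, 0.8, 0.6]

best = top_k(query, rows, k=2)
for score, text in best:
    print(f"{score:.2f}  {text}")
```

Because the sample vectors have length 1, the dot product here equals cosine similarity, which is why the two measures rank normalized embeddings identically.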
We can now ask a question, as follows:
query_text = "Will object-oriented databases be commercially successful?"
docs = docsearch.similarity_search(query_text)
print(docs[0].page_content)
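The integration again shows its ease of use: a single call to similarity_search embeds the query, runs the vector search in SingleStoreDB, and returns the closest matching chunks.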
Finally, we can use a GPT model to provide an answer, based on the earlier question and the best-matching chunk:
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Combine the question with the most similar chunk from the vector search
prompt = f"The user asked: {query_text}. The most similar text from the document is: {docs[0].page_content}"
response = client.chat.completions.create(
    model = "gpt-4o-mini",
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt}
    ]
)
print(response.choices[0].message.content)
Here is some example output:
While object-oriented databases are still in use and have solid niche markets,
they have not gained as much commercial success as relational databases.
Observers previously anticipated that OO databases would surpass relational
databases, especially with the emergence of multimedia data on the internet,
but this prediction did not come to fruition. However, OO databases continue
to be used in specific fields, such as CAD and telecommunications. Experts
have varying opinions on the future of OO databases, with some predicting
further decline and others seeing potential growth.
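The prompt in this article uses only the single best match, docs[0]. As a possible variation, several retrieved chunks could be combined into one prompt, subject to a size budget. The sketch below is an assumption-laden illustration: Doc is a hypothetical stand-in for LangChain's Document class, and build_prompt and max_chars are invented names, not part of LangChain or the OpenAI library.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str


def build_prompt(question: str, docs: list[Doc], max_chars: int = 4000) -> str:
    """Build a prompt from the question and as many chunks as fit in max_chars."""
    parts = []
    used = 0
    for i, doc in enumerate(docs, start=1):
        snippet = doc.page_content.strip()
        if used + len(snippet) > max_chars:
            break  # stop before exceeding the context budget
        parts.append(f"[{i}] {snippet}")
        used += len(snippet)
    context = "\n\n".join(parts)
    return (
        f"The user asked: {question}\n\n"
        f"Relevant passages from the document:\n{context}"
    )


# Hypothetical retrieved chunks, ordered from most to least similar
docs = [
    Doc("OODBs have niche markets."),
    Doc("Relational systems dominate."),
]
prompt = build_prompt("Will OODBs succeed?", docs)
print(prompt)
```

The resulting string could be passed as the user message in the chat completion call shown earlier, in place of the single-chunk prompt.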
Summary
Comparing the solution in this article with the previous one, we can see that the LangChain integration is simpler. The framework abstracts the database access: loading the document, generating embeddings, storing them, and searching them each take only a line or two of code. This lets us focus on the business problem and saves development time.