Abstract
In a previous article, we saw how to use Ollama and DeepSeek-R1 with SingleStore. In this article, we'll modify the previous example to work with SingleStore Kai, a MongoDB-compatible API.
The notebook file used in this article is available on GitHub.
Introduction
We'll follow the setup instructions from the previous article. We'll also need to enable Kai in the SingleStore portal.
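For reference, the notebook dependencies from the previous article look roughly like this (a minimal sketch; the exact package list may differ, and in a SingleStore notebook the connection_url_kai variable should be available automatically once Kai is enabled):

!pip install ollama langchain langchain-community pymongo --quiet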
Fill out the notebook
We'll configure the code to use the smallest DeepSeek-R1 model, as follows:
llm = "deepseek-r1:1.5b"
ollama.pull(llm)
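As a quick sanity check, we can confirm the model is now available locally (ollama.list() is part of the same Python client; its exact response format varies by client version):

# Show the models Ollama has available locally
print(ollama.list())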
Using pymongo, we'll assign our database and collection:
import pymongo

# connection_url_kai is the Kai endpoint defined during setup
kai_client = pymongo.MongoClient(connection_url_kai)
db = kai_client["langchain_demo"]
collection = db["langchain_docs"]
We'll use some simple documents, as follows:
documents = [
"Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels",
"Llamas were first domesticated and used as pack animals 4,000 to 5,000 years ago in the Peruvian highlands",
"Llamas can grow as much as 6 feet tall though the average llama between 5 feet 6 inches and 5 feet 9 inches tall",
"Llamas weigh between 280 and 450 pounds and can carry 25 to 30 percent of their body weight",
"Llamas are vegetarians and have very efficient digestive systems",
"Llamas live to be about 20 years old, though some only live for 15 years and others live to be 30 years old"
]
from langchain_community.embeddings import OllamaEmbeddings
from langchain_core.documents import Document

# embedding_model is the embedding model name defined during setup
embeddings = OllamaEmbeddings(
    model = embedding_model,
)

# Infer the vector dimensions from a sample embedding
dimensions = len(embeddings.embed_query(documents[0]))

# Wrap the raw strings as LangChain Documents
docs = [Document(text) for text in documents]
and then create the vector index:
# Create the vector index on the embedding field using Kai index options
collection.create_index(
    [("embedding", "vector")],
    name = "vector_index",
    kaiIndexOptions = {
        "index_type": "AUTO",
        "metric_type": "EUCLIDEAN_DISTANCE",
        "dimensions": dimensions
    }
)
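We can optionally confirm that the index was created using standard pymongo (the exact metadata Kai returns may differ from MongoDB's):

# List the indexes on the collection
for index in collection.list_indexes():
    print(index)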
We'll use LangChain's MongoDBAtlasVectorSearch to store the vector embeddings and documents, as follows:
from langchain_community.vectorstores import MongoDBAtlasVectorSearch

# Embed the documents and store them, with their vectors, in the collection
docsearch = MongoDBAtlasVectorSearch.from_documents(
    docs,
    embeddings,
    collection = collection,
    index_name = "vector_index"
)
Next, we'll use the following prompt:

prompt = "What animals are llamas related to?"

# Find the document most similar to the prompt
docs = docsearch.similarity_search(prompt)
data = docs[0].page_content
print(data)
Example output:
Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels
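If we also want to see how close each match is, similarity_search_with_score returns the documents along with their scores (this method is part of LangChain's vector store interface; its behaviour against Kai should mirror Atlas, but that's an assumption worth verifying):

# Retrieve the top 3 matches with their similarity scores
for doc, score in docsearch.similarity_search_with_score(prompt, k = 3):
    print(score, doc.page_content)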
We'll then use the retrieved data and the prompt as input to DeepSeek-R1, as follows:
import re

# Combine the retrieved data with the original prompt
output = ollama.generate(
    model = llm,
    prompt = f"Using this data: {data}. Respond to this prompt: {prompt}."
)

content = output["response"]

# Optionally strip the model's reasoning from the final output
remove_think_tags = True

if remove_think_tags:
    content = re.sub(r"<think>.*?</think>", "", content, flags = re.DOTALL)

print(content)
We'll use a flag to strip the <think> and </think> tags, giving us control over whether the model's reasoning process appears in the output.
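As a standalone illustration, here's the same regex applied to a hypothetical response string:

sample = "<think>The user asks about llamas...</think>Llamas are camelids."
print(re.sub(r"<think>.*?</think>", "", sample, flags = re.DOTALL))
# Prints: Llamas are camelids.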
Example output from DeepSeek-R1:
Llamas are tailless tail Animals belonging to the Largilphas family. They share a lineage with vicuñas, which are also part of the camelid family, indicating close relatedness. However, unlike vicuñas, llamas do not possess tails, distinguishing them from other tailless tail Animals like the wild boars.
The output contains errors, including a contradictory phrase, "tailless tail Animals." Additionally, llamas belong to the Camelidae family, not the incorrectly stated "Largilphas" family. The claim that llamas do not possess tails is also incorrect, as they do have short tails. Furthermore, the comparison with wild boars is misleading because wild boars do have tails, making them an unsuitable example of a "tailless" animal. These inaccuracies make the output incorrect and confusing.
Summary
The output from DeepSeek-R1 is incorrect and confusing due to contradictory wording, factual errors about llamas' classification and physical traits, and a misleading comparison with wild boars.