Emmanuel Aiyenigba

Posted on Aug 14, 2023

Using Large Language Models inside your database with MindsDB

#ai #database #mindsdb #llm

Introduction

Large Language Models (LLMs) are AI algorithms trained on a large dataset through deep learning techniques to understand, predict and generate new content. LLMs are the forces that power Natural Language Processing (NLP), allowing machines to understand and respond to natural languages in text or voice data.

The versatility of an LLM depends largely on the volume of data it is trained on and on its number of parameters, the weights the model learns during training. As a rule of thumb, a model is often considered a large language model if it has at least 1 billion parameters. GPT-3 has 175 billion parameters; GPT-4 is reported (though not confirmed by OpenAI) to have about 1.7 trillion; and PaLM 2, the LLM that powers Bard, is reported to have about 340 billion. Parameter count is only a rough indicator of versatility, since training data and training techniques matter too.

Now, imagine deploying these highly trained models in your database to get insights, make predictions, understand your users, auto-generate content, and more. MindsDB makes this possible! MindsDB is an open-source AI database middleware that allows you to supercharge your databases by integrating various machine learning (ML) engines.

In this article, you will learn how to supercharge your database by deploying LLMs into it using MindsDB. By the end, you will understand how LLMs can help you perform NLP tasks such as text generation, text processing, and behavior prediction, as well as image generation, right from your database, and you will have the knowledge you need to deploy them yourself. Let’s start by learning what MindsDB is about.

What is MindsDB

MindsDB is an open-source AI database middleware that manages and connects AI modules with enterprise databases. It is a complete solution for deploying LLMs using various ML engines, carrying out vector operations, time series forecasting, and classical ML, all inside your database.

MindsDB allows you to create AI models off ML engines like OpenAI, Hugging Face, PyTorch, etc. These models are abstracted as generative AI tables. Each model is assigned a prompt template that tells the AI what you want done and how you want it done. For example, you can create a model off OpenAI’s GPT-4 LLM to auto-respond to users’ reviews of your product based on the review type: good, bad, or neutral. After creating the model, you give it a prompt template that tells it what to do (respond to reviews based on their type) and how to do it (for instance, respond to bad reviews with an apology, good reviews with a thank-you note, and neutral reviews with a promise to do better).

MindsDB supports several databases so that you can connect your database quickly and start carrying out NLP tasks easily. MySQL, PostgreSQL, Oracle, SQLite, QuestDB, CouchDB, MongoDB, AWS DynamoDB, and many other databases are supported. It also supports several applications like Twitter, Slack, GitHub, YouTube, etc., allowing you to supercharge these applications by bringing in-database ML to them.

Let's quickly highlight the advantages of deploying and using LLMs in your database. After that, we will walk through a practical example of deploying the GPT and DALL-E models inside your database.

Advantages of using LLMs in your database

There are many advantages to using LLMs in your database. Here are a few:

Making insightful predictions: Using large language models within your database can help you make insightful predictions from previous data. For example, you can predict the future price change of a product based on available attributes like past price changes.
Content generation and language interpretation: LLMs allow you to carry out natural language processing tasks like text generation, text summarization, content analysis, language interpretation, and more.
Automatically answering questions: Users and visitors on your website can get their questions answered automatically by AI models that have been trained on your product. This saves users from waiting several hours for support staff to respond to simple questions like “What is your pricing model?”
Improving the security of your database: LLMs can generate synthetic data that you can use to test the security of your database.

The advantages of using LLMs in your database are enormous. With these in mind, it’s time for some hands-on practice. Let’s learn how to use the GPT and DALL-E models in our database with MindsDB.

Getting started with using LLMs in your database

We will deploy and use OpenAI’s GPT and DALL-E models in our database. We will use the GPT model to analyze and respond to reviews. The response will be based on the review type: a thank-you note for a positive review, an apology for a negative review, and a promise to do better for a neutral review.
We will use the DALL-E model, which is an image-generation model rather than an LLM, to generate realistic images from a text description. We can tell the model to generate whatever images we want from our input text. Let’s get right in.

Prerequisite

You need a MindsDB cloud account to follow this tutorial. You can also set up MindsDB via Docker or pip if you prefer to operate a local instance. A MindsDB cloud account is recommended for this tutorial (it’s easy to set up, and no configuration is necessary).

Connecting a database

MindsDB only processes your data; it doesn’t store it. You need to connect your database to MindsDB to store generated data. For this example, we will connect and use a public MySQL database.



CREATE DATABASE review_response
WITH ENGINE = 'mysql',
PARAMETERS = {
    "user": "user",
    "password": "MindsDBUser123!",
    "host": "db-demo-data.cwoyhfn6bzs0.us-east-1.rds.amazonaws.com",
    "port": "3306",
    "database": "public"
};

We gave our database the name review_response. Let’s take a look at the list of tables inside our database. To do this, run the following SQL query.



SHOW FULL TABLES FROM review_response;

There are 18 tables inside this database. We will be using only the amazon_reviews table. Let’s take a look at some of its content.



SELECT * FROM review_response.amazon_reviews
LIMIT 3;

We are selecting the first 3 product reviews.

product_name	review
All-New Fire HD 8 Tablet, 8 HD Display, Wi-Fi, 16 GB - Includes Special Offers, Magenta	Late gift for my grandson. He is very happy with it. Easy for him (9yo ).
All-New Fire HD 8 Tablet, 8 HD Display, Wi-Fi, 16 GB - Includes Special Offers, Magenta	I'm not super thrilled with the proprietary OS on this unit, but it does work okay and does what I need it to do. Appearance is very nice, price is very good and I can't complain too much - just wish it were easier (or at least more obvious) to port new apps onto it. For now, it helps me see things that are too small on my phone while I'm traveling. I'm a happy buyer.
All-New Fire HD 8 Tablet, 8 HD Display, Wi-Fi, 16 GB - Includes Special Offers, Magenta	I purchased this Kindle Fire HD 8 was purchased for use by 5 and 8 yer old grandchildren. They basically use it to play Amazon games that you download.

Generating response based on review type using OpenAI’s GPT LLM

Let’s create a model that will analyze reviews and generate a response based on the review type.



CREATE MODEL review_response_model
PREDICT response
USING
    engine = 'openai',
    prompt_template = 'Give a short response to customers based on the review type.
    "I love this product": "positive"
    "It is not good" : "negative"
    "The product is fair": "neutral"
    If the review is negative, respond with an apology and promise to make it better.
    If the review is positive, respond with a thank you and ask them to refer us to their friends.
    If the review is neutral, respond with a promise to make it better next time they buy from us.
    {{review}}',
    api_key = 'YOUR_OPENAI_API_KEY';

We have created a model that uses OpenAI’s GPT engine and gives a response for every review based on our prompt_template. The prompt_template tells the AI model what we want done and how we want it done. The {{review}} placeholder is replaced with the value of the review column for each row the model is queried with. We can begin using this model after it's done generating (it takes only a few seconds to generate).

Note
By default, MindsDB uses the gpt-3.5-turbo model, but you can use the gpt-4 model by passing it to the model_name parameter under the USING clause.
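
As a minimal sketch of this note, a model could be created with GPT-4 instead of the default. The model name review_response_model_gpt4 and the shortened prompt are placeholders chosen for illustration, and the sketch assumes your API key has access to gpt-4. The rest of this tutorial continues with review_response_model.

```sql
-- Sketch: same idea as review_response_model, but with model_name set to gpt-4.
-- review_response_model_gpt4 is a hypothetical name used only in this example.
CREATE MODEL review_response_model_gpt4
PREDICT response
USING
    engine = 'openai',
    model_name = 'gpt-4',
    prompt_template = 'Give a short response to this customer review: {{review}}',
    api_key = 'YOUR_OPENAI_API_KEY';
```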

Run the following command to check the status of the model we just created:



DESCRIBE review_response_model;

The status of our model is complete, which means it’s done generating and we can start using it. Do not forget to add your OpenAI api_key to the list of parameters when creating a model.
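
Another way to check the status, assuming your MindsDB version exposes a models table in the default mindsdb project, is to query that table directly. Treat the table and column names below as assumptions to verify against your own instance.

```sql
-- Sketch: look up the model's status in the project's models table.
-- Assumes the default project is named mindsdb and exposes name and status columns.
SELECT name, status
FROM mindsdb.models
WHERE name = 'review_response_model';
```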

For convenience, create a MindsDB ML engine to hold your API key so you don’t have to type it in every time you create a model.



CREATE ML_ENGINE openai2
FROM openai
USING
    api_key = 'YOUR_OPENAI_API_KEY';

Note
If you are on MindsDB cloud, you can use the API key provided by MindsDB after confirming your email address. However, MindsDB Pro users are required to provide their own OpenAI API key.
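
Once an engine such as openai2 exists, a model can refer to it by name instead of repeating the key. The following is a minimal sketch of that idea; the model name review_response_model_v2 and the shortened prompt are hypothetical and only serve the illustration.

```sql
-- Sketch: reuse the API key stored in the openai2 engine created above.
-- review_response_model_v2 is a hypothetical name used only in this example.
CREATE MODEL review_response_model_v2
PREDICT response
USING
    engine = 'openai2',
    prompt_template = 'Give a short response to this customer review: {{review}}';
```

Because the key lives in the engine, it does not need to appear in the USING clause of each model.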

Let’s query the model with some synthetic data and see the response.



SELECT review, response
FROM review_response_model
WHERE review = "I love this product so much. I am definitely going to buy more";

Output:

review	response
I love this product so much. I am definitely going to buy more	Thank you for your positive feedback! We're thrilled to hear that you love our product. If you have any friends or family who might also enjoy it, we would greatly appreciate it if you could refer them to us. We look forward to serving you again in the future!

The response from our AI model looks great. Let’s try it out again, this time with a negative comment.



SELECT review, response
FROM review_response_model
WHERE review = "I hate this product. It didn't meet the description. You guys are fraud";

Output:

review	response
I hate this product. It didn’t meet the description. You guys are fraud	I'm sorry to hear that you're unhappy with our product. We apologize for any inconvenience caused. We strive to provide the best experience for our customers and we would like to make it right. Please reach out to our customer support team so we can assist you further. Thank you for bringing this to our attention.

Great response from our AI model. Now, let’s make a batch prediction with the reviews in the amazon_reviews table.



SELECT input.review, output.response
FROM review_response.amazon_reviews AS input
JOIN review_response_model AS output
LIMIT 3;

Output:

review	response
Late gift for my grandson. He is very happy with it. Easy for him (9yo ).	Thank you for your positive review! We're glad to hear that your grandson is happy with the gift. If you have any friends or family who might also enjoy our products, we would greatly appreciate it if you could refer them to us. Thank you again for your support!
I'm not super thrilled with the proprietary OS on this unit, but it does work okay and does what I need it to do. Appearance is very nice, price is very good and I can't complain too much - just wish it were easier (or at least more obvious) to port new apps onto it. For now, it helps me see things that are too small on my phone while I'm traveling. I'm a happy buyer.	Based on the review provided, the sentiment is neutral. Response: Thank you for your feedback! We appreciate your support and are glad to hear that our product meets your needs. We understand your concerns about the proprietary OS and we will definitely take your feedback into consideration for future improvements. If you have any suggestions on how we can make it easier to port new apps, please let us know. We value your satisfaction and look forward to serving you even better next time.
I purchased this Kindle Fire HD 8 was purchased for use by 5 and 8 years old grandchildren. They basically use it to play Amazon games that you download.	Thank you for your positive review! We're glad to hear that your grandchildren are enjoying the Kindle Fire HD 8 for their gaming needs. If you have any friends or family who might also benefit from this product, we would greatly appreciate it if you could refer them to us. Thank you again for your support!
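
If you want to keep the generated responses, one option is to write the result of the batch query into a table in a connected database. This is only a sketch: my_db stands for a hypothetical integration you have write access to (the public demo database used in this tutorial is assumed to be read-only), and review_responses is a made-up table name.

```sql
-- Sketch: persist the batch predictions in a writable, connected database.
-- my_db and review_responses are hypothetical names.
CREATE TABLE my_db.review_responses (
    SELECT input.review, output.response
    FROM review_response.amazon_reviews AS input
    JOIN review_response_model AS output
    LIMIT 3
);
```

Keeping the LIMIT while experimenting avoids sending the whole table to the OpenAI API.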

Writing good prompts is vital to generating accurate content. AI models use your prompts to respond, so poorly written prompts lead to inaccurate responses.
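
For instance, the second response in the batch output begins with "Based on the review provided, the sentiment is neutral. Response:", which you probably would not want to send to a customer. A tighter prompt may reduce this. The sketch below shows one possible wording, not a tested recipe, and the model name review_response_model_strict is hypothetical.

```sql
-- Sketch: a stricter prompt that asks for the reply text only.
-- review_response_model_strict is a hypothetical name used only in this example.
CREATE MODEL review_response_model_strict
PREDICT response
USING
    engine = 'openai',
    prompt_template = 'Classify the customer review below as positive, negative, or neutral, then write a short reply.
    If the review is negative, apologize and promise to make it better.
    If the review is positive, say thank you and ask for a referral.
    If the review is neutral, promise to do better next time.
    Output only the reply text. Do not mention the review type.
    Review: {{review}}',
    api_key = 'YOUR_OPENAI_API_KEY';
```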

Next, we will learn how to generate images using OpenAI’s DALL-E model.

Generating images using the DALL-E model



CREATE MODEL image_generation_model
PREDICT img_url
USING
   engine = 'openai',
   mode = 'image',
   prompt_template = '{{text}}, 4K | detailed and realistic portrait about money and finance |  natural lighting | different currencies';

Let’s generate an image using text. We are using the text “make money online” as our prompt.



SELECT * 
FROM image_generation_model 
WHERE text = 'make money online';

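Output: the query returns an img_url column containing a link to the generated image (the image itself is not reproduced here).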

You can also generate images in batches from text stored in your database.
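
A batch run could follow the same JOIN pattern used for the review responses. In this sketch, my_db.image_prompts is a hypothetical table in a connected database with a text column, which matches the {{text}} placeholder in the model's prompt template.

```sql
-- Sketch: generate one image URL per row of a table that has a text column.
-- my_db.image_prompts is a hypothetical table.
SELECT input.text, output.img_url
FROM my_db.image_prompts AS input
JOIN image_generation_model AS output
LIMIT 3;
```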

Conclusion

Using Large Language Models in your database can help improve your product by helping you gain insights from data, make relevant predictions, understand user behavior, and generate contextually relevant human-like content. MindsDB allows you to build AI applications fast by simplifying the process of using ML models inside your database. The models are designed to be production-ready by default, without the need for an MLOps flow. MindsDB acts as a middleware for connecting your database with popular AI frameworks. It supports many ML engines, databases, and applications by default, making the process of building and deploying ML models easy. To see what’s possible for yourself, create a free MindsDB account and explore.

Do you need help creating technical content for your developer audience? Reach out and let’s work together.

Shameless Plug

If you find my content valuable and engaging, consider sharing it with your network and follow me here and on Twitter. It would mean a lot to me.

I just launched my newsletter where I’ll be sharing carefully curated technical articles and development news. If you enjoy learning about Open source software, Web engineering, Software engineering, Cloud & DevOps and other aspects of product engineering, please join my readers’ list.

Emmanuel’s Substack: emmanuelthecoder.substack.com

Top comments (4)

Paweł Ciosek • Aug 16 '23

Great article! Cool stuff! It looks like the real challenge is to design an accurate prompt.

Have you tried to estimate the cost of analyzing a database? It could be a common question from business owners.

Emmanuel Aiyenigba • Aug 22 '23 • Edited

Yeah, what you prompt is what you get.

Do you mean the cost for using AI/ML models in DBs in MindsDB? I think you can check out the @mindsdb pricing model. If you mean the security cost, here: "MindsDB doesn't store your data, thus giving you privacy, and it also enables you to control access, permissions and auditing for AI models."

Let me know if I have answered your question.

Tomas Fernandez • Jan 29 '24

Nice article. Very interesting use case.

It's scary to let an LLM write directly into the database. Once people get wind that their reviews are being fed into an LLM, they'll try to jailbreak it. Normally that's innocuous (for the most part), except this LLM can write to the db :(

Emmanuel Aiyenigba • Feb 6 '24

Thanks @tomfern. I agree with you that it is scary to let LLM write directly into the database. But I think that this shouldn't be much of a concern if the data in the db are non-sensitive like the use case of answering a review I used in the article. Plus, you can implement safeguards to ensure that LLM outputs are not harmful to users. LLMs should never be fed with sensitive data.

DEV Community
