Rohan Sharma

Posted on Sep 14, 2024

RAG Simplified!! 🐣

#python #ai #rag #llm

Hii Hiiiii! 👋

Are you stuck between AI and AI?? So am I! But we have to go with the flow, or we won't be able to make a lasting impact!

This blog is about one such AI technique that is making a promising impact on the tech world. Whether you are a beginner or an expert, if you work in tech or are interested in it, you should know about this.

In this blog, I'll cover Retrieval-Augmented Generation (RAG) in detail and build a quick prompting example using an exceptional framework, LLMWare.

Let's start... 3️⃣... 2️⃣... 1️⃣... 🤓

What is RAG??

Let's start with the basics so that you can easily understand RAG.

So, first of all, what is AI? AI, or Artificial Intelligence, is simply the science and engineering of making intelligent machines.

Inside AI, there are so many subsets. Take a look at the diagram below:

Now, let's discuss another field of chaos, Machine Learning (ML). From the diagram, it should be clear that ML is a subset of AI. ML focuses on building computer systems that learn from data: it trains a piece of software, called a model, to make useful predictions or generate content from data.
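To make "learning from data" a bit more concrete, here's a tiny sketch of my own (not from any library). It "trains" a model by fitting a straight line to a few made-up data points with ordinary least squares, then uses the learned parameters to predict a new value. The data and the function names `fit_line` and `predict` are hypothetical; real ML models are vastly bigger, but the train-then-predict idea is the same.

```python
# Toy "model": learn y = w*x + b from example data using ordinary least squares,
# then use the learned parameters (w, b) to predict an unseen input.

def fit_line(xs, ys):
    """Training step: learn slope w and intercept b from (x, y) pairs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b

def predict(model, x):
    """Inference step: apply the learned parameters to a new input."""
    w, b = model
    return w * x + b

# Hypothetical training data: hours studied -> test score
hours = [1, 2, 3, 4, 5]
scores = [52, 57, 62, 67, 72]

model = fit_line(hours, scores)
print(model)               # (5.0, 47.0)
print(predict(model, 6))   # 77.0
```

The "model" here is just two numbers, but an LLM is the same idea at enormous scale: parameters learned from data, then reused to make predictions.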

Fun Fact: An LLM (Large Language Model) is a type of AI program built on machine learning. LLMs are trained on huge datasets, hence the name "large."



AI
├── ML (Machine Learning)
│   └── LLM (Large Language Models)
│       └── RAG (Retrieval-Augmented Generation, a technique built on top of LLMs)

But What is RAG⁉️

RAG, or Retrieval-Augmented Generation, is a groundbreaking AI framework (in the same way that Next.js is a framework for JavaScript) for improving the quality of LLM-generated responses by grounding the model on external sources of knowledge.

RAG extends the already powerful capabilities of LLMs to specific domains or an organization's internal knowledge base, all without the need to retrain the model. It is a cost-effective approach to improving LLM output so it remains relevant, accurate, and useful in various contexts.
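Here's a minimal, library-free sketch of the RAG idea, assuming a tiny made-up knowledge base and simple keyword-overlap retrieval (real systems usually use embeddings and a vector database). It retrieves the most relevant passage and builds a grounded prompt. The names `retrieve` and `build_prompt` are hypothetical, and the actual call to an LLM is left out.

```python
# Minimal RAG sketch: retrieve the most relevant passage from a small knowledge
# base, then "augment" the prompt with it. Keyword overlap stands in for real
# embedding search; the final prompt would be sent to an LLM (not shown).

import re

def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, passages):
    """Return the passage sharing the most words with the query."""
    q = tokenize(query)
    return max(passages, key=lambda p: len(q & tokenize(p)))

def build_prompt(query, context):
    """Ground the model: ask it to answer only from the retrieved context."""
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say 'Not Found.'\n\n"
        f"Context: {context}\n\nQuestion: {query}\nAnswer:"
    )

# Hypothetical internal knowledge base (documents the LLM was never trained on)
knowledge_base = [
    "Our office is open Monday to Friday, 9am to 6pm.",
    "Refunds are processed within 14 days of receiving the returned item.",
    "The VPN must be enabled before accessing internal dashboards.",
]

query = "How many days do refunds take?"
context = retrieve(query, knowledge_base)
prompt = build_prompt(query, context)
print(prompt)  # this augmented prompt is what you would pass to the LLM
```

Notice that the model itself never changes: adding or editing documents only changes what gets retrieved, which is why RAG avoids retraining.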

I hope the RAG concept is a bit clearer now. To make it even clearer, let's jump to the example, where we will build a simple project to test prompt-based RAG models using LLMWare as the framework.

If you don't know about LLMWare, please read the article below. It's the only prerequisite for building the project! 😝

LLMware.ai 🤖: An Ultimate Python Toolkit for Building LLM Apps (Rohan Sharma, Aug 29)

Let's Prompt a Model with LLMWare.ai 🤖

llmware provides a unified framework for building LLM-based applications (e.g., RAG, Agents), using small, specialized models that can be deployed privately, integrated with enterprise knowledge sources safely and securely, and cost-effectively tuned and adapted for any business process.

In this example, we will illustrate:

Discovery - how to discover models in the llmware ModelCatalog.
Load Model - how to load a selected model from the catalog.
Prompt - how to create a basic prompt and run an inference with the model.

So let's start 🟩:

1️⃣ Install llmware as explained in the article above, or simply run this command in the terminal:



 pip3 install llmware

2️⃣ If you don't have any test questions for this project, you can use the ones below:



def hello_world_questions():

    """ This is a set of useful test questions to do a 'hello world' but there is nothing special about the
    questions - please feel free to edit and ask your own queries with your own context passages.

    --if you are using one of the llmware models, please take note that the models have been trained to answer
    based on the information provided, so if you ask a question without passing any context passage, then
    don't be surprised if the model responds with 'Not Found.' """

    test_list = [

    {"query": "What is the total amount of the invoice?",
     "answer": "$22,500.00",
     "context": "Services Vendor Inc. \n100 Elm Street Pleasantville, NY \nTO Alpha Inc. 5900 1st Street "
                "Los Angeles, CA \nDescription Front End Engineering Service $5000.00 \n Back End Engineering"
                " Service $7500.00 \n Quality Assurance Manager $10,000.00 \n Total Amount $22,500.00 \n"
                "Make all checks payable to Services Vendor Inc. Payment is due within 30 days. "
                "If you have any questions concerning this invoice, contact Bia Hermes. "
                "THANK YOU FOR YOUR BUSINESS!  INVOICE INVOICE # 0001 DATE 01/01/2022 FOR Alpha Project P.O. # 1000"},

    {"query": "What was the amount of the trade surplus?",
     "answer": "62.4 billion yen ($416.6 million)",
     "context": "Japan’s September trade balance swings into surplus, surprising expectations. "
                "Japan recorded a trade surplus of 62.4 billion yen ($416.6 million) for September, "
                "beating expectations from economists polled by Reuters for a trade deficit of 42.5 "
                "billion yen. Data from Japan’s customs agency revealed that exports in September "
                "increased 4.3% year on year, while imports slid 16.3% compared to the same period "
                "last year. According to FactSet, exports to Asia fell for the ninth straight month, "
                "which reflected ongoing China weakness. Exports were supported by shipments to "
                "Western markets, FactSet added. — Lim Hui Jie"}
    ]

    return test_list
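Since the llmware models are trained to answer from the context you pass in, it helps to confirm that your own test questions are actually answerable from their context. Here's a small hypothetical helper (`check_test_list` is my own name, not part of llmware) that checks each entry has the `query`, `answer`, and `context` keys and that the gold answer appears verbatim in its context. The first sample entry is trimmed from the invoice example above; the second is deliberately broken.

```python
# Sanity-check a test list before prompting: every entry needs the three
# expected keys, and for extractive questions like these, the gold answer
# should appear verbatim in the context passage. Hypothetical helper.

REQUIRED_KEYS = {"query", "answer", "context"}

def check_test_list(test_list):
    """Return a list of problems; an empty list means every entry passed."""
    problems = []
    for i, entry in enumerate(test_list, start=1):
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            problems.append(f"entry {i}: missing keys {sorted(missing)}")
            continue
        if entry["answer"] not in entry["context"]:
            problems.append(f"entry {i}: gold answer not found in context")
    return problems

sample = [
    {"query": "What is the total amount of the invoice?",
     "answer": "$22,500.00",
     "context": "Quality Assurance Manager $10,000.00 \n Total Amount $22,500.00 \n"},
    # deliberately missing its context passage
    {"query": "Who should I contact?", "answer": "Bia Hermes"},
]

print(check_test_list(sample))  # ["entry 2: missing keys ['context']"]
```

To check the real list, call `check_test_list(hello_world_questions())`. The verbatim check only makes sense for extractive questions; for questions that need summarizing or reasoning, drop that second test.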

3️⃣ Make a Python file, let's say fast_start_rag.py, paste the hello_world_questions() function from step 2 into it, and then add the code below:



import os    # only needed if you enable the GPT-4 option at the bottom of the file
import time
from llmware.prompts import Prompt
from llmware.models import ModelCatalog

def fast_start_prompting(model_name):

    """ This is the main example script - it loads the question list, loads the model, and executes the prompts. """

    t0 = time.time()

    # load in the 'hello world' test questions above
    test_list = hello_world_questions()

    print(f"\n > Loading Model: {model_name}...")

    prompter = Prompt().load_model(model_name)

    t1 = time.time()
    print(f"\n > Model {model_name} load time: {t1-t0} seconds")

    for i, entries in enumerate(test_list):
        print(f"\n{i+1}. Query: {entries['query']}")

        #   run the prompt
        output = prompter.prompt_main(entries["query"],
                                      context=entries["context"],
                                      prompt_name="default_with_context",
                                      temperature=0.30)

        #   'output' is a dictionary with two keys - 'llm_response' and 'usage'
        #   --'llm_response' is the output from the model
        #   --'usage' is a dictionary with the usage stats

        llm_response = output["llm_response"].strip("\n")
        print(f"LLM Response: {llm_response}")

        #   note: the 'gold answer' is the answer we provided above in the hello_world question list
        print(f"Gold Answer: {entries['answer']}")

        print(f"LLM Usage: {output['usage']}")

    t2 = time.time()
    print(f"\nTotal processing time: {t2-t1} seconds")

    return 0


if __name__ == "__main__":

    #   Step 1 - we will pick a model from the ModelCatalog

    #   A few useful methods to discover and display a list of available models...

    #   all generative models
    llm_models = ModelCatalog().list_generative_models()

    #   if you only want to see the local models
    llm_local_models = ModelCatalog().list_generative_local_models()

    #   to see only the open source models
    llm_open_source_models = ModelCatalog().list_open_source_models()

    #   we will print out the local models
    for i, models in enumerate(llm_local_models):
        print("models: ", i, models["model_name"], models["model_family"])

    #   for purposes of demo, try a few selected models from the list

    #   each of these pytorch models has ~1b parameters and should run reasonably fast and accurately on CPU
    #   --per note above, may require separate pip3 install of: torch and transformers
    pytorch_generative_models = ["llmware/bling-1b-0.1", "llmware/bling-tiny-llama-v0", "llmware/bling-falcon-1b-0.1"]

    #   bling-answer-tool is 1b parameters quantized
    #   bling-phi-3-gguf is 3.8b parameters quantized
    #   dragon-yi-6b-gguf is 6b parameters quantized
    gguf_generative_models = ["bling-answer-tool", "bling-phi-3-gguf", "llmware/dragon-yi-6b-gguf"]

    #   by default, we will select a gguf model requiring no additional imports
    model_name = gguf_generative_models[0]

    #   to swap in a GPT-4 openai model - uncomment these two lines
    #   model_name = "gpt-4"
    #   os.environ["USER_MANAGED_OPENAI_API_KEY"] = "<insert-your-openai-key>"

    fast_start_prompting(model_name)
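The script prints the LLM response next to the gold answer so you can compare them by eye. If you'd rather automate that comparison, here's a rough sketch of a hypothetical helper, `matches_gold`, which only assumes `output` is the dictionary with an `llm_response` key described in the script's comments. It lowercases both strings, collapses whitespace, and checks containment; this is a simple heuristic of my own, not an llmware feature.

```python
# Hypothetical scoring helper: does the gold answer appear in the model's
# response after light normalization? Assumes only that 'output' is a dict
# with an 'llm_response' key, like the one prompt_main returns in the script.

import re

def normalize(text):
    """Lowercase, trim, and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.strip().lower())

def matches_gold(output, gold_answer):
    """True if the normalized gold answer is contained in the normalized response."""
    return normalize(gold_answer) in normalize(output["llm_response"])

# Illustrative output dicts with placeholder values (not real model output)
good = {"llm_response": "\nThe total amount is $22,500.00.\n", "usage": {}}
bad = {"llm_response": "Not Found.", "usage": {}}

print(matches_gold(good, "$22,500.00"))  # True
print(matches_gold(bad, "$22,500.00"))   # False
```

Inside the loop, you could call `matches_gold(output, entries["answer"])` right after `prompt_main` and print the result. Containment tolerates extra words like "The total amount is", but it will miss correct answers phrased differently (for example "22500 dollars").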

4️⃣ Go back to the terminal and run the following command to start the application:



python fast_start_rag.py

Output 📃

Although the code is self-explanatory (check the comments), you might be wondering what just happened! You may have many questions. But wait! I have an explanation, especially for visual learners. Kindly watch this video: Prompt Models (Ex. 3): Fast Start to RAG (2024). And if you want to learn more, go through the playlist:

Fast Start to RAG (2024 updates) - YouTube: learn how to master the basics of RAG in this easy-to-follow, step-by-step series of tutorials.

Moving to the End...

Retrieval-Augmented Generation (RAG) is the process of optimizing the output of a large language model, so it references an authoritative knowledge base outside of its training data sources before generating a response.
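The "references a knowledge base" part is the retrieval step. Here's a sketch of it, assuming plain word-count vectors as a stand-in for the learned embeddings a real system would use: each passage gets a cosine-similarity score against the query, and the top matches are returned. The passages are hypothetical, loosely based on the test contexts above, and `top_k` is my own helper name.

```python
# Sketch of the retrieval half of RAG: rank knowledge-base passages by cosine
# similarity to the query. Bag-of-words counts stand in for real embeddings.

import math
import re
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts for the text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query, passages, k=2):
    """Return the k passages most similar to the query, best first."""
    q = vectorize(query)
    ranked = sorted(passages, key=lambda p: cosine(q, vectorize(p)), reverse=True)
    return ranked[:k]

passages = [
    "Japan recorded a trade surplus of 62.4 billion yen for September.",
    "Payment is due within 30 days of the invoice date.",
    "Japan's exports to Asia fell for the ninth straight month.",
]

for p in top_k("What was Japan's trade surplus?", passages):
    print(p)
```

The top-ranked passages would then go into the prompt as the context, just like the `context` field in the test questions.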

If you still have any questions, drop them in the comment section. Alternatively, you can join the official LLMWare Discord server by following this link: https://discord.com/invite/fCztJQeV7J

Thank you! You're the most beautiful person! Keep learning, keep hustling. Have a good day!! 💝

Star LLMWare.ai ⭐

Top comments (27)

Rohan Sharma • Sep 14 '24

Share your thoughts and doubts here.

Also, don't forget to Star the awesome LLMWare repo

Best Codes • Sep 14 '24

Nice article! I've written my own codes for this, running models locally. I used the nomic-embed-text-v1.5 model, found here:
huggingface.co/nomic-ai/nomic-embe...

I wrote a Python script where my folder was indexed (converted to a text embedding vector database by the model), then GPT4o (or a locally running model) could use tool calling to input something specific and get relevant parts of the output. For large folders, it was a bit slow sometimes, but it worked great!

Basically, the point was to let an AI chat model be able to summarize gigantic files or entire folders on my computer for me.

I think I'm going to open source my project soon, since I used all open-source models (GPT4o is optional) to create it.

Rohan Sharma • Sep 14 '24

That's so great... The model has 553,239 downloads. You're really amazing. I suggest you join the llmware Discord; you'll find a lot of great stuff there! The power of llmware lies mostly in SSMs (small specialized models); you can read the documentation or Intro to llmware for more details!

Also, making your project open source is a great idea if you want to maximize its reach!

Best Codes • Sep 14 '24

Oh, just to be clear — I did not create that model, I only used it! 😅

I've tried a few things like LLMWare, but I usually prefer just to make my own thing, so I know how everything works. Of course, I use libraries for lots of my AI things, but mostly just the Hugging Face transformers library and a couple others.

Rohan Sharma • Sep 14 '24

Oh, just to be clear — I did not create that model, I only used it! 😅
Sorry, I misunderstood! Never mind, you have the capability to build one.

LLMWare is on Hugging Face too, though 😉

Best Codes • Sep 14 '24

I'll check it out if I come across it :D

Rohan Sharma • Sep 15 '24

Great!

Luanna Meyer • Sep 20 '24

Yes, I’ve tried AI-powered medical virtual assistants, and the impact has been incredible! Virtual assistants like Kodexia can automate everything from appointment scheduling to patient inquiries, saving time and reducing administrative burden. Kodexia adapts quickly to different medical environments, offering personalized responses while maintaining data security. It’s been a game changer for clinics aiming to boost efficiency and provide real-time support without sacrificing patient care quality. Definitely worth trying if you're looking to enhance operational efficiency!

Rohan Sharma • Sep 20 '24

That looks Cool! Will try it in the future 😉

Luanna Meyer • Sep 27 '24

Thanks, Rohan! You should definitely try out Kodexia’s AI-powered chatbot in the future. It's a game-changer for streamlining customer support, automating responses, and improving overall efficiency. Let me know if you need more details on how it could help your business!

Rohan Sharma • Sep 27 '24

Hey Luanna, thank you! I was looking for Kodexia's docs but couldn't find them. Could you please share the link, if possible? Thank you once again for bringing this up!

Luanna Meyer • Sep 27 '24

kodexia.ai/
Here is the link to Kodexia, an AI-powered chatbot.
There are 2 buttons; the first one is "Get Your Free Chatbot".
Click that button, fill in the form, and get free access from the company.