Introduction to RAG
One of the most valuable skills you can learn today as a developer is how to build Retrieval Augmented Generation (RAG) applications using Large Language Models (LLMs).
Why?
Because there are over 64 zettabytes of data in the world and this doesn't even include physical data like books or physical documents. (For your reference, 1 zettabyte is a trillion gigabytes.)
Not only that, 90% of the world's data was created in the last two years, and the volume of data is doubling every two years. So basically, companies are buried under mountains of data that grow larger by the day.
How will companies access and use all this data?
By now, most developers have heard of using Retrieval Augmented Generation (RAG) to find information with AI. Being able to access and use ever-growing volumes of data is a key capability that every company needs.
RAG is basically shorthand for the workflow of linking documents or other knowledge sources to LLMs. Even developers who know this often have not experimented with it themselves (yet).
The internet is full of lists of libraries, but how do you get started?
Here is a short list of the best libraries to help you start with RAG.
1. LLMWare
In LLMWare, you can upload documents and, with a few lines of code, start retrieving information. It handles the entire process that RAG requires: document ingestion, parsing, chunking, indexing, embedding, storing in a vector database, and linking to LLMs to retrieve the answer.
LLMWare is designed to be integrated and end-to-end so all of these steps are accessible out of the box. It assembles all the pieces so you don't have to.
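To make those steps concrete, here is a toy sketch in plain Python. It is not LLMWare's API: every function name below is made up for illustration, the "retrieval" is simple keyword overlap instead of real embeddings, and the final prompt is only printed rather than sent to a model. It just shows the shape of the chunk, retrieve, and link-to-LLM workflow.

```python
import re
from collections import Counter


def chunk_text(text, max_words=40, overlap=10):
    """Split text into overlapping word windows (one simple chunking strategy)."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, chunk):
    """Toy relevance: total occurrences in the chunk of each distinct query token."""
    chunk_counts = Counter(tokenize(chunk))
    return sum(chunk_counts[token] for token in set(tokenize(query)))


def retrieve(query, chunks, top_k=2):
    """Return the top_k chunks with the highest toy relevance score."""
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)
    return ranked[:top_k]


def build_prompt(query, context_chunks):
    """Assemble the text that would be handed to an LLM."""
    context = "\n---\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


document = (
    "The invoice total is 4,200 dollars and is due on March 1. "
    "Late payments incur a fee of two percent per month. "
    "The contract term is twelve months and renews automatically. "
    "Either party may cancel with thirty days written notice."
)
chunks = chunk_text(document, max_words=12, overlap=3)
question = "When is the invoice due?"
best_chunks = retrieve(question, chunks, top_k=1)
prompt = build_prompt(question, best_chunks)
print(prompt)
```

In a real pipeline, the keyword score would be replaced by embedding similarity, and the prompt would go to a model instead of `print`.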
LLMWare makes it very simple and easy to get started:
- RAG workflow through end-to-end examples in just a few lines of code
- Create a library and load files seamlessly
- Generate embeddings effortlessly
- Conduct semantic searches with ease
- Utilize any Hugging Face model or a closed-source model like GPT-4 to answer questions from the data
- Examples include RAG with no-GPU-required models
Disclaimer: I am the founder of LLMWare
2. MongoDB
MongoDB is a widely used, open-source NoSQL database program. It falls under the category of document-oriented databases, which means it stores and organizes data in a format similar to JSON documents. MongoDB is designed to be flexible and scalable, making it suitable for a variety of applications and industries.
Databases like MongoDB play an important role in RAG: they store the text and the metadata extracted from the documents or knowledge base before embeddings are created.
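As a rough illustration of what "text plus metadata" can look like, here is a small sketch that builds JSON-like records from parsed pages and filters them. It uses plain Python lists and dicts so that it runs without a database server; the field names (`doc_name`, `page`, `block_id`, `text`) and both helper functions are hypothetical, not a schema that MongoDB or any RAG library prescribes.

```python
import json


def make_chunk_records(doc_name, pages):
    """Turn parsed pages into JSON-like records, one per text block."""
    records = []
    for page_number, page_text in enumerate(pages, start=1):
        # Treat blank-line-separated paragraphs as blocks
        for block_id, block in enumerate(page_text.split("\n\n")):
            records.append({
                "doc_name": doc_name,   # hypothetical field names
                "page": page_number,
                "block_id": block_id,
                "text": block.strip(),
            })
    return records


def find(records, **criteria):
    """Return the records whose fields equal every given criterion."""
    return [
        record for record in records
        if all(record.get(key) == value for key, value in criteria.items())
    ]


pages = [
    "Master Services Agreement\n\nThis agreement begins on January 5.",
    "Fees are invoiced monthly.\n\nInvoices are due within 30 days.",
]
records = make_chunk_records("msa.pdf", pages)
page_two = find(records, doc_name="msa.pdf", page=2)
print(json.dumps(page_two, indent=2))
```

With a running MongoDB server and the `pymongo` driver, `collection.insert_many(records)` and `collection.find({"doc_name": "msa.pdf", "page": 2})` would play the roles of the list and the `find` helper here.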
3. Milvus Vector DB
Milvus is an open-source vector database built to power embedding similarity search and AI applications. Milvus makes unstructured data search more accessible, and provides a consistent user experience regardless of the deployment environment.
Milvus, or a similar vector DB, is a crucial component in RAG. It is where vector embeddings are stored for similarity searches. This database allows people to ask questions in natural language and retrieve related results. Without good embeddings and a good vector DB, the LLM will not receive the right chunks of text to read.
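Under the hood, a similarity search boils down to comparing a query vector against the stored vectors and returning the closest ones. Below is a minimal brute-force sketch using cosine similarity. The three-dimensional vectors are made up for illustration (real embedding models produce hundreds of dimensions), the helper names are hypothetical, and a vector database such as Milvus relies on specialized indexes rather than a plain loop like this.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, index, k=2):
    """Brute-force search: score every stored vector and keep the k best."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Made-up embeddings: each entry pairs a text chunk with its vector
index = [
    ("Invoices are due within 30 days.", [0.9, 0.1, 0.0]),
    ("The office is closed on public holidays.", [0.0, 0.2, 0.9]),
    ("Late payments incur a two percent fee.", [0.7, 0.6, 0.1]),
]
query_vector = [0.8, 0.2, 0.0]  # pretend embedding of "When do I have to pay?"

results = top_k(query_vector, index, k=2)
for similarity, text in results:
    print(f"{similarity:.3f}  {text}")
```

The two payment-related chunks rank above the unrelated one because their vectors point in nearly the same direction as the query vector.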
4. Hugging Face
If you haven't visited Hugging Face, you really should. It is THE place to go for open-source models, and it is single-handedly saving the world from AI monopolies. What GitHub is for open-source projects, Hugging Face is for open-source models. There are over 450,000 models, all FREE, for anyone who wants to use them.
Hugging Face's Transformers library is the go-to library that provides thousands of pretrained models to perform tasks on different modalities such as text, vision, and audio.
These models can be applied to:
- Text, for tasks like text classification, information extraction, question answering, summarization, translation, and text generation, in over 100 languages.
- Images, for tasks like image classification, object detection, and segmentation.
- Audio, for tasks like speech recognition and audio classification.
Transformer models can also perform tasks on several modalities combined, such as table question answering, optical character recognition, information extraction from scanned documents, video classification, and visual question answering.
5. Llama.cpp
No GPU? No problem!
Llama.cpp to the rescue!
The main goal of llama.cpp is to run the LLaMA model using 4-bit integer quantization on a MacBook.
- Plain C/C++ implementation without dependencies
- Apple silicon first-class citizen - optimized via ARM NEON, Accelerate and Metal frameworks
- AVX, AVX2 and AVX512 support for x86 architectures
- Mixed F16 / F32 precision
- 2-bit, 3-bit, 4-bit, 5-bit, 6-bit and 8-bit integer quantization support
- CUDA, Metal and OpenCL GPU backend support
Once quantized, larger models can be run on CPUs with very little performance loss. Look for GGUF versions of models to try with LLMWare or another RAG workflow.
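To get a feel for why quantization helps, here is a simplified sketch of symmetric integer quantization on a handful of weights, plus the back-of-the-envelope memory arithmetic. This is an illustration of the general idea only, not llama.cpp's actual code or its GGUF formats, which are more elaborate; the function names and the sample weights are made up.

```python
def quantize_block(values, bits=4):
    """Map floats to signed integers that share one scale factor."""
    q_max = 2 ** (bits - 1) - 1  # 7 for 4-bit
    largest = max(abs(v) for v in values)
    scale = largest / q_max if largest > 0 else 1.0
    # Round to the nearest integer step and clamp into [-q_max, q_max]
    quantized = [max(-q_max, min(q_max, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_block(quantized, scale):
    """Recover approximate floats from the integers and the shared scale."""
    return [q * scale for q in quantized]


def model_size_gb(n_params, bits_per_weight):
    """Approximate weight storage in GB (10**9 bytes), ignoring scales and metadata."""
    return n_params * bits_per_weight / 8 / 1e9


weights = [0.12, -0.53, 0.33, 0.07, -0.91, 0.48, -0.02, 0.70]
quantized, scale = quantize_block(weights, bits=4)
restored = dequantize_block(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("quantized:", quantized)
print("max rounding error:", round(max_error, 4))
print("7B model at 16 bits:", model_size_gb(7e9, 16), "GB")
print("7B model at 4 bits:", model_size_gb(7e9, 4), "GB")
```

Each weight is off by at most half a quantization step, while a 7-billion-parameter model shrinks from roughly 14 GB at 16 bits per weight to roughly 3.5 GB at 4 bits, which is what brings it within reach of an ordinary laptop.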
This is a VERY basic overview to get you started with RAG. If you want an integrated, one-stop solution in which all of these libraries work together seamlessly, visit LLMWare's GitHub repository to find over 50 great examples to help you get started.
Find us on Discord - we would love to hear from you!
Please be sure to visit our website llmware.ai for more information and updates.
Top comments (4)
How about LangChain or Flowise? MongoDB is just a database and so is Milvus. Hugging Face is a platform for open source LLMs. You'll probably need all of this during the development of a RAG application. But by the title I expected a more hands-on library (like the ones I mentioned).
Hi I understand your point....I mean to say that these are libraries you need to know FOR RAG....Perhaps I should have clarified to say "...for RAG Implementation"? Without some basic understanding of these component libraries for RAG, it is very difficult to use or understand any RAG library. I have an article coming up soon for hands on RAG and LLM fine-tuning. :-)
DISCLAIMER: The author is from LLMWare AI, so this is advertising content!
I added the disclaimer, not to mention that I am linking to other extremely popular libraries.