In modern software development, ensuring that best practices are followed across teams and projects is essential to maintain code quality, efficiency, and scalability. However, keeping everyone updated on the latest standards and best practices can be a challenge, especially when different teams work on various parts of the codebase.
An AI-powered assistant that can provide instant answers to questions about your company’s coding standards or software best practices can help alleviate this issue. Using Retrieval-Augmented Generation (RAG), you can combine the power of large language models (LLMs) with the ability to search your company’s documentation in real time, giving employees quick access to guidelines, code examples, and answers to frequently asked questions.
In this article, we'll walk through how to build a RAG-based assistant that developers can use to query software best practices, guidelines, or standards specific to your organization.
Why RAG for Software Best Practices?
Retrieval-Augmented Generation (RAG) allows a model to fetch relevant information from external documents in real time while generating answers. This is especially useful when dealing with dynamic and context-specific content, like company coding standards or documentation that may evolve over time.
Unlike pre-trained models that rely solely on their internal knowledge, a RAG-based assistant pulls up-to-date information from your company's repositories or documentation files, ensuring accurate, real-time responses tailored to your exact guidelines.
Prerequisites
This article uses Python and Google Colab for demonstrating RAG with LlamaIndex and Google Gemini.
Prerequisite 1:
Obtain a Google API key. Since we're using Google Gemini as the generative AI model, we need one. You can create it in Google AI Studio.
Prerequisite 2:
Add the GOOGLE_API_KEY to the Colab secrets (the key icon in Colab's left sidebar).
Prerequisite 3:
Install the dependencies and authenticate with Google so the notebook can access the Gemini API key stored in the Colab secrets.
!pip install llama-index
!pip install huggingface-hub
!pip install llama-index-embeddings-gemini
!pip install llama-index-llms-gemini
!pip install google-generativeai
from google.colab import auth
from google.colab import userdata
import os

# Authenticate the Colab session with your Google account
auth.authenticate_user()

# Read the Gemini API key from the Colab secrets and expose it as an environment variable
GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
os.environ["GOOGLE_API_KEY"] = GOOGLE_API_KEY
Step 1: Gather and Store Your Best Practices Documentation
Before setting up your AI assistant, collect and organize the various documents containing your company's best practices. These could include documents on coding standards, design principles, software architecture guidelines, and more. Ideally, these should be stored in a centralized repository like GitHub, Google Drive, or your company’s knowledge base.
For demonstration purposes, let’s assume you’ve stored a file in a GitHub repository that contains your company’s best practices, covering topics such as DRY (Don't Repeat Yourself), SOLID principles, and clean code practices.
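If your guidelines already live as local files (for instance, a repository cloned into the Colab environment), LlamaIndex's SimpleDirectoryReader can load a whole folder in one call. Below is a minimal sketch of that alternative; the best_practices_docs folder name is just a placeholder. In this tutorial, however, we'll fetch the file over HTTP in the next step.
from llama_index.core import SimpleDirectoryReader

# Load every file in the (hypothetical) guidelines folder into LlamaIndex Document objects
local_documents = SimpleDirectoryReader("best_practices_docs").load_data()
print(f"Loaded {len(local_documents)} document(s)")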
Step 2: Fetching Documentation from the Repository
To enable the assistant to fetch your documents, we’ll download them from your repository using Python’s requests library. This way, your assistant will always have access to the latest version of the documentation.
# import necessary packages
import requests

from llama_index.core import Document, Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.gemini import GeminiEmbedding
from llama_index.llms.gemini import Gemini
# retrieve the best-practices document from the repository
tsPracticesDoc = "https://raw.githubusercontent.com/AhsanAyaz/gemini-rag-llamaindex-ts/refs/heads/main/data-sources/typescript_best_practices.txt"
response = requests.get(tsPracticesDoc)
if response.status_code == 200:
    content = response.text
else:
    raise Exception(f"Failed to download the file from {tsPracticesDoc}")

# wrap the raw text in a LlamaIndex Document
documents = [Document(text=content)]
Step 3: Setting Up the Language Model and Embeddings
Now, let’s configure a large language model (LLM) to handle text generation, and use Gemini embeddings to process and represent your documents in a vector space, allowing for fast and accurate retrieval. We will also configure LlamaIndex's global settings to use our models.
# models and text chunk splitter
llm = Gemini(model_name="models/gemini-1.5-flash-latest")
embed_model = GeminiEmbedding(model_name="models/embedding-001", generation_config={"temperature": 0.7, "topP": 0.8, "topK": 40})
text_splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=20)
# global LlamaIndex settings
Settings.llm = llm
Settings.embed_model = embed_model
Settings.text_splitter = text_splitter
Step 4: Generating the Vector Index and Query Engine
For better retrieval, the documentation is broken into smaller chunks using the text splitter we configured above. This allows the assistant to search for and retrieve specific sections of the text more effectively. We can now create the vector index using LlamaIndex and create our query engine as follows:
index = VectorStoreIndex.from_documents(documents, show_progress=True)
query_engine = index.as_query_engine()
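If you want more control over retrieval, as_query_engine accepts optional keyword arguments; similarity_top_k, for example, controls how many chunks are retrieved for each question. A minimal sketch (the value 3 is just an illustrative choice):
# Optional: retrieve a specific number of chunks per question instead of the default
query_engine = index.as_query_engine(similarity_top_k=3)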
Step 5: Querying the AI Assistant
Once the document is indexed, your RAG-based assistant is ready to answer questions. The assistant will retrieve relevant sections from the document and generate a response based on the retrieved information.
response = query_engine.query("What are the benefits of DRY in TypeScript? Can you give some examples and give some code samples?")
print(response)
How the AI Assistant Works
- Document Retrieval: When a query is made, the assistant searches the document index using the embeddings to retrieve the most relevant chunks of text (you can inspect these chunks yourself; see the sketch after this list).
- Text Generation: The LLM then uses the retrieved chunks to generate a comprehensive and context-specific answer, often including code snippets and examples.
- Continuous Learning: As your documentation evolves, simply updating the repository allows the assistant to work with the latest version of your guidelines.
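If you're curious which chunks were actually retrieved for a given answer, the response object returned by LlamaIndex exposes them via source_nodes. A small sketch for inspecting them, assuming the query_engine from Step 4:
response = query_engine.query("How should we apply the SOLID principles?")
# Each source node carries a retrieved text chunk and its similarity score
for node in response.source_nodes:
    print(f"score={node.get_score():.3f}")
    print(node.node.get_content()[:200], "...")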
For instance, if an engineer asks, “What are the benefits of DRY in TypeScript? Can you give some examples and give some code samples?” the assistant might respond with:
"DRY (Don't Repeat Yourself) in TypeScript helps to avoid repetition by centralizing logic, making code more maintainable and easier to understand.
For example, instead of repeating the same calculation multiple times, you can create a function that encapsulates the logic. This makes the code more concise and easier to modify if the calculation needs to be changed.
Here's an example:
// Bad Example: Repeating logic
let area1 = 10 * 20;
let area2 = 15 * 30;

// Good Example: Centralized logic
function calculateArea(width: number, height: number): number {
  return width * height;
}

let area1 = calculateArea(10, 20);
let area2 = calculateArea(15, 30);
Another benefit of DRY is that it reduces the risk of introducing bugs. When you have the same logic in multiple places, it's easy to make a mistake and update only some of the instances. By centralizing the logic, you ensure that any changes are made consistently."
Cool, right? This response not only explains the principle but also provides clear examples of how to implement it.
Step 6: Customizing the Assistant for Your Company
The RAG-based AI assistant can be customized to fit any company's needs. For example:
- Multiple Documents: You can index multiple documents, such as API design guidelines, security protocols, or team-specific coding standards (see the sketch after this list).
- Different Models: Depending on the complexity of your queries, you can switch between different LLMs or fine-tune a model specifically for your use case.
- User-Specific Queries: Customize the assistant to provide different levels of detail based on the user’s role (e.g., junior developers vs. senior architects).
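For the first point, here is a rough sketch of what indexing several guideline files together could look like; the URLs are placeholders for your own documents:
# Hypothetical list of guideline documents to index together
doc_urls = [
    "https://example.com/api_design_guidelines.txt",
    "https://example.com/security_protocols.txt",
]

documents = []
for url in doc_urls:
    resp = requests.get(url)
    resp.raise_for_status()
    documents.append(Document(text=resp.text))

# Build a single index (and query engine) over all guideline documents
index = VectorStoreIndex.from_documents(documents, show_progress=True)
query_engine = index.as_query_engine()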
Conclusion
By building a RAG-based AI assistant for company-wide software best practices, you can provide your development teams with instant access to critical information, helping them follow coding standards and guidelines more effectively. This AI assistant reduces the need for manual searches, ensures consistency across teams, and scales with your company’s evolving documentation.
The assistant can also serve as a foundation for broader use cases, such as onboarding new developers, handling code reviews, or even integrating with continuous integration (CI) systems to flag violations of coding standards during development.
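As a rough illustration of that last idea, a CI script could call a small helper built around the query engine; the function name and the example question below are hypothetical:
# Hypothetical helper a CI job (or any script) could call
def ask_best_practices(question: str) -> str:
    response = query_engine.query(question)
    return str(response)

print(ask_best_practices("What naming conventions do we use for TypeScript interfaces?"))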
This is just the beginning—by leveraging RAG, you can build powerful AI tools that make knowledge accessible to everyone in your organization.
Code
The code for the tutorial can be found here:
https://github.com/AhsanAyaz/gemini-rag-llamaindex-example
Feel free to react to this post, and give the GitHub repo a star if you found this useful :)