In AI, a "System of Experts" combines specialized AI agents, each programmed to tackle specific tasks. These agents work in concert, leveraging a mix of advanced AI models like GPT-4 and Cohere's Generate, coordinated with a messaging solution like RabbitMQ, to address complex challenges.
This blog post outlines the construction of such a system, similar to a simplified, minimal version of Microsoft's AutoGen framework, but within a decoupled, multi-container setup. For installation instructions and the detailed code and implementation, check out my accompanying GitHub repository.
System of Experts Architecture
Imagine you're building a personal finance assistant. It needs to understand financial queries, categorize expenses, offer savings advice, and engage in natural conversation. In our "System of Experts," we have created four distinct experts, each represented by a microservice:
- SavingsExpert: Provides personalized savings advice based on spending patterns.
- ManagerExpert: Directs user queries to the relevant expert.
- ExpenseTrackingExpert: Analyzes and categorizes personal expenses.
- ChatExpert: Handles conversational interactions with users.
In addition, we have two supporting microservices:
- client_api: Facilitates dialogue between the ManagerExpert and users via a websocket.
- conversation_api: Handles the conversation history in an SQLite database, compensating for models without built-in history tracking, such as Cohere's Generate or other simple text-in, text-out LLM APIs.
Here's a diagram illustrating all of the parts:
RabbitMQ for Expert Communication
Communication is vital in any team, and for our AI experts, RabbitMQ serves as the backbone. It's a message broker that ensures messages are delivered to the right service at the right time. Each expert listens to a specific RabbitMQ queue, waiting for messages to process.
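To make that concrete, here's a minimal sketch of how an expert might listen on its queue using the amqplib package. The connection URL and queue name are illustrative assumptions, not the repository's exact Consumer implementation:
import amqp from "amqplib"

// Hypothetical connection details; the real project reads these from environment variables.
const RABBITMQ_URL = process.env.RABBITMQ_URL ?? "amqp://rabbitmq:5672"
const QUEUE = process.env.QUEUE ?? "categorized_expense_queue"

async function listen() {
  const connection = await amqp.connect(RABBITMQ_URL)
  const channel = await connection.createChannel()

  // Make sure the queue exists before consuming from it.
  await channel.assertQueue(QUEUE, { durable: true })

  channel.consume(QUEUE, (msg) => {
    if (!msg) return
    const payload = msg.content.toString()
    // An Expert would hand this payload to its AI service here.
    console.log(`Received message on ${QUEUE}:`, payload)
    channel.ack(msg)
  })
}

listen().catch(console.error)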
Constructing the System
Let's walk through how these experts work together in a Dockerized environment. Each expert is a Docker container, and they're orchestrated using docker-compose. Our docker-compose.yml file specifies the services, environment variables, and dependencies, ensuring that all our experts can talk to RabbitMQ.
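As a rough sketch of what that file might contain (service names, build paths, and queue names here are illustrative assumptions rather than the repository's exact configuration):
version: "3.8"
services:
  rabbitmq:
    image: rabbitmq:3-management
    ports:
      - "5672:5672"
      - "15672:15672"

  savings_expert:
    build: ./experts/savings   # hypothetical path
    environment:
      QUEUE: categorized_expense_queue
      RABBITMQ_URL: amqp://rabbitmq:5672
    depends_on:
      - rabbitmq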
The Code Behind the Curtain
Each expert is implemented in TypeScript. Here's what the code looks like for each expert, taking the SavingsExpert as an example using the CohereAIService:
import "module-alias/register"
import { Consumer, Expert, CohereAIService } from "@shared"
const QUEUE = process.env.QUEUE ?? "categorized_expense_queue"
const ASSISTANT_NAME = "SavingsExpert"
const ASSISTANT_INSTRUCTIONS = `**You are the 'SavingsExpert':** A virtual assistant specialized in analyzing categorized personal financial data to provide insights and suggestions for savings. Your task is to offer advice on how to save money based on the spending patterns evident from the categorized expenses.
**Instructions for Providing Savings Advice and Queue Routing:**
1. **Understanding Spending Patterns:**
- Review the categorized expense data to identify spending trends and areas where the user may be able to save money.
2. **Advice Logic:**
- Provide concrete suggestions for savings based on the expense categories. For example, suggest budget adjustments, recommend cheaper alternatives, or highlight opportunities for cost-cutting.
3. **Routing Logic:**
- Determine the appropriate RabbitMQ queue based on the nature of the advice:
- Use 'client_queue' to send the savings advice directly back to the client.
- Use 'manager_queue' if the conversation requires input from or notification to other services for further processing or expert analysis.
4. **Output Format:**
- Your responses should be in the form of a JSON object that includes key-value pairs with the original expense description, the category, and your savings advice. Additionally, include a 'queue' field indicating the appropriate RabbitMQ queue for the response.
**Example JSON Response:**
For a list of expenses categorized as "Entertainment", your response routed to the 'client_queue' should be a raw JSON string formatted as follows:
{
"description": "Monthly subscriptions",
"category": "Entertainment",
"message": "Consider evaluating whether all subscriptions are necessary, or look for bundled options that could reduce the overall monthly cost.",
"queue": "client_queue"
}
**Note:** The JSON response should be a raw string without markdown syntax (do not include "\`\`\`json" nor \`\`\`), ready for direct parsing as JSON.
`
async function start(queueName: string) {
  // this service, as long as it meets the AIService interface, can be swapped
  // for another service managing another LLM
  const aiService = new CohereAIService()
  const expert = await Expert.create(
    ASSISTANT_NAME,
    ASSISTANT_INSTRUCTIONS,
    aiService
  )
  const consumer = new Consumer(queueName, expert)
  await consumer.connect()
  await consumer.startConsuming()
}
start(QUEUE).catch(console.error)
That's the entire implementation of an Expert that interfaces with a Consumer to process messages from our RabbitMQ broker, utilizing GPT-4 or Cohere's Generate model via CohereAIService or OpenAIService for context-rich responses.
Here are the most important parts of an implementation of an Expert:
- Expert's Name (ASSISTANT_NAME): The name to give to the Expert.
- Expert's Instructions (ASSISTANT_INSTRUCTIONS): The system instructions to give to the Expert. These are either included in the prompt for models that don't accept explicit system messages (like Cohere's Generate model), or passed explicitly as part of the creation of an assistant entity (like OpenAI's assistants).
- AIService (aiService): The service that actually talks to the 3rd-party API offering the given AI model. An Expert supports any class that implements the AIService interface:
export interface AIService {
  createAssistant(name: string, instructions: string): Promise<string>
  addMessageToAssistant(threadId: string, message: object): Promise<void>
  getAssistantResponse(threadId: string, assistantId?: string): Promise<any>
  createThread(): Promise<string>
  run(threadId: string, assistantId: string, maxRetries?: number): Promise<void>
}
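To illustrate the contract, here's a minimal sketch of a fake AIService you could swap in for local testing. It is purely hypothetical (not part of the repository, and it assumes the AIService interface is exported from @shared); it simply echoes messages instead of calling a real LLM API:
import { randomUUID } from "crypto"
import { AIService } from "@shared" // assumption: the interface is exported from the shared module

// A hypothetical stand-in for CohereAIService or OpenAIService, useful for
// exercising Experts without calling a paid API.
export class EchoAIService implements AIService {
  private instructions = new Map<string, string>()
  private threads = new Map<string, object[]>()

  async createAssistant(name: string, instructions: string): Promise<string> {
    const assistantId = randomUUID()
    this.instructions.set(assistantId, `${name}: ${instructions}`)
    return assistantId
  }

  async createThread(): Promise<string> {
    const threadId = randomUUID()
    this.threads.set(threadId, [])
    return threadId
  }

  async addMessageToAssistant(threadId: string, message: object): Promise<void> {
    this.threads.get(threadId)?.push(message)
  }

  async run(_threadId: string, _assistantId: string, _maxRetries?: number): Promise<void> {
    // A real implementation would poll the provider until the run completes.
  }

  async getAssistantResponse(threadId: string, _assistantId?: string): Promise<any> {
    const messages = this.threads.get(threadId) ?? []
    // Echo back the last message instead of generating a real reply.
    const last = messages[messages.length - 1] ?? {}
    return { message: JSON.stringify(last), queue: "client_queue" }
  }
}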
The Workflow
With the SavingsExpert described above as the Expert "building block", here's how our System of Experts springs into action:
- A user sends a query through the client_api, which supports a simple web interface.
- The ManagerExpert picks up the query from the manager_queue and identifies the user's intent.
- Depending on the query, it routes the message to the chat_queue for conversation or the expense_queue for financial advice (see the routing sketch after this list).
- The ChatExpert or ExpenseTrackingExpert processes the message and sends a response.
- If savings advice is needed, the ExpenseTrackingExpert sends the categorized expense to the SavingsExpert.
- The user receives tailored advice or information generated by the SavingsExpert in real time back from the client_api.
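Here's a rough sketch of what that queue-based hand-off could look like in code. It assumes, as in the prompt above, that each Expert's response is raw JSON containing a "queue" field; the helper name is hypothetical:
import type { Channel } from "amqplib"

// Hypothetical helper: publish an Expert's response to whichever queue it named.
async function routeResponse(channel: Channel, rawResponse: string) {
  // Each Expert is instructed to reply with raw JSON that includes a "queue" field.
  const response = JSON.parse(rawResponse)
  const targetQueue = response.queue ?? "manager_queue"

  await channel.assertQueue(targetQueue, { durable: true })
  channel.sendToQueue(targetQueue, Buffer.from(JSON.stringify(response)), {
    persistent: true,
  })
}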
Now, let's consider next steps for this system of experts example project.
Next Steps
In this explanation of the tutorial project, I've provided an overview of how to build your own System of Experts. Before taking a system like this into production and putting it in front of users, here are some additional important considerations:
Use Kubernetes
Docker Compose is generally not considered a production-level container orchestration solution. We can convert our docker-compose.yml to bootstrap our System of Experts on a Kubernetes cluster. Additionally, a well-configured Kubernetes cluster would need ancillary services for monitoring the Expert containers' health and resource usage, along with a solution for persistent logging. For RabbitMQ, it would be more reliable to integrate with a managed offering, such as Amazon MQ from AWS, than to self-manage RabbitMQ containers.
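For illustration, here's a rough sketch of what one Expert's Kubernetes Deployment might look like after such a conversion (the image name, labels, and replica count are assumptions, not taken from the actual project):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: savings-expert
spec:
  replicas: 2
  selector:
    matchLabels:
      app: savings-expert
  template:
    metadata:
      labels:
        app: savings-expert
    spec:
      containers:
        - name: savings-expert
          image: myregistry/savings-expert:latest   # hypothetical image
          env:
            - name: QUEUE
              value: categorized_expense_queue
            - name: RABBITMQ_URL
              value: amqp://rabbitmq:5672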
Create Higher Order Experts
The experts in our system of experts are pretty low level. For a more capable system able to solve more complex tasks, we need more abstraction. This could come in the form of Experts that manage a "class" of Experts. For example, a "Personal Finance Expert" that understands both the SavingsExpert and the ExpenseTrackingExpert could decide which "sub Expert" should receive a user message about personal finance, enabling a system of experts that handles multiple domains beyond just personal finance.
Even crazier still - we could create Experts that dynamically create other Experts for greater task breakdown and problem solving.
Leverage RabbitMQ Exchanges
The example project uses simple direct RabbitMQ queues to demonstrate how Experts can communicate. Building on the previous section about "higher order experts", we should use RabbitMQ exchanges instead: because exchanges manage routing to queues, they correspond nicely with the concept of higher order experts that control which lower level expert gets which message within a particular problem domain.
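As a sketch of that idea, a higher order expert could publish to a topic exchange and let bindings decide which lower level expert's queue receives each message. The exchange and routing key names below are assumptions for illustration:
import amqp from "amqplib"

async function setupFinanceExchange() {
  const connection = await amqp.connect(process.env.RABBITMQ_URL ?? "amqp://rabbitmq:5672")
  const channel = await connection.createChannel()

  // One exchange per problem domain; the higher order expert publishes here.
  await channel.assertExchange("personal_finance", "topic", { durable: true })

  // Bind each lower level expert's queue to the routing keys it cares about.
  await channel.assertQueue("expense_queue", { durable: true })
  await channel.bindQueue("expense_queue", "personal_finance", "finance.expense.*")

  await channel.assertQueue("categorized_expense_queue", { durable: true })
  await channel.bindQueue("categorized_expense_queue", "personal_finance", "finance.savings.*")

  // The "Personal Finance Expert" routes a message simply by choosing a routing key.
  channel.publish(
    "personal_finance",
    "finance.savings.advice",
    Buffer.from(JSON.stringify({ description: "Monthly subscriptions", category: "Entertainment" }))
  )
}

setupFinanceExchange().catch(console.error)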
Use More Performant Models
While GPT-4 is still considered (as of the time of writing this blog post) the best LLM and worked great in an Expert role, I noticed significant difficulty in instructing Cohere's Generate model to respond purely with JSON. Being aware of the weaknesses and strengths of the LLMs used in the system of experts is perhaps the most crucial aspect of the success or failure of the whole system.
Testing
Even with all of the previously mentioned improvements for bringing a system of experts like this to production, this tutorial did not cover how to test LLM-based applications. Automating a sort of "LLM test suite", perhaps with a standard set of user queries to ensure expected behavior among Experts stays consistent, would be the minimum needed to develop a solid level of confidence in the whole system. Testing LLM applications is a nuanced topic that requires a different approach than traditional enterprise software testing.
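As a minimal sketch of what one such check could look like, using Jest-style assertions and a hypothetical helper that sends a query through the system and collects the Expert's reply:
// Hypothetical end-to-end check: a canned query should come back as parseable
// JSON with a routing queue we recognize. `sendQueryAndAwaitReply` is assumed
// to publish to manager_queue and wait for the response on client_queue.
import { sendQueryAndAwaitReply } from "./testHelpers"

describe("SavingsExpert", () => {
  it("returns raw JSON with a valid routing queue", async () => {
    const reply = await sendQueryAndAwaitReply(
      "I spend $60 a month on streaming subscriptions. How can I save?"
    )

    // The prompt forbids markdown fences, so the reply must parse directly.
    const parsed = JSON.parse(reply)

    expect(parsed).toHaveProperty("category")
    expect(parsed).toHaveProperty("message")
    expect(["client_queue", "manager_queue"]).toContain(parsed.queue)
  })
})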
Conclusion
The System of Experts is a powerful paradigm, perfectly suited for leveraging the capabilities of recent LLM models. It offers modularity, scalability, and robustness, with each expert bringing its specialized knowledge to the table.
The potential is boundless – with current and forthcoming AI models, we'll unlock capabilities in vast systems of experts that were once beyond our wildest dreams.
Comments
Asking from quite a noob position here but nowadays, wouldn’t something like LlamaIndex’s Router Query Engine be a better solution for this class of problems? I mean, not sure about the scalability part of the story (at all), but in terms of the mental paradigm, it looks simpler and closer to the LLM ecosystem. Would be interesting to hear your thoughts on that based on your experience.
The point with my article (as with the rest of my content) is to explore one way of doing something.
There are often many solutions to the same problem, but if you don’t know about alternatives, it’s hard to critically evaluate which solutions fit better.
For example, many of the “multi agent frameworks” are based on having multiple agents all operating in the same process on one machine.
As a developer who works on distributed applications, I know there's only so much capacity you can squeeze out of one process. I had that in the back of my mind when I was writing this article.
Granted, the example in my article absolutely could be done in a single process, and the LlamaIndex Router Query Engine would be a good fit for the simple agents of my post, all conveniently in one process.
I can envision multi process, multi agent applications where an agent has access to a “CPU bound Tool”, like creating or manipulating large files. So while the one agent is “working hard”, the rest of the multi agent system can handle additional user requests.