In AI, a "System of Experts" combines specialized AI agents, each programmed to tackle specific tasks. These agents work in concert, leveraging a mix of advanced AI models like GPT-4 and Cohere's Generate, coordinated with a messaging solution like RabbitMQ, to address complex challenges.
This blog post outlines the construction of such a system, similar to a simplified, minimal version of Microsoft's AutoGen framework, but within a decoupled, multi-container setup. For installation instructions and the detailed code and implementation, check out my accompanying GitHub repository.
System of Experts Architecture
Imagine you're building a personal finance assistant. It needs to understand financial queries, categorize expenses, offer savings advice, and engage in natural conversation. In our "System of Experts," we have created four distinct experts, each represented by a microservice:
- SavingsExpert: Provides personalized savings advice based on spending patterns.
- ManagerExpert: Directs user queries to the relevant expert.
- ExpenseTrackingExpert: Analyzes and categorizes personal expenses.
- ChatExpert: Handles conversational interactions with users.
In addition, we have two supporting microservices:
- client_api: Facilitates dialogue between the ManagerExpert and users via a websocket.
- conversation_api: Handles the conversation history in an SQLite database, compensating for models without built-in history tracking, such as Cohere's Generate or other simple text-in, text-out LLM APIs.
Here's a diagram illustrating all of the parts:
RabbitMQ for Expert Communication
Communication is vital in any team, and for our AI experts, RabbitMQ serves as the backbone. It's a message broker that ensures messages are delivered to the right service at the right time. Each expert listens to a specific RabbitMQ queue, waiting for messages to process.
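To make that concrete, here's a minimal sketch of how an expert might listen on its queue using the amqplib package. The connection URL and queue name are illustrative assumptions, not the repository's exact Consumer implementation:
import amqp from "amqplib"

// Hypothetical connection details; the real project reads these from environment variables.
const RABBITMQ_URL = process.env.RABBITMQ_URL ?? "amqp://rabbitmq:5672"
const QUEUE = process.env.QUEUE ?? "categorized_expense_queue"

async function listen() {
  const connection = await amqp.connect(RABBITMQ_URL)
  const channel = await connection.createChannel()

  // Make sure the queue exists before consuming from it.
  await channel.assertQueue(QUEUE, { durable: true })

  channel.consume(QUEUE, (msg) => {
    if (!msg) return
    const payload = msg.content.toString()
    // An Expert would hand this payload to its AI service here.
    console.log(`Received message on ${QUEUE}:`, payload)
    channel.ack(msg)
  })
}

listen().catch(console.error)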
Constructing the System
Let's walk through how these experts work together in a Dockerized environment. Each expert is a Docker container, and they're orchestrated using docker-compose. Our docker-compose.yml file specifies the services, environment variables, and dependencies, ensuring that all our experts can talk to RabbitMQ.
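As a rough sketch of what that file might contain (service names, build paths, and queue names here are illustrative assumptions rather than the repository's exact configuration):
version: "3.8"
services:
  rabbitmq:
    image: rabbitmq:3-management
    ports:
      - "5672:5672"
      - "15672:15672"

  savings_expert:
    build: ./experts/savings   # hypothetical path
    environment:
      QUEUE: categorized_expense_queue
      RABBITMQ_URL: amqp://rabbitmq:5672
    depends_on:
      - rabbitmq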
The Code Behind the Curtain
Each expert is implemented in TypeScript. Here's what the code looks like for each expert, taking the SavingsExpert as an example using the CohereAIService:
import "module-alias/register"
import { Consumer, Expert, CohereAIService } from "@shared"
const QUEUE = process.env.QUEUE ?? "categorized_expense_queue"
const ASSISTANT_NAME = "SavingsExpert"
const ASSISTANT_INSTRUCTIONS = `**You are the 'SavingsExpert':** A virtual assistant specialized in analyzing categorized personal financial data to provide insights and suggestions for savings. Your task is to offer advice on how to save money based on the spending patterns evident from the categorized expenses.
**Instructions for Providing Savings Advice and Queue Routing:**
1. **Understanding Spending Patterns:**
- Review the categorized expense data to identify spending trends and areas where the user may be able to save money.
2. **Advice Logic:**
- Provide concrete suggestions for savings based on the expense categories. For example, suggest budget adjustments, recommend cheaper alternatives, or highlight opportunities for cost-cutting.
3. **Routing Logic:**
- Determine the appropriate RabbitMQ queue based on the nature of the advice:
- Use 'client_queue' to send the savings advice directly back to the client.
- Use 'manager_queue' if the conversation requires input from or notification to other services for further processing or expert analysis.
4. **Output Format:**
- Your responses should be in the form of a JSON object that includes key-value pairs with the original expense description, the category, and your savings advice. Additionally, include a 'queue' field indicating the appropriate RabbitMQ queue for the response.
**Example JSON Response:**
For a list of expenses categorized as "Entertainment", your response routed to the 'client_queue' should be a raw JSON string formatted as follows:
{
"description": "Monthly subscriptions",
"category": "Entertainment",
"message": "Consider evaluating whether all subscriptions are necessary, or look for bundled options that could reduce the overall monthly cost.",
"queue": "client_queue"
}
**Note:** The JSON response should be a raw string without markdown syntax (do not include "\`\`\`json" nor \`\`\`), ready for direct parsing as JSON.
`
async function start(queueName: string) {
  // this service, as long as it meets the AIService interface, can be swapped
  // for another service managing another LLM
  const aiService = new CohereAIService()
  const expert = await Expert.create(
    ASSISTANT_NAME,
    ASSISTANT_INSTRUCTIONS,
    aiService
  )
  const consumer = new Consumer(queueName, expert)
  await consumer.connect()
  await consumer.startConsuming()
}
start(QUEUE).catch(console.error)
That's the entire implementation of an Expert that interfaces with a Consumer to process messages from our RabbitMQ broker, utilizing GPT-4 or Cohere's Generate model via CohereAIService or OpenAIService for context-rich responses.
Here are the most important parts of an implementation of an Expert:
- Expert's Name (ASSISTANT_NAME): The name to give to the Expert.
- Expert's Instructions (ASSISTANT_INSTRUCTIONS): The system instructions to give to the Expert. These are either included in the prompt for models that don't accept explicit system messages (like Cohere's Generate model), or passed explicitly as part of the creation of an assistant entity (like OpenAI's assistants).
- AIService (aiService): The service that actually talks to the 3rd-party API offering the given AI model. An Expert supports any class that implements the AIService interface:
export interface AIService {
  createAssistant(name: string, instructions: string): Promise<string>
  addMessageToAssistant(threadId: string, message: object): Promise<void>
  getAssistantResponse(threadId: string, assistantId?: string): Promise<any>
  createThread(): Promise<string>
  run(threadId: string, assistantId: string, maxRetries?: number): Promise<void>
}
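To illustrate the contract, here's a minimal sketch of a fake AIService you could swap in for local testing. It is purely hypothetical (not part of the repository, and it assumes the AIService interface is exported from @shared); it simply echoes messages instead of calling a real LLM API:
import { randomUUID } from "crypto"
import { AIService } from "@shared" // assumption: the interface is exported from the shared module

// A hypothetical stand-in for CohereAIService or OpenAIService, useful for
// exercising Experts without calling a paid API.
export class EchoAIService implements AIService {
  private instructions = new Map<string, string>()
  private threads = new Map<string, object[]>()

  async createAssistant(name: string, instructions: string): Promise<string> {
    const assistantId = randomUUID()
    this.instructions.set(assistantId, `${name}: ${instructions}`)
    return assistantId
  }

  async createThread(): Promise<string> {
    const threadId = randomUUID()
    this.threads.set(threadId, [])
    return threadId
  }

  async addMessageToAssistant(threadId: string, message: object): Promise<void> {
    this.threads.get(threadId)?.push(message)
  }

  async run(_threadId: string, _assistantId: string, _maxRetries?: number): Promise<void> {
    // A real implementation would poll the provider until the run completes.
  }

  async getAssistantResponse(threadId: string, _assistantId?: string): Promise<any> {
    const messages = this.threads.get(threadId) ?? []
    // Echo back the last message instead of generating a real reply.
    const last = messages[messages.length - 1] ?? {}
    return { message: JSON.stringify(last), queue: "client_queue" }
  }
}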
The Workflow
With the SavingsExpert described above as the Expert "building block", here's how our System of Experts springs into action:
- A user sends a query through the client_api, which supports a simple web interface.
- The ManagerExpert picks up the query from the manager_queue and identifies the user's intent.
- Depending on the query, it routes the message to the chat_queue for conversation or the expense_queue for financial advice (see the routing sketch after this list).
- The ChatExpert or ExpenseTrackingExpert processes the message and sends a response.
- If savings advice is needed, the ExpenseTrackingExpert sends the categorized expense to the SavingsExpert.
- The user receives tailored advice or information generated by the SavingsExpert in real time back from the client_api.
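Here's a rough sketch of what that queue-based hand-off could look like in code. It assumes, as in the prompt above, that each Expert's response is raw JSON containing a "queue" field; the helper name is hypothetical:
import type { Channel } from "amqplib"

// Hypothetical helper: publish an Expert's response to whichever queue it named.
async function routeResponse(channel: Channel, rawResponse: string) {
  // Each Expert is instructed to reply with raw JSON that includes a "queue" field.
  const response = JSON.parse(rawResponse)
  const targetQueue = response.queue ?? "manager_queue"

  await channel.assertQueue(targetQueue, { durable: true })
  channel.sendToQueue(targetQueue, Buffer.from(JSON.stringify(response)), {
    persistent: true,
  })
}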
Now, let's consider next steps for this system of experts example project.
Next Steps
In this explanation of the tutorial project, I've provided an overview of how to build your own System of Experts. Before taking a system like this into production and putting it in front of users, here are some additional important considerations:
Use Kubernetes
Docker Compose is generally not considered a production-level container orchestration solution. We can convert our docker-compose.yml to bootstrap our System of Experts on a Kubernetes cluster. Additionally, a well-configured Kubernetes cluster would need ancillary services for monitoring the Expert containers' health and resource usage, along with a solution for persistent logging. For RabbitMQ, it would be more reliable to integrate with a managed offering, such as Amazon MQ from AWS, than to self-manage RabbitMQ containers.
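For illustration, here's a rough sketch of what one Expert's Kubernetes Deployment might look like after such a conversion (the image name, labels, and replica count are assumptions, not taken from the actual project):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: savings-expert
spec:
  replicas: 2
  selector:
    matchLabels:
      app: savings-expert
  template:
    metadata:
      labels:
        app: savings-expert
    spec:
      containers:
        - name: savings-expert
          image: myregistry/savings-expert:latest   # hypothetical image
          env:
            - name: QUEUE
              value: categorized_expense_queue
            - name: RABBITMQ_URL
              value: amqp://rabbitmq:5672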
Create Higher Order Experts
The experts in our system of experts are pretty low level. For a more capable system able to solve more complex tasks, we need more abstraction. This could come in the form of Experts that manage a "class" of Experts. For example, a "Personal Finance Expert" that understands both the SavingsExpert and the ExpenseTrackingExpert could decide which "sub Expert" should receive a user message about personal finance, enabling a system of experts that handles multiple domains beyond just personal finance.
Even crazier still - we could create Experts that dynamically create other Experts for greater task breakdown and problem solving.
Leverage RabbitMQ Exchanges
The example project uses simple direct RabbitMQ queues to demonstrate how Experts can communicate. Building on the previous section about "higher order experts", we should use RabbitMQ exchanges instead: because exchanges manage routing to queues, they correspond nicely with the concept of higher order experts that control which lower level expert gets which message within a particular problem domain.
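As a sketch of that idea, a higher order expert could publish to a topic exchange and let bindings decide which lower level expert's queue receives each message. The exchange and routing key names below are assumptions for illustration:
import amqp from "amqplib"

async function setupFinanceExchange() {
  const connection = await amqp.connect(process.env.RABBITMQ_URL ?? "amqp://rabbitmq:5672")
  const channel = await connection.createChannel()

  // One exchange per problem domain; the higher order expert publishes here.
  await channel.assertExchange("personal_finance", "topic", { durable: true })

  // Bind each lower level expert's queue to the routing keys it cares about.
  await channel.assertQueue("expense_queue", { durable: true })
  await channel.bindQueue("expense_queue", "personal_finance", "finance.expense.*")

  await channel.assertQueue("categorized_expense_queue", { durable: true })
  await channel.bindQueue("categorized_expense_queue", "personal_finance", "finance.savings.*")

  // The "Personal Finance Expert" routes a message simply by choosing a routing key.
  channel.publish(
    "personal_finance",
    "finance.savings.advice",
    Buffer.from(JSON.stringify({ description: "Monthly subscriptions", category: "Entertainment" }))
  )
}

setupFinanceExchange().catch(console.error)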
Use More Performant Models
While GPT-4 is still considered (as of the time of writing this blog post) the best LLM and worked great in an Expert role, I noticed significant difficulty in instructing Cohere's Generate model to respond purely with JSON. Being aware of the weaknesses and strengths of the LLMs used in the system of experts is perhaps the most crucial aspect of the success or failure of the whole system.
Testing
Even with all of the previously mentioned improvements for bringing a system of experts like this to production, this tutorial did not cover how to test LLM-based applications. Automating a sort of "LLM test suite", perhaps with a standard set of user queries to ensure expected behavior among Experts stays consistent, would be the minimum needed to develop a solid level of confidence in the whole system. Testing LLM applications is a nuanced topic that requires a different approach than traditional enterprise software testing.
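As a minimal sketch of what one such check could look like, using Jest-style assertions and a hypothetical helper that sends a query through the system and collects the Expert's reply:
// Hypothetical end-to-end check: a canned query should come back as parseable
// JSON with a routing queue we recognize. `sendQueryAndAwaitReply` is assumed
// to publish to manager_queue and wait for the response on client_queue.
import { sendQueryAndAwaitReply } from "./testHelpers"

describe("SavingsExpert", () => {
  it("returns raw JSON with a valid routing queue", async () => {
    const reply = await sendQueryAndAwaitReply(
      "I spend $60 a month on streaming subscriptions. How can I save?"
    )

    // The prompt forbids markdown fences, so the reply must parse directly.
    const parsed = JSON.parse(reply)

    expect(parsed).toHaveProperty("category")
    expect(parsed).toHaveProperty("message")
    expect(["client_queue", "manager_queue"]).toContain(parsed.queue)
  })
})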
Conclusion
The System of Experts is a powerful paradigm, perfectly suited for leveraging the capabilities of recent LLM models. It offers modularity, scalability, and robustness, with each expert bringing its specialized knowledge to the table.
The potential is boundless – with current and forthcoming AI models, we'll unlock capabilities in vast systems of experts that were once beyond our wildest dreams.
Comments
Asking from quite a noob position here but nowadays, wouldn’t something like LlamaIndex’s Router Query Engine be a better solution for this class of problems? I mean, not sure about the scalability part of the story (at all), but in terms of the mental paradigm, it looks simpler and closer to the LLM ecosystem. Would be interesting to hear your thoughts on that based on your experience.
The point with my article (as with the rest of my content) is to explore one way of doing something.
There are often many solutions to the same problem, but if you don’t know about alternatives, it’s hard to critically evaluate which solutions fit better.
For example, many of the “multi agent frameworks” are based on having multiple agents all operating in the same process on one machine.
As a developer who works on distributed applications, I know there's only so much capacity you can squeeze out of one process. I had that in the back of my mind when I was writing this article.
Granted, the example in my article absolutely could be done in a single process, and the LlamaIndex Router Query Engine would be a good fit for the simple agents of my post, all conveniently in one process.
I can envision multi process, multi agent applications where an agent has access to a “CPU bound Tool”, like creating or manipulating large files. So while the one agent is “working hard”, the rest of the multi agent system can handle additional user requests.