The Best Way to Start a Story Is from the Beginning
In my previous post, I laid the foundations for our core structure—defining the fundamental entity structure and fleshing out the Document entity that underpins our system. With this groundwork in place, I'm excited to dive into defining the Agent as an entity.
However, I can't jump straight into that just yet. While the concepts of documents and their derived types might make intuitive sense, it's important to cover some basic concepts around Agents (a common shorthand for AI Agent).
Once I've laid out these concepts, I'll be able to define the related entities in this context, the basic actions needed for these entities, and finally present the full picture for this page of our codex.
Agents, Their Pieces and Their Wholes
To get started, AWS describes agents as:
"An artificial intelligence (AI) agent is a software program that can interact with its environment, collect data, and use the data to perform self-determined tasks to meet predetermined goals." - Link
As that implies, agents are expected to perform actions much like a "normal human" might, given the same information available to them. The most widely used agents today are the many AI chatbots that have emerged in recent years, several of which can evaluate files, search the internet, and more "out of the box."
Agents reveal much more of their potential when we take things a step further—allowing these programs access to other systems via APIs or similar methods. By providing them with context-specific datasets, we enable agents to achieve their goals significantly more accurately.
To accomplish this, our Agents will use:
- Large Language Models (LLM): These will translate my instructions, context, and the options available into a plan of action.
- Tool Use/Function Calling: This enables the agent to execute its plan of action by reaching out to APIs, datasets, or even executing code on its own.
- Agent Orchestration: While individual agents are powerful, orchestrating multiple agents amplifies that power, allowing them to achieve goals that are too intricate for a single agent to handle. Think of orchestration as a collaborative effort where each agent performs specialized roles, enabling them to tackle more complex problems by working in unison. This coordination—whether managing story details in a campaign or automating multifaceted workflows—opens doors to streamlined and sophisticated outcomes that one agent alone couldn't accomplish.
Let's delve a little deeper.
LLMs, Mind in the Model
At the heart of an agent's capabilities lies the Large Language Model (LLM). These models are trained on vast amounts of text data, enabling them to understand and generate human-like language. In my system, the LLM acts as the "brain" of the agent, interpreting instructions, understanding context, and formulating plans.
How LLMs Function in the Agent:
- Interpreting Instructions: The LLM processes my input—be it a command, a question, or a narrative prompt—and determines the underlying intent.
- Contextual Understanding: By leveraging the context-specific datasets I provide (like campaign notes or world-building documents), the LLM can generate responses that are highly relevant and tailored to the specific scenario.
- Generating a Plan of Action: Based on the instructions and context, the LLM devises a sequence of steps or actions that the agent should take to achieve the desired goal.
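The three responsibilities above can be sketched as a prompt builder plus a parser for the model's reply. This is only an illustration under my own assumptions: `PlanStep`, `ToolSummary`, `buildPlanPrompt`, and `parsePlan` are hypothetical names, and the real Plan type arrives later in the series.

```typescript
// Hypothetical shapes for illustration; the real Plan type is defined in a later post.
interface PlanStep {
  tool: string;                   // name of the tool to call
  input: Record<string, unknown>; // arguments for that tool
}

interface ToolSummary {
  name: string;
  description: string;
}

// Combine the instruction, the context, and the available tools into one prompt.
function buildPlanPrompt(
  instruction: string,
  context: Record<string, unknown>,
  tools: ToolSummary[]
): string {
  const toolLines = tools.map((t) => `- ${t.name}: ${t.description}`).join("\n");
  return [
    "You are a planning assistant.",
    `Instruction: ${instruction}`,
    `Context: ${JSON.stringify(context)}`,
    "Available tools:",
    toolLines,
    'Reply with a JSON array of steps shaped like {"tool": string, "input": object}.',
  ].join("\n");
}

// Parse the model's reply, rejecting anything that is not a list of well-formed steps.
function parsePlan(raw: string): PlanStep[] {
  const parsed: unknown = JSON.parse(raw);
  if (!Array.isArray(parsed)) {
    throw new Error("Plan must be a JSON array of steps");
  }
  return parsed.map((step, i) => {
    const ok =
      typeof step?.tool === "string" &&
      typeof step?.input === "object" &&
      step.input !== null;
    if (!ok) {
      throw new Error(`Step ${i} is malformed`);
    }
    return { tool: step.tool, input: step.input as Record<string, unknown> };
  });
}
```

Validating the reply before acting on it matters because the model's output is just text until something checks its shape.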
Tools, Body of Work
While the LLM provides the cognitive processing, the agent needs tools to interact with the environment and execute tasks. These tools are like the "hands" and "feet" of the agent, enabling it to perform actions beyond just thinking.
Key Tools and Functions:
- API Access: Allows the agent to retrieve data from or send commands to external services. For example, fetching the latest game notes from a database or updating a character's status.
- Dataset Interaction: Enables the agent to read from and write to specific datasets, ensuring it has the most up-to-date information and can record new insights.
- Code Execution: Grants the agent the ability to run code snippets, automate scripts, or perform computations necessary for its tasks.
- External Integrations: Tools that let the agent interface with other platforms, like sending messages on communication apps or updating documents.
By equipping agents with these tools, they can move from planning to action, effectively bridging the gap between decision-making and execution.
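To make the jump from planning to action concrete, here is a minimal sketch of a tool registry: a plan step names a tool, and the registry finds the matching function and runs it. The tool names and the in-memory `characterStatus` store are made up for illustration, standing in for a real database or API.

```typescript
// A tool here is just a named function; real tools would call APIs, datasets, or code runners.
type ToolHandler = (input: Record<string, unknown>) => unknown;

const toolRegistry = new Map<string, ToolHandler>();

// In-memory stand-in for a database of character statuses (hypothetical data).
const characterStatus: Record<string, string> = { aria: "healthy", brom: "wounded" };

// Hypothetical tool: read a character's status.
toolRegistry.set("getCharacterStatus", (input) => {
  const name = String(input.name);
  return characterStatus[name] ?? "unknown";
});

// Hypothetical tool: update a character's status.
toolRegistry.set("setCharacterStatus", (input) => {
  characterStatus[String(input.name)] = String(input.status);
  return "ok";
});

// Execute one planned step by finding the tool it names and passing along its input.
function runStep(tool: string, input: Record<string, unknown>): unknown {
  const handler = toolRegistry.get(tool);
  if (!handler) {
    throw new Error(`No tool registered under "${tool}"`);
  }
  return handler(input);
}
```

Failing loudly on an unknown tool name is a deliberate choice here: a plan that references a tool the agent doesn't have should stop rather than silently skip a step.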
Orchestration, Living in a Society
As tasks grow in complexity, a single agent may not suffice. Orchestration comes into play by coordinating multiple agents, each with specialized roles, to work together harmoniously.
Benefits of Agent Orchestration:
- Specialization: Agents can focus on what they do best. One might handle data retrieval, another analysis, and a third content generation.
- Parallel Processing: Tasks can be divided among agents to be executed simultaneously, improving efficiency.
- Collaboration: Agents can share results and insights with each other, enhancing the overall capabilities of the system.
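Specialization and parallel processing can be sketched together in a few lines. In this simplified example each "agent" is reduced to an async function with one specialty; the agent names and their canned outputs are placeholders, not part of the actual system.

```typescript
// Each "agent" is reduced to an async function with a single specialty.
type SimpleAgent = (task: string) => Promise<string>;

const retrievalAgent: SimpleAgent = async (task) => `notes for "${task}"`;
const analysisAgent: SimpleAgent = async (task) => `analysis of "${task}"`;
const writingAgent: SimpleAgent = async (task) => `draft about "${task}"`;

// Fan the same task out to every specialist at once and gather the results by role.
async function runSpecialists(
  task: string,
  agents: Record<string, SimpleAgent>
): Promise<Record<string, string>> {
  const roles = Object.keys(agents);
  // Promise.all starts every agent before waiting on any of them.
  const outputs = await Promise.all(roles.map((role) => agents[role](task)));
  const results: Record<string, string> = {};
  roles.forEach((role, i) => {
    results[role] = outputs[i];
  });
  return results;
}
```

Note that `Promise.all` rejects as soon as any one agent fails, so a real orchestrator would likely want per-agent error handling.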
Real-World Application: LLM-as-a-Judge
Imagine a scenario where an Agent acts as a "judge" to evaluate and score the quality of outputs generated by other agents, enabling a systematic and consistent approach to selecting the best result.
- Basic Orchestration: Start with a simple setup where one agent generates content—let’s say a product description. The LLM-as-a-Judge agent then reviews this description against specific criteria, such as clarity or conciseness, and assigns it a score along with constructive feedback. This basic orchestration not only improves content quality but also provides transparent reasoning that can be used for further refinement.
- Multi-Agent Evaluation: Now, consider a more complex situation where multiple agents produce outputs—say, three agents tasked with generating alternative headlines for a marketing campaign. The LLM-as-a-Judge then evaluates each headline, selecting the one that best meets a particular metric, such as engagement potential or brand alignment. In this setup, the judge agent doesn’t just rate outputs but actively decides which submission is the "winner," streamlining decision-making and ensuring a standard of quality.
- Scalable Orchestration for Diverse Outputs: In an even more advanced use case, orchestration could involve multiple "judged" agents working on different aspects of a task. For example, imagine a content creation pipeline where one agent generates text, another designs accompanying visuals, and a third drafts social media captions. The LLM-as-a-Judge evaluates these outputs based on coherence and alignment with campaign goals, selecting the best combination to form a cohesive final piece. This layered orchestration maximizes output quality by coordinating specialized agents and aligning their work with overarching objectives.
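The multi-agent evaluation case boils down to "score every candidate, keep the best." Here is a rough sketch of that selection step. The judge below is a stand-in that scores on length alone so the example runs without a model; in practice the judge function would prompt an LLM with the criteria. All names (`Candidate`, `Verdict`, `pickWinner`) are my own for this example.

```typescript
interface Candidate {
  agentId: string;
  output: string;
}

interface Verdict {
  agentId: string;
  score: number;    // 0 to 10, higher is better
  feedback: string; // reasoning that can feed a later refinement pass
}

// A judge maps one candidate to a verdict. A real judge would call an LLM here.
type Judge = (candidate: Candidate) => Verdict;

// Stand-in judge (assumption): rewards headlines close to a target length of 40 characters.
const concisenessJudge: Judge = (candidate) => {
  const distance = Math.abs(candidate.output.length - 40);
  const score = Math.max(0, 10 - Math.floor(distance / 5));
  return {
    agentId: candidate.agentId,
    score,
    feedback: `Length ${candidate.output.length}, target 40.`,
  };
};

// Score every candidate and return the highest-scoring verdict (first one wins a tie).
function pickWinner(candidates: Candidate[], judge: Judge): Verdict {
  if (candidates.length === 0) {
    throw new Error("No candidates to judge");
  }
  const verdicts = candidates.map(judge);
  return verdicts.reduce((best, v) => (v.score > best.score ? v : best));
}
```

Keeping the feedback alongside the score is what gives the "transparent reasoning" described in the basic orchestration case.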
Defining Agents as Entities
Now that we have a clearer understanding of what our agents need to accomplish and a glimpse into how they operate, let's discuss the actual business entity. We'll be defining our Agent entity through its own instructions and the Tool sub-entities it can access to fulfill these instructions.
The Agent
The eponymous star in our Codex of Agents, the Agent entity is the cornerstone of our system. When one of our agents is invoked, it generates a Plan. As the plan is evaluated, tools are called, one or more LLMs are used, and other agents may be orchestrated to meet the requirements of the agent's execution.
Properties:
- Capabilities: A list of actions or tools the agent can utilize, defining what it can do within the system.
- Context: Stores any context-specific information the agent needs to function effectively, such as current tasks or relevant data.
- State: Keeps track of the agent's current status or any intermediate data during operations.
Methods:
- generatePlan(instruction): Uses the LLM to create a plan of action based on the given instruction and context.
- executePlan(plan): Carries out the steps defined in the plan using available tools.
- interactWithAgent(otherAgent): Facilitates communication and collaboration with other agents when orchestrated.
- updateState(newState): Updates the agent's internal state based on actions taken or new information received.
To Be Continued...: You may notice that the Plan type isn't defined in this post, and that the methods around plans and integrations are only sketched. We'll get to both a post or two down the road, when we dig into Orchestration properly.
Sample Code Snippet:
import { Entity } from './entity';
import { Tool } from './tool';

// Placeholders: the real Plan type and LLM client are defined later in the series.
type Plan = unknown;
declare const llm: { generatePlan(instruction: string, context: any): Plan };

interface AgentProps {
  capabilities: Tool<any, any>[]; // tools the agent may call
  context: any;                   // context-specific information
  state: any;                     // current status or intermediate data
}

export class Agent extends Entity<AgentProps> {
  constructor(props: AgentProps) {
    super(props);
    // Additional initialization if needed
  }

  generatePlan(instruction: string): Plan {
    // Utilize LLM to generate a plan based on the instruction and context
    return llm.generatePlan(instruction, this.props.context);
  }

  executePlan(plan: Plan): void {
    // Execute each step in the plan using available tools
  }

  updateState(newState: any): void {
    this.props.state = newState;
  }

  // Additional agent-specific methods...
}
The Tools
Tools are standardized components that empower agents to perform specific actions. They follow a consistent structure, making them easily integrable and reusable across different agents.
Standard Structure of Tools
- Name: A unique, human-readable name for the tool, which makes it easier to build plans around it.
- Description: Explains what the tool does, aiding the agent in selecting the appropriate tool for a task.
- Input Schema: Defines the expected input parameters, ensuring the tool is used correctly.
- Parameters (Params): Specific configurations or settings required by the tool, allowing customization for different contexts.
This design is influenced by guidelines from Anthropic's Tool Use and AWS's Bedrock Tool Documentation, which emphasize the importance of clear interfaces for tool integration.
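To show why this structure lines up with those guidelines, here is a small sketch that maps a tool description onto the tool-definition shapes used by the Anthropic Messages API and the Bedrock Converse API, as I understand them at the time of writing. `ToolDescriptor` and the two converter functions are hypothetical helpers for this example; check the linked documentation for the authoritative field lists.

```typescript
// Minimal description of a tool, mirroring the name / description / input schema fields above.
interface ToolDescriptor {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>; // a JSON Schema object
}

// Entry for the `tools` array of an Anthropic Messages API request.
function toAnthropicTool(tool: ToolDescriptor) {
  return {
    name: tool.name,
    description: tool.description,
    input_schema: tool.inputSchema,
  };
}

// Entry for the `toolConfig.tools` array of a Bedrock Converse API request.
function toBedrockTool(tool: ToolDescriptor) {
  return {
    toolSpec: {
      name: tool.name,
      description: tool.description,
      inputSchema: { json: tool.inputSchema },
    },
  };
}
```

Both providers ask for the same three pieces of information, which is why keeping name, description, and input schema on every tool makes it portable between them.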
Base Tool Interface
import { DocumentType as _DocumentType } from '@smithy/types';

export interface ToolInput {
  // Define the structure of the input required by the tool
}

export interface ToolParams {
  // Define any additional parameters or configurations
}

export abstract class Tool<TInput extends ToolInput, TParams extends ToolParams> {
  name: string;
  description: string;
  params: TParams;
  inputSchema: _DocumentType; // JSON Schema describing the expected input

  constructor(
    name: string,
    description: string,
    params: TParams,
    inputSchema: _DocumentType
  ) {
    this.name = name;
    this.description = description;
    this.params = params;
    this.inputSchema = inputSchema;
  }

  abstract execute(input: TInput): any;
}
DynamoDB Tool
As an example, let's explore a tool that allows agents to interact with DynamoDB to retrieve items based on a given key or composite key.
Key Features:
- Configurable Parameters: Includes the table name and key schema (key names and types) necessary for querying DynamoDB.
- Input Definition: Specifies the inputs required to perform the operation, such as key values.
- Execution Logic: Implements the execute method to perform the actual DynamoDB lookup (a single-item get by key).
Implementation:
import { DocumentType as _DocumentType } from '@smithy/types';
import AWS from 'aws-sdk';
import { Tool, ToolInput, ToolParams } from './tool';

interface DynamoDBToolInput extends ToolInput {
  key: { [key: string]: any };
}

interface DynamoDBToolParams extends ToolParams {
  tableName: string;
  keySchema: { [key: string]: string }; // Key names and their data types
}

class GetDynamoDBItemTool extends Tool<DynamoDBToolInput, DynamoDBToolParams> {
  constructor(params: DynamoDBToolParams) {
    // Build a JSON Schema for the key from the configured key schema
    const inputSchema: _DocumentType = {
      type: 'object',
      properties: {
        key: {
          type: 'object',
          properties: Object.fromEntries(
            Object.entries(params.keySchema).map(([keyName, keyType]) => [
              keyName,
              { type: keyType },
            ])
          ),
          required: Object.keys(params.keySchema),
        },
      },
      required: ['key'],
    };
    super(
      'GetDynamoDBItem',
      'Retrieves an item from DynamoDB based on the provided key.',
      params,
      inputSchema
    );
  }

  async execute(input: DynamoDBToolInput): Promise<any> {
    // Validate input against inputSchema if necessary
    // Use the AWS SDK (v2) DocumentClient to fetch the item by key
    const dynamoDB = new AWS.DynamoDB.DocumentClient();
    const params = {
      TableName: this.params.tableName,
      Key: input.key,
    };
    // SDK errors propagate to the caller; add handling here if needed
    const data = await dynamoDB.get(params).promise();
    return data.Item;
  }
}
When an agent needs to retrieve data from DynamoDB, it can utilize this tool by providing the necessary input, and the tool handles the rest.
Example Instantiation:
const usersKeySchema = {
  userId: 'string',
};

const getUserTool = new GetDynamoDBItemTool({
  tableName: 'UsersTable',
  keySchema: usersKeySchema,
});

const input = {
  key: { userId: '12345' },
};

getUserTool.execute(input).then(user => {
  console.log('User:', user);
}).catch(error => {
  console.error('Error retrieving user:', error);
});
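The execute method above leaves input validation as a comment. As one possible way to fill that in, here is a small sketch that checks a key against the tool's keySchema before any call is made. `validateKey` is a hypothetical helper, and it assumes the schema's type names are ones JavaScript's `typeof` can report (such as 'string' or 'number'); a full JSON Schema validator would be the more general option.

```typescript
type KeySchema = { [keyName: string]: string }; // e.g. { userId: 'string' }

// Return a list of problems; an empty list means the key matches the schema.
function validateKey(
  key: { [name: string]: unknown },
  keySchema: KeySchema
): string[] {
  const problems: string[] = [];

  // Every attribute in the schema must be present with the expected type.
  for (const [name, expectedType] of Object.entries(keySchema)) {
    if (!(name in key)) {
      problems.push(`Missing key attribute "${name}"`);
    } else if (typeof key[name] !== expectedType) {
      problems.push(`"${name}" should be ${expectedType} but is ${typeof key[name]}`);
    }
  }

  // Attributes outside the schema are not part of the table's key.
  for (const name of Object.keys(key)) {
    if (!(name in keySchema)) {
      problems.push(`Unexpected key attribute "${name}"`);
    }
  }

  return problems;
}
```

Returning every problem at once, rather than throwing on the first, gives an agent more to work with when it needs to correct its own input and retry.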
AgentHandoff Tool
The AgentHandoffTool embodies the concept of orchestration within our system. It enables an agent to delegate tasks or pass context to another agent seamlessly.
Functionality:
- Defines Target Agent(s): Specifies the key or identifier of the agent(s) to hand off to.
- Execution Logic: Through the execute method, the tool facilitates passing the current context and appropriate input to the designated agents.
- Seamless Transition: Allows agents to collaborate by handing off tasks without interrupting the workflow.
Implementation:
import { DocumentType as _DocumentType } from '@smithy/types';
import { Tool, ToolInput, ToolParams } from './tool';
import { AgentRepository } from './agentRepository';

interface AgentHandoffInput extends ToolInput {
  context: any;
  input: any;
}

interface AgentHandoffParams extends ToolParams {
  targetAgentIds: string[]; // IDs of the agents to hand off to
}

class AgentHandoffTool extends Tool<AgentHandoffInput, AgentHandoffParams> {
  constructor(params: AgentHandoffParams) {
    const inputSchema: _DocumentType = {
      type: 'object',
      properties: {
        context: { type: 'object' },
        input: { type: 'object' },
      },
      required: ['context', 'input'],
    };
    super(
      'AgentHandoff',
      'Hands off the current task and context to specified agent(s).',
      params,
      inputSchema
    );
  }

  execute(input: AgentHandoffInput): any {
    // Validate input against inputSchema if necessary
    // Logic to pass context and input to target agents
    for (const agentId of this.params.targetAgentIds) {
      const targetAgent = AgentRepository.getAgentById(agentId);
      if (targetAgent) {
        // Assumes agents expose a receiveHandoff method (not shown in the Agent snippet)
        targetAgent.receiveHandoff(input.context, input.input);
      } else {
        // Handle agent not found scenario
        console.error(`Agent with ID ${agentId} not found.`);
      }
    }
    return { status: 'Handoff Complete' };
  }
}
Example Instantiation:
const agentHandoffTool = new AgentHandoffTool({
  targetAgentIds: ['agent-abc123', 'agent-def456'],
});

const handoffInput = {
  context: { sessionId: 'session-789' },
  input: { task: 'Process user feedback' },
};

agentHandoffTool.execute(handoffInput);
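The handoff tool relies on an AgentRepository and a receiveHandoff method that haven't been shown yet. To make the flow easier to follow, here is a self-contained sketch with in-memory stand-ins for both. Everything in it (`HandoffReceiver`, `InMemoryAgentRegistry`, `RecordingAgent`, `handOff`) is hypothetical and only illustrates one way the pieces could fit together.

```typescript
// Anything that can accept a handoff. Assumed interface, not the series' real Agent.
interface HandoffReceiver {
  receiveHandoff(context: unknown, input: unknown): void;
}

// In-memory stand-in for the AgentRepository.
class InMemoryAgentRegistry {
  private agents = new Map<string, HandoffReceiver>();

  register(id: string, agent: HandoffReceiver): void {
    this.agents.set(id, agent);
  }

  getAgentById(id: string): HandoffReceiver | undefined {
    return this.agents.get(id);
  }
}

// Simple receiver that records whatever it is handed.
class RecordingAgent implements HandoffReceiver {
  received: Array<{ context: unknown; input: unknown }> = [];

  receiveHandoff(context: unknown, input: unknown): void {
    this.received.push({ context, input });
  }
}

// Hand off to every target and report which IDs were delivered and which were missing.
function handOff(
  registry: InMemoryAgentRegistry,
  targetAgentIds: string[],
  context: unknown,
  input: unknown
): { delivered: string[]; missing: string[] } {
  const delivered: string[] = [];
  const missing: string[] = [];
  for (const id of targetAgentIds) {
    const agent = registry.getAgentById(id);
    if (agent) {
      agent.receiveHandoff(context, input);
      delivered.push(id);
    } else {
      missing.push(id);
    }
  }
  return { delivered, missing };
}
```

Returning the delivered and missing IDs, instead of a fixed status string, lets the calling agent notice a failed handoff and plan around it.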
Closing Thoughts
Phew! That was a big dive into our agents and tools. Like characters in a story finding their roles, it's good to see our entities finding their parts in this project. We've set a solid foundation, and I can't wait to see how these agents will interact and evolve as we continue this journey.
What's Next
In our next installment, we'll embark on the adventure of defining Plans—the blueprints that guide our agents' actions. We'll explore how these plans function, how they're crafted, and how they orchestrate the harmony between agents and tools. Plus, we'll start following along with the code for this project.
Stay tuned; the plot is just getting interesting!
Helpful References
Evidently AI's LLM-as-a-Judge Guide: LLM-as-a-Judge
AWS AI Agents Overview: AWS AI Agents
Anthropic Tool Use: Tool Use with Anthropic's Models
Bedrock Tool Use: Tool Use with AWS Bedrock