Ilya Fastovets for Datalynx

Posted on Feb 16 • Edited on Mar 14

Memory in LLM agents

#ai #llm #openai #machinelearning

The role of memory in LLM chats

In the previous article, we discussed how the reasoning and decision-making capabilities of LLM agents can help us solve practical tasks. In that example, we used an OpenAI LLM with function calls to create the agent. Two dummy methods were implemented to demonstrate its capabilities, namely calling them one after the other and passing the results from one to the other. The scratchpad that we added to the agent helped it remember the previous steps so that it could reason correctly about the next ones.

However, when the decision-making agent was integrated into the chat, the chat itself lacked memory. Every message sent to the chat was treated as the start of a new conversation. Consider the following case: the agent's reasoning and decision-making are now part of a general dialogue with a single storyline. With each message, the chat decides whether to call a specific tool or to continue the conversation with a simple response. But the user also needs the chat to remember what was discussed before, so that it can adjust future responses.

Let’s take an example. Suppose you are developing a weather forecast assistant chat that can discuss the weather with you and advise you on the driving conditions in the specified areas. Apart from being able to retrieve information about the weather, it also needs to remember what you were talking about in the previous messages. This is where the concept of memory comes into play.

LLM memory types

In LLM chats, memory is essentially the history of messages that is passed to the model as input, along with the most recent user message, so that it can generate a meaningful response. The simplest form of memory is therefore the entire message history, which is called conversation buffer memory.

The problem with this type of memory is that there is a limit on the number of input tokens that the model can handle. Another consideration is the cost associated with each input token. A simple solution is the conversation buffer window memory type, which uses only the latest k messages from the chat history. The obvious problem here is that the chat cannot remember a long conversation history, so the memory is only short-term.
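The window idea can be sketched in a few lines of plain Python. This is a minimal illustration, not Langchain's implementation: the class name `BufferWindowMemory` and its methods are hypothetical, and messages are stored as simple dicts.

```python
from collections import deque


class BufferWindowMemory:
    """Hypothetical sketch: keeps only the latest k messages of a conversation."""

    def __init__(self, k: int):
        # A deque with maxlen drops the oldest item automatically when it is full
        self.messages = deque(maxlen=k)

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})

    def history(self) -> list:
        # This list is what would be sent to the model with the next user message
        return list(self.messages)


window = BufferWindowMemory(k=2)
window.add("user", "How is the weather in London?")
window.add("assistant", "Overcast, around 54°F.")
window.add("user", "And in Paris?")
print(window.history())  # the first question about London is no longer included
```

With k=2, the third message pushes the first one out, which is exactly the short-term behavior described above.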

This problem can be solved by another popular type of memory: conversation summary memory. At each step, use another LLM call to summarize the most recent user message with the previous message or the previous summary. Then, the generated summary is used in the conversation in place of the actual history of the messages. The generated summary is short, and now the chat is capable of having a long memory without using too many tokens. However, a major drawback of conversation summary memory is that important details can be lost.

To handle both of these problems, a combination of the previous two approaches was introduced: the conversation summary buffer memory type. It uses the window approach with a specified window size, but instead of deleting the messages that fall out of the window, it triggers summary generation for them. Thus, the summary serves as long-term memory while the actual messages serve as short-term memory. This way, the chat remembers both what you were talking about a long time ago and the details of the most recent conversation. Another advantage of this approach is that the summary call is not always triggered, as it depends on the chat length. Because generating a summary requires another call to the LLM, skipping it for short chats often saves time and cost.
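The combined approach can be sketched as follows. This is a simplified illustration under stated assumptions: the class `SummaryBufferMemory` is hypothetical, the window is counted in messages rather than tokens, and `fake_summarize` is a stand-in that only concatenates text where a real system would call an LLM.

```python
from typing import Callable, List


def fake_summarize(previous_summary: str, messages: List[str]) -> str:
    """Stand-in for an LLM summarization call: it only concatenates text."""
    parts = ([previous_summary] if previous_summary else []) + messages
    return " | ".join(parts)


class SummaryBufferMemory:
    """Hypothetical sketch: recent messages verbatim, older ones as a summary."""

    def __init__(self, window_size: int, summarize: Callable[[str, List[str]], str]):
        self.window_size = window_size
        self.summarize = summarize
        self.summary = ""            # long-term memory
        self.recent: List[str] = []  # short-term memory

    def add(self, message: str) -> None:
        self.recent.append(message)
        overflow = len(self.recent) - self.window_size
        if overflow > 0:
            # Summarization is triggered only when messages leave the window
            evicted = self.recent[:overflow]
            self.recent = self.recent[overflow:]
            self.summary = self.summarize(self.summary, evicted)

    def context(self) -> dict:
        return {"summary": self.summary, "recent": list(self.recent)}


sketch = SummaryBufferMemory(window_size=2, summarize=fake_summarize)
for text in ["SF to Las Vegas?", "Looks safe.", "Then on to New York."]:
    sketch.add(text)
print(sketch.context())
```

The first two messages fit in the window, so no summary call is made for them. Only the third message evicts the oldest one into the summary.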

Other types of memory also exist, but they are more specific and less related to the weather assistant example we discussed. For the complete list of the memory types available in Langchain, refer to this documentation page.

Example: a driving assistant using weather data

In our example, we will use the OpenWeatherMap API to retrieve real-time weather data. Before the example can be run, you need to register on the OpenWeatherMap website. Then, activate the subscription and create an API key. The API key will be activated within 2 hours of the subscription. You will be asked to provide payment info; however, a free tier is available as long as you don’t exceed its request limit. Use this API key to run the example.

import requests
from pydantic.v1 import BaseModel, Field
from langchain.chat_models import ChatOpenAI
from langchain.agents.output_parsers import OpenAIFunctionsAgentOutputParser
from langchain.agents import AgentExecutor
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.tools.render import format_tool_to_openai_function
from langchain.agents.format_scratchpad import format_to_openai_function_messages
from langchain.memory import ConversationSummaryBufferMemory
from langchain.tools import StructuredTool

# Put your OpenAI API key here
OPENAI_API_KEY = "..."

# Put your OpenWeatherMap API key here (see instructions above)
OPEN_WEATHER_MAP_API_KEY = "..."

Define the method used in the agent

First, we define the function that we need.

Note that we are using type annotations.
This will help Langchain to properly convert the Python function to Langchain Structured Tool and to represent it as an OpenAI function in OpenAI API.

def get_weather_for_city(city_name: str, units: str="imperial") -> dict:
    """
    Fetches weather data for a specified city.

    Parameters:
    - city_name (str): The name of the city.
    - units (str): Units of measurement. "metric" for Celsius, "imperial" for Fahrenheit.

    Returns:
    - dict: Weather data for the city, or a dict with an "error" key on failure.
    """
    base_url = "https://api.openweathermap.org/data/2.5/weather"
    # Let requests build and URL-encode the query string (city names may contain spaces)
    params = {"appid": OPEN_WEATHER_MAP_API_KEY, "q": city_name, "units": units}

    try:
        # The timeout prevents the agent from hanging on a stalled connection
        response = requests.get(base_url, params=params, timeout=10)
    except requests.RequestException:
        return {"error": "Failed to fetch data"}

    if response.status_code == 200:
        return response.json()
    return {"error": "Failed to fetch data"}

Let's call the function to test if it works correctly and examine what the output looks like.

This is the output that the agent will use to generate the response.

As this is just an example, you can modify the API call to retrieve the necessary data. Use this API reference.

city_name = "London"
weather_data = get_weather_for_city(city_name)
print(weather_data)

{'coord': {'lon': -0.1257, 'lat': 51.5085}, 'weather': [{'id': 804, 'main': 'Clouds', 'description': 'overcast clouds', 'icon': '04n'}], 'base': 'stations', 'main': {'temp': 54.34, 'feels_like': 53.83, 'temp_min': 52.68, 'temp_max': 55.27, 'pressure': 1012, 'humidity': 93}, 'visibility': 10000, 'wind': {'speed': 9.22, 'deg': 200}, 'clouds': {'all': 100}, 'dt': 1707954149, 'sys': {'type': 2, 'id': 2075535, 'country': 'GB', 'sunrise': 1707895066, 'sunset': 1707930712}, 'timezone': 0, 'id': 2643743, 'name': 'London', 'cod': 200}
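As one possible modification, the raw response can be reduced to the fields that matter for driving before it reaches the agent, which should also lower the number of tokens in each call. The sketch below is an assumption and not part of the original example: `extract_driving_fields` is a hypothetical helper, and the field names are taken from the sample output above.

```python
def extract_driving_fields(weather_data: dict) -> dict:
    """Hypothetical helper: keeps only the fields relevant to a driving recommendation."""
    if "error" in weather_data:
        # Pass error results through unchanged
        return weather_data
    conditions = weather_data.get("weather") or [{}]
    wind = weather_data.get("wind", {})
    return {
        "city": weather_data.get("name"),
        "description": conditions[0].get("description"),
        "temperature": weather_data.get("main", {}).get("temp"),
        "visibility_m": weather_data.get("visibility"),
        "wind_speed": wind.get("speed"),
        "wind_gust": wind.get("gust"),  # not present in every response
    }


# A shortened copy of the London response shown above
sample = {
    "weather": [{"id": 804, "main": "Clouds", "description": "overcast clouds"}],
    "main": {"temp": 54.34, "humidity": 93},
    "visibility": 10000,
    "wind": {"speed": 9.22, "deg": 200},
    "name": "London",
}
print(extract_driving_fields(sample))
```

The London sample has no gust value, so `wind_gust` comes back as None rather than raising a KeyError.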

Define Pydantic arguments schema for these methods

To better convert Python functions to Langchain Tools, I found it helpful to also describe their inputs using Pydantic classes.

Those will be passed together with the function as arguments to the Langchain method that creates Tools from Python functions.

Note that Pydantic v1 is used here, because Pydantic v2 is not yet supported by Langchain.

class GetWeatherForCityInput(BaseModel):
    """
    Pydantic schema for the get_weather_for_city function inputs.
    """
    city_name: str = Field(..., description="The name of the city for which to fetch weather data.")
    units: str = Field(default="imperial", description="Units of measurement. Use 'metric' for Celsius or 'imperial' for Fahrenheit. Defaults to 'imperial'.")

Define prompts

We will use two input prompts: a system prompt and a user input prompt.

In this case, the system prompt describes in detail what needs to be done, and the user initialization prompt contains the question.

The chat history (memory) is also passed in the user prompt.

system_init_prompt = """
You are a driving assistant capable of accessing weather data in any location. 
With this weather data, you provide detailed information about how safe it would be to drive in this location.
If two locations are provided, you also check two or three locations between them to make sure the entire road is good to drive.
"""

user_init_prompt = """
Chat history is: {}.
The question is: {}. 
Go!
"""

Define parts of the agent using LCEL

Here, we define the parts used in the agent and create the agent and the agent executor.

First, we create the LLM object from the ChatOpenAI class for the OpenAI API. We pass the OpenAI API key here as a parameter.

We then initialize the memory object to be used with the agent. We are using the conversation summary buffer memory type. Note that the memory uses GPT-3.5 instead of GPT-4 because it is faster, cheaper, and good enough for text summarization tasks. We also set max_token_limit to 1024 tokens. This parameter determines how long the message history can grow before older messages start being summarized. Increase it for a longer memory window, and decrease it for a shorter one.

Then, we create a tools list from the Python function. Here, we use the StructuredTool.from_function method to create the Tools. The Tools are combined in a list, converted to OpenAI functions, and then attached with the bind() method to the LLM object that we created above.

In the next step, we initialize the prompt object from the prompt messages that we defined above. It contains the system prompt and a formatted user init prompt.
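The "formatted user init prompt" relies on a two-step substitution that is easy to miss: the positional `{}` slots are first replaced with the named variables `{chat_history}` and `{input}`, and only later are those variables filled with real values. A minimal sketch in plain Python, assuming that `str.format` is an acceptable stand-in for the second step (in the example that step is performed by the prompt template, and the values below are purely illustrative):

```python
user_init_prompt = """
Chat history is: {}.
The question is: {}.
Go!
"""

# First pass: turn the positional slots into named template variables
template_text = user_init_prompt.format("{chat_history}", "{input}")
print(template_text)

# Second pass: fill the named variables (illustrative values only)
filled = template_text.format(
    chat_history="The user asked about driving from San Francisco to Las Vegas",
    input="Then I would like to proceed to New York",
)
print(filled)
```

Keeping the original prompt positional makes it reusable, while the first pass gives the template the variable names that the agent's input mapping expects.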

The agent is defined using LCEL, which is the recommended way to define chains and agents in Langchain. This article describes why. The agent combines input formatting, the prompt, the LLM with tools, and a parser. For OpenAI functions, it is convenient to use OpenAIFunctionsAgentOutputParser right out of the box, as we do here.

Finally, we initialize the agent executor and set verbose to True to display intermediate steps. This will help us to understand how reasoning works in Langchain Agents. The agent executor is now initialized with the 'memory' parameter that automatically loads, parses, and updates memory variables.

# Initialize the LLM
llm = ChatOpenAI(
    temperature=0.5,
    model_name="gpt-4",
    openai_api_key=OPENAI_API_KEY,
)

# Initialize the memory: conversation summary buffer
memory = ConversationSummaryBufferMemory(
    llm=ChatOpenAI(
        model_name="gpt-3.5-turbo", # Use a cheaper model to summarize the history
        openai_api_key=OPENAI_API_KEY,
    ),
    memory_key="chat_history", # What dict key to use to parse in the agent
    return_messages=True,
    max_token_limit=1024, # The bigger the limit, the more unsummarized messages
)

# Initialize the tools
tools = [
    StructuredTool.from_function(
        func=get_weather_for_city,
        args_schema=GetWeatherForCityInput,
        description="Function to get weather for specified city.",
    ), 
]
llm_with_tools = llm.bind(
    functions=[format_tool_to_openai_function(t) for t in tools]
)

# Initialize the prompt
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_init_prompt),
        ("user", user_init_prompt.format("{chat_history}", "{input}")),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
    ],
)

# Initialize agent
agent = (
    {
        "input": lambda x: x["input"],
        "agent_scratchpad": lambda x: format_to_openai_function_messages(
            x["intermediate_steps"]
        ),
        "chat_history": lambda x: x["chat_history"],
    }
    | prompt
    | llm_with_tools
    | OpenAIFunctionsAgentOutputParser()
)

# Initialize the agent executor
agent_executor = AgentExecutor(agent=agent, 
                               tools=tools, 
                               memory=memory,
                               verbose=True)

Run the chat with the agent executor

Let's run the chat to see how it works.

We start by asking for the driving recommendations for the trip from San Francisco to Las Vegas and get a response from the agent.

Then, we tell it that we would like to proceed to New York. If the memory works correctly, the chat assistant should remember the previous conversation and provide driving recommendations from Las Vegas to New York.

Finally, we confirm that the memory is working correctly by asking to summarize the whole trip. We type 'exit' to leave the chat.

print("Welcome to the chatbot. Type 'exit' to leave the chat.")

while True:
    user_message = input("You: ")
    if user_message.lower() == "exit":
        print("Exiting chat. Have a great day!")
        break

    response = agent_executor.invoke({"input": user_message})
    response = response.get("output")

    print(f"Chatbot: {response}")

Welcome to the chatbot. Type 'exit' to leave the chat.

You:  How safe is it to drive from San Francisco to Las Vegas? 

Entering new AgentExecutor chain...

Invoking: get_weather_for_city with {'city_name': 'San Francisco', 'units': 'imperial'}

{'coord': {'lon': -122.4194, 'lat': 37.7749}, 'weather': [{'id': 804, 'main': 'Clouds', 'description': 'overcast clouds', 'icon': '04d'}], 'base': 'stations', 'main': {'temp': 54.72, 'feels_like': 54.03, 'temp_min': 52.32, 'temp_max': 57.88, 'pressure': 1015, 'humidity': 88}, 'visibility': 10000, 'wind': {'speed': 20, 'deg': 142, 'gust': 28.99}, 'clouds': {'all': 100}, 'dt': 1707954565, 'sys': {'type': 2, 'id': 2003880, 'country': 'US', 'sunrise': 1707922868, 'sunset': 1707961609}, 'timezone': -28800, 'id': 5391959, 'name': 'San Francisco', 'cod': 200}

Invoking: get_weather_for_city with {'city_name': 'Las Vegas', 'units': 'imperial'}

{'coord': {'lon': -115.1372, 'lat': 36.175}, 'weather': [{'id': 800, 'main': 'Clear', 'description': 'clear sky', 'icon': '01d'}], 'base': 'stations', 'main': {'temp': 62.74, 'feels_like': 60.22, 'temp_min': 61.11, 'temp_max': 64.83, 'pressure': 1017, 'humidity': 32}, 'visibility': 10000, 'wind': {'speed': 10, 'deg': 90, 'gust': 0}, 'clouds': {'all': 0}, 'dt': 1707954794, 'sys': {'type': 2, 'id': 2083590, 'country': 'US', 'sunrise': 1707920986, 'sunset': 1707959997}, 'timezone': -28800, 'id': 5506956, 'name': 'Las Vegas', 'cod': 200}

Invoking: get_weather_for_city with {'city_name': 'Bakersfield', 'units': 'imperial'}

{'coord': {'lon': -119.0187, 'lat': 35.3733}, 'weather': [{'id': 800, 'main': 'Clear', 'description': 'clear sky', 'icon': '01d'}], 'base': 'stations', 'main': {'temp': 63.07, 'feels_like': 61.81, 'temp_min': 56.88, 'temp_max': 64.15, 'pressure': 1017, 'humidity': 58}, 'visibility': 10000, 'wind': {'speed': 8.05, 'deg': 320}, 'clouds': {'all': 0}, 'dt': 1707954993, 'sys': {'type': 2, 'id': 2019205, 'country': 'US', 'sunrise': 1707921850, 'sunset': 1707960995}, 'timezone': -28800, 'id': 5325738, 'name': 'Bakersfield', 'cod': 200}

Invoking: get_weather_for_city with {'city_name': 'Fresno', 'units': 'imperial'}

{'coord': {'lon': -119.8343, 'lat': 36.6666}, 'weather': [{'id': 800, 'main': 'Clear', 'description': 'clear sky', 'icon': '01d'}], 'base': 'stations', 'main': {'temp': 63.03, 'feels_like': 61.39, 'temp_min': 61.45, 'temp_max': 64, 'pressure': 1019, 'humidity': 50}, 'visibility': 10000, 'wind': {'speed': 9.22, 'deg': 170}, 'clouds': {'all': 0}, 'dt': 1707954426, 'sys': {'type': 1, 'id': 4068, 'country': 'US', 'sunrise': 1707922153, 'sunset': 1707961083}, 'timezone': -28800, 'id': 5350964, 'name': 'Fresno', 'cod': 200}

The weather conditions for the drive from San Francisco to Las Vegas are as follows:


San Francisco: The weather is overcast with clouds and the temperature is around 54.72°F. The wind speed is quite high at 20 mph, gusting up to 28.99 mph. Visibility is good at 10,000 meters.
Bakersfield: The weather is clear with a temperature of 63.07°F. The wind speed is moderate at 8.05 mph. Visibility is good at 10,000 meters.
Fresno: The weather is clear with a temperature of 63.03°F. The wind speed is moderate at 9.22 mph. Visibility is good at 10,000 meters.
Las Vegas: The weather is clear with a temperature of 62.74°F. The wind speed is moderate at 10 mph. Visibility is good at 10,000 meters.


Overall, it seems safe to drive from San Francisco to Las Vegas considering the weather conditions. However, the high wind speed in San Francisco might require extra caution. Please also consider the road conditions and traffic situation, and drive safely.

Finished chain.

Chatbot: The weather conditions for the drive from San Francisco to Las Vegas are as follows:


San Francisco: The weather is overcast with clouds and the temperature is around 54.72°F. The wind speed is quite high at 20 mph, gusting up to 28.99 mph. Visibility is good at 10,000 meters.
Bakersfield: The weather is clear with a temperature of 63.07°F. The wind speed is moderate at 8.05 mph. Visibility is good at 10,000 meters.
Fresno: The weather is clear with a temperature of 63.03°F. The wind speed is moderate at 9.22 mph. Visibility is good at 10,000 meters.
Las Vegas: The weather is clear with a temperature of 62.74°F. The wind speed is moderate at 10 mph. Visibility is good at 10,000 meters.


Overall, it seems safe to drive from San Francisco to Las Vegas considering the weather conditions. However, the high wind speed in San Francisco might require extra caution. Please also consider the road conditions and traffic situation, and drive safely.

You:  Then I would like to proceed to New York

Entering new AgentExecutor chain...

Invoking: get_weather_for_city with {'city_name': 'New York'}

{'coord': {'lon': -74.006, 'lat': 40.7143}, 'weather': [{'id': 800, 'main': 'Clear', 'description': 'clear sky', 'icon': '01n'}], 'base': 'stations', 'main': {'temp': 32.76, 'feels_like': 21.88, 'temp_min': 28.94, 'temp_max': 35.22, 'pressure': 1020, 'humidity': 41}, 'visibility': 10000, 'wind': {'speed': 17, 'deg': 314, 'gust': 17}, 'clouds': {'all': 0}, 'dt': 1707954762, 'sys': {'type': 2, 'id': 2008776, 'country': 'US', 'sunrise': 1707911526, 'sunset': 1707949714}, 'timezone': -18000, 'id': 5128581, 'name': 'New York', 'cod': 200}

Invoking: get_weather_for_city with {'city_name': 'Las Vegas'}

{'coord': {'lon': -115.1372, 'lat': 36.175}, 'weather': [{'id': 800, 'main': 'Clear', 'description': 'clear sky', 'icon': '01d'}], 'base': 'stations', 'main': {'temp': 62.73, 'feels_like': 60.21, 'temp_min': 61.11, 'temp_max': 64.83, 'pressure': 1017, 'humidity': 32}, 'visibility': 10000, 'wind': {'speed': 10, 'deg': 90, 'gust': 0}, 'clouds': {'all': 0}, 'dt': 1707954946, 'sys': {'type': 2, 'id': 2083590, 'country': 'US', 'sunrise': 1707920986, 'sunset': 1707959997}, 'timezone': -28800, 'id': 5506956, 'name': 'Las Vegas', 'cod': 200}

Invoking: get_weather_for_city with {'city_name': 'Denver'}

{'coord': {'lon': -104.9847, 'lat': 39.7392}, 'weather': [{'id': 803, 'main': 'Clouds', 'description': 'broken clouds', 'icon': '04d'}], 'base': 'stations', 'main': {'temp': 47.62, 'feels_like': 45.43, 'temp_min': 41.07, 'temp_max': 50.18, 'pressure': 1013, 'humidity': 37}, 'visibility': 10000, 'wind': {'speed': 5.01, 'deg': 0, 'gust': 11.99}, 'clouds': {'all': 75}, 'dt': 1707954392, 'sys': {'type': 2, 'id': 2004334, 'country': 'US', 'sunrise': 1707918863, 'sunset': 1707957246}, 'timezone': -25200, 'id': 5419384, 'name': 'Denver', 'cod': 200}

Invoking: get_weather_for_city with {'city_name': 'Chicago'}

{'coord': {'lon': -87.65, 'lat': 41.85}, 'weather': [{'id': 803, 'main': 'Clouds', 'description': 'broken clouds', 'icon': '04n'}], 'base': 'stations', 'main': {'temp': 41.77, 'feels_like': 40.37, 'temp_min': 38.17, 'temp_max': 45.18, 'pressure': 1022, 'humidity': 54}, 'visibility': 10000, 'wind': {'speed': 3, 'deg': 195, 'gust': 3}, 'clouds': {'all': 81}, 'dt': 1707955040, 'sys': {'type': 2, 'id': 2005153, 'country': 'US', 'sunrise': 1707914907, 'sunset': 1707952881}, 'timezone': -21600, 'id': 4887398, 'name': 'Chicago', 'cod': 200} 

The weather conditions for the drive from Las Vegas to New York, passing through Denver and Chicago, are as follows:


Las Vegas: The weather is clear with a temperature of 62.73°F. The wind speed is moderate at 10 mph. Visibility is good at 10,000 meters.
Denver: The weather is cloudy with a temperature of 47.62°F. The wind speed is light at 5.01 mph. Visibility is good at 10,000 meters.
Chicago: The weather is cloudy with a temperature of 41.77°F. The wind speed is light at 3 mph. Visibility is good at 10,000 meters.
New York: The weather is clear with a temperature of 32.76°F. The wind speed is high at 17 mph. Visibility is good at 10,000 meters.


Overall, it seems safe to drive from Las Vegas to New York considering the weather conditions. However, the high wind speed in New York might require extra caution. Please also consider the road conditions and traffic situation, and drive safely.

Finished chain.

Chatbot: The weather conditions for the drive from Las Vegas to New York, passing through Denver and Chicago, are as follows:


Las Vegas: The weather is clear with a temperature of 62.73°F. The wind speed is moderate at 10 mph. Visibility is good at 10,000 meters.
Denver: The weather is cloudy with a temperature of 47.62°F. The wind speed is light at 5.01 mph. Visibility is good at 10,000 meters.
Chicago: The weather is cloudy with a temperature of 41.77°F. The wind speed is light at 3 mph. Visibility is good at 10,000 meters.
New York: The weather is clear with a temperature of 32.76°F. The wind speed is high at 17 mph. Visibility is good at 10,000 meters.


Overall, it seems safe to drive from Las Vegas to New York considering the weather conditions. However, the high wind speed in New York might require extra caution. Please also consider the road conditions and traffic situation, and drive safely.

You:  So, summarize the whole trip

Entering new AgentExecutor chain...

The weather conditions for the entire trip from San Francisco to New York, passing through Las Vegas, Denver, and Chicago, are as follows:


San Francisco: Overcast with clouds, temperature around 54.72°F, high wind speed at 20 mph, gusting up to 28.99 mph, good visibility at 10,000 meters.
Bakersfield: Clear weather, temperature of 63.07°F, moderate wind speed at 8.05 mph, good visibility at 10,000 meters.
Fresno: Clear weather, temperature of 63.03°F, moderate wind speed at 9.22 mph, good visibility at 10,000 meters.
Las Vegas: Clear weather, temperature of 62.74°F, moderate wind speed at 10 mph, good visibility at 10,000 meters.
Denver: Cloudy, temperature of 47.62°F, light wind speed at 5.01 mph, good visibility at 10,000 meters.
Chicago: Cloudy, temperature of 41.77°F, light wind speed at 3 mph, good visibility at 10,000 meters.
New York: Clear weather, temperature of 32.76°F, high wind speed at 17 mph, good visibility at 10,000 meters.


Overall, the weather conditions seem safe for driving. However, high wind speeds in San Francisco and New York may require extra caution. It's important to also consider road conditions and traffic situation. Safe journey!

Finished chain.

Chatbot: The weather conditions for the entire trip from San Francisco to New York, passing through Las Vegas, Denver, and Chicago, are as follows:


San Francisco: Overcast with clouds, temperature around 54.72°F, high wind speed at 20 mph, gusting up to 28.99 mph, good visibility at 10,000 meters.
Bakersfield: Clear weather, temperature of 63.07°F, moderate wind speed at 8.05 mph, good visibility at 10,000 meters.
Fresno: Clear weather, temperature of 63.03°F, moderate wind speed at 9.22 mph, good visibility at 10,000 meters.
Las Vegas: Clear weather, temperature of 62.74°F, moderate wind speed at 10 mph, good visibility at 10,000 meters.
Denver: Cloudy, temperature of 47.62°F, light wind speed at 5.01 mph, good visibility at 10,000 meters.
Chicago: Cloudy, temperature of 41.77°F, light wind speed at 3 mph, good visibility at 10,000 meters.
New York: Clear weather, temperature of 32.76°F, high wind speed at 17 mph, good visibility at 10,000 meters.


Overall, the weather conditions seem safe for driving. However, high wind speeds in San Francisco and New York may require extra caution. It's important to also consider road conditions and traffic situation. Safe journey!

You:  exit

Exiting chat. Have a great day!
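To replay the same conversation without typing each message, the loop can be wrapped in a small function. This is a sketch under assumptions: `run_scripted_chat` is a hypothetical helper that works with any object exposing `invoke()` in the same shape as the agent executor, and `EchoExecutor` is a stand-in so that the snippet runs without API keys.

```python
def run_scripted_chat(executor, messages):
    """Sends each message to the executor and collects the 'output' values."""
    replies = []
    for message in messages:
        result = executor.invoke({"input": message})
        replies.append(result.get("output"))
    return replies


class EchoExecutor:
    """Hypothetical stand-in with the same invoke() shape as the agent executor."""

    def invoke(self, inputs: dict) -> dict:
        return {"output": f"echo: {inputs['input']}"}


replies = run_scripted_chat(
    EchoExecutor(),
    [
        "How safe is it to drive from San Francisco to Las Vegas?",
        "Then I would like to proceed to New York",
        "So, summarize the whole trip",
    ],
)
print(replies[-1])
```

Passing the real agent_executor instead of EchoExecutor should reproduce the session above, since the memory is attached to the executor and persists between calls.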

Conclusion

We have created an OpenAI LLM agent capable of invoking the function to retrieve weather data, to provide driving assistance for the user. Different types of LLM memory have been considered, and conversation summary buffer memory was integrated into the example. As a result, the chat now has an efficient memory, at a lower cost.
