Emilien Lancelot
Calling code with local LLM is a hoax


Having a local LLM spewing out text is nice. But what you really need is the LLM to execute YOUR code!

Introduction

Is calling tools even doable? Sure, ChatGPT makes it easy. But what about your local LLMs? In this article, we'll try multiple agent frameworks with tool-calling capabilities and see whether our local LLMs can use them.

My configuration is:

  • RTX 4090 with 32 GB of RAM

Using the following LLMs for testing:

  • llama3:8b
  • dolphin-mixtral:8x7b-v2.7-q4_K_M
  • mistral:latest

Powered locally by Ollama.
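
If you want to follow along, these models can be pulled ahead of time with Ollama's CLI (assuming Ollama is already installed and serving on its default port):


ollama pull llama3:8b
ollama pull dolphin-mixtral:8x7b-v2.7-q4_K_M
ollama pull mistral:latest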

I. AutoGPT


AutoGPT is a framework that seems nice. It has a cool CLI and a Flutter UI to create agents from the browser. Its main purpose is to work with your local stuff (documents, audio, videos, etc.).

BUT

It mostly relies on ChatGPT or other proprietary LLM providers to do the heavy lifting. At least that's how I understand it.

Using local models

Here you can find the configuration file where we must set everything up.

We must trick AutoGPT into using the Ollama endpoint as if it were ChatGPT.



## OPENAI_API_KEY - OpenAI API Key (Example: sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
OPENAI_API_KEY="helloworld"

...

## OPENAI_API_BASE_URL - Custom url for the OpenAI API, useful for connecting to custom backends. No effect if USE_AZURE is true, leave blank to keep the default url
# the following is an example:
OPENAI_API_BASE_URL=http://localhost:11434/v1

...

## SMART_LLM - Smart language model (Default: gpt-4-turbo)
SMART_LLM=dolphin-mixtral:8x7b-v2.7-q4_K_M

## FAST_LLM - Fast language model (Default: gpt-3.5-turbo)
FAST_LLM=mistral:latest



This should do the trick.



./autogpt.sh run

value is not a valid enumeration member; permitted: 'text-embedding-ada-002', 'text-embedding-3-small', 'text-embedding-3-large', 'gpt-3.5-turbo-0301', 'gpt-3.5-turbo-0613', 'gpt-3.5-turbo-16k-0613', 'gpt-3.5-turbo-1106', 'gpt-3.5-turbo-0125', 'gpt-3.5-turbo', 'gpt-3.5-turbo-16k', 'gpt-4-0314', 'gpt-4-32k-0314', 'gpt-4-0613', 'gpt-4-32k-0613', 'gpt-4-1106-preview', 'gpt-4-1106-vision-preview', 'gpt-4-0125-preview', 'gpt-4-turbo-2024-04-09', 'gpt-4', 'gpt-4-32k', 'gpt-4-turbo', 'gpt-4-turbo-preview', 'gpt-4-vision-preview' (type=type_error.enum; enum_values=[<OpenAIModelName.EMBEDDING_v2: 'text-embedding-ada-002'>, <OpenAIModelName.EMBEDDING_v3_S: 'text-embedding-3-small'>, <OpenAIModelName.EMBEDDING_v3_L: 'text-embedding-3-large'>, <OpenAIModelName.GPT3_v1: 'gpt-3.5-turbo-0301'>, <OpenAIModelName.GPT3_v2: 'gpt-3.5-turbo-0613'>, <OpenAIModelName.GPT3_v2_16k: 'gpt-3.5-turbo-16k-0613'>, <OpenAIModelName.GPT3_v3: 'gpt-3.5-turbo-1106'>, <OpenAIModelName.GPT3_v4: 'gpt-3.5-turbo-0125'>, <OpenAIModelName.GPT3_ROLLING: 'gpt-3.5-turbo'>, <OpenAIModelName.GPT3_ROLLING_16k: 'gpt-3.5-turbo-16k'>, <OpenAIModelName.GPT4_v1: 'gpt-4-0314'>, <OpenAIModelName.GPT4_v1_32k: 'gpt-4-32k-0314'>, <OpenAIModelName.GPT4_v2: 'gpt-4-0613'>, <OpenAIModelName.GPT4_v2_32k: 'gpt-4-32k-0613'>, <OpenAIModelName.GPT4_v3: 'gpt-4-1106-preview'>, <OpenAIModelName.GPT4_v3_VISION: 'gpt-4-1106-vision-preview'>, <OpenAIModelName.GPT4_v4: 'gpt-4-0125-preview'>, <OpenAIModelName.GPT4_v5: 'gpt-4-turbo-2024-04-09'>, <OpenAIModelName.GPT4_ROLLING: 'gpt-4'>, <OpenAIModelName.GPT4_ROLLING_32k: 'gpt-4-32k'>, <OpenAIModelName.GPT4_TURBO: 'gpt-4-turbo'>, <OpenAIModelName.GPT4_TURBO_PREVIEW: 'gpt-4-turbo-preview'>, <OpenAIModelName.GPT4_VISION: 'gpt-4-vision-preview'>])



Seems like it's not... The model name MUST be one of the proprietary names from the list above, like "gpt-4-turbo". Unfortunately, my models are not named like that.


Now, to see if it could go a bit further with a fake (but compliant) model name, I set "gpt-4-turbo" and ran it again.



./autogpt.sh run
2024-05-19 16:03:01,937 ERROR  Invalid OpenAI API key! Please set your OpenAI API key in .env or as an environment variable.
2024-05-19 16:03:01,938 INFO  You can get your key from https://platform.openai.com/account/api-keys



It doesn't like my API key. I've tried many different keys. It won't go further.

Conclusion on AutoGPT

To fix the model names you could create a custom model in Ollama called "gpt-4-turbo", based on any local model you already have. It's just a way to rename your model and trick AutoGPT. But that wouldn't fix the API key error.
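
For reference, the renaming trick would look something like this (a minimal sketch; the base model is just the one I already have pulled locally):


# Modelfile: expose an existing local model under an OpenAI-compliant name
FROM dolphin-mixtral:8x7b-v2.7-q4_K_M

# then build and test it
ollama create gpt-4-turbo -f Modelfile
ollama run gpt-4-turbo "hello"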

Also, as mentioned HERE, you could maybe duplicate the OpenAI model provider file from AutoGPT and remove any non-compliant parts. But I'm unsure how to perform such an operation.

The documentation doesn't have anything about using local models and doesn't mention calling tools.

In the end, I don't think that AutoGPT is ready for local model use and you should wait and hope the paradigm shifts toward a more local approach.

II. LangChain & LangGraph


LangChain has been at the core of many projects since the beginning of the AI gold rush. The reason it's not already the king is probably its complex syntax, which many developers don't have the time to learn.

LangChain has a way of using the most obscure Python functionalities and makes you feel like you have never read Python code before.

For instance:



chain = prompt | model | outputparser
chain.invoke("Question.")



LangChain's LCEL uses pipes ("|") to string things together. This is made possible in Python by overriding the __or__ magic method. In other words, LangChain overloads operators like you would in C++.
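
To make that concrete, here is a minimal, hypothetical sketch (not LangChain's actual classes) of how overriding __or__ lets "a | b" build a pipeline:


class Step:
    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # "self | other" returns a new Step that runs self, then other
        return Step(lambda value: other.fn(self.fn(value)))

    def invoke(self, value):
        return self.fn(value)

upper = Step(str.upper)
exclaim = Step(lambda s: s + "!")
chain = upper | exclaim
print(chain.invoke("hello"))  # prints "HELLO!"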

But did we really need this kind of idea? I'll let you make up your mind…


Now about using local models:

LangChain has two plugins:

  • Ollama chat: allows you to chat with an LLM
  • Ollama functions: allows an LLM to answer using a specific output format. For instance, if you want your LLM to answer as JSON or as YAML, then you can define the format, keys, and value types that you expect.

Beware of the "function calling" capability! It's a troll from OpenAI… Worst feature naming possible! It doesn't call functions the way "using tools" would. It's only about formatting the output of the LLM.
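
To illustrate the formatting side, here is a minimal sketch (assuming the langchain_community ChatOllama wrapper and a local Ollama server; the prompt and keys are just examples):


from langchain_community.chat_models import ChatOllama

# format="json" asks Ollama to constrain the model's output to valid JSON
llm = ChatOllama(model="mistral:latest", format="json")
answer = llm.invoke("Give the capital of France as JSON with keys 'country' and 'capital'.")
print(answer.content)  # e.g. {"country": "France", "capital": "Paris"}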

=> Now, what about tool calling (aka executing real code locally)?

Well… The Ollama plugin doesn't have this functionality…



from langchain_core.tools import tool
from langchain_community.chat_models import ChatOllama

@tool
def multiply(first_number: int, second_number: int) -> int:
    """Multiplies two numbers together."""
    return first_number * second_number

model = ChatOllama(model="mistral:latest")
model_with_tools = model.bind_tools([multiply]) # <== Binding tool here



Running this will output:



ChatOllama doesn't have a method bind_tools()



And I can confirm it doesn't… So we're F*****.

Conclusion on LangChain & LangGraph

I'm a bit disappointed. As this framework powers many others, like CrewAI, I thought it would have a nice integration with local tools. In the end, it's not that great. It's just a complicated mess that doesn't fix our main concern.

III. Rivet


I must say that I love this one! It's kind of new but has tremendous potential.

It's some kind of IDE for LLM interactions that uses a canvas to create an execution diagram (DAG). It can run in the browser but you can also export the DAG and run it as code to empower your software.


Look at this! How cool is that? There is an Ollama plugin, so you can use it locally.

Just make sure to click the three dots in the top right and change the executor to "node", otherwise it might not run.

Unfortunately, I didn't find a way to call custom tools and the documentation is quite lacking anyway. It's clearly a project that needs to be watched for further updates!

Conclusion on Rivet

Cool software! Free and open source. Love the canvas system that feels like what LangGraph should have.

It still needs tool calling before it's really useful. But have fun with it if you have a ChatGPT account.

IV. AutoGen


One of the best candidates on this list. AutoGen is backed by one of the largest tech companies out there.

I went through the tutorial and I must say… I don't understand most of what I'm doing! The first pages are okay, but the situation rapidly gets out of control, to the point where you'd need an AI agent framework just to explain how all of this works.

However, it does have all you need and supports Ollama out of the box:



from autogen import ConversableAgent

# code_writer_system_message is the system prompt defined earlier in the tutorial
code_writer_agent = ConversableAgent(
    "code_writer_agent",
    system_message=code_writer_system_message,
    llm_config={"config_list": [{
        "model": "dolphin-mixtral:8x7b-v2.7-q4_K_M",
        "api_key": "hello world",  # Ollama ignores the key, but one must be set
        "base_url": "http://127.0.0.1:11434/v1",
    }]},
    code_execution_config=False,
)



The best functionalities are IMO:

  • Generating code on the fly and executing it
  • Calling tools (aka calling your code)
  • Human input

But does tool calling work?

It still doesn't… Only LLMs with OpenAI-compatible tool calling can use this, so Ollama + Mistral won't make the cut. However, the code generation and execution thingy works quite well. Also, note that calling LangChain tools is not supported.
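
For reference, here is roughly what tool registration looks like in AutoGen (a hedged sketch: it only actually triggers with an OpenAI-compatible tool-calling model, which is exactly our problem with Ollama + Mistral):


from typing import Annotated

from autogen import ConversableAgent

llm_config = {"config_list": [{
    "model": "dolphin-mixtral:8x7b-v2.7-q4_K_M",
    "api_key": "hello world",
    "base_url": "http://127.0.0.1:11434/v1",
}]}

def multiply(a: Annotated[int, "first factor"],
             b: Annotated[int, "second factor"]) -> int:
    return a * b

assistant = ConversableAgent("assistant", llm_config=llm_config)
executor = ConversableAgent("executor", llm_config=False, human_input_mode="NEVER")

# The assistant may *suggest* the tool call; the executor actually runs it.
assistant.register_for_llm(name="multiply", description="Multiply two integers")(multiply)
executor.register_for_execution(name="multiply")(multiply)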


Available chat mechanisms you can use

  • Two chats pattern: Two LLMs speak to each other to complete the task


  • Sequential chats: Tasks will be evaluated in the order you specified


This is starting to get complicated. The carryover mechanism, which contains the context accumulated over the multiple conversations, is a hard concept to grasp. And why is each task still a conversation between two agents? And why is it A=>B, A=>C, A=>D, A=>E? Why always start with A? God knows. (See the sketch after the group chat pattern below.)

  • Group chat: Don't expect an explanation...


This is when things get out of hand! One agent seems to be the brain, installing some kind of hierarchy between the agents. The concept is appealing, but the examples from the documentation are not really helpful.
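
To make the two-chats and sequential patterns a bit more concrete, here is a hedged sketch using AutoGen's initiate_chat / initiate_chats (agent names and messages are just examples):


from autogen import ConversableAgent

llm_config = {"config_list": [{
    "model": "dolphin-mixtral:8x7b-v2.7-q4_K_M",
    "api_key": "hello world",
    "base_url": "http://127.0.0.1:11434/v1",
}]}

teacher = ConversableAgent("teacher", llm_config=llm_config)
student = ConversableAgent("student", llm_config=llm_config)

# Two chats pattern: the two agents talk until max_turns is reached.
teacher.initiate_chat(student, message="Explain what carryover means.", max_turns=2)

# Sequential chats: each chat's summary is "carried over" into the next one,
# which is why the same agent (A) starts every conversation.
teacher.initiate_chats([
    {"recipient": student, "message": "Task 1", "max_turns": 2, "summary_method": "last_msg"},
    {"recipient": student, "message": "Task 2", "max_turns": 2, "summary_method": "last_msg"},
])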


It also supports the latest trending prompting techniques like:

  • ReAct: Lets the agent decompose actions and make a plan. Then it tries to follow each step, and if things go wrong it makes another plan and starts again. It's all about creating context that has semantic meaning for the LLM and helps it focus on what it should do right now.

  • Reflection: It's kind of like ReAct but with an emphasis on its own output. After "speaking" it will ask itself "Is this correct?". And it seems that iterating over its own answers yields better results.

As always, "better results" means "fewer hallucinations" as this is the main issue with LLMs.


AutoGen Studio

Also, if you don't want to mess with code, you can download AutoGen Studio, which allows you to define agents without writing any code. It's an interesting piece of software but doesn't really help you grasp the core functionality of the framework.


Conclusion on AutoGen

AutoGen clearly has a bright future in front of it. As it's made by Microsoft, we can only hope that they won't pull the plug on it or turn it into OpenAI-only software.

However, there is still no tool calling available with local LLMs. :-(

V. CrewAI


Another excellent piece of software.

The documentation is okay and the framework is simple, but it does have a few issues.

On the bright side:

  • Ollama support
  • LangChain tools calling
  • Custom tools calling
  • Human input

On the dark side:

  • Tool calling still isn't working!
  • Human input doesn't always trigger
  • Low consistency with infinite loops
  • Bugs
  • Soooooo many prompts to write

Available chat mechanisms you can use

There are "sequential" and "hierarchical". Sequential will allow your LLMs to go through the tasks in the order you choose. Hierarchical on the other side will create a ghost agent that automatically decides which one of your agents should be triggered using its description.

Hierarchical would be great if only it worked. There are constant errors about agents that can find their co-workers. It rapidly gets tedious.


The framework proposes three types of classes:

First, you have the Agents, which have the following prompts bound to them:

  • A role: What it does for a living 
  • A goal: What it should do in the team
  • A backstory: A story of its life…


from crewai import Agent
from langchain_community.chat_models import ChatOllama

ollama_mistral = ChatOllama(model="mistral:latest")

writer = Agent(
  role='Writer',
  goal='Write a fake anecdote using a number.',
  backstory='An experienced writer with vivid imagination.',
  llm=ollama_mistral,
  verbose=True
)



Then you have the Tasks that also have prompts:

  • description: What the task should do
  • expected_output: The output that is expected of this task


from crewai import Task

teacher_task = Task(
  description='Decompose the arithmetic operations.',
  expected_output='A concise list of operations to execute',
  agent=teacher  # an Agent defined like the writer above
)



Finally, you have the Tools:

Tools can be bound to Agents to give them capabilities. But for some reason, they can also be bound to tasks… Which I don't think makes much sense.



@tool("sleep")
def my_sleep(nb_seconds: int) -> str:
    """Will sleep the amount of specified seconds provided as a number"""
    print(nb_seconds)
    return time.sleep(nb_seconds)



I like having the @tool decorator. You simply pass a name and a docstring describing your tool, and the LLM decides whether it should use it or not.

In the end, you'll have so many prompts to write that you'll lose yourself.

Does this prompt belong to a task or an agent? Does this tool belong to this agent or this one? Or maybe the tool should be bound to the task itself… So many questions but so few answers as the CrewAI documentation is quite scarce!
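
For completeness, here is roughly how the pieces above get wired together (a sketch; I'm assuming a teacher agent defined like the writer one earlier):


from crewai import Crew, Process

# Assemble the agents and tasks defined above into a crew.
crew = Crew(
    agents=[teacher, writer],
    tasks=[teacher_task],
    process=Process.sequential,  # Process.hierarchical also exists (see above)
    verbose=True,
)
result = crew.kickoff()
print(result)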

Conclusion on CrewAI

If you wish to have agents speak to each other, then it's the simplest framework out there. Besides having too many prompts to write, it's quick and easy. However, calling tools doesn't work, so we still have the same issue.

Also, the consistency is quite bad. You'll often see your agents going into infinite loops.


A note on the constant YouTube AI trend bullshit: haven't you noticed how many YouTubers have made videos on agent frameworks? The subject is always writing about stupid AI trends and making poor RAG systems. Well, that's because there is currently not much else you can do, as calling local tools isn't a thing right now. Except if you use ChatGPT, Grok, or Claude.

VI. Conclusion of all conclusions

We're screwed.


Honestly, it's time we got a way to better integrate low-cost LLMs into our applications. Calling tools is the way to go, but it needs a simpler architecture, one that doesn't rely on OpenAI's complex format.

What use would small models like Phi be on our mobile devices if the only thing they can do is spew text, without integrating any of it with our applications?


If I have made mistakes or overlooked anything, please let me know in the comments.

Any idea on how to get local code executed is good to know. Please advise in the comment section!

Thx for reading. Leave a thumbs up if you liked this article. ❤


Other authors you might like

https://medium.com/@rootOrNothingElse/the-rise-of-human-based-botnets-unconventional-threats-in-cyberspace-cb084b87c5bf

Top comments (6)

John3 Major3

Hello, I do testing on function calling and local LLMs (Hermes + Mistral 7B, Mixtral 8x7B). To be able to use LangGraph, I decided to implement a fake Mistral cloud (just a Flask web server receiving the LangGraph requests, calling the LLM, and sending back the response). I had to implement the Mistral API but it's really easy. I get the tools from the request and inject them into a custom prompt. I force the LLM to think and reason and give answers in JSON as thought + action (a couple of phrases). It works quite well.

Emilien Lancelot

Okay, very interesting! Trying it myself with the function-calling Mistral model that got released recently + LangGraph.
I don't quite get your "Mistral cloud" part. Couldn't you use the Mistral Python package locally? What's on the server side exactly? 🤔

nigel447

really well written and well worth the time to read ++

Emilien Lancelot

Thx a lot !

shamrockmuffin

I think you should consider testing LlamaIndex and cloudllama before saying that LLM code calling is a hoax

Emilien Lancelot

I haven't tested LlamaIndex because it felt like a cheap version of LangChain focused on RAG. Do you find it any good?

Didn't know about cloudllama. Will check that out.

And as for the title, I meant that the community should stop lying to us about local models having infinite powers of reasoning and frameworks being ready for production. There are really great LLMs out there and great framework ideas, but it's not as good as advertised, and I felt like people needed a reminder that open source is not there yet.

When I finally achieve good function calling with an open-source LLM, I'll write another article to explain how I did it.

It's all for the community ;-)