▶️▶️ Read this article on Medium.◀️◀️
I. The Leak That Sparked Open Source Innovation
Like most of us, I started my AI journey with a simple llama.cpp
wrapper. It was a simple chat loop with no memory between messages but… IT SPOKE!
And that it did very well. This was an exciting perspective that pulled us back in time. In 2008 exactly. At a time when the first IronMan aired in theaters and speaking AI was to become one of the most trending things in movies.
Since then we have had many speaking "smart" assistants on smartphones. Even though it was fun, I must have used it no more than 10 minutes in total during the 9 years I owned my Galaxy-S4.
The human AI…But speaking AIs continued spawning, each time getting us closer to a functional Jarvis. But the more we would come close to it, the more it fainted. Showing limitations that killed the experience quicker than it took us to deactivate the power-hungry functionality. No Google Home or Alexa would save us from dumb AIs.
But then it happened!
ChatGPT came out of nowhere and a new spark was created. A shimmering light showed us a glimpse of hope. Hope that we didn't know was to be a new revolution, a gift we weren't ready to receive.
But only a spark it was. And to get us all back in the game we needed a fire!
Hopefully, someone took the responsibility and lit the blaze.
Some guy, which we still don't know anything about, leaked the first Large Language Model (LLM) on 4Chan. It was Facebook's, for which I won't cry as they have fed on us for far too long. This long-coming reckoning was to be the spark that lit the entire forest inside developers! We could boot an LLM directly on our computers! We would speak to it, and it responded. And damn it was good. At the same time, image generation was on the loose, too, with Midjourney and the first open source models being released every week. Not one day had passed since a new web UI was created. Not one day had passed without one new model iteration being revealed.
The Jarvis dream was back, and this time, it was real!
II. I tested them all…
To use an LLM you need 3 things:
- An LLM
- An inference server to run the above LLM
- A way to interact with the above inference server
As an inference server, I started with LMstudio which had the good taste of displaying the amount of RAM needed to run the model. A fitting functionality that showed once more that successful tech is driven by small details.
I then switched to Ollama because, as a developer, I'm more into command lines. Also, the performance was a bit better, and layers would be loaded in RAM by the software itself. Less to configure yourself is always better as you don't have infinite time to deal with the ever-changing landscape of AI.
Both LMstudio and Ollama had chat integration but what to do with it?
The Jarvis ideal could only be achieved if we had a way to programmatically interact with the LLM.
CrewAI
Many of us started there. The notion of Agents embodying one instance of an LLM had us started with CrewAI for one reason. A reason that existed since the dawn of time!
It was EASY!
A simple and straightforward definition of an agent
CrewAI had this straightforward agent definition that allowed developers to start prototyping quickly. It mostly used multi-turn chat, which let us believe that LLMs speaking with themselves would solve all our problems, including world hunger.
Unfortunately, making them speak was not an issue. But getting things done was a whole other fight, and CrewAI just didn't deliver. Prompts were too complex for our small 8B LLMs that had reasoning troubles. I guess that plugging CrewAI with a frontier model like ChatGPT yielded better results, but the open source community couldn't use CrewAI as performance was just too bad.
LangGraph
LangGraph reinvented how to deal with agents.
It was said that "compiling a graph of predicted prompts would help guide the LLM toward truthness" and was the way to go with our dumb open source LLMs. And they were right! At least the conceptual idea was good.
Example of a compiled graph
It shed light on how bad the performances of open source LLMs were and how much guidance they needed to actually achieve computable results.
But the issue with LangGraph was the sheer complexity of it. Also, the bad documentation didn't help. It was outdated, missed parts, and had many wrong code snippets.
On the graph side, the hardcoded links lacked flexibility as software development needs room to evolve during its creation cycle!
Modifying the graph representation of possible interactions was a lot of work, and prompt engineering pseudo-science made the whole thing exhausting. Programming doesn't like hard links between things. Programming is about iterating a lot and having things stitched to each other will inevitably slow you down.
In the end, I raged-quit LangChain as it was too difficult to learn and the documentation was too unhelpful.
The Tool Calling disaster
Things started to look bad when we discovered that chatting programmatically with an LLM was cool but didn't achieve anything. To create something with our LLM it had to interact with the exterior world. Not just write poems or bad jokes.
This is why "tool calling" was created. It allowed LLMs to call 'classic' programming functions. For instance, it could get the weather of a city by calling a real weather API or accessing files on your computer. The possibilities were endless.
This was the start of something awesome and we all went looking for a way to make this real.
The tool-calling mechanism was first available in chatGPT. It had this silly name of "function calling" that made us all believe that chatGPT could magically call a programming function.
We were wrong. 'Function calling' was merely a way of outputting text in a standardized way. The way in question was JSON.
The LLM would produce a known JSON architecture that was parsed by the software running the LLM. Recognizing the pattern it would call the given function with the right parameters.
OpenAI's "function calling" JSON to trigger a tool call
The idea was good. But the JSON architecture was not. When tried with small 8B models it failed to output the JSON correctly hence making the tool calling impossible. 😭
▶️ Therefore, the community was stuck with text-spewing LLMs that couldn't enter the tool-calling league of frontier models.
Following this, new models came with specific training on tool calling to help them output the correct JSON hence splitting models into two categories. The one with tool-calling abilities and the one without.
The one without became of no interest and the one that could, still didn't have this good of a success rate. Better than basic LLMs but not that great... Calling a function was still a complex endeavor.
CrewAI was incapable of calling any tools. Making it simple but useless.
LangGraph on the other hand could output the JSON but not call the function by itself. It would have been your job to parse the JSON and call it yourself. Also, for opensource models, it only worked with Ollama but only had partial support because… Why care about opensource ?
In my quest to find the best framework I tested many others: Autogen, BabyAGI, AutoGPT, Langoid, etc. None that had a correct Agent implementation and tool-calling support for local open source LLMs. None.
My very own Jarvis dream was about to go to rest once again as I would never plug anything into chatGPT. As a strong open source believer and even though I love chatGPT I'm not letting it access my emails, my domotic, my life… It must be open source or nothing. And nothing it was…
III. The solution
On the blink of dispair some unknown tool came up. It took the form of a llama in a deep dark sky. A constellation of small sparkles that may light the path ahead after all!
Yacana
- Github: https://github.com/rememberSoftwares/yacana
- Read The Docs: https://remembersoftwares.github.io/yacana
I. Installation
All it took was a simple pip install:
pip install yacana
II. Agents
The 'Agents' concept is the human way of speaking to LLMs. We have anthropomorphized it so that it feels more natural. Yacana understands that and allows creating Agents with a name (and an optional system_prompt to set their behavior)
⚠️Just a heads-up: for now, Yacana only supports Ollama. If you don't have it installed already, it's only one command away.
from yacana import Agent
agent1 = Agent("Experience book writter", "llama3.1:8b")
The second parameter is the LLM model name used by Ollama.
III. Tasks
Chaining prompts together was an idea of LangGraph. Even though no graphs are available in Yacana we do get the same feeling of linking prompts together but not hard linking them!
We can soft link tasks with something we, as developers, have done all our lives: programming! No fancy web UIs needed! To chain two tasks together you simply have to call them one after the other. This might look silly right now, but you'll soon see why it's not.
from yacana import Task
task1 = Task("Write a few lines about the deep blue sky", agent1)
task2 = Task("Change the subject to 'ocean'", agent1)
task1.solve()
task2.solve()
$ python writer.py
Output of our two tasks that the agent solved
Yacana shows prompts in green and the LLM answer in purple. This makes tracking conversations a bit easier.
Let's do some basic programming!
So, Yacana doesn't provide graphs. But you don't need graphs! You've got something way better! Object Oriented Programming (OOP)!
We'll start by making an Animal base class:
class Animal:
def __init__(self, name):
self.name = name
def make_sound(self):
raise NotImplementedError("Subclass must implement abstract method")
This will be the parent class for the next two classes Cat
and Dog
:
# Subclass Cat
class Cat(Animal):
def make_sound(self):
return f"{self.name} says: Meow!"
# Subclass Dog
class Dog(Animal):
def make_sound(self):
return f"{self.name} says: Woof!"
Instantiating our pets:
cat = Cat("Whiskers")
dog = Dog("Buddy")
print(cat.make_sound())
print(dog.make_sound())
🔼 Output of this very complexe program...
Now let's change our hardcoded pets to AI cyber pets!
CyberCat and CyberDog playing Doom via Bluetooth
from yacana import Agent, Task
# Parent class
class Animal:
def __init__(self, name, system_prompt):
self.agent: Agent = Agent(name, "llama3:8b", system_prompt=system_prompt)
def make_sound(self):
raise NotImplementedError("Subclass must implement abstract method")
# Subclass Cat
class CyberCat(Animal):
def make_sound(self):
task = Task("Do a cat noise", self.agent).solve()
return task.content
# Subclass Dog
class CyberDog(Animal):
def make_sound(self):
task = Task("Do a dog noise", self.agent).solve()
return task.content
# Example usage
cyber_cat = CyberCat("Whiskers", "You are a cat called Whiskers")
cyber_dog = CyberDog("Buddy", "You are a dog called Buddy")
print(cyber_cat.make_sound())
print(cyber_dog.make_sound())
We now have
- An Agent defined in the
Animal
class. - Two Cyber AI pets
CyberCat
andCyberDog
that each take a name and asystem_prompt
. - The
.make_soud()
method now has the task of generating a sound based on thesytem_prompt
representing itself.
ℹ️ The default logging can be deactivated to get a cleaner output.
Tool calling
What's great with cyber pets is that they can buy their kibble themselves!
Let's make a tool that simulates calling a web API to order more kibble.
def order_kibble(animal: str, weight_grams: int) -> str:
# Validation
if animal.lower() != "cat" and animal.lower() != "dog":
raise ToolError("Animal parameter can only be one word which describes your class like 'cat' or 'dog'.")
if not isinstance(weight_grams, int):
raise ToolError("Parameter weight_grams mut be an integer.")
# Making a fake call to https://order-kibble.com
# Fake API return
if weight_grams > 500:
return "To much kibble was ordered for one day. Please order less or you'll become a fat pet..."
else:
return f"Successfully ordered {weight_grams} grams of {animal.lower()} kibble on order-kibble.com!"
The important parts of this function are:
- The function name will be used by Yacana so choose something meaningful.
- Duck typing the arguments: Yacana will use this information to infer the types of each parameter for the function call.
- Implementing validation and raising meaningful errors: Same as classic server-side validation we cannot trust the LLM to send us valid data. You should at least check for types. If something is wrong raise a meaningful ToolError(...) exception message that the LLM will use to try and call the tool again!
- If it's a success then return a meaningful string relevant to the tool call. Note that the final result of the Task will be the Tool return value.
Creating a Tool from a function is as simple as choosing a tool name, description, and function reference:
Tool("Kibble_order",
"Takes your animal type (cat, doc, etc) and the quantity of kibble to order as input and places an order.",
order_kibble)
Now let's add a .run()
method to our Animal
class. Each time it's called it loses 100 calories (max 200) and when reaching 0 calories the pet orders a kibble refill by itself. Good boy!
def run(self):
# Running!
running_task_output: str = Task("Do running noises.", self.agent).solve().content
# Losing calories
self.calorie -= 100
if self.calorie <= 0:
# Need to eat! Let's order some kibble!
tool_output: str = Task("You are hungry. Order kibble.", self.agent, tools=[self.kibble_order_tool]).solve().content
print(tool_output)
self.calorie = 200
return running_task_output
Full code below before making our pets run:
from yacana import Agent, Task, Tool, ToolError
def order_kibble(animal: str, weight_grams: int) -> str:
# Validation
if animal.lower() != "cat" and animal.lower() != "dog":
raise ToolError("Animal parameter can only be one word which describes your class like 'cat' or 'dog'.")
if not isinstance(weight_grams, int):
raise ToolError("Parameter weight_grams mut be an integer.")
# Making a call to https://order-kibble.com
# Fake API return
if weight_grams > 500:
return "To much kibble was ordered for one day. Please order less or you'll become a fat pet..."
else:
return f"Successfully ordered {weight_grams} grams of {animal.lower()} kibble on order-kibble.com!"
# Parent class
class Animal:
def __init__(self, name, system_prompt):
self.agent: Agent = Agent(name, "llama3:8b", system_prompt=system_prompt)
self.kibble_order_tool: Tool = Tool("Kibble_order", "Takes your animal type (cat, doc, etc) and the quantity of kibble to order as input and places an order.", order_kibble)
self.calorie = 200
def make_sound(self):
raise NotImplementedError("Subclass must implement abstract method")
def run(self):
# Running!
running_task_output: str = Task("Do running noises.", self.agent).solve().content
# Losing calories
self.calorie -= 100
if self.calorie <= 0:
# Need to eat! Let's order some kibble!
tool_output: str = Task("You are hungry. Order kibble.", self.agent, tools=[self.kibble_order_tool]).solve().content
print(tool_output)
self.calorie = 200
return running_task_output
# Subclass Cat
class CyberCat(Animal):
def make_sound(self):
task = Task("Do a cat noise", self.agent).solve()
return task.content
# Subclass Dog
class CyberDog(Animal):
def make_sound(self):
task = Task("Do a dog noise", self.agent).solve()
return task.content
Let's make our dog run twice so that its calories reach 0 and it needs to refill!
cyber_dog = CyberDog("Buddy", "You are a dog")
cyber_dog.run()
cyber_dog.run()
Let's break down the output:
🔼 The dog ran twice as expected.
🔼 As the calories reached 0 it triggered the tool call to order kibble. Here you can witness Yacana's Tool-calling magic! But something went wrong 😨! Can you spot the issue??
…
The weight_grams
value was sent as a string : {"annimal": "dog", "weight_grams": "500"}
Although, it was ducked typed as an int!
🔼 Lastly, we can see that Yacana got our raise() message about the parameter type and retried calling the tool.
The LLM took into account the new information and this time, weight_grams was sent as an integer and the tool was called successfully!
Our cyber dog placed its order! 🐶
In the end, Yacana does not provide a static graph. However, it does integrate quite nicely into standard development practices. From Yacana's only POV this is how it is conceptually created:
Graph nodes version of Yacana
IV. Going further
We'll wrap up here for this introduction to Yacana. However, there are many more functionalities to explore and many use cases other than dystopic cyber pets to imagine. From routing to multi-turn chat and history management, the framework has many things to propose.
Consider giving a like to this article and following me on Medium and dev.to so you won't miss the coming series of articles about Yacana.
Have fun with the framework and read the doc: https://remembersoftwares.github.io/yacana
Top comments (0)