Large Language Models are a unique emergent technology of 2024. Their fascinating capability to produce coherent text influences many areas and use cases, especially pushing the boundaries of classic natural language processing tasks.
Over the course of this year, I explored LLMs from several angles: their rapid evolution from 2019 until today, their applicability to NLP tasks, the various options to run LLMs locally, fine-tuning, and finally agent-based systems. This exploration was driven by one goal: designing a universal personal assistant.
This blog article formulates the core requirements, distills my LLM experience into the most promising solution architecture, and projects the next steps toward implementing a personal assistant.
This article originally appeared at my blog admantium.com.
Universal Personal Assistant Requirements
The core requirements of a personal assistant are as follows:
- Local inference: The LLM(s) must run locally to ensure the privacy of all information: the prompts sent for LLM inference, the answers generated by the LLMs, and the complete history of all interactions.
- Access to local data: Data sources for the LLMs can reside in a local network. In line with the privacy requirement above, this data needs to be made available to the LLM and stay local during processing.
- Access to online data: The LLM should also have access to any other online data source, including the ability to query search engines and use APIs.
- Data-source agnostic: Any data source can yield a different data format: plain text files, structured formats like Markdown or XML, and even binary files. The LLM needs the skills to understand these formats natively, or it needs tools that restructure or parse them into a more suitable form.
Personal Assistant Design Approaches
In the blog article Question-Answer System Architectures using LLMs, I envisioned six different approaches to how an LLM's capabilities can be improved. With the understanding and experience gained since then, the applicability of each approach has become clear.
- Question-Answering/Linguistic Fine-Tuning: Gen1 and Gen2 LLMs could reliably produce spans of text based on their consumed training data. For them to possess linguistic skills like inference, reasoning, and question-answering, explicit fine-tuning was essential. Although these LLMs could be tailored towards their application areas, they could not generalize to new domains.
- Domain-Embedding: The idea of giving an LLM an external vector-based representation of data and running similarity search on it before formulating an answer evolved into the dominant retrieval-augmented generation technique.
- Instruction Fine-Tuning: To improve an LLM's capability and adaptability to new, unforeseen tasks, additional training with instruction datasets proved to be a turning point. With Gen3, and especially with Gen4, it could be observed that these models complete tasks they were never trained for. The quest for general task-solving capabilities had begun.
- Prompt Engineering: Given the vast amount of training material since Gen3, an LLM's text generation can be influenced greatly with a specifically crafted prompt. A prompt is a primer: it sets the context, constraints, and type of language used. A big however must follow: prompts do not program an LLM to a fixed behavior; the actually produced text differs from invocation to invocation. One could also say that prompt engineering is the necessary complement to instruction fine-tuning, essential to focus the model's skills when generating texts.
- Retrieval-Augmented Generation and Agents: Although Gen4 models are trained on massive amounts of data and are capable task solvers, they can only produce texts about content that they consumed during training. Essentially, the timestamp of the training data marks the limit of the imbued world knowledge. Retrieval-augmented generation counters this limit: by constructing a prompt with relevant content for a given task, the LLM can work with new facts. This idea evolved into, and is generalized by, agents. An LLM agent is the combination of prompts, tools, and a history of past interactions. This enables powerful Gen4 LLMs with a large context window to continuously work on a task (a minimal RAG sketch follows this list).
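To make the retrieval-augmented generation idea concrete, here is a minimal sketch in Python. It is framework-free; the embedding model, the example documents, and the use of a local Ollama server with Llama 3.1 are assumptions for illustration, not a fixed recommendation.

```python
# Minimal RAG sketch: embed documents, retrieve the most similar one,
# and inject it into the prompt of a locally running LLM.
# Assumptions: the sentence-transformers and ollama packages are
# installed, and an Ollama server with the llama3.1 model runs locally.
import numpy as np
from sentence_transformers import SentenceTransformer
import ollama

documents = [
    "The thermostat in the living room reports 21.5 degrees Celsius.",
    "BattleTech is a science fiction setting featuring giant combat mechs.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vectors = embedder.encode(documents, normalize_embeddings=True)

def retrieve(question: str) -> str:
    """Return the document most similar to the question (cosine similarity)."""
    q = embedder.encode([question], normalize_embeddings=True)[0]
    scores = doc_vectors @ q  # dot product of normalized vectors = cosine
    return documents[int(np.argmax(scores))]

question = "What kind of setting is BattleTech?"
context = retrieve(question)

# The retrieved context is injected into the prompt, so the LLM can
# answer with facts it never saw during training.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
answer = ollama.chat(model="llama3.1", messages=[{"role": "user", "content": prompt}])
print(answer["message"]["content"])
```

The key point is the last step: the retrieved context is pasted into the prompt, so the model answers from injected facts instead of its frozen training data.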
Summarizing all of these approaches and reflecting on the experience gained, the essence is this: All Gen4 LLMs are task-agnostic, instruction-following text generators. With a carefully crafted prompt, injected with the relevant content via RAG, they can solve tasks for most domains. Finally, by using LLMs as agents, any data source becomes accessible via tools, along with the history of past interactions.
The most promising architecture for designing a personal assistant is to use an agent framework. Its built-in functions to manage prompts, the declaration of tools including RAG, and access to a continued history can fulfill all stated requirements.
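Before turning to concrete frameworks, the agent loop itself can be compressed into a few lines. The following is a framework-free sketch; the JSON tool-call protocol, the get_room_temperature tool, and the llm callable are hypothetical stand-ins for what frameworks like AutoGen or CrewAI provide in a robust form.

```python
# Framework-free sketch of an agent loop: prompt + tools + history.
# The `llm` callable and the JSON tool-call protocol are assumptions;
# real agent frameworks implement this far more robustly.
import json
from typing import Callable

def get_room_temperature(room: str) -> str:
    """Hypothetical tool: would query a local sensor database."""
    return f"Temperature in {room}: 21.5 C"

TOOLS: dict[str, Callable[[str], str]] = {
    "get_room_temperature": get_room_temperature,
}

SYSTEM_PROMPT = (
    "You are a personal assistant. To use a tool, answer only with JSON: "
    '{"tool": "<name>", "argument": "<value>"}. Available tools: '
    + ", ".join(TOOLS)
)

def run_agent(llm: Callable[[list[dict]], str], question: str) -> str:
    # The history keeps every prompt, answer, and tool result.
    history = [{"role": "system", "content": SYSTEM_PROMPT},
               {"role": "user", "content": question}]
    for _ in range(5):  # cap the number of tool-use rounds
        reply = llm(history)
        history.append({"role": "assistant", "content": reply})
        try:
            call = json.loads(reply)
        except json.JSONDecodeError:
            return reply  # a plain-text answer means the agent is done
        result = TOOLS[call["tool"]](call["argument"])
        # Feed the tool result back so the next LLM call can use it.
        history.append({"role": "user", "content": f"Tool result: {result}"})
    return "Step limit reached without a final answer."
```

This loop already contains the three ingredients named above: a managed prompt, declared tools, and a growing interaction history.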
Tracking LLM & Tool Evolution
2024 saw a steady influx of new LLMs, tools, and frameworks. The following sections briefly update what was covered on this blog before, thereby shaping concrete next steps.
LLMs
Sticking with parameter count as a rough measure of LLM capability, these new models were published:
- 3B: Gemma 2, Phi-3.5
- 7B: Llama 3.1, Llama 3.2
- 13B+: Command-R, Gemma 2, Mixtral
- Commercial models: OpenAI GPT-4o, Claude 3.5 Sonnet, Gemini 1.5
Agent Frameworks
The most promising agent frameworks from my research received updates, and a new framework was published that promises great flexibility.
- AutoGen: Improved and novel communication patterns for agent systems are available, termed conversation programming and finite state machines. Also, the AgentEval framework has been integrated into AutoGen, providing a built-in method for self-improving LLM responses against a given task specification (a minimal usage sketch follows this list).
- CrewAI: The new template method provides a simplified and reusable way to configure a project. In essence, it creates a default directory structure with custom YAML files that keep the agent and task definitions separate from the actual Python code, which makes the code much more readable. The second novel feature is the integration of a new set of built-in tools from the crewAI-tools project. Finally, crews can now also be trained, and a new planner declaration supports task execution with an upfront planning step.
- AgentZero: A new framework that strikes a balance between integrated features and open configuration. At its core is a robust agent execution engine with integrated function execution support. Essentially, it routes function calls directly to Docker containers and returns the results to the LLM. This is a promising feature when custom data sources need to be targeted.
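To give an impression of how little code such a framework requires, here is a minimal AutoGen sketch. It assumes the pyautogen package and a locally served OpenAI-compatible endpoint (for example Ollama); the model name and URL are placeholders for a concrete local setup.

```python
# Minimal AutoGen two-agent setup. Assumes the pyautogen package and a
# locally served OpenAI-compatible endpoint; model name and base_url
# are placeholders for your own local configuration.
from autogen import AssistantAgent, UserProxyAgent

llm_config = {"config_list": [{
    "model": "llama3.1",                      # placeholder model name
    "base_url": "http://localhost:11434/v1",  # placeholder local endpoint
    "api_key": "unused",                      # local servers ignore the key
}]}

assistant = AssistantAgent("assistant", llm_config=llm_config)
# The user proxy relays messages; no human input, no code execution here.
user = UserProxyAgent("user", human_input_mode="NEVER",
                      code_execution_config=False)

user.initiate_chat(assistant, message="Name three classic NLP tasks.")
```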
Tools
LLM answer quality directly relates to the given prompts, and therefore effective prompt engineering is necessary. The landscape of prompt management platforms and libraries has grown manifold. Some tools now actively incorporate specific tweaks for the most recent commercial models, enabling prompts that are injected with model-specific formulations. Example libraries are dspy, LMQL, Outlines, and Prompttools.
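The model-specific formulations mentioned above can be illustrated with a library-agnostic sketch: a small template registry that wraps a task in the prompt format a given model responds best to. The template strings below are simplified assumptions, not the exact official chat templates.

```python
# Library-agnostic sketch of model-specific prompt templates, the kind
# of tweak the libraries above automate. The template strings are
# simplified assumptions, not the exact official chat templates.
TEMPLATES = {
    # Llama-3-style header tokens (simplified)
    "llama3.1": "<|start_header_id|>user<|end_header_id|>\n{task}<|eot_id|>",
    # Plain instruction style for models without special tokens
    "generic": "### Instruction:\n{task}\n\n### Response:",
}

def build_prompt(model: str, task: str) -> str:
    """Wrap the task in the formulation a specific model responds best to."""
    template = TEMPLATES.get(model, TEMPLATES["generic"])
    return template.format(task=task)

print(build_prompt("llama3.1", "List three classic NLP tasks."))
```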
On the other hand, LLMs can be tricked by sophisticated prompts into revealing their training data or generating inappropriate texts. This danger, especially harmful when access to an LLM is public, emphasizes the importance of careful moderation of prompts and LLM answers. Libraries that tackle this challenge are Guardrails and Guidance, and likewise, LLM invocation frameworks are adding functions to manage prompts more effectively.
Specialized projects that facilitate automatic document indexing and LLM invocation with the document content are gaining traction, for example PrivateGPT, QAnything, and LazyLLM. Another novelty is the integration of LLMs into applications and tools: The Semantic Kernel project aims to integrate LLM invocation during programming and inside the code itself.
Outlook
In the next articles, I will implement and evaluate concrete agent frameworks, or a combination of individual tools, to design a personal assistant.
Considering the requirements, and given the fact that LLM invocation inherently generates non-deterministic text, a crucial question is how to evaluate the assistant's capabilities. As the manifold LLM benchmarks show, there is no single universal answer. Therefore, I will evaluate the agent on two very specific use cases:
- Reading comprehension: I have a stack of eBooks from the science fiction setting BattleTech. The assistant should answer questions about their content, including persons, places, and individual plot details.
- Sensor database: In my apartment, several sensors track temperature and humidity. All data is measured at continuous intervals, then stored in a timeseries database. The assistant should access this data and answer questions such as the average measured temperature at a specific place over a period of time (a sketch of such a tool follows this list).
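For the second use case, the agent needs a tool that queries the database. Here is a hedged sketch: the SQLite schema (a measurements table with place, timestamp, and temperature columns) is a hypothetical stand-in for the actual timeseries database.

```python
# Hedged sketch of the tool the assistant would call for this use case.
# The SQLite schema (a `measurements` table with place, timestamp, and
# temperature columns) is a hypothetical stand-in for the actual
# timeseries database.
import sqlite3

def average_temperature(db_path: str, place: str,
                        start: str, end: str) -> float | None:
    """Average temperature at `place` between two ISO timestamps.

    Returns None if no measurements exist in the interval.
    """
    with sqlite3.connect(db_path) as conn:
        row = conn.execute(
            "SELECT AVG(temperature) FROM measurements "
            "WHERE place = ? AND timestamp BETWEEN ? AND ?",
            (place, start, end),
        ).fetchone()
    return row[0]

# Example call (assumed database file and place name):
# average_temperature("sensors.db", "living room",
#                     "2024-12-01T00:00:00", "2024-12-02T00:00:00")
```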
Conclusion
Designing a custom personal assistant with the help of LLMs is a challenging task. This article reflected on my learnings in 2024: it distilled experiences gained from exploring Gen1 to Gen4 models, from understanding and applying fine-tuning, to using RAG and agent frameworks. The essential requirements are these: running locally, with access to local data sources, and consuming different types of data. The most viable option to implement such an assistant is as a flexible agent system, running locally, with custom tools that provide access and transformation for data sources. In the next articles, the concrete implementation of a personal assistant using three different technology stacks will be tested.