In the whirlwind of AI advancements, it's easy to get caught up in the hype. Many companies boast about leveraging AI, often merely as a facade for a basic ChatGPT implementation, making a few calls to their API.
As developers and AI enthusiasts, we need to ask therefore: what truly adds real value to a company? Let’s climb the AI Application Value Ladder 🤖🪜, a mental framework where we balance implementation difficulty against a company's unique selling point (USP).
Value Level 1: Custom Instructions & Prompt Engineering
Difficulty: Easy
Value: Low
Team Required:
- Domain Experts: Junior, Intermediate or Senior
- Software Developers: None
- ML Developers: None
- Eval QA: Junior, Intermediate or Senior
At this initial level, we focus on customizing AI models to access proprietary data or mimic specific personalities. This is basic and straightforward, often involving system prompts via GUIs or APIs and ChatGPT custom instructions. While valuable for specific purposes, its overall impact is limited.
Value Level 2: Function Calling
Difficulty: Medium - Hard
Value: Medium - High - Very High
Team Required:
- Domain Experts: Junior, Intermediate or Senior
- Software Developers: Junior, Intermediate or Senior
- ML Developers: None
- Eval QA: Junior, Intermediate or Senior
Here, AI models execute software actions predefined by human programmers. This step involves bridging structured software functionality with the more vague data handling of large language models (LLMs). It's a significant step up in both complexity and value.
For more information, I have a whole blog post here on Function Calling.
Value Level 3: Basic RAG (Retrieval Augmented Generation)
Difficulty: Easy - Medium
Value: Low - High
Team Required:
- Domain Experts: Junior, Intermediate or Senior
- Software Developers: Junior, Intermediate or Senior
- ML Developers: None
- Eval QA: Junior, Intermediate or Senior
Basic RAG is employed when an AI model through semantic search retrieve proprietary data or context (information that the base model doesn't know), which is stored in a so called vector database.
It helps reduce hallucinations (inaccurate or fictional outputs) and examples include the ARC AI Portal - an internal app my company made where after corporate conventions one could, in near-real-time, ask questions about what was said by speakers at the convention.
However, it's complex, unpredictable, and rather hacky as it's not genuinely machine learning-based; we're not actually teaching a model how to do something.
Value Level 4: Advanced RAG
Difficulty: Hard - Very Hard
Value: High - Very High
Team Required:
- Domain Experts: Junior, Intermediate or Senior
- Software Developers: Intermediate or Senior
- ML Developers: None
- Eval QA: Intermediate or Senior
Advanced RAG steps up the complexity with summary queries, re-ranking, and multi-step RAG pipelines, like those used in the data framework library Llamaindex. While offering high value, it's expensive, notoriously tricky to get right, slow, and still not a true ML application.
Value Level 5: Fine-tuning
Difficulty: Very Hard
Value: High - Very High
Team Required:
- Domain Experts: Junior, Intermediate or Senior
- Software Developers: Junior, Intermediate or Senior
- ML Developers: Junior, Intermediate or Senior
- Eval QA: Junior, Intermediate or Senior
Used in actual ML applications, fine-tuning is key for giving an AI model unique abilities or styles. OpenAI's Function Calling behaviour itself is a good example how a model can learn to use different tools effectively through fine-tuning.
This process is less about accessing proprietary data (as in RAG) and more about training the model in a specific manner. In contrast to levels 2, 3, and 4 which can be achieved by programming, this level requires machine learning knowledge and the skills to gather and clean high-quality datasets.
Value Level 6: ML/Programmer Multi-Model Hybrid
Difficulty: Hardest
Value: Highest
Team Required:
- Domain Experts: Junior, Intermediate or Senior
- Software Developers: Intermediate or Senior
- ML Developers: Intermediate or Senior
- Eval QA: Intermediate or Senior
The pinnacle of the AI Application Value Ladder 🤖🪜 involves creating multi-model AI systems, combining the previous levels. This method integrates various models of different sizes and merges software with ML development, leading to advanced, performant, and cost-efficient systems.
An example is Builder.io's translation of Figma designs into code. Rather than relying solely on the more expensive and slower ChatGPT 4, they effectively segmented their challenges, applying fine-tuned, smaller, and faster models for each, in combination with RAG and regular programming.
Conclusion
The AI Application Value Ladder 🤖🪜 serves as a guide to understanding the varied levels of value creation in AI development. It outlines how each step, from basic prompt engineering to complex multi-model systems, contributes differently to a company's AI capabilities.
As the field of AI continues to evolve rapidly, embracing agents and multi-sense models, having a general framework like the 🤖🪜 is crucial. It helps in discerning which innovations truly advance our capabilities, ensuring we stay ahead in a landscape of constant change.
Dawid Dahl is a full-stack developer at UMAIN | ARC. In his free time, he enjoys philosophy, analog synthesizers, consciousness, techno, Huayan and Madhyamika Prasangika, and being with friends and family.
Top comments (3)
Incredible read as always Dawid!
This is great information. I myself am working on 4 and 5. Looking to move to 6.
So happy to hear you found it valuable, @akshayballal! 🙂🙏🏻