TL;DR
AI has traditionally been a very difficult field for web developers to break into... until now. With the introduction of large language models (LLMs) like ChatGPT, it seems like anyone can become an AI engineer nowadays. But make no mistake, this couldn't be further from the truth.
In this article, I will reveal the current top AI libraries that make a mediocre AI engineer exceptional. As an ex-Google, ex-Microsoft AI engineer myself, I will show you how exceptional AI engineers use these libraries to build great applications.
Are you ready to upskill yourself and be one step closer to becoming an AI wizard before 2024? Let's begin!
1. DeepEval - Open-source Evaluation Infrastructure for LLMs
A good engineer can build, but an exceptional engineer can communicate the value of what they've built. DeepEval allows you to do exactly that.
DeepEval lets you unit test and debug your large language model (LLM) applications at scale, in both development and production, in under 10 lines of code.
Why is this valuable, you ask? Because companies nowadays want to be seen as innovative AI companies, so stakeholders prefer engineers who can not just build like an indie hacker, but also ship reliable AI applications like a seasoned AI specialist.
import pytest
from deepeval import assert_test
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

from chatbot import chatbot  # your own LLM application, returning a string response

def test_chatbot():
    input = "How to become an AI engineer in 2024?"
    test_case = LLMTestCase(input=input, actual_output=chatbot(input))
    answer_relevancy_metric = AnswerRelevancyMetric()
    assert_test(test_case, [answer_relevancy_metric])
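Assuming you save this as test_chatbot.py, you should be able to run it with plain pytest, or through DeepEval's CLI wrapper (deepeval test run test_chatbot.py) for nicer test output.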
2. Unstructured - Pre-processing for Unstructured Data
LLMs thrive because they are versatile and can handle a huge variety of inputs, but they can't ingest everything as-is. Unstructured helps you easily transform unstructured data like webpages, PDFs, and tables into formats LLMs can read.
What does this mean? It means you can now customize your AI application on your internal documents. Unstructured is amazing because, in my opinion, it operates at the right level of abstraction: it takes care of the boring hard work while still giving you enough control as a developer.
from unstructured.partition.auto import partition

# auto-detect the file type and split the document into structured elements
elements = partition(filename="example-docs/eml/fake-email.eml")
print("\n\n".join([str(el) for el in elements]))
3. Airbyte - Data Integration for LLMs
Airbyte lets you connect data sources and move data around, which covers most of what you need to build a real-time AI application. It connects your LLMs to information beyond the data they were trained on.
Like Unstructured, Airbyte provides a great level of abstraction over the work an AI engineer does.
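If you'd rather drive Airbyte straight from Python, the PyAirbyte package exposes its connectors as a library. Here's a minimal sketch, assuming you've installed the airbyte package, using the built-in source-faker connector as a stand-in for whichever data source you actually care about:
import airbyte as ab

# configure a source connector (source-faker just generates sample data)
source = ab.get_source(
    "source-faker",
    config={"count": 100},
    install_if_missing=True,
)
source.check()               # verify the connection and config
source.select_all_streams()  # sync every stream this source exposes
result = source.read()

# each stream becomes an iterable of records you can hand to your LLM pipeline
for record in result["users"]:
    print(record)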
4. Qdrant - Fast Vector Search for LLMs
Ever wondered what happens if you feed too much data into ChatGPT? That's right, you'll hit the context window limit and get an error.
That's because LLMs cannot take in infinite information. To work around that, we need a way to feed in only the relevant information, and this process is known as retrieval-augmented generation (RAG). Here's another great article on what RAG is.
Qdrant is a vector database that helps you do just that. It stores and retrieves relevant information at blazing speed, ensuring your application stays up to date with the real world.
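Here's a minimal sketch of that retrieval step using the qdrant-client Python package against an in-memory instance; the four-dimensional vectors are toy placeholders standing in for real embeddings from an embedding model:
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

# in-memory instance for quick experiments (point this at a server URL in production)
client = QdrantClient(":memory:")

client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
)

# store documents alongside their (toy) embedding vectors
client.upsert(
    collection_name="docs",
    points=[
        PointStruct(id=1, vector=[0.1, 0.2, 0.3, 0.4], payload={"text": "How to become an AI engineer"}),
        PointStruct(id=2, vector=[0.9, 0.1, 0.0, 0.2], payload={"text": "Best pizza places in New York"}),
    ],
)

# retrieve only the most relevant document for a query embedding
hits = client.search(collection_name="docs", query_vector=[0.12, 0.21, 0.29, 0.4], limit=1)
print(hits[0].payload["text"])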
5. MemGPT - Memory Management for LLMs
So Qdrant helps give LLMs "long-term memory", but what happens if there's too much to "remember"? MemGPT helps you manage memory for this exact use case.
MemGPT is like a cache for vector databases, with its own strategy for deciding what stays in that cache and what gets paged out. It helps you manage redundant information in your knowledge bases, making your AI application more performant and accurate.
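To make the cache analogy concrete, here's a toy illustration of the underlying idea (this is not MemGPT's actual API): keep only the most recently used facts in the limited "main context" that gets sent to the LLM, and page everything else out to an external store instead of deleting it.
from collections import OrderedDict

MAIN_CONTEXT_SIZE = 3

main_context = OrderedDict()  # the small working memory sent to the LLM
archival_store = {}           # overflow storage, e.g. a vector database

def remember(key, fact):
    # most recently used facts stay in main context
    main_context[key] = fact
    main_context.move_to_end(key)
    if len(main_context) > MAIN_CONTEXT_SIZE:
        evicted_key, evicted_fact = main_context.popitem(last=False)
        archival_store[evicted_key] = evicted_fact  # page out, don't delete

def recall(key):
    if key in main_context:
        main_context.move_to_end(key)
        return main_context[key]
    return archival_store.get(key)  # fall back to long-term memory

remember("name", "The user's name is Alex.")
remember("goal", "Alex wants to become an AI engineer.")
remember("stack", "Alex builds with Python.")
remember("pet", "Alex has a cat named Turing.")  # evicts the oldest fact
print(list(main_context))  # ['goal', 'stack', 'pet']
print(recall("name"))      # still recoverable from the archival store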
6. LiteLLM - LLM Proxy
LiteLLM is a proxy for multiple LLM providers. It is great for experimentation and, combined with DeepEval, lets you pick the best model for your use case. The best part? It lets you call any model it supports through the same OpenAI-style interface.
from litellm import completion
import os
## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-openai-key"
messages = [{ "content": "Hello, how are you?","role": "user"}]
# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)
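Swapping providers is then just a matter of changing the model string. For example, the same call against Anthropic (assuming you have an Anthropic API key) looks like this:
from litellm import completion
import os

os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"
messages = [{"content": "Hello, how are you?", "role": "user"}]

# same interface, different provider: only the model string changes
response = completion(model="claude-2", messages=messages)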
Closing Remarks
That's all folks, thanks for reading, and I hope you learned a few things along the way!
Please like and comment if you enjoyed this article, and as always, don't forget to give open-source some love by starring their repos as a token of appreciation.
Top comments (23)
Great list! I agree, it's so hard to choose just 5. I'd add usemage.ai/ -> it's not a library per se, but if you want to generate a full React/Node.js app from a short description, this is the best tool out there (and it's free, no OpenAI key required!)
Keep up the great work :)
Great addition!
Many more to go.
aiconfig - github.com/lastmile-ai/aiconfig
weaviate - github.com/weaviate/weaviate
chroma - github.com/chroma-core/chroma
haystack - github.com/deepset-ai/haystack
txtai - github.com/neuml/txtai
Excluded for a reason :) Quality over quantity
Great stuff. Coding with LLMs is just getting started and people need help finding all the great tools.
Would also check out CopilotKit - React library for building in-app chatbots & Textareas.
github.com/CopilotKit/CopilotKit
Nice work!
Here's an open-source project that's helpful for running ML models in the background.
Get job execution reminders via webhook using WebhookPlan.
Seems cool!
Another killer! Thanks for your spam post!
Ur welcome!
Did you take inspiration from the DevSkiller website, maybe?
devskiller.com/
MemGPT being on this list is awesome. It's a nice little cache for your "Vector Databases."
I see what you did there!
I love your banner! Really cool list, thanks for sharing!
Thank you, glad you liked it!
I am using DeepEval for almost all of my AI projects and so far I love it! Honestly love the platform and the intuitive design
Great list.
Anytime :)
Nice list to have some motivation to try new things!
Glad you liked it!