TL;DR
AI has traditionally been a very difficult field for web developers to break into... until now. With the introduction of large language models (LLMs) like ChatGPT, it seems like anyone can become an AI engineer nowadays. But make no mistake, this couldn't be further from the truth.
In this article, I will reveal the current top AI libraries that make a mediocre AI engineer exceptional. As an ex-Google, ex-Microsoft AI engineer myself, I will show you how exceptional AI engineers use these libraries to build great applications.
Are you ready to upskill yourself and be one step closer to becoming an AI wizard before 2024? Let's begin!
1. DeepEval - Open-source Evaluation Infrastructure for LLMs
A good engineer can build, but an exceptional engineer can communicate the value of what they've built. DeepEval allows you to do exactly that.
DeepEval lets you unit test and debug your large language model (LLM) applications at scale, in both development and production, in under 10 lines of code.
Why is this valuable, you ask? Because companies nowadays want to be seen as innovative AI companies, so stakeholders prefer engineers who can not just build like an indie hacker, but also ship reliable AI applications like a seasoned AI specialist.
import pytest
from deepeval import assert_test
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

from chatbot import chatbot  # your own LLM application, returning a string response

def test_chatbot():
    input = "How to become an AI engineer in 2024?"
    test_case = LLMTestCase(input=input, actual_output=chatbot(input))
    answer_relevancy_metric = AnswerRelevancyMetric()
    assert_test(test_case, [answer_relevancy_metric])
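Assuming you save this as test_chatbot.py, you should be able to run it with plain pytest, or through DeepEval's CLI wrapper (deepeval test run test_chatbot.py) for nicer test output.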
2. Unstructured - Pre-processing for Unstructured Data
LLMs thrive because they are versatile and can handle a huge variety of inputs, but they can't ingest everything as-is. Unstructured helps you easily transform unstructured data like webpages, PDFs, and tables into formats LLMs can read.
What does this mean? It means you can now customize your AI application on your internal documents. Unstructured is amazing because, in my opinion, it operates at the right level of abstraction: it takes care of the boring hard work while still giving you enough control as a developer.
from unstructured.partition.auto import partition

# auto-detect the file type and split the document into structured elements
elements = partition(filename="example-docs/eml/fake-email.eml")
print("\n\n".join([str(el) for el in elements]))
3. Airbyte - Data Integration for LLMs
Airbyte lets you connect data sources and move data around, which covers most of what you need to build a real-time AI application. It connects your LLMs to information beyond the data they were trained on.
Like Unstructured, Airbyte provides a great level of abstraction over the work an AI engineer does.
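If you'd rather drive Airbyte straight from Python, the PyAirbyte package exposes its connectors as a library. Here's a minimal sketch, assuming you've installed the airbyte package, using the built-in source-faker connector as a stand-in for whichever data source you actually care about:
import airbyte as ab

# configure a source connector (source-faker just generates sample data)
source = ab.get_source(
    "source-faker",
    config={"count": 100},
    install_if_missing=True,
)
source.check()               # verify the connection and config
source.select_all_streams()  # sync every stream this source exposes
result = source.read()

# each stream becomes an iterable of records you can hand to your LLM pipeline
for record in result["users"]:
    print(record)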
4. Qdrant - Fast Vector Search for LLMs
Ever wondered what happens if you feed too much data into ChatGPT? That's right, you'll hit the context window limit and get an error.
That's because LLMs cannot take in infinite information. To work around that, we need a way to feed in only the relevant information, and this process is known as retrieval-augmented generation (RAG). Here's another great article on what RAG is.
Qdrant is a vector database that helps you do just that. It stores and retrieves relevant information at blazing speed, ensuring your application stays up to date with the real world.
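Here's a minimal sketch of that retrieval step using the qdrant-client Python package against an in-memory instance; the four-dimensional vectors are toy placeholders standing in for real embeddings from an embedding model:
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

# in-memory instance for quick experiments (point this at a server URL in production)
client = QdrantClient(":memory:")

client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
)

# store documents alongside their (toy) embedding vectors
client.upsert(
    collection_name="docs",
    points=[
        PointStruct(id=1, vector=[0.1, 0.2, 0.3, 0.4], payload={"text": "How to become an AI engineer"}),
        PointStruct(id=2, vector=[0.9, 0.1, 0.0, 0.2], payload={"text": "Best pizza places in New York"}),
    ],
)

# retrieve only the most relevant document for a query embedding
hits = client.search(collection_name="docs", query_vector=[0.12, 0.21, 0.29, 0.4], limit=1)
print(hits[0].payload["text"])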
5. MemGPT - Memory Management for LLMs
So Qdrant helps give LLMs "long-term memory", but what happens if there's too much to "remember"? MemGPT helps you manage memory for this exact use case.
MemGPT is like a cache for vector databases, with its own strategy for deciding what stays in that cache and what gets paged out. It helps you manage redundant information in your knowledge bases, making your AI application more performant and accurate.
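To make the cache analogy concrete, here's a toy illustration of the underlying idea (this is not MemGPT's actual API): keep only the most recently used facts in the limited "main context" that gets sent to the LLM, and page everything else out to an external store instead of deleting it.
from collections import OrderedDict

MAIN_CONTEXT_SIZE = 3

main_context = OrderedDict()  # the small working memory sent to the LLM
archival_store = {}           # overflow storage, e.g. a vector database

def remember(key, fact):
    # most recently used facts stay in main context
    main_context[key] = fact
    main_context.move_to_end(key)
    if len(main_context) > MAIN_CONTEXT_SIZE:
        evicted_key, evicted_fact = main_context.popitem(last=False)
        archival_store[evicted_key] = evicted_fact  # page out, don't delete

def recall(key):
    if key in main_context:
        main_context.move_to_end(key)
        return main_context[key]
    return archival_store.get(key)  # fall back to long-term memory

remember("name", "The user's name is Alex.")
remember("goal", "Alex wants to become an AI engineer.")
remember("stack", "Alex builds with Python.")
remember("pet", "Alex has a cat named Turing.")  # evicts the oldest fact
print(list(main_context))  # ['goal', 'stack', 'pet']
print(recall("name"))      # still recoverable from the archival store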
6. LiteLLM - LLM Proxy
LiteLLM is a proxy for multiple LLM providers. It is great for experimentation and, combined with DeepEval, lets you pick the best model for your use case. The best part? It lets you call any model it supports through the same OpenAI-style interface.
from litellm import completion
import os
## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-openai-key"
messages = [{ "content": "Hello, how are you?","role": "user"}]
# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)
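Swapping providers is then just a matter of changing the model string. For example, the same call against Anthropic (assuming you have an Anthropic API key) looks like this:
from litellm import completion
import os

os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"
messages = [{"content": "Hello, how are you?", "role": "user"}]

# same interface, different provider: only the model string changes
response = completion(model="claude-2", messages=messages)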
Closing Remarks
That's all folks, thanks for reading, and I hope you learned a few things along the way!
Please like and comment if you enjoyed this article, and as always, don't forget to give open-source some love by starring their repos as a token of appreciation.
Top comments (23)
Great list! I agree, it's so hard to choose just 5. I'd add usemage.ai/ -> it's not a library per se, but if you want to generate a full React/Node.js app from a short description, this is the best tool out there (and it's free, no OpenAI key required!)
Keep up the great work :)
Great addition!
Many more to go.
aiconfig - github.com/lastmile-ai/aiconfig
weaviate - github.com/weaviate/weaviate
chroma - github.com/chroma-core/chroma
haystack - github.com/deepset-ai/haystack
txtai - github.com/neuml/txtai
Excluded for a reason :) Quality over quantity
Great stuff. Coding with LLMs is just getting started and people need help finding all the great tools.
Would also check out CopilotKit - React library for building in-app chatbots & Textareas.
github.com/CopilotKit/CopilotKit
Nice work!
Here's an open-source project that's helpful for running ML models in the background.
Get job execution reminders via webhook using WebhookPlan.
Seems cool!
Another killer! Thanks for your spam post!
Ur welcome!
Did you take inspiration from the DevSkiller website, maybe?
devskiller.com/
MemGPT being on this list is awesome. It's a nice little cache for your "Vector Databases."
I see what you did there!
I love your banner! Really cool list, thanks for sharing!
Thank you, glad you liked it!
I am using DeepEval for almost all of my AI projects and so far I love it! Honestly love the platform and the intuitive design
Great list.
Anytime :)
Nice list to have some motivation to try new things!
Glad you liked it!