First 15 Open Source Advent projects

#opensource #llm #tutorial #programming

Just 10 days to go!

We launched Open Source Advent at the beginning of this month to celebrate 25 different open source projects. It has been fun sharing these projects, and I thought I would reshare the first 15! Take a look at each repo, try the tutorial, and let us know what you build!

Naturally, everyone who works on these open source projects would love a little Christmas 🎄💕 love in the form of a GitHub star.

1. Milvus by Zilliz | GitHub

Milvus is an open-source vector database that powers embedding similarity search and AI applications and strives to make vector search accessible to every organization. Milvus can store, index, and manage a billion+ embedding vectors generated by deep neural networks and other machine learning (ML) models. It is the project we all work on here at Zilliz, so of course it is on the list. 😇
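If "embedding similarity search" sounds abstract, here is a tiny sketch of the idea in plain Python. It is not the Milvus API: the `cosine_similarity` and `top_k` helpers and the three toy vectors are made up for illustration, and a real vector database would use an index rather than this brute-force scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def top_k(query, vectors, k=2):
    """Brute-force search: score every stored vector, keep the k best."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 3-dimensional "embeddings" (hypothetical values, not model output).
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.3],
}

results = top_k([1.0, 0.0, 0.0], embeddings, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

The scan above touches every vector on every query, which is fine for three items and hopeless for a billion. That gap is the reason dedicated vector databases exist.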

2. FiftyOne by Voxel51 | GitHub | tutorial

FiftyOne is the open source toolkit for building high-quality datasets and computer vision models. With FiftyOne, you can visualize, curate, manage, and QA data, and automate the workflows that make enterprise machine learning possible. They spoke at the last Unstructured Meetup, and you can check out the recording here (29:10 - Speaker Jacob Marks, Vector search with computer vision data using Voxel51).

3. Quivr | GitHub | tutorial

Quivr is a personal productivity assistant that lets you chat with your files (PDF, CSV) and apps using GPT 3.5 / 4 turbo, Anthropic, VertexAI, or private LLMs, and that you can share with users. It is an alternative to OpenAI GPTs.

4. Haystack by Deepset | GitHub | tutorial

Haystack is an end-to-end NLP framework that enables you to build applications powered by LLMs, Transformer models, vector search, and more. Whether you want to perform question answering, answer generation, semantic document search, or build tools capable of complex decision-making and query resolution, you can use state-of-the-art NLP models with Haystack to build end-to-end NLP applications to solve your use case. We have a video on some examples of retrieval augmentation in Haystack.

5. Proton by Timeplus | GitHub | tutorial

Proton is a streaming analytics database, based on ClickHouse and written in C++. Fast, powerful, and easy.

6. Ydata-synthetic and Ydata-profiling by YData | GitHub | tutorial

Ydata-profiling is a Python package for automated Data Quality profiling reports in a single line of code. Ydata-synthetic is a package to generate synthetic tabular and time-series data with state-of-the-art generative models.

7. Apache Flink | GitHub | tutorial

Apache Flink is the leading framework and distributed processing engine for stateful computations over unbounded and bounded data streams.
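To make "stateful computations over data streams" a bit more concrete, here is a minimal sketch in plain Python. It does not use Flink's API: `running_counts` and `click_stream` are hypothetical names, and the stream is a five-item generator standing in for an unbounded source. The point is only that the operator keeps state (a count per key) between events and emits an updated result as each event arrives.

```python
from collections import defaultdict

def running_counts(events):
    """Keep a per-key counter and emit the updated count after every event."""
    state = defaultdict(int)  # the "state" in stateful stream processing
    for key in events:
        state[key] += 1
        yield key, state[key]

def click_stream():
    """Stand-in for a stream source; a real one could be unbounded."""
    for user in ["alice", "bob", "alice", "alice", "bob"]:
        yield user

updates = list(running_counts(click_stream()))
for user, count in updates:
    print(user, count)
```

A real engine adds what this sketch leaves out: distributing the state across machines, checkpointing it so it survives failures, and handling events that arrive late or out of order.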

8. LangChain RB | GitHub | tutorial

LangChain RB is an original LangChain-inspired Ruby framework. The goal is to abstract complexity and difficult concepts to make building AI/ML-supercharged applications approachable for traditional Ruby software engineers. If you are a Ruby fan, we have a video that shows you how to build a GenAI app end-to-end with Ruby.

9. Flyte by Union AI | GitHub | tutorial

Flyte is an open-source orchestrator that facilitates building production-grade data and ML pipelines. It is built for scalability and reproducibility, leveraging Kubernetes as its underlying platform. With Flyte, user teams can construct pipelines using the Python SDK and seamlessly deploy them in both cloud and on-premises environments, enabling distributed processing and efficient resource utilization.

10. DVC by Iterative | GitHub | tutorial

DVC is a command line tool to help you develop reproducible machine learning projects.

But wait, there's more!

11. Phoenix by Arize AI | GitHub | tutorial

Phoenix is Arize AI's open-source observability library designed for experimentation, fine-tuning, and troubleshooting your LLM, CV, and NLP models in a notebook.

12. TruLens by TruEra | GitHub | tutorial

TruLens provides observability for LLM and multimodal apps, with deep instrumentation and comprehensive evals.

13. OpenLLM by BentoML | GitHub | tutorial

OpenLLM is an open-source platform designed to facilitate the deployment and operation of large language models (LLMs) in real-world applications. With OpenLLM, you can run inference on any open-source LLM, deploy it in the cloud or on-premises, and build powerful AI applications.

14. Label Studio by HumanSignal | GitHub | tutorial

A flexible data labeling tool for all data types. Prepare training data for computer vision, natural language processing, speech, voice, and video models.