COMMUNITY
We had so much fun at the meetup this week in Palo Alto and can't wait to see you all again next month. We haven't had the chance to upload the video yet, but the Berlin and SF videos are up for your viewing pleasure.
Learn About Vector Databases
There are so many databases with vector search capabilities that it can be overwhelming to know where to start! This week, let's focus on learning about similarity metrics and the difference between sparse and dense vectors, and get our hands dirty with some tutorials.
- Similarity Metrics for Vector Search. Metrics like Euclidean distance or cosine similarity measure how closely vectors relate to each other in high-dimensional space. Choosing an appropriate metric is crucial, as it can significantly affect the performance of machine learning tasks such as classification and clustering.
- Getting Started: Pgvector Guide for Developers Exploring Vector Databases. If you are a Postgres fan, you can build a little prototype with pgvector.
- Beginner Guide to Implementing Vector Databases, including key considerations and steps to get started with a vector database and implementation best practices.
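To make the two metrics mentioned above concrete, here is a minimal sketch in plain Python. The function names and the toy vectors are made up for illustration and are not taken from any of the linked guides. It shows that Euclidean distance and cosine similarity can disagree about which vector is "closest" to a query.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors (lower = closer)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (higher = closer)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors, chosen only to show the two metrics disagreeing.
query = [1.0, 0.0]
same_direction = [3.0, 0.0]  # points the same way as the query, but is longer
nearby = [1.0, 1.0]          # closer in space, but at a 45 degree angle

for name, vec in [("same_direction", same_direction), ("nearby", nearby)]:
    print(name, euclidean_distance(query, vec), cosine_similarity(query, vec))
```

Euclidean distance ranks `nearby` first because it is sensitive to vector length, while cosine similarity ranks `same_direction` first because it only looks at direction. Which behavior you want depends on how your embeddings were produced.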
Get Started with Milvus
Milvus is an open-source vector database that is a popular choice for building all kinds of AI applications.
- Getting Started with a Milvus Connection. It comes with everything you need to get started built right in, and runs on your local machine.
- JSON Metadata Filtering in Milvus is useful when you want to use data other than vectors to fine-tune your search results.
- Hybrid Search with Milvus is another example of using different kinds of vectors and metadata to get the best search results.
- Multimodal RAG with CLIP, Llama3, and Milvus is all the rage! Try this tutorial to see the power of multimodal search.
Vector Embeddings
In general, there are two types of vectors: dense vectors and sparse vectors. While they can be utilized for similar tasks, each has advantages and disadvantages.
- Sparse and Dense Embeddings
- Mastering BM25: A Deep Dive into the Algorithm and Application in Milvus
- Comparing SPLADE Sparse Vectors with BM25
- Exploring BGE M3: The Future of Information Retrieval with Milvus
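As a rough illustration of the dense versus sparse distinction, the sketch below stores a sparse vector as a dictionary of non-zero entries and a dense vector as a full list. The representation, the helper names, and the numbers are assumptions made for this example; real libraries and the linked articles may use different formats.

```python
def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {dimension_index: weight}."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller dictionary
    return sum(weight * b[index] for index, weight in a.items() if index in b)

def to_dense(sparse, dim):
    """Expand a sparse vector into a dense list of length dim."""
    dense = [0.0] * dim
    for index, weight in sparse.items():
        dense[index] = weight
    return dense

def dense_dot(a, b):
    """Dot product of two dense vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical vectors over a 12-dimensional vocabulary: only a few entries are non-zero.
doc = {3: 1.5, 10: 0.5}
query = {3: 2.0, 7: 1.0}

sparse_score = sparse_dot(doc, query)
dense_score = dense_dot(to_dense(doc, 12), to_dense(query, 12))
print(sparse_score, dense_score)
```

Both scores are identical, but the sparse form only stores and touches the non-zero entries, which matters when the dimensionality is the size of a vocabulary. Dense embeddings are much shorter but have a value in every position.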
You can also train your own models, learn more about sentence transformers and even give time series embedding a go!
- Training Your Own Text Embedding Model
- All MPNET Base v2: Enhancing Sentence Embedding with AI
- Time Series Embedding Data Analysis
Vector Indexes
Most vector search solutions rely on HNSW, but there are many other vector indexes, and understanding the differences will help you create a performant and cost-effective AI application. Here are two that you might not have heard about yet:
Learn RAG
Chunking Strategies
- Guide to Chunking Strategies for RAG
- Beginner Guide to Website Chunking and Embedding for Your GenAI Applications
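If you have not chunked text before, one common baseline is fixed-size chunking with overlap. The sketch below is a minimal version that splits on whitespace. The function name `chunk_text` and its parameters are hypothetical, and the guides above cover more sophisticated strategies.

```python
def chunk_text(text, chunk_size, overlap):
    """Split text into chunks of up to chunk_size words, where each chunk
    repeats the last `overlap` words of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reached the end of the text
    return chunks

example_chunks = chunk_text("a b c d e f g", chunk_size=4, overlap=2)
print(example_chunks)
```

The overlap keeps a sentence that straddles a boundary from being cut off in both chunks, at the cost of storing and embedding some words twice.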
Optimizing your RAG applications
- Retrieval-Augmented Generation with Citations
- RAG Evaluation Using Ragas
- Building RAG Apps Without OpenAI Part I
- How LangChain Implements Self-Querying
- Optimize RAG with Rerankers: The Role and Tradeoffs
More cool tutorials on agents with Llama 3
- Local Agentic RAG with LangGraph and Llama3
- A Beginner's Guide to Using Llama 3 with Ollama, Milvus, LangChain
GITHUB REPOS
Milvus. Milvus is an open-source vector database built to power embedding similarity search and AI applications.
Akcio: Enhancing LLM-Powered ChatBot with CVP Stack. A full, open-source chatbot app for you to try out yourself!
GPTCache. GPTCache is an open-source tool designed to improve the efficiency and speed of GPT-based applications by implementing a cache to store the responses generated by language models.
VectorDBBench. VectorDBBench is an open-source benchmarking tool to help you evaluate the performance of mainstream vector databases and cloud services with your specific use case.