COMMUNITY
Join us at one of our Unstructured Data Meetups! SF Unstructured Data Meetup - November 14, 2023 - watch the video
- Mihail Eric, Founder, Storia.ai
- Jacob Marks, MLE/DevEvangelist, Voxel51
- Josh Reini, Data Scientist/DevRel, TruEra
The next one in San Francisco is on Jan 16, 2024. Please register early because they fill up fast! We will be joined by
- Jack Retterer, DevRel, Unstructured.io
- George Williams, Organizer, Big-ANN NeurIPS 2023
Milvus versions 2.3.2 and 2.3.3 are here!
New features
🔍 Support for Array Data Types - Milvus now supports Array data types, allowing for precise metadata filtering. For example, in e-commerce, this enables advanced searches based on multiple product tags, ensuring that search results are highly relevant to user queries.
🧹 Support for Complex Delete Expressions - With Milvus 2.3.2 or 2.3.3, developers can specify detailed criteria for data removal, enabling precise cleanup such as rolling off old data or GDPR-driven deletion based on user IDs. Note that deletion is not atomic, so use it with care when you need precise data management.
🗂️ TiKV Integration for Metadata Storage - By integrating TiKV for metadata storage, Milvus gains improved scalability and stability. TiKV's architecture is designed to handle large-scale metadata storage efficiently, ensuring that Milvus can scale to meet the demands of growing datasets without sacrificing stability.
🌀 Support for FP16 Vector Type - Milvus's support for the FP16 vector type enhances machine learning efficiency. This data format is widely used in deep learning and ML for its ability to represent and process numerical values more efficiently, resulting in faster and more resource-efficient machine learning operations.
📊 Support for Vector Index MMAP - Go beyond mapping just the raw data, as in 2.3.0; now you can also memory-map the index. This feature lets you store more data on the same machine and gives you flexibility in data storage while saving costs.
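To make the FP16 point above concrete, here is a minimal sketch using only Python's standard library. It illustrates the half-precision format itself, not the Milvus API: the embedding values and helper names are made up for illustration. It packs the same vector as 32-bit and as 16-bit floats, then compares the storage size and the rounding error.

```python
import struct


def pack_vector(values, fmt):
    """Serialize floats as little-endian bytes ('f' = FP32, 'e' = FP16)."""
    return struct.pack(f"<{len(values)}{fmt}", *values)


def unpack_vector(data, fmt):
    """Inverse of pack_vector for the same format code."""
    count = len(data) // struct.calcsize(f"<{fmt}")
    return list(struct.unpack(f"<{count}{fmt}", data))


# Hypothetical 4-dimensional embedding, used only for illustration.
embedding = [0.1, -0.25, 0.5, 0.333]

fp32_bytes = pack_vector(embedding, "f")
fp16_bytes = pack_vector(embedding, "e")

# FP16 uses 2 bytes per value instead of 4, so storage is halved.
print(len(fp32_bytes), len(fp16_bytes))

# The price is precision: values are rounded to the nearest FP16 number.
roundtrip = unpack_vector(fp16_bytes, "e")
max_error = max(abs(a - b) for a, b in zip(embedding, roundtrip))
print(max_error)
```

Whether that rounding error is acceptable depends on the model and the similarity metric, so it is worth checking recall on your own data before switching vector types.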
ARTICLES
Natural Language Processing (NLP)
- An Introduction to Natural Language Processing
- Top 20 NLP Models to Empower Your ML Application
- Tokens, N-Grams, and Bag-of-Words Models
- Primer on Neural Networks and Embeddings for Language Models
RAG
- Do We Still Need Vector Databases for RAG with OpenAI's Releasing of Its Built-In Retrieval?
- Grounding Our Chat Towards Data Science Results
- How LangChain Implements Self Querying
TUTORIAL
Getting Started with a Milvus Connection
Milvus has SDKs for several languages, including Go, Java, Python, React, and Ruby. In this blog, we’ll show the steps for Python.
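As a rough idea of what the Python steps look like, here is a minimal sketch using the pymilvus package. It assumes pymilvus is installed and that a Milvus instance is reachable at localhost on port 19530; the function name and default values are illustrative, not taken from the linked post.

```python
def connect_and_get_version(host="localhost", port="19530", alias="default"):
    """Connect to a Milvus server, return its version string, then disconnect."""
    # Imported here so the sketch can be loaded before pymilvus is installed.
    from pymilvus import connections, utility

    # Register and open a connection under the given alias.
    connections.connect(alias=alias, host=host, port=port)
    try:
        # Ask the server for its version as a simple connectivity check.
        return utility.get_server_version(using=alias)
    finally:
        # Always release the connection, even if the check fails.
        connections.disconnect(alias)
```

With a server running, calling `connect_and_get_version()` should return the server's version string. If the host or port differs in your setup, pass them as arguments.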
Using AI to Find Your Celebrity Stylist. In this tutorial, you will learn how to use a fine-tuned model to segment clothing in images. You will then crop out each labeled article and resize the images to the same size. Finally, you will store the embeddings generated from those images in Milvus, an open-source vector database.
VIDEOS
Frank's RedHot Takes
- Why Open Source is the future of AI Data Infrastructure
- Are Vector Databases dangerous?
- Vector Database and Search
- OpenAI DevDay 2023 Retrieval API
GITHUB REPOS
Milvus Vector Database. Milvus is an open source vector database used to store, index, and manage massive embedding vectors generated by deep neural networks and other machine learning (ML) models.
GPTCache. GPTCache is an open-source tool designed to improve the efficiency and speed of GPT-based applications by implementing a cache to store the responses generated by language models.
VectorDBBench. VectorDBBench is an open-source benchmarking tool to help you evaluate the performance of mainstream vector databases and cloud services with your specific use case.