In Part 1, we set up PostgreSQL with pgvector. Now, let's see how vector search actually works.
Contents
- What are Embeddings?
- Loading Sample Data
- Exploring Vector Search
- Understanding PostgreSQL Operators
- Next Steps
What are Embeddings?
An embedding is a list of numbers (a vector) that captures the meaning of a piece of content. The distance between two embeddings indicates how similar the content is: a small distance means the items are closely related, and a large distance means they are less related.
📚 Book A: Web Development (Distance: 0.2) ⬅️ Very Similar!
📚 Book B: JavaScript 101 (Distance: 0.3) ⬅️ Similar!
📚 Book C: Cooking Recipes (Distance: 0.9) ❌ Not Similar
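To make "distance" concrete, here is a minimal Python sketch that measures the straight-line (Euclidean) distance between toy 3-dimensional vectors. The vectors are invented for illustration, so the printed distances will not match the numbers in the picture above; only the ordering is the point.

```python
import math

# Toy 3-dimensional "embeddings". These values are invented for illustration;
# real embeddings come from a model and have many more dimensions.
books = {
    "Web Development": [0.9, 0.8, 0.1],
    "JavaScript 101": [0.8, 0.9, 0.2],
    "Cooking Recipes": [0.1, 0.2, 0.9],
}


def euclidean_distance(a, b):
    """Straight-line (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = books["Web Development"]
distances = {title: euclidean_distance(query, vec) for title, vec in books.items()}

# Print the books from most similar to least similar.
for title, dist in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{title}: {dist:.2f}")
```

The two programming books end up close together, while the cookbook lands much farther away.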
Loading Sample Data
Now, let's populate our database with some data. We'll use:
- Open Library API for book data
- OpenAI API to create embeddings
- pgvector to store and search them
Project Structure
pgvector-setup/             # From Part 1
├── compose.yml
├── postgres/
│   └── schema.sql
├── .env                    # New: for API keys
└── scripts/                # New: for data loading
    ├── requirements.txt
    ├── Dockerfile
    └── load_data.py
Create a Script
Let's start with a script to load data from external APIs. The full script is available here.
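As a rough idea of what such a loader involves, here is a minimal sketch using only the standard library. It is not the linked script: the search endpoint parameters, the embedding model name, the text that gets embedded, and the `items (name, metadata, embedding)` column layout (inferred from the queries later in this post) are all assumptions.

```python
import json
import os
import urllib.parse
import urllib.request

# Assumed endpoints and model name; the real script may use different ones.
OPEN_LIBRARY_SEARCH = "https://openlibrary.org/search.json"
OPENAI_EMBEDDINGS = "https://api.openai.com/v1/embeddings"
EMBEDDING_MODEL = "text-embedding-3-small"  # assumption: a 1536-dimension model


def fetch_books(query, limit=10):
    """Fetch book records from the Open Library search API."""
    params = urllib.parse.urlencode({"q": query, "limit": limit})
    with urllib.request.urlopen(f"{OPEN_LIBRARY_SEARCH}?{params}") as resp:
        return json.load(resp)["docs"]


def book_to_text(doc):
    """Build the text to embed from a book record (a hypothetical format)."""
    title = doc.get("title", "")
    authors = ", ".join(doc.get("author_name", []))
    return f"{title} by {authors}" if authors else title


def embed(text):
    """Request an embedding for `text` from the OpenAI embeddings endpoint."""
    body = json.dumps({"model": EMBEDDING_MODEL, "input": text}).encode()
    req = urllib.request.Request(
        OPENAI_EMBEDDINGS,
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"][0]["embedding"]


def to_vector_literal(values):
    """Format floats as pgvector's text input form, e.g. [0.1,0.2]."""
    return "[" + ",".join(str(v) for v in values) + "]"


def insert_book(cur, doc, embedding):
    """Insert one book via a DB-API cursor that uses %s placeholders (e.g. psycopg2)."""
    cur.execute(
        "INSERT INTO items (name, metadata, embedding) VALUES (%s, %s, %s::vector)",
        (doc.get("title"), json.dumps(doc), to_vector_literal(embedding)),
    )
```

A loader built this way would loop over `fetch_books("programming")`, call `embed(book_to_text(doc))` for each record, and pass the result to `insert_book` with a cursor from your PostgreSQL driver.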
Setting Up Data Loading
Create .env:
OPENAI_API_KEY=your_openai_api_key
Update compose.yml to add the data loader:
services:
  # ... existing db service from Part 1
  data_loader:
    build:
      context: ./scripts
    environment:
      - DATABASE_URL=postgresql://postgres:password@db:5432/example_db
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    depends_on:
      - db
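Inside the container, the loader sees these two settings as ordinary environment variables. The sketch below shows one way a script could read and validate them. The `load_settings` helper is hypothetical and not part of the original script. The empty-key check is there because Compose substitutes an empty string when `OPENAI_API_KEY` is missing from .env.

```python
import os
from urllib.parse import urlparse


def load_settings(env=os.environ):
    """Read the settings that compose.yml passes to the data_loader service."""
    database_url = env["DATABASE_URL"]  # raises KeyError if the variable is missing
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is empty; check your .env file")
    parsed = urlparse(database_url)
    return {
        "host": parsed.hostname,
        "port": parsed.port,
        "user": parsed.username,
        "password": parsed.password,
        "dbname": parsed.path.lstrip("/"),
        "openai_api_key": api_key,
    }


# Example using the same values as the compose.yml above.
settings = load_settings({
    "DATABASE_URL": "postgresql://postgres:password@db:5432/example_db",
    "OPENAI_API_KEY": "your_openai_api_key",
})
print(settings["host"], settings["port"], settings["dbname"])
```

Note that the host is `db`, the Compose service name, rather than `localhost`, because the loader reaches the database over the Compose network.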
Load the data:
docker compose up data_loader
You should see 10 programming books with their metadata.
Exploring Vector Search
Connect to your database:
docker exec -it pgvector-db psql -U postgres -d example_db
Understanding Vector Data
Let's peek at what embeddings actually look like:
-- View the first 5 dimensions of an embedding
-- (pgvector can cast a vector to real[], which supports array slicing)
SELECT
  name,
  (embedding::real[])[1:5] AS first_5_dimensions
FROM items
LIMIT 1;
- Each embedding has 1536 dimensions (using OpenAI's model)
- Values typically range from -1 to 1
- These numbers represent semantic meaning
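If you pull an embedding into Python, you can check these properties yourself. pgvector prints a vector as text such as `[0.1,-0.2]`, which `json.loads` can normally read as a list of numbers. The sketch below uses a made-up 4-dimensional sample; `parse_vector` and `describe` are hypothetical helper names.

```python
import json
import math


def parse_vector(text):
    """Parse pgvector's text form, e.g. '[0.1,-0.2]', into a list of floats."""
    return [float(x) for x in json.loads(text)]


def describe(vector):
    """Summarize a vector: dimension count, value range, and L2 norm (length)."""
    return {
        "dimensions": len(vector),
        "min": min(vector),
        "max": max(vector),
        "norm": math.sqrt(sum(x * x for x in vector)),
    }


# Made-up 4-dimensional sample; a real row from `items` would have 1536 values.
sample = parse_vector("[0.5,-0.5,0.5,-0.5]")
print(describe(sample))
```

For a real row, `dimensions` should come out as 1536, and `min` and `max` show where the values actually fall.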
Finding Similar Books
Try a simple similarity search:
-- Find 3 books similar to any book about Web
SELECT name, metadata
FROM items
ORDER BY embedding <-> (
  SELECT embedding
  FROM items
  WHERE metadata->>'title' LIKE '%Web%'
  LIMIT 1
)
LIMIT 3;
Here's what the query does:
- Finds a book with "Web" in its title
- Gets that book's embedding (its mathematical representation)
- Compares this embedding with every book's embedding
- Returns the 3 closest books (smallest distances). The reference book itself comes first, since its distance to itself is 0
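The same steps can be sketched in plain Python as a brute-force search. The rows and their 3-dimensional embeddings are made up for illustration, and the `title` key stands in for `metadata->>'title'`.

```python
import math

# Hypothetical rows standing in for the `items` table; embeddings are made up.
items = [
    {"title": "Web Development", "embedding": [0.9, 0.8, 0.1]},
    {"title": "JavaScript 101", "embedding": [0.8, 0.9, 0.2]},
    {"title": "Python Basics", "embedding": [0.6, 0.5, 0.4]},
    {"title": "Cooking Recipes", "embedding": [0.1, 0.2, 0.9]},
]


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similar_to_title(rows, keyword, limit=3):
    # Steps 1 and 2: find a book whose title contains the keyword, take its embedding.
    reference = next((row for row in rows if keyword in row["title"]), None)
    if reference is None:
        return []
    # Step 3: compute the distance from the reference to every row (itself included).
    scored = [(l2_distance(row["embedding"], reference["embedding"]), row["title"])
              for row in rows]
    # Step 4: keep the rows with the smallest distances.
    scored.sort()
    return [title for _, title in scored[:limit]]


print(similar_to_title(items, "Web"))
```

The reference book shows up first in the result because its distance to itself is 0. The database does the same comparison for you, and an index can later avoid scanning every row.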
Understanding PostgreSQL Operators
Let's break down the operators used in vector search queries:
JSON Text Operator: ->>
Extracts a field from a JSON value and returns it as text.
Example:
-- If metadata = {"title": "ABC"}, it returns "ABC"
SELECT metadata->>'title' FROM items;
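PostgreSQL also has a single-arrow operator, `->`, which returns the field as JSON instead of text. As a loose Python analogy (the function names are hypothetical, and this only mimics the behavior for simple object keys):

```python
import json

metadata = '{"title": "ABC", "year": 2020}'


def arrow(doc, key):
    """Rough analogue of `->`: return the field still encoded as JSON."""
    obj = json.loads(doc)
    return json.dumps(obj[key]) if key in obj else None


def arrow_text(doc, key):
    """Rough analogue of `->>`: return the field as plain text."""
    value = json.loads(doc).get(key)
    if value is None:
        return None
    return value if isinstance(value, str) else json.dumps(value)


print(arrow(metadata, "title"))       # JSON string, quotes included
print(arrow_text(metadata, "title"))  # plain text, no quotes
```

The text form is what you want for comparisons such as `LIKE '%Web%'`, which is why the search query uses `->>`.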
Vector Distance Operator: <->
Measures the Euclidean (L2) distance between two vectors.
- Smaller distance = More similar
- Larger distance = Less similar
Example:
-- Find similar books
-- (query_embedding is a placeholder for the vector you are searching with)
SELECT name, embedding <-> query_embedding AS distance
FROM items
ORDER BY distance
LIMIT 3;
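In application code, the query vector is usually passed in as a parameter. The sketch below assumes a DB-API cursor with `%s` placeholders (psycopg2 style) and sends the vector in pgvector's text form. The `search` helper is hypothetical.

```python
def to_vector_literal(values):
    """Format floats as pgvector's text input form, e.g. [0.1,0.2]."""
    return "[" + ",".join(str(v) for v in values) + "]"


SEARCH_SQL = """
SELECT name, embedding <-> %s::vector AS distance
FROM items
ORDER BY distance
LIMIT %s;
"""


def search(cur, query_embedding, limit=3):
    """Return (name, distance) rows for the items closest to query_embedding."""
    cur.execute(SEARCH_SQL, (to_vector_literal(query_embedding), limit))
    return cur.fetchall()


# Possible usage with psycopg2 (an assumption; not run here):
#   import os, psycopg2
#   with psycopg2.connect(os.environ["DATABASE_URL"]) as conn, conn.cursor() as cur:
#       for name, distance in search(cur, my_query_embedding):
#           print(name, distance)
```

Passing the vector as a parameter instead of formatting it into the SQL string keeps the query safe to reuse with any input.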
Next Steps
Up next, we'll:
- Build a FastAPI application
- Create search endpoints
- Make our vector search accessible via API
Stay tuned for Part 3: "Building a Vector Search API"!
Feel free to drop a comment below! 💬