What is a Vector Database?
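A vector database is a specialized database designed to store, index, and query high-dimensional vectors efficiently. Unlike traditional databases, which excel at exact matches, a vector database returns the items whose vectors are closest to a query vector under a distance metric such as cosine similarity or Euclidean distance.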
Why Do We Need Them?
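Traditional databases can't search vectors efficiently: their indexes (B-trees, hash indexes) are built for equality and range lookups, not for distance. Without a vector index, finding the nearest neighbors among millions of 1536-dimensional vectors means comparing the query against every stored vector, so practical systems rely on specialized algorithms.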
| | Traditional DB | Vector DB |
|---|---|---|
| Typical query | Exact match: WHERE id = 123 | Similarity search: find the top 10 nearest |
| Trade-off | Fast for equality, useless for similarity | Optimized for approximate nearest neighbor |
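To make the cost of searching without a vector index concrete, here is a minimal sketch of exact (brute-force) search in plain Python. The function names, the toy dimensionality of 8, and the random data are assumptions for illustration only; this is not how any particular database implements search.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_top_k(query, vectors, k):
    """Score every stored vector against the query and keep the k best.

    Cost is O(N * d) per query: one full pass over all N vectors.
    """
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(reverse=True)
    return [(i, score) for score, i in scored[:k]]


random.seed(0)
dim = 8  # toy size (assumption); real embeddings are often 1536-dimensional
vectors = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(1000)]
query = vectors[42]  # a stored vector is its own nearest neighbor

top = brute_force_top_k(query, vectors, k=3)
```

Every query touches all 1000 vectors here. The same loop over millions of 1536-dimensional vectors is the work that a vector index is designed to avoid.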
How They Work
Indexing Algorithms
Vector databases use specialized indexes:
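- HNSW (Hierarchical Navigable Small World) - Graph-based, the most widely used
- IVF (Inverted File Index) - Cluster-based; searches only the clusters nearest the query
- PQ (Product Quantization) - Compression-based; often combined with IVF to reduce memory use
- Annoy - Tree-based library, by Spotify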
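As an illustration of the cluster-based idea behind IVF, the toy sketch below groups vectors with a few rounds of k-means and then searches only the clusters closest to the query. The names `build_ivf`, `ivf_search`, `n_lists`, and `n_probe` are hypothetical and chosen for this example; production indexes are considerably more elaborate.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(vector, centroids):
    return min(range(len(centroids)),
               key=lambda c: squared_distance(vector, centroids[c]))


def assign(vectors, centroids):
    """Build inverted lists: for each centroid, the ids of the vectors nearest to it."""
    lists = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        lists[nearest_centroid(v, centroids)].append(i)
    return lists


def build_ivf(vectors, n_lists, iterations=5, seed=0):
    """Cluster vectors with a few k-means rounds; return centroids and inverted lists."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_lists)]
    dim = len(vectors[0])
    for _ in range(iterations):
        for c, members in enumerate(assign(vectors, centroids)):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [
                    sum(vectors[i][d] for i in members) / len(members)
                    for d in range(dim)
                ]
    return centroids, assign(vectors, centroids)


def ivf_search(query, vectors, centroids, lists, k, n_probe):
    """Scan only the n_probe clusters whose centroids are closest to the query."""
    order = sorted(range(len(centroids)),
                   key=lambda c: squared_distance(query, centroids[c]))
    candidates = [i for c in order[:n_probe] for i in lists[c]]
    candidates.sort(key=lambda i: squared_distance(query, vectors[i]))
    return candidates[:k]


rng = random.Random(1)
vectors = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(500)]
centroids, lists = build_ivf(vectors, n_lists=10)

query = vectors[7]
approx = ivf_search(query, vectors, centroids, lists, k=5, n_probe=2)
exact = ivf_search(query, vectors, centroids, lists, k=5, n_probe=10)  # probing every list scans everything
```

Raising `n_probe` scans more clusters, which improves the chance of finding every true neighbor at the cost of more distance computations.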
ANN vs Exact Search
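Most vector databases use Approximate Nearest Neighbor (ANN) search. It is much faster than exact search because it examines only a fraction of the stored vectors, at the cost of occasionally missing a true neighbor. Accuracy is usually reported as recall (the share of true nearest neighbors that the search returns), and most indexes expose parameters that trade recall against speed.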
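The accuracy side of this trade-off can be checked by comparing an ANN result with the result of an exact scan. The sketch below computes recall for a single query; the function name `recall_at_k` and the id lists are made-up illustrations, not measurements from any real system.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k neighbors that the approximate search returned."""
    if not exact_ids:
        raise ValueError("exact_ids must not be empty")
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


exact_ids = [17, 4, 92, 8, 33]    # hypothetical ground truth from an exact scan
approx_ids = [17, 4, 92, 51, 33]  # hypothetical ANN result: one true neighbor missed

recall = recall_at_k(approx_ids, exact_ids)  # 4 of the 5 true neighbors found
```

In practice recall is averaged over many queries, and index parameters are tuned until it reaches an acceptable level for the application.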
Popular Vector Databases
| Database | Type | Best For |
|---|---|---|
| Pinecone | Managed | Production, ease of use |
| Weaviate | Open Source | Hybrid search, GraphQL |
| Qdrant | Open Source | Performance, filtering |
| Milvus | Open Source | Scale, enterprise |
| Chroma | Open Source | Local dev, Python |
| pgvector | Extension | PostgreSQL users |
Basic Operations
1. Insert Vectors
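# Pinecone example (assumes a recent pinecone Python client and an existing index)
from pinecone import Pinecone

pc = Pinecone(api_key="...")
index = pc.Index("...")  # name of your index

index.upsert(vectors=[
    {"id": "vec1", "values": [0.1, 0.2, ...], "metadata": {"text": "..."}},
    {"id": "vec2", "values": [0.3, 0.4, ...], "metadata": {"text": "..."}}
])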
2. Query Similar Vectors
results = index.query(
    vector=[0.1, 0.2, ...],
    top_k=10,
    include_metadata=True
)
# Matches are in results["matches"]: [{"id": "vec1", "score": 0.95, ...}, ...]
3. Filter Results
# Only vectors whose metadata has category == "science" are considered
results = index.query(
    vector=[0.1, 0.2, ...],
    top_k=10,
    filter={"category": "science"}
)
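Conceptually, a filtered query restricts the candidates to records whose metadata matches and then ranks what remains by similarity. The plain-Python sketch below shows those semantics only; `filtered_top_k` and the sample records are hypothetical, and real databases typically apply filters inside the index rather than with a full scan.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def filtered_top_k(query, records, metadata_filter, k):
    """Keep records whose metadata equals every filter key/value, then rank by dot product."""
    matching = [
        r for r in records
        if all(r["metadata"].get(key) == value
               for key, value in metadata_filter.items())
    ]
    matching.sort(key=lambda r: dot(query, r["values"]), reverse=True)
    return [r["id"] for r in matching[:k]]


records = [
    {"id": "a", "values": [1.0, 0.0], "metadata": {"category": "science"}},
    {"id": "b", "values": [0.9, 0.1], "metadata": {"category": "sports"}},
    {"id": "c", "values": [0.2, 0.8], "metadata": {"category": "science"}},
]

# "b" scores higher than "c" but is excluded by the filter
ids = filtered_top_k([1.0, 0.0], records, {"category": "science"}, k=2)
```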
How to Choose
Choose Managed (Pinecone) if:
- You want zero infrastructure management
- You need enterprise support
- Budget allows for managed services
Choose Open Source (Qdrant, Weaviate) if:
- You want to self-host
- You need full control
- Cost is a concern at scale
Choose pgvector if:
- You already use PostgreSQL
- You want vectors alongside relational data
- Scale is moderate (as a rough guide, <1M vectors)
Next Steps
- Vector DB Comparison - Detailed feature comparison
- Search Tutorial - Build your first search
- RAG - Use vector DBs with LLMs