What Are Vector Databases and Why Do AI Apps Need Them?
A vector database stores numerical representations of data — called embeddings — and retrieves them based on similarity rather than exact matches. When an AI application needs to find "documents about contract risk" without those exact words appearing, a vector database finds them in milliseconds. That capability is what makes modern RAG assistants, semantic search, and recommendation systems work.
What Is a Vector, Exactly?
Every piece of content (a sentence, an image, a product description) can be converted into a list of numbers called a vector or embedding. A sentence embedding might have 1,536 numbers (OpenAI's text-embedding-3-small output size). No single number is meaningful on its own; together they encode the content's meaning as a position in a high-dimensional space.
Vectors that represent similar meaning end up numerically close to each other in that high-dimensional space. "Lease termination clause" and "how to end a rental agreement" produce vectors only a small angular distance apart, even though they share no words.
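To make "a small angular distance apart" concrete, here is a minimal sketch of cosine similarity in plain Python. The three 4-dimensional vectors are made up for illustration; real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, invented for this example (not model output).
lease_clause = [0.9, 0.1, 0.3, 0.0]   # "Lease termination clause"
end_rental = [0.8, 0.2, 0.4, 0.1]     # "how to end a rental agreement"
pasta_recipe = [0.0, 0.9, 0.1, 0.8]   # unrelated text

similar = cosine_similarity(lease_clause, end_rental)
unrelated = cosine_similarity(lease_clause, pasta_recipe)
print(f"similar pair:   {similar:.3f}")
print(f"unrelated pair: {unrelated:.3f}")
```

A score near 1 means the vectors point in almost the same direction; a score near 0 means they are close to orthogonal.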
Three things define a useful embedding:
All embeddings from the same index must be produced by the same model. Mixing models in one collection breaks retrieval quality completely — the numbers become incomparable.
How a Vector Database Is Different From a Relational or Document Database
Traditional databases answer questions like "find all rows where status = 'active' and date > 2026-01-01." They excel at exact lookups and structured filters.
Vector databases answer questions like "find the 10 items most similar to this query." That requires a different data structure, typically an Approximate Nearest Neighbor (ANN) index, because brute-force comparison across millions of 1,536-dimension vectors takes hundreds of milliseconds or more per query, not single-digit milliseconds.
| Feature | Relational DB (PostgreSQL) | Document DB (MongoDB) | Vector DB (Qdrant, Pinecone) |
|---|---|---|---|
| Query type | Exact match, range | Field lookup, full-text | Semantic similarity (ANN) |
| Data shape | Tables, rows | JSON documents | Embedding vectors + metadata |
| Typical latency | <5 ms | <10 ms | 5–50 ms at scale |
| Best for | Transactions, reports | Flexible schemas | AI retrieval, recommendations |
| Filtering | SQL WHERE | Query operators | Metadata pre-filter + ANN |
The Core Operations in a Vector Database
Once you understand these four operations, every vector database product becomes easier to evaluate:
Always store the source document ID in the vector's metadata payload. When content changes, you need a reliable way to find and replace the old vector — or you'll end up serving stale results.
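The advice about source document IDs can be sketched with a tiny in-memory store. `TinyVectorStore` and its method names are hypothetical and exist only for this illustration; real products expose their own upsert and delete-by-filter calls.

```python
class TinyVectorStore:
    """Illustrative in-memory store; not a real vector database client."""

    def __init__(self):
        self._points = {}  # point_id -> (vector, payload)

    def upsert(self, point_id, vector, payload):
        """Insert a point, or overwrite the point with the same ID."""
        self._points[point_id] = (list(vector), dict(payload))

    def delete_by_source(self, source_id):
        """Remove every point whose payload carries this source_id."""
        stale = [
            pid
            for pid, (_, payload) in self._points.items()
            if payload.get("source_id") == source_id
        ]
        for pid in stale:
            del self._points[pid]
        return len(stale)

    def replace_document(self, source_id, chunks):
        """Drop all old chunks of a document, then write the new ones.

        chunks is a list of (vector, text) pairs.
        """
        removed = self.delete_by_source(source_id)
        for i, (vector, text) in enumerate(chunks):
            payload = {"source_id": source_id, "text": text}
            self.upsert(f"{source_id}#{i}", vector, payload)
        return removed

    def count(self):
        return len(self._points)


store = TinyVectorStore()
store.replace_document("doc-42", [([0.1, 0.2], "v1 a"), ([0.3, 0.4], "v1 b"), ([0.5, 0.6], "v1 c")])
# The document is edited and now splits into two chunks instead of three.
store.replace_document("doc-42", [([0.1, 0.3], "v2 a"), ([0.2, 0.4], "v2 b")])
print(store.count())  # 2
```

Deleting by `source_id` before writing matters here: overwriting by point ID alone would leave the third chunk of the old version in the index.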
Why ANN Indexing Matters
Brute-force similarity search compares a query against every stored vector. At 1 million vectors of 1,536 dimensions, that means over 1.5 billion multiplications per query. Even on fast hardware, that hits 500 ms easily.
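The arithmetic behind that figure, together with a minimal exact-search sketch in plain Python, looks like this. It scores by dot product, which ranks the same way as cosine similarity when all vectors are normalized to unit length; the sample vectors are made up.

```python
import heapq


def brute_force_top_k(query, vectors, k):
    """Exact search: score every stored vector, keep the k best.

    Returns (score, index) pairs, highest score first.
    """
    scored = (
        (sum(q * v for q, v in zip(query, vec)), idx)
        for idx, vec in enumerate(vectors)
    )
    return heapq.nlargest(k, scored)


# Cost of one query: one multiplication per dimension per stored vector.
num_vectors = 1_000_000
dimensions = 1_536
print(f"{num_vectors * dimensions:,}")  # prints 1,536,000,000

# Tiny made-up collection to show the call shape.
vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(brute_force_top_k([1.0, 0.0], vectors, k=2))
```

The work grows linearly with both collection size and dimension, which is the cost an ANN index avoids by visiting only a small fraction of the vectors.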
ANN indexes trade a small accuracy loss (~1–5%) for 100x–1,000x speed improvement. The major index types:
For most production RAG systems with fewer than 10 million chunks, HNSW with cosine similarity delivers sub-20 ms queries without tuning.
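The accuracy loss an ANN index trades away is commonly reported as recall@k: the share of the true top-k neighbors that the approximate search also returned. A minimal sketch, using made-up result IDs:

```python
def recall_at_k(exact_ids, approx_ids, k):
    """Fraction of the exact top-k results that the ANN search also found."""
    if k <= 0:
        raise ValueError("k must be positive")
    exact_top = set(exact_ids[:k])
    if not exact_top:
        raise ValueError("exact_ids must not be empty")
    approx_top = set(approx_ids[:k])
    return len(exact_top & approx_top) / len(exact_top)


# Hypothetical IDs: ground truth from brute force vs. an ANN index's answer.
exact = [12, 7, 33, 4, 19, 2, 51, 8, 40, 27]
approx = [12, 7, 33, 4, 19, 2, 51, 8, 40, 99]  # one true neighbor missed
print(recall_at_k(exact, approx, k=10))  # 0.9
```

Averaging this over a sample of real queries, with brute-force results as ground truth, gives a direct way to check whether an index's default settings are good enough for your data.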
Where Vector Databases Appear in AI Stacks
Retrieval-Augmented Generation (RAG)
RAG is the dominant use case. The pipeline:
- Embed all your documents and store vectors in the database
- When a user asks a question, embed the question
- Query the vector database for the top-k most relevant document chunks
- Pass those chunks as context to the LLM alongside the user question
- LLM generates a grounded answer using real source content
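The pipeline steps above can be sketched end to end in plain Python. Everything here is a stand-in: `toy_embed` is a word-count vector over a fixed vocabulary, not a real embedding model, so it only matches shared words rather than meaning, and the final LLM call is left out.

```python
import math


def tokenize(text):
    return [w.strip(".,?!").lower() for w in text.split() if w.strip(".,?!")]


def toy_embed(text, vocab):
    """Stand-in embedder: normalized word counts over a fixed vocabulary."""
    vec = [0.0] * len(vocab)
    for word in tokenize(text):
        if word in vocab:
            vec[vocab[word]] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def top_k(query_vec, index, k):
    """Exact search over (vector, chunk) pairs, best match first."""
    scored = [
        (sum(q * d for q, d in zip(query_vec, vec)), chunk)
        for vec, chunk in index
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


def build_prompt(question, chunks):
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


documents = [
    "The lease may be terminated with 60 days written notice.",
    "Invoices are payable within 30 days of receipt.",
    "The office kitchen is cleaned every Friday.",
]
words = sorted({w for doc in documents for w in tokenize(doc)})
vocab = {word: i for i, word in enumerate(words)}

index = [(toy_embed(doc, vocab), doc) for doc in documents]  # step 1: embed and store
question = "How many days notice to terminate the lease?"
query_vec = toy_embed(question, vocab)                       # step 2: embed the question
chunks = top_k(query_vec, index, k=2)                        # step 3: retrieve top-k
prompt = build_prompt(question, chunks)                      # step 4: assemble context
print(prompt)                                                # step 5 would send this to an LLM
```

In a real system, `toy_embed` is replaced by calls to an embedding model and `top_k` by a query against the vector database; the shape of the pipeline stays the same.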
Semantic Search
Site search, internal knowledge bases, and e-commerce product discovery all benefit from semantic search. A customer typing "waterproof jacket for cold weather" finds fleece-lined rain shells even if those exact words don't appear in the product title.
Recommendation Systems
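User interaction histories are embedded and compared to item embeddings. "Find 10 products similar to what this user clicked" becomes a vector query. Because items are matched on their content embeddings, a new item can be recommended before it has any interaction data, which eases the item cold-start problem that collaborative filtering runs into.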
Duplicate and Near-Duplicate Detection
Legal teams use vector similarity to find contracts that are substantially similar even if reformatted. Content teams flag near-duplicate articles before publication. The threshold is a similarity score (e.g., cosine > 0.92 flags as duplicate).
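A minimal sketch of threshold-based flagging, using the 0.92 cosine cutoff from the example above. The contract names and vectors are invented for illustration.

```python
import math
from itertools import combinations


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms


def find_near_duplicates(embeddings, threshold=0.92):
    """Return (id_a, id_b, score) for every pair scoring above the threshold.

    embeddings maps a document ID to its vector.
    """
    flagged = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(embeddings.items(), 2):
        score = cosine(vec_a, vec_b)
        if score > threshold:
            flagged.append((id_a, id_b, round(score, 3)))
    return flagged


# Hypothetical embeddings, not model output.
contracts = {
    "nda_v1": [0.9, 0.1, 0.3, 0.0],
    "nda_v1_reformatted": [0.8, 0.2, 0.4, 0.1],
    "supplier_agreement": [0.0, 0.9, 0.1, 0.8],
}
for pair in find_near_duplicates(contracts):
    print(pair)
```

This pairwise scan is quadratic in the number of documents. At scale, the usual pattern is to query the vector database for each new document's nearest neighbors and apply the threshold to those results only.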
Using a vector database without metadata filtering is a common mistake. Searching across a single flat collection with millions of vectors from multiple tenants or time ranges will return irrelevant results. Always segment with pre-filters or namespaces from day one.
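As a rough illustration of pre-filtering, the sketch below restricts the candidate set to one tenant before ranking by similarity. The point layout (`id`, `vector`, `payload`) is an assumption made for this example; production databases typically apply the filter inside the ANN index rather than scanning a list.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def filtered_search(query, points, tenant_id, k):
    """Keep only the tenant's points, then rank them by similarity."""
    candidates = [p for p in points if p["payload"].get("tenant_id") == tenant_id]
    ranked = sorted(candidates, key=lambda p: dot(query, p["vector"]), reverse=True)
    return ranked[:k]


# Made-up points from two tenants sharing one collection.
points = [
    {"id": "a1", "vector": [0.9, 0.1], "payload": {"tenant_id": "acme"}},
    {"id": "a2", "vector": [0.2, 0.8], "payload": {"tenant_id": "acme"}},
    {"id": "g1", "vector": [1.0, 0.0], "payload": {"tenant_id": "globex"}},
]

hits = filtered_search([1.0, 0.0], points, tenant_id="acme", k=2)
print([p["id"] for p in hits])  # ['a1', 'a2']
```

Without the filter, `g1` would be the top hit for this query even though it belongs to another tenant.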
Key Architecture Decisions When Adopting a Vector Database
Managed Cloud vs. Self-Hosted
Managed options (Pinecone, Weaviate Cloud, Zilliz) take infrastructure off your plate. You pay per query and per stored vector. At 10 million vectors queried 1,000 times per day, expect $300–$800/month on most managed platforms.
Self-hosted Qdrant or Weaviate on a 16-core VM with 64 GB RAM handles 50–100 million vectors and costs $200–$400/month in cloud compute, but your team takes on the operational work: upgrades, backups, monitoring, and scaling.
pgvector (a PostgreSQL extension) is a practical choice if you already run Postgres and have fewer than 5 million vectors. You avoid a new infrastructure dependency entirely.
Chunking Strategy
Your retrieval quality is only as good as your chunking. Common approaches:
Metadata Schema
Design metadata fields before you ingest anything. Common fields: source_id, source_url, created_at, document_type, tenant_id, language. Retroactively adding filter fields means re-ingesting everything.
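One way to pin the schema down before ingestion is a small typed record that every chunk must satisfy. This is a sketch using the fields listed above; the class name, the string types, and the validation rules are assumptions, not a requirement of any particular database.

```python
from dataclasses import asdict, dataclass
from datetime import datetime


@dataclass(frozen=True)
class ChunkMetadata:
    """Payload stored next to each vector; field names follow the list above."""

    source_id: str
    source_url: str
    created_at: str      # ISO 8601 timestamp, e.g. "2026-01-15T09:30:00+00:00"
    document_type: str
    tenant_id: str
    language: str

    def __post_init__(self):
        # Reject empty fields early so every stored chunk is filterable.
        for name, value in asdict(self).items():
            if not value:
                raise ValueError(f"{name} must not be empty")
        # Raises ValueError if the timestamp is not valid ISO 8601.
        datetime.fromisoformat(self.created_at)


meta = ChunkMetadata(
    source_id="doc-42",
    source_url="https://example.com/contracts/doc-42",
    created_at="2026-01-15T09:30:00+00:00",
    document_type="contract",
    tenant_id="acme",
    language="en",
)
print(asdict(meta))  # plain dict, ready to send as the vector's payload
```

Validating at ingestion time is cheaper than discovering a missing filter field after millions of vectors are already stored.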
The quality of a vector database deployment depends 80% on upstream decisions — embedding model choice, chunk size, and metadata schema — and only 20% on which database product you pick. Optimize the data pipeline first.
How to Evaluate Vector Databases
When comparing options, measure these dimensions:
Common Implementation Mistakes
Teams new to vector databases make predictable errors:
Key Takeaways
Vector databases are the retrieval layer that makes AI applications accurate at scale. The core value is simple: store meaning as numbers, retrieve by similarity. But production deployments require deliberate choices around chunking, metadata, index type, and whether to run managed or self-hosted.
For teams evaluating the space:
- Start with pgvector if you already run Postgres and expect fewer than 2 million vectors (it stays workable up to roughly 5 million)
- Use Qdrant or Weaviate self-hosted for mid-scale (2M–100M vectors) with full control
- Use Pinecone or Zilliz Cloud when you want zero infrastructure management and predictable pricing
- Revisit the embedding model before tuning the database — model quality matters more
Frequently Asked Questions
What is the difference between a vector database and a vector store?
The terms are often used interchangeably. A vector store is any system that holds and retrieves vectors, including in-memory libraries like FAISS. A vector database adds production features on top: durable persistence, replication, metadata filtering, access control, and a client API. For proof-of-concept work, a vector store suffices; for production, use a vector database.
Can I use my existing PostgreSQL database as a vector database?
Yes, via the pgvector extension. It adds a vector column type and HNSW and IVFFlat indexes. For datasets under 2–5 million vectors with moderate query volumes (under 500 QPS), pgvector performs well and eliminates a separate infrastructure dependency. Beyond that scale, dedicated vector databases outperform it on latency and throughput.
How much does a vector database cost to run?
Costs vary widely. Managed Pinecone starts around $0.08 per 1 million queries and $0.000096 per vector per month stored. A collection of 1 million vectors queried 500,000 times per month runs roughly $100–$200/month. Self-hosted Qdrant on a $200/month VM handles comparable workloads but requires engineering time to operate.
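Plugging the per-unit figures quoted above into a quick calculation shows where the money goes. The prices are the ones stated in this answer, not verified current list prices.

```python
# Per-unit prices as quoted in the answer above (assumed, not verified).
PRICE_PER_MILLION_QUERIES = 0.08   # USD
PRICE_PER_VECTOR_MONTH = 0.000096  # USD


def monthly_cost(num_vectors, queries_per_month):
    """Return (storage, queries, total) in USD per month."""
    storage = num_vectors * PRICE_PER_VECTOR_MONTH
    queries = queries_per_month / 1_000_000 * PRICE_PER_MILLION_QUERIES
    return storage, queries, storage + queries


storage, queries, total = monthly_cost(1_000_000, 500_000)
print(f"storage ${storage:.2f}, queries ${queries:.2f}, total ${total:.2f}")
# prints: storage $96.00, queries $0.04, total $96.04
```

At these rates storage dominates the bill and query volume is almost a rounding error. The result sits just under the $100–$200 range quoted above, which suggests that range includes plan minimums or other charges beyond the two per-unit prices.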
How many vectors can a vector database hold?
Modern vector databases scale to hundreds of millions or billions of vectors with the right hardware. Pinecone supports up to 100M vectors per index on its standard plan. Qdrant and Weaviate scale horizontally with sharding. For most business applications — internal knowledge bases, product catalogs, support documents — 1–10 million vectors is typical, which any current database handles comfortably.
What embedding model should I use?
For English text retrieval, OpenAI text-embedding-3-large (3,072 dimensions) is a strong performer on public benchmarks. For cost-sensitive applications, text-embedding-3-small (1,536 dimensions) is several times cheaper per token with ~5% lower recall. Cohere Embed v3 is competitive and supports multilingual use cases well. Always benchmark on your actual data, because MTEB leaderboard rankings don't always translate to your specific domain.
Do vector databases replace traditional search engines like Elasticsearch?
Usually not — they complement them. Elasticsearch excels at keyword match, BM25 scoring, and structured filters. Vector databases excel at semantic similarity. Production search systems increasingly use hybrid retrieval: BM25 for exact term matches plus ANN for semantic matches, combined with a re-ranking model. This hybrid approach consistently outperforms either method alone by 10–20% on recall benchmarks.
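One common way to merge the keyword and vector result lists before re-ranking is reciprocal rank fusion (RRF). The sketch below is a swapped-in illustration of that fusion step, not the re-ranking model mentioned above; the document IDs are made up, and `k=60` is the constant conventionally used with RRF.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of document IDs into one list, best first.

    Each document scores 1 / (k + rank) per list it appears in.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical top-3 results from each retriever.
bm25_results = ["doc_a", "doc_b", "doc_c"]
ann_results = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_results, ann_results])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents that both retrievers rank highly rise to the top, and the method needs only ranks, so BM25 scores and cosine scores never have to be put on a common scale.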