May 31, 2026 · Updated June 3, 2026 · 8 min read · by Vladimir Kamenev

What Are Vector Databases and Why Do AI Apps Need Them?

A vector database stores numerical representations of data — called embeddings — and retrieves them based on similarity rather than exact matches. When an AI application needs to find "documents about contract risk" without those exact words appearing, a vector database finds them in milliseconds. That capability is what makes modern RAG assistants, semantic search, and recommendation systems work.

What Is a Vector, Exactly?

Every piece of content (a sentence, an image, a product description) can be converted into a list of numbers called a vector or embedding. A sentence embedding might have 1,536 numbers (the default output size of OpenAI's text-embedding-3-small). Each number is one coordinate, and meaning is carried by the pattern across all of them rather than by any single number.

Vectors that represent similar meaning end up numerically close to each other in that high-dimensional space. "Lease termination clause" and "how to end a rental agreement" produce vectors only a small angular distance apart, even though they share no words.

Three things define a useful embedding:

Dimensionality — how many numbers represent each item (128 to 3,072 are common ranges)

Model — the encoder used (OpenAI, Cohere, Sentence-Transformers, etc.)

Metric — how distance is measured (cosine similarity, dot product, Euclidean distance)
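To make the three metrics concrete, here is a minimal sketch in plain Python. The three-dimensional vectors and their names are invented for illustration; real embeddings come from a model and have far more dimensions.

```python
import math

def dot(a, b):
    """Dot product: sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_distance(a, b):
    """Straight-line distance between a and b (0.0 means identical)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy vectors (hypothetical values, not output from any real model).
lease_clause = [0.9, 0.1, 0.2]
end_rental = [0.8, 0.2, 0.3]
pizza_recipe = [0.1, 0.9, 0.1]

# Related texts should score higher on cosine similarity than unrelated ones.
print(cosine_similarity(lease_clause, end_rental))
print(cosine_similarity(lease_clause, pizza_recipe))
print(euclidean_distance(lease_clause, end_rental))
```

Cosine similarity ignores vector length and compares direction only, which is one reason it is a common default for text embeddings.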

📌

Note

All embeddings in the same index must be produced by the same model. Mixing models in one collection breaks retrieval quality completely, because vectors from different models are not comparable.

How a Vector Database Is Different From a Relational or Document Database

Traditional databases answer questions like "find all rows where status = 'active' and date > 2026-01-01." They excel at exact lookups and structured filters.

Vector databases answer questions like "find the 10 items most similar to this query." That requires a different data structure — typically an Approximate Nearest Neighbor (ANN) index — because brute-force comparison across millions of 1,536-dimension vectors would take seconds, not milliseconds.

Feature	Relational DB (PostgreSQL)	Document DB (MongoDB)	Vector DB (Qdrant, Pinecone)
Query type	Exact match, range	Field lookup, full-text	Semantic similarity (ANN)
Data shape	Tables, rows	JSON documents	Embedding vectors + metadata
Typical latency	<5 ms	<10 ms	5–50 ms at scale
Best for	Transactions, reports	Flexible schemas	AI retrieval, recommendations
Filtering	SQL WHERE	Query operators	Metadata pre-filter + ANN

Many teams use both: PostgreSQL handles billing and users, while Qdrant or Pinecone handles semantic search over documents.

The Core Operations in a Vector Database

Once you understand these four operations, every vector database product becomes easier to evaluate:

Upsert — store a vector plus its payload (metadata like document ID, timestamp, source URL)

Query — supply a query vector, get back the top-k most similar vectors and their metadata

Filter — combine ANN search with metadata filters ("similar to X AND created after 2025")

Delete — remove vectors by ID when the source content is updated or removed
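The four operations can be sketched with a toy in-memory store. This is an illustration only: the class name ToyVectorStore and its method signatures are hypothetical and do not match any particular product's API, and the search is an exact scan rather than an ANN index.

```python
import math
from dataclasses import dataclass, field

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

@dataclass
class ToyVectorStore:
    # Maps point ID -> (vector, payload). Exact scan, no ANN index.
    points: dict = field(default_factory=dict)

    def upsert(self, point_id, vector, payload):
        # Insert a new point or overwrite the existing one with the same ID.
        self.points[point_id] = (vector, payload)

    def delete(self, point_id):
        self.points.pop(point_id, None)

    def query(self, vector, top_k=3, where=None):
        # `where` is an optional predicate on the payload (the metadata filter).
        scored = [
            (cosine(vector, stored), point_id, payload)
            for point_id, (stored, payload) in self.points.items()
            if where is None or where(payload)
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:top_k]

store = ToyVectorStore()
store.upsert("a", [1.0, 0.0], {"source_id": "doc-1", "year": 2024})
store.upsert("b", [0.9, 0.1], {"source_id": "doc-2", "year": 2026})
store.upsert("c", [0.0, 1.0], {"source_id": "doc-3", "year": 2026})

all_hits = store.query([1.0, 0.0], top_k=2)
recent_hits = store.query([1.0, 0.0], top_k=2, where=lambda p: p["year"] > 2025)
store.delete("a")
print([hit[1] for hit in all_hits], [hit[1] for hit in recent_hits])
```

Here the filter is applied before scoring, so filtered-out points never compete for a top-k slot. Real databases implement the same idea inside the index rather than with a Python predicate.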

💡

Tip

Always store the source document ID in the vector's metadata payload. When content changes, you need a reliable way to find and replace the old vector — or you'll end up serving stale results.

Why ANN Indexing Matters

Brute-force similarity search compares a query against every stored vector. At 1 million vectors of 1,536 dimensions, that means over 1.5 billion multiplications per query. Even on fast hardware, that hits 500 ms easily.
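The multiplication count above is simple arithmetic, sketched here. The raw-storage figure is an extra illustration that assumes 4-byte float32 components, which the article does not specify.

```python
num_vectors = 1_000_000
dimensions = 1_536

# One dot product costs one multiplication per dimension,
# and brute force computes one dot product per stored vector.
multiplications_per_query = num_vectors * dimensions
print(f"{multiplications_per_query:,}")  # 1,536,000,000

# Raw size of the vectors alone, assuming float32 (4 bytes per component).
bytes_per_component = 4
raw_gib = num_vectors * dimensions * bytes_per_component / 2**30
print(f"{raw_gib:.2f} GiB")  # 5.72 GiB
```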

ANN indexes trade a small recall loss (~1–5%) for a 100x–1,000x speedup. The major index types:

HNSW (Hierarchical Navigable Small World) — best recall, high memory use; used by Qdrant, Weaviate, pgvector

IVF (Inverted File Index) — clusters vectors, searches only relevant clusters; good for very large datasets

Flat — exact search, no index; fine for <100k vectors

PQ (Product Quantization) — compresses vectors to reduce RAM at some recall cost

For most production RAG systems with fewer than 10 million chunks, HNSW with cosine similarity delivers sub-20 ms queries without tuning.
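To show where the accuracy trade-off comes from, here is a toy sketch of the IVF idea from the list above. It assumes the cluster centroids are already given (real systems typically learn them with k-means), and the function names and the nprobe parameter name are chosen for illustration.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_centroids(vector, centroids, count):
    # Indices of the `count` centroids closest to `vector`.
    order = sorted(range(len(centroids)),
                   key=lambda i: squared_distance(vector, centroids[i]))
    return order[:count]

def build_ivf(vectors, centroids):
    # Inverted lists: centroid index -> IDs of the vectors assigned to it.
    lists = {i: [] for i in range(len(centroids))}
    for vec_id, vec in vectors.items():
        lists[nearest_centroids(vec, centroids, 1)[0]].append(vec_id)
    return lists

def ivf_search(query, vectors, centroids, lists, top_k=2, nprobe=1):
    # Scan only the `nprobe` clusters closest to the query.
    candidates = [vid for c in nearest_centroids(query, centroids, nprobe)
                  for vid in lists[c]]
    candidates.sort(key=lambda vid: squared_distance(query, vectors[vid]))
    return candidates[:top_k]

centroids = [[0.0, 0.0], [10.0, 10.0]]  # assumed given for this sketch
vectors = {
    "a": [0.5, 0.2], "b": [1.0, 1.5], "c": [4.9, 4.9],
    "d": [9.0, 9.5], "e": [10.5, 10.0], "f": [5.2, 5.2],
}
lists = build_ivf(vectors, centroids)

query = [4.8, 4.8]
print(ivf_search(query, vectors, centroids, lists, nprobe=1))  # ['c', 'b']
print(ivf_search(query, vectors, centroids, lists, nprobe=2))  # ['c', 'f']
```

With nprobe=1 the search misses "f", a true neighbor that sits just across the cluster boundary. Probing more clusters recovers it at the cost of scanning more vectors, which is the recall-versus-speed dial that ANN indexes expose.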

Where Vector Databases Appear in AI Stacks

Retrieval-Augmented Generation (RAG)

RAG is the dominant use case. The pipeline:

1. Embed all your documents and store the vectors in the database
2. When a user asks a question, embed the question
3. Query the vector database for the top-k most relevant document chunks
4. Pass those chunks as context to the LLM alongside the user question
5. The LLM generates a grounded answer using real source content

This pattern cuts hallucination rates dramatically compared to relying on the LLM's training data alone. In building RAG pipelines for clients, I've found that retrieval quality — not the LLM itself — is the leading driver of answer accuracy.
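The retrieval half of that pipeline can be sketched end to end in a few lines. Everything here is a stand-in: embed is a bag-of-words counter rather than a real embedding model (so it matches shared words, not meaning), a Python list plays the role of the database, and the final LLM call is left out.

```python
import math
import re
from collections import Counter

def embed(text):
    # Stand-in for a real embedding model: a sparse bag-of-words count vector.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

chunks = [
    "Either party may terminate the lease with 60 days written notice.",
    "The tenant pays rent on the first day of each month.",
    "Pets are allowed with a refundable deposit.",
]
# Embed all chunks and store them (a list stands in for the database).
index = [(embed(chunk), chunk) for chunk in chunks]

def retrieve(question, top_k=1):
    # Embed the question, then rank stored chunks by similarity.
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[0]), reverse=True)
    return [chunk for _, chunk in ranked[:top_k]]

def build_prompt(question, context_chunks):
    # Pass the retrieved chunks to the LLM as context alongside the question.
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

question = "How much notice is needed to terminate the lease?"
prompt = build_prompt(question, retrieve(question))
print(prompt)
```

Swapping embed for a real model and the list for a vector database changes the quality of retrieval, not the shape of the pipeline.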

Semantic Search

Site search, internal knowledge bases, and e-commerce product discovery all benefit from semantic search. A customer typing "waterproof jacket for cold weather" finds fleece-lined rain shells even if those exact words don't appear in the product title.

Recommendation Systems

User interaction histories are embedded and compared to item embeddings. "Find 10 products similar to what this user clicked" becomes a vector query. Because a new item can be embedded from its content alone, this approach eases the item cold-start problem that pure collaborative filtering faces.

Duplicate and Near-Duplicate Detection

Legal teams use vector similarity to find contracts that are substantially similar even if reformatted. Content teams flag near-duplicate articles before publication. The threshold is a similarity score (e.g., cosine > 0.92 flags as duplicate).
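A minimal sketch of threshold-based flagging, using the 0.92 example value from the paragraph above. The document IDs and vectors are invented, and the all-pairs loop is only practical for small sets; at scale you would query the vector database for each item's nearest neighbors instead.

```python
import math
from itertools import combinations

DUPLICATE_THRESHOLD = 0.92  # example value from the text; tune on your own data

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def find_near_duplicates(embeddings, threshold=DUPLICATE_THRESHOLD):
    # Compare every pair and keep those above the similarity threshold.
    return [
        (id_a, id_b)
        for (id_a, vec_a), (id_b, vec_b) in combinations(embeddings.items(), 2)
        if cosine(vec_a, vec_b) > threshold
    ]

# Hypothetical embeddings for three documents.
embeddings = {
    "contract_v1": [0.70, 0.70, 0.10],
    "contract_v1_reformatted": [0.72, 0.68, 0.12],
    "unrelated_memo": [0.10, 0.20, 0.95],
}
print(find_near_duplicates(embeddings))
```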

⚠️

Warning

Using a vector database without metadata filtering is a common mistake. Searching across a single flat collection with millions of vectors from multiple tenants or time ranges will return irrelevant results. Always segment with pre-filters or namespaces from day one.

Key Architecture Decisions When Adopting a Vector Database

Managed Cloud vs. Self-Hosted

Managed options (Pinecone, Weaviate Cloud, Zilliz) take infrastructure off your plate. You pay per query and per stored vector. At 10 million vectors queried 1,000 times per day, expect $300–$800/month on most managed platforms.

Self-hosted Qdrant or Weaviate on a 16-core VM with 64 GB RAM handles 50–100 million vectors and costs $200–$400/month in cloud compute, but your team takes on the operational work of running it.

pgvector (a PostgreSQL extension) is a practical choice if you already run Postgres and have fewer than 5 million vectors. You avoid a new infrastructure dependency entirely.

Chunking Strategy

Your retrieval quality is only as good as your chunking. Common approaches:

Fixed-size chunks — 256–512 tokens with 10–20% overlap; simple, works for most text

Semantic chunking — split on sentence boundaries or paragraph breaks; better coherence

Hierarchical chunking — store both paragraph-level and document-level embeddings; retrieve paragraph, return document section
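Fixed-size chunking with overlap, the first approach above, is a sliding window. In this sketch whitespace-separated words stand in for model tokens, which is an assumption; a real pipeline would count tokens with the embedding model's tokenizer. The default overlap of 32 on a 256-token chunk is 12.5%, inside the 10–20% range mentioned.

```python
def chunk_tokens(tokens, chunk_size=256, overlap=32):
    """Split a token list into windows of chunk_size that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the input
    return chunks

# Tiny example so the overlap is easy to see.
words = [f"w{i}" for i in range(10)]
chunks = chunk_tokens(words, chunk_size=4, overlap=1)
print(chunks)
# [['w0', 'w1', 'w2', 'w3'], ['w3', 'w4', 'w5', 'w6'], ['w6', 'w7', 'w8', 'w9']]
```

Each chunk repeats the last token of the previous one, so a sentence that straddles a boundary still appears with some of its surrounding context in at least one chunk.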

Metadata Schema

Design metadata fields before you ingest anything. Common fields: source_id, source_url, created_at, document_type, tenant_id, language. Retroactively adding filter fields means re-ingesting everything.
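One way to pin the schema down before ingestion is a small typed record. This sketch uses the field names listed above; the class name ChunkMetadata, the example values, and the choice to store created_at as an ISO 8601 string are assumptions.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass(frozen=True)
class ChunkMetadata:
    source_id: str
    source_url: str
    created_at: str      # ISO 8601 text, so the payload serializes cleanly to JSON
    document_type: str
    tenant_id: str
    language: str

meta = ChunkMetadata(
    source_id="doc-123",
    source_url="https://example.com/contracts/doc-123",
    created_at=datetime(2026, 1, 15, tzinfo=timezone.utc).isoformat(),
    document_type="contract",
    tenant_id="tenant-a",
    language="en",
)

# A plain dict, ready to attach to a vector as its payload on upsert.
payload = asdict(meta)
print(payload)
```

Making every field required means a chunk cannot be ingested without, say, a tenant_id, which is cheaper to enforce up front than to backfill later.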

✨

Key takeaway

The quality of a vector database deployment depends 80% on upstream decisions — embedding model choice, chunk size, and metadata schema — and only 20% on which database product you pick. Optimize the data pipeline first.

How to Evaluate Vector Databases

When comparing options, measure these dimensions:

Recall at k — what fraction of true top-k results appear in the returned top-k (target >0.95 at k=10)

QPS (queries per second) — at your expected load

P99 latency — the tail matters more than the average for user-facing features

Filter performance — how much latency increases when you add metadata filters

Multitenancy — namespaces or collection isolation for SaaS products

On-disk support — can it spill to disk when RAM is insufficient?
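Two of these measurements are easy to compute yourself. The sketch below shows recall at k (comparing ANN results against an exact search) and a nearest-rank P99; the document IDs and latency samples are made up for illustration.

```python
import math

def recall_at_k(true_ids, returned_ids, k):
    # Fraction of the exact top-k that the index also returned in its top-k.
    truth = set(true_ids[:k])
    return len(truth & set(returned_ids[:k])) / len(truth)

def percentile(values, pct):
    # Nearest-rank percentile: smallest sample with at least pct% at or below it.
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]

exact_top = ["d1", "d2", "d3", "d4", "d5"]   # from a brute-force search
ann_top = ["d1", "d3", "d2", "d9", "d5"]     # from the ANN index
print(recall_at_k(exact_top, ann_top, k=5))  # 0.8

latencies_ms = [12, 9, 11, 10, 14, 13, 9, 10, 11, 95]
print(percentile(latencies_ms, 99))            # 95
print(sum(latencies_ms) / len(latencies_ms))   # 19.4
```

The latency sample shows why the tail matters: the mean of 19.4 ms hides the single 95 ms query that a user would actually notice.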

Common Implementation Mistakes

Teams new to vector databases make predictable errors:

Embedding stale content — not updating vectors when source documents change

No re-ranking — returning raw ANN results instead of re-scoring with a cross-encoder

Single large collection — mixing unrelated content domains in one namespace, hurting precision

Ignoring chunk overlap — splitting sentences across chunks breaks context

Testing only at small scale — performance at 10k vectors says nothing about 10M

Key Takeaways

Vector databases are the retrieval layer that makes AI applications accurate at scale. The core value is simple: store meaning as numbers, retrieve by similarity. But production deployments require deliberate choices around chunking, metadata, index type, and whether to run managed or self-hosted.

For teams evaluating the space:

Start with pgvector if you already run Postgres and expect fewer than roughly 2–5 million vectors
Use Qdrant or Weaviate self-hosted for mid-scale (2M–100M vectors) with full control
Use Pinecone or Zilliz Cloud when you want zero infrastructure management and predictable pricing
Revisit the embedding model before tuning the database — model quality matters more

DeGenito.Ai designs and builds complete RAG and semantic search pipelines, including embedding strategy, vector database selection, and production monitoring. If you're starting a vector database project, reach out for a scoped architecture review.

Frequently Asked Questions

What is the difference between a vector database and a vector store?

The terms are often used interchangeably. A vector store is any system that holds and retrieves vectors, including in-memory libraries like FAISS. A vector database adds production features on top: durable persistence, replication, metadata filtering, access control, and a client API. For proof-of-concept work, a vector store suffices; for production, use a vector database.

Can I use my existing PostgreSQL database as a vector database?

Yes, via the pgvector extension. It adds a vector column type and HNSW and IVFFlat indexes. For datasets under 2–5 million vectors with moderate query volumes (under 500 QPS), pgvector performs well and eliminates a separate infrastructure dependency. Beyond that scale, dedicated vector databases outperform it on latency and throughput.

How much does a vector database cost to run?

Costs vary widely. Managed Pinecone starts around $0.08 per 1 million queries and $0.000096 per vector per month stored. A collection of 1 million vectors queried 500,000 times per month runs roughly $100–$200/month. Self-hosted Qdrant on a $200/month VM handles comparable workloads but requires engineering time to operate.
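As a rough check, the example workload can be priced directly from the per-unit figures quoted above. The prices are taken as stated in this answer and have not been checked against a current price list, so treat the result as illustrative.

```python
# Per-unit prices as quoted in the answer above (assumed, not verified).
price_per_million_queries = 0.08
price_per_vector_month = 0.000096

stored_vectors = 1_000_000
queries_per_month = 500_000

storage_cost = stored_vectors * price_per_vector_month
query_cost = queries_per_month / 1_000_000 * price_per_million_queries
total = storage_cost + query_cost
print(f"storage ${storage_cost:.2f} + queries ${query_cost:.2f} = ${total:.2f}/month")
# storage $96.00 + queries $0.04 = $96.04/month
```

At these rates storage dominates the bill and query volume barely registers, which puts the total near the low end of the range quoted above.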

How many vectors can a vector database hold?

Modern vector databases scale to hundreds of millions or billions of vectors with the right hardware. Pinecone supports up to 100M vectors per index on its standard plan. Qdrant and Weaviate scale horizontally with sharding. For most business applications — internal knowledge bases, product catalogs, support documents — 1–10 million vectors is typical, which any current database handles comfortably.

What embedding model should I use?

For English text retrieval, OpenAI text-embedding-3-large (3,072 dimensions) delivers top benchmark performance. For cost-sensitive applications, text-embedding-3-small (1,536 dimensions) is 5x cheaper with ~5% lower recall. Cohere Embed v3 is competitive and supports multilingual use cases well. Always benchmark on your actual data — MTEB leaderboard rankings don't always translate to your specific domain.

Do vector databases replace traditional search engines like Elasticsearch?

Usually not — they complement them. Elasticsearch excels at keyword match, BM25 scoring, and structured filters. Vector databases excel at semantic similarity. Production search systems increasingly use hybrid retrieval: BM25 for exact term matches plus ANN for semantic matches, combined with a re-ranking model. This hybrid approach consistently outperforms either method alone by 10–20% on recall benchmarks.
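One simple way to merge a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). The article does not name a fusion method, so RRF is offered here as one common option rather than the approach described above. The document IDs are invented, and k=60 is the conventional default constant.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked ID lists; each list contributes 1 / (k + rank) per document."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

bm25_ranking = ["doc_a", "doc_b", "doc_c"]    # keyword matches, best first
vector_ranking = ["doc_c", "doc_a", "doc_d"]  # semantic matches, best first

print(reciprocal_rank_fusion([bm25_ranking, vector_ranking]))
# ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents that appear in both lists rise to the top, and because RRF uses ranks only, the BM25 and cosine scores never need to be put on a common scale. A re-ranking model can then re-score the fused shortlist.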
