What Are Vector Databases and Why Do AI Apps Need Them?

A vector database stores numerical representations of data — called embeddings — and retrieves them based on similarity rather than exact matches. When an AI application needs to find "documents about contract risk" without those exact words appearing, a vector database finds them in milliseconds. That capability is what makes modern RAG assistants, semantic search, and recommendation systems work.

What Is a Vector, Exactly?

Every piece of content — a sentence, an image, a product description — can be converted into a list of numbers called a vector or embedding. A sentence embedding might have 1,536 numbers (OpenAI's text-embedding-3-small output size). Each number captures a dimension of meaning.

Vectors that represent similar meaning end up numerically close to each other in that high-dimensional space. "Lease termination clause" and "how to end a rental agreement" produce vectors only a small angular distance apart, even though they share no words.
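To make "numerically close" concrete, here is a minimal sketch of cosine similarity. The three 4-dimensional vectors are made-up illustrative values, not output from any real embedding model, and the variable names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings; real ones have hundreds or thousands of dimensions.
lease_termination = [0.82, 0.10, 0.55, 0.05]
end_rental_agreement = [0.78, 0.15, 0.60, 0.02]
chocolate_cake_recipe = [0.05, 0.90, 0.02, 0.43]

close = cosine_similarity(lease_termination, end_rental_agreement)
far = cosine_similarity(lease_termination, chocolate_cake_recipe)
print(f"related phrases:   {close:.3f}")
print(f"unrelated phrases: {far:.3f}")
```

A score near 1.0 means the vectors point in almost the same direction; a score near 0 means they are unrelated.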

Three things define a useful embedding:

  • Dimensionality — how many numbers represent each item (128 to 3,072 are common ranges)
  • Model — the encoder used (OpenAI, Cohere, Sentence-Transformers, etc.)
  • Metric — how distance is measured (cosine similarity, dot product, Euclidean distance)

    📌
    Note

    All embeddings from the same index must be produced by the same model. Mixing models in one collection breaks retrieval quality completely — the numbers become incomparable.
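The three metrics listed above are closely related once vectors are scaled to unit length: the dot product equals the cosine, and the squared Euclidean distance equals 2 minus twice the cosine, so all three produce the same ranking. The sketch below checks this on made-up 3-dimensional vectors; the item names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

query = normalize([0.9, 0.1, 0.4])
items = {
    "a": normalize([0.8, 0.2, 0.5]),
    "b": normalize([0.1, 0.9, 0.3]),
    "c": normalize([0.5, 0.5, 0.5]),
}

# Higher dot product (= cosine for unit vectors) means more similar;
# lower Euclidean distance means more similar.
by_cosine = sorted(items, key=lambda k: dot(query, items[k]), reverse=True)
by_euclid = sorted(items, key=lambda k: euclidean(query, items[k]))
print(by_cosine, by_euclid)
```

If your vectors are not normalized, the metrics can disagree, so it is worth checking which metric your embedding model was trained for.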

    How a Vector Database Is Different From a Relational or Document Database

    Traditional databases answer questions like "find all rows where status = 'active' and date > 2026-01-01." They excel at exact lookups and structured filters.

    Vector databases answer questions like "find the 10 items most similar to this query." That requires a different data structure, typically an Approximate Nearest Neighbor (ANN) index, because brute-force comparison across millions of 1,536-dimension vectors takes hundreds of milliseconds or more per query rather than a few milliseconds.

    | Feature         | Relational DB (PostgreSQL) | Document DB (MongoDB)   | Vector DB (Qdrant, Pinecone) |
    |-----------------|----------------------------|-------------------------|------------------------------|
    | Query type      | Exact match, range         | Field lookup, full-text | Semantic similarity (ANN)    |
    | Data shape      | Tables, rows               | JSON documents          | Embedding vectors + metadata |
    | Typical latency | <5 ms                      | <10 ms                  | 5–50 ms at scale             |
    | Best for        | Transactions, reports      | Flexible schemas        | AI retrieval, recommendations |
    | Filtering       | SQL WHERE                  | Query operators         | Metadata pre-filter + ANN    |

    Many teams use both: PostgreSQL handles billing and users, while Qdrant or Pinecone handles semantic search over documents.

    The Core Operations in a Vector Database

    Once you understand these four operations, every vector database product becomes easier to evaluate:

  • Upsert — store a vector plus its payload (metadata like document ID, timestamp, source URL)
  • Query — supply a query vector, get back the top-k most similar vectors and their metadata
  • Filter — combine ANN search with metadata filters ("similar to X AND created after 2025")
  • Delete — remove vectors by ID when the source content is updated or removed

    💡
    Tip

    Always store the source document ID in the vector's metadata payload. When content changes, you need a reliable way to find and replace the old vector — or you'll end up serving stale results.
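The four operations can be sketched with a toy in-memory store. This is a brute-force stand-in for illustration only: `ToyVectorStore` and its method names are hypothetical and do not match the client API of any particular product.

```python
import math
from dataclasses import dataclass, field

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

@dataclass
class ToyVectorStore:
    """Brute-force, in-memory stand-in for a real vector database."""
    points: dict = field(default_factory=dict)  # id -> (vector, payload)

    def upsert(self, point_id, vector, payload):
        self.points[point_id] = (vector, payload)

    def query(self, vector, top_k=3, where=None):
        # `where` is an optional predicate on the payload (the metadata filter).
        candidates = [
            (cosine(vector, vec), pid, payload)
            for pid, (vec, payload) in self.points.items()
            if where is None or where(payload)
        ]
        candidates.sort(key=lambda c: c[0], reverse=True)
        return candidates[:top_k]

    def delete_by_source(self, source_id):
        # Relies on the source document ID being stored in every payload.
        stale = [pid for pid, (_, p) in self.points.items() if p.get("source_id") == source_id]
        for pid in stale:
            del self.points[pid]
        return len(stale)

store = ToyVectorStore()
store.upsert("c1", [0.9, 0.1, 0.0], {"source_id": "doc-1", "year": 2024})
store.upsert("c2", [0.8, 0.2, 0.1], {"source_id": "doc-2", "year": 2026})
store.upsert("c3", [0.0, 0.1, 0.9], {"source_id": "doc-2", "year": 2026})

hits = store.query([1.0, 0.0, 0.0], top_k=2)
recent = store.query([1.0, 0.0, 0.0], top_k=2, where=lambda p: p["year"] > 2025)
removed = store.delete_by_source("doc-2")
```

Deleting by `source_id` is what makes the tip above practical: when a document changes, every chunk derived from it can be removed and re-ingested in one step.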

    Why ANN Indexing Matters

    Brute-force similarity search compares a query against every stored vector. At 1 million vectors of 1,536 dimensions, that means over 1.5 billion multiply-add operations per query. Even on fast hardware, that can easily reach hundreds of milliseconds.
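The arithmetic behind the brute-force cost is easy to reproduce. The memory bandwidth figure below is a hypothetical assumption chosen for illustration, not a measurement of any specific machine.

```python
NUM_VECTORS = 1_000_000
DIMENSIONS = 1_536
BYTES_PER_FLOAT32 = 4
ASSUMED_BANDWIDTH_GB_PER_S = 20  # hypothetical sustained memory bandwidth

multiply_adds = NUM_VECTORS * DIMENSIONS
raw_gb = NUM_VECTORS * DIMENSIONS * BYTES_PER_FLOAT32 / 1e9
scan_ms = raw_gb / ASSUMED_BANDWIDTH_GB_PER_S * 1000

print(f"{multiply_adds:,} multiply-adds per query")  # 1,536,000,000
print(f"{raw_gb:.2f} GB of raw float32 vectors")     # 6.14
print(f"{scan_ms:.0f} ms to stream them once")       # 307
```

Under that assumption, simply reading every vector once already costs a few hundred milliseconds, before any sorting or filtering.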

    ANN indexes trade a small loss in recall (~1–5%) for a 100x–1,000x speedup. The major index types:

  • HNSW (Hierarchical Navigable Small World) — best recall, high memory use; used by Qdrant, Weaviate, pgvector
  • IVF (Inverted File Index) — clusters vectors, searches only relevant clusters; good for very large datasets
  • Flat — exact search, no index; fine for <100k vectors
  • PQ (Product Quantization) — compresses vectors to reduce RAM at some recall cost

    For most production RAG systems with fewer than 10 million chunks, HNSW with cosine similarity delivers sub-20 ms queries without tuning.
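To show the speed-versus-recall trade-off, here is a toy comparison of flat (exact) search against a simplified IVF-style index built with a few rounds of k-means. It is a teaching sketch on random data, not how any production database implements IVF; the function names and the `nprobe` parameter name are assumptions for illustration.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def flat_search(vectors, query, k):
    """Exact search: score every vector."""
    return sorted(range(len(vectors)), key=lambda i: sq_dist(vectors[i], query))[:k]

def assign(vectors, centroids):
    """Put each vector index into the bucket of its nearest centroid."""
    buckets = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
        buckets[nearest].append(i)
    return buckets

def build_ivf(vectors, n_clusters, iterations=5, seed=0):
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_clusters)]
    dims = len(vectors[0])
    for _ in range(iterations):
        for c, members in enumerate(assign(vectors, centroids)):
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(vectors[i][d] for i in members) / len(members)
                                for d in range(dims)]
    return centroids, assign(vectors, centroids)

def ivf_search(vectors, centroids, buckets, query, k, nprobe):
    """Scan only the nprobe clusters whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: sq_dist(centroids[c], query))[:nprobe]
    candidates = [i for c in probe for i in buckets[c]]
    return sorted(candidates, key=lambda i: sq_dist(vectors[i], query))[:k]

def recall(approx, exact):
    return len(set(approx) & set(exact)) / len(exact)

rng = random.Random(42)
data = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(1000)]
query = [rng.gauss(0, 1) for _ in range(8)]
centroids, buckets = build_ivf(data, n_clusters=10)
exact = flat_search(data, query, k=10)
recalls = {n: recall(ivf_search(data, centroids, buckets, query, 10, n), exact)
           for n in (1, 3, 10)}
print(recalls)
```

Probing every cluster reproduces the exact result; probing fewer clusters scans less data and may miss some true neighbors, which is the trade-off every ANN index makes in some form.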

    Where Vector Databases Appear in AI Stacks

    Retrieval-Augmented Generation (RAG)

    RAG is the dominant use case. The pipeline:

    1. Embed all your documents and store vectors in the database
    2. When a user asks a question, embed the question
    3. Query the vector database for the top-k most relevant document chunks
    4. Pass those chunks as context to the LLM alongside the user question
    5. The LLM generates a grounded answer using real source content

    This pattern can substantially reduce hallucinations compared to relying on the LLM's training data alone. In building RAG pipelines for clients, I've found that retrieval quality, not the LLM itself, is the leading driver of answer accuracy.
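The five steps above can be sketched end to end in a few lines. Everything here is a stand-in: `make_toy_embedder` is a hypothetical bag-of-words embedder that only captures word overlap (a real embedding model captures meaning), the documents are invented, and the final LLM call is deliberately left out.

```python
import math
import re

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def make_toy_embedder(corpus):
    """Hypothetical stand-in for an embedding model: unit-length word counts."""
    vocab = {w: i for i, w in enumerate(sorted({w for t in corpus for w in tokenize(t)}))}
    def embed(text):
        vec = [0.0] * len(vocab)
        for word in tokenize(text):
            if word in vocab:
                vec[vocab[word]] += 1.0
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]
    return embed

def retrieve(index, query_vector, top_k):
    score = lambda item: sum(a * b for a, b in zip(query_vector, item["vector"]))
    return sorted(index, key=score, reverse=True)[:top_k]

def build_prompt(question, chunks):
    context = "\n".join(f"[{c['source_id']}] {c['text']}" for c in chunks)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"

documents = {
    "lease-01": "The tenant may terminate the lease with 60 days written notice.",
    "billing-07": "Invoices are issued monthly and due within 30 days.",
    "hr-12": "Employees accrue vacation days every month.",
}
embed = make_toy_embedder(documents.values())

# Step 1: embed the documents and store the vectors.
index = [{"source_id": sid, "text": text, "vector": embed(text)}
         for sid, text in documents.items()]
# Steps 2 and 3: embed the question and fetch the top-k chunks.
question = "How much notice is needed to terminate the lease?"
chunks = retrieve(index, embed(question), top_k=1)
# Step 4: assemble the prompt. Step 5, the LLM call, is omitted here.
prompt = build_prompt(question, chunks)
print(prompt)
```

Keeping the `source_id` in the prompt context makes it possible to cite sources in the generated answer.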

    Semantic Search

    Site search, internal knowledge bases, and e-commerce product discovery all benefit from semantic search. A customer typing "waterproof jacket for cold weather" finds fleece-lined rain shells even if those exact words don't appear in the product title.

    Recommendation Systems

    User interaction histories are embedded and compared to item embeddings. "Find 10 products similar to what this user clicked" becomes a vector query. Because a new item can be recommended as soon as it has an embedding, this approach sidesteps the item cold-start problem of collaborative filtering.
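One simple way to turn a click history into a query vector is to average the embeddings of the clicked items. That averaging step is an assumption made for this sketch, not the only or necessarily the best approach, and the item vectors are made-up values.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mean_vector(vectors):
    """Element-wise average of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

# Hypothetical item embeddings.
items = {
    "trail-shoes": [0.9, 0.1, 0.0],
    "running-socks": [0.8, 0.2, 0.1],
    "hiking-poles": [0.7, 0.1, 0.3],
    "espresso-machine": [0.0, 0.2, 0.9],
}
clicked = ["trail-shoes", "running-socks"]

user_vector = mean_vector([items[name] for name in clicked])
candidates = [name for name in items if name not in clicked]
recommended = sorted(candidates, key=lambda n: cosine(user_vector, items[n]), reverse=True)
print(recommended)
```

In a real system the sort would be replaced by a top-k query against the vector database, with already-seen items excluded by a metadata filter.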

    Duplicate and Near-Duplicate Detection

    Legal teams use vector similarity to find contracts that are substantially similar even if reformatted. Content teams flag near-duplicate articles before publication. The threshold is a similarity score (e.g., cosine > 0.92 flags as duplicate).
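A thresholded duplicate check might look like the sketch below. The 0.92 cutoff is the example value from the paragraph above and should be tuned on your own data; the embeddings and document IDs are invented.

```python
import math
from itertools import combinations

DUPLICATE_THRESHOLD = 0.92  # example cutoff from the text; tune on real data

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def find_near_duplicates(embeddings, threshold=DUPLICATE_THRESHOLD):
    """Return (id_a, id_b, score) for every pair scoring above the threshold."""
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(embeddings.items(), 2):
        score = cosine(vec_a, vec_b)
        if score > threshold:
            pairs.append((id_a, id_b, round(score, 3)))
    return pairs

# Hypothetical document embeddings.
embeddings = {
    "contract-v1": [0.70, 0.20, 0.68],
    "contract-v1-reformatted": [0.72, 0.18, 0.66],
    "unrelated-nda": [0.10, 0.95, 0.20],
}
duplicates = find_near_duplicates(embeddings)
print(duplicates)
```

Comparing every pair grows quadratically with collection size. At scale, the usual alternative is to run one top-k query per new document and apply the threshold to the returned scores.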

    ⚠️
    Warning

    Using a vector database without metadata filtering is a common mistake. Searching across a single flat collection with millions of vectors from multiple tenants or time ranges will return irrelevant results. Always segment with pre-filters or namespaces from day one.

    Key Architecture Decisions When Adopting a Vector Database

    Managed Cloud vs. Self-Hosted

    Managed options (Pinecone, Weaviate Cloud, Zilliz) take infrastructure off your plate. You pay per query and per stored vector. At 10 million vectors queried 1,000 times per day, expect $300–$800/month on most managed platforms.

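    Self-hosted Qdrant or Weaviate on a 16-core VM with 64 GB RAM can handle 50–100 million vectors and costs $200–$400/month in cloud compute, but it adds operational overhead. At that scale, expect to rely on quantization or on-disk vector storage: raw float32 vectors at 1,536 dimensions take roughly 6 GB per million, far more than 64 GB of RAM can hold for 50 million vectors.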

    pgvector (a PostgreSQL extension) is a practical choice if you already run Postgres and have fewer than 2–5 million vectors. You avoid a new infrastructure dependency entirely.

    Chunking Strategy

    Your retrieval quality is only as good as your chunking. Common approaches:

  • Fixed-size chunks — 256–512 tokens with 10–20% overlap; simple, works for most text
  • Semantic chunking — split on sentence boundaries or paragraph breaks; better coherence
  • Hierarchical chunking — store both paragraph-level and document-level embeddings; retrieve paragraph, return document section
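Fixed-size chunking with overlap is the simplest of these to implement. The sketch below uses a list of placeholder words as a stand-in for model tokens, which is an assumption: a real pipeline would count tokens with the tokenizer of its embedding model.

```python
def chunk_tokens(tokens, chunk_size=256, overlap=32):
    """Split a token list into fixed-size windows that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks

words = [f"w{i}" for i in range(600)]  # placeholder "tokens"
chunks = chunk_tokens(words, chunk_size=256, overlap=32)
print([len(c) for c in chunks])  # [256, 256, 152]
```

An overlap of 32 on a 256-token chunk is 12.5%, inside the 10–20% range suggested above. The final chunk is shorter because it holds whatever remains.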

    Metadata Schema

    Design metadata fields before you ingest anything. Common fields: source_id, source_url, created_at, document_type, tenant_id, language. Retroactively adding filter fields means re-ingesting everything.
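One way to pin the schema down before ingestion is a small typed record that validates itself. The field names come from the list above; the `ChunkMetadata` class, the validation rules, and the sample values are assumptions for illustration.

```python
from dataclasses import dataclass, asdict
from datetime import datetime

@dataclass(frozen=True)
class ChunkMetadata:
    source_id: str
    source_url: str
    created_at: str      # ISO 8601 timestamp, so range filters compare correctly
    document_type: str
    tenant_id: str
    language: str

    def __post_init__(self):
        # Fail at ingest time rather than discovering bad payloads at query time.
        datetime.fromisoformat(self.created_at)  # raises ValueError if malformed
        for name, value in asdict(self).items():
            if not value:
                raise ValueError(f"{name} must not be empty")

payload = asdict(ChunkMetadata(
    source_id="doc-123",
    source_url="https://example.com/contracts/doc-123",
    created_at="2026-01-15T09:30:00+00:00",
    document_type="contract",
    tenant_id="acme",
    language="en",
))
print(payload)
```

The resulting dictionary is what would be stored as the payload alongside each vector.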

    Key takeaway

    As a rule of thumb, about 80% of the quality of a vector database deployment comes from upstream decisions (embedding model choice, chunk size, and metadata schema) and only about 20% from which database product you pick. Optimize the data pipeline first.

    How to Evaluate Vector Databases

    When comparing options, measure these dimensions:

  • Recall at k — what fraction of true top-k results appear in the returned top-k (target >0.95 at k=10)
  • QPS (queries per second) — at your expected load
  • P99 latency — the tail matters more than the average for user-facing features
  • Filter performance — how much latency increases when you add metadata filters
  • Multitenancy — namespaces or collection isolation for SaaS products
  • On-disk support — can it spill to disk when RAM is insufficient?
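Two of these measurements, recall at k and P99 latency, are simple to compute once you have ground-truth results and a latency log. The IDs and latencies below are invented sample data, and the percentile uses the nearest-rank method.

```python
def recall_at_k(returned_ids, true_ids, k):
    """Fraction of the true top-k that appears in the returned top-k."""
    return len(set(returned_ids[:k]) & set(true_ids[:k])) / k

def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list; pct is an integer 1-100."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling division
    return ordered[rank - 1]

exact_top10 = [f"d{i}" for i in range(1, 11)]
ann_top10 = ["d1", "d2", "d3", "d5", "d4", "d6", "d7", "d8", "d9", "d42"]
recall = recall_at_k(ann_top10, exact_top10, k=10)

latencies_ms = [12] * 97 + [15, 48, 210]
mean_ms = sum(latencies_ms) / len(latencies_ms)
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
print(recall, mean_ms, p50, p99)  # 0.9 14.37 12 48
```

The sample shows why the tail matters: the mean and median look healthy while one request in a hundred takes four times longer.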

    Common Implementation Mistakes

    Teams new to vector databases make predictable errors:

  • Embedding stale content — not updating vectors when source documents change
  • No re-ranking — returning raw ANN results instead of re-scoring with a cross-encoder
  • Single large collection — mixing unrelated content domains in one namespace, hurting precision
  • Ignoring chunk overlap — splitting sentences across chunks breaks context
  • Testing only at small scale — performance at 10k vectors says nothing about 10M
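The first mistake, stale content, can be caught by storing a content hash at ingest time and comparing it on each sync. This is one possible approach, not a feature of any particular database; `plan_sync` and the sample documents are hypothetical.

```python
import hashlib

def content_hash(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def plan_sync(current_docs, indexed_hashes):
    """Compare source documents with the hashes recorded at ingest time.

    Returns (to_embed, to_delete): source IDs that are new or changed,
    and source IDs that no longer exist in the source system.
    """
    to_embed = [sid for sid, text in current_docs.items()
                if indexed_hashes.get(sid) != content_hash(text)]
    to_delete = [sid for sid in indexed_hashes if sid not in current_docs]
    return to_embed, to_delete

indexed_hashes = {
    "a": content_hash("old text"),
    "b": content_hash("unchanged text"),
    "c": content_hash("document that was removed"),
}
current_docs = {"a": "new text", "b": "unchanged text", "d": "brand new document"}

to_embed, to_delete = plan_sync(current_docs, indexed_hashes)
print(to_embed, to_delete)  # ['a', 'd'] ['c']
```

Only changed or new documents are re-embedded, which also keeps embedding costs down on large corpora.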

    Key Takeaways

    Vector databases are the retrieval layer that makes AI applications accurate at scale. The core value is simple: store meaning as numbers, retrieve by similarity. But production deployments require deliberate choices around chunking, metadata, index type, and whether to run managed or self-hosted.

    For teams evaluating the space:

    • Start with pgvector if you already run Postgres and expect fewer than 2 million vectors
    • Use Qdrant or Weaviate self-hosted for mid-scale (2M–100M vectors) with full control
    • Use Pinecone or Zilliz Cloud when you want zero infrastructure management and predictable pricing
    • Revisit the embedding model before tuning the database — model quality matters more
    DeGenito.Ai designs and builds complete RAG and semantic search pipelines, including embedding strategy, vector database selection, and production monitoring. If you're starting a vector database project, reach out for a scoped architecture review.

    Frequently Asked Questions

    What is the difference between a vector database and a vector store?

    The terms are often used interchangeably. A vector store is any system that holds and retrieves vectors, including in-memory libraries like FAISS. A vector database adds production features on top: durable persistence, replication, metadata filtering, access control, and a client API. For proof-of-concept work, a vector store suffices; for production, use a vector database.

    Can I use my existing PostgreSQL database as a vector database?

    Yes, via the pgvector extension. It adds a vector column type and HNSW and IVFFlat indexes. For datasets under 2–5 million vectors with moderate query volumes (under 500 QPS), pgvector performs well and eliminates a separate infrastructure dependency. Beyond that scale, dedicated vector databases generally outperform it on latency and throughput.

    How much does a vector database cost to run?

    Costs vary widely. Managed Pinecone starts around $0.08 per 1 million queries and $0.000096 per vector per month stored. A collection of 1 million vectors queried 500,000 times per month runs roughly $100–$200/month. Self-hosted Qdrant on a $200/month VM handles comparable workloads but requires engineering time to operate.
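The estimate can be reproduced from the per-unit figures quoted in this answer. Those prices are taken from the text as-is and are not verified against any current price list, so treat the result as an illustration of the arithmetic only.

```python
PRICE_PER_MILLION_QUERIES = 0.08      # USD, figure quoted above
PRICE_PER_VECTOR_MONTH = 0.000096     # USD, figure quoted above

vectors_stored = 1_000_000
queries_per_month = 500_000

storage_cost = vectors_stored * PRICE_PER_VECTOR_MONTH
query_cost = queries_per_month / 1_000_000 * PRICE_PER_MILLION_QUERIES
total = storage_cost + query_cost

print(f"storage: ${storage_cost:.2f}")  # storage: $96.00
print(f"queries: ${query_cost:.2f}")    # queries: $0.04
print(f"total:   ${total:.2f}")         # total:   $96.04
```

With these inputs, storage dominates the bill and the total lands near the low end of the quoted range, so the number of stored vectors is the figure to watch.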

    How many vectors can a vector database hold?

    Modern vector databases scale to hundreds of millions or billions of vectors with the right hardware. Pinecone supports up to 100M vectors per index on its standard plan. Qdrant and Weaviate scale horizontally with sharding. For most business applications — internal knowledge bases, product catalogs, support documents — 1–10 million vectors is typical, which any current database handles comfortably.

    What embedding model should I use?

    For English text retrieval, OpenAI text-embedding-3-large (3,072 dimensions) delivers top benchmark performance. For cost-sensitive applications, text-embedding-3-small (1,536 dimensions) is several times cheaper (roughly 6.5x at OpenAI's launch list prices) with ~5% lower recall. Cohere Embed v3 is competitive and supports multilingual use cases well. Always benchmark on your actual data, because MTEB leaderboard rankings don't always translate to your specific domain.

    Do vector databases replace traditional search engines like Elasticsearch?

    Usually not; the two complement each other. Elasticsearch excels at keyword match, BM25 scoring, and structured filters. Vector databases excel at semantic similarity. Production search systems increasingly use hybrid retrieval: BM25 for exact term matches plus ANN for semantic matches, combined with a re-ranking model. This hybrid approach often outperforms either method alone, commonly by 10–20% on recall benchmarks.
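A common, model-free way to merge a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). Note that this is a swapped-in technique: it fuses the two result lists by rank and is not the re-ranking model mentioned above, which would typically run afterwards. The document IDs are invented, and k=60 is the conventional default constant.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked ID lists; each list contributes 1 / (k + rank) per document."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

bm25_hits = ["doc-3", "doc-1", "doc-7"]   # hypothetical keyword results
ann_hits = ["doc-1", "doc-9", "doc-3"]    # hypothetical vector results

fused = reciprocal_rank_fusion([bm25_hits, ann_hits])
print(fused)  # ['doc-1', 'doc-3', 'doc-9', 'doc-7']
```

Documents that appear in both lists rise to the top, and no score normalization between BM25 and cosine similarity is needed because only ranks are used.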

    Vladimir Kamenev
    Generative AI solutions

    25 years in the industry and still going strong
