
Fix: ChromaDB Not Working — Persistent Client, Collection Errors, and Embedding Function Issues

FixDevs

Quick Answer

How to fix common ChromaDB errors: a persistent client not saving data, "collection already exists" errors, embedding dimension mismatches, a missing embedding function, HTTP client connection refused, and memory growing unbounded.

The Error

You restart your Python process and your ChromaDB collection is empty:

import chromadb

client = chromadb.Client()
collection = client.create_collection("my_docs")
# ... add documents ...

# Restart Python
client = chromadb.Client()
collection = client.get_collection("my_docs")   # ValueError: Collection not found

Or you try to create a collection that already exists:

chromadb.errors.UniqueConstraintError:
Collection my_collection already exists.

Or you add new documents to an existing collection and get a dimension mismatch:

InvalidDimensionException:
Embedding dimension 1536 does not match collection dimensionality 384

Or the HTTP client can’t connect:

ConnectionError: Could not connect to tenant default_tenant at http://localhost:8000

Or your collection grows unbounded as you add documents in a loop:

# After 100k inserts
MemoryError: Unable to allocate array with shape (...)

ChromaDB is the most popular open-source vector database for RAG — lightweight, runs in-process by default, and integrates with every major LLM framework. Its simplicity is deceptive: the difference between in-memory and persistent clients, strict dimension matching, and the implicit embedding function all produce specific failure modes that newcomers hit repeatedly. This guide covers them.

Why This Happens

Chroma has three client modes: Client() (in-memory, lost on restart), PersistentClient() (on-disk, survives restarts), and HttpClient() (remote server). New users often start with Client() from tutorials, not realizing data doesn’t persist.

Every collection has a fixed embedding dimensionality set on first write. Trying to add documents with different-sized embeddings (e.g., switching from all-MiniLM-L6-v2 at 384 dims to text-embedding-3-large at 3072 dims) fails. This is a feature, not a bug — mixing embedding spaces would produce garbage similarity scores.

Fix 1: Persistent vs In-Memory Client

import chromadb

# WRONG — data lost on restart
client = chromadb.Client()   # In-memory
collection = client.create_collection("docs")
collection.add(documents=["hello world"], ids=["1"])
# Restart Python — data gone

# CORRECT — persistent storage
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("docs")
collection.add(documents=["hello world"], ids=["1"])
# Restart — data is loaded from ./chroma_db

get_or_create_collection is the idempotent pattern — no error if collection exists:

# WRONG — raises if exists
collection = client.create_collection("my_docs")

# CORRECT — returns existing or creates new
collection = client.get_or_create_collection("my_docs")

Verify persistence works:

import chromadb

client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("test")
collection.add(documents=["test"], ids=["1"])
print(f"Before restart: {collection.count()} documents")

# (Restart Python manually, then run)
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_collection("test")
print(f"After restart: {collection.count()} documents")   # Should match

Check storage location:

ls ./chroma_db/
# chroma.sqlite3
# <collection-id>/
#   data_level0.bin
#   header.bin
#   ...

Chroma stores data in SQLite + HNSW index files. Don’t move or rename files while the client is open.
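Before pointing a client at a directory, it can be worth checking that the path actually holds a Chroma store rather than a typo'd or empty folder. A minimal sketch: `looks_like_chroma_store` is a hypothetical helper, not part of the chromadb API; it only inspects the on-disk layout shown above.

```python
import tempfile
from pathlib import Path

def looks_like_chroma_store(path: str) -> bool:
    """Heuristic: a Chroma persistent directory contains chroma.sqlite3.

    Hypothetical helper, not a chromadb API; it only checks the
    filesystem layout described above.
    """
    root = Path(path)
    return root.is_dir() and (root / "chroma.sqlite3").is_file()

# Sanity check against a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    print(looks_like_chroma_store(tmp))          # False: nothing there yet
    (Path(tmp) / "chroma.sqlite3").touch()
    print(looks_like_chroma_store(tmp))          # True once the file exists
```

This catches the silent case where a relative `path=` resolves to a different working directory than you expect, so a fresh, empty store gets created instead of the one you meant to open.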

Common Mistake: Using Client() in development because tutorials do, then deploying the same code to production. The first user’s data appears to work, but on server restart it all vanishes. Always use PersistentClient(path=...) unless you explicitly want ephemeral storage (tests, one-off scripts).

Fix 2: Embedding Function Mismatch

InvalidDimensionException: Embedding dimension 1536 does not match collection dimensionality 384

Each collection stores vectors of a fixed size. You can’t mix models with different embedding sizes in the same collection.

Set an explicit embedding function when creating a collection:

from chromadb.utils import embedding_functions

# Default: sentence-transformers/all-MiniLM-L6-v2 (384 dims)
# Used if you don't specify one

# Option 1: OpenAI embeddings (1536 or 3072 dims)
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="sk-...",
    model_name="text-embedding-3-small",   # 1536 dims
    # Or "text-embedding-3-large" (3072 dims)
)

collection = client.create_collection(
    name="openai_docs",
    embedding_function=openai_ef,
)

# Option 2: HuggingFace model
hf_ef = embedding_functions.HuggingFaceEmbeddingFunction(
    api_key="hf_...",
    model_name="sentence-transformers/all-mpnet-base-v2",
)

# Option 3: Local sentence-transformers (no API)
st_ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="BAAI/bge-large-en-v1.5",
)

# Option 4: Cohere
cohere_ef = embedding_functions.CohereEmbeddingFunction(
    api_key="...",
    model_name="embed-english-v3.0",
)

The embedding function must match every time you load the collection:

# Save collection
collection = client.create_collection(
    name="docs",
    embedding_function=openai_ef,
)

# Load later — same embedding function required
collection = client.get_collection(
    name="docs",
    embedding_function=openai_ef,   # Must match or queries break
)

Pro Tip: Store metadata about which embedding function was used in the collection metadata. On load, assert the configured function matches. This prevents the subtle bug where queries return nonsense because the query embedding is from a different model than the stored vectors.

collection = client.create_collection(
    name="docs",
    embedding_function=openai_ef,
    metadata={
        "embedding_model": "text-embedding-3-small",
        "embedding_dims": 1536,
        "created_at": "2025-04-09",
    },
)
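To make the Pro Tip concrete, here is a minimal load-time guard. `assert_embedding_config` is a hypothetical helper: it only compares plain dicts, so it works with whatever `collection.metadata` returns.

```python
def assert_embedding_config(collection_metadata, expected_model, expected_dims):
    """Fail fast if a collection was built with a different embedding model.

    Hypothetical helper: compares the metadata dict saved at creation time
    against the model the application is currently configured to use.
    """
    stored_model = (collection_metadata or {}).get("embedding_model")
    stored_dims = (collection_metadata or {}).get("embedding_dims")
    if stored_model != expected_model or stored_dims != expected_dims:
        raise ValueError(
            f"Collection embedded with {stored_model} ({stored_dims} dims), "
            f"but app is configured for {expected_model} ({expected_dims} dims)"
        )

# Usage sketch, after client.get_collection(...):
#   assert_embedding_config(collection.metadata, "text-embedding-3-small", 1536)
meta = {"embedding_model": "text-embedding-3-small", "embedding_dims": 1536}
assert_embedding_config(meta, "text-embedding-3-small", 1536)   # passes silently
```

Raising at startup is far cheaper than debugging nonsense similarity scores later.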

Bring your own embeddings (skip Chroma’s embedding function):

# Pre-compute embeddings with any model
import numpy as np

texts = ["doc 1", "doc 2", "doc 3"]
embeddings = my_model.encode(texts)   # Shape: (3, 768) for example

collection = client.create_collection(name="custom_embeds")
collection.add(
    documents=texts,
    embeddings=embeddings.tolist(),   # Pass explicitly
    ids=["1", "2", "3"],
)

# For queries, also pass pre-computed embedding
query_embedding = my_model.encode(["search query"])
results = collection.query(
    query_embeddings=query_embedding.tolist(),
    n_results=5,
)

Fix 3: Collection Management and Metadata Filters

# List all collections
# (chromadb >= 0.6 returns collection names as strings; older versions
#  return Collection objects, as assumed below)
collections = client.list_collections()
for c in collections:
    print(c.name, c.count())

# Delete a collection
client.delete_collection(name="old_docs")

# Delete specific items from a collection
collection.delete(ids=["doc1", "doc2"])

# Delete by metadata filter
collection.delete(where={"source": "outdated_blog"})

Add with metadata for filtering later:

collection.add(
    documents=[
        "The quarterly revenue was $5M",
        "The team consists of 50 engineers",
        "Product X launches in Q3 2025",
    ],
    metadatas=[
        {"type": "finance", "quarter": "Q1", "year": 2025},
        {"type": "hr", "department": "engineering"},
        {"type": "product", "launch_year": 2025},
    ],
    ids=["fin-1", "hr-1", "prod-1"],
)

# Query with metadata filter
results = collection.query(
    query_texts=["What's the revenue?"],
    n_results=5,
    where={"type": "finance"},   # Only search finance docs
)

# Complex filters
results = collection.query(
    query_texts=["team info"],
    where={"$and": [
        {"type": "hr"},
        {"department": "engineering"},
    ]},
)

# Comparison operators
results = collection.query(
    query_texts=["products"],
    where={"launch_year": {"$gte": 2024}},
)

Supported filter operators:

Operator     Meaning
$eq          Equals (default if you pass a scalar)
$ne          Not equals
$gt, $gte    Greater than (or equal)
$lt, $lte    Less than (or equal)
$in          Value in list
$nin         Value not in list
$and, $or    Logical operators
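Because `where` filters are plain dicts, they can be built and inspected without a running collection. A small sketch (`and_filters` is a hypothetical convenience wrapper, not a chromadb API):

```python
def and_filters(*clauses):
    """Combine metadata clauses with $and; a single clause passes through as-is.

    Hypothetical helper: Chroma where filters are plain dicts, so they can
    be composed and printed before ever touching a collection.
    """
    clauses = [c for c in clauses if c]
    if not clauses:
        return None
    if len(clauses) == 1:
        return clauses[0]
    return {"$and": list(clauses)}

where = and_filters(
    {"type": "product"},
    {"launch_year": {"$gte": 2024}},
)
print(where)
# {'$and': [{'type': 'product'}, {'launch_year': {'$gte': 2024}}]}
```

This avoids the common mistake of wrapping a single clause in `$and`, which Chroma rejects because `$and` requires at least two operands.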

Filter by document content (not just metadata):

results = collection.query(
    query_texts=["search term"],
    n_results=5,
    where_document={"$contains": "specific word"},   # Full-text filter
)

Fix 4: Query Results and Similarity Scores

results = collection.query(
    query_texts=["What is the capital of France?"],
    n_results=3,
)

print(results)
# {
#   'ids': [['doc-2', 'doc-5', 'doc-1']],
#   'distances': [[0.12, 0.34, 0.56]],
#   'documents': [['Paris is...', 'Europe has...', 'France is...']],
#   'metadatas': [[{...}, {...}, {...}]],
# }

Distance vs similarity — lower is better (default):

Chroma uses L2 (Euclidean) distance by default. Lower distance means more similar.

Change distance metric at collection creation:

collection = client.create_collection(
    name="docs",
    metadata={"hnsw:space": "cosine"},   # "l2" (default), "cosine", "ip" (inner product)
)

Metric    When to use
l2        Default. Fine for most cases.
cosine    When embedding magnitude doesn't matter (most LLM embeddings)
ip        Inner product: fastest, requires normalized vectors

Convert distance to similarity (for cosine):

# Cosine distance is in [0, 2]. Similarity = 1 - distance
for dist in results['distances'][0]:
    similarity = 1 - dist   # Higher = more similar
    print(f"Similarity: {similarity:.3f}")

Filter by score threshold:

# Chroma doesn't have a built-in threshold — filter in Python
results = collection.query(query_texts=[query], n_results=50)
threshold = 0.7   # Cosine similarity threshold
filtered = [
    (doc, meta, 1 - dist)
    for doc, meta, dist in zip(
        results['documents'][0],
        results['metadatas'][0],
        results['distances'][0],
    )
    if (1 - dist) >= threshold
]
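The threshold pattern above is worth wrapping in a reusable function that operates on the query-result dict shape. A sketch (`filter_by_similarity` is a hypothetical helper; it assumes the collection uses cosine distance):

```python
def filter_by_similarity(results, threshold, query_index=0):
    """Keep only hits whose cosine similarity (1 - distance) meets threshold.

    Operates on the dict returned by collection.query(); assumes cosine
    distance. Hypothetical helper, not part of the chromadb API.
    """
    docs = results["documents"][query_index]
    metas = results["metadatas"][query_index]
    dists = results["distances"][query_index]
    return [
        (doc, meta, 1 - dist)
        for doc, meta, dist in zip(docs, metas, dists)
        if (1 - dist) >= threshold
    ]

# Demonstrated on a hand-built results dict with query()'s shape
fake_results = {
    "documents": [["close match", "weak match"]],
    "metadatas": [[{"id": 1}, {"id": 2}]],
    "distances": [[0.1, 0.8]],
}
print(filter_by_similarity(fake_results, threshold=0.7))
# [('close match', {'id': 1}, 0.9)]
```

The `query_index` parameter matters for batch queries, where each outer list holds one entry per query text.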

Multiple queries at once (batch):

results = collection.query(
    query_texts=["query 1", "query 2", "query 3"],
    n_results=5,
)
# results['documents'] is a list of lists — one per query

Fix 5: HTTP Client for Server Mode

ConnectionError: Could not connect to tenant default_tenant at http://localhost:8000

For production, run Chroma as a standalone server and use HttpClient:

Start Chroma server:

# Install server version
pip install "chromadb[server]"

# Start server (persists data under the --path directory)
chroma run --host 0.0.0.0 --port 8000 --path ./chroma_data

# Or with Docker
docker pull chromadb/chroma
docker run -p 8000:8000 -v "$(pwd)/chroma_data:/chroma/chroma" chromadb/chroma

Connect as HTTP client:

import chromadb

client = chromadb.HttpClient(
    host="localhost",
    port=8000,
    # Or for production
    # ssl=True, host="chroma.example.com", port=443,
)

# API is identical to PersistentClient
collection = client.get_or_create_collection("docs")
collection.add(documents=[...], ids=[...])

Authentication for production:

# Server side — set auth token
export CHROMA_SERVER_AUTHN_PROVIDER="chromadb.auth.token_authn.TokenAuthenticationServerProvider"
export CHROMA_SERVER_AUTHN_CREDENTIALS="your-secret-token"
chroma run --host 0.0.0.0 --port 8000
# Client side
from chromadb.config import Settings

client = chromadb.HttpClient(
    host="localhost",
    port=8000,
    settings=Settings(
        chroma_client_auth_provider="chromadb.auth.token_authn.TokenAuthClientProvider",
        chroma_client_auth_credentials="your-secret-token",
    ),
)

Health check:

client.heartbeat()   # Returns nanoseconds since epoch if server is up
# Raises if connection fails

Fix 6: Batch Operations and Performance

Adding documents one at a time is slow due to embedding API calls and index rebuilds.

# SLOW — one API call per document
for i, text in enumerate(documents):
    collection.add(documents=[text], ids=[str(i)])

# FAST — single batch
collection.add(
    documents=documents,
    ids=[str(i) for i in range(len(documents))],
)

Chunk large batches to avoid request size limits:

def add_in_batches(collection, docs, ids, batch_size=500):
    for i in range(0, len(docs), batch_size):
        batch_docs = docs[i:i + batch_size]
        batch_ids = ids[i:i + batch_size]
        collection.add(documents=batch_docs, ids=batch_ids)
        print(f"Added {i + len(batch_docs)} / {len(docs)}")

add_in_batches(collection, documents, ids)

Pre-compute embeddings in parallel for maximum throughput:

from concurrent.futures import ThreadPoolExecutor

def embed_batch(texts):
    return embedding_fn(texts)

batches = [documents[i:i+100] for i in range(0, len(documents), 100)]

with ThreadPoolExecutor(max_workers=10) as executor:
    embeddings = list(executor.map(embed_batch, batches))

all_embeddings = [emb for batch in embeddings for emb in batch]

collection.add(
    documents=documents,
    embeddings=all_embeddings,
    ids=[str(i) for i in range(len(documents))],
)

Upsert vs add: upsert updates existing IDs, while add fails on duplicates:

# Fails if ID exists
collection.add(documents=["new"], ids=["1"])   # DuplicateIDError if "1" exists

# Upsert — creates or updates
collection.upsert(documents=["updated"], ids=["1"])

Fix 7: Memory Usage and Collection Size

Chroma keeps the full HNSW index in memory for fast queries. Very large collections (>1M vectors) can exhaust RAM.

Monitor collection size:

print(f"Count: {collection.count()}")
print(f"Metadata: {collection.metadata}")

# Peek at first few
print(collection.peek(5))

Tune the HNSW index with hnsw:M (graph links per vector, the main memory knob) and the ef parameters (recall vs speed):

collection = client.create_collection(
    name="big_docs",
    metadata={
        "hnsw:space": "cosine",
        "hnsw:construction_ef": 100,   # Higher = better recall, slower insert
        "hnsw:M": 16,                   # Graph connectivity (16 = default)
        "hnsw:search_ef": 50,           # Query-time recall vs speed
    },
)
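A rough back-of-envelope for how much RAM an index needs: the vectors themselves dominate at 4 bytes per float32 dimension, plus roughly 2 * M graph links of 4 bytes each per vector. A hypothetical estimator under those assumptions (not exact chromadb numbers; real usage is higher due to metadata and overhead):

```python
def estimate_hnsw_ram_mb(n_vectors, dims, M=16):
    """Rough lower bound on HNSW index RAM, in MiB.

    Assumptions (not exact chromadb figures): float32 vectors at 4 bytes
    per dimension, plus about 2 * M graph links of 4 bytes each per vector.
    """
    vector_bytes = n_vectors * dims * 4
    link_bytes = n_vectors * 2 * M * 4
    return (vector_bytes + link_bytes) / (1024 * 1024)

# 1M vectors at 384 dims (the default MiniLM model)
print(f"{estimate_hnsw_ram_mb(1_000_000, 384):.0f} MiB")   # 1587 MiB
```

If the estimate approaches available RAM, that is the signal to split collections (below) or move to a dedicated server.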

Split into multiple collections by category, region, or time:

# Instead of one 10M-document collection:
collection_2024 = client.get_or_create_collection("docs_2024")
collection_2025 = client.get_or_create_collection("docs_2025")

# Query the relevant one
relevant = collection_2025.query(query_texts=[q], n_results=10)

Common Mistake: Calling collection.get() with no arguments. This returns ALL documents in the collection — fine for 100, catastrophic for 1M. Always use limit= and offset=:

# WRONG — loads everything into memory
all_docs = collection.get()

# CORRECT — page through
batch = collection.get(limit=100, offset=0)
# Next page:
batch = collection.get(limit=100, offset=100)
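The paging pattern generalizes to a generator. `iter_ids` is a hypothetical helper; it works against anything exposing Chroma's `get(limit=..., offset=...)` shape, demonstrated here with a stub so it runs without a database:

```python
def iter_ids(collection, batch_size=100):
    """Yield document ids page by page via get(limit=..., offset=...).

    Hypothetical helper built on the paging pattern above; stops when a
    page comes back short or empty.
    """
    offset = 0
    while True:
        page = collection.get(limit=batch_size, offset=offset)
        ids = page["ids"]
        if not ids:
            return
        yield from ids
        if len(ids) < batch_size:
            return
        offset += batch_size

# Stub that mimics the shape of collection.get()'s return value
class FakeCollection:
    def __init__(self, ids):
        self._ids = ids
    def get(self, limit, offset):
        return {"ids": self._ids[offset:offset + limit]}

fake = FakeCollection([str(i) for i in range(250)])
print(len(list(iter_ids(fake, batch_size=100))))   # 250
```

Because it yields lazily, memory stays bounded by `batch_size` no matter how large the collection is.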

Fix 8: Integration with LLM Frameworks

LangChain:

from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Create or load
vectorstore = Chroma(
    collection_name="my_docs",
    embedding_function=embeddings,
    persist_directory="./chroma_db",   # Uses PersistentClient under the hood
)

vectorstore.add_texts(texts=["doc 1", "doc 2"], metadatas=[{...}, {...}])
results = vectorstore.similarity_search("query", k=5)

For LangChain-specific patterns and errors, see LangChain Python not working.

LlamaIndex:

from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb

chroma_client = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = chroma_client.get_or_create_collection("my_docs")

vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

For LlamaIndex-specific patterns, see LlamaIndex not working.

Still Not Working?

Chroma vs Other Vector Databases

  • Chroma — Simplest, runs in-process, great for prototypes and small-to-medium datasets (<1M vectors). Limited horizontal scaling.
  • Qdrant — Production-grade, scales horizontally, richer filtering. Slightly more complex setup.
  • Pinecone — Managed SaaS, no ops required, good for quick production. Costs at scale.
  • Weaviate — Hybrid search (vector + keyword) built-in, GraphQL API.
  • pgvector — Postgres extension. Best when you already have Postgres and don’t need specialized features.

Debugging Silent Query Failures

If queries return nothing when they should return matches:

  1. Check the count: collection.count() should match expectations
  2. Check the embedding function: querying with a different model than the one used for adding returns zero matches
  3. Check the distance distribution: inspect results['distances'][0] to see whether matches exist but score poorly
  4. Check the metadata filter: overly restrictive where clauses eliminate valid matches

Backup and Export

# Export all data to a file
import json

all_data = collection.get(include=["documents", "metadatas", "embeddings"])
with open("backup.json", "w") as f:
    json.dump(all_data, f)

# Restore to a new collection
new_collection = client.create_collection(name="restored")
new_collection.add(
    documents=all_data["documents"],
    embeddings=all_data["embeddings"],
    metadatas=all_data["metadatas"],
    ids=all_data["ids"],
)
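One caveat with the export above: depending on the chromadb version, `get(include=["embeddings"])` may return numpy arrays, which `json.dump` cannot serialize. A hedged converter (`to_jsonable` is a hypothetical helper) that flattens anything exposing `.tolist()` without requiring numpy:

```python
import json

def to_jsonable(value):
    """Recursively convert array-like values (anything with .tolist()) for JSON.

    Hedge: whether get() returns plain lists or numpy arrays depends on the
    chromadb version; this handles both without importing numpy.
    """
    if hasattr(value, "tolist"):
        return value.tolist()
    if isinstance(value, dict):
        return {k: to_jsonable(v) for k, v in value.items()}
    if isinstance(value, list):
        return [to_jsonable(v) for v in value]
    return value

sample = {"ids": ["1"], "embeddings": [[0.1, 0.2]], "documents": ["test"]}
print(json.dumps(to_jsonable(sample)))
```

Run the exported data through `to_jsonable` before `json.dump` and the backup works regardless of the return type.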

Using with OpenAI Embeddings

For OpenAI API key setup and rate limit handling when using OpenAI embeddings with Chroma, see OpenAI API not working. For HuggingFace-based embedding models as alternatives, see HuggingFace Transformers not working.


FixDevs

Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.
