
Fix: ChromaDB Not Working — Persistent Client, Collection Errors, and Embedding Function Issues

FixDevs

Quick Answer

How to fix common ChromaDB errors: a persistent client not saving data, "collection already exists" errors, embedding dimension mismatches, a missing embedding function, HTTP client connection refused, and memory growing unbounded.

The Error

You restart your Python process and your ChromaDB collection is empty:

import chromadb

client = chromadb.Client()
collection = client.create_collection("my_docs")
# ... add documents ...

# Restart Python
client = chromadb.Client()
collection = client.get_collection("my_docs")   # ValueError: Collection not found

Or you try to create a collection that already exists:

chromadb.errors.UniqueConstraintError:
Collection my_collection already exists.

Or you add new documents to an existing collection and get a dimension mismatch:

InvalidDimensionException:
Embedding dimension 1536 does not match collection dimensionality 384

Or the HTTP client can’t connect:

ConnectionError: Could not connect to tenant default_tenant at http://localhost:8000

Or your collection grows unbounded as you add documents in a loop:

# After 100k inserts
MemoryError: Unable to allocate array with shape (...)

ChromaDB is the most popular open-source vector database for RAG — lightweight, runs in-process by default, and integrates with every major LLM framework. Its simplicity is deceptive: the difference between in-memory and persistent clients, strict dimension matching, and the implicit embedding function all produce specific failure modes that newcomers hit repeatedly. This guide covers them.

Why This Happens

Chroma has three client modes: Client() (in-memory, lost on restart), PersistentClient() (on-disk, survives restarts), and HttpClient() (remote server). New users often start with Client() from tutorials, not realizing data doesn’t persist.

Every collection has a fixed embedding dimensionality set on first write. Trying to add documents with different-sized embeddings (e.g., switching from all-MiniLM-L6-v2 at 384 dims to text-embedding-3-large at 3072 dims) fails. This is a feature, not a bug — mixing embedding spaces would produce garbage similarity scores.

Fix 1: Persistent vs In-Memory Client

import chromadb

# WRONG — data lost on restart
client = chromadb.Client()   # In-memory
collection = client.create_collection("docs")
collection.add(documents=["hello world"], ids=["1"])
# Restart Python — data gone

# CORRECT — persistent storage
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("docs")
collection.add(documents=["hello world"], ids=["1"])
# Restart — data is loaded from ./chroma_db

get_or_create_collection is the idempotent pattern — no error if collection exists:

# WRONG — raises if exists
collection = client.create_collection("my_docs")

# CORRECT — returns existing or creates new
collection = client.get_or_create_collection("my_docs")

Verify persistence works:

import chromadb

client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("test")
collection.add(documents=["test"], ids=["1"])
print(f"Before restart: {collection.count()} documents")

# (Restart Python manually, then run)
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_collection("test")
print(f"After restart: {collection.count()} documents")   # Should match

Check storage location:

ls ./chroma_db/
# chroma.sqlite3
# <collection-id>/
#   data_level0.bin
#   header.bin
#   ...

Chroma stores data in SQLite + HNSW index files. Don’t move or rename files while the client is open.
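Before pointing a client at a directory, it can be worth checking that the path actually holds a Chroma store rather than a typo'd or empty folder. A minimal sketch: `looks_like_chroma_store` is a hypothetical helper, not part of the chromadb API; it only inspects the on-disk layout shown above.

```python
import tempfile
from pathlib import Path

def looks_like_chroma_store(path: str) -> bool:
    """Heuristic: a Chroma persistent directory contains chroma.sqlite3.

    Hypothetical helper, not a chromadb API; it only checks the
    filesystem layout described above.
    """
    root = Path(path)
    return root.is_dir() and (root / "chroma.sqlite3").is_file()

# Sanity check against a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    print(looks_like_chroma_store(tmp))          # False: nothing there yet
    (Path(tmp) / "chroma.sqlite3").touch()
    print(looks_like_chroma_store(tmp))          # True once the file exists
```

This catches the silent case where a relative `path=` resolves to a different working directory than you expect, so a fresh, empty store gets created instead of the one you meant to open.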

Common Mistake: Using Client() in development because tutorials do, then deploying the same code to production. The first user’s data appears to work, but on server restart it all vanishes. Always use PersistentClient(path=...) unless you explicitly want ephemeral storage (tests, one-off scripts).

Fix 2: Embedding Function Mismatch

InvalidDimensionException: Embedding dimension 1536 does not match collection dimensionality 384

Each collection stores vectors of a fixed size. You can’t mix models with different embedding sizes in the same collection.

Set an explicit embedding function when creating a collection:

from chromadb.utils import embedding_functions

# Default: sentence-transformers/all-MiniLM-L6-v2 (384 dims)
# Used if you don't specify one

# Option 1: OpenAI embeddings (1536 or 3072 dims)
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="sk-...",
    model_name="text-embedding-3-small",   # 1536 dims
    # Or "text-embedding-3-large" (3072 dims)
)

collection = client.create_collection(
    name="openai_docs",
    embedding_function=openai_ef,
)

# Option 2: HuggingFace model
hf_ef = embedding_functions.HuggingFaceEmbeddingFunction(
    api_key="hf_...",
    model_name="sentence-transformers/all-mpnet-base-v2",
)

# Option 3: Local sentence-transformers (no API)
st_ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="BAAI/bge-large-en-v1.5",
)

# Option 4: Cohere
cohere_ef = embedding_functions.CohereEmbeddingFunction(
    api_key="...",
    model_name="embed-english-v3.0",
)

The embedding function must match every time you load the collection:

# Save collection
collection = client.create_collection(
    name="docs",
    embedding_function=openai_ef,
)

# Load later — same embedding function required
collection = client.get_collection(
    name="docs",
    embedding_function=openai_ef,   # Must match or queries break
)

Pro Tip: Store metadata about which embedding function was used in the collection metadata. On load, assert the configured function matches. This prevents the subtle bug where queries return nonsense because the query embedding is from a different model than the stored vectors.

collection = client.create_collection(
    name="docs",
    embedding_function=openai_ef,
    metadata={
        "embedding_model": "text-embedding-3-small",
        "embedding_dims": 1536,
        "created_at": "2025-04-09",
    },
)
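To make the Pro Tip concrete, here is a minimal load-time guard. `assert_embedding_config` is a hypothetical helper: it only compares plain dicts, so it works with whatever `collection.metadata` returns.

```python
def assert_embedding_config(collection_metadata, expected_model, expected_dims):
    """Fail fast if a collection was built with a different embedding model.

    Hypothetical helper: compares the metadata dict saved at creation time
    against the model the application is currently configured to use.
    """
    stored_model = (collection_metadata or {}).get("embedding_model")
    stored_dims = (collection_metadata or {}).get("embedding_dims")
    if stored_model != expected_model or stored_dims != expected_dims:
        raise ValueError(
            f"Collection embedded with {stored_model} ({stored_dims} dims), "
            f"but app is configured for {expected_model} ({expected_dims} dims)"
        )

# Usage sketch, after client.get_collection(...):
#   assert_embedding_config(collection.metadata, "text-embedding-3-small", 1536)
meta = {"embedding_model": "text-embedding-3-small", "embedding_dims": 1536}
assert_embedding_config(meta, "text-embedding-3-small", 1536)   # passes silently
```

Raising at startup is far cheaper than debugging nonsense similarity scores later.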

Bring your own embeddings (skip Chroma’s embedding function):

# Pre-compute embeddings with any model
import numpy as np

texts = ["doc 1", "doc 2", "doc 3"]
embeddings = my_model.encode(texts)   # Shape: (3, 768) for example

collection = client.create_collection(name="custom_embeds")
collection.add(
    documents=texts,
    embeddings=embeddings.tolist(),   # Pass explicitly
    ids=["1", "2", "3"],
)

# For queries, also pass pre-computed embedding
query_embedding = my_model.encode(["search query"])
results = collection.query(
    query_embeddings=query_embedding.tolist(),
    n_results=5,
)

Fix 3: Collection Management and Metadata Filters

# List all collections
# (chromadb >= 0.6 returns collection names as strings; older versions
#  return Collection objects, as assumed below)
collections = client.list_collections()
for c in collections:
    print(c.name, c.count())

# Delete a collection
client.delete_collection(name="old_docs")

# Delete specific items from a collection
collection.delete(ids=["doc1", "doc2"])

# Delete by metadata filter
collection.delete(where={"source": "outdated_blog"})

Add with metadata for filtering later:

collection.add(
    documents=[
        "The quarterly revenue was $5M",
        "The team consists of 50 engineers",
        "Product X launches in Q3 2025",
    ],
    metadatas=[
        {"type": "finance", "quarter": "Q1", "year": 2025},
        {"type": "hr", "department": "engineering"},
        {"type": "product", "launch_year": 2025},
    ],
    ids=["fin-1", "hr-1", "prod-1"],
)

# Query with metadata filter
results = collection.query(
    query_texts=["What's the revenue?"],
    n_results=5,
    where={"type": "finance"},   # Only search finance docs
)

# Complex filters
results = collection.query(
    query_texts=["team info"],
    where={"$and": [
        {"type": "hr"},
        {"department": "engineering"},
    ]},
)

# Comparison operators
results = collection.query(
    query_texts=["products"],
    where={"launch_year": {"$gte": 2024}},
)

Supported filter operators:

Operator     Meaning
$eq          Equals (default if you pass a scalar)
$ne          Not equals
$gt, $gte    Greater than (or equal)
$lt, $lte    Less than (or equal)
$in          Value in list
$nin         Value not in list
$and, $or    Logical operators
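Because `where` filters are plain dicts, they can be built and inspected without a running collection. A small sketch (`and_filters` is a hypothetical convenience wrapper, not a chromadb API):

```python
def and_filters(*clauses):
    """Combine metadata clauses with $and; a single clause passes through as-is.

    Hypothetical helper: Chroma where filters are plain dicts, so they can
    be composed and printed before ever touching a collection.
    """
    clauses = [c for c in clauses if c]
    if not clauses:
        return None
    if len(clauses) == 1:
        return clauses[0]
    return {"$and": list(clauses)}

where = and_filters(
    {"type": "product"},
    {"launch_year": {"$gte": 2024}},
)
print(where)
# {'$and': [{'type': 'product'}, {'launch_year': {'$gte': 2024}}]}
```

This avoids the common mistake of wrapping a single clause in `$and`, which Chroma rejects because `$and` requires at least two operands.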

Filter by document content (not just metadata):

results = collection.query(
    query_texts=["search term"],
    n_results=5,
    where_document={"$contains": "specific word"},   # Full-text filter
)

Fix 4: Query Results and Similarity Scores

results = collection.query(
    query_texts=["What is the capital of France?"],
    n_results=3,
)

print(results)
# {
#   'ids': [['doc-2', 'doc-5', 'doc-1']],
#   'distances': [[0.12, 0.34, 0.56]],
#   'documents': [['Paris is...', 'Europe has...', 'France is...']],
#   'metadatas': [[{...}, {...}, {...}]],
# }

Distance vs similarity — lower is better (default):

Chroma uses L2 (Euclidean) distance by default. Lower distance means more similar.

Change distance metric at collection creation:

collection = client.create_collection(
    name="docs",
    metadata={"hnsw:space": "cosine"},   # "l2" (default), "cosine", "ip" (inner product)
)

Metric    When to use
l2        Default. Fine for most cases.
cosine    When embedding magnitude doesn't matter (most LLM embeddings)
ip        Inner product: fastest, requires normalized vectors

Convert distance to similarity (for cosine):

# Cosine distance is in [0, 2]. Similarity = 1 - distance
for dist in results['distances'][0]:
    similarity = 1 - dist   # Higher = more similar
    print(f"Similarity: {similarity:.3f}")

Filter by score threshold:

# Chroma doesn't have a built-in threshold — filter in Python
results = collection.query(query_texts=[query], n_results=50)
threshold = 0.7   # Cosine similarity threshold
filtered = [
    (doc, meta, 1 - dist)
    for doc, meta, dist in zip(
        results['documents'][0],
        results['metadatas'][0],
        results['distances'][0],
    )
    if (1 - dist) >= threshold
]
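The threshold pattern above is worth wrapping in a reusable function that operates on the query-result dict shape. A sketch (`filter_by_similarity` is a hypothetical helper; it assumes the collection uses cosine distance):

```python
def filter_by_similarity(results, threshold, query_index=0):
    """Keep only hits whose cosine similarity (1 - distance) meets threshold.

    Operates on the dict returned by collection.query(); assumes cosine
    distance. Hypothetical helper, not part of the chromadb API.
    """
    docs = results["documents"][query_index]
    metas = results["metadatas"][query_index]
    dists = results["distances"][query_index]
    return [
        (doc, meta, 1 - dist)
        for doc, meta, dist in zip(docs, metas, dists)
        if (1 - dist) >= threshold
    ]

# Demonstrated on a hand-built results dict with query()'s shape
fake_results = {
    "documents": [["close match", "weak match"]],
    "metadatas": [[{"id": 1}, {"id": 2}]],
    "distances": [[0.1, 0.8]],
}
print(filter_by_similarity(fake_results, threshold=0.7))
# [('close match', {'id': 1}, 0.9)]
```

The `query_index` parameter matters for batch queries, where each outer list holds one entry per query text.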

Multiple queries at once (batch):

results = collection.query(
    query_texts=["query 1", "query 2", "query 3"],
    n_results=5,
)
# results['documents'] is a list of lists — one per query

Fix 5: HTTP Client for Server Mode

ConnectionError: Could not connect to tenant default_tenant at http://localhost:8000

For production, run Chroma as a standalone server and use HttpClient:

Start Chroma server:

# Install server version
pip install "chromadb[server]"

# Start server (persists data under the --path directory)
chroma run --host 0.0.0.0 --port 8000 --path ./chroma_data

# Or with Docker
docker pull chromadb/chroma
docker run -p 8000:8000 -v "$(pwd)/chroma_data:/chroma/chroma" chromadb/chroma

Connect as HTTP client:

import chromadb

client = chromadb.HttpClient(
    host="localhost",
    port=8000,
    # Or for production
    # ssl=True, host="chroma.example.com", port=443,
)

# API is identical to PersistentClient
collection = client.get_or_create_collection("docs")
collection.add(documents=[...], ids=[...])

Authentication for production:

# Server side — set auth token
export CHROMA_SERVER_AUTHN_PROVIDER="chromadb.auth.token_authn.TokenAuthenticationServerProvider"
export CHROMA_SERVER_AUTHN_CREDENTIALS="your-secret-token"
chroma run --host 0.0.0.0 --port 8000
# Client side
from chromadb.config import Settings

client = chromadb.HttpClient(
    host="localhost",
    port=8000,
    settings=Settings(
        chroma_client_auth_provider="chromadb.auth.token_authn.TokenAuthClientProvider",
        chroma_client_auth_credentials="your-secret-token",
    ),
)

Health check:

client.heartbeat()   # Returns nanoseconds since epoch if server is up
# Raises if connection fails

Fix 6: Batch Operations and Performance

Adding documents one at a time is slow due to embedding API calls and index rebuilds.

# SLOW — one API call per document
for i, text in enumerate(documents):
    collection.add(documents=[text], ids=[str(i)])

# FAST — single batch
collection.add(
    documents=documents,
    ids=[str(i) for i in range(len(documents))],
)

Chunk large batches to avoid request size limits:

def add_in_batches(collection, docs, ids, batch_size=500):
    for i in range(0, len(docs), batch_size):
        batch_docs = docs[i:i + batch_size]
        batch_ids = ids[i:i + batch_size]
        collection.add(documents=batch_docs, ids=batch_ids)
        print(f"Added {i + len(batch_docs)} / {len(docs)}")

add_in_batches(collection, documents, ids)

Pre-compute embeddings in parallel for maximum throughput:

from concurrent.futures import ThreadPoolExecutor

def embed_batch(texts):
    return embedding_fn(texts)

batches = [documents[i:i+100] for i in range(0, len(documents), 100)]

with ThreadPoolExecutor(max_workers=10) as executor:
    embeddings = list(executor.map(embed_batch, batches))

all_embeddings = [emb for batch in embeddings for emb in batch]

collection.add(
    documents=documents,
    embeddings=all_embeddings,
    ids=[str(i) for i in range(len(documents))],
)

Upsert vs add: upsert updates existing IDs, while add fails on duplicates:

# Fails if ID exists
collection.add(documents=["new"], ids=["1"])   # DuplicateIDError if "1" exists

# Upsert — creates or updates
collection.upsert(documents=["updated"], ids=["1"])

Fix 7: Memory Usage and Collection Size

Chroma keeps the full HNSW index in memory for fast queries. Very large collections (>1M vectors) can exhaust RAM.

Monitor collection size:

print(f"Count: {collection.count()}")
print(f"Metadata: {collection.metadata}")

# Peek at first few
print(collection.peek(5))

Tune the HNSW index with hnsw:M (graph links per vector, the main memory knob) and the ef parameters (recall vs speed):

collection = client.create_collection(
    name="big_docs",
    metadata={
        "hnsw:space": "cosine",
        "hnsw:construction_ef": 100,   # Higher = better recall, slower insert
        "hnsw:M": 16,                   # Graph connectivity (16 = default)
        "hnsw:search_ef": 50,           # Query-time recall vs speed
    },
)
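A rough back-of-envelope for how much RAM an index needs: the vectors themselves dominate at 4 bytes per float32 dimension, plus roughly 2 * M graph links of 4 bytes each per vector. A hypothetical estimator under those assumptions (not exact chromadb numbers; real usage is higher due to metadata and overhead):

```python
def estimate_hnsw_ram_mb(n_vectors, dims, M=16):
    """Rough lower bound on HNSW index RAM, in MiB.

    Assumptions (not exact chromadb figures): float32 vectors at 4 bytes
    per dimension, plus about 2 * M graph links of 4 bytes each per vector.
    """
    vector_bytes = n_vectors * dims * 4
    link_bytes = n_vectors * 2 * M * 4
    return (vector_bytes + link_bytes) / (1024 * 1024)

# 1M vectors at 384 dims (the default MiniLM model)
print(f"{estimate_hnsw_ram_mb(1_000_000, 384):.0f} MiB")   # 1587 MiB
```

If the estimate approaches available RAM, that is the signal to split collections (below) or move to a dedicated server.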

Split into multiple collections by category, region, or time:

# Instead of one 10M-document collection:
collection_2024 = client.get_or_create_collection("docs_2024")
collection_2025 = client.get_or_create_collection("docs_2025")

# Query the relevant one
relevant = collection_2025.query(query_texts=[q], n_results=10)

Common Mistake: Calling collection.get() with no arguments. This returns ALL documents in the collection — fine for 100, catastrophic for 1M. Always use limit= and offset=:

# WRONG — loads everything into memory
all_docs = collection.get()

# CORRECT — page through
batch = collection.get(limit=100, offset=0)
# Next page:
batch = collection.get(limit=100, offset=100)
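The paging pattern generalizes to a generator. `iter_ids` is a hypothetical helper; it works against anything exposing Chroma's `get(limit=..., offset=...)` shape, demonstrated here with a stub so it runs without a database:

```python
def iter_ids(collection, batch_size=100):
    """Yield document ids page by page via get(limit=..., offset=...).

    Hypothetical helper built on the paging pattern above; stops when a
    page comes back short or empty.
    """
    offset = 0
    while True:
        page = collection.get(limit=batch_size, offset=offset)
        ids = page["ids"]
        if not ids:
            return
        yield from ids
        if len(ids) < batch_size:
            return
        offset += batch_size

# Stub that mimics the shape of collection.get()'s return value
class FakeCollection:
    def __init__(self, ids):
        self._ids = ids
    def get(self, limit, offset):
        return {"ids": self._ids[offset:offset + limit]}

fake = FakeCollection([str(i) for i in range(250)])
print(len(list(iter_ids(fake, batch_size=100))))   # 250
```

Because it yields lazily, memory stays bounded by `batch_size` no matter how large the collection is.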

Fix 8: Integration with LLM Frameworks

LangChain:

from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Create or load
vectorstore = Chroma(
    collection_name="my_docs",
    embedding_function=embeddings,
    persist_directory="./chroma_db",   # Uses PersistentClient under the hood
)

vectorstore.add_texts(texts=["doc 1", "doc 2"], metadatas=[{...}, {...}])
results = vectorstore.similarity_search("query", k=5)

For LangChain-specific patterns and errors, see LangChain Python not working.

LlamaIndex:

from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb

chroma_client = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = chroma_client.get_or_create_collection("my_docs")

vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

For LlamaIndex-specific patterns, see LlamaIndex not working.

Still Not Working?

Chroma vs Other Vector Databases

  • Chroma — Simplest, runs in-process, great for prototypes and small-to-medium datasets (<1M vectors). Limited horizontal scaling.
  • Qdrant — Production-grade, scales horizontally, richer filtering. Slightly more complex setup.
  • Pinecone — Managed SaaS, no ops required, good for quick production. Costs at scale.
  • Weaviate — Hybrid search (vector + keyword) built-in, GraphQL API.
  • pgvector — Postgres extension. Best when you already have Postgres and don’t need specialized features.

Debugging Silent Query Failures

If queries return nothing when they should return matches:

  1. Check the count: collection.count() should match expectations
  2. Check the embedding function: querying with a different model than the one used for adding returns zero matches
  3. Check the distance distribution: inspect results['distances'][0] to see whether matches exist but score poorly
  4. Check the metadata filter: overly restrictive where clauses eliminate valid matches

Backup and Export

# Export all data to a file
import json

all_data = collection.get(include=["documents", "metadatas", "embeddings"])
with open("backup.json", "w") as f:
    json.dump(all_data, f)

# Restore to a new collection
new_collection = client.create_collection(name="restored")
new_collection.add(
    documents=all_data["documents"],
    embeddings=all_data["embeddings"],
    metadatas=all_data["metadatas"],
    ids=all_data["ids"],
)
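One caveat with the export above: depending on the chromadb version, `get(include=["embeddings"])` may return numpy arrays, which `json.dump` cannot serialize. A hedged converter (`to_jsonable` is a hypothetical helper) that flattens anything exposing `.tolist()` without requiring numpy:

```python
import json

def to_jsonable(value):
    """Recursively convert array-like values (anything with .tolist()) for JSON.

    Hedge: whether get() returns plain lists or numpy arrays depends on the
    chromadb version; this handles both without importing numpy.
    """
    if hasattr(value, "tolist"):
        return value.tolist()
    if isinstance(value, dict):
        return {k: to_jsonable(v) for k, v in value.items()}
    if isinstance(value, list):
        return [to_jsonable(v) for v in value]
    return value

sample = {"ids": ["1"], "embeddings": [[0.1, 0.2]], "documents": ["test"]}
print(json.dumps(to_jsonable(sample)))
```

Run the exported data through `to_jsonable` before `json.dump` and the backup works regardless of the return type.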

Using with OpenAI Embeddings

For OpenAI API key setup and rate limit handling when using OpenAI embeddings with Chroma, see OpenAI API not working. For HuggingFace-based embedding models as alternatives, see HuggingFace Transformers not working.


FixDevs

Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.
