Fix: Qdrant Not Working — Connection Errors, Collection Setup, and Filter Syntax Issues
Quick Answer
How to fix Qdrant errors — connection refused to localhost 6333, collection not found create_collection, vector size mismatch, filter must match schema, payload index missing slow queries, and timeout on large batch uploads.
The Error
You connect to Qdrant and it refuses:
httpx.ConnectError: [Errno 111] Connection refused
qdrant_client.http.exceptions.UnexpectedResponse: Unexpected Response: 404
Or you create a collection and get a vector size error on upload:
qdrant_client.http.exceptions.UnexpectedResponse:
Vector size 1536 does not match collection vector size 384
Or a filter query returns no results, even though you know the data matches:
results = client.search(
    collection_name="docs",
    query_vector=vec,
    query_filter=Filter(must=[FieldCondition(key="category", match=MatchValue(value="news"))]),
    limit=10,
)
# Empty list, but you know there are "news" docs
Or queries are painfully slow on filtered searches over millions of vectors:
# Vector-only search: 20ms
# Vector + filter: 5000ms
Qdrant is a Rust-based vector database: faster and more production-ready than Chroma, with richer filtering and HNSW tuning. The filter language is strict (unlike Chroma’s flexible dict filters), and payload fields need explicit indexing for fast filtered search. This guide covers each common failure.
Why This Happens
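Qdrant normally runs as a server (HTTP on 6333, gRPC on 6334). The Python client also has a local mode (":memory:" or path=...) that runs in-process, which is mainly useful for tests and small experiments. Connection issues usually trace to the server not running, a wrong port, or a firewall blocking the connection. Collection configuration is strict: the vector size and distance metric are fixed at creation time, while payload indexes are created separately and can be added later.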
Qdrant’s filter language uses nested Pydantic models (Filter, FieldCondition, MatchValue) rather than flat dicts. This gives you type safety but makes the syntax verbose — copying a dict-style filter from LangChain or Chroma docs doesn’t work.
Diagnostic Timeline: “Results Look Wrong” — Triage in Order
Your first instinct is to rebuild the collection from scratch. Don’t. Rebuilding hides the actual problem and you will hit it again next week. Here is the real path.
Minute 0: Count the points. Run client.count(collection_name="my_docs", exact=True). If the count is zero or much smaller than expected, the upserts did not land. Look for an exception that was caught and ignored (a vector-size mismatch is a common one), or for writes sent with wait=False that had not been applied yet when you counted. If the count looks right, the data is there and the search is wrong.
Minute 1 — Inspect one real payload. Run client.scroll(collection_name="my_docs", limit=1). Read the payload. Compare field names byte-for-byte with your filter — Category vs category, published_at (string) vs filter using Range (numeric). About 70% of “empty results” bugs end here.
Minute 2 — Run the search without the filter. If unfiltered search returns sensible neighbors, the vector index is fine and only the filter is wrong. If unfiltered search returns nonsense too, check that your query embedding was generated with the same model and same normalization as the points. A query embedding from text-embedding-3-large against a collection of text-embedding-3-small vectors will not error — it will return random-looking neighbors.
Minute 3 — Check the distance metric. A collection created with Distance.COSINE interprets score_threshold=0.7 as “scores >= 0.7 are good.” A collection created with Distance.EUCLID interprets the same threshold as “distance <= 0.7.” Mix these up and your thresholding silently filters every result. Run client.get_collection("my_docs").config.params.vectors.distance to confirm.
Minute 5 — Check whether HNSW is even built yet. If you uploaded a million points and immediately queried, the HNSW index may not have finished building. client.get_collection("my_docs").status should be "green". If it is "yellow", optimization is in progress and queries fall back to brute force, which is slow but correct. If it is "red", indexing failed and recall is degraded.
Minute 8 — Payload index missing for the filter field. Filtered search over millions of vectors without a payload index is a sequential scan. Symptom: queries take 5+ seconds instead of milliseconds. Add create_payload_index for every field you filter on. See Fix 5.
The first guess (“rebuild the collection”) is wrong about nine times out of ten. The real cause is usually one of: a field name or type mismatch between filter and payload, the wrong distance metric, a missing payload index, or HNSW still building at query time.
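The triage steps above can be gathered into one read-only helper. This is a minimal sketch, not an official tool: triage_collection is a hypothetical name, and it assumes the client exposes count, scroll, and get_collection with the return shapes used elsewhere in this guide (a count result with .count, a (points, next_offset) tuple, and a collection info object with .status, .payload_schema, and .config.params.vectors).

```python
def triage_collection(client, collection_name):
    """Collect the facts the triage steps ask for, without modifying anything."""
    report = {}

    # Minute 0: exact point count.
    report["count"] = client.count(collection_name=collection_name, exact=True).count

    # Minute 1: payload keys of one stored point (empty list if the collection is empty).
    points, _next_offset = client.scroll(
        collection_name=collection_name, limit=1, with_payload=True
    )
    report["sample_payload_keys"] = sorted(points[0].payload or {}) if points else []

    # Minutes 3 and 5: distance metric and optimizer status.
    info = client.get_collection(collection_name)
    vectors = info.config.params.vectors
    if isinstance(vectors, dict):  # collections with named vectors
        report["distance"] = {name: v.distance for name, v in vectors.items()}
    else:
        report["distance"] = vectors.distance
    report["status"] = info.status

    # Minute 8: payload fields that already have an index.
    report["indexed_fields"] = sorted(info.payload_schema or {})
    return report
```

Calling triage_collection(client, "my_docs") and comparing sample_payload_keys and indexed_fields against the keys in your filter covers the most common causes before you touch the data.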
Fix 1: Connecting to Qdrant
Start a local Qdrant server (easiest via Docker):
# Docker: data persists in ./qdrant_storage
docker run -p 6333:6333 -p 6334:6334 \
    -v $(pwd)/qdrant_storage:/qdrant/storage \
    qdrant/qdrant
# Or install the binary
# https://github.com/qdrant/qdrant/releases
./qdrant
Connect from Python:
from qdrant_client import QdrantClient
# Option 1: Local server
client = QdrantClient(host="localhost", port=6333)
# Option 2: URL-based (preferred for cloud)
client = QdrantClient(url="http://localhost:6333")
# Option 3: gRPC (faster for high-throughput)
client = QdrantClient(host="localhost", port=6334, prefer_grpc=True)
# Option 4: In-memory (tests, no persistence)
client = QdrantClient(":memory:")
# Option 5: Local file (no server, embedded mode)
client = QdrantClient(path="./qdrant_local")
Qdrant Cloud (managed):
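client = QdrantClient(
    url="https://your-cluster.qdrant.io",
    api_key="your-api-key",
)
Verify connection:
print(client.get_collections())  # Returns CollectionsResponse with the list of collections
If this raises a connection error (typically httpx.ConnectError, which the client may wrap in its own ResponseHandlingException), the server isn’t reachable.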
Common Mistake: Trying to use QdrantClient(path=...) and QdrantClient(host=...) in the same process simultaneously. Local mode uses an embedded database and doesn’t communicate with a server. Pick one mode per application.
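When the server starts in a container alongside your application, the first connection attempt often arrives before Qdrant is listening. A small retry loop around the verification call avoids a spurious "connection refused". This is a sketch under assumptions: wait_for_qdrant and make_client are hypothetical names, and the exception is caught broadly because the exact exception class depends on the client version.

```python
import time


def wait_for_qdrant(make_client, attempts=5, delay_seconds=1.0):
    """Return a client once get_collections() succeeds, or raise after the last attempt."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            client = make_client()
            client.get_collections()  # cheap round trip that proves the server answers
            return client
        except Exception as exc:  # broad on purpose: exception class varies by version
            last_error = exc
            if attempt < attempts:
                time.sleep(delay_seconds)
    raise RuntimeError(f"Qdrant not reachable after {attempts} attempts") from last_error
```

Usage would look like wait_for_qdrant(lambda: QdrantClient(url="http://localhost:6333")). Passing a factory instead of a client keeps the helper independent of how you connect.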
Fix 2: Collection Creation and Vector Size
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams
client = QdrantClient("localhost", port=6333)
# Create collection — vector size must match your embedding model
client.create_collection(
    collection_name="my_docs",
    vectors_config=VectorParams(
        size=1536,                 # text-embedding-3-small = 1536
        distance=Distance.COSINE,  # or DOT, EUCLID, MANHATTAN
    ),
)
Common embedding sizes:
| Model | Dimension |
|---|---|
| text-embedding-3-small (OpenAI) | 1536 |
| text-embedding-3-large (OpenAI) | 3072 |
| text-embedding-ada-002 (OpenAI) | 1536 |
| all-MiniLM-L6-v2 (ST) | 384 |
| all-mpnet-base-v2 (ST) | 768 |
| bge-large-en-v1.5 (BGE) | 1024 |
| embed-english-v3.0 (Cohere) | 1024 |
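A size mismatch is cheaper to catch before the upload than to decode from a server error. The sketch below compares each vector's length with the expected dimension. Both function names are hypothetical, and collection_vector_size assumes a collection with a single unnamed vector whose configuration is exposed at config.params.vectors.size.

```python
def find_size_mismatches(expected_size, vectors):
    """Return (index, actual_size) for every vector whose length differs from expected_size."""
    return [
        (index, len(vector))
        for index, vector in enumerate(vectors)
        if len(vector) != expected_size
    ]


def collection_vector_size(client, collection_name):
    """Read the configured size of a collection that has one unnamed vector."""
    return client.get_collection(collection_name).config.params.vectors.size
```

Running find_size_mismatches(collection_vector_size(client, "my_docs"), vectors) before an upsert points at the offending inputs directly, which usually reveals that two different embedding models were mixed.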
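Idempotent creation: use recreate_collection or check existence. Recent qdrant-client releases mark recreate_collection as deprecated, so prefer the explicit check in Option 3 for new code: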
from qdrant_client.http.exceptions import UnexpectedResponse
# Option 1: recreate (destroys existing!)
client.recreate_collection(
    collection_name="my_docs",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)
# Option 2: create only if missing
try:
    client.create_collection(
        collection_name="my_docs",
        vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    )
except UnexpectedResponse as e:
    if "already exists" not in str(e):
        raise
    # Collection exists: continue
# Option 3: explicitly check
if not client.collection_exists("my_docs"):
    client.create_collection(
        collection_name="my_docs",
        vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    )
Multiple named vectors per point (e.g., text and image embeddings for the same document):
client.create_collection(
    collection_name="multimodal",
    vectors_config={
        "text": VectorParams(size=1536, distance=Distance.COSINE),
        "image": VectorParams(size=512, distance=Distance.COSINE),
    },
)
# Upsert with named vectors
from qdrant_client.models import PointStruct
client.upsert(
    collection_name="multimodal",
    points=[
        PointStruct(
            id=1,
            vector={"text": text_embedding, "image": image_embedding},
            payload={"name": "Product A"},
        ),
    ],
)
Fix 3: Upserting Points
from qdrant_client.models import PointStruct
import uuid
# Simple upsert
points = [
    PointStruct(
        id=i,                     # int or str (UUID). Use UUIDs for distributed inserts.
        vector=embedding_vector,  # List[float] of correct size
        payload={"text": doc_text, "source": "blog", "year": 2025},
    )
    for i, (doc_text, embedding_vector) in enumerate(zip(docs, vectors))
]
client.upsert(collection_name="my_docs", points=points, wait=True)
# wait=True: block until the write has been applied (HNSW indexing may still run later).
# Set wait=False to return as soon as the request is accepted.
Batch upload for performance:
def upload_in_batches(client, collection_name, points, batch_size=100):
    for i in range(0, len(points), batch_size):
        batch = points[i:i + batch_size]
        client.upsert(collection_name=collection_name, points=batch)
        print(f"Uploaded {i + len(batch)} / {len(points)}")

upload_in_batches(client, "my_docs", points)
wait=False with batched upload for maximum throughput:
import time

for batch in batches:
    client.upsert(collection_name="my_docs", points=batch, wait=False)
# After all uploads, wait for indexing to catch up
while True:
    info = client.get_collection("my_docs")
    if info.status == "green":  # Fully indexed
        break
    time.sleep(1)
ID types: integers are fastest, but UUIDs are safer for distributed writes:
# Integer IDs
PointStruct(id=1, vector=vec, payload=data)
PointStruct(id=2, vector=vec, payload=data)
# UUID IDs (as string)
PointStruct(id=str(uuid.uuid4()), vector=vec, payload=data)
Don’t mix integer and UUID IDs within the same collection.
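Large batch uploads are where timeouts tend to show up. The batching loop earlier in this section can be extended with a retry and exponential backoff, as in the sketch below. The helper names are hypothetical, the broad except is an assumption because the timeout exception class depends on the transport, and retrying is only safe here because an upsert with the same IDs overwrites rather than duplicates.

```python
import time


def iter_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def upsert_with_retry(client, collection_name, points, batch_size=100,
                      max_retries=3, backoff_seconds=1.0):
    """Upsert in batches; retry a failed batch with exponential backoff. Returns points sent."""
    uploaded = 0
    for batch in iter_batches(points, batch_size):
        for attempt in range(max_retries + 1):
            try:
                client.upsert(collection_name=collection_name, points=batch, wait=True)
                break
            except Exception:  # broad on purpose: timeout class depends on the transport
                if attempt == max_retries:
                    raise
                time.sleep(backoff_seconds * (2 ** attempt))
        uploaded += len(batch)
    return uploaded
```

If batches still time out, smaller batches or a larger timeout= value on the QdrantClient constructor are the usual next steps.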
Fix 4: Filter Syntax
Qdrant’s filter language uses three top-level operators: must, should, must_not. Each takes a list of conditions.
from qdrant_client.models import (
    Filter, FieldCondition, MatchValue, MatchAny, Range,
)
# Simple equals filter
filter = Filter(
    must=[FieldCondition(key="category", match=MatchValue(value="news"))]
)
# Multiple conditions (AND)
filter = Filter(
    must=[
        FieldCondition(key="category", match=MatchValue(value="news")),
        FieldCondition(key="year", match=MatchValue(value=2025)),
    ],
)
# OR conditions (should)
filter = Filter(
    should=[
        FieldCondition(key="category", match=MatchValue(value="news")),
        FieldCondition(key="category", match=MatchValue(value="blog")),
    ],
)
# NOT conditions
filter = Filter(
    must_not=[FieldCondition(key="draft", match=MatchValue(value=True))]
)
# Match any of a list
filter = Filter(
    must=[FieldCondition(key="category", match=MatchAny(any=["news", "blog", "tutorial"]))]
)
# Range filter
filter = Filter(
    must=[
        FieldCondition(
            key="year",
            range=Range(gte=2020, lte=2025),
        ),
    ],
)
# Combined: complex filter
filter = Filter(
    must=[
        FieldCondition(key="category", match=MatchValue(value="news")),
        FieldCondition(key="year", range=Range(gte=2024)),
    ],
    must_not=[
        FieldCondition(key="archived", match=MatchValue(value=True)),
    ],
)
Apply to search:
results = client.search(
    collection_name="my_docs",
    query_vector=embedding,
    query_filter=filter,
    limit=10,
)
Nested payload fields: use dot notation:
# Payload: {"metadata": {"author": "Alice", "year": 2025}}
filter = Filter(
    must=[FieldCondition(key="metadata.author", match=MatchValue(value="Alice"))]
)
Pro Tip: Qdrant’s filter API looks verbose next to Chroma’s dict-based filters, but the Pydantic models catch structural errors (a misspelled argument, a value of the wrong type for a model field) at construction time rather than at query time. They do not know your payload schema, though: filtering on a field name that doesn’t exist in the payload, or on a value whose type differs from the stored one, still returns silent empty results.
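If you are porting flat dict-style filters from another library, a small translator into Qdrant's nested shape saves repetitive typing. The sketch below emits plain dicts in the same must / key / match / range layout as the models above. flat_dict_to_qdrant_filter is a hypothetical helper, and the conventions it applies (a list means "any of", a dict means a range, everything is ANDed) are assumptions of this sketch, not rules from any library.

```python
def flat_dict_to_qdrant_filter(conditions):
    """Translate {"field": value} pairs into Qdrant's nested filter shape (all ANDed)."""
    must = []
    for key, value in conditions.items():
        if isinstance(value, (list, tuple)):
            # A list of values is read as "match any of these".
            must.append({"key": key, "match": {"any": list(value)}})
        elif isinstance(value, dict):
            # A dict such as {"gte": 2020, "lte": 2025} is read as a range.
            must.append({"key": key, "range": dict(value)})
        else:
            # Anything else is an exact match.
            must.append({"key": key, "match": {"value": value}})
    return {"must": must}
```

Because the models are Pydantic classes, Filter(**flat_dict_to_qdrant_filter({"category": "news"})) should validate the nested dicts into FieldCondition and MatchValue objects; verify that against your installed client version before relying on it.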
Fix 5: Payload Indexing for Fast Filtered Search
# Unfiltered search: 20ms
# Filtered search: 5000ms
Without payload indexes, Qdrant must scan every point to apply filters, which takes linear time. Create indexes on fields you filter by frequently:
from qdrant_client.models import PayloadSchemaType
client.create_payload_index(
    collection_name="my_docs",
    field_name="category",
    field_schema=PayloadSchemaType.KEYWORD,  # For exact string match
)
client.create_payload_index(
    collection_name="my_docs",
    field_name="year",
    field_schema=PayloadSchemaType.INTEGER,  # For integer range queries
)
client.create_payload_index(
    collection_name="my_docs",
    field_name="published_at",
    field_schema=PayloadSchemaType.DATETIME,
)
Schema types:
| Type | Use for |
|---|---|
| KEYWORD | Exact string match (category, tag, status) |
| TEXT | Full-text search (tokenized) |
| INTEGER | Numeric comparisons (year, count) |
| FLOAT | Decimal comparisons (price, score) |
| BOOL | Boolean (draft, published) |
| GEO | Geospatial coordinates |
| DATETIME | ISO-8601 timestamps |
| UUID | UUID-formatted strings |
After indexing, filtered queries drop from seconds to milliseconds on large collections (>100k points).
Common Mistake: Creating a collection and immediately expecting fast filtered queries. The collection’s vector index (HNSW) is automatic, but payload indexes are not — you must create each one explicitly. Without them, filters are sequential scans.
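Since payload indexes are easy to forget, it can help to declare the fields you filter on in one place and create whatever is missing at startup. The following is a sketch under assumptions: ensure_payload_indexes is a hypothetical helper, and it relies on get_collection(...).payload_schema listing the fields that already have an index.

```python
def ensure_payload_indexes(client, collection_name, wanted):
    """Create an index for each field in `wanted` that the collection does not report yet.

    `wanted` maps field name -> schema type (for example PayloadSchemaType.KEYWORD).
    Returns the list of field names for which an index was created.
    """
    existing = client.get_collection(collection_name).payload_schema or {}
    created = []
    for field_name, field_schema in wanted.items():
        if field_name in existing:
            continue  # already indexed, nothing to do
        client.create_payload_index(
            collection_name=collection_name,
            field_name=field_name,
            field_schema=field_schema,
        )
        created.append(field_name)
    return created
```

A call such as ensure_payload_indexes(client, "my_docs", {"category": PayloadSchemaType.KEYWORD, "year": PayloadSchemaType.INTEGER}) keeps the index list next to the code that builds the filters.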
Full-text search on a payload field:
from qdrant_client.models import MatchText
client.create_payload_index(
    collection_name="my_docs",
    field_name="content",
    field_schema=PayloadSchemaType.TEXT,
)
# Full-text filter
filter = Filter(
    must=[FieldCondition(key="content", match=MatchText(text="quantum computing"))]
)
Fix 6: Query Parameters and Search
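Basic search (recent qdrant-client releases deprecate search in favor of query_points; the calls below assume a client version that still provides search):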
results = client.search(
    collection_name="my_docs",
    query_vector=query_embedding,
    limit=10,
    score_threshold=0.7,  # Only results with score >= 0.7
    with_payload=True,    # Include payload in results (default True)
    with_vectors=False,   # Usually False, saves bandwidth
)
for result in results:
    print(f"Score: {result.score:.3f}, ID: {result.id}")
    print(f"Payload: {result.payload}")
Search with named vectors:
# If collection has multiple named vectors
results = client.search(
    collection_name="multimodal",
    query_vector=("text", text_embedding),  # Tuple: (vector_name, vector)
    limit=10,
)
score_threshold interpretation depends on distance metric:
- COSINE: 1.0 is identical, -1.0 is opposite. Use score_threshold=0.8 typically.
- DOT: Larger is more similar (for normalized vectors, behaves like cosine).
- EUCLID: Smaller is more similar. score_threshold acts as an upper bound on distance.
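The direction of the threshold is the part that is easy to get backwards, so it can be worth encoding once. This sketch mirrors the rules listed above in plain Python. passes_threshold and the lower-case metric names are choices made for this example, not part of the Qdrant API.

```python
def passes_threshold(score, threshold, metric):
    """Return True if a result with this score survives score_threshold for the metric.

    metric is "cosine", "dot", or "euclid" (names chosen for this sketch).
    """
    if metric in ("cosine", "dot"):
        # Similarity metrics: larger is better, threshold is a lower bound.
        return score >= threshold
    if metric == "euclid":
        # Distance metric: smaller is better, threshold is an upper bound.
        return score <= threshold
    raise ValueError(f"unknown metric: {metric!r}")
```

With a threshold of 0.7, a score of 0.9 passes under cosine but is rejected under Euclidean distance, which is exactly the silent "every result filtered out" case described in the diagnostic timeline.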
Batch search (multiple queries in one request):
from qdrant_client.models import SearchRequest
results = client.search_batch(
    collection_name="my_docs",
    requests=[
        SearchRequest(vector=q1_embedding, limit=5),
        SearchRequest(vector=q2_embedding, limit=5),
        SearchRequest(vector=q3_embedding, limit=5),
    ],
)
for i, query_results in enumerate(results):
    print(f"Query {i}: {len(query_results)} results")
Recommendations (search using existing points as query):
results = client.recommend(
    collection_name="my_docs",
    positive=[1, 2, 3],  # IDs of positive examples
    negative=[10, 11],   # IDs of negative examples
    limit=10,
)
Fix 7: Scroll - Paginating Through All Points
# Get all points: WRONG for large collections
all_points = client.scroll(collection_name="my_docs", limit=100000)  # Blows memory
# CORRECT: paginate
offset = None
while True:
    points, offset = client.scroll(
        collection_name="my_docs",
        limit=100,
        offset=offset,
        with_payload=True,
        with_vectors=False,
    )
    for point in points:
        process(point)
    if offset is None:
        break  # No more pages
Scroll with filter, useful for exporting or cleaning up:
# Get all draft documents
offset = None
drafts = []
while True:
    points, offset = client.scroll(
        collection_name="my_docs",
        scroll_filter=Filter(
            must=[FieldCondition(key="draft", match=MatchValue(value=True))]
        ),
        limit=100,
        offset=offset,
    )
    drafts.extend(points)
    if offset is None:
        break
print(f"Found {len(drafts)} drafts")
Fix 8: Deleting Points and Collections
from qdrant_client.models import PointIdsList, FilterSelector
# Delete specific IDs
client.delete(
    collection_name="my_docs",
    points_selector=PointIdsList(points=[1, 2, 3]),
)
# Delete by filter
client.delete(
    collection_name="my_docs",
    points_selector=FilterSelector(
        filter=Filter(
            must=[FieldCondition(key="archived", match=MatchValue(value=True))]
        ),
    ),
)
# Delete entire collection
client.delete_collection(collection_name="my_docs")
Update payload (without changing the vector):
client.set_payload(
    collection_name="my_docs",
    payload={"updated_at": "2025-04-09", "version": 2},
    points=[1, 2, 3],
)
# Overwrite full payload
client.overwrite_payload(
    collection_name="my_docs",
    payload={"status": "archived"},
    points=[1],
)
# Delete specific payload keys
client.delete_payload(
    collection_name="my_docs",
    keys=["old_field"],
    points=[1, 2, 3],
)
Still Not Working?
Qdrant vs Other Vector Databases
- Qdrant — Production-grade, fast, rich filtering, horizontal scaling. Strong choice for most production workloads.
- Chroma — Simpler, runs in-process. Best for prototypes. For Chroma-specific patterns, see ChromaDB not working.
- Pinecone — Managed SaaS. No ops, but costs scale.
- Milvus — Enterprise-scale, complex to operate.
- pgvector — Postgres extension. Good when you already run Postgres.
Quantization for Lower Memory
For very large collections (>10M vectors), enable scalar or binary quantization to reduce memory by 4–32x:
from qdrant_client.models import (
    VectorParams, Distance, ScalarQuantization, ScalarQuantizationConfig, ScalarType,
)
client.create_collection(
    collection_name="big_docs",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(
            type=ScalarType.INT8,  # 4x memory reduction
            quantile=0.99,
            always_ram=True,       # Keep quantized vectors in RAM
        ),
    ),
)
Binary quantization is even more aggressive (32x reduction) but trades recall:
from qdrant_client.models import BinaryQuantization, BinaryQuantizationConfig
client.create_collection(
    collection_name="huge_docs",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    quantization_config=BinaryQuantization(
        binary=BinaryQuantizationConfig(always_ram=True),
    ),
)
Snapshots and Backup
# Create snapshot
snapshot_info = client.create_snapshot(collection_name="my_docs")
print(snapshot_info)  # Snapshot description (name, creation time, size)
# List snapshots
snapshots = client.list_snapshots(collection_name="my_docs")
# Restore from snapshot
# (via HTTP API or copy the snapshot file to Qdrant storage)
Collection Configuration Tuning
For large-scale collections, tune HNSW parameters:
from qdrant_client.models import HnswConfigDiff, OptimizersConfigDiff
client.update_collection(
    collection_name="my_docs",
    hnsw_config=HnswConfigDiff(
        m=16,                       # Graph connectivity (default 16)
        ef_construct=200,           # Index build quality (default 100)
        full_scan_threshold=10000,  # Below this size (documented in KB of vectors), full scan instead of HNSW
    ),
    # Current clients name this argument optimizers_config; older releases used optimizer_config
    optimizers_config=OptimizersConfigDiff(
        indexing_threshold=20000,   # Build HNSW once a segment exceeds this size (documented in KB)
    ),
)
Integration with LangChain and LlamaIndex
# LangChain
from langchain_qdrant import QdrantVectorStore
vector_store = QdrantVectorStore.from_existing_collection(
    collection_name="my_docs",
    embedding=embeddings,
    url="http://localhost:6333",
)
For LangChain-specific patterns, see LangChain Python not working.
# LlamaIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore
import qdrant_client
client = qdrant_client.QdrantClient(host="localhost", port=6333)
vector_store = QdrantVectorStore(client=client, collection_name="my_docs")
For LlamaIndex setup patterns, see LlamaIndex not working. For OpenAI embedding configuration with Qdrant, see OpenAI API not working.
Debugging Empty Results
When a filtered search unexpectedly returns nothing, check in order:
- client.count(collection_name="my_docs"): are there any points at all?
- client.scroll(collection_name="my_docs", limit=1): inspect a real payload to confirm field names
- Run the same search without the filter: the vector alone should return results
- Run the filter as a scroll (no vector): does the filter itself match anything?
- Check that all filtered fields have payload indexes for performance
Most “empty result” bugs trace to a field name mismatch (e.g., filtering on category but the payload has Category) or a wrong type (filtering year as a string when stored as integer).
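The field-name comparison is mechanical enough to script. The sketch below takes the keys used in a filter and one payload (for example from the scroll call above) and reports exact hits, case-only mismatches, and missing keys. explain_filter_keys is a hypothetical helper, and it only looks at top-level keys, so dot-notation paths such as metadata.author would need extra handling.

```python
def explain_filter_keys(filter_keys, payload):
    """Report, per filter key, whether the payload has it exactly, only in another case, or not at all."""
    lowered = {key.lower(): key for key in payload}
    report = {}
    for key in filter_keys:
        if key in payload:
            # Exact hit: include the stored type so type mismatches are visible too.
            report[key] = f"ok ({type(payload[key]).__name__})"
        elif key.lower() in lowered:
            report[key] = f"case mismatch: payload has {lowered[key.lower()]!r}"
        else:
            report[key] = "missing"
    return report
```

The stored type in the "ok" entries helps with the second common cause: a report of ok (str) for year explains why an integer match or a numeric range finds nothing.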
HNSW vs Flat Index at Scale
For small collections, Qdrant answers queries with a brute-force scan, and recall is perfect. Two settings govern the switch to HNSW: full_scan_threshold (default 10000) and the optimizer’s indexing_threshold (default 20000). The Qdrant configuration reference documents both in kilobytes of vector data rather than in points, so the point count at which HNSW takes over depends on your vector size. Once HNSW is in use, recall drops to ~95-99% depending on ef and m. The transition is silent. If you ran benchmarks at 5,000 points and shipped to production with 5 million, the recall you measured is no longer the recall you get. Always benchmark at production scale, and tune ef (search-time) and ef_construct (build-time) for the recall floor your application needs. For RAG, recall below 95% noticeably degrades answer quality.
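Measuring recall needs a ground truth, which is the result of an exact search for the same query. To my knowledge the client accepts search_params=SearchParams(exact=True) for this, but treat that as an assumption to confirm against your version. The comparison itself is plain set arithmetic, sketched here with a hypothetical recall_at_k helper.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k IDs that also appear in the approximate top-k."""
    if not exact_ids:
        raise ValueError("exact_ids must not be empty")
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)
```

Averaging this value over a few hundred representative queries at production scale gives the recall figure to compare against your floor. For instance, recall_at_k([1, 2, 3, 9], [1, 2, 3, 4]) evaluates to 0.75.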
Wrong Distance Metric for the Embedding Model
OpenAI text-embedding-3-* models return unit-length vectors, so cosine and dot product give identical rankings. The same holds for any model whose output is normalized, which some libraries only do when you ask for it. For models that return unnormalized vectors, Distance.DOT also rewards vector length and can rank results differently from the cosine similarity the model was trained for. Check the model card for the recommended similarity function before creating the collection. When in doubt, Distance.COSINE is the safe default for English text embeddings.
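Whether a model's output is normalized can be checked empirically on a handful of embeddings before you pick the metric. This is a minimal sketch in plain Python: the function names and the tolerance of 1e-3 are choices made for this example.

```python
import math


def l2_norm(vector):
    """Euclidean length of a vector given as a sequence of floats."""
    return math.sqrt(sum(x * x for x in vector))


def all_unit_length(vectors, tolerance=1e-3):
    """True if every vector has length 1 within the tolerance."""
    return all(abs(l2_norm(v) - 1.0) <= tolerance for v in vectors)
```

If all_unit_length returns True for a sample of your embeddings, cosine and dot product rank results identically and the choice between them is a matter of convention. If it returns False, the two metrics can disagree.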
Async Client Hanging on wait=True
QdrantClient is synchronous; AsyncQdrantClient is asynchronous and returns a coroutine from every operation. Calling an async method such as client.upsert(...) without await only creates the coroutine: the upsert never runs, and the next query returns empty. Confirm that the client class matches your code path. The opposite mistake is also common: many tutorials show the sync client, and pasting that code into an async FastAPI route runs but blocks the event loop on every call, so other requests stall under load.
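When you are stuck with the sync client inside async code, one option is to push each blocking call onto a worker thread. The sketch below uses asyncio.to_thread from the standard library (Python 3.9+). upsert_off_loop is a hypothetical helper name, and switching to AsyncQdrantClient is usually the cleaner fix when you control the code.

```python
import asyncio


async def upsert_off_loop(sync_client, collection_name, points):
    """Run a blocking sync-client upsert in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(
        sync_client.upsert,
        collection_name=collection_name,
        points=points,
        wait=True,
    )
```

Inside an async route this would be called as await upsert_off_loop(client, "my_docs", points). The await is what makes the call run, which is the step the un-awaited coroutine bug skips.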
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.