Fix: Qdrant Not Working — Connection Errors, Collection Setup, and Filter Syntax Issues
Quick Answer
This guide covers the most common Qdrant errors: connection refused on localhost:6333, collection not found on create_collection, vector size mismatch, strict filter syntax, slow filtered queries from missing payload indexes, and timeouts on large batch uploads.
The Error
You connect to Qdrant and it refuses:
httpx.ConnectError: [Errno 111] Connection refused
qdrant_client.http.exceptions.UnexpectedResponse: Unexpected Response: 404
Or you create a collection and get a vector size error on upload:
qdrant_client.http.exceptions.UnexpectedResponse:
Vector size 1536 does not match collection vector size 384
Or a filter query returns no results, even though you know the data matches:
results = client.search(
    collection_name="docs",
    query_vector=vec,
    query_filter=Filter(must=[FieldCondition(key="category", match=MatchValue(value="news"))]),
    limit=10,
)
# Empty list, but you know there are "news" docs
Or queries are painfully slow on filtered searches over millions of vectors:
# Vector-only search: 20ms
# Vector + filter: 5000ms
Qdrant is a Rust-based vector database — faster and more production-ready than Chroma, with richer filtering and HNSW tuning. The filter language is strict (unlike Chroma’s flexible dict filters), and payload fields need explicit indexing for fast filtered search. This guide covers each common failure.
Why This Happens
Qdrant runs as a server (HTTP on 6333, gRPC on 6334) — you never use it in-process. Connection issues usually trace to the server not running, wrong port, or firewall blocking. Collection configuration is strict: the vector size, distance metric, and payload indexes are all set at creation time.
Qdrant’s filter language uses nested Pydantic models (Filter, FieldCondition, MatchValue) rather than flat dicts. This gives you type safety but makes the syntax verbose — copying a dict-style filter from LangChain or Chroma docs doesn’t work.
Fix 1: Connecting to Qdrant
Start a local Qdrant server (easiest via Docker):
# Docker — data persists in ./qdrant_storage
docker run -p 6333:6333 -p 6334:6334 \
-v $(pwd)/qdrant_storage:/qdrant/storage \
qdrant/qdrant
# Or install the binary
# https://github.com/qdrant/qdrant/releases
./qdrant
Connect from Python:
from qdrant_client import QdrantClient
# Option 1: Local server
client = QdrantClient(host="localhost", port=6333)
# Option 2: URL-based (preferred for cloud)
client = QdrantClient(url="http://localhost:6333")
# Option 3: gRPC (faster for high-throughput)
client = QdrantClient(host="localhost", port=6334, prefer_grpc=True)
# Option 4: In-memory (tests, no persistence)
client = QdrantClient(":memory:")
# Option 5: Local file (no server, embedded mode)
client = QdrantClient(path="./qdrant_local")
Qdrant Cloud (managed):
client = QdrantClient(
    url="https://your-cluster.qdrant.io",
    api_key="your-api-key",
)
Verify connection:
print(client.get_collections())  # Returns CollectionsResponse with collection list
If this raises ConnectionError, the server isn’t reachable.
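If the server is running but just slow to come up (common when your app and Qdrant start together under docker-compose), retrying the probe is more robust than crashing on the first ConnectError. A minimal sketch; `wait_for_qdrant` is a hypothetical helper, not part of qdrant_client:

```python
import time

def wait_for_qdrant(client, timeout=30.0, interval=1.0):
    """Poll the server until it answers, or raise after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    last_err = None
    while time.monotonic() < deadline:
        try:
            client.get_collections()  # cheap health probe
            return True
        except Exception as err:  # httpx.ConnectError and friends
            last_err = err
            time.sleep(interval)
    raise TimeoutError(f"Qdrant not reachable after {timeout}s: {last_err}")
```

Call it once at startup, before any create_collection or upsert.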
Common Mistake: Trying to use QdrantClient(path=...) and QdrantClient(host=...) in the same process simultaneously. Local mode uses an embedded database and doesn’t communicate with a server. Pick one mode per application.
Fix 2: Collection Creation and Vector Size
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams
client = QdrantClient("localhost", port=6333)
# Create collection — vector size must match your embedding model
client.create_collection(
    collection_name="my_docs",
    vectors_config=VectorParams(
        size=1536,  # text-embedding-3-small = 1536
        distance=Distance.COSINE,  # or DOT, EUCLID, MANHATTAN
    ),
)
Common embedding sizes:
| Model | Dimension |
|---|---|
| text-embedding-3-small (OpenAI) | 1536 |
| text-embedding-3-large (OpenAI) | 3072 |
| text-embedding-ada-002 (OpenAI) | 1536 |
| all-MiniLM-L6-v2 (ST) | 384 |
| all-mpnet-base-v2 (ST) | 768 |
| bge-large-en-v1.5 (BGE) | 1024 |
| embed-english-v3.0 (Cohere) | 1024 |
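If you would rather not hard-code a dimension from the table, you can measure it at startup by embedding a throwaway string. A sketch under one assumption: `embed` stands for whatever callable wraps your embedding API (it is not a Qdrant function):

```python
def detect_dimension(embed) -> int:
    """Return the vector length produced by an embedding callable.

    `embed` is any str -> list[float] function wrapping your model;
    hypothetical stand-in, adapt to your stack.
    """
    return len(embed("dimension probe"))

# Then create the collection with the measured size, e.g.:
# client.create_collection(
#     collection_name="my_docs",
#     vectors_config=VectorParams(size=detect_dimension(embed),
#                                 distance=Distance.COSINE),
# )
```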
Idempotent creation — use recreate_collection or check existence:
from qdrant_client.http.exceptions import UnexpectedResponse
# Option 1: recreate (destroys existing!)
client.recreate_collection(
    collection_name="my_docs",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)
# Option 2: create only if missing
try:
    client.create_collection(
        collection_name="my_docs",
        vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    )
except UnexpectedResponse as e:
    if "already exists" not in str(e):
        raise
    # Collection exists — continue
# Option 3: explicitly check
if not client.collection_exists("my_docs"):
    client.create_collection(
        collection_name="my_docs",
        vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    )
Multiple named vectors per point (e.g., text and image embeddings for the same document):
client.create_collection(
    collection_name="multimodal",
    vectors_config={
        "text": VectorParams(size=1536, distance=Distance.COSINE),
        "image": VectorParams(size=512, distance=Distance.COSINE),
    },
)
# Upsert with named vectors
from qdrant_client.models import PointStruct
client.upsert(
    collection_name="multimodal",
    points=[
        PointStruct(
            id=1,
            vector={"text": text_embedding, "image": image_embedding},
            payload={"name": "Product A"},
        ),
    ],
)
Fix 3: Upserting Points
from qdrant_client.models import PointStruct
import uuid
# Simple upsert
points = [
    PointStruct(
        id=i,  # int or str (UUID). Use UUIDs for distributed inserts.
        vector=embedding_vector,  # List[float] of correct size
        payload={"text": doc_text, "source": "blog", "year": 2025},
    )
    for i, (doc_text, embedding_vector) in enumerate(zip(docs, vectors))
]
client.upsert(collection_name="my_docs", points=points, wait=True)
# wait=True blocks until the write is applied; use wait=False for fire-and-forget.
Batch upload for performance:
def upload_in_batches(client, collection_name, points, batch_size=100):
    for i in range(0, len(points), batch_size):
        batch = points[i:i + batch_size]
        client.upsert(collection_name=collection_name, points=batch)
        print(f"Uploaded {i + len(batch)} / {len(points)}")

upload_in_batches(client, "my_docs", points)
wait=False with batched upload for maximum throughput:
for batch in batches:
    client.upsert(collection_name="my_docs", points=batch, wait=False)
# After all uploads, wait for indexing to catch up
import time
while True:
    info = client.get_collection("my_docs")
    if info.status == "green":  # Fully indexed
        break
    time.sleep(1)
ID types — integers are fastest, but UUIDs are safer for distributed writes:
# Integer IDs
PointStruct(id=1, vector=vec, payload=data)
PointStruct(id=2, vector=vec, payload=data)
# UUID IDs (as string)
PointStruct(id=str(uuid.uuid4()), vector=vec, payload=data)
Don’t mix integer and UUID IDs within the same collection.
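For idempotent ingestion pipelines, one useful pattern (an assumption of this guide, not an official Qdrant API) is deriving the UUID deterministically from a stable document key with uuid5. Re-upserting the same chunk then overwrites the existing point instead of creating a duplicate:

```python
import uuid

# Any fixed namespace UUID works; keep it constant across runs.
NAMESPACE = uuid.NAMESPACE_DNS

def stable_point_id(source: str, chunk_index: int) -> str:
    """Derive a deterministic UUID from a document key, so re-ingesting
    the same chunk upserts in place instead of creating a duplicate."""
    return str(uuid.uuid5(NAMESPACE, f"{source}#{chunk_index}"))
```

Use it as `PointStruct(id=stable_point_id("blog/post-1", 0), ...)`.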
Fix 4: Filter Syntax
Qdrant’s filter language uses three top-level operators: must, should, must_not. Each takes a list of conditions.
from qdrant_client.models import (
    Filter, FieldCondition, MatchValue, MatchAny, Range,
)
# Simple equals filter
filter = Filter(
    must=[FieldCondition(key="category", match=MatchValue(value="news"))]
)
# Multiple conditions (AND)
filter = Filter(
    must=[
        FieldCondition(key="category", match=MatchValue(value="news")),
        FieldCondition(key="year", match=MatchValue(value=2025)),
    ],
)
# OR conditions (should)
filter = Filter(
    should=[
        FieldCondition(key="category", match=MatchValue(value="news")),
        FieldCondition(key="category", match=MatchValue(value="blog")),
    ],
)
# NOT conditions
filter = Filter(
    must_not=[FieldCondition(key="draft", match=MatchValue(value=True))]
)
# Match any of a list
filter = Filter(
    must=[FieldCondition(key="category", match=MatchAny(any=["news", "blog", "tutorial"]))]
)
# Range filter
filter = Filter(
    must=[
        FieldCondition(
            key="year",
            range=Range(gte=2020, lte=2025),
        ),
    ],
)
# Combined — complex filter
filter = Filter(
    must=[
        FieldCondition(key="category", match=MatchValue(value="news")),
        FieldCondition(key="year", range=Range(gte=2024)),
    ],
    must_not=[
        FieldCondition(key="archived", match=MatchValue(value=True)),
    ],
)
Apply to search:
results = client.search(
    collection_name="my_docs",
    query_vector=embedding,
    query_filter=filter,
    limit=10,
)
Nested payload fields — use dot notation:
# Payload: {"metadata": {"author": "Alice", "year": 2025}}
filter = Filter(
    must=[FieldCondition(key="metadata.author", match=MatchValue(value="Alice"))]
)
Pro Tip: Qdrant’s filter API looks verbose next to Chroma’s dict-based filters, but the Pydantic models catch malformed conditions at construction time rather than at query time: a structurally invalid filter raises a validation error immediately. Note that this validates the filter’s shape, not your payload — a well-formed filter on a misspelled field name still returns silent empty results (see Debugging Empty Results below).
Fix 5: Payload Indexing for Fast Filtered Search
# Unfiltered search: 20ms
# Filtered search: 5000ms
Without payload indexes, Qdrant must scan every point to apply filters — linear time. Create indexes on fields you filter by frequently:
from qdrant_client.models import PayloadSchemaType
client.create_payload_index(
    collection_name="my_docs",
    field_name="category",
    field_schema=PayloadSchemaType.KEYWORD,  # For exact string match
)
client.create_payload_index(
    collection_name="my_docs",
    field_name="year",
    field_schema=PayloadSchemaType.INTEGER,  # For integer range queries
)
client.create_payload_index(
    collection_name="my_docs",
    field_name="published_at",
    field_schema=PayloadSchemaType.DATETIME,
)
Schema types:
| Type | Use for |
|---|---|
| KEYWORD | Exact string match (category, tag, status) |
| TEXT | Full-text search (tokenized) |
| INTEGER | Numeric comparisons (year, count) |
| FLOAT | Decimal comparisons (price, score) |
| BOOL | Boolean (draft, published) |
| GEO | Geospatial coordinates |
| DATETIME | ISO-8601 timestamps |
| UUID | UUID-formatted strings |
After indexing, filtered queries drop from seconds to milliseconds on large collections (>100k points).
Common Mistake: Creating a collection and immediately expecting fast filtered queries. The collection’s vector index (HNSW) is automatic, but payload indexes are not — you must create each one explicitly. Without them, filters are sequential scans.
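You can audit which indexes already exist: `client.get_collection(name)` returns collection info whose `payload_schema` maps indexed field names to their index config. A small sketch around that mapping; the helper itself is hypothetical, not a client API:

```python
def missing_payload_indexes(payload_schema: dict, filtered_fields: list) -> list:
    """Given the payload_schema mapping from client.get_collection(name),
    return the filter fields that have no payload index yet."""
    return [field for field in filtered_fields if field not in payload_schema]

# Usage sketch (assumes a live client):
# info = client.get_collection("my_docs")
# for field in missing_payload_indexes(info.payload_schema, ["category", "year"]):
#     print(f"No payload index on {field!r}: filtered queries will scan")
```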
Full-text search on a payload field:
from qdrant_client.models import MatchText
client.create_payload_index(
    collection_name="my_docs",
    field_name="content",
    field_schema=PayloadSchemaType.TEXT,
)
# Full-text filter
filter = Filter(
    must=[FieldCondition(key="content", match=MatchText(text="quantum computing"))]
)
Fix 6: Query Parameters and Search
Basic search:
results = client.search(
    collection_name="my_docs",
    query_vector=query_embedding,
    limit=10,
    score_threshold=0.7,  # Only results with score >= 0.7
    with_payload=True,    # Include payload in results (default True)
    with_vectors=False,   # Usually False — saves bandwidth
)
for result in results:
    print(f"Score: {result.score:.3f}, ID: {result.id}")
    print(f"Payload: {result.payload}")
Search with named vectors:
# If collection has multiple named vectors
results = client.search(
    collection_name="multimodal",
    query_vector=("text", text_embedding),  # Tuple: (vector_name, vector)
    limit=10,
)
score_threshold interpretation depends on distance metric:
- COSINE: 1.0 is identical, -1.0 is opposite. Use score_threshold=0.8 typically.
- DOT: larger is more similar (for normalized vectors, behaves like cosine).
- EUCLID: smaller is more similar. score_threshold acts as an upper bound on distance.
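Since DOT only behaves like cosine on unit-length vectors, it is worth normalizing vectors before upsert and before querying if your embedding model does not already return normalized output. A plain-Python sketch (NumPy would do the same in one line):

```python
import math

def normalize(vec):
    """L2-normalize a vector so dot product equals cosine similarity.
    Zero vectors are returned unchanged to avoid division by zero."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]
```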
Batch search (multiple queries in one request):
from qdrant_client.models import SearchRequest
results = client.search_batch(
    collection_name="my_docs",
    requests=[
        SearchRequest(vector=q1_embedding, limit=5),
        SearchRequest(vector=q2_embedding, limit=5),
        SearchRequest(vector=q3_embedding, limit=5),
    ],
)
for i, query_results in enumerate(results):
    print(f"Query {i}: {len(query_results)} results")
Recommendations (search using existing points as query):
results = client.recommend(
    collection_name="my_docs",
    positive=[1, 2, 3],  # IDs of positive examples
    negative=[10, 11],   # IDs of negative examples
    limit=10,
)
Fix 7: Scroll — Paginating Through All Points
# Get all points — WRONG for large collections
all_points = client.scroll(collection_name="my_docs", limit=100000)  # Blows memory
# CORRECT — paginate
offset = None
while True:
    points, offset = client.scroll(
        collection_name="my_docs",
        limit=100,
        offset=offset,
        with_payload=True,
        with_vectors=False,
    )
    for point in points:
        process(point)
    if offset is None:
        break  # No more pages
Scroll with filter — useful for exporting or cleaning up:
# Get all draft documents
offset = None
drafts = []
while True:
    points, offset = client.scroll(
        collection_name="my_docs",
        scroll_filter=Filter(
            must=[FieldCondition(key="draft", match=MatchValue(value=True))]
        ),
        limit=100,
        offset=offset,
    )
    drafts.extend(points)
    if offset is None:
        break
print(f"Found {len(drafts)} drafts")
Fix 8: Deleting Points and Collections
from qdrant_client.models import PointIdsList, FilterSelector
# Delete specific IDs
client.delete(
    collection_name="my_docs",
    points_selector=PointIdsList(points=[1, 2, 3]),
)
# Delete by filter
client.delete(
    collection_name="my_docs",
    points_selector=FilterSelector(
        filter=Filter(
            must=[FieldCondition(key="archived", match=MatchValue(value=True))]
        ),
    ),
)
# Delete entire collection
client.delete_collection(collection_name="my_docs")
Update payload (without changing the vector):
client.set_payload(
    collection_name="my_docs",
    payload={"updated_at": "2025-04-09", "version": 2},
    points=[1, 2, 3],
)
# Overwrite full payload
client.overwrite_payload(
    collection_name="my_docs",
    payload={"status": "archived"},
    points=[1],
)
# Delete specific payload keys
client.delete_payload(
    collection_name="my_docs",
    keys=["old_field"],
    points=[1, 2, 3],
)
Still Not Working?
Qdrant vs Other Vector Databases
- Qdrant — Production-grade, fast, rich filtering, horizontal scaling. Strong choice for most production workloads.
- Chroma — Simpler, runs in-process. Best for prototypes. For Chroma-specific patterns, see ChromaDB not working.
- Pinecone — Managed SaaS. No ops, but costs scale.
- Milvus — Enterprise-scale, complex to operate.
- pgvector — Postgres extension. Good when you already run Postgres.
Quantization for Lower Memory
For very large collections (>10M vectors), enable scalar or binary quantization to reduce memory by 4–32x:
from qdrant_client.models import (
    VectorParams, Distance, ScalarQuantization, ScalarQuantizationConfig, ScalarType,
)
client.create_collection(
    collection_name="big_docs",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(
            type=ScalarType.INT8,  # 4x memory reduction
            quantile=0.99,
            always_ram=True,  # Keep quantized vectors in RAM
        ),
    ),
)
Binary quantization is even more aggressive (32x reduction) but trades recall:
from qdrant_client.models import BinaryQuantization, BinaryQuantizationConfig
client.create_collection(
    collection_name="huge_docs",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    quantization_config=BinaryQuantization(
        binary=BinaryQuantizationConfig(always_ram=True),
    ),
)
Snapshots and Backup
# Create snapshot
snapshot_info = client.create_snapshot(collection_name="my_docs")
print(snapshot_info) # Contains download URL
# List snapshots
snapshots = client.list_snapshots(collection_name="my_docs")
# Restore from snapshot
# (via HTTP API or copy the snapshot file to Qdrant storage)
Collection Configuration Tuning
For large-scale collections, tune HNSW parameters:
from qdrant_client.models import HnswConfigDiff, OptimizersConfigDiff
client.update_collection(
    collection_name="my_docs",
    hnsw_config=HnswConfigDiff(
        m=16,  # Graph connectivity (default 16)
        ef_construct=200,  # Index build quality (default 100)
        full_scan_threshold=10000,  # Below this segment size (KB of vectors), full scan instead of HNSW
    ),
    optimizers_config=OptimizersConfigDiff(
        indexing_threshold=20000,  # Start building HNSW once a segment exceeds this size (KB)
    ),
)
Integration with LangChain and LlamaIndex
# LangChain
from langchain_qdrant import QdrantVectorStore
vector_store = QdrantVectorStore.from_existing_collection(
    collection_name="my_docs",
    embedding=embeddings,
    url="http://localhost:6333",
)
For LangChain-specific patterns, see LangChain Python not working.
# LlamaIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore
import qdrant_client
client = qdrant_client.QdrantClient(host="localhost", port=6333)
vector_store = QdrantVectorStore(client=client, collection_name="my_docs")
For LlamaIndex setup patterns, see LlamaIndex not working. For OpenAI embedding configuration with Qdrant, see OpenAI API not working.
Debugging Empty Results
When a filtered search returns nothing unexpected, check in order:
- client.count(collection_name="my_docs") — are there any points at all?
- client.scroll(collection_name="my_docs", limit=1) — inspect a real payload to confirm field names
- Run the same search without the filter — vector alone should return results
- Run the filter as a scroll (no vector) — does the filter itself match anything?
- Check that all filtered fields have payload indexes for performance
Most “empty result” bugs trace to a field name mismatch (e.g., filtering on category but the payload has Category) or a wrong type (filtering year as a string when stored as integer).
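The name-mismatch case is cheap to automate: pull one real payload with scroll, then verify every dot-notation key your filter uses actually resolves in it. `check_filter_keys` is a hypothetical helper, not part of the client:

```python
def check_filter_keys(sample_payload: dict, keys: list) -> list:
    """Return the dot-notation keys that do NOT resolve in a sample payload,
    catching 'category' vs 'Category' style mismatches before you query."""
    missing = []
    for key in keys:
        node = sample_payload
        for part in key.split("."):
            if isinstance(node, dict) and part in node:
                node = node[part]
            else:
                missing.append(key)
                break
    return missing

# Usage sketch (assumes a live client):
# points, _ = client.scroll(collection_name="my_docs", limit=1)
# print(check_filter_keys(points[0].payload, ["category", "metadata.author"]))
```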
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.