Fix: Qdrant Not Working — Connection Errors, Collection Setup, and Filter Syntax Issues

This guide covers the most common Qdrant errors: connection refused on localhost:6333, collection not found, vector size mismatch, filters that match nothing, slow filtered queries caused by missing payload indexes, and timeouts on large batch uploads.

The Error

You connect to Qdrant and the connection is refused, or the server answers with a 404 because the collection doesn’t exist:

httpx.ConnectError: [Errno 111] Connection refused
qdrant_client.http.exceptions.UnexpectedResponse: Unexpected Response: 404

Or you create a collection and get a vector size error on upload:

qdrant_client.http.exceptions.UnexpectedResponse: 
Vector size 1536 does not match collection vector size 384

Or a filter query returns no results, even though you know the data matches:

results = client.search(
    collection_name="docs",
    query_vector=vec,
    query_filter=Filter(must=[FieldCondition(key="category", match=MatchValue(value="news"))]),
    limit=10,
)
# Empty list, but you know there are "news" docs

Or queries are painfully slow on filtered searches over millions of vectors:

# Vector-only search: 20ms
# Vector + filter: 5000ms

Qdrant is a Rust-based vector database — faster and more production-ready than Chroma, with richer filtering and HNSW tuning. The filter language is strict (unlike Chroma’s flexible dict filters), and payload fields need explicit indexing for fast filtered search. This guide covers each common failure.

Why This Happens

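Qdrant normally runs as a server (HTTP on 6333, gRPC on 6334). The Python client also has a local mode (":memory:" or path=...) that runs in-process, but it is meant for tests and prototypes. Connection issues usually trace to the server not running, the wrong port, or a firewall blocking the connection. Collection configuration is strict: the vector size and distance metric are fixed at creation time, and payload indexes must be created explicitly for every field you filter on.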

Qdrant’s filter language uses nested Pydantic models (Filter, FieldCondition, MatchValue) rather than flat dicts. This gives you type safety but makes the syntax verbose — copying a dict-style filter from LangChain or Chroma docs doesn’t work.

Diagnostic Timeline: “Results Look Wrong” — Triage in Order

Your first instinct is to rebuild the collection from scratch. Don’t. Rebuilding hides the actual problem and you will hit it again next week. Here is the real path.

Minute 0 — Count the points. Run client.count(collection_name="my_docs", exact=True). If the count is zero or much smaller than expected, your upsert failed silently — likely a vector-size mismatch swallowed by wait=False. If the count looks right, the data is there and the search is wrong.

Minute 1 — Inspect one real payload. Run client.scroll(collection_name="my_docs", limit=1). Read the payload. Compare field names byte-for-byte with your filter — Category vs category, published_at (string) vs filter using Range (numeric). About 70% of “empty results” bugs end here.

Minute 2 — Run the search without the filter. If unfiltered search returns sensible neighbors, the vector index is fine and only the filter is wrong. If unfiltered search returns nonsense too, check that your query embedding was generated with the same model and same normalization as the points. A query embedding from text-embedding-3-large against a collection of text-embedding-3-small vectors will not error — it will return random-looking neighbors.

Minute 3 — Check the distance metric. A collection created with Distance.COSINE interprets score_threshold=0.7 as “scores >= 0.7 are good.” A collection created with Distance.EUCLID interprets the same threshold as “distance <= 0.7.” Mix these up and your thresholding silently filters every result. Run client.get_collection("my_docs").config.params.vectors.distance to confirm.

Minute 5 — Check whether HNSW is even built yet. If you uploaded a million points and immediately queried, the HNSW index may not have finished building. client.get_collection("my_docs").status should be "green". If it is "yellow", optimization is in progress and segments that are not indexed yet are scanned in full, which is slow but correct. If it is "red", an operation failed and the engine could not recover from it; check the server logs before trusting any results.

Minute 8 — Payload index missing for the filter field. Filtered search over millions of vectors without a payload index is a sequential scan. Symptom: queries take 5+ seconds instead of milliseconds. Add create_payload_index for every field you filter on. See Fix 5.

The first guess (“rebuild the collection”) is wrong about nine times out of ten. The real cause is usually one of: a field name or type mismatch in the filter, the wrong distance metric, a missing payload index, or HNSW still building at query time.
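The first steps of the timeline can be collected into one helper. This is a minimal sketch, not an official utility: triage is a hypothetical name, and it only relies on the client.count, client.scroll, and client.get_collection calls already used above. It takes the client as an argument, so it works with whichever connection mode you use.

```python
def triage(client, collection_name):
    """Gather the facts from the first triage steps into one dict."""
    report = {}

    # Minute 0: exact point count.
    report["count"] = client.count(
        collection_name=collection_name, exact=True
    ).count

    # Minute 1: one real payload, to compare field names and types by eye.
    points, _next_offset = client.scroll(
        collection_name=collection_name, limit=1, with_payload=True
    )
    report["sample_payload"] = points[0].payload if points else None

    # Minutes 3 and 5: distance metric and collection status.
    info = client.get_collection(collection_name)
    report["status"] = info.status
    vectors = info.config.params.vectors
    if isinstance(vectors, dict):
        # Collections with named vectors expose a dict of name -> params.
        report["distance"] = {
            name: params.distance for name, params in vectors.items()
        }
    else:
        report["distance"] = vectors.distance
    return report
```

Call it as triage(client, "my_docs") and print the result before changing anything. If the count, payload, status, and distance all look right, the problem is in the query, not in the collection.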

Fix 1: Connecting to Qdrant

Start a local Qdrant server (easiest via Docker):

# Docker — data persists in ./qdrant_storage
docker run -p 6333:6333 -p 6334:6334 \
    -v $(pwd)/qdrant_storage:/qdrant/storage \
    qdrant/qdrant

# Or install the binary
# https://github.com/qdrant/qdrant/releases
./qdrant

Connect from Python:

from qdrant_client import QdrantClient

# Option 1: Local server
client = QdrantClient(host="localhost", port=6333)

# Option 2: URL-based (preferred for cloud)
client = QdrantClient(url="http://localhost:6333")

# Option 3: gRPC (faster for high-throughput)
client = QdrantClient(host="localhost", port=6334, prefer_grpc=True)

# Option 4: In-memory (tests, no persistence)
client = QdrantClient(":memory:")

# Option 5: Local file (no server, embedded mode)
client = QdrantClient(path="./qdrant_local")

Qdrant Cloud (managed):

client = QdrantClient(
    url="https://your-cluster.qdrant.io",
    api_key="your-api-key",
)

Verify connection:

print(client.get_collections())   # Returns CollectionsResponse with list

If this raises a connection error (the client typically wraps httpx.ConnectError in ResponseHandlingException), the server isn’t reachable.

Common Mistake: Trying to use QdrantClient(path=...) and QdrantClient(host=...) in the same process simultaneously. Local mode uses an embedded database and doesn’t communicate with a server. Pick one mode per application.
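Before debugging the client, it can help to confirm that something is listening on the Qdrant ports at all. The sketch below uses only the standard library. port_is_open is a hypothetical helper, and localhost with ports 6333 and 6334 are the defaults from the Docker command above; adjust them if your setup differs.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


if __name__ == "__main__":
    for name, port in (("HTTP", 6333), ("gRPC", 6334)):
        state = "reachable" if port_is_open("localhost", port) else "NOT reachable"
        print(f"{name} port {port}: {state}")
```

An open port only proves that a process is listening. If the port is reachable but the client still fails, look at the URL scheme (http vs https) and the API key next.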

Fix 2: Collection Creation and Vector Size

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient("localhost", port=6333)

# Create collection — vector size must match your embedding model
client.create_collection(
    collection_name="my_docs",
    vectors_config=VectorParams(
        size=1536,                    # text-embedding-3-small = 1536
        distance=Distance.COSINE,     # or DOT, EUCLID, MANHATTAN
    ),
)

Common embedding sizes:

Model	Dimension
`text-embedding-3-small` (OpenAI)	1536
`text-embedding-3-large` (OpenAI)	3072
`text-embedding-ada-002` (OpenAI)	1536
`all-MiniLM-L6-v2` (ST)	384
`all-mpnet-base-v2` (ST)	768
`bge-large-en-v1.5` (BGE)	1024
`embed-english-v3.0` (Cohere)	1024
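A size mismatch is cheap to catch before anything is sent to the server. The following is a small pre-flight check in plain Python; find_dimension_mismatches is a hypothetical helper, and the expected size is whatever you passed to VectorParams(size=...).

```python
def find_dimension_mismatches(vectors, expected_size):
    """Return (index, actual_size) for every vector whose length is wrong."""
    return [
        (i, len(vec))
        for i, vec in enumerate(vectors)
        if len(vec) != expected_size
    ]


# Example: a 384-dimensional collection receiving one 1536-dimensional vector.
vectors = [[0.0] * 384, [0.0] * 1536, [0.0] * 384]
bad = find_dimension_mismatches(vectors, expected_size=384)
print(bad)  # [(1, 1536)]
```

Running this on each batch before upsert turns a server-side error into a local one that names the offending document.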

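Idempotent creation: check whether the collection exists, or catch the "already exists" error. recreate_collection also leaves you with a usable collection, but it deletes the existing one first and is deprecated in recent qdrant-client releases:

from qdrant_client.http.exceptions import UnexpectedResponse

# Option 1: recreate (destroys existing data; deprecated in recent clients)
client.recreate_collection(
    collection_name="my_docs",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)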

# Option 2: create only if missing
try:
    client.create_collection(
        collection_name="my_docs",
        vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    )
except UnexpectedResponse as e:
    if "already exists" not in str(e):
        raise
    # Collection exists — continue

# Option 3: explicitly check
if not client.collection_exists("my_docs"):
    client.create_collection(
        collection_name="my_docs",
        vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    )

Multiple named vectors per point (e.g., text and image embeddings for the same document):

client.create_collection(
    collection_name="multimodal",
    vectors_config={
        "text": VectorParams(size=1536, distance=Distance.COSINE),
        "image": VectorParams(size=512, distance=Distance.COSINE),
    },
)

# Upsert with named vectors
from qdrant_client.models import PointStruct

client.upsert(
    collection_name="multimodal",
    points=[
        PointStruct(
            id=1,
            vector={"text": text_embedding, "image": image_embedding},
            payload={"name": "Product A"},
        ),
    ],
)

Fix 3: Upserting Points

from qdrant_client.models import PointStruct
import uuid

# Simple upsert
points = [
    PointStruct(
        id=i,   # int or str (UUID). Use UUIDs for distributed inserts.
        vector=embedding_vector,   # List[float] of correct size
        payload={"text": doc_text, "source": "blog", "year": 2025},
    )
    for i, (doc_text, embedding_vector) in enumerate(zip(docs, vectors))
]

client.upsert(collection_name="my_docs", points=points, wait=True)
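# wait=True blocks until the write is applied; it does not wait for HNSW indexing.
# wait=False returns as soon as the server has acknowledged the request.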

Batch upload for performance:

def upload_in_batches(client, collection_name, points, batch_size=100):
    for i in range(0, len(points), batch_size):
        batch = points[i:i + batch_size]
        client.upsert(collection_name=collection_name, points=batch)
        print(f"Uploaded {i + len(batch)} / {len(points)}")

upload_in_batches(client, "my_docs", points)
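Large uploads that time out usually need smaller batches and a retry around each one. The sketch below is one way to do that, not the client's built-in behavior: upsert_with_retry and its parameters are hypothetical names, and the broad except clause should be narrowed to the timeout errors your client version actually raises. Raising the timeout argument when constructing QdrantClient is a complementary option.

```python
import time


def upsert_with_retry(client, collection_name, points, batch_size=100,
                      retries=3, backoff_seconds=1.0):
    """Upsert points in batches, retrying a failed batch with growing delays."""
    for start in range(0, len(points), batch_size):
        batch = points[start:start + batch_size]
        for attempt in range(1, retries + 1):
            try:
                client.upsert(
                    collection_name=collection_name, points=batch, wait=True
                )
                break  # this batch is done, move on to the next one
            except Exception:  # narrow this to your client's timeout errors
                if attempt == retries:
                    raise  # give up and surface the last error
                time.sleep(backoff_seconds * attempt)
```

Because upsert overwrites points with the same ID, retrying a batch that was partly applied is safe as long as your IDs are stable.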

wait=False with batched upload for maximum throughput:

import time

# batches: any iterable of point lists, e.g. slices of `points` as in upload_in_batches
for batch in batches:
    client.upsert(collection_name="my_docs", points=batch, wait=False)

# After all uploads, wait for optimization to catch up
while True:
    info = client.get_collection("my_docs")
    if info.status == "green":   # Optimizers idle, indexing finished
        break
    time.sleep(1)
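The polling loop above never exits if the collection stays "yellow" or turns "red". A variant with a deadline is sketched here; wait_until_green is a hypothetical helper that relies only on client.get_collection(...).status, compared against the same status strings used in this guide.

```python
import time


def wait_until_green(client, collection_name, timeout_seconds=300.0,
                     poll_seconds=1.0):
    """Poll the collection status; True once green, False if the deadline passes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = client.get_collection(collection_name).status
        if status == "green":
            return True
        if status == "red":
            # Waiting will not fix a failed operation; stop and investigate.
            raise RuntimeError(f"collection {collection_name!r} reported status red")
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
```

Returning False instead of looping forever lets a batch job log the problem and decide whether to query anyway.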

ID types — integers are fastest, but UUIDs are safer for distributed writes:

# Integer IDs
PointStruct(id=1, vector=vec, payload=data)
PointStruct(id=2, vector=vec, payload=data)

# UUID IDs (as string)
PointStruct(id=str(uuid.uuid4()), vector=vec, payload=data)

Don’t mix integer and UUID IDs within the same collection.
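If you choose UUIDs, deriving them from a stable document key makes re-uploads overwrite existing points instead of duplicating them. This is a sketch using the standard uuid module; DOC_NAMESPACE, point_id_for, and the example key are illustrative names, not anything Qdrant defines.

```python
import uuid

# Hypothetical namespace for this example. Any fixed UUID works,
# as long as it never changes between runs.
DOC_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/my_docs")


def point_id_for(source_key):
    """Derive a stable UUID string from a document's natural key."""
    return str(uuid.uuid5(DOC_NAMESPACE, source_key))


first = point_id_for("blog/2025/qdrant-setup.md")
second = point_id_for("blog/2025/qdrant-setup.md")
print(first == second)  # True: same key, same ID
```

Use the result as PointStruct(id=point_id_for(key), ...). With uuid.uuid4() the same document gets a new ID on every run, so repeated ingestion silently grows the collection.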

Fix 4: Filter Syntax

Qdrant’s filter language uses three top-level operators: must, should, must_not. Each takes a list of conditions.

from qdrant_client.models import (
    Filter, FieldCondition, MatchValue, MatchAny, Range,
)

# Simple equals filter
filter = Filter(
    must=[FieldCondition(key="category", match=MatchValue(value="news"))]
)

# Multiple conditions (AND)
filter = Filter(
    must=[
        FieldCondition(key="category", match=MatchValue(value="news")),
        FieldCondition(key="year", match=MatchValue(value=2025)),
    ],
)

# OR conditions (should)
filter = Filter(
    should=[
        FieldCondition(key="category", match=MatchValue(value="news")),
        FieldCondition(key="category", match=MatchValue(value="blog")),
    ],
)

# NOT conditions
filter = Filter(
    must_not=[FieldCondition(key="draft", match=MatchValue(value=True))]
)

# Match any of a list
filter = Filter(
    must=[FieldCondition(key="category", match=MatchAny(any=["news", "blog", "tutorial"]))]
)

# Range filter
filter = Filter(
    must=[
        FieldCondition(
            key="year",
            range=Range(gte=2020, lte=2025),
        ),
    ],
)

# Combined — complex filter
filter = Filter(
    must=[
        FieldCondition(key="category", match=MatchValue(value="news")),
        FieldCondition(key="year", range=Range(gte=2024)),
    ],
    must_not=[
        FieldCondition(key="archived", match=MatchValue(value=True)),
    ],
)

Apply to search:

results = client.search(
    collection_name="my_docs",
    query_vector=embedding,
    query_filter=filter,
    limit=10,
)

Nested payload fields — use dot notation:

# Payload: {"metadata": {"author": "Alice", "year": 2025}}
filter = Filter(
    must=[FieldCondition(key="metadata.author", match=MatchValue(value="Alice"))]
)
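If you are porting flat dict-style filters from another library, a small translator can produce the nested layout. The sketch below builds plain dicts in the same shape as the models above (must, key, match, range). to_qdrant_filter and the value conventions it accepts are assumptions of this example, not a Qdrant feature.

```python
def to_qdrant_filter(flat):
    """Translate {"field": value} pairs into Qdrant's nested filter layout.

    Value shapes handled by this sketch:
      scalar         -> exact match
      list           -> match any of the listed values
      dict of bounds -> range, e.g. {"gte": 2020, "lte": 2025}
    All conditions are combined with AND (must).
    """
    must = []
    for key, value in flat.items():
        if isinstance(value, dict):
            must.append({"key": key, "range": value})
        elif isinstance(value, list):
            must.append({"key": key, "match": {"any": value}})
        else:
            must.append({"key": key, "match": {"value": value}})
    return {"must": must}


as_dict = to_qdrant_filter({"category": "news", "year": {"gte": 2024}})
print(as_dict)
```

Passing the result through Filter(**as_dict) should give you the Pydantic model and its validation, though that conversion is worth verifying against your client version.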

Pro Tip: Qdrant’s filter API looks verbose next to Chroma’s dict-based filters, but the Pydantic models catch structural mistakes (a misspelled argument, a malformed condition) at construction time rather than at query time. They do not know your payload schema, though: filtering on a field that doesn’t exist or has a different type still returns silent empty results, so compare your filter against a real payload.

Fix 5: Payload Indexing for Fast Filtered Search

# Unfiltered search: 20ms
# Filtered search: 5000ms

Without payload indexes, Qdrant must scan every point to apply filters — linear time. Create indexes on fields you filter by frequently:

from qdrant_client.models import PayloadSchemaType

client.create_payload_index(
    collection_name="my_docs",
    field_name="category",
    field_schema=PayloadSchemaType.KEYWORD,   # For exact string match
)

client.create_payload_index(
    collection_name="my_docs",
    field_name="year",
    field_schema=PayloadSchemaType.INTEGER,   # For integer range queries
)

client.create_payload_index(
    collection_name="my_docs",
    field_name="published_at",
    field_schema=PayloadSchemaType.DATETIME,
)

Schema types:

Type	Use for
`KEYWORD`	Exact string match (category, tag, status)
`TEXT`	Full-text search (tokenized)
`INTEGER`	Numeric comparisons (year, count)
`FLOAT`	Decimal comparisons (price, score)
`BOOL`	Boolean (draft, published)
`GEO`	Geospatial coordinates
`DATETIME`	ISO-8601 timestamps
`UUID`	UUID-formatted strings

After indexing, filtered queries drop from seconds to milliseconds on large collections (>100k points).

Common Mistake: Creating a collection and immediately expecting fast filtered queries. The collection’s vector index (HNSW) is automatic, but payload indexes are not — you must create each one explicitly. Without them, filters are sequential scans.
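One way to avoid forgetting an index is to declare the fields you filter on and create whatever is missing at startup. The sketch below assumes that get_collection(...).payload_schema lists the fields that already have an index. ensure_payload_indexes is a hypothetical helper.

```python
def ensure_payload_indexes(client, collection_name, wanted):
    """Create each index in `wanted` (field name -> schema type) that is missing.

    Returns the list of field names for which an index was created.
    """
    existing = client.get_collection(collection_name).payload_schema or {}
    created = []
    for field_name, schema in wanted.items():
        if field_name in existing:
            continue  # already indexed, nothing to do
        client.create_payload_index(
            collection_name=collection_name,
            field_name=field_name,
            field_schema=schema,
        )
        created.append(field_name)
    return created
```

A call might look like ensure_payload_indexes(client, "my_docs", {"category": PayloadSchemaType.KEYWORD, "year": PayloadSchemaType.INTEGER}). Running it twice is harmless, because the second run finds both fields already indexed.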

Full-text search on a payload field:

from qdrant_client.models import MatchText

client.create_payload_index(
    collection_name="my_docs",
    field_name="content",
    field_schema=PayloadSchemaType.TEXT,
)

# Full-text filter
filter = Filter(
    must=[FieldCondition(key="content", match=MatchText(text="quantum computing"))]
)

Fix 6: Query Parameters and Search

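Basic search (recent qdrant-client releases deprecate search in favor of query_points, which takes the vector as query= and returns the hits under .points):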

results = client.search(
    collection_name="my_docs",
    query_vector=query_embedding,
    limit=10,
    score_threshold=0.7,   # Only results with score >= 0.7
    with_payload=True,     # Include payload in results (default True)
    with_vectors=False,    # Usually False — saves bandwidth
)

for result in results:
    print(f"Score: {result.score:.3f}, ID: {result.id}")
    print(f"Payload: {result.payload}")

Search with named vectors:

# If collection has multiple named vectors
results = client.search(
    collection_name="multimodal",
    query_vector=("text", text_embedding),   # Tuple: (vector_name, vector)
    limit=10,
)

score_threshold interpretation depends on distance metric:

COSINE: 1.0 is identical, -1.0 is opposite. A useful score_threshold depends on the embedding model, so calibrate it on your own data.
DOT: Larger is more similar (for normalized vectors, behaves like cosine).
EUCLID: Smaller is more similar. score_threshold acts as an upper bound on distance.
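The direction of the comparison is easy to get backwards, so here is the arithmetic on two toy vectors. This is plain Python that mirrors the rules above; passes_threshold is a hypothetical helper for illustration, not how Qdrant is implemented.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def passes_threshold(score, threshold, metric):
    """Similarity metrics keep high scores; distance metrics keep low ones."""
    if metric in ("cosine", "dot"):
        return score >= threshold
    if metric == "euclid":
        return score <= threshold
    raise ValueError(f"unknown metric: {metric}")


query = [1.0, 0.0]
close = [0.9, 0.1]   # nearly the same direction as the query
far = [0.0, 1.0]     # orthogonal to the query
threshold = 0.7

for name, vec in (("close", close), ("far", far)):
    sim = cosine_similarity(query, vec)
    dist = math.dist(query, vec)
    print(
        f"{name}: cosine={sim:.3f} kept={passes_threshold(sim, threshold, 'cosine')}  "
        f"euclid={dist:.3f} kept={passes_threshold(dist, threshold, 'euclid')}"
    )
```

With the correct direction, both metrics keep the close vector and drop the far one. Treat a Euclidean distance as a similarity (score >= 0.7) and the outcome flips: the close vector is dropped and the far one is kept.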

Batch search (multiple queries in one request):

from qdrant_client.models import SearchRequest

results = client.search_batch(
    collection_name="my_docs",
    requests=[
        SearchRequest(vector=q1_embedding, limit=5),
        SearchRequest(vector=q2_embedding, limit=5),
        SearchRequest(vector=q3_embedding, limit=5),
    ],
)

for i, query_results in enumerate(results):
    print(f"Query {i}: {len(query_results)} results")

Recommendations (search using existing points as query):

results = client.recommend(
    collection_name="my_docs",
    positive=[1, 2, 3],   # IDs of positive examples
    negative=[10, 11],    # IDs of negative examples
    limit=10,
)

Fix 7: Scroll — Paginating Through All Points

# Get all points — WRONG for large collections
all_points = client.scroll(collection_name="my_docs", limit=100000)   # Blows memory

# CORRECT — paginate
offset = None
while True:
    points, offset = client.scroll(
        collection_name="my_docs",
        limit=100,
        offset=offset,
        with_payload=True,
        with_vectors=False,
    )
    for point in points:
        process(point)
    if offset is None:
        break   # No more pages

Scroll with filter — useful for exporting or cleaning up:

# Get all draft documents
offset = None
drafts = []
while True:
    points, offset = client.scroll(
        collection_name="my_docs",
        scroll_filter=Filter(
            must=[FieldCondition(key="draft", match=MatchValue(value=True))]
        ),
        limit=100,
        offset=offset,
    )
    drafts.extend(points)
    if offset is None:
        break

print(f"Found {len(drafts)} drafts")

Fix 8: Deleting Points and Collections

from qdrant_client.models import PointIdsList, FilterSelector

# Delete specific IDs
client.delete(
    collection_name="my_docs",
    points_selector=PointIdsList(points=[1, 2, 3]),
)

# Delete by filter
client.delete(
    collection_name="my_docs",
    points_selector=FilterSelector(
        filter=Filter(
            must=[FieldCondition(key="archived", match=MatchValue(value=True))]
        ),
    ),
)

# Delete entire collection
client.delete_collection(collection_name="my_docs")

Update payload (without changing the vector):

client.set_payload(
    collection_name="my_docs",
    payload={"updated_at": "2025-04-09", "version": 2},
    points=[1, 2, 3],
)

# Overwrite full payload
client.overwrite_payload(
    collection_name="my_docs",
    payload={"status": "archived"},
    points=[1],
)

# Delete specific payload keys
client.delete_payload(
    collection_name="my_docs",
    keys=["old_field"],
    points=[1, 2, 3],
)

Still Not Working?

Qdrant vs Other Vector Databases

Qdrant — Production-grade, fast, rich filtering, horizontal scaling. Strong choice for most production workloads.
Chroma — Simpler, runs in-process. Best for prototypes. For Chroma-specific patterns, see ChromaDB not working.
Pinecone — Managed SaaS. No ops, but costs scale.
Milvus — Enterprise-scale, complex to operate.
pgvector — Postgres extension. Good when you already run Postgres.

Quantization for Lower Memory

For very large collections (>10M vectors), enable scalar or binary quantization to reduce memory by 4–32x:

from qdrant_client.models import (
    VectorParams, Distance, ScalarQuantization, ScalarQuantizationConfig, ScalarType,
)

client.create_collection(
    collection_name="big_docs",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(
            type=ScalarType.INT8,   # 4x memory reduction
            quantile=0.99,
            always_ram=True,          # Keep quantized vectors in RAM
        ),
    ),
)

Binary quantization is even more aggressive (32x reduction) but trades recall:

from qdrant_client.models import BinaryQuantization, BinaryQuantizationConfig

client.create_collection(
    collection_name="huge_docs",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    quantization_config=BinaryQuantization(
        binary=BinaryQuantizationConfig(always_ram=True),
    ),
)
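The 4x and 32x figures follow directly from bits per component: 32 for float32, 8 for int8, and 1 for binary. The back-of-the-envelope sketch below covers raw vector data only. It ignores the HNSW graph, payloads, and whatever storage the original vectors still occupy, so treat the results as lower bounds.

```python
def vector_memory_gb(num_vectors, dim, bits_per_component):
    """Raw vector storage in gigabytes (10**9 bytes), ignoring index overhead."""
    return num_vectors * dim * bits_per_component / 8 / 1e9


n, dim = 10_000_000, 1536
for label, bits in (("float32", 32), ("int8 scalar", 8), ("binary", 1)):
    print(f"{label:12s} {vector_memory_gb(n, dim, bits):6.2f} GB")
```

For 10 million 1536-dimensional vectors this gives about 61 GB unquantized, 15 GB with int8, and under 2 GB with binary quantization.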

Snapshots and Backup

# Create snapshot
snapshot_info = client.create_snapshot(collection_name="my_docs")
print(snapshot_info)   # Snapshot name, creation time, and size

# List snapshots
snapshots = client.list_snapshots(collection_name="my_docs")

# Restore from snapshot
# (via HTTP API or copy the snapshot file to Qdrant storage)

Collection Configuration Tuning

For large-scale collections, tune HNSW parameters:

from qdrant_client.models import HnswConfigDiff, OptimizersConfigDiff

client.update_collection(
    collection_name="my_docs",
    hnsw_config=HnswConfigDiff(
        m=16,                 # Graph connectivity (default 16)
        ef_construct=200,     # Index build quality (default 100)
        full_scan_threshold=10000,   # In KB of vector data: below this size, full scan instead of HNSW
    ),
    optimizers_config=OptimizersConfigDiff(   # Current keyword; older clients used optimizer_config
        indexing_threshold=20000,   # In KB of vector data: build HNSW once a segment exceeds this size
    ),
)

Integration with LangChain and LlamaIndex

# LangChain
from langchain_qdrant import QdrantVectorStore

vector_store = QdrantVectorStore.from_existing_collection(
    collection_name="my_docs",
    embedding=embeddings,
    url="http://localhost:6333",
)

For LangChain-specific patterns, see LangChain Python not working.

# LlamaIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore
import qdrant_client

client = qdrant_client.QdrantClient(host="localhost", port=6333)
vector_store = QdrantVectorStore(client=client, collection_name="my_docs")

For LlamaIndex setup patterns, see LlamaIndex not working. For OpenAI embedding configuration with Qdrant, see OpenAI API not working.

Debugging Empty Results

When a filtered search unexpectedly returns nothing, check in order:

1. client.count(collection_name="my_docs"): are there any points at all?
2. client.scroll(collection_name="my_docs", limit=1): inspect a real payload to confirm field names
3. Run the same search without the filter: the vector alone should return results
4. Run the filter as a scroll (no vector): does the filter itself match anything?
5. Check that all filtered fields have payload indexes for performance

Most “empty result” bugs trace to a field name mismatch (e.g., filtering on category but the payload has Category) or a wrong type (filtering year as a string when stored as integer).
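The name and type comparison can be automated against a payload returned by scroll. This is a sketch: lookup and explain_mismatches are hypothetical helpers, the filter is simplified to a field-to-value dict, and dot notation is followed through nested dicts only (arrays are not handled).

```python
def lookup(payload, dotted_key):
    """Follow a dot-notation key through nested dicts; return (found, value)."""
    current = payload
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return False, None
        current = current[part]
    return True, current


def explain_mismatches(payload, expected):
    """Compare filter values (key -> value) against one sample payload."""
    problems = []
    for key, wanted in expected.items():
        found, actual = lookup(payload, key)
        if not found:
            problems.append(f"{key}: not in payload")
        elif type(actual) is not type(wanted):
            problems.append(
                f"{key}: payload has {type(actual).__name__}, "
                f"filter uses {type(wanted).__name__}"
            )
    return problems


payload = {"Category": "news", "year": 2025, "metadata": {"author": "Alice"}}
wanted = {"category": "news", "year": "2025", "metadata.author": "Alice"}
for problem in explain_mismatches(payload, wanted):
    print(problem)
```

Here the helper reports that category is missing (the payload key is Category) and that year is stored as an int while the filter passes a str. Those are the two mismatches described above.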

HNSW vs Flat Index at Scale

Small collections are searched by brute-force scan, so recall is perfect. Whether a segment gets an HNSW graph is governed by the optimizer's indexing_threshold, and full_scan_threshold (default 10000) controls when a full scan is preferred; both are measured in kilobytes of vector data, not in points. Once HNSW kicks in, recall drops to ~95-99% depending on ef and m. The transition is silent. If you ran benchmarks at 5,000 points and shipped to production with 5 million, the recall you measured is no longer the recall you get. Always benchmark at production scale, and tune ef (search-time, set as hnsw_ef in SearchParams) and ef_construct (build-time) for the recall floor your application needs. For RAG, recall below 95% can noticeably degrade answer quality.
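Qdrant's configuration reference expresses these thresholds in kilobytes of vector data, counting 1 KB as one vector of 256 float32 components. Under that assumption, the point count at which a threshold is reached depends on your embedding size. The sketch below does the conversion for the indexing_threshold=20000 value from the tuning example; vectors_at_threshold is a hypothetical helper, and the real switch happens per segment, so take the numbers as rough guides.

```python
def vectors_at_threshold(threshold_kb, dim):
    """Approximate number of vectors whose combined size reaches threshold_kb."""
    # Assumption: 1 KB corresponds to one vector of 256 float32 components.
    return threshold_kb * 256 // dim


for dim in (384, 1536, 3072):
    print(f"dim={dim}: ~{vectors_at_threshold(20_000, dim)} vectors")
```

With 1536-dimensional embeddings the threshold is reached at roughly 3,300 vectors, well below what a "10,000 points" rule of thumb suggests. A benchmark at 5,000 points may therefore already be measuring HNSW rather than brute force.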

Wrong Distance Metric for the Embedding Model

OpenAI text-embedding-3-* models return unit-length vectors, so cosine and dot product give identical rankings. BGE and instructor models are also normalized, as are sentence-transformers models with cos in the name (e.g., multi-qa-MiniLM-L6-cos-v1). The risky case is a model whose output is not normalized: models trained for dot product (e.g., multi-qa-mpnet-base-dot-v1) use vector length as signal, so Distance.COSINE gives subtly wrong rankings for them, and Distance.DOT does the same to an unnormalized model tuned for cosine. Check the model card before creating the collection. When in doubt, Distance.COSINE is the safe default for English text embeddings.
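If the model card is unclear, you can check a sample of embeddings yourself. The sketch below uses toy two-dimensional vectors to show how to test for unit length and how dot product and cosine can disagree when vectors are not normalized. All names are illustrative.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_norm(vec):
    return math.sqrt(dot(vec, vec))


def cosine(a, b):
    return dot(a, b) / (l2_norm(a) * l2_norm(b))


def is_normalized(vectors, tolerance=1e-3):
    """True when every vector has length 1 within the given tolerance."""
    return all(abs(l2_norm(v) - 1.0) <= tolerance for v in vectors)


query = [1.0, 0.0]
candidates = {
    "long": [3.0, 4.0],   # length 5, points partly away from the query
    "unit": [1.0, 0.0],   # length 1, same direction as the query
}

print(is_normalized(candidates.values()))  # False: "long" has length 5
best_by_dot = max(candidates, key=lambda k: dot(query, candidates[k]))
best_by_cosine = max(candidates, key=lambda k: cosine(query, candidates[k]))
print(best_by_dot, best_by_cosine)  # long unit
```

Dot product rewards the longer vector while cosine rewards the better-aligned one. If is_normalized returns True for a sample of your real embeddings, the two metrics rank identically and the choice does not matter.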

Async Client Hanging on `wait=True`

Every AsyncQdrantClient method returns a coroutine. If you call client.upsert(...) without await, for example from a synchronous wrapper, the upsert never executes and the next query returns empty. Confirm the client class matches your code path: QdrantClient is sync, AsyncQdrantClient is async. Many tutorials show the sync client; pasting that code into an async FastAPI route still runs, but every call blocks the event loop, so the service stalls under concurrent load.
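Both failure modes can be reproduced without a server. In the sketch below, FakeAsyncClient and blocking_count are stand-ins invented for the demonstration. They are not part of qdrant-client, but the await behavior is the same for any async method. asyncio.to_thread requires Python 3.9 or later.

```python
import asyncio


class FakeAsyncClient:
    """Stand-in for an async client; records what was actually written."""

    def __init__(self):
        self.stored = []

    async def upsert(self, points):
        await asyncio.sleep(0)   # pretend to do network I/O
        self.stored.extend(points)


def blocking_count():
    """Stand-in for a call on the synchronous client."""
    return 42


async def main():
    client = FakeAsyncClient()

    # BUG: without await this only creates a coroutine object; nothing runs.
    pending = client.upsert([1, 2])
    seen_without_await = list(client.stored)
    pending.close()   # discard it so the demo exits without a warning

    # Correct: the write completes before the next line runs.
    await client.upsert([1, 2])
    seen_with_await = list(client.stored)

    # Calling sync code from async code: run it in a worker thread
    # so the event loop stays free for other requests.
    count = await asyncio.to_thread(blocking_count)
    return seen_without_await, seen_with_await, count


seen_without_await, seen_with_await, count = asyncio.run(main())
print(seen_without_await, seen_with_await, count)  # [] [1, 2] 42
```

In a real service Python usually emits a "coroutine was never awaited" warning for the first case, so it is worth treating that warning as an error in tests.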

Related Articles

Fix: Milvus Not Working — Connection Errors, Schema Setup, and Index Build Failures

Fix: ChromaDB Not Working — Persistent Client, Collection Errors, and Embedding Function Issues

Fix: Pinecone Not Working — Index Creation, Serverless vs Pod, and Python SDK v3 Migration

Fix: Weaviate Not Working — Client v4 Migration, Schema Setup, and Vectorizer Errors
