Fix: Qdrant Not Working — Connection Errors, Collection Setup, and Filter Syntax Issues

FixDevs

Quick Answer

How to fix common Qdrant errors: connection refused on localhost:6333, collection not found (create it first with create_collection), vector size mismatch on upload, filters that must match the payload schema, slow filtered queries caused by missing payload indexes, and timeouts on large batch uploads.

The Error

You connect to Qdrant and it refuses:

httpx.ConnectError: [Errno 111] Connection refused
qdrant_client.http.exceptions.UnexpectedResponse: Unexpected Response: 404

Or you create a collection and get a vector size error on upload:

qdrant_client.http.exceptions.UnexpectedResponse: 
Vector size 1536 does not match collection vector size 384

Or a filter query returns no results, even though you know the data matches:

results = client.search(
    collection_name="docs",
    query_vector=vec,
    query_filter=Filter(must=[FieldCondition(key="category", match=MatchValue(value="news"))]),
    limit=10,
)
# Empty list, but you know there are "news" docs

Or queries are painfully slow on filtered searches over millions of vectors:

# Vector-only search: 20ms
# Vector + filter: 5000ms

Qdrant is a Rust-based vector database — faster and more production-ready than Chroma, with richer filtering and HNSW tuning. The filter language is strict (unlike Chroma’s flexible dict filters), and payload fields need explicit indexing for fast filtered search. This guide covers each common failure.

Why This Happens

Qdrant normally runs as a standalone server (HTTP on port 6333, gRPC on 6334); the Python client also offers embedded in-memory and local-path modes, but those are conveniences for tests, not production. Connection issues usually trace to the server not running, a wrong port, or a firewall blocking traffic. Collection configuration is strict: the vector size, distance metric, and payload indexes are all set explicitly, and the vector size is fixed at creation time.

Qdrant’s filter language uses nested Pydantic models (Filter, FieldCondition, MatchValue) rather than flat dicts. This gives you type safety but makes the syntax verbose — copying a dict-style filter from LangChain or Chroma docs doesn’t work.
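
The same conditions can also be expressed as plain JSON against Qdrant's REST API. As an illustration of the shape difference, here is a small hypothetical helper (not part of any client library) that converts a flat Chroma-style equality dict into Qdrant's REST filter JSON; it handles equality conditions only:

```python
def flat_to_qdrant_filter(flat: dict) -> dict:
    """Convert a flat equality dict, e.g. {"category": "news", "year": 2025},
    into the JSON filter shape Qdrant's REST API expects.
    Equality conditions only -- ranges and full-text need richer handling."""
    return {
        "must": [
            {"key": key, "match": {"value": value}}
            for key, value in flat.items()
        ]
    }

print(flat_to_qdrant_filter({"category": "news"}))
# {'must': [{'key': 'category', 'match': {'value': 'news'}}]}
```

The Pydantic models in Fix 4 below produce this same JSON under the hood; the converter just makes the mapping visible.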

Fix 1: Connecting to Qdrant

Start a local Qdrant server (easiest via Docker):

# Docker — data persists in ./qdrant_storage
docker run -p 6333:6333 -p 6334:6334 \
    -v $(pwd)/qdrant_storage:/qdrant/storage \
    qdrant/qdrant

# Or install the binary
# https://github.com/qdrant/qdrant/releases
./qdrant

Connect from Python:

from qdrant_client import QdrantClient

# Option 1: Local server
client = QdrantClient(host="localhost", port=6333)

# Option 2: URL-based (preferred for cloud)
client = QdrantClient(url="http://localhost:6333")

# Option 3: gRPC (faster for high-throughput)
client = QdrantClient(host="localhost", port=6334, prefer_grpc=True)

# Option 4: In-memory (tests, no persistence)
client = QdrantClient(":memory:")

# Option 5: Local file (no server, embedded mode)
client = QdrantClient(path="./qdrant_local")

Qdrant Cloud (managed):

client = QdrantClient(
    url="https://your-cluster.qdrant.io",
    api_key="your-api-key",
)

Verify connection:

print(client.get_collections())   # Returns CollectionsResponse with list

If this raises ConnectionError, the server isn’t reachable.
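
In CI or Docker Compose setups, the server often isn't ready the instant your app starts. A small retry loop avoids crashing on the first attempt; this is a sketch where the probe callable is injected (pass `client.get_collections` in practice):

```python
import time

def wait_for_qdrant(probe, attempts=10, delay=1.0):
    """Call `probe` (e.g. client.get_collections) until it succeeds.
    Returns True once reachable, False after exhausting all attempts."""
    for attempt in range(attempts):
        try:
            probe()
            return True
        except Exception:
            if attempt < attempts - 1:
                time.sleep(delay)
    return False

# Usage: wait_for_qdrant(client.get_collections) before the first query
```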

Common Mistake: Trying to use QdrantClient(path=...) and QdrantClient(host=...) in the same process simultaneously. Local mode uses an embedded database and doesn’t communicate with a server. Pick one mode per application.

Fix 2: Collection Creation and Vector Size

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient("localhost", port=6333)

# Create collection — vector size must match your embedding model
client.create_collection(
    collection_name="my_docs",
    vectors_config=VectorParams(
        size=1536,                    # text-embedding-3-small = 1536
        distance=Distance.COSINE,     # or DOT, EUCLID, MANHATTAN
    ),
)

Common embedding sizes:

Model                                       Dimension
text-embedding-3-small (OpenAI)             1536
text-embedding-3-large (OpenAI)             3072
text-embedding-ada-002 (OpenAI)             1536
all-MiniLM-L6-v2 (Sentence Transformers)    384
all-mpnet-base-v2 (Sentence Transformers)   768
bge-large-en-v1.5 (BGE)                     1024
embed-english-v3.0 (Cohere)                 1024
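
Rather than hard-coding the dimension from a table, you can derive it from the embedding function itself. A sketch; the `embed` callable here stands in for whatever model you use:

```python
def embedding_size(embed) -> int:
    """Embed a probe string and return the vector length,
    so the collection's `size` always matches the model in use."""
    return len(embed("dimension probe"))

# Usage sketch (hypothetical embed function):
# size = embedding_size(lambda text: model.encode(text))
# client.create_collection(
#     collection_name="my_docs",
#     vectors_config=VectorParams(size=size, distance=Distance.COSINE),
# )
```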

Creating collections idempotently: recreate, catch the "already exists" error, or check existence first:

from qdrant_client.http.exceptions import UnexpectedResponse

# Option 1: recreate (destroys existing!)
client.recreate_collection(
    collection_name="my_docs",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

# Option 2: create only if missing
try:
    client.create_collection(
        collection_name="my_docs",
        vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    )
except UnexpectedResponse as e:
    if "already exists" not in str(e):
        raise
    # Collection exists — continue

# Option 3: explicitly check
if not client.collection_exists("my_docs"):
    client.create_collection(
        collection_name="my_docs",
        vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    )

Multiple named vectors per point (e.g., text and image embeddings for the same document):

client.create_collection(
    collection_name="multimodal",
    vectors_config={
        "text": VectorParams(size=1536, distance=Distance.COSINE),
        "image": VectorParams(size=512, distance=Distance.COSINE),
    },
)

# Upsert with named vectors
from qdrant_client.models import PointStruct

client.upsert(
    collection_name="multimodal",
    points=[
        PointStruct(
            id=1,
            vector={"text": text_embedding, "image": image_embedding},
            payload={"name": "Product A"},
        ),
    ],
)

Fix 3: Upserting Points

from qdrant_client.models import PointStruct
import uuid

# Simple upsert
points = [
    PointStruct(
        id=i,   # int or str (UUID). Use UUIDs for distributed inserts.
        vector=embedding_vector,   # List[float] of correct size
        payload={"text": doc_text, "source": "blog", "year": 2025},
    )
    for i, (doc_text, embedding_vector) in enumerate(zip(docs, vectors))
]

client.upsert(collection_name="my_docs", points=points, wait=True)
# wait=True — block until indexed. Set False for async.

Batch upload for performance:

def upload_in_batches(client, collection_name, points, batch_size=100):
    for i in range(0, len(points), batch_size):
        batch = points[i:i + batch_size]
        client.upsert(collection_name=collection_name, points=batch)
        print(f"Uploaded {i + len(batch)} / {len(points)}")

upload_in_batches(client, "my_docs", points)

wait=False with batched upload for maximum throughput:

for batch in batches:
    client.upsert(collection_name="my_docs", points=batch, wait=False)

# After all uploads, wait for indexing to catch up
import time
while True:
    info = client.get_collection("my_docs")
    if info.status == "green":   # Fully indexed
        break
    time.sleep(1)

ID types — integers are fastest, but UUIDs are safer for distributed writes:

# Integer IDs
PointStruct(id=1, vector=vec, payload=data)
PointStruct(id=2, vector=vec, payload=data)

# UUID IDs (as string)
PointStruct(id=str(uuid.uuid4()), vector=vec, payload=data)

Don’t mix integer and UUID IDs within the same collection.
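
If you re-ingest the same documents, random UUIDs create duplicates. Deriving a UUIDv5 from stable document identity makes upserts idempotent, since the same document always maps to the same point ID. A standard-library sketch (the `source#chunk` key format is an arbitrary convention, not a Qdrant requirement):

```python
import uuid

def point_id_for(source: str, chunk_index: int) -> str:
    """Same document + chunk always yields the same UUID, so re-running
    ingestion overwrites existing points instead of duplicating them."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{source}#{chunk_index}"))

print(point_id_for("blog/post-1", 0) == point_id_for("blog/post-1", 0))  # True
```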

Fix 4: Filter Syntax

Qdrant’s filter language uses three top-level operators: must, should, must_not. Each takes a list of conditions.

from qdrant_client.models import (
    Filter, FieldCondition, MatchValue, MatchAny, Range,
)

# Simple equals filter
filter = Filter(
    must=[FieldCondition(key="category", match=MatchValue(value="news"))]
)

# Multiple conditions (AND)
filter = Filter(
    must=[
        FieldCondition(key="category", match=MatchValue(value="news")),
        FieldCondition(key="year", match=MatchValue(value=2025)),
    ],
)

# OR conditions (should)
filter = Filter(
    should=[
        FieldCondition(key="category", match=MatchValue(value="news")),
        FieldCondition(key="category", match=MatchValue(value="blog")),
    ],
)

# NOT conditions
filter = Filter(
    must_not=[FieldCondition(key="draft", match=MatchValue(value=True))]
)

# Match any of a list
filter = Filter(
    must=[FieldCondition(key="category", match=MatchAny(any=["news", "blog", "tutorial"]))]
)

# Range filter
filter = Filter(
    must=[
        FieldCondition(
            key="year",
            range=Range(gte=2020, lte=2025),
        ),
    ],
)

# Combined — complex filter
filter = Filter(
    must=[
        FieldCondition(key="category", match=MatchValue(value="news")),
        FieldCondition(key="year", range=Range(gte=2024)),
    ],
    must_not=[
        FieldCondition(key="archived", match=MatchValue(value=True)),
    ],
)

Apply to search:

results = client.search(
    collection_name="my_docs",
    query_vector=embedding,
    query_filter=filter,
    limit=10,
)

Nested payload fields — use dot notation:

# Payload: {"metadata": {"author": "Alice", "year": 2025}}
filter = Filter(
    must=[FieldCondition(key="metadata.author", match=MatchValue(value="Alice"))]
)

Pro Tip: Qdrant’s filter API looks verbose next to Chroma’s dict-based filters, but the Pydantic models catch structural mistakes — wrong argument names, wrong value types — at construction time rather than at query time. Note that a well-formed filter on a misspelled payload field still returns silently empty results; see Debugging Empty Results below.

Fix 5: Payload Indexes for Fast Filtered Queries

# Unfiltered search: 20ms
# Filtered search: 5000ms

Without payload indexes, Qdrant must scan every point to apply filters — linear time. Create indexes on fields you filter by frequently:

from qdrant_client.models import PayloadSchemaType

client.create_payload_index(
    collection_name="my_docs",
    field_name="category",
    field_schema=PayloadSchemaType.KEYWORD,   # For exact string match
)

client.create_payload_index(
    collection_name="my_docs",
    field_name="year",
    field_schema=PayloadSchemaType.INTEGER,   # For integer range queries
)

client.create_payload_index(
    collection_name="my_docs",
    field_name="published_at",
    field_schema=PayloadSchemaType.DATETIME,
)

Schema types:

Type        Use for
KEYWORD     Exact string match (category, tag, status)
TEXT        Full-text search (tokenized)
INTEGER     Numeric comparisons (year, count)
FLOAT       Decimal comparisons (price, score)
BOOL        Boolean flags (draft, published)
GEO         Geospatial coordinates
DATETIME    ISO-8601 timestamps
UUID        UUID-formatted strings

After indexing, filtered queries drop from seconds to milliseconds on large collections (>100k points).

Common Mistake: Creating a collection and immediately expecting fast filtered queries. The collection’s vector index (HNSW) is automatic, but payload indexes are not — you must create each one explicitly. Without them, filters are sequential scans.
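
You can check which of your filter fields lack indexes by comparing them against the collection's actual index set (`client.get_collection(...).payload_schema` maps indexed field names to their schema types). The comparison itself is plain set arithmetic:

```python
def missing_payload_indexes(indexed_fields, filter_fields):
    """Return the filter fields that have no payload index and will
    therefore force a sequential scan over every point."""
    return sorted(set(filter_fields) - set(indexed_fields))

# Usage sketch:
# info = client.get_collection("my_docs")
# missing = missing_payload_indexes(info.payload_schema.keys(),
#                                   ["category", "year"])
print(missing_payload_indexes(["category"], ["category", "year"]))  # ['year']
```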

Full-text search on a payload field:

from qdrant_client.models import MatchText

client.create_payload_index(
    collection_name="my_docs",
    field_name="content",
    field_schema=PayloadSchemaType.TEXT,
)

# Full-text filter
filter = Filter(
    must=[FieldCondition(key="content", match=MatchText(text="quantum computing"))]
)

Fix 6: Search Basics and Score Thresholds

Basic search:

results = client.search(
    collection_name="my_docs",
    query_vector=query_embedding,
    limit=10,
    score_threshold=0.7,   # Only results with score >= 0.7
    with_payload=True,     # Include payload in results (default True)
    with_vectors=False,    # Usually False — saves bandwidth
)

for result in results:
    print(f"Score: {result.score:.3f}, ID: {result.id}")
    print(f"Payload: {result.payload}")

Search with named vectors:

# If collection has multiple named vectors
results = client.search(
    collection_name="multimodal",
    query_vector=("text", text_embedding),   # Tuple: (vector_name, vector)
    limit=10,
)

score_threshold interpretation depends on distance metric:

  • COSINE: 1.0 is identical, -1.0 is opposite. Use score_threshold=0.8 typically.
  • DOT: Larger is more similar (for normalized vectors, behaves like cosine).
  • EUCLID: Smaller is more similar. score_threshold acts as an upper bound on distance.
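
Because the comparison direction flips between similarity and distance metrics, it can help to centralize the check. This helper is purely illustrative (not part of the client; Qdrant applies score_threshold server-side):

```python
def passes_threshold(score: float, threshold: float, metric: str) -> bool:
    """Higher is better for COSINE/DOT (similarity scores);
    lower is better for EUCLID/MANHATTAN (distances)."""
    if metric in ("COSINE", "DOT"):
        return score >= threshold
    return score <= threshold

print(passes_threshold(0.85, 0.8, "COSINE"))   # True
print(passes_threshold(12.0, 5.0, "EUCLID"))   # False
```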

Batch search (multiple queries in one request):

from qdrant_client.models import SearchRequest

results = client.search_batch(
    collection_name="my_docs",
    requests=[
        SearchRequest(vector=q1_embedding, limit=5),
        SearchRequest(vector=q2_embedding, limit=5),
        SearchRequest(vector=q3_embedding, limit=5),
    ],
)

for i, query_results in enumerate(results):
    print(f"Query {i}: {len(query_results)} results")

Recommendations (search using existing points as query):

results = client.recommend(
    collection_name="my_docs",
    positive=[1, 2, 3],   # IDs of positive examples
    negative=[10, 11],    # IDs of negative examples
    limit=10,
)

Fix 7: Scroll — Paginating Through All Points

# Get all points in one call — WRONG for large collections
points, _ = client.scroll(collection_name="my_docs", limit=100000)   # Blows memory

# CORRECT — paginate
offset = None
while True:
    points, offset = client.scroll(
        collection_name="my_docs",
        limit=100,
        offset=offset,
        with_payload=True,
        with_vectors=False,
    )
    for point in points:
        process(point)
    if offset is None:
        break   # No more pages

Scroll with filter — useful for exporting or cleaning up:

# Get all draft documents
offset = None
drafts = []
while True:
    points, offset = client.scroll(
        collection_name="my_docs",
        scroll_filter=Filter(
            must=[FieldCondition(key="draft", match=MatchValue(value=True))]
        ),
        limit=100,
        offset=offset,
    )
    drafts.extend(points)
    if offset is None:
        break

print(f"Found {len(drafts)} drafts")

Fix 8: Deleting Points and Collections

from qdrant_client.models import PointIdsList, FilterSelector

# Delete specific IDs
client.delete(
    collection_name="my_docs",
    points_selector=PointIdsList(points=[1, 2, 3]),
)

# Delete by filter
client.delete(
    collection_name="my_docs",
    points_selector=FilterSelector(
        filter=Filter(
            must=[FieldCondition(key="archived", match=MatchValue(value=True))]
        ),
    ),
)

# Delete entire collection
client.delete_collection(collection_name="my_docs")

Update payload (without changing the vector):

client.set_payload(
    collection_name="my_docs",
    payload={"updated_at": "2025-04-09", "version": 2},
    points=[1, 2, 3],
)

# Overwrite full payload
client.overwrite_payload(
    collection_name="my_docs",
    payload={"status": "archived"},
    points=[1],
)

# Delete specific payload keys
client.delete_payload(
    collection_name="my_docs",
    keys=["old_field"],
    points=[1, 2, 3],
)

Still Not Working?

Qdrant vs Other Vector Databases

  • Qdrant — Production-grade, fast, rich filtering, horizontal scaling. Strong choice for most production workloads.
  • Chroma — Simpler, runs in-process. Best for prototypes. For Chroma-specific patterns, see ChromaDB not working.
  • Pinecone — Managed SaaS. No ops, but costs scale.
  • Milvus — Enterprise-scale, complex to operate.
  • pgvector — Postgres extension. Good when you already run Postgres.

Quantization for Lower Memory

For very large collections (>10M vectors), enable scalar or binary quantization to reduce memory by 4–32x:

from qdrant_client.models import (
    VectorParams, Distance, ScalarQuantization, ScalarQuantizationConfig, ScalarType,
)

client.create_collection(
    collection_name="big_docs",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(
            type=ScalarType.INT8,   # 4x memory reduction
            quantile=0.99,
            always_ram=True,          # Keep quantized vectors in RAM
        ),
    ),
)

Binary quantization is even more aggressive (32x reduction) but trades recall:

from qdrant_client.models import BinaryQuantization, BinaryQuantizationConfig

client.create_collection(
    collection_name="huge_docs",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    quantization_config=BinaryQuantization(
        binary=BinaryQuantizationConfig(always_ram=True),
    ),
)

Snapshots and Backup

# Create snapshot
snapshot_info = client.create_snapshot(collection_name="my_docs")
print(snapshot_info)   # Contains download URL

# List snapshots
snapshots = client.list_snapshots(collection_name="my_docs")

# Restore from snapshot
# (via HTTP API or copy the snapshot file to Qdrant storage)
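
Restoring over HTTP goes through the snapshot recover endpoint. As a sketch, this helper only builds the request, so you can send it with any HTTP client; verify the path against your Qdrant version's API reference before relying on it:

```python
def recover_request(base_url: str, collection: str, snapshot_location: str):
    """Build the (method, url, json_body) triple for Qdrant's snapshot
    recover endpoint; `snapshot_location` is a URL or file:// path."""
    url = f"{base_url}/collections/{collection}/snapshots/recover"
    return "PUT", url, {"location": snapshot_location}

method, url, body = recover_request(
    "http://localhost:6333", "my_docs",
    "file:///qdrant/snapshots/my_docs.snapshot",
)
print(method, url)
# PUT http://localhost:6333/collections/my_docs/snapshots/recover
```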

Collection Configuration Tuning

For large-scale collections, tune HNSW parameters:

from qdrant_client.models import HnswConfigDiff, OptimizersConfigDiff

client.update_collection(
    collection_name="my_docs",
    hnsw_config=HnswConfigDiff(
        m=16,                 # Graph connectivity (default 16)
        ef_construct=200,     # Index build quality (default 100)
        full_scan_threshold=10000,   # Below this count, full scan instead of HNSW
    ),
    optimizer_config=OptimizersConfigDiff(
        indexing_threshold=20000,   # Start building HNSW after this many points
    ),
)

Integration with LangChain and LlamaIndex

# LangChain
from langchain_qdrant import QdrantVectorStore

vector_store = QdrantVectorStore.from_existing_collection(
    collection_name="my_docs",
    embedding=embeddings,
    url="http://localhost:6333",
)

For LangChain-specific patterns, see LangChain Python not working.

# LlamaIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore
import qdrant_client

client = qdrant_client.QdrantClient(host="localhost", port=6333)
vector_store = QdrantVectorStore(client=client, collection_name="my_docs")

For LlamaIndex setup patterns, see LlamaIndex not working. For OpenAI embedding configuration with Qdrant, see OpenAI API not working.

Debugging Empty Results

When a filtered search unexpectedly returns nothing, check in this order:

  1. client.count(collection_name="my_docs") — are there any points at all?
  2. client.scroll(collection_name="my_docs", limit=1) — inspect a real payload to confirm field names
  3. Run the same search without the filter — vector alone should return results
  4. Run the filter as a scroll (no vector) — does the filter itself match anything?
  5. Check that all filtered fields have payload indexes for performance

Most “empty result” bugs trace to a field name mismatch (e.g., filtering on category but the payload has Category) or a wrong type (filtering year as a string when stored as integer).
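The checklist can be wired into a single helper. This sketch duck-types the client (anything exposing `count`, `scroll`, and `search` works), so treat it as illustrative rather than exhaustive:

```python
def diagnose_empty_results(client, collection, query_vector, query_filter):
    """Run the empty-result checklist in order; return the first failing step."""
    # 1. Are there any points at all?
    if client.count(collection_name=collection).count == 0:
        return "collection is empty"
    # 2. Inspect a real payload to confirm field names.
    points, _ = client.scroll(collection_name=collection, limit=1)
    print("sample payload:", points[0].payload if points else None)
    # 3. Does vector search alone return anything?
    if not client.search(collection_name=collection,
                         query_vector=query_vector, limit=1):
        return "vector search alone returns nothing"
    # 4. Does the filter alone match anything?
    matched, _ = client.scroll(collection_name=collection,
                               scroll_filter=query_filter, limit=1)
    if not matched:
        return "filter matches no points -- check field names and types"
    return "filter and vector both match; check the combined query parameters"
```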

FixDevs

Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.
