Fix: Milvus Not Working — Connection Errors, Schema Setup, and Index Build Failures
Quick Answer
How to fix Milvus errors — pymilvus connection refused localhost 19530, collection schema mismatch, index not built before search, partition not found, embedded vs standalone vs cluster, and flush before search.
The Error
You install pymilvus and the first connection fails:
MilvusException: <MilvusException: (code=2, message=Fail connecting to server on localhost:19530. Timeout)>
Or you create a collection and inserts fail with schema errors:
MilvusException: (code=1, message=field dimension mismatch, expected 1536, got 768)
Or you search without building an index or loading the collection:
MilvusException: (code=22, message=collection not loaded)
MilvusException: (code=15, message=index not exist)
Or you insert data and immediately search, but no results come back:
collection.insert(rows)
results = collection.search(query_vec, ...)
# Empty results, even though you just inserted matching data
Or a partition name does not resolve:
MilvusException: (code=23, message=partition not found)
Or you try to use Milvus Lite and the API differs from Milvus Standalone:
client = MilvusClient("milvus_lite.db")
client.search(...)
# But your code uses the Collection() API, which is incompatible
Milvus is an open-source vector database built for production scale, used at companies with billions of vectors. It’s heavier than Chroma or Qdrant (cluster mode requires multiple coordinators, query nodes, and data nodes) but scales further. The Python client pymilvus has two APIs (legacy Collection and modern MilvusClient), three deployment modes (Lite, Standalone, Cluster), and an explicit index-then-load workflow that most other vector DBs hide from you. This guide covers each common failure.
Why This Happens
Milvus separates insert and search more strictly than other vector DBs. Inserted vectors first land in a “growing segment” and are later “sealed”; indexes are built on sealed segments. Searching a collection that has no index, or that has not been loaded, returns an error, and a search issued right after an insert may miss the newest rows unless you flush() or raise the consistency level.
The two client APIs (Collection vs MilvusClient) reflect two generations of pymilvus. MilvusClient (available in pymilvus 2.3+) is the modern, simpler API; Collection is the lower-level legacy API that many tutorials still use.
Fix 1: Choosing a Deployment Mode
Milvus Lite: embedded, file-based (great for prototypes):
pip install pymilvus
from pymilvus import MilvusClient
client = MilvusClient("./milvus_lite.db")
# All data in a single SQLite-like file
# No server needed
Milvus Standalone: single-process server:
# Docker
docker run -d --name milvus-standalone \
-p 19530:19530 -p 9091:9091 \
-v $(pwd)/volumes/milvus:/var/lib/milvus \
milvusdb/milvus:latest milvus run standalone
from pymilvus import MilvusClient
client = MilvusClient(uri="http://localhost:19530")
Milvus Cluster: distributed (multiple nodes, requires Kubernetes for serious deployments):
# Helm
helm install my-milvus milvus/milvus --set cluster.enabled=true
Comparison:
| Mode | Vectors | Use case |
|---|---|---|
| Lite | < 1M | Prototyping, embedded apps, single user |
| Standalone | < 10M | Small production, single-machine |
| Cluster | Billions+ | Production at scale |
Common Mistake: Starting with Milvus Cluster for a 100k-vector workload. The operational complexity (8+ pods, etcd, MinIO, Pulsar) is overkill; Lite or Standalone runs the same workload in one process with far less to operate. Scale up to Cluster only when you measurably need it.
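The comparison table can be turned into a small decision helper. This is only a sketch: `suggest_deployment_mode` is a hypothetical name, and the cut-offs are the rough figures from the table above, not hard limits enforced by Milvus.

```python
def suggest_deployment_mode(expected_vectors: int) -> str:
    """Map an expected vector count to a deployment mode.

    Hypothetical helper: thresholds mirror the rough figures in the
    comparison table and are not limits enforced by Milvus itself.
    """
    if expected_vectors < 0:
        raise ValueError("expected_vectors must be non-negative")
    if expected_vectors < 1_000_000:
        return "lite"
    if expected_vectors < 10_000_000:
        return "standalone"
    return "cluster"


# A 100k-vector workload does not need a cluster.
print(suggest_deployment_mode(100_000))
```

Treat the result as a starting point; measured latency and memory use should drive the final choice.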
Fix 2: MilvusClient vs Collection API
Modern API (MilvusClient, recommended):
from pymilvus import MilvusClient
client = MilvusClient(uri="http://localhost:19530")
# Create collection with schema
client.create_collection(
collection_name="articles",
dimension=1536,
metric_type="COSINE",
primary_field_name="id",
vector_field_name="embedding",
)
# Insert
client.insert(
collection_name="articles",
data=[
{"id": 1, "embedding": [0.1, 0.2, ...], "title": "Article 1"},
{"id": 2, "embedding": [0.3, 0.4, ...], "title": "Article 2"},
],
)
# Search
results = client.search(
collection_name="articles",
data=[query_embedding],
limit=10,
output_fields=["title"],
)
Legacy API (Collection):
from pymilvus import (
connections, Collection, FieldSchema, CollectionSchema, DataType,
)
connections.connect(host="localhost", port="19530")
# Define schema
fields = [
FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=1536),
FieldSchema(name="title", dtype=DataType.VARCHAR, max_length=200),
]
schema = CollectionSchema(fields=fields)
collection = Collection(name="articles", schema=schema)
# Build index
collection.create_index(
field_name="embedding",
index_params={"metric_type": "COSINE", "index_type": "HNSW", "params": {"M": 16, "efConstruction": 200}},
)
# Load to memory
collection.load()
# Insert
collection.insert([[1, 2], [vec1, vec2], ["a", "b"]])
# Search
results = collection.search(
[query_vec], "embedding", {"metric_type": "COSINE"}, limit=10,
)
Use MilvusClient for new code. The legacy API still works, but the modern client is simpler and talks to the same backend. Most online tutorials use Collection; translating them to MilvusClient gives cleaner code.
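One practical difference when translating: the legacy insert above passes one list per field (column-oriented), while MilvusClient.insert takes one dict per row. A minimal sketch of that conversion, where `columns_to_rows` is a hypothetical helper and not part of pymilvus:

```python
from typing import Any, Dict, List, Sequence


def columns_to_rows(
    field_names: Sequence[str],
    columns: Sequence[Sequence[Any]],
) -> List[Dict[str, Any]]:
    """Turn column-oriented insert data into a list of row dicts.

    Hypothetical helper for migrating Collection-style inserts to the
    row-dict format used in the MilvusClient examples.
    """
    if len(field_names) != len(columns):
        raise ValueError("need exactly one column per field name")
    if len({len(col) for col in columns}) > 1:
        raise ValueError("all columns must have the same length")
    return [dict(zip(field_names, values)) for values in zip(*columns)]


rows = columns_to_rows(
    ["id", "embedding", "title"],
    [[1, 2], [[0.1, 0.2], [0.3, 0.4]], ["a", "b"]],
)
print(rows[0])
```

The resulting list can be passed as the data argument of client.insert.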
Fix 3: Schema Setup and Field Types
from pymilvus import MilvusClient, DataType
client = MilvusClient(uri="http://localhost:19530")
# Explicit schema with multiple fields
schema = client.create_schema(auto_id=True, enable_dynamic_field=True)
schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="embedding", datatype=DataType.FLOAT_VECTOR, dim=1536)
schema.add_field(field_name="title", datatype=DataType.VARCHAR, max_length=500)
schema.add_field(field_name="published_at", datatype=DataType.INT64) # Unix timestamp
schema.add_field(field_name="tags", datatype=DataType.ARRAY, element_type=DataType.VARCHAR, max_capacity=10, max_length=50)
# Prepare index parameters
index_params = client.prepare_index_params()
index_params.add_index(
field_name="embedding",
index_type="HNSW",
metric_type="COSINE",
params={"M": 16, "efConstruction": 200},
)
client.create_collection(
collection_name="articles",
schema=schema,
index_params=index_params,
)
Data types:
| DataType | Use for |
|---|---|
| INT8, INT16, INT32, INT64 | Integer fields, primary key |
| FLOAT, DOUBLE | Numeric fields |
| BOOL | Boolean |
| VARCHAR | Strings (specify max_length) |
| JSON | Arbitrary JSON values |
| ARRAY | Lists of primitives (specify element_type and max_capacity) |
| FLOAT_VECTOR | Dense vectors (specify dim) |
| BINARY_VECTOR | Binary vectors (dimension in bits) |
| SPARSE_FLOAT_VECTOR | Sparse vectors (for hybrid search) |
Dynamic fields — let you add arbitrary fields per row without schema changes:
schema = client.create_schema(auto_id=True, enable_dynamic_field=True)
# Now insert can include any fields
client.insert(
collection_name="articles",
data=[
{"embedding": vec, "title": "...", "any_extra_field": "value"},
],
)
Common Mistake: Forgetting max_length on VARCHAR fields. Milvus requires an explicit length limit; without one, schema creation fails with a confusing error about “field has no max length”. Always set max_length=N for VARCHAR, and pick generously since the cost is small.
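The "field dimension mismatch" error from the start of this guide can be caught before any network call by checking vector lengths against the schema's dim. A minimal sketch, assuming rows are dicts as in the MilvusClient examples; `check_vector_dims` is a hypothetical helper:

```python
from typing import Any, Dict, List, Tuple


def check_vector_dims(
    rows: List[Dict[str, Any]],
    vector_field: str,
    expected_dim: int,
) -> List[Tuple[int, int]]:
    """Return (row_index, actual_dim) for every row whose vector is the wrong size.

    Hypothetical pre-insert check; a missing vector is reported as dimension 0.
    """
    problems = []
    for index, row in enumerate(rows):
        vector = row.get(vector_field)
        actual = len(vector) if vector is not None else 0
        if actual != expected_dim:
            problems.append((index, actual))
    return problems


sample_rows = [
    {"embedding": [0.0] * 1536, "title": "ok"},
    {"embedding": [0.0] * 768, "title": "wrong embedding model"},
]
print(check_vector_dims(sample_rows, "embedding", 1536))
```

A 768-wide vector in a 1536-wide field usually means the collection was created for one embedding model and the data was produced by another.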
Fix 4: Index Building (Required Before Search)
MilvusException: (code=15, message=index not exist)
Milvus requires an index on the vector field before you can search. Unlike Chroma/Qdrant, where indexes are automatic, Milvus needs an explicit create_index():
# After creating the collection
index_params = client.prepare_index_params()
index_params.add_index(
field_name="embedding",
index_type="HNSW", # Or IVF_FLAT, IVF_SQ8, IVF_PQ, FLAT, AUTOINDEX
metric_type="COSINE",
params={"M": 16, "efConstruction": 200},
)
client.create_index(
collection_name="articles",
index_params=index_params,
)
Index types:
| Index | Best for |
|---|---|
| FLAT | Exact search, < 10k vectors |
| IVF_FLAT | Medium datasets (10k–1M) |
| IVF_SQ8 | Like IVF_FLAT but 4x less memory (slight accuracy loss) |
| IVF_PQ | Large datasets (>1M) where memory matters |
| HNSW | Best speed/accuracy tradeoff, default for many workloads |
| AUTOINDEX | Let Milvus pick based on data size (Milvus 2.4+) |
| DISKANN | Datasets too large for RAM (disk-based) |
| GPU_IVF_FLAT, GPU_IVF_PQ | GPU-accelerated (Milvus GPU build) |
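The "4x less memory" figure for IVF_SQ8 follows from storing each component in 1 byte instead of a 4-byte float32. A back-of-the-envelope sketch, where `raw_vector_bytes` is a hypothetical helper that counts only the vector payload and ignores index overhead such as centroids or graph links:

```python
def raw_vector_bytes(num_vectors: int, dim: int, bytes_per_component: int) -> int:
    """Size of the raw vector payload only (no index structures, no metadata)."""
    return num_vectors * dim * bytes_per_component


# One million 1536-dimensional vectors.
float32_bytes = raw_vector_bytes(1_000_000, 1536, 4)  # float32 components
sq8_bytes = raw_vector_bytes(1_000_000, 1536, 1)      # 8-bit quantized components

print(f"float32: {float32_bytes / 1e9:.2f} GB, SQ8: {sq8_bytes / 1e9:.2f} GB")
```

Real memory use will be higher than either figure, so use this only to compare index types against each other.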
Load collection to memory before searching:
client.load_collection(collection_name="articles")
# Vectors and index now in memory; search works
# When done, free memory
client.release_collection(collection_name="articles")
Pro Tip: Use AUTOINDEX for the simplest path; Milvus picks a sensible index type and parameters based on your data size. Only specify HNSW/IVF/PQ explicitly when you’ve benchmarked and know a specific choice wins. AUTOINDEX defaults are tuned by the Milvus team based on extensive testing.
Fix 5: Flush Before Search (or Wait)
client.insert(collection_name="articles", data=[...])
results = client.search(collection_name="articles", data=[query_vec], limit=10)
# Empty results: the new rows are not yet visible to this search
Milvus buffers inserts in growing segments, and a search that runs with a relaxed consistency level can return before the newest rows are visible. To search immediately after insert:
client.insert(collection_name="articles", data=[...])
client.flush(collection_name="articles") # Force seal of current growing segment
client.search(...) # Now sees the inserts
Or use consistency_level:
results = client.search(
collection_name="articles",
data=[query_vec],
limit=10,
consistency_level="Strong", # Wait for all data to be searchable
)
Consistency levels:
| Level | Behavior |
|---|---|
| Strong | Always search latest data (slowest) |
| Bounded | Search data up to N seconds old (default) |
| Session | See your own writes |
| Eventually | Search whatever’s available (fastest, may miss recent inserts) |
Common Mistake: Calling flush() after every insert. Each flush triggers segment compaction — frequent flushing kills throughput. For bulk loads, insert thousands of rows, then flush once. For real-time apps, use consistency_level="Session" so your queries see your writes without triggering full flushes.
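The bulk-load advice (insert in batches, flush once) can be sketched as a small wrapper. `batched` and `bulk_insert` are hypothetical helpers; the client is assumed to expose insert(collection_name=..., data=...) and flush(collection_name=...) as used in the examples above, and the batch size of 1000 is an arbitrary illustration.

```python
from typing import Any, Dict, Iterator, List


def batched(rows: List[Dict[str, Any]], batch_size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of rows with at most batch_size items each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def bulk_insert(client: Any, collection_name: str,
                rows: List[Dict[str, Any]], batch_size: int = 1000) -> int:
    """Insert rows in batches, then flush a single time at the end.

    Hypothetical helper; returns the number of insert calls made.
    """
    insert_calls = 0
    for batch in batched(rows, batch_size):
        client.insert(collection_name=collection_name, data=batch)
        insert_calls += 1
    if insert_calls:
        # One flush for the whole load instead of one per insert.
        client.flush(collection_name=collection_name)
    return insert_calls
```

With 2,500 rows and the default batch size this makes three insert calls and one flush.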
Fix 6: Searching with Filters
results = client.search(
collection_name="articles",
data=[query_vec],
limit=10,
output_fields=["title", "published_at"],
filter='published_at > 1700000000 and array_contains(tags, "ml")',
search_params={"params": {"ef": 50}}, # HNSW search-time parameter
)
Filter syntax uses boolean expressions:
# Comparison
filter='age > 18'
filter='name == "Alice"'
filter='status != "deleted"'
# Logical
filter='age > 18 and status == "active"'
filter='age < 13 or age > 65'
# IN
filter='category in ["news", "blog"]'
# String functions
filter='title like "Intro to %"'
# Array contains
filter='array_contains(tags, "machine-learning")'
filter='array_contains_any(tags, ["ai", "ml"])'
# JSON field access
filter='metadata["author_id"] == 123'
Filter performance: Milvus uses scalar indexes if you create them:
# Build the params object first; add_index() modifies it in place
scalar_index_params = client.prepare_index_params()
scalar_index_params.add_index(
field_name="published_at",
index_type="STL_SORT", # Sorted index for range queries
)
client.create_index(
collection_name="articles",
index_params=scalar_index_params,
)
Without scalar indexes, filtered queries scan all rows, which is slow for selective filters on large collections.
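Filter strings built by hand are easy to get wrong once values contain quotes. A minimal sketch of two builders, assuming JSON-style quoting is acceptable for string literals in Milvus filter expressions; `in_filter` and `and_all` are hypothetical helpers, not pymilvus functions:

```python
import json
from typing import Any, Iterable


def in_filter(field: str, values: Iterable[Any]) -> str:
    """Build an IN clause such as: category in ["news", "blog"].

    Hypothetical helper; json.dumps handles quoting and escaping of strings.
    """
    return f"{field} in {json.dumps(list(values), ensure_ascii=False)}"


def and_all(*clauses: str) -> str:
    """Join non-empty clauses with 'and', wrapping each one in parentheses."""
    return " and ".join(f"({clause})" for clause in clauses if clause)


expr = and_all("published_at > 1700000000", in_filter("category", ["news", "blog"]))
print(expr)
```

The resulting string is what you would pass as the filter argument of client.search.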
Fix 7: Partitions for Multi-Tenancy
# Create partitions
client.create_partition(
collection_name="articles",
partition_name="2024",
)
client.create_partition(
collection_name="articles",
partition_name="2025",
)
# Insert into specific partition
client.insert(
collection_name="articles",
partition_name="2025",
data=[...],
)
# Search in specific partition (much faster than searching all)
results = client.search(
collection_name="articles",
partition_names=["2025"],
data=[query_vec],
limit=10,
)
Use partitions for time-based or tenant-based data separation. Partition queries skip unrelated data entirely, a major speedup compared with scanning everything and filtering afterwards.
Common Mistake: Creating thousands of partitions (e.g., one per user). Milvus has overhead per partition and supports a few hundred efficiently. For high-cardinality tenancy, use scalar field filters instead. Partitions work best for low-cardinality coarse groupings (year, region, environment).
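For the year-based partitions above, rows have to be routed to the right partition before insert. A minimal sketch, assuming each row carries a Unix timestamp in a published_at field as in the schema example and that years are taken in UTC; `partition_for` and `group_by_partition` are hypothetical helpers:

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Any, Dict, List


def partition_for(unix_ts: int) -> str:
    """Return the partition name (the UTC year) for a Unix timestamp."""
    return str(datetime.fromtimestamp(unix_ts, tz=timezone.utc).year)


def group_by_partition(rows: List[Dict[str, Any]],
                       ts_field: str = "published_at") -> Dict[str, List[Dict[str, Any]]]:
    """Group rows by the partition their timestamp belongs to."""
    groups: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        groups[partition_for(row[ts_field])].append(row)
    return dict(groups)


sample = [
    {"id": 1, "published_at": 1704067200},  # 2024-01-01 00:00:00 UTC
    {"id": 2, "published_at": 1735689600},  # 2025-01-01 00:00:00 UTC
]
print(sorted(group_by_partition(sample)))
```

Each group can then be sent with one client.insert call that sets partition_name to the group's key; the partition must already exist.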
For comparing Milvus partitions to Weaviate’s multi-tenancy or Qdrant’s collections, see Weaviate not working and Qdrant not working.
Fix 8: Hybrid Search (Sparse + Dense)
Milvus 2.4+ supports hybrid search combining dense and sparse vectors:
from pymilvus import MilvusClient, DataType, AnnSearchRequest, RRFRanker
client = MilvusClient(uri="http://localhost:19530")
# Schema with both vector types
schema = client.create_schema()
schema.add_field("id", DataType.INT64, is_primary=True)
schema.add_field("dense_vec", DataType.FLOAT_VECTOR, dim=1536)
schema.add_field("sparse_vec", DataType.SPARSE_FLOAT_VECTOR)
schema.add_field("text", DataType.VARCHAR, max_length=1000)
# Index both
index_params = client.prepare_index_params()
index_params.add_index("dense_vec", index_type="HNSW", metric_type="COSINE", params={"M": 16})
index_params.add_index("sparse_vec", index_type="SPARSE_INVERTED_INDEX", metric_type="IP")
client.create_collection(collection_name="hybrid_docs", schema=schema, index_params=index_params)
# Insert with both vector types
client.insert(
collection_name="hybrid_docs",
data=[
{
"id": 1,
"dense_vec": dense_embedding, # From embedding model
"sparse_vec": {0: 0.5, 42: 0.3, 100: 0.8}, # From BM25/SPLADE
"text": "Article content...",
},
],
)
# Hybrid search
dense_req = AnnSearchRequest(
data=[query_dense],
anns_field="dense_vec",
param={"params": {"ef": 50}},
limit=10,
)
sparse_req = AnnSearchRequest(
data=[query_sparse],
anns_field="sparse_vec",
param={},
limit=10,
)
results = client.hybrid_search(
collection_name="hybrid_docs",
reqs=[dense_req, sparse_req],
ranker=RRFRanker(k=60), # Reciprocal Rank Fusion
limit=5,
)
Ranker options:
- RRFRanker(k=60): Reciprocal Rank Fusion; merges results by rank position, with k as a smoothing constant
- WeightedRanker(0.7, 0.3): weighted combination of scores (dense 0.7, sparse 0.3)
For comparing Milvus hybrid search to alternatives, see Pinecone not working (Pinecone also supports sparse-dense hybrid).
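To see what Reciprocal Rank Fusion does with the two result lists, here is the textbook formula in plain Python: each document scores the sum of 1 / (k + rank) over the lists it appears in. This is a sketch of the general technique, not Milvus's internal code; `rrf_fuse` is a hypothetical name, and starting ranks at 1 is an assumption.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


def rrf_fuse(ranked_lists: Sequence[Sequence[Hashable]],
             k: int = 60,
             limit: Optional[int] = None) -> List[Tuple[Hashable, float]]:
    """Fuse ranked ID lists with Reciprocal Rank Fusion.

    Textbook RRF, assuming ranks start at 1; Milvus's RRFRanker may differ
    in details such as rank offset or tie handling.
    """
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by ID so the output is deterministic.
    fused = sorted(scores.items(), key=lambda item: (-item[1], str(item[0])))
    return fused if limit is None else fused[:limit]


dense_hits = [1, 2, 3]   # IDs ordered by dense similarity
sparse_hits = [3, 1, 4]  # IDs ordered by sparse similarity
print([doc_id for doc_id, _ in rrf_fuse([dense_hits, sparse_hits])])
```

Documents that rank well in both lists rise to the top, even though the raw dense and sparse scores are on different scales and never compared directly.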
Still Not Working?
Milvus vs Other Vector DBs
- Milvus — Production-scale, multiple index types, GPU support, complex deployment. Best for billion-vector workloads.
- Chroma — Simplest, embedded. See ChromaDB not working.
- Qdrant — Self-hosted with rich filters. See Qdrant not working.
- Weaviate — Hybrid + GraphQL + generative built-in. See Weaviate not working.
- Pinecone — Managed, zero ops. See Pinecone not working.
Milvus is the right choice when you’ve outgrown Chroma/Qdrant and need to scale to billions of vectors. For smaller workloads, the deployment complexity isn’t worth it.
Connection Timeouts in Docker
MilvusException: (code=2, message=Fail connecting to server)
If you run Milvus in Docker and connect from outside the container, publish the ports:
docker run -d --name milvus-standalone \
-p 19530:19530 -p 9091:9091 \
milvusdb/milvus:latest milvus run standalone
Then in Python:
client = MilvusClient(uri="http://localhost:19530")
If Milvus is on a remote host, ensure the firewall allows port 19530 (gRPC) and optionally 9091 (HTTP).
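Before debugging pymilvus itself, it helps to confirm that the port is reachable at all. A minimal sketch using only the standard library; `port_open` is a hypothetical helper, and a successful TCP connection only shows that something is listening, not that Milvus has finished starting.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical diagnostic helper; it does not speak the Milvus protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

If port_open("localhost", 19530) returns False, check the container's port mapping and the firewall before looking at client code.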
Authentication
Milvus 2.3+ supports user authentication:
client = MilvusClient(
uri="http://localhost:19530",
token="root:Milvus", # username:password
)
For Zilliz Cloud (managed Milvus):
client = MilvusClient(
uri="https://your-cluster.api.gcp-us-west1.zillizcloud.com",
token="your-api-key",
)
Embedding Models
pymilvus integrates with embedding providers through its optional model package (pip install "pymilvus[model]"):
from pymilvus.model.dense import OpenAIEmbeddingFunction
ef = OpenAIEmbeddingFunction(model_name="text-embedding-3-small", api_key="sk-...")
embeddings = ef.encode_documents(["text 1", "text 2"])
client.insert(
collection_name="articles",
data=[
{"id": 1, "embedding": embeddings[0], "text": "text 1"},
],
)
For OpenAI API patterns that pair with Milvus, see OpenAI API not working. For HuggingFace alternatives, see HuggingFace Transformers not working.
LangChain and LlamaIndex Integration
# LangChain
from langchain_milvus import Milvus
from langchain_openai import OpenAIEmbeddings
vector_store = Milvus(
embedding_function=OpenAIEmbeddings(),
collection_name="articles",
connection_args={"uri": "http://localhost:19530"},
)
vector_store.add_texts(["doc 1", "doc 2"])
results = vector_store.similarity_search("query", k=5)
For LangChain integration patterns, see LangChain Python not working.
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.