Fix: Weaviate Not Working — Client v4 Migration, Schema Setup, and Vectorizer Errors

FixDevs

Quick Answer

How to fix common Weaviate errors: imports broken by the v3 to v4 client migration, property type mismatches during schema creation, vectorizer modules that are not loaded, connection refused on localhost:8080, silent batch import failures, and hybrid search alpha tuning.

The Error

You install the Python client and tutorials don’t work:

import weaviate
client = weaviate.Client("http://localhost:8080")
# AttributeError: module 'weaviate' has no attribute 'Client'

Or the schema rejects your collection:

weaviate.exceptions.UnexpectedStatusCodeError:
Collection creation failed: data type 'string' is not supported, use 'text'

Or the vectorizer module isn’t available:

no module 'text2vec-openai' configured on cluster

Or batch imports fail silently and you only notice missing data later:

with client.batch as batch:
    for doc in docs:
        batch.add_data_object(...)
# Some succeed, some don't — no exception raised

Or hybrid search alpha tuning produces unexpected results:

client.query.get("Article", ["title", "content"]) \
    .with_hybrid(query="ML", alpha=0.5) \
    .do()
# Returns only keyword matches, no semantic matches

Weaviate is a vector database with built-in hybrid search: it combines vector similarity with keyword search and exposes GraphQL queries. The v4 Python client (released late 2023) introduced a completely new API surface built on type-safe collections, which breaks every v3 tutorial. The module system (text2vec-openai, text2vec-cohere, text2vec-transformers) requires explicit configuration on the cluster side. This guide covers each of these problems in turn.

Why This Happens

Weaviate’s v4 client redesigned the API around typed Collections instead of dictionaries — much cleaner but completely incompatible with v3. Most tutorials online predate this and use weaviate.Client(url) which doesn’t exist in v4.
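
A quick way to tell which API surface you have is to check the installed client's major version. The sketch below is only an illustration: the helper names are hypothetical, and it assumes the client is installed under the PyPI name weaviate-client.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def major_version(version_string: str) -> int:
    """Return the leading integer of a version string such as '4.9.3'."""
    return int(version_string.split(".")[0])


def installed_client_major(package: str = "weaviate-client") -> Optional[int]:
    """Major version of the installed package, or None if it is not installed."""
    try:
        return major_version(version(package))
    except PackageNotFoundError:
        return None


major = installed_client_major()
if major is None:
    print("weaviate-client is not installed")
elif major >= 4:
    print("v4 client: use weaviate.connect_to_local(), not weaviate.Client(url)")
else:
    print("v3 client: tutorials that call weaviate.Client(url) still apply")
```

If this reports v4 and your tutorial calls weaviate.Client(url), the tutorial is the mismatch, not your install.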

Vectorizer modules run on the Weaviate server — when you write a document, the server calls the module to compute the embedding. If the module isn’t enabled (via env vars at startup), schema creation fails or documents get no vectors.

Fix 1: v3 to v4 Client Migration

# OLD — v3 (broken in v4)
import weaviate

client = weaviate.Client("http://localhost:8080")
client.schema.create_class({...})

# NEW — v4
import weaviate

client = weaviate.connect_to_local()
# Or for cloud
client = weaviate.connect_to_wcs(
    cluster_url="https://...weaviate.network",
    auth_credentials=weaviate.auth.AuthApiKey("..."),
)

# Always close
client.close()

Context manager (recommended):

import weaviate

with weaviate.connect_to_local() as client:
    # Operations here
    print(client.is_ready())

v3 → v4 API changes:

| v3 | v4 |
| --- | --- |
| weaviate.Client(url) | weaviate.connect_to_local() / connect_to_wcs() |
| client.schema.create_class({...}) | client.collections.create(...) |
| client.data_object.create({...}, "Article") | articles = client.collections.get("Article"); articles.data.insert({...}) |
| client.query.get("Article", ["title"]).do() | articles.query.fetch_objects() |
| Dict-based responses | Typed objects with attributes |

Connection helpers:

# Local Weaviate (Docker on localhost:8080)
client = weaviate.connect_to_local()

# Custom local config
client = weaviate.connect_to_local(
    host="localhost",
    port=8080,
    grpc_port=50051,
    headers={"X-OpenAI-Api-Key": "sk-..."},   # For OpenAI vectorizer
)

# Weaviate Cloud Services (newer v4 releases deprecate connect_to_wcs
# in favor of connect_to_weaviate_cloud)
client = weaviate.connect_to_wcs(
    cluster_url="https://your-cluster.weaviate.network",
    auth_credentials=weaviate.auth.AuthApiKey("your-api-key"),
    headers={"X-OpenAI-Api-Key": "sk-..."},
)

# Custom URL with auth
client = weaviate.connect_to_custom(
    http_host="weaviate.example.com",
    http_port=443,
    http_secure=True,
    grpc_host="weaviate.example.com",
    grpc_port=443,
    grpc_secure=True,
    auth_credentials=weaviate.auth.AuthApiKey("..."),
)

Common Mistake: Following a tutorial that uses weaviate.Client(url) — that API doesn’t exist in v4. The error (module has no attribute 'Client') is clear, but new users assume v4 is broken. Always check the tutorial date — anything before late 2023 uses v3.

Fix 2: Creating Collections (Schemas)

import weaviate
import weaviate.classes.config as wvc

with weaviate.connect_to_local() as client:
    client.collections.create(
        name="Article",
        properties=[
            wvc.Property(name="title", data_type=wvc.DataType.TEXT),
            wvc.Property(name="content", data_type=wvc.DataType.TEXT),
            wvc.Property(name="published_at", data_type=wvc.DataType.DATE),
            wvc.Property(name="author_id", data_type=wvc.DataType.INT),
            wvc.Property(name="tags", data_type=wvc.DataType.TEXT_ARRAY),
        ],
        vectorizer_config=wvc.Configure.Vectorizer.text2vec_openai(
            model="text-embedding-3-small",
        ),
        generative_config=wvc.Configure.Generative.openai(
            model="gpt-4o-mini",
        ),
    )

Data types:

| v4 DataType | Use for |
| --- | --- |
| TEXT | Strings (full-text + vector indexed) |
| TEXT_ARRAY | List of strings |
| INT | Integers |
| NUMBER | Floats |
| BOOL | Booleans |
| DATE | RFC 3339 datetime |
| UUID | UUID strings |
| GEO_COORDINATES | Geo lat/lng |
| BLOB | Base64-encoded binary |
| OBJECT | Nested object |
| OBJECT_ARRAY | List of nested objects |
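
DATE values are the type most often rejected, because a bare date or a timestamp without an offset is not RFC 3339. A minimal sketch of a formatter follows. The helper name is hypothetical, and treating naive datetimes as UTC is an assumption you may want to change.

```python
from datetime import datetime, timezone


def to_rfc3339(value: datetime) -> str:
    """Format a datetime as an RFC 3339 string for a DATE property.

    Assumption: naive datetimes (no tzinfo) are treated as UTC.
    """
    if value.tzinfo is None:
        value = value.replace(tzinfo=timezone.utc)
    return value.isoformat()


published = to_rfc3339(datetime(2025, 4, 24, 10, 0, 0))
print(published)  # 2025-04-24T10:00:00+00:00
```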

Common error — wrong data type:

data type 'string' is not supported, use 'text'

Weaviate deprecated the string data type in v1.19 in favor of text. The v4 client uses the new names; manual REST API calls or v3 examples that still send string can fail with this error.
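
If you are porting v3-style schema dictionaries, the rename can be applied mechanically before you translate them to v4 Property objects. This is a sketch under assumptions: the helper and mapping names are hypothetical, and it only rewrites the data type name, leaving every other setting untouched.

```python
# Legacy data type names and the replacements named in the error message.
LEGACY_DATA_TYPES = {"string": "text", "string[]": "text[]"}


def modernize_properties(properties):
    """Return a copy of v3-style property dicts with legacy data types replaced."""
    updated = []
    for prop in properties:
        data_types = [LEGACY_DATA_TYPES.get(t, t) for t in prop.get("dataType", [])]
        updated.append({**prop, "dataType": data_types})
    return updated


old_schema = [
    {"name": "title", "dataType": ["string"]},
    {"name": "tags", "dataType": ["string[]"]},
    {"name": "author_id", "dataType": ["int"]},
]
print(modernize_properties(old_schema))
```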

Vectorizer module options:

# OpenAI embeddings (requires X-OpenAI-Api-Key header)
vectorizer_config=wvc.Configure.Vectorizer.text2vec_openai(
    model="text-embedding-3-small",   # or text-embedding-3-large
)

# Cohere
vectorizer_config=wvc.Configure.Vectorizer.text2vec_cohere(
    model="embed-english-v3.0",
)

# Hugging Face hosted inference API (not local; requires X-HuggingFace-Api-Key header)
vectorizer_config=wvc.Configure.Vectorizer.text2vec_huggingface(
    model="sentence-transformers/all-MiniLM-L6-v2",
)

# Local Weaviate transformers container
vectorizer_config=wvc.Configure.Vectorizer.text2vec_transformers()

# No vectorizer — provide your own vectors at insert time
vectorizer_config=wvc.Configure.Vectorizer.none()

Fix 3: Enabling Modules on the Server

no module 'text2vec-openai' configured on cluster

Modules must be enabled when starting Weaviate.

Docker Compose example:

# docker-compose.yml
services:
  weaviate:
    image: cr.weaviate.io/semitechnologies/weaviate:latest
    ports:
      - "8080:8080"
      - "50051:50051"
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: "true"
      PERSISTENCE_DATA_PATH: "/var/lib/weaviate"
      DEFAULT_VECTORIZER_MODULE: "text2vec-openai"
      ENABLE_MODULES: "text2vec-openai,text2vec-cohere,text2vec-huggingface,generative-openai,qna-openai,ref2vec-centroid"
      CLUSTER_HOSTNAME: "node1"
    volumes:
      - ./weaviate_data:/var/lib/weaviate

Pass API keys as client headers when you connect (recommended over baking them into the cluster):

client = weaviate.connect_to_local(
    headers={
        "X-OpenAI-Api-Key": "sk-...",
        "X-Cohere-Api-Key": "...",
    },
)

The cluster forwards these headers to the vectorizer module. Multiple users on the same cluster can use their own API keys.
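
To keep keys out of source code, the headers dictionary can be built from environment variables. A minimal sketch: the header names are the ones used above, while the environment variable names and the helper are assumptions for illustration.

```python
import os

# Header names come from the examples above; the environment variable
# names on the right are assumptions, rename them to match your setup.
HEADER_ENV_VARS = {
    "X-OpenAI-Api-Key": "OPENAI_API_KEY",
    "X-Cohere-Api-Key": "COHERE_API_KEY",
}


def build_headers(env=None):
    """Return only the headers whose environment variable is set and non-empty."""
    env = os.environ if env is None else env
    return {
        header: env[var]
        for header, var in HEADER_ENV_VARS.items()
        if env.get(var)
    }


headers = build_headers()
print(sorted(headers))  # names of the headers that will be sent
# Then: client = weaviate.connect_to_local(headers=headers)
```

Skipping unset variables avoids sending an empty key, which would otherwise surface later as an authentication error from the provider.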

Verify modules:

meta = client.get_meta()
print(meta["modules"])
# {'text2vec-openai': {...}, 'generative-openai': {...}, ...}
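
The same metadata can drive a startup check that fails fast when a required module is missing. A small sketch, assuming the dictionary shape shown above; the helper name and sample values are hypothetical.

```python
def missing_modules(meta, required):
    """Return the required module names that the server did not report as enabled."""
    enabled = meta.get("modules") or {}
    return [name for name in required if name not in enabled]


# Shape mirrors the get_meta() output shown above; the values are placeholders.
sample_meta = {"modules": {"text2vec-openai": {}, "generative-openai": {}}}
print(missing_modules(sample_meta, ["text2vec-openai", "text2vec-cohere"]))
# ['text2vec-cohere']
```

In a real script you would pass client.get_meta() instead of sample_meta and stop before creating any collection if the list is not empty.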

Pro Tip: Enable more modules than you currently use — they’re inert until referenced. Enabling text2vec-openai, text2vec-cohere, text2vec-huggingface, generative-openai, and generative-cohere covers most use cases. Adding modules later requires a server restart.

Fix 4: Inserting and Batch Imports

articles = client.collections.get("Article")

# Single insert
articles.data.insert({
    "title": "My Article",
    "content": "Full text...",
    "published_at": "2025-04-24T10:00:00Z",
})

# Batch insert (efficient for many items)
with articles.batch.dynamic() as batch:
    for doc in documents:
        batch.add_object(
            properties={
                "title": doc["title"],
                "content": doc["content"],
            },
        )

# Check for batch errors
if len(articles.batch.failed_objects) > 0:
    for failed in articles.batch.failed_objects:
        print(f"Failed: {failed.message}")

Common Mistake: Not checking failed_objects after a batch import. Weaviate’s batch API queues operations and continues on failures — you’d never know rows were dropped without checking. Always inspect failed_objects and failed_references after a batch:

with articles.batch.dynamic() as batch:
    for doc in documents:
        batch.add_object(properties=doc)
    # Batch flushes on context exit

print(f"Inserted: {len(documents) - len(articles.batch.failed_objects)}")
print(f"Failed: {len(articles.batch.failed_objects)}")
for failed in articles.batch.failed_objects[:5]:   # Show first 5 errors
    print(f"  {failed.original_uuid}: {failed.message}")
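
With thousands of failures, printing the first five is less useful than grouping them by error message. The sketch below assumes only that each failed entry exposes a message attribute, as in the loop above; the helper name and sample data are hypothetical.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_failures(failed_objects, top=3):
    """Count failures per error message, most common first."""
    counts = Counter(failed.message for failed in failed_objects)
    return counts.most_common(top)


# Stand-ins for entries in articles.batch.failed_objects (only .message is used).
sample = [
    SimpleNamespace(message="invalid date"),
    SimpleNamespace(message="vectorizer: rate limit"),
    SimpleNamespace(message="invalid date"),
]
print(summarize_failures(sample))
# [('invalid date', 2), ('vectorizer: rate limit', 1)]
```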

Batch with custom vectors (skip the vectorizer):

import numpy as np

with articles.batch.dynamic() as batch:
    for doc, vec in zip(documents, embeddings):
        batch.add_object(
            properties=doc,
            vector=vec.tolist(),   # Use your pre-computed embedding
        )

Fixed-size batch:

with articles.batch.fixed_size(batch_size=200, concurrent_requests=2) as batch:
    for doc in documents:
        batch.add_object(properties=doc)

fixed_size flushes when the batch hits batch_size; dynamic adapts based on server response time.

Fix 5: Querying — Vector, Keyword, and Hybrid

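import weaviate.classes.query as wq

articles = client.collections.get("Article")

# Vector search (semantic)
response = articles.query.near_text(
    query="machine learning",
    limit=10,
    return_metadata=wq.MetadataQuery(distance=True),   # distance is None unless requested
)
for obj in response.objects:
    print(obj.properties["title"], obj.metadata.distance)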

# Keyword search (BM25)
response = articles.query.bm25(
    query="machine learning",
    limit=10,
)

# Hybrid search (combines both)
response = articles.query.hybrid(
    query="machine learning",
    alpha=0.75,   # 0 = pure keyword, 1 = pure vector
    limit=10,
)

alpha parameter for hybrid:

| Alpha | Behavior |
| --- | --- |
| 0.0 | Pure BM25 keyword search |
| 0.25 | Mostly keyword, some vector |
| 0.5 | Balanced |
| 0.75 | Mostly vector, some keyword (Weaviate's documented default) |
| 1.0 | Pure vector search |

Common Mistake: Setting alpha=0.5 and being surprised by what dominates. The two scores have different distributions — BM25 scores are unbounded, vector scores are typically [0, 1]. Weaviate normalizes them, but the exact balance varies by query. Try alpha=0.7 or alpha=0.3 and test on real queries; the “right” alpha is workload-specific.
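
To build intuition for what alpha does, here is a simplified blend in plain Python. It is loosely modeled on the idea of normalizing each score set and then weighting them; it is not Weaviate's implementation, whose fusion algorithms differ in detail. The function names and scores are hypothetical.

```python
def min_max_normalize(scores):
    """Scale a {doc_id: score} mapping into [0, 1]."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - low) / (high - low) for doc_id, s in scores.items()}


def hybrid_scores(keyword_scores, vector_scores, alpha):
    """Blend normalized scores: alpha weights vector, (1 - alpha) weights keyword."""
    keyword = min_max_normalize(keyword_scores)
    vector = min_max_normalize(vector_scores)
    doc_ids = set(keyword) | set(vector)
    return {
        doc_id: alpha * vector.get(doc_id, 0.0) + (1 - alpha) * keyword.get(doc_id, 0.0)
        for doc_id in doc_ids
    }


# Hypothetical scores: "a" wins on keywords, "b" wins on vector similarity.
bm25 = {"a": 12.0, "b": 3.0}
similarity = {"a": 0.2, "b": 0.9}
for alpha in (0.0, 0.25, 0.75, 1.0):
    print(alpha, sorted(hybrid_scores(bm25, similarity, alpha).items()))
```

Even in this toy version the winner flips between alpha=0.25 and alpha=0.75, which is why testing a few values on real queries tells you more than reasoning about a single setting.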

Filter results:

import weaviate.classes.query as wq

response = articles.query.hybrid(
    query="machine learning",
    filters=wq.Filter.by_property("author_id").equal(123),
    limit=10,
)

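# Compound filter (pass a timezone-aware datetime for DATE properties,
# not a bare date string)
from datetime import datetime, timezone

filters = wq.Filter.all_of([
    wq.Filter.by_property("published_at").greater_than(
        datetime(2024, 1, 1, tzinfo=timezone.utc)
    ),
    wq.Filter.by_property("tags").contains_any(["ml", "ai"]),
])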

response = articles.query.near_text(
    query="...",
    filters=filters,
    limit=10,
)

Return specific properties:

import weaviate.classes.query as wq

response = articles.query.fetch_objects(
    return_properties=["title", "author_id"],
    # distance is only populated by vector searches, so it is not requested here
    return_metadata=wq.MetadataQuery(creation_time=True),
    limit=10,
)

Fix 6: References Between Collections

# Create related collections
client.collections.create(
    name="Author",
    properties=[wvc.Property(name="name", data_type=wvc.DataType.TEXT)],
)

client.collections.create(
    name="Article",
    properties=[
        wvc.Property(name="title", data_type=wvc.DataType.TEXT),
    ],
    references=[
        wvc.ReferenceProperty(name="author", target_collection="Author"),
    ],
)

Insert with reference:

# data.insert() returns the UUID of the new object

author_uuid = client.collections.get("Author").data.insert(
    {"name": "Alice"},
)

article_uuid = client.collections.get("Article").data.insert(
    properties={"title": "ML Intro"},
    references={"author": author_uuid},
)

Query with references:

import weaviate.classes.query as wq

response = articles.query.fetch_objects(
    return_references=wq.QueryReference(
        link_on="author",
        return_properties=["name"],
    ),
    limit=10,
)

for obj in response.objects:
    print(obj.properties["title"])
    for ref in obj.references["author"].objects:
        print(f"  by {ref.properties['name']}")

For comparing Weaviate’s reference model to other vector databases, see Pinecone not working and Qdrant not working.

Fix 7: Generative Search (RAG Built-In)

Weaviate can run generation directly — pass retrieved docs to an LLM as context, return generated text:

import weaviate.classes.generate as wgen

articles = client.collections.get("Article")

response = articles.generate.near_text(
    query="machine learning basics",
    grouped_task="Summarize these articles in 2 paragraphs.",
    limit=5,
)

print(response.generated)   # Single generated text from all 5 docs

Per-result generation:

response = articles.generate.near_text(
    query="how do transformers work",
    single_prompt="Rewrite this title as a question: {title}",
    limit=5,
)

for obj in response.objects:
    print(obj.properties["title"])
    print(f"  → {obj.generated}")   # Generated per object
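
A placeholder that names a property the collection does not have is an easy mistake in single_prompt templates. The pre-flight check below is a sketch under assumptions: the helper is hypothetical and it only recognizes simple {name} placeholders like the one above.

```python
import re


def missing_placeholders(single_prompt, property_names):
    """Return placeholders such as {title} that do not match any known property."""
    placeholders = re.findall(r"\{(\w+)\}", single_prompt)
    known = set(property_names)
    return [name for name in placeholders if name not in known]


properties = ["title", "content"]
print(missing_placeholders("Rewrite this title as a question: {title}", properties))
# []
print(missing_placeholders("Summarize {body} for {title}", properties))
# ['body']
```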

Configure generative module per query:

import weaviate.classes.generate as wgen
import weaviate.classes.config as wvc

# At collection level
client.collections.create(
    name="Article",
    properties=[...],
    generative_config=wvc.Configure.Generative.openai(model="gpt-4o-mini"),
)

# Override at query time (requires X-OpenAI-Api-Key header on the client)
response = articles.generate.near_text(
    query="...",
    grouped_task="Summarize",
    generative_provider=wgen.GenerativeConfig.openai(model="gpt-4o"),
)

For LLM API patterns that interact with Weaviate’s generative module, see OpenAI API not working.

Fix 8: Backup and Migration

# Backup to filesystem (set up BACKUP_FILESYSTEM_PATH in docker env)
client.backup.create(
    backup_id="backup-2025-04-24",
    backend="filesystem",
    include_collections=["Article", "Author"],
    wait_for_completion=True,
)

# Restore
client.backup.restore(
    backup_id="backup-2025-04-24",
    backend="filesystem",
    include_collections=["Article"],
    wait_for_completion=True,
)

S3 backup backend:

# docker-compose.yml additions
environment:
  ENABLE_MODULES: "backup-s3,..."
  BACKUP_S3_BUCKET: "my-weaviate-backups"
  BACKUP_S3_ENDPOINT: "s3.amazonaws.com"
  AWS_ACCESS_KEY_ID: "..."
  AWS_SECRET_ACCESS_KEY: "..."

# Python: create the backup against the S3 backend
client.backup.create(backup_id="weekly-2025-04-24", backend="s3")

Common Mistake: Backing up before a schema change, then restoring after it. A backup includes the schema, so Weaviate restores the OLD schema and your new properties are lost. Always back up AFTER the schema changes you want to preserve.

Still Not Working?

Weaviate vs Other Vector DBs

  • Weaviate — Hybrid search built-in, GraphQL queries, generative search module, references between collections. Best for RAG with structured metadata.
  • Pinecone — Managed SaaS, simpler, no built-in hybrid. See Pinecone not working.
  • Qdrant — Strong filtering, self-hostable. See Qdrant not working.
  • ChromaDB — Simplest for prototypes. See ChromaDB not working.

Weaviate’s hybrid search and generative module make it strong for RAG. Choose Pinecone for zero-ops; Qdrant for self-hosted with rich filters.

Connection Refused Errors

ConnectionRefusedError: [Errno 111] Connection refused

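Weaviate is not running, or its ports are not reachable (the v4 client needs the gRPC port, 50051 by default, as well as HTTP on 8080). Verify: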

docker ps                            # Is the container up?
curl http://localhost:8080/v1/.well-known/ready   # Ready endpoint

If Docker says “exited”, check logs:

docker logs weaviate-1   # use the container name shown by docker ps
# Look for "module not found", "permission denied", or memory errors
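
The same readiness check can be run from Python before you open a client, using only the standard library. This is a sketch: the helper names are hypothetical, and it assumes the ready endpoint shown in the curl command above.

```python
import http.client
import urllib.request


def ready_url(host="localhost", port=8080):
    """Build the URL of the ready endpoint used in the curl check above."""
    return f"http://{host}:{port}/v1/.well-known/ready"


def is_ready(url, timeout=2.0):
    """True if the endpoint answers with HTTP 200, False on any connection error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Covers connection refused, timeouts, HTTP error statuses, bad responses.
        return False


print(is_ready(ready_url()))
```

A False result here means the problem is the server or the network, not the Weaviate client.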

Multi-Tenancy

Weaviate supports tenant isolation — each tenant gets its own vector index, with independent data:

import weaviate.classes.config as wvc
from weaviate.classes.tenants import Tenant

client.collections.create(
    name="Article",
    properties=[wvc.Property(name="title", data_type=wvc.DataType.TEXT)],
    multi_tenancy_config=wvc.Configure.multi_tenancy(enabled=True),
)

articles = client.collections.get("Article")

# Create tenants (Tenant lives in weaviate.classes.tenants, not classes.config)
articles.tenants.create([
    Tenant(name="customer-a"),
    Tenant(name="customer-b"),
])

# Insert into specific tenant
tenant_a = articles.with_tenant("customer-a")
tenant_a.data.insert({"title": "Customer A's article"})

# Query within tenant
response = articles.with_tenant("customer-a").query.fetch_objects()

Multi-tenancy is far more efficient than putting tenant IDs in metadata and filtering — Weaviate stores separate HNSW indexes per tenant, queries are isolated, and inactive tenants can be offloaded to disk to save RAM.

Cross-References vs Embedded Properties

When modeling related data, choose between references (separate collection + link) or embedded objects (nested property):

# Embedded (denormalized — duplicate data, fast queries)
client.collections.create(
    name="Article",
    properties=[
        wvc.Property(name="title", data_type=wvc.DataType.TEXT),
        wvc.Property(
            name="author",
            data_type=wvc.DataType.OBJECT,
            nested_properties=[
                wvc.Property(name="name", data_type=wvc.DataType.TEXT),
                wvc.Property(name="bio", data_type=wvc.DataType.TEXT),
            ],
        ),
    ],
)

# Referenced (normalized — single source of truth, requires join queries)
# See Fix 6 above

Use embedded for immutable or rarely-updated relationships (article author at write time). Use references when the related data changes frequently (current user profile).

Memory and Resource Limits

Weaviate’s vector indexes live in RAM by default. For collections with millions of objects:

# Cap the in-memory vector cache and tune HNSW build parameters
# (this example does not enable compression)
client.collections.create(
    name="Article",
    properties=[...],
    vector_index_config=wvc.Configure.VectorIndex.hnsw(
        ef_construction=128,
        max_connections=16,
        vector_cache_max_objects=100_000,
    ),
)
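
Before tuning, it helps to estimate how much RAM the raw vectors need. The sketch below is a back-of-the-envelope calculation, not a Weaviate API: it assumes float32 vectors (4 bytes per dimension) and a 2x overhead factor for the index, and both numbers should be treated as rough assumptions to validate against your own deployment.

```python
def estimate_vector_memory_bytes(num_objects, dimensions,
                                 bytes_per_value=4, overhead_factor=2.0):
    """Rough RAM estimate: raw vector size times an assumed index overhead."""
    raw = num_objects * dimensions * bytes_per_value
    return int(raw * overhead_factor)


# 1 million objects with 1536-dimensional embeddings
estimate = estimate_vector_memory_bytes(1_000_000, 1536)
print(f"{estimate / 1024**3:.1f} GiB")  # 11.4 GiB
```

If the estimate exceeds the memory you can give the container, lowering vector_cache_max_objects or choosing a smaller embedding model are the first levers to try.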

For Docker memory tuning that affects Weaviate cluster stability, see Kubernetes OOMKilled.

Async Client

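import asyncio
import weaviate

async def main():
    # async with must run inside a coroutine, not at module level
    async with weaviate.use_async_with_local() as client:
        print(await client.is_ready())
        articles = client.collections.get("Article")
        response = await articles.query.fetch_objects(limit=10)

asyncio.run(main())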

For async patterns that pair with Weaviate, see Python asyncio not running.

FixDevs

Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.
