Fix: Pinecone Not Working — Index Creation, Serverless vs Pod, and Python SDK v3 Migration
Quick Answer
How to fix Pinecone errors — ApiException 401 unauthorized, index not found, dimension mismatch, serverless spec required, Python SDK v3 breaking changes, namespace confusion, and upsert rate limit 429.
The Error
You try to connect to Pinecone and get a 401:
pinecone.core.openapi.shared.exceptions.UnauthorizedException:
(401) Invalid API Key

Or the v3 SDK raises an AttributeError that doesn’t match any tutorial:
AttributeError: module 'pinecone' has no attribute 'init'

Or you try to create an index and the error mentions a required spec:
ValueError: create_index() requires spec= argument specifying either ServerlessSpec or PodSpec

Or you upload vectors and the dimension doesn’t match your index:
400 Client Error: Bad Request: Vector dimension 1536 does not match index dimension 384

Or upserts suddenly rate-limit under load:
PineconeApiException: (429) Too Many Requests: Rate limit exceeded

Pinecone is the best-known managed vector database — fully hosted, scalable, no infrastructure to manage. But the Python SDK had a major rewrite (v3, released December 2023) that breaks every tutorial written before then: old code using pinecone.init() fails immediately. This guide covers the migration plus the errors specific to the managed model.
Why This Happens
Pinecone v3 replaced the module-level API (pinecone.init(), pinecone.Index()) with an instance-based API (Pinecone(api_key=), pc.Index(name)). Code written for v2 doesn’t work — there’s no compatibility layer. Most Pinecone articles on the internet are v2 and break immediately when copied.
Pinecone’s serverless indexes (the new default) have different semantics than pod-based indexes: pay-per-query instead of reserved compute, different scaling characteristics, and different region availability. Creating an index without specifying spec= fails because the SDK doesn’t know which type you want.
Fix 1: Python SDK v3 Migration
Old (v2) pattern — broken in v3:
import pinecone
pinecone.init(api_key="...", environment="us-west1-gcp") # AttributeError in v3
index = pinecone.Index("my-index")

New (v3) pattern:
from pinecone import Pinecone
pc = Pinecone(api_key="your-api-key") # No 'environment' arg anymore
# Get an index handle
index = pc.Index("my-index")
# Use it
index.upsert(vectors=[...])
results = index.query(vector=[...], top_k=5)

Install the correct version:
pip install "pinecone>=3.0"
# Package renamed from pinecone-client to pinecone in v5
pip install pinecone  # v5+

Environment variable for API key:
export PINECONE_API_KEY=your-api-key

from pinecone import Pinecone
pc = Pinecone()  # Reads PINECONE_API_KEY automatically

Migration checklist:
| v2 | v3 |
|---|---|
| pinecone.init(api_key=..., environment=...) | pc = Pinecone(api_key=...) |
| pinecone.list_indexes() | pc.list_indexes() |
| pinecone.create_index(name, dimension, metric) | pc.create_index(name, dimension, metric, spec=...) |
| pinecone.Index(name) | pc.Index(name) |
| pinecone.delete_index(name) | pc.delete_index(name) |
Common Mistake: Following a tutorial from 2022 or 2023 that uses pinecone.init(). Check the date of any Pinecone article — anything before December 2023 is v2 and won’t work. The error message (module has no attribute 'init') is the tell.
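If you are unsure which generation of the SDK an environment has, a quick runtime probe settles it. `pinecone_sdk_flavor` is a hypothetical helper, not part of the SDK — it just checks which attributes the installed module exposes:

```python
import importlib

def pinecone_sdk_flavor() -> str:
    """Best-effort check of which Pinecone SDK generation is installed."""
    try:
        mod = importlib.import_module("pinecone")
    except ImportError:
        return "not installed"
    if hasattr(mod, "Pinecone"):
        return "v3+ instance-based API: use Pinecone(api_key=...)"
    if hasattr(mod, "init"):
        return "v2 module-level API: pinecone.init(...) — upgrade before following this guide"
    return "unknown"

print(pinecone_sdk_flavor())
```

Run this before debugging anything else; it tells you which generation of tutorials applies to your environment.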
Fix 2: Creating an Index — Serverless vs Pod
from pinecone import Pinecone, ServerlessSpec, PodSpec
pc = Pinecone(api_key="...")
# Serverless (new default, pay-per-query)
pc.create_index(
name="my-index",
dimension=1536,
metric="cosine", # "cosine", "euclidean", "dotproduct"
spec=ServerlessSpec(
cloud="aws", # "aws", "gcp", "azure"
region="us-east-1", # Region must support serverless
),
)
# Pod-based (reserved compute, predictable cost)
pc.create_index(
name="my-index",
dimension=1536,
metric="cosine",
spec=PodSpec(
environment="us-east1-gcp",
pod_type="p1.x1", # "s1" (storage-optimized), "p1" (performance), "p2" (high-performance)
pods=1,
replicas=1,
shards=1,
),
)

Serverless vs pod comparison:
| Feature | Serverless | Pod-based |
|---|---|---|
| Cost model | Pay per query + storage | Monthly per pod |
| Scaling | Automatic | Manual (add pods) |
| Min cost | Very low (idle = $0.30/month) | Always on ($60+/pod/month) |
| Latency (cold) | Higher on first query | Consistent |
| Region availability | Limited | Broader |
| Best for | Dev, low-traffic, bursty | Production, steady traffic, low latency |
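Whether serverless beats pod pricing depends on traffic. A back-of-envelope sketch — only the $60/pod and $0.30 figures come from the table above; the read-unit price and units-per-query are illustrative assumptions, so check Pinecone’s current pricing page before deciding:

```python
# All prices are illustrative assumptions — verify against current Pinecone pricing.
POD_MONTHLY = 60.0           # assumed cost of one p1.x1 pod per month (table above)
STORAGE_PER_GB_MONTH = 0.30  # assumed serverless storage cost per GB-month
COST_PER_MILLION_RU = 8.0    # hypothetical cost per million serverless read units

def serverless_monthly_cost(gb_stored: float, monthly_queries: int,
                            read_units_per_query: int = 5) -> float:
    """Rough serverless bill: storage plus query read units."""
    read_units = monthly_queries * read_units_per_query
    return gb_stored * STORAGE_PER_GB_MONTH + (read_units / 1_000_000) * COST_PER_MILLION_RU

# Under these assumptions, 1 GB and 100k queries/month stays far below one pod:
print(serverless_monthly_cost(1, 100_000))
```

The crossover point is traffic-dependent: steady high-QPS workloads amortize the fixed pod cost, while bursty or idle workloads favor serverless.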
Wait for index readiness — creation is async:
import time
pc.create_index(
name="my-index",
dimension=1536,
metric="cosine",
spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)
# Poll until ready
while not pc.describe_index("my-index").status["ready"]:
time.sleep(1)
index = pc.Index("my-index")

Idempotent creation:
existing = [idx.name for idx in pc.list_indexes()]
if "my-index" not in existing:
pc.create_index(
name="my-index",
dimension=1536,
metric="cosine",
spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)

Fix 3: Upserting Vectors and ID Management
from pinecone import Pinecone
pc = Pinecone(api_key="...")
index = pc.Index("my-index")
# Upsert — list of (id, vector, metadata) tuples OR dicts
index.upsert(
vectors=[
{"id": "doc-1", "values": [0.1, 0.2, ...], "metadata": {"category": "news"}},
{"id": "doc-2", "values": [0.3, 0.4, ...], "metadata": {"category": "blog"}},
],
)
# Tuple form
index.upsert(
vectors=[
("doc-1", [0.1, 0.2, ...], {"category": "news"}),
("doc-2", [0.3, 0.4, ...], {"category": "blog"}),
],
)
# Without metadata
index.upsert(vectors=[("doc-1", [0.1, 0.2, ...])])Batch size limits — max 100 vectors per upsert request, max 2MB total:
def upsert_in_batches(index, vectors, batch_size=100):
for i in range(0, len(vectors), batch_size):
batch = vectors[i:i + batch_size]
index.upsert(vectors=batch)
print(f"Upserted {i + len(batch)} / {len(vectors)}")
upsert_in_batches(index, my_vectors)

Async upsert with parallel batches for throughput:
from pinecone.grpc import PineconeGRPC  # requires: pip install "pinecone[grpc]"
# gRPC client — faster than HTTP for bulk ops
pc = PineconeGRPC(api_key="...")
index = pc.Index("my-index")
# Parallel upsert
async_results = [
index.upsert(vectors=batch, async_req=True)
for batch in batches
]
# Wait for all to complete
results = [r.result() for r in async_results]

IDs — use meaningful, unique values:
import uuid
# Option 1: UUID for dedup-free IDs
{"id": str(uuid.uuid4()), "values": [...]}
# Option 2: Hash-based for content dedup (same content = same ID)
import hashlib
text = "some document content"
doc_id = hashlib.sha256(text.encode()).hexdigest()[:16]
# Option 3: Domain-meaningful IDs
{"id": f"user-{user_id}-post-{post_id}", "values": [...]}Pinecone upsert is idempotent — same ID overwrites. No separate “update” method needed.
Fix 4: Namespaces for Multi-Tenancy
Namespaces partition an index logically — queries within a namespace only see that namespace’s vectors.
# Upsert into specific namespaces
index.upsert(
vectors=[("doc-1", vec)],
namespace="tenant-a",
)
index.upsert(
vectors=[("doc-2", vec)],
namespace="tenant-b",
)
# Query a specific namespace
results = index.query(
vector=query_vec,
top_k=10,
namespace="tenant-a", # Only sees tenant-a docs
)

Default namespace is empty string "":
index.upsert(vectors=[("x", vec)]) # Goes to default namespace
index.query(vector=q, top_k=5) # Queries default namespace
# Explicitly
index.upsert(vectors=[("x", vec)], namespace="")List all namespaces:
stats = index.describe_index_stats()
print(stats.namespaces)
# {'tenant-a': {'vector_count': 1000}, 'tenant-b': {'vector_count': 500}, '': {'vector_count': 100}}

Pro Tip: Use namespaces for logical separation (multi-tenant, environment splits) rather than metadata filters. Namespace queries are faster than filtered queries because Pinecone only scans the namespace’s partition. For strict tenant isolation, namespaces are also safer — there’s no risk of leaking data between tenants via a mistakenly built filter.
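One way to make tenant isolation hard to get wrong is a thin wrapper that pins every call to one namespace. `TenantIndex` is a hypothetical helper, not part of the SDK — `index` can be any object with upsert/query methods taking a `namespace=` keyword, such as a Pinecone index handle:

```python
class TenantIndex:
    """Pin every operation to a single tenant's namespace."""

    def __init__(self, index, tenant_id: str):
        self._index = index
        self._ns = f"tenant-{tenant_id}"

    def upsert(self, vectors):
        return self._index.upsert(vectors=vectors, namespace=self._ns)

    def query(self, **kwargs):
        kwargs.pop("namespace", None)  # callers cannot override the namespace
        return self._index.query(namespace=self._ns, **kwargs)
```

Handing application code a `TenantIndex` instead of the raw index makes cross-tenant queries a type of bug that simply can’t be written.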
Delete all vectors in a namespace:
index.delete(delete_all=True, namespace="tenant-a")

Fix 5: Querying with Filters
results = index.query(
vector=query_embedding,
top_k=10,
include_metadata=True,
include_values=False,  # Don't return the stored vectors (saves bandwidth)
filter={
"category": {"$eq": "news"},
"year": {"$gte": 2024},
},
)
for match in results.matches:
print(f"ID: {match.id}, Score: {match.score:.3f}")
print(f"Metadata: {match.metadata}")Filter operators:
| Operator | Meaning |
|---|---|
| $eq | Equals |
| $ne | Not equals |
| $gt, $gte, $lt, $lte | Numeric comparison |
| $in | Value in list |
| $nin | Value not in list |
| $and, $or | Logical combination |
| $exists | Field present |
Complex filters:
filter = {
"$and": [
{"category": {"$in": ["news", "blog"]}},
{"$or": [
{"year": {"$gte": 2024}},
{"featured": True},
]},
],
}

Filter by ID (single vector lookup):
# Fetch by exact IDs — no vector search
result = index.fetch(ids=["doc-1", "doc-2"], namespace="")
for id_, vec in result.vectors.items():
print(vec.values, vec.metadata)

Common Mistake: Mixing filter styles. The shorthand {"category": "news"} does work as an implicit equality match, but once you add operators, mixing shorthand and explicit forms produces confusing failures. Prefer the explicit {"category": {"$eq": "news"}} everywhere so filters stay consistent as they grow.
Fix 6: Rate Limits and 429 Errors
PineconeApiException: (429) Too Many Requests

Serverless indexes have per-minute rate limits that scale with usage. Pod-based indexes have per-pod QPS limits.
Check your limits in the Pinecone console (dashboard → index → metrics).
Backoff and retry with tenacity:
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception
from pinecone.core.openapi.shared.exceptions import PineconeApiException

@retry(
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, min=1, max=30),
    retry=retry_if_exception(
        lambda e: isinstance(e, PineconeApiException) and e.status == 429
    ),
)
def safe_upsert(index, vectors):
return index.upsert(vectors=vectors)
safe_upsert(index, my_batch)

Parallelism throttling for bulk uploads:
from concurrent.futures import ThreadPoolExecutor
import time
def upsert_batch(batch):
try:
index.upsert(vectors=batch)
except PineconeApiException as e:
if e.status == 429:
time.sleep(5)
index.upsert(vectors=batch) # Retry after backoff
else:
raise
# Limit concurrent requests
with ThreadPoolExecutor(max_workers=5) as executor:
list(executor.map(upsert_batch, batches))

Serverless rate limits scale with your plan’s read and write units. If you hit limits constantly, contact Pinecone support to request higher limits, or move to a pod-based index with guaranteed capacity.
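Rather than only reacting to 429s, you can also throttle client-side. A minimal sliding-window limiter — a sketch, since Pinecone’s actual limits vary by plan and operation type, so tune `max_per_minute` to your observed quota:

```python
import time

class RateLimiter:
    """Allow at most max_per_minute calls per 60-second sliding window."""

    def __init__(self, max_per_minute: int):
        self.max = max_per_minute
        self.calls = []  # monotonic timestamps of recent calls

    def wait(self):
        now = time.monotonic()
        # Drop timestamps older than the 60-second window
        self.calls = [t for t in self.calls if now - t < 60]
        if len(self.calls) >= self.max:
            # Sleep until the oldest call ages out of the window
            time.sleep(60 - (now - self.calls[0]))
        self.calls.append(time.monotonic())
```

Call `limiter.wait()` before each `index.upsert(...)` in your upload loop; combined with the retry decorator above, 429s become rare rather than routine.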
Fix 7: Metadata Indexing and Storage Limits
Pinecone automatically indexes all metadata fields — filtering is fast out of the box. But there are limits:
- Max 40 KB metadata per vector
- Max 40 metadata fields per vector
- String fields indexed up to 512 bytes
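You can pre-flight metadata against these limits before upserting. The constants mirror the figures above; the JSON-size check is an approximation of how the payload is measured, not Pinecone’s exact accounting:

```python
import json

MAX_METADATA_BYTES = 40 * 1024  # 40 KB per vector (figure from above)
MAX_METADATA_FIELDS = 40        # field-count limit (figure from above)

def metadata_problems(metadata: dict) -> list:
    """Return a list of limit violations; an empty list means it should fit."""
    problems = []
    if len(metadata) > MAX_METADATA_FIELDS:
        problems.append(f"{len(metadata)} fields exceeds {MAX_METADATA_FIELDS}")
    approx_size = len(json.dumps(metadata).encode("utf-8"))
    if approx_size > MAX_METADATA_BYTES:
        problems.append(f"~{approx_size} bytes exceeds {MAX_METADATA_BYTES}")
    return problems
```

Rejecting oversized metadata in your pipeline gives a clear local error instead of a 400 from the API mid-batch.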
Strategies for large metadata:
# WRONG — full document in metadata
index.upsert(vectors=[
("doc-1", vec, {
"content": full_article_text, # Might be 100 KB
"category": "news",
}),
])
# CORRECT — metadata just for filtering, content elsewhere (S3, DB)
index.upsert(vectors=[
("doc-1", vec, {
"category": "news",
"year": 2025,
"doc_s3_path": "s3://my-bucket/articles/doc-1.txt",
}),
])
# Fetch full content from S3 when you need it

Selective metadata indexing (pod-based only) — reduce memory by disabling the index on fields you don’t filter by:
pc.create_index(
name="my-index",
dimension=1536,
metric="cosine",
spec=PodSpec(
environment="us-east1-gcp",
pod_type="p1.x1",
metadata_config={
"indexed": ["category", "year", "author"], # Only these are filterable
},
),
)
# Other metadata stored but not indexed — smaller memory footprint

Serverless indexes index all metadata by default.
Fix 8: Deletion and Updates
# Delete specific IDs
index.delete(ids=["doc-1", "doc-2"], namespace="")
# Delete by metadata filter (pod-based only; not supported on serverless)
index.delete(
filter={"archived": {"$eq": True}},
namespace="",
)
# Delete entire namespace
index.delete(delete_all=True, namespace="tenant-a")
# Delete the whole index (irreversible)
pc.delete_index("my-index")

Update metadata without changing the vector:
index.update(
id="doc-1",
set_metadata={"status": "archived", "updated_at": "2025-04-09"},
namespace="",
)Update the vector too:
index.update(
id="doc-1",
values=new_embedding,
set_metadata={"version": 2},
)

Still Not Working?
Pinecone vs Other Vector DBs
- Pinecone — Managed SaaS, minimal ops. Best when you want zero infra. Costs scale with usage.
- Qdrant — Self-hosted or managed, rich filtering. Best for custom deployments. See Qdrant not working.
- Chroma — Local-first, simplest for prototypes. See ChromaDB not working.
- Weaviate — Hybrid search (vector + keyword), GraphQL API.
- pgvector — Postgres extension, best when you already run Postgres.
Embedding Model Integration
# With OpenAI embeddings
from openai import OpenAI
openai_client = OpenAI()
def embed(text: str):
response = openai_client.embeddings.create(
model="text-embedding-3-small",
input=text,
)
return response.data[0].embedding
# Index a document
text = "Your document content"
vec = embed(text)
index.upsert(vectors=[("doc-1", vec, {"text_preview": text[:200]})])
# Query
query_vec = embed("search query")
results = index.query(vector=query_vec, top_k=5, include_metadata=True)

For OpenAI API key setup and rate limit handling, see OpenAI API not working.
Integration with LangChain and LlamaIndex
# LangChain
from langchain_pinecone import PineconeVectorStore
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vector_store = PineconeVectorStore(
index_name="my-index",
embedding=embeddings,
pinecone_api_key="your-key",
)
vector_store.add_texts(texts=["doc 1", "doc 2"])
results = vector_store.similarity_search("query", k=5)

For LangChain integration patterns, see LangChain Python not working. For LlamaIndex RAG setup with Pinecone, see LlamaIndex not working.
Monitoring and Index Stats
stats = index.describe_index_stats()
print(f"Total vectors: {stats.total_vector_count}")
print(f"Dimension: {stats.dimension}")
print(f"Index fullness: {stats.index_fullness}") # 0.0–1.0
for ns, ns_stats in stats.namespaces.items():
print(f"Namespace '{ns}': {ns_stats.vector_count} vectors")Check these regularly in production to catch ingestion issues early. An unexpectedly low total_vector_count often indicates silent upsert failures.
Hybrid Search with Sparse Vectors
Pinecone supports sparse-dense hybrid search — combining vector similarity with keyword matching:
# Create a hybrid index (dotproduct required for sparse-dense)
pc.create_index(
name="hybrid-index",
dimension=1536,
metric="dotproduct",
spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)
# Upsert with both dense and sparse vectors
index.upsert(vectors=[
{
"id": "doc-1",
"values": dense_vec, # From embedding model
"sparse_values": { # From BM25 or SPLADE
"indices": [10, 42, 108],
"values": [0.9, 0.5, 0.3],
},
"metadata": {"text": "..."},
},
])
# Query with both
results = index.query(
vector=query_dense,
sparse_vector={"indices": [...], "values": [...]},
top_k=10,
)

This catches exact keyword matches (product codes, names) that pure vector search misses while still benefiting from semantic similarity.
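To wire up hybrid queries before you have a real sparse encoder, you can fake `sparse_values` from hashed token counts. A toy sketch only — a stand-in for BM25/SPLADE that exercises the API shape, not a substitute for a proper encoder:

```python
import zlib

def sparse_from_tokens(tokens, vocab_size=2**18):
    """Toy sparse vector in Pinecone's indices/values shape.

    Each token is hashed (deterministically, via CRC32) into a fixed
    vocabulary slot and counted; real encoders weight terms properly.
    """
    counts = {}
    for tok in tokens:
        idx = zlib.crc32(tok.encode("utf-8")) % vocab_size
        counts[idx] = counts.get(idx, 0.0) + 1.0
    indices = sorted(counts)
    return {"indices": indices, "values": [counts[i] for i in indices]}
```

Usage: `index.query(vector=query_dense, sparse_vector=sparse_from_tokens(query.split()), top_k=10)` — swap in a real sparse encoder before measuring retrieval quality.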
Moving from Free to Paid Tier
The free tier has index count and throughput limits. When upgrading:
- Back up existing vectors via fetch and store them to disk
- Delete the free-tier index
- Create a new index in a paid project with the desired ServerlessSpec
- Re-upsert from your backup
Pinecone does not auto-migrate indexes between tiers — back up before any tier change.
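The backup step can be as simple as fetching in batches and writing JSON lines. This sketch assumes you already know your vector IDs (serverless indexes can list them; with pod-based indexes you typically track IDs yourself) — `fetch_fn` is a hypothetical wrapper around `index.fetch` that unpacks the response:

```python
import json

def backup_vectors(fetch_fn, ids, path, batch_size=100):
    """Write vectors to a JSON-lines file before deleting an index.

    fetch_fn(ids) must return {id: {"values": [...], "metadata": {...}}}.
    """
    with open(path, "w") as f:
        for i in range(0, len(ids), batch_size):
            for id_, record in fetch_fn(ids[i:i + batch_size]).items():
                f.write(json.dumps({"id": id_, **record}) + "\n")
```

To restore, read the file line by line and upsert in batches into the new index.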
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.
Related Articles
Fix: ChromaDB Not Working — Persistent Client, Collection Errors, and Embedding Function Issues
How to fix ChromaDB errors — persistent client not saving data, collection already exists error, dimension mismatch in embeddings, embedding function required, HTTP client connection refused, and memory growing unbounded.
Fix: Qdrant Not Working — Connection Errors, Collection Setup, and Filter Syntax Issues
How to fix Qdrant errors — connection refused to localhost 6333, collection not found create_collection, vector size mismatch, filter must match schema, payload index missing slow queries, and timeout on large batch uploads.
Fix: pgvector Not Working — Extension Install, Index Not Used, and Dimension Errors
How to fix pgvector errors — extension does not exist CREATE EXTENSION vector, dimension mismatch on insert, HNSW index not used slow queries, distance operator confusion, psycopg register adapter, and ivfflat vs hnsw selection.
Fix: LlamaIndex Not Working — Import Errors, Vector Store Issues, and Query Engine Failures
How to fix LlamaIndex errors — ImportError llama_index.core module not found, ServiceContext deprecated use Settings instead, vector store index not persisting, query engine returns irrelevant results, and LlamaIndex 0.10 migration.