
Fix: Pinecone Not Working — Index Creation, Serverless vs Pod, and Python SDK v3 Migration


Quick Answer

How to fix Pinecone errors — ApiException 401 unauthorized, index not found, dimension mismatch, serverless spec required, Python SDK v3 breaking changes, namespace confusion, and upsert rate limit 429.

The Error

You try to connect to Pinecone and get a 401:

pinecone.core.openapi.shared.exceptions.UnauthorizedException:
(401) Invalid API Key

Or the v3 SDK raises an import error that doesn’t match any tutorial:

AttributeError: module 'pinecone' has no attribute 'init'

Or you try to create an index and the error mentions a required spec:

ValueError: create_index() requires spec= argument specifying either ServerlessSpec or PodSpec

Or you upload vectors and the dimension doesn’t match your index:

400 Client Error: Bad Request: Vector dimension 1536 does not match index dimension 384

Or upserts suddenly rate-limit under load:

PineconeApiException: (429) Too Many Requests: Rate limit exceeded

Pinecone is one of the most widely used managed vector databases — fully hosted, scalable, no infrastructure to manage. But the Python SDK had a major rewrite (v3, December 2023) that breaks every tutorial written before then. Old code using pinecone.init() fails immediately. This guide covers the migration plus the specific errors that come from the managed model.

Why This Happens

Pinecone v3 replaced the module-level API (pinecone.init(), pinecone.Index()) with an instance-based API (Pinecone(api_key=), pc.Index(name)). Code written for v2 doesn’t work — there’s no compatibility layer. Most Pinecone articles on the internet are v2 and break immediately when copied.

Pinecone’s serverless indexes (the new default) have different semantics than pod-based indexes: pay-per-query instead of reserved compute, different scaling characteristics, and different region availability. Creating an index without specifying spec= fails because the SDK doesn’t know which type you want.

Fix 1: Python SDK v3 Migration

Old (v2) pattern — broken in v3:

import pinecone

pinecone.init(api_key="...", environment="us-west1-gcp")   # AttributeError in v3
index = pinecone.Index("my-index")

New (v3) pattern:

from pinecone import Pinecone

pc = Pinecone(api_key="your-api-key")   # No 'environment' arg anymore

# Get an index handle
index = pc.Index("my-index")

# Use it
index.upsert(vectors=[...])
results = index.query(vector=[...], top_k=5)

Install a current version:

pip install "pinecone-client>=3.0"   # v3/v4 still use the old package name
# Package renamed from pinecone-client to pinecone in v5
pip install pinecone   # v5+

Environment variable for API key:

export PINECONE_API_KEY=your-api-key
from pinecone import Pinecone

pc = Pinecone()   # Reads PINECONE_API_KEY automatically

Migration checklist:

| v2 | v3 |
| --- | --- |
| pinecone.init(api_key=..., environment=...) | pc = Pinecone(api_key=...) |
| pinecone.list_indexes() | pc.list_indexes() |
| pinecone.create_index(name, dimension, metric) | pc.create_index(name, dimension, metric, spec=...) |
| pinecone.Index(name) | pc.Index(name) |
| pinecone.delete_index(name) | pc.delete_index(name) |

Common Mistake: Following a tutorial from 2022 or 2023 that uses pinecone.init(). Check the date of any Pinecone article — anything before December 2023 is v2 and won’t work. The error message (module has no attribute 'init') is the tell.
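Before debugging further, it helps to confirm which SDK generation is actually installed. A minimal sketch (the helper names here are mine, not part of the SDK; it checks both package names because of the v5 rename):

```python
from importlib.metadata import PackageNotFoundError, version

def sdk_major(version_string: str) -> int:
    """Parse the major version out of a string like '5.0.1'."""
    return int(version_string.split(".")[0])

def installed_pinecone_major():
    """Return the installed Pinecone SDK's major version, or None if absent.

    The package was published as 'pinecone-client' before v5 and as
    'pinecone' from v5 on, so both names are checked.
    """
    for package in ("pinecone", "pinecone-client"):
        try:
            return sdk_major(version(package))
        except PackageNotFoundError:
            continue
    return None
```

Anything with a major version of 3 or higher uses the instance-based Pinecone() API; 2 or lower means the tutorial-era module-level API.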

Fix 2: Creating an Index — Serverless vs Pod

from pinecone import Pinecone, ServerlessSpec, PodSpec

pc = Pinecone(api_key="...")

# Serverless (new default, pay-per-query)
pc.create_index(
    name="my-index",
    dimension=1536,
    metric="cosine",   # "cosine", "euclidean", "dotproduct"
    spec=ServerlessSpec(
        cloud="aws",           # "aws", "gcp", "azure"
        region="us-east-1",    # Region must support serverless
    ),
)

# Pod-based (reserved compute, predictable cost)
pc.create_index(
    name="my-index",
    dimension=1536,
    metric="cosine",
    spec=PodSpec(
        environment="us-east1-gcp",
        pod_type="p1.x1",      # "s1" (storage-optimized), "p1" (performance), "p2" (high-performance)
        pods=1,
        replicas=1,
        shards=1,
    ),
)

Serverless vs pod comparison:

| Feature | Serverless | Pod-based |
| --- | --- | --- |
| Cost model | Pay per query + storage | Monthly per pod |
| Scaling | Automatic | Manual (add pods) |
| Min cost | Very low (idle = $0.30/month) | Always on ($60+/pod/month) |
| Latency (cold) | Higher on first query | Consistent |
| Region availability | Limited | Broader |
| Best for | Dev, low-traffic, bursty | Production, steady traffic, low latency |

Wait for index readiness — creation is async:

import time

pc.create_index(
    name="my-index",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)

# Poll until ready
while not pc.describe_index("my-index").status["ready"]:
    time.sleep(1)

index = pc.Index("my-index")

Idempotent creation:

existing = [idx.name for idx in pc.list_indexes()]
if "my-index" not in existing:
    pc.create_index(
        name="my-index",
        dimension=1536,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )

Fix 3: Upserting Vectors and ID Management

from pinecone import Pinecone

pc = Pinecone(api_key="...")
index = pc.Index("my-index")

# Upsert — list of (id, vector, metadata) tuples OR dicts
index.upsert(
    vectors=[
        {"id": "doc-1", "values": [0.1, 0.2, ...], "metadata": {"category": "news"}},
        {"id": "doc-2", "values": [0.3, 0.4, ...], "metadata": {"category": "blog"}},
    ],
)

# Tuple form
index.upsert(
    vectors=[
        ("doc-1", [0.1, 0.2, ...], {"category": "news"}),
        ("doc-2", [0.3, 0.4, ...], {"category": "blog"}),
    ],
)

# Without metadata
index.upsert(vectors=[("doc-1", [0.1, 0.2, ...])])

Batch size limits — max 2 MB per upsert request (and roughly 1,000 vectors); batches of about 100 are the commonly recommended size:

def upsert_in_batches(index, vectors, batch_size=100):
    for i in range(0, len(vectors), batch_size):
        batch = vectors[i:i + batch_size]
        index.upsert(vectors=batch)
        print(f"Upserted {i + len(batch)} / {len(vectors)}")

upsert_in_batches(index, my_vectors)
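If your vectors carry metadata of varying size, counting alone can still blow the payload cap. A sketch of a batcher that respects both a count cap and an approximate byte cap (the byte size is estimated by JSON-serializing each vector dict, which only approximates the real request payload):

```python
import json

def batch_by_size(vectors, max_count=100, max_bytes=2_000_000):
    """Yield batches that stay under both a count cap and a rough byte cap."""
    batch, batch_bytes = [], 0
    for vector in vectors:
        vector_bytes = len(json.dumps(vector))
        if batch and (len(batch) >= max_count or batch_bytes + vector_bytes > max_bytes):
            yield batch                 # Current batch is full — emit it
            batch, batch_bytes = [], 0
        batch.append(vector)
        batch_bytes += vector_bytes
    if batch:
        yield batch                     # Emit the final partial batch
```

Use it in place of the fixed-size slicing above: `for batch in batch_by_size(my_vectors): index.upsert(vectors=batch)`.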

Async upsert with parallel batches for throughput:

from pinecone.grpc import PineconeGRPC

# gRPC client — faster than HTTP for bulk ops
pc = PineconeGRPC(api_key="...")
index = pc.Index("my-index")

# Parallel upsert
async_results = [
    index.upsert(vectors=batch, async_req=True)
    for batch in batches
]

# Wait for all to complete
results = [r.result() for r in async_results]

IDs — use meaningful, unique values:

import uuid

# Option 1: UUID for dedup-free IDs
{"id": str(uuid.uuid4()), "values": [...]}

# Option 2: Hash-based for content dedup (same content = same ID)
import hashlib
text = "some document content"
doc_id = hashlib.sha256(text.encode()).hexdigest()[:16]

# Option 3: Domain-meaningful IDs
{"id": f"user-{user_id}-post-{post_id}", "values": [...]}

Pinecone upsert is idempotent — upserting an existing ID overwrites the stored record, so re-ingesting the same documents is safe. (For partial changes without resending the vector, there is also index.update() — see Fix 8.)

Fix 4: Namespaces for Multi-Tenancy

Namespaces partition an index logically — queries within a namespace only see that namespace’s vectors.

# Upsert into specific namespaces
index.upsert(
    vectors=[("doc-1", vec)],
    namespace="tenant-a",
)
index.upsert(
    vectors=[("doc-2", vec)],
    namespace="tenant-b",
)

# Query a specific namespace
results = index.query(
    vector=query_vec,
    top_k=10,
    namespace="tenant-a",   # Only sees tenant-a docs
)

Default namespace is empty string "":

index.upsert(vectors=[("x", vec)])   # Goes to default namespace
index.query(vector=q, top_k=5)       # Queries default namespace

# Explicitly
index.upsert(vectors=[("x", vec)], namespace="")

List all namespaces:

stats = index.describe_index_stats()
print(stats.namespaces)
# {'tenant-a': {'vector_count': 1000}, 'tenant-b': {'vector_count': 500}, '': {'vector_count': 100}}

Pro Tip: Use namespaces for logical separation (multi-tenant, environment splits) rather than metadata filters. Namespace queries are faster than filtered queries because Pinecone only scans the namespace’s partition. For strict tenant isolation, namespaces are also safer — there’s no risk of leaking data between tenants via a mistakenly built filter.

Delete all vectors in a namespace:

index.delete(delete_all=True, namespace="tenant-a")

Fix 5: Querying with Filters

results = index.query(
    vector=query_embedding,
    top_k=10,
    include_metadata=True,
    include_values=False,   # Don't return stored vectors (saves bandwidth)
    filter={
        "category": {"$eq": "news"},
        "year": {"$gte": 2024},
    },
)

for match in results.matches:
    print(f"ID: {match.id}, Score: {match.score:.3f}")
    print(f"Metadata: {match.metadata}")

Filter operators:

| Operator | Meaning |
| --- | --- |
| $eq | Equals |
| $ne | Not equals |
| $gt, $gte, $lt, $lte | Numeric comparison |
| $in | Value in list |
| $nin | Value not in list |
| $and, $or | Logical combination |
| $exists | Field present |

Complex filters:

filter = {
    "$and": [
        {"category": {"$in": ["news", "blog"]}},
        {"$or": [
            {"year": {"$gte": 2024}},
            {"featured": True},
        ]},
    ],
}
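Hand-writing nested operator dicts gets error-prone. A small sketch of a filter builder (my own helper, not part of the SDK) that normalizes plain values to $eq, lists to $in, and passes explicit operator dicts through:

```python
def build_filter(**conditions):
    """Build a Pinecone metadata filter from keyword conditions.

    Plain values become {"$eq": value}, lists become {"$in": [...]},
    and dicts are passed through as explicit operator expressions.
    Multiple conditions are joined with $and.
    """
    clauses = []
    for field, value in conditions.items():
        if isinstance(value, dict):
            clauses.append({field: value})           # already explicit, e.g. {"$gte": 2024}
        elif isinstance(value, list):
            clauses.append({field: {"$in": value}})
        else:
            clauses.append({field: {"$eq": value}})
    return clauses[0] if len(clauses) == 1 else {"$and": clauses}
```

For example, `build_filter(category=["news", "blog"], year={"$gte": 2024})` produces the $and form shown above.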

Filter by ID (single vector lookup):

# Fetch by exact IDs — no vector search
result = index.fetch(ids=["doc-1", "doc-2"], namespace="")
for id_, vec in result.vectors.items():
    print(vec.values, vec.metadata)

Common Mistake: Mixing implicit and explicit equality in filter dicts. The plain form {"category": "news"} does work as shorthand for $eq, but once you combine it with $in, ranges, or $and/$or, inconsistent syntax makes filters hard to read and easy to get wrong. Prefer the explicit form {"category": {"$eq": "news"}} everywhere.

Fix 6: Rate Limits and 429 Errors

PineconeApiException: (429) Too Many Requests

Serverless indexes have per-minute rate limits that scale with usage. Pod-based indexes have per-pod QPS limits.

Check your limits in the Pinecone console (dashboard → index → metrics).

Backoff and retry with tenacity:

from tenacity import retry, retry_if_exception, stop_after_attempt, wait_exponential
from pinecone.core.openapi.shared.exceptions import PineconeApiException

@retry(
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, min=1, max=30),
    retry=retry_if_exception(
        lambda e: isinstance(e, PineconeApiException) and e.status == 429
    ),
)
def safe_upsert(index, vectors):
    return index.upsert(vectors=vectors)

safe_upsert(index, my_batch)

Parallelism throttling for bulk uploads:

from concurrent.futures import ThreadPoolExecutor
import time

from pinecone.core.openapi.shared.exceptions import PineconeApiException

def upsert_batch(batch):
    try:
        index.upsert(vectors=batch)
    except PineconeApiException as e:
        if e.status == 429:
            time.sleep(5)
            index.upsert(vectors=batch)   # Retry after backoff
        else:
            raise

# Limit concurrent requests
with ThreadPoolExecutor(max_workers=5) as executor:
    list(executor.map(upsert_batch, batches))

Serverless throughput is metered in read and write units, and rate limits scale with them. If you’re hitting limits constantly, contact Pinecone support to request higher limits or upgrade to a pod-based index with guaranteed capacity.
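Beyond retrying after 429s, you can avoid triggering them in the first place by pacing requests client-side. A minimal client-side throttle sketch (a simple interval spacer, not a full token bucket):

```python
import time

class SimpleThrottle:
    """Space out requests to stay under a requests-per-second budget."""

    def __init__(self, max_per_second: float):
        self.interval = 1.0 / max_per_second
        self._next_allowed = 0.0

    def acquire(self):
        """Block until the next request slot is available."""
        now = time.monotonic()
        if now < self._next_allowed:
            time.sleep(self._next_allowed - now)
        self._next_allowed = max(now, self._next_allowed) + self.interval
```

Create one instance shared by your upload workers and call throttle.acquire() before each upsert; combined with the retry decorator above, bulk loads degrade gracefully instead of hammering the API.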

Fix 7: Metadata Indexing and Storage Limits

Pinecone automatically indexes all metadata fields — filtering is fast out of the box. But there are limits:

  • Max 40 KB metadata per vector
  • Max 40 metadata fields per vector
  • String fields indexed up to 512 bytes

Strategies for large metadata:

# WRONG — full document in metadata
index.upsert(vectors=[
    ("doc-1", vec, {
        "content": full_article_text,   # Might be 100 KB
        "category": "news",
    }),
])

# CORRECT — metadata just for filtering, content elsewhere (S3, DB)
index.upsert(vectors=[
    ("doc-1", vec, {
        "category": "news",
        "year": 2025,
        "doc_s3_path": "s3://my-bucket/articles/doc-1.txt",
    }),
])

# Fetch full content from S3 when you need it

Selective metadata indexing (pod-based only) — reduce memory by disabling index on fields you don’t filter by:

pc.create_index(
    name="my-index",
    dimension=1536,
    metric="cosine",
    spec=PodSpec(
        environment="us-east1-gcp",
        pod_type="p1.x1",
        metadata_config={
            "indexed": ["category", "year", "author"],   # Only these are filterable
        },
    ),
)
# Other metadata stored but not indexed — smaller memory footprint

Serverless indexes index all metadata by default.

Fix 8: Deletion and Updates

# Delete specific IDs
index.delete(ids=["doc-1", "doc-2"], namespace="")

# Delete by metadata filter (pod-based only; not supported on serverless)
index.delete(
    filter={"archived": {"$eq": True}},
    namespace="",
)

# Delete entire namespace
index.delete(delete_all=True, namespace="tenant-a")

# Delete the whole index (irreversible)
pc.delete_index("my-index")

Update metadata without changing the vector:

index.update(
    id="doc-1",
    set_metadata={"status": "archived", "updated_at": "2025-04-09"},
    namespace="",
)

Update the vector too:

index.update(
    id="doc-1",
    values=new_embedding,
    set_metadata={"version": 2},
)

Still Not Working?

Pinecone vs Other Vector DBs

  • Pinecone — Managed SaaS, minimal ops. Best when you want zero infra. Costs scale with usage.
  • Qdrant — Self-hosted or managed, rich filtering. Best for custom deployments. See Qdrant not working.
  • Chroma — Local-first, simplest for prototypes. See ChromaDB not working.
  • Weaviate — Hybrid search (vector + keyword), GraphQL API.
  • pgvector — Postgres extension, best when you already run Postgres.

Embedding Model Integration

# With OpenAI embeddings
from openai import OpenAI

openai_client = OpenAI()

def embed(text: str):
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return response.data[0].embedding

# Index a document
text = "Your document content"
vec = embed(text)
index.upsert(vectors=[("doc-1", vec, {"text_preview": text[:200]})])

# Query
query_vec = embed("search query")
results = index.query(vector=query_vec, top_k=5, include_metadata=True)

For OpenAI API key setup and rate limit handling, see OpenAI API not working.

Integration with LangChain and LlamaIndex

# LangChain
from langchain_pinecone import PineconeVectorStore
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vector_store = PineconeVectorStore(
    index_name="my-index",
    embedding=embeddings,
    pinecone_api_key="your-key",
)

vector_store.add_texts(texts=["doc 1", "doc 2"])
results = vector_store.similarity_search("query", k=5)

For LangChain integration patterns, see LangChain Python not working. For LlamaIndex RAG setup with Pinecone, see LlamaIndex not working.

Monitoring and Index Stats

stats = index.describe_index_stats()
print(f"Total vectors: {stats.total_vector_count}")
print(f"Dimension: {stats.dimension}")
print(f"Index fullness: {stats.index_fullness}")   # 0.0–1.0 (meaningful for pod-based indexes)

for ns, ns_stats in stats.namespaces.items():
    print(f"Namespace '{ns}': {ns_stats.vector_count} vectors")

Check these regularly in production to catch ingestion issues early. An unexpectedly low total_vector_count often indicates silent upsert failures.
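A simple way to automate that check is to compare expected per-namespace counts against the stats. A sketch (my own helper, written against the plain-dict shape shown in the namespaces printout above; the SDK's stats objects may need converting first):

```python
def count_drift(stats_namespaces, expected_counts):
    """Return namespaces whose vector counts differ from what you expect.

    stats_namespaces maps namespace -> {"vector_count": n};
    expected_counts maps namespace -> expected integer count.
    """
    drift = {}
    for namespace, expected in expected_counts.items():
        actual = stats_namespaces.get(namespace, {}).get("vector_count", 0)
        if actual != expected:
            drift[namespace] = {"expected": expected, "actual": actual}
    return drift
```

Wire it into a post-ingestion step and alert on any non-empty result to surface silent upsert failures early.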

Hybrid Search with Sparse Vectors

Pinecone supports sparse-dense hybrid search — combining vector similarity with keyword matching:

# Create a hybrid index (dotproduct required for sparse-dense)
pc.create_index(
    name="hybrid-index",
    dimension=1536,
    metric="dotproduct",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)

# Upsert with both dense and sparse vectors
index.upsert(vectors=[
    {
        "id": "doc-1",
        "values": dense_vec,   # From embedding model
        "sparse_values": {      # From BM25 or SPLADE
            "indices": [10, 42, 108],
            "values": [0.9, 0.5, 0.3],
        },
        "metadata": {"text": "..."},
    },
])

# Query with both
results = index.query(
    vector=query_dense,
    sparse_vector={"indices": [...], "values": [...]},
    top_k=10,
)

This catches exact keyword matches (product codes, names) that pure vector search misses while still benefiting from semantic similarity.
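To control the balance between the two signals, a common pattern is to scale both vectors by a weight alpha before querying (a convex combination; this sketch assumes the sparse dict shape shown above):

```python
def hybrid_scale(dense, sparse, alpha: float):
    """Weight dense vs sparse contributions before a hybrid query.

    alpha=1.0 leans fully on semantic (dense) similarity,
    alpha=0.0 fully on keyword (sparse) matching.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    scaled_dense = [value * alpha for value in dense]
    scaled_sparse = {
        "indices": sparse["indices"],
        "values": [value * (1 - alpha) for value in sparse["values"]],
    }
    return scaled_dense, scaled_sparse
```

Pass the scaled pair as vector= and sparse_vector= in index.query(); because the index metric is dotproduct, the scaling shifts the final score toward whichever signal you weight higher.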

Moving from Free to Paid Tier

The free tier has index count and throughput limits. When upgrading:

  1. Backup existing vectors via fetch + store to disk
  2. Delete the free-tier index
  3. Create a new index on a paid project with desired ServerlessSpec
  4. Re-upsert from your backup

Pinecone does not auto-migrate indexes between tiers — always back up first before tier changes.
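Step 1 can be sketched as a JSONL dump, decoupled from how you page through the source index (the paging itself — e.g. via index.list() plus index.fetch() — is up to you; this helper just persists whatever pages of vector dicts you feed it):

```python
import json

def backup_to_jsonl(pages, out_path):
    """Write pages of vector dicts to a JSONL file; returns the count written.

    `pages` is any iterable of lists of dicts in upsert shape,
    e.g. {"id": ..., "values": [...], "metadata": {...}}.
    """
    written = 0
    with open(out_path, "w", encoding="utf-8") as f:
        for page in pages:
            for vector in page:
                f.write(json.dumps(vector) + "\n")
                written += 1
    return written
```

Restoring is the reverse: read the JSONL back, then feed the dicts through the batched upsert helper from Fix 3 against the new index.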


FixDevs

Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.
