Fix: Pinecone Not Working — Index Creation, Serverless vs Pod, and Python SDK v3 Migration
Quick Answer
How to fix Pinecone errors — ApiException 401 unauthorized, index not found, dimension mismatch, serverless spec required, Python SDK v3 breaking changes, namespace confusion, and upsert rate limit 429.
The Error
You try to connect to Pinecone and get a 401:
pinecone.core.openapi.shared.exceptions.UnauthorizedException:
(401) Invalid API Key

Or the v3 SDK raises an AttributeError that doesn’t match any tutorial:
AttributeError: module 'pinecone' has no attribute 'init'

Or you try to create an index and the error mentions a required spec:
ValueError: create_index() requires spec= argument specifying either ServerlessSpec or PodSpec

Or you upload vectors and the dimension doesn’t match your index:
400 Client Error: Bad Request: Vector dimension 1536 does not match index dimension 384

Or upserts suddenly rate-limit under load:
PineconeApiException: (429) Too Many Requests: Rate limit exceeded

Pinecone is the best-known managed vector database — fully hosted, scalable, no infrastructure to manage. But the Python SDK had a major rewrite (v3, released December 2023) that breaks every tutorial written before then: old code using pinecone.init() fails immediately. This guide covers the migration plus the errors specific to the managed model.
Why This Happens
Pinecone v3 replaced the module-level API (pinecone.init(), pinecone.Index()) with an instance-based API (Pinecone(api_key=), pc.Index(name)). Code written for v2 doesn’t work — there’s no compatibility layer. Most Pinecone articles on the internet are v2 and break immediately when copied.
Pinecone’s serverless indexes (the new default) have different semantics than pod-based indexes: pay-per-query instead of reserved compute, different scaling characteristics, and different region availability. Creating an index without specifying spec= fails because the SDK doesn’t know which type you want.
Fix 1: Python SDK v3 Migration
Old (v2) pattern — broken in v3:
import pinecone
pinecone.init(api_key="...", environment="us-west1-gcp") # AttributeError in v3
index = pinecone.Index("my-index")

New (v3) pattern:
from pinecone import Pinecone
pc = Pinecone(api_key="your-api-key") # No 'environment' arg anymore
# Get an index handle
index = pc.Index("my-index")
# Use it
index.upsert(vectors=[...])
results = index.query(vector=[...], top_k=5)

Install the correct version:
pip install "pinecone>=3.0"
# Package renamed from pinecone-client to pinecone in v5
pip install pinecone  # v5+

Environment variable for API key:
export PINECONE_API_KEY=your-api-key

from pinecone import Pinecone
pc = Pinecone()  # Reads PINECONE_API_KEY automatically

Migration checklist:
| v2 | v3 |
|---|---|
| pinecone.init(api_key=..., environment=...) | pc = Pinecone(api_key=...) |
| pinecone.list_indexes() | pc.list_indexes() |
| pinecone.create_index(name, dimension, metric) | pc.create_index(name, dimension, metric, spec=...) |
| pinecone.Index(name) | pc.Index(name) |
| pinecone.delete_index(name) | pc.delete_index(name) |
Common Mistake: Following a tutorial from 2022 or 2023 that uses pinecone.init(). Check the date of any Pinecone article — anything before December 2023 is v2 and won’t work. The error message (module has no attribute 'init') is the tell.
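If you are unsure which generation of the SDK an environment has, a quick runtime probe settles it. `pinecone_sdk_flavor` is a hypothetical helper, not part of the SDK — it just checks which attributes the installed module exposes:

```python
import importlib

def pinecone_sdk_flavor() -> str:
    """Best-effort check of which Pinecone SDK generation is installed."""
    try:
        mod = importlib.import_module("pinecone")
    except ImportError:
        return "not installed"
    if hasattr(mod, "Pinecone"):
        return "v3+ instance-based API: use Pinecone(api_key=...)"
    if hasattr(mod, "init"):
        return "v2 module-level API: pinecone.init(...) — upgrade before following this guide"
    return "unknown"

print(pinecone_sdk_flavor())
```

Run this before debugging anything else; it tells you which generation of tutorials applies to your environment.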
Fix 2: Creating an Index — Serverless vs Pod
from pinecone import Pinecone, ServerlessSpec, PodSpec
pc = Pinecone(api_key="...")
# Serverless (new default, pay-per-query)
pc.create_index(
name="my-index",
dimension=1536,
metric="cosine", # "cosine", "euclidean", "dotproduct"
spec=ServerlessSpec(
cloud="aws", # "aws", "gcp", "azure"
region="us-east-1", # Region must support serverless
),
)
# Pod-based (reserved compute, predictable cost)
pc.create_index(
name="my-index",
dimension=1536,
metric="cosine",
spec=PodSpec(
environment="us-east1-gcp",
pod_type="p1.x1", # "s1" (storage-optimized), "p1" (performance), "p2" (high-performance)
pods=1,
replicas=1,
shards=1,
),
)

Serverless vs pod comparison:
| Feature | Serverless | Pod-based |
|---|---|---|
| Cost model | Pay per query + storage | Monthly per pod |
| Scaling | Automatic | Manual (add pods) |
| Min cost | Very low (idle = $0.30/month) | Always on ($60+/pod/month) |
| Latency (cold) | Higher on first query | Consistent |
| Region availability | Limited | Broader |
| Best for | Dev, low-traffic, bursty | Production, steady traffic, low latency |
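Whether serverless beats pod pricing depends on traffic. A back-of-envelope sketch — only the $60/pod and $0.30 figures come from the table above; the read-unit price and units-per-query are illustrative assumptions, so check Pinecone’s current pricing page before deciding:

```python
# All prices are illustrative assumptions — verify against current Pinecone pricing.
POD_MONTHLY = 60.0           # assumed cost of one p1.x1 pod per month (table above)
STORAGE_PER_GB_MONTH = 0.30  # assumed serverless storage cost per GB-month
COST_PER_MILLION_RU = 8.0    # hypothetical cost per million serverless read units

def serverless_monthly_cost(gb_stored: float, monthly_queries: int,
                            read_units_per_query: int = 5) -> float:
    """Rough serverless bill: storage plus query read units."""
    read_units = monthly_queries * read_units_per_query
    return gb_stored * STORAGE_PER_GB_MONTH + (read_units / 1_000_000) * COST_PER_MILLION_RU

# Under these assumptions, 1 GB and 100k queries/month stays far below one pod:
print(serverless_monthly_cost(1, 100_000))
```

The crossover point is traffic-dependent: steady high-QPS workloads amortize the fixed pod cost, while bursty or idle workloads favor serverless.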
Wait for index readiness — creation is async:
import time
pc.create_index(
name="my-index",
dimension=1536,
metric="cosine",
spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)
# Poll until ready
while not pc.describe_index("my-index").status["ready"]:
time.sleep(1)
index = pc.Index("my-index")

Idempotent creation:
existing = [idx.name for idx in pc.list_indexes()]
if "my-index" not in existing:
pc.create_index(
name="my-index",
dimension=1536,
metric="cosine",
spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)

Fix 3: Upserting Vectors and ID Management
from pinecone import Pinecone
pc = Pinecone(api_key="...")
index = pc.Index("my-index")
# Upsert — list of (id, vector, metadata) tuples OR dicts
index.upsert(
vectors=[
{"id": "doc-1", "values": [0.1, 0.2, ...], "metadata": {"category": "news"}},
{"id": "doc-2", "values": [0.3, 0.4, ...], "metadata": {"category": "blog"}},
],
)
# Tuple form
index.upsert(
vectors=[
("doc-1", [0.1, 0.2, ...], {"category": "news"}),
("doc-2", [0.3, 0.4, ...], {"category": "blog"}),
],
)
# Without metadata
index.upsert(vectors=[("doc-1", [0.1, 0.2, ...])])Batch size limits — max 100 vectors per upsert request, max 2MB total:
def upsert_in_batches(index, vectors, batch_size=100):
for i in range(0, len(vectors), batch_size):
batch = vectors[i:i + batch_size]
index.upsert(vectors=batch)
print(f"Upserted {i + len(batch)} / {len(vectors)}")
upsert_in_batches(index, my_vectors)

Async upsert with parallel batches for throughput:
from pinecone.grpc import PineconeGRPC  # requires: pip install "pinecone[grpc]"
# gRPC client — faster than HTTP for bulk ops
pc = PineconeGRPC(api_key="...")
index = pc.Index("my-index")
# Parallel upsert
async_results = [
index.upsert(vectors=batch, async_req=True)
for batch in batches
]
# Wait for all to complete
results = [r.result() for r in async_results]

IDs — use meaningful, unique values:
import uuid
# Option 1: UUID for dedup-free IDs
{"id": str(uuid.uuid4()), "values": [...]}
# Option 2: Hash-based for content dedup (same content = same ID)
import hashlib
text = "some document content"
doc_id = hashlib.sha256(text.encode()).hexdigest()[:16]
# Option 3: Domain-meaningful IDs
{"id": f"user-{user_id}-post-{post_id}", "values": [...]}Pinecone upsert is idempotent — same ID overwrites. No separate “update” method needed.
Fix 4: Namespaces for Multi-Tenancy
Namespaces partition an index logically — queries within a namespace only see that namespace’s vectors.
# Upsert into specific namespaces
index.upsert(
vectors=[("doc-1", vec)],
namespace="tenant-a",
)
index.upsert(
vectors=[("doc-2", vec)],
namespace="tenant-b",
)
# Query a specific namespace
results = index.query(
vector=query_vec,
top_k=10,
namespace="tenant-a", # Only sees tenant-a docs
)

Default namespace is empty string "":
index.upsert(vectors=[("x", vec)]) # Goes to default namespace
index.query(vector=q, top_k=5) # Queries default namespace
# Explicitly
index.upsert(vectors=[("x", vec)], namespace="")List all namespaces:
stats = index.describe_index_stats()
print(stats.namespaces)
# {'tenant-a': {'vector_count': 1000}, 'tenant-b': {'vector_count': 500}, '': {'vector_count': 100}}

Pro Tip: Use namespaces for logical separation (multi-tenant, environment splits) rather than metadata filters. Namespace queries are faster than filtered queries because Pinecone only scans the namespace’s partition. For strict tenant isolation, namespaces are also safer — there’s no risk of leaking data between tenants via a mistakenly built filter.
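One way to make tenant isolation hard to get wrong is a thin wrapper that pins every call to one namespace. `TenantIndex` is a hypothetical helper, not part of the SDK — `index` can be any object with upsert/query methods taking a `namespace=` keyword, such as a Pinecone index handle:

```python
class TenantIndex:
    """Pin every operation to a single tenant's namespace."""

    def __init__(self, index, tenant_id: str):
        self._index = index
        self._ns = f"tenant-{tenant_id}"

    def upsert(self, vectors):
        return self._index.upsert(vectors=vectors, namespace=self._ns)

    def query(self, **kwargs):
        kwargs.pop("namespace", None)  # callers cannot override the namespace
        return self._index.query(namespace=self._ns, **kwargs)
```

Handing application code a `TenantIndex` instead of the raw index makes cross-tenant queries a type of bug that simply can’t be written.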
Delete all vectors in a namespace:
index.delete(delete_all=True, namespace="tenant-a")

Fix 5: Querying with Filters
results = index.query(
vector=query_embedding,
top_k=10,
include_metadata=True,
include_values=False,  # Don't return the stored vectors (saves bandwidth)
filter={
"category": {"$eq": "news"},
"year": {"$gte": 2024},
},
)
for match in results.matches:
print(f"ID: {match.id}, Score: {match.score:.3f}")
print(f"Metadata: {match.metadata}")Filter operators:
| Operator | Meaning |
|---|---|
| $eq | Equals |
| $ne | Not equals |
| $gt, $gte, $lt, $lte | Numeric comparison |
| $in | Value in list |
| $nin | Value not in list |
| $and, $or | Logical combination |
| $exists | Field present |
Complex filters:
filter = {
"$and": [
{"category": {"$in": ["news", "blog"]}},
{"$or": [
{"year": {"$gte": 2024}},
{"featured": True},
]},
],
}

Filter by ID (single vector lookup):
# Fetch by exact IDs — no vector search
result = index.fetch(ids=["doc-1", "doc-2"], namespace="")
for id_, vec in result.vectors.items():
print(vec.values, vec.metadata)

Common Mistake: Mixing filter styles. The shorthand {"category": "news"} does work as an implicit equality match, but once you add operators, mixing shorthand and explicit forms produces confusing failures. Prefer the explicit {"category": {"$eq": "news"}} everywhere so filters stay consistent as they grow.
Fix 6: Rate Limits and 429 Errors
PineconeApiException: (429) Too Many Requests

Serverless indexes have per-minute rate limits that scale with usage. Pod-based indexes have per-pod QPS limits.
Check your limits in the Pinecone console (dashboard → index → metrics).
Backoff and retry with tenacity:
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception
from pinecone.core.openapi.shared.exceptions import PineconeApiException

@retry(
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, min=1, max=30),
    retry=retry_if_exception(
        lambda e: isinstance(e, PineconeApiException) and e.status == 429
    ),
)
def safe_upsert(index, vectors):
return index.upsert(vectors=vectors)
safe_upsert(index, my_batch)

Parallelism throttling for bulk uploads:
from concurrent.futures import ThreadPoolExecutor
import time
def upsert_batch(batch):
try:
index.upsert(vectors=batch)
except PineconeApiException as e:
if e.status == 429:
time.sleep(5)
index.upsert(vectors=batch) # Retry after backoff
else:
raise
# Limit concurrent requests
with ThreadPoolExecutor(max_workers=5) as executor:
list(executor.map(upsert_batch, batches))

Serverless rate limits scale with your plan’s read and write units. If you hit limits constantly, contact Pinecone support to request higher limits, or move to a pod-based index with guaranteed capacity.
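Rather than only reacting to 429s, you can also throttle client-side. A minimal sliding-window limiter — a sketch, since Pinecone’s actual limits vary by plan and operation type, so tune `max_per_minute` to your observed quota:

```python
import time

class RateLimiter:
    """Allow at most max_per_minute calls per 60-second sliding window."""

    def __init__(self, max_per_minute: int):
        self.max = max_per_minute
        self.calls = []  # monotonic timestamps of recent calls

    def wait(self):
        now = time.monotonic()
        # Drop timestamps older than the 60-second window
        self.calls = [t for t in self.calls if now - t < 60]
        if len(self.calls) >= self.max:
            # Sleep until the oldest call ages out of the window
            time.sleep(60 - (now - self.calls[0]))
        self.calls.append(time.monotonic())
```

Call `limiter.wait()` before each `index.upsert(...)` in your upload loop; combined with the retry decorator above, 429s become rare rather than routine.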
Fix 7: Metadata Indexing and Storage Limits
Pinecone automatically indexes all metadata fields — filtering is fast out of the box. But there are limits:
- Max 40 KB metadata per vector
- Max 40 metadata fields per vector
- String fields indexed up to 512 bytes
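You can pre-flight metadata against these limits before upserting. The constants mirror the figures above; the JSON-size check is an approximation of how the payload is measured, not Pinecone’s exact accounting:

```python
import json

MAX_METADATA_BYTES = 40 * 1024  # 40 KB per vector (figure from above)
MAX_METADATA_FIELDS = 40        # field-count limit (figure from above)

def metadata_problems(metadata: dict) -> list:
    """Return a list of limit violations; an empty list means it should fit."""
    problems = []
    if len(metadata) > MAX_METADATA_FIELDS:
        problems.append(f"{len(metadata)} fields exceeds {MAX_METADATA_FIELDS}")
    approx_size = len(json.dumps(metadata).encode("utf-8"))
    if approx_size > MAX_METADATA_BYTES:
        problems.append(f"~{approx_size} bytes exceeds {MAX_METADATA_BYTES}")
    return problems
```

Rejecting oversized metadata in your pipeline gives a clear local error instead of a 400 from the API mid-batch.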
Strategies for large metadata:
# WRONG — full document in metadata
index.upsert(vectors=[
("doc-1", vec, {
"content": full_article_text, # Might be 100 KB
"category": "news",
}),
])
# CORRECT — metadata just for filtering, content elsewhere (S3, DB)
index.upsert(vectors=[
("doc-1", vec, {
"category": "news",
"year": 2025,
"doc_s3_path": "s3://my-bucket/articles/doc-1.txt",
}),
])
# Fetch full content from S3 when you need it

Selective metadata indexing (pod-based only) — reduce memory by disabling the index on fields you don’t filter by:
pc.create_index(
name="my-index",
dimension=1536,
metric="cosine",
spec=PodSpec(
environment="us-east1-gcp",
pod_type="p1.x1",
metadata_config={
"indexed": ["category", "year", "author"], # Only these are filterable
},
),
)
# Other metadata stored but not indexed — smaller memory footprint

Serverless indexes index all metadata by default.
Fix 8: Deletion and Updates
# Delete specific IDs
index.delete(ids=["doc-1", "doc-2"], namespace="")
# Delete by metadata filter (pod-based only; not supported on serverless)
index.delete(
filter={"archived": {"$eq": True}},
namespace="",
)
# Delete entire namespace
index.delete(delete_all=True, namespace="tenant-a")
# Delete the whole index (irreversible)
pc.delete_index("my-index")

Update metadata without changing the vector:
index.update(
id="doc-1",
set_metadata={"status": "archived", "updated_at": "2025-04-09"},
namespace="",
)Update the vector too:
index.update(
id="doc-1",
values=new_embedding,
set_metadata={"version": 2},
)

Still Not Working?
Pinecone vs Other Vector DBs
- Pinecone — Managed SaaS, minimal ops. Best when you want zero infra. Costs scale with usage.
- Qdrant — Self-hosted or managed, rich filtering. Best for custom deployments. See Qdrant not working.
- Chroma — Local-first, simplest for prototypes. See ChromaDB not working.
- Weaviate — Hybrid search (vector + keyword), GraphQL API.
- pgvector — Postgres extension, best when you already run Postgres.
Embedding Model Integration
# With OpenAI embeddings
from openai import OpenAI
openai_client = OpenAI()
def embed(text: str):
response = openai_client.embeddings.create(
model="text-embedding-3-small",
input=text,
)
return response.data[0].embedding
# Index a document
text = "Your document content"
vec = embed(text)
index.upsert(vectors=[("doc-1", vec, {"text_preview": text[:200]})])
# Query
query_vec = embed("search query")
results = index.query(vector=query_vec, top_k=5, include_metadata=True)

For OpenAI API key setup and rate limit handling, see OpenAI API not working.
Integration with LangChain and LlamaIndex
# LangChain
from langchain_pinecone import PineconeVectorStore
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vector_store = PineconeVectorStore(
index_name="my-index",
embedding=embeddings,
pinecone_api_key="your-key",
)
vector_store.add_texts(texts=["doc 1", "doc 2"])
results = vector_store.similarity_search("query", k=5)

For LangChain integration patterns, see LangChain Python not working. For LlamaIndex RAG setup with Pinecone, see LlamaIndex not working.
Monitoring and Index Stats
stats = index.describe_index_stats()
print(f"Total vectors: {stats.total_vector_count}")
print(f"Dimension: {stats.dimension}")
print(f"Index fullness: {stats.index_fullness}") # 0.0–1.0
for ns, ns_stats in stats.namespaces.items():
print(f"Namespace '{ns}': {ns_stats.vector_count} vectors")Check these regularly in production to catch ingestion issues early. An unexpectedly low total_vector_count often indicates silent upsert failures.
Hybrid Search with Sparse Vectors
Pinecone supports sparse-dense hybrid search — combining vector similarity with keyword matching:
# Create a hybrid index (dotproduct required for sparse-dense)
pc.create_index(
name="hybrid-index",
dimension=1536,
metric="dotproduct",
spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)
# Upsert with both dense and sparse vectors
index.upsert(vectors=[
{
"id": "doc-1",
"values": dense_vec, # From embedding model
"sparse_values": { # From BM25 or SPLADE
"indices": [10, 42, 108],
"values": [0.9, 0.5, 0.3],
},
"metadata": {"text": "..."},
},
])
# Query with both
results = index.query(
vector=query_dense,
sparse_vector={"indices": [...], "values": [...]},
top_k=10,
)

This catches exact keyword matches (product codes, names) that pure vector search misses while still benefiting from semantic similarity.
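To wire up hybrid queries before you have a real sparse encoder, you can fake `sparse_values` from hashed token counts. A toy sketch only — a stand-in for BM25/SPLADE that exercises the API shape, not a substitute for a proper encoder:

```python
import zlib

def sparse_from_tokens(tokens, vocab_size=2**18):
    """Toy sparse vector in Pinecone's indices/values shape.

    Each token is hashed (deterministically, via CRC32) into a fixed
    vocabulary slot and counted; real encoders weight terms properly.
    """
    counts = {}
    for tok in tokens:
        idx = zlib.crc32(tok.encode("utf-8")) % vocab_size
        counts[idx] = counts.get(idx, 0.0) + 1.0
    indices = sorted(counts)
    return {"indices": indices, "values": [counts[i] for i in indices]}
```

Usage: `index.query(vector=query_dense, sparse_vector=sparse_from_tokens(query.split()), top_k=10)` — swap in a real sparse encoder before measuring retrieval quality.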
Moving from Free to Paid Tier
The free tier has index count and throughput limits. When upgrading:
- Back up existing vectors via fetch and store them to disk
- Delete the free-tier index
- Create a new index in a paid project with the desired ServerlessSpec
- Re-upsert from your backup
Pinecone does not auto-migrate indexes between tiers — back up before any tier change.
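The backup step can be as simple as fetching in batches and writing JSON lines. This sketch assumes you already know your vector IDs (serverless indexes can list them; with pod-based indexes you typically track IDs yourself) — `fetch_fn` is a hypothetical wrapper around `index.fetch` that unpacks the response:

```python
import json

def backup_vectors(fetch_fn, ids, path, batch_size=100):
    """Write vectors to a JSON-lines file before deleting an index.

    fetch_fn(ids) must return {id: {"values": [...], "metadata": {...}}}.
    """
    with open(path, "w") as f:
        for i in range(0, len(ids), batch_size):
            for id_, record in fetch_fn(ids[i:i + batch_size]).items():
                f.write(json.dumps({"id": id_, **record}) + "\n")
```

To restore, read the file line by line and upsert in batches into the new index.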
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.
Related Articles
Fix: ChromaDB Not Working — Persistent Client, Collection Errors, and Embedding Function Issues
How to fix ChromaDB errors — persistent client not saving data, collection already exists error, dimension mismatch in embeddings, embedding function required, HTTP client connection refused, and memory growing unbounded.
Fix: Qdrant Not Working — Connection Errors, Collection Setup, and Filter Syntax Issues
How to fix Qdrant errors — connection refused to localhost 6333, collection not found create_collection, vector size mismatch, filter must match schema, payload index missing slow queries, and timeout on large batch uploads.
Fix: pgvector Not Working — Extension Install, Index Not Used, and Dimension Errors
How to fix pgvector errors — extension does not exist CREATE EXTENSION vector, dimension mismatch on insert, HNSW index not used slow queries, distance operator confusion, psycopg register adapter, and ivfflat vs hnsw selection.
Fix: LlamaIndex Not Working — Import Errors, Vector Store Issues, and Query Engine Failures
How to fix LlamaIndex errors — ImportError llama_index.core module not found, ServiceContext deprecated use Settings instead, vector store index not persisting, query engine returns irrelevant results, and LlamaIndex 0.10 migration.