Fix: FAISS Not Working — Import Errors, Index Selection, and GPU Setup
Quick Answer
How to fix FAISS errors — ImportError cannot import name swigfaiss, faiss-gpu vs faiss-cpu install, IndexFlatL2 slow on large data, IVF training required, index serialization write_index, and dimension mismatch.
The Error
You install FAISS and the import crashes:
ImportError: cannot import name 'swigfaiss' from 'faiss'
ModuleNotFoundError: No module named 'faiss'
Or GPU FAISS doesn’t detect your CUDA device:
RuntimeError: No GPU resources available
AssertionError: GPU index not supported: faiss.IndexFlatL2
Or search is unbearably slow on a million vectors:
index = faiss.IndexFlatL2(1536)
index.add(big_matrix)
# Query takes 5 seconds on 1M vectors: not usable
Or you train an IVF index without enough data:
WARNING: clustering 100 points to 256 centroids
RuntimeError: Quantizer is not trained
Or you save an index and it won’t load:
TypeError: write_index() missing 1 required positional argument
FAISS (Facebook AI Similarity Search) underpins much of modern vector search: many vector stores and retrieval frameworks use it under the hood or offer it as a backend. It’s extremely fast but opinionated: you must pick the right index type for your scale, configure training for compressed indexes, and match the numeric precision of your vectors. This guide covers each failure mode.
Why This Happens
FAISS is a C++ library with Python bindings generated by SWIG. The pip packages (faiss-cpu, faiss-gpu) are separate — installing one doesn’t install the other. The GPU version requires a matching CUDA installation and specific NVIDIA driver versions.
Unlike higher-level libraries, FAISS gives you direct control over the index structure: flat (exact, slow), IVF (partitioned, needs training), HNSW (graph-based, very fast), and product quantization (compressed, lossy). Picking wrong means either slow queries or poor recall.
Diagnostic Timeline
FAISS failures rarely look like errors. They look like bad results, slow queries, or mysterious crashes. Here is how an experienced engineer narrows down a FAISS issue.
Minute 0: First guess, rebuild the index. Recall looks wrong, so you delete the index file and re-add all vectors. The new index has the same problem. Rebuilding rarely fixes FAISS because the index structure is rarely the issue; vector preparation almost always is.
Minute 5: Verify dtype and contiguity. Run vectors.dtype and vectors.flags['C_CONTIGUOUS'] on what you actually pass to index.add(). Depending on the FAISS version, float64, integer, or non-contiguous arrays are rejected, silently copied, or misread. Slicing operations (x[::2], x.T) produce non-contiguous views. Wrap in np.ascontiguousarray(x.astype(np.float32)) before any add or search.
Minute 10: Check normalization for cosine. If you wanted cosine similarity but used IndexFlatIP without calling faiss.normalize_L2() on both index vectors and queries, the scores are dot products, not cosines. Magnitude differences swamp angle differences and the top-k becomes “longest vectors” instead of “most similar.” This is the most common cause of “the same FAISS code worked in my notebook but ranks differently in production”: the notebook had normalized, production did not.
Minute 20: IVF training set audit. If you train an IVF index on too few vectors, the centroids barely cover your space and recall collapses. Check index.is_trained and verify training used at least 39 * nlist vectors of the same distribution as your data. Training on random noise or a non-representative sample produces useless quantizers.
Minute 30: GPU vs CPU index mismatch. An IndexHNSWFlat cannot run on GPU. A GPU index cannot be serialized with write_index directly. Mixing them is the cause of AssertionError: GPU index not supported and “saved index file is unusable” reports. Check faiss.get_num_gpus() and confirm the index type you picked is GPU-compatible before moving anything to GPU.
Minute 45: nprobe too low for IVF. Default nprobe=1 searches only one cluster, so on a 100-cluster index each query scans roughly 1% of the vectors and misses every neighbor that landed in an adjacent cluster. Raise nprobe to 10-32 for production. If near-identical queries return wildly different neighbors, the cluster boundaries are biting and nprobe is the lever.
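The dtype, contiguity, and normalization checks from the timeline can be folded into one helper that runs before every add or search. This is a minimal sketch: prepare_vectors is a hypothetical name, not a FAISS function, and the normalization is done in NumPy as a stand-in for faiss.normalize_L2.

```python
import numpy as np


def prepare_vectors(x, normalize=False):
    """Return a float32, C-contiguous 2-D array, optionally unit-normalized.

    Hypothetical helper (not part of FAISS). With normalize=True it scales
    each row to unit length, which is what cosine search via inner product needs.
    """
    arr = np.ascontiguousarray(np.asarray(x), dtype=np.float32)
    if arr.ndim == 1:
        arr = arr.reshape(1, -1)  # treat a single vector as a batch of one
    if arr.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {arr.shape}")
    if not np.isfinite(arr).all():
        raise ValueError("vectors contain NaN or Inf")
    if normalize:
        norms = np.linalg.norm(arr, axis=1, keepdims=True)
        norms[norms == 0] = 1.0  # leave all-zero rows untouched
        arr = arr / norms
    return arr


# A float64, strided view: exactly the kind of input the timeline warns about.
raw = np.arange(12, dtype=np.float64).reshape(4, 3)[::2]
clean = prepare_vectors(raw, normalize=True)
print(clean.dtype, clean.flags["C_CONTIGUOUS"], clean.shape)
```

Calling the same helper on both the indexed vectors and the queries keeps the notebook and production paths from drifting apart.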
Fix 1: Installing FAISS
CPU install (recommended for most):
pip install faiss-cpu
GPU install (requires CUDA):
pip install faiss-gpu # For CUDA 11.x
# Or via conda (often easier for GPU)
conda install -c pytorch faiss-gpu
Verify install:
import faiss
print(faiss.__version__)
# 1.8.0
# For GPU, check GPU availability
print(faiss.get_num_gpus()) # 0 if CPU-only build or no GPU
Common install errors:
ImportError: cannot import name 'swigfaiss' from 'faiss'
Usually caused by an incomplete install or mixed environments. Fix:
pip uninstall faiss faiss-cpu faiss-gpu
pip install --no-cache-dir faiss-cpu
On Apple Silicon (M1/M2/M3), faiss-cpu works but has no GPU support:
pip install faiss-cpu
# Apple's Metal GPU is not supported; FAISS runs on the CPU
For Windows, only the CPU version is officially supported:
pip install faiss-cpu
# faiss-gpu Windows wheels aren't published
Common Mistake: Installing both faiss-cpu and faiss-gpu in the same environment. They conflict: the import resolves to one or the other unpredictably. Pick one and stick with it.
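To spot a mixed install without reading pip output by eye, you can ask the standard library which of these distributions are present. This is a rough sketch: installed_faiss_packages is a hypothetical helper, and the candidate names are just the ones mentioned in this section.

```python
from importlib import metadata


def installed_faiss_packages(candidates=("faiss", "faiss-cpu", "faiss-gpu")):
    """Return {distribution name: version} for each candidate that is installed."""
    found = {}
    for name in candidates:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # not installed in this environment
    return found


found = installed_faiss_packages()
if len(found) > 1:
    print("Conflicting FAISS packages:", found)
else:
    print("FAISS packages found:", found)
```

If more than one name comes back, uninstall all of them and reinstall a single package as shown above.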
Fix 2: Choosing the Right Index Type
FAISS offers many index types. The choice depends on dataset size and recall/speed trade-off:
| Index | Description | Best for |
|---|---|---|
| IndexFlatL2 | Exact brute-force, L2 distance | <100k vectors, or when perfect recall is required |
| IndexFlatIP | Exact brute-force, inner product | Same as above with cosine-equivalent search (normalize first) |
| IndexIVFFlat | Inverted file with flat storage | 100k–10M, needs training |
| IndexIVFPQ | Inverted file + product quantization | 10M+, memory-constrained |
| IndexHNSWFlat | HNSW graph, full precision | 100k–10M, incremental inserts, no training |
| IndexHNSWPQ | HNSW + product quantization | Very large datasets with memory limits |
| IndexLSH | Locality-sensitive hashing | Rarely used; lower quality |
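The table can be read as a small decision function. The sketch below returns a string for faiss.index_factory rather than building the index, so it runs without FAISS installed. The 100k and 10M thresholds come from the table; the concrete parameters (HNSW32, PQ32, nlist near sqrt(n)) are illustrative assumptions, and suggest_index_spec is a hypothetical name.

```python
import math


def suggest_index_spec(n_vectors, exact=False, memory_constrained=False):
    """Map dataset size to an index_factory description string.

    Assumed starting points only: tune M, nlist and the PQ size on your data.
    """
    if exact or n_vectors < 100_000:
        return "Flat"                      # brute force, perfect recall
    nlist = max(1, math.isqrt(n_vectors))  # cluster count near sqrt(n)
    if memory_constrained or n_vectors > 10_000_000:
        return f"IVF{nlist},PQ32"          # compressed; dim must be divisible by 32
    return "HNSW32"                        # graph index, no training step


for n in (50_000, 1_000_000, 50_000_000):
    print(n, suggest_index_spec(n))
```

The returned string would then be passed as faiss.index_factory(dim, spec). Treat the result as a first guess to benchmark, not a final choice.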
Flat index (exact search):
import faiss
import numpy as np
dim = 1536
index = faiss.IndexFlatL2(dim) # L2 distance, exact
# Add vectors (must be float32)
vectors = np.random.rand(10000, dim).astype(np.float32)
index.add(vectors)
# Search
queries = np.random.rand(5, dim).astype(np.float32)
distances, indices = index.search(queries, k=10) # Top 10 per query
print(distances.shape) # (5, 10)
print(indices.shape) # (5, 10)
For cosine similarity, normalize vectors and use inner product:
# Normalize to unit length
faiss.normalize_L2(vectors) # In-place
index = faiss.IndexFlatIP(dim) # Inner product = cosine for unit vectors
index.add(vectors)
# Normalize query too
faiss.normalize_L2(queries)
distances, indices = index.search(queries, k=10)
# distances now contain cosine similarity (higher = more similar)
HNSW index, a good default for medium-to-large datasets:
index = faiss.IndexHNSWFlat(dim, 32) # M=32 (graph connectivity)
index.hnsw.efConstruction = 64 # Build-time quality
index.hnsw.efSearch = 32 # Query-time recall (higher = slower, better recall)
index.add(vectors)
distances, indices = index.search(queries, k=10)
IVF index, which needs training but scales to large datasets:
nlist = 100 # Number of clusters; rule of thumb: sqrt(n_vectors)
quantizer = faiss.IndexFlatL2(dim) # The quantizer defines how centroids are computed
index = faiss.IndexIVFFlat(quantizer, dim, nlist)
# TRAIN first — requires representative sample
train_vectors = vectors[:10000] # At least 39 * nlist vectors recommended
index.train(train_vectors)
# Then add
index.add(vectors)
# Query
index.nprobe = 10 # How many clusters to search at query time
distances, indices = index.search(queries, k=10)
Pro Tip: Start with IndexFlatL2 while prototyping. Move to IndexHNSWFlat when query time becomes a bottleneck (usually around 100k+ vectors). Only use IVFPQ when memory, not speed, becomes the limiting factor. Premature optimization with IVF/PQ is a common source of worse-than-necessary recall.
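Recall only means something when measured against an exact baseline. One way to do that is to run the same queries through an IndexFlatL2 and through the approximate index, then compare the returned ID arrays. The helper below is a sketch that works on those arrays alone; recall_at_k is a hypothetical name and the sample IDs are made up.

```python
import numpy as np


def recall_at_k(approx_ids, exact_ids):
    """Mean fraction of exact top-k neighbors that the approximate search also returned."""
    approx_ids = np.asarray(approx_ids)
    exact_ids = np.asarray(exact_ids)
    if approx_ids.shape != exact_ids.shape:
        raise ValueError("both results must have shape (n_queries, k)")
    hits = 0
    for approx_row, exact_row in zip(approx_ids, exact_ids):
        wanted = set(exact_row.tolist()) - {-1}  # -1 marks an empty result slot
        hits += len(wanted & set(approx_row.tolist()))
    total = int((exact_ids != -1).sum())
    return hits / total if total else 0.0


# Made-up results for two queries with k=3.
exact = [[0, 1, 2], [3, 4, 5]]
approx = [[0, 2, 9], [5, 4, 3]]
print(f"recall@3 = {recall_at_k(approx, exact):.3f}")
```

In practice exact and approx would be the second return value of search() on the flat and approximate indexes. Order within a row does not matter, only membership.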
Fix 3: Training IVF Indexes
RuntimeError: Quantizer is not trained
IVF and PQ indexes cluster vectors into groups at build time. Training requires representative data: all-zero or constant vectors break the clustering.
import faiss
import numpy as np
dim = 1536
nlist = 100
quantizer = faiss.IndexFlatL2(dim)
index = faiss.IndexIVFFlat(quantizer, dim, nlist)
# WRONG: training on random noise
train_data = np.random.rand(1000, dim).astype(np.float32)
# No error, but clustering quality suffers; use real embeddings
# CORRECT: use a sample of your actual data
# load_real_embeddings is a placeholder for your own loader (float32, shape (n, dim))
real_data = load_real_embeddings(sample_size=50000)
index.train(real_data)
# Training size rule of thumb:
# - Minimum: 39 * nlist (FAISS warns below this)
# - Good: 256 * nlist
# - Full: 1000 * nlist if you have the data (by default FAISS subsamples to 256 points per centroid)
Training on GPU (much faster for large training sets):
# Move quantizer to GPU for training
res = faiss.StandardGpuResources()
cpu_quantizer = faiss.IndexFlatL2(dim)
gpu_quantizer = faiss.index_cpu_to_gpu(res, 0, cpu_quantizer)
index = faiss.IndexIVFFlat(gpu_quantizer, dim, nlist)
index.train(train_data)
index.add(vectors)
is_trained flag to check:
print(index.is_trained) # True after training
Training is NOT required for:
- IndexFlatL2, IndexFlatIP (exact search, no clustering)
- IndexHNSWFlat (graph-based, built incrementally)
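The training-size rule of thumb is easy to check before calling train(). The sketch below classifies a sample size against the 39 and 256 points-per-centroid bounds quoted above. training_size_status and its labels are hypothetical, not FAISS output.

```python
def training_size_status(n_train, nlist, min_per_centroid=39, max_per_centroid=256):
    """Classify a training-set size for an IVF index with nlist clusters."""
    if n_train < nlist:
        return "too_few"  # fewer training vectors than centroids
    if n_train < min_per_centroid * nlist:
        return "low"      # below the minimum rule of thumb; expect a warning
    if n_train <= max_per_centroid * nlist:
        return "ok"
    return "plenty"       # beyond the "good" bound; extra data adds little


for n_train, nlist in [(100, 256), (1_000, 100), (10_000, 100), (50_000, 100)]:
    print(n_train, nlist, training_size_status(n_train, nlist))
```

The first case mirrors the "clustering 100 points to 256 centroids" warning from the top of this article. Lowering nlist or collecting a larger sample moves it out of the too_few bucket.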
Fix 4: Matching Query and Index Precision
All vectors must be float32 contiguous NumPy arrays. FAISS doesn’t accept float64, integer, or non-contiguous arrays.
import numpy as np
import faiss
# WRONG: float64 (default NumPy dtype)
vectors = np.random.rand(1000, 1536) # Default float64
index.add(vectors) # TypeError or weird behavior, depending on the FAISS version
# WRONG: non-contiguous (from slicing or transpose)
vectors = large_matrix[::2] # Every other row: a strided, non-contiguous view
index.add(vectors)
# CORRECT: float32, contiguous
vectors = np.ascontiguousarray(vectors.astype(np.float32))
index.add(vectors)
Convert PyTorch tensors:
import torch
import numpy as np
tensor = torch.randn(1000, 1536)
vectors = tensor.detach().cpu().numpy().astype(np.float32) # detach() is required if the tensor tracks gradients
vectors = np.ascontiguousarray(vectors)
index.add(vectors)
Batch additions for memory efficiency:
def add_in_batches(index, vectors, batch_size=10000):
    for i in range(0, len(vectors), batch_size):
        batch = vectors[i:i + batch_size].astype(np.float32)
        batch = np.ascontiguousarray(batch)
        index.add(batch)
add_in_batches(index, my_large_matrix)
Fix 5: GPU Acceleration
GPU FAISS is dramatically faster for large-batch queries. But not all index types support GPU, and the speedup varies.
import faiss
# CPU index first
cpu_index = faiss.IndexFlatL2(dim)
# Move to GPU
res = faiss.StandardGpuResources() # Default GPU resources
gpu_index = faiss.index_cpu_to_gpu(res, 0, cpu_index) # GPU 0
# Add and search as usual — all on GPU now
gpu_index.add(vectors.astype(np.float32))
distances, indices = gpu_index.search(queries.astype(np.float32), k=10)
Multi-GPU for very large indexes:
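gpu_index = faiss.index_cpu_to_all_gpus(cpu_index)
# By default this replicates the index on every available GPU (more throughput, same capacity).
# To split one large index across GPUs, pass co=faiss.GpuMultipleClonerOptions() with co.shard = True.
GPU index types (not all indexes work on GPU):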
| Index | GPU Support |
|---|---|
| IndexFlatL2, IndexFlatIP | Yes (fastest on GPU) |
| IndexIVFFlat, IndexIVFPQ | Yes |
| IndexHNSW* | No, HNSW is CPU-only |
| IndexLSH | No |
Move back to CPU for serialization:
cpu_index = faiss.index_gpu_to_cpu(gpu_index)
faiss.write_index(cpu_index, "my_index.faiss")
Common Mistake: Trying to save a GPU index directly with write_index. GPU indexes must be moved back to CPU first. Otherwise FAISS either errors or saves an unusable file.
Fix 6: Saving and Loading Indexes
import faiss
# Save
faiss.write_index(index, "my_index.faiss")
# Load
index = faiss.read_index("my_index.faiss")
# Query as normal
distances, indices = index.search(queries, k=10)
With metadata mapping (FAISS only stores vectors, not your IDs or payload):
import faiss
import numpy as np
import pickle
class VectorStore:
    def __init__(self, dim):
        self.index = faiss.IndexFlatL2(dim)
        self.id_to_payload = {} # Maps FAISS position → your data
    def add(self, vectors, ids, payloads):
        start = self.index.ntotal
        self.index.add(vectors.astype(np.float32))
        for i, (id_, payload) in enumerate(zip(ids, payloads)):
            self.id_to_payload[start + i] = {"id": id_, "payload": payload}
    def search(self, query, k=10):
        distances, indices = self.index.search(query.astype(np.float32), k)
        results = []
        for i, idx in enumerate(indices[0]):
            if idx == -1: # No more results
                break
            results.append({
                "distance": float(distances[0][i]),
                **self.id_to_payload[int(idx)],
            })
        return results
    def save(self, path):
        faiss.write_index(self.index, f"{path}.faiss")
        with open(f"{path}.pkl", "wb") as f:
            pickle.dump(self.id_to_payload, f)
    def load(self, path):
        self.index = faiss.read_index(f"{path}.faiss")
        with open(f"{path}.pkl", "rb") as f:
            self.id_to_payload = pickle.load(f)
store = VectorStore(1536)
store.add(vectors, ids=[...], payloads=[...])
store.save("my_store")
IndexIDMap wrapper for attaching integer IDs:
index = faiss.IndexFlatL2(dim)
index_with_ids = faiss.IndexIDMap(index)
ids = np.array([1001, 1002, 1003], dtype=np.int64)
vectors = np.random.rand(3, dim).astype(np.float32)
index_with_ids.add_with_ids(vectors, ids)
query = np.random.rand(1, dim).astype(np.float32)
distances, retrieved_ids = index_with_ids.search(query, k=3)
# retrieved_ids contains 1001, 1002, 1003 (not 0, 1, 2)
Fix 7: Distance Thresholds and Result Quality
distances, indices = index.search(query, k=10)
# Filter by threshold (lower distance = more similar for L2)
threshold = 0.5
filtered = [
    (idx, dist)
    for idx, dist in zip(indices[0], distances[0])
    if dist < threshold and idx != -1
]
Range search (find all vectors within a distance):
# FAISS 1.6+
lims, distances, indices = index.range_search(query, 0.5) # Pass the radius positionally; for L2 indexes it is a squared distance
# lims[i] to lims[i+1] gives the range in distances/indices for query i
Not all indexes support range search: IndexFlatL2 and IndexIVF* do, HNSW does not.
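The flat lims/distances/indices layout is easier to work with once it is split per query. The sketch below does that with synthetic arrays standing in for real range_search output; split_range_results is a hypothetical helper and the values are made up.

```python
import numpy as np


def split_range_results(lims, distances, indices):
    """Turn flat range_search output into one (ids, distances) pair per query."""
    lims = np.asarray(lims)
    per_query = []
    for i in range(len(lims) - 1):
        start, stop = int(lims[i]), int(lims[i + 1])
        per_query.append((indices[start:stop], distances[start:stop]))
    return per_query


# Synthetic output for two queries: the first matched 2 vectors, the second 1.
lims = np.array([0, 2, 3])
distances = np.array([0.10, 0.42, 0.30], dtype=np.float32)
indices = np.array([7, 3, 12], dtype=np.int64)

for q, (ids, dists) in enumerate(split_range_results(lims, distances, indices)):
    print(f"query {q}: ids={ids.tolist()} n_hits={len(dists)}")
```

Unlike search(), the number of hits varies per query, and a query with no neighbor inside the radius yields an empty slice rather than -1 padding.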
Score interpretation:
- IndexFlatL2 / IndexIVF*: lower distance = more similar (squared L2)
- IndexFlatIP: higher score = more similar
- After normalization + IP: score is cosine similarity [-1, 1]
Convert L2 distance to cosine similarity (only valid for normalized vectors):
# For unit-normalized vectors only
cosine_similarity = 1 - (l2_distance_squared / 2)
Fix 8: Combining with LangChain and LlamaIndex
FAISS is a default backend for many higher-level libraries — using it through those is often easier than direct FAISS.
LangChain:
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
# Build index
texts = ["doc 1", "doc 2", "doc 3"]
vector_store = FAISS.from_texts(texts, embedding=embeddings)
# Search
results = vector_store.similarity_search("query", k=5)
# Save and load
vector_store.save_local("faiss_index")
loaded = FAISS.load_local(
    "faiss_index",
    embeddings,
    allow_dangerous_deserialization=True, # Required in newer versions
)
For LangChain setup and common errors, see LangChain Python not working.
LlamaIndex:
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.faiss import FaissVectorStore
import faiss
dim = 1536
faiss_index = faiss.IndexFlatL2(dim)
vector_store = FaissVectorStore(faiss_index=faiss_index)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
# Persist
index.storage_context.persist(persist_dir="./faiss_storage")
For LlamaIndex setup patterns, see LlamaIndex not working.
Still Not Working?
FAISS vs Managed Vector Databases
- FAISS: Library, runs in-process. Best for embedded use cases, research, or when you want full control.
- Chroma: Wraps an HNSW index (hnswlib) and adds persistence and metadata. See ChromaDB not working.
- Qdrant / Pinecone: Dedicated vector databases with filtering, namespaces, APIs. See Pinecone not working.
- pgvector: Vectors in Postgres, best for SQL integration.
FAISS is often the right building block for custom systems; managed databases are better when you want batteries included.
Benchmarking Index Types
import time
def benchmark_search(index, queries, k=10, n_runs=100):
    # Warm up
    for _ in range(5):
        index.search(queries[:1], k)
    start = time.perf_counter()
    for _ in range(n_runs):
        index.search(queries, k)
    elapsed = time.perf_counter() - start
    qps = (n_runs * len(queries)) / elapsed
    print(f"QPS: {qps:.1f}")
# Compare (flat, hnsw, ivf are indexes you have already built)
for name, index in [("Flat", flat), ("HNSW", hnsw), ("IVF", ivf)]:
    print(name)
    benchmark_search(index, queries)
Memory-Efficient Indexes with Product Quantization
For billion-scale datasets:
# Compress to 32 bytes per vector (from 6144 bytes for 1536 float32 dims)
pq = faiss.IndexPQ(dim, 32, 8) # M=32 subquantizers x 8 bits each = 32 bytes per code
pq.train(train_vectors)
pq.add(vectors)
# Typically combined with IVF for even better memory/speed
ivfpq = faiss.IndexIVFPQ(quantizer, dim, 1024, 32, 8) # nlist=1024, M=32, nbits=8; constructor arguments are positional
ivfpq.train(train_vectors)
ivfpq.add(vectors)
Debugging Zero Results
If search returns -1 indices or empty results:
- Check index.ntotal: how many vectors are in the index
- Confirm vectors added successfully (no silent failures)
- Check query dtype is float32
- For IVF indexes, increase nprobe: too low misses clusters containing relevant results
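The checklist above can be run as one function. The sketch below only reads FAISS-style attributes (ntotal, d, is_trained, nprobe), so the demo uses a plain stand-in object instead of a real index. diagnose_search and its messages are hypothetical.

```python
from types import SimpleNamespace

import numpy as np


def diagnose_search(index, queries, k=10):
    """Return likely reasons why index.search(queries, k) comes back empty or padded with -1."""
    problems = []
    queries = np.asarray(queries)
    if index.ntotal == 0:
        problems.append("index is empty (ntotal == 0)")
    elif index.ntotal < k:
        problems.append(f"only {index.ntotal} vectors stored; slots beyond that return -1")
    if not getattr(index, "is_trained", True):
        problems.append("index is not trained")
    if queries.dtype != np.float32:
        problems.append(f"query dtype is {queries.dtype}, expected float32")
    if queries.ndim != 2 or queries.shape[1] != index.d:
        problems.append(f"query shape {queries.shape} does not match (n, {index.d})")
    nprobe = getattr(index, "nprobe", None)  # only IVF-style indexes expose nprobe
    if nprobe is not None and nprobe <= 1:
        problems.append("nprobe is 1; the probed cluster may hold fewer than k vectors")
    return problems


# Stand-in for an empty IVF index queried with float64 vectors.
empty_ivf = SimpleNamespace(ntotal=0, d=4, is_trained=True, nprobe=1)
report = diagnose_search(empty_ivf, np.zeros((2, 4)), k=5)
for line in report:
    print("-", line)
```

With a real index you would pass the FAISS object directly. An empty list means none of these checks fired, not that the results are good.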
Memory Spikes During Index Build
Building an HNSW index uses 2-3x the final index size in peak memory because the graph construction holds candidate neighbor lists. A 4 GB final index can spike to 12 GB during build. If your build crashes with OOM but the saved index is small, this is the cause. Lower efConstruction and M to reduce build memory, or add vectors in smaller batches with index.add so the full input matrix never has to sit in memory next to the graph.
Concurrent Search Returning Inconsistent Results
FAISS indexes are not thread-safe for concurrent writes during reads. If your service hot-swaps the index while requests are in flight, queries can return half-old, half-new results or crash with segfaults. Use a read-write lock or double-buffer the index — build the new one off to the side, then atomically swap the pointer when ready.
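A double-buffered holder can be as small as the sketch below. It assumes the new index is fully built before swap() is called and that a retired index is never modified afterwards. IndexHolder and _FakeIndex are hypothetical names, and the fake index only stands in for an object with a FAISS-like search method.

```python
import threading


class IndexHolder:
    """Holds the live index and lets a fully built replacement be swapped in."""

    def __init__(self, index):
        self._index = index
        self._lock = threading.Lock()

    def search(self, queries, k):
        with self._lock:
            index = self._index          # grab a reference to the current index
        return index.search(queries, k)  # search outside the lock

    def swap(self, new_index):
        """Publish new_index and return the retired one."""
        with self._lock:
            old, self._index = self._index, new_index
        return old


class _FakeIndex:
    """Stand-in with a FAISS-like search method, used only for this demo."""

    def __init__(self, tag):
        self.tag = tag

    def search(self, queries, k):
        return self.tag, k


holder = IndexHolder(_FakeIndex("v1"))
before = holder.search(None, 3)
retired = holder.swap(_FakeIndex("v2"))
after = holder.search(None, 3)
print(before, after, retired.tag)
```

Requests that started before the swap keep their reference to the old index and finish against it, so no query ever sees a half-updated structure. The lock is held only while the reference is read or replaced, never during the search itself.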
Index Loaded But ntotal Is Zero
If faiss.read_index() succeeds but index.ntotal == 0, the file is likely a quantizer-only IVF index whose data was never added, or it was saved before index.add() completed. The save/load cycle does not warn about this. Confirm the source pipeline calls write_index after add, not after train.
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.