Fix: FAISS Not Working — Import Errors, Index Selection, and GPU Setup
Quick Answer
How to fix FAISS errors — ImportError cannot import name swigfaiss, faiss-gpu vs faiss-cpu install, IndexFlatL2 slow on large data, IVF training required, index serialization write_index, and dimension mismatch.
The Error
You install FAISS and the import crashes:

```text
ImportError: cannot import name 'swigfaiss' from 'faiss'
ModuleNotFoundError: No module named 'faiss'
```

Or GPU FAISS doesn't detect your CUDA device:

```text
RuntimeError: No GPU resources available
AssertionError: GPU index not supported: faiss.IndexFlatL2
```

Or search is unbearably slow on a million vectors:

```python
index = faiss.IndexFlatL2(1536)
index.add(big_matrix)
# Query takes 5 seconds on 1M vectors — not usable
```

Or you train an IVF index without enough data:

```text
WARNING: clustering 100 points to 256 centroids
RuntimeError: Quantizer is not trained
```

Or you save an index and it won't load:

```text
TypeError: write_index() missing 1 required positional argument
```

FAISS (Facebook AI Similarity Search) is the foundation of modern vector search — many vector databases build on it or on the same algorithms under the hood. It's extremely fast but opinionated: you must pick the right index type for your scale, configure training for compressed indexes, and match the numeric precision of your vectors. This guide covers each failure mode.
Why This Happens
FAISS is a C++ library with Python bindings generated by SWIG. The pip packages (faiss-cpu, faiss-gpu) are separate — installing one doesn’t install the other. The GPU version requires a matching CUDA installation and specific NVIDIA driver versions.
Unlike higher-level libraries, FAISS gives you direct control over the index structure: flat (exact, slow), IVF (partitioned, needs training), HNSW (graph-based, very fast), and product quantization (compressed, lossy). Picking wrong means either slow queries or poor recall.
Fix 1: Installing FAISS
CPU install (recommended for most):

```bash
pip install faiss-cpu
```

GPU install — requires CUDA:

```bash
pip install faiss-gpu  # For CUDA 11.x
# Or via conda (often easier for GPU)
conda install -c pytorch faiss-gpu
```

Verify install:

```python
import faiss
print(faiss.__version__)
# 1.8.0

# For GPU, check GPU availability
print(faiss.get_num_gpus())  # 0 if CPU-only build or no GPU
```

Common install errors:

```text
ImportError: cannot import name 'swigfaiss' from 'faiss'
```

Usually caused by an incomplete install or mixed environments. Fix:

```bash
pip uninstall faiss faiss-cpu faiss-gpu
pip install --no-cache-dir faiss-cpu
```

On Apple Silicon (M1/M2/M3), faiss-cpu works but has no GPU support:

```bash
pip install faiss-cpu
# Apple's Metal GPU is not supported — falls back to CPU
```

For Windows, only the CPU version is officially supported:

```bash
pip install faiss-cpu
# faiss-gpu Windows wheels aren't published
```

Common Mistake: Installing both faiss-cpu and faiss-gpu in the same environment. They conflict — the imports resolve to one or the other unpredictably. Pick one and stick with it.
Fix 2: Choosing the Right Index Type
FAISS offers many index types. The choice depends on dataset size and recall/speed trade-off:
| Index | Description | Best for |
|---|---|---|
| IndexFlatL2 | Exact brute-force, L2 distance | <100k vectors, or when perfect recall is required |
| IndexFlatIP | Exact brute-force, inner product | Same as above with cosine-equivalent search (normalize first) |
| IndexIVFFlat | Inverted file with flat storage | 100k–10M, needs training |
| IndexIVFPQ | Inverted file + product quantization | 10M+, memory-constrained |
| IndexHNSWFlat | HNSW graph, full precision | 100k–10M, incremental inserts, no training |
| IndexHNSWPQ | HNSW + product quantization | Very large datasets with memory limits |
| IndexLSH | Locality-sensitive hashing | Rarely used; lower quality |
Flat index (exact search):

```python
import faiss
import numpy as np

dim = 1536
index = faiss.IndexFlatL2(dim)  # L2 distance, exact

# Add vectors (must be float32)
vectors = np.random.rand(10000, dim).astype(np.float32)
index.add(vectors)

# Search
queries = np.random.rand(5, dim).astype(np.float32)
distances, indices = index.search(queries, k=10)  # Top 10 per query
print(distances.shape)  # (5, 10)
print(indices.shape)    # (5, 10)
```

For cosine similarity, normalize vectors and use inner product:
```python
# Normalize to unit length
faiss.normalize_L2(vectors)  # In-place
index = faiss.IndexFlatIP(dim)  # Inner product = cosine for unit vectors
index.add(vectors)

# Normalize query too
faiss.normalize_L2(queries)
distances, indices = index.search(queries, k=10)
# distances now contain cosine similarity (higher = more similar)
```

HNSW index — great default for medium-to-large datasets:
```python
index = faiss.IndexHNSWFlat(dim, 32)  # M=32 (graph connectivity)
index.hnsw.efConstruction = 64  # Build-time quality
index.hnsw.efSearch = 32        # Query-time recall (higher = slower, better recall)
index.add(vectors)
distances, indices = index.search(queries, k=10)
```

IVF index — needs training but scales to large datasets:
```python
nlist = 100  # Number of clusters; rule of thumb: sqrt(n_vectors)
quantizer = faiss.IndexFlatL2(dim)  # The quantizer assigns vectors to centroids
index = faiss.IndexIVFFlat(quantizer, dim, nlist)

# TRAIN first — requires a representative sample
train_vectors = vectors[:10000]  # At least 39 * nlist vectors recommended
index.train(train_vectors)

# Then add
index.add(vectors)

# Query
index.nprobe = 10  # How many clusters to search at query time
distances, indices = index.search(queries, k=10)
```

Pro Tip: Start with IndexFlatL2 while prototyping. Move to IndexHNSWFlat when query time becomes a bottleneck (usually around 100k+ vectors). Only use IVFPQ when memory, not speed, becomes the limiting factor. Premature optimization with IVF/PQ is a common source of worse-than-necessary recall.
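The size-based decision rule above can be sketched as a tiny helper (thresholds are rough guidelines from the table, not hard limits; the function name is mine):

```python
def suggest_index(n_vectors, memory_constrained=False):
    """Rough index choice by dataset size, mirroring the table above."""
    if n_vectors < 100_000:
        return "IndexFlatL2"    # exact search is fast enough at this scale
    if memory_constrained:
        return "IndexIVFPQ"     # compressed, lossy, needs training
    return "IndexHNSWFlat"      # fast, incremental inserts, no training

print(suggest_index(50_000))                               # IndexFlatL2
print(suggest_index(5_000_000))                            # IndexHNSWFlat
print(suggest_index(50_000_000, memory_constrained=True))  # IndexIVFPQ
```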
Fix 3: Training IVF Indexes
```text
RuntimeError: Quantizer is not trained
```

IVF and PQ indexes cluster vectors into groups at build time. Training requires representative data — all-zeros vectors or constant data breaks the clustering.

```python
import faiss
import numpy as np

dim = 1536
nlist = 100
quantizer = faiss.IndexFlatL2(dim)
index = faiss.IndexIVFFlat(quantizer, dim, nlist)

# WRONG — training on random noise
train_data = np.random.rand(1000, dim).astype(np.float32)
# No error, but clustering quality suffers — use real embeddings

# CORRECT — use a sample of your actual data
real_data = load_real_embeddings(sample_size=50000)  # your own loader
index.train(real_data)

# Training size rule of thumb:
# - Minimum: 39 * nlist (FAISS warns below this)
# - Good: 256 * nlist
# - Full: 1000 * nlist if you have the data
```

Training on GPU (much faster for large training sets):
```python
# Move the coarse quantizer to GPU to accelerate clustering during training
res = faiss.StandardGpuResources()
cpu_quantizer = faiss.IndexFlatL2(dim)
gpu_quantizer = faiss.index_cpu_to_gpu(res, 0, cpu_quantizer)
index = faiss.IndexIVFFlat(gpu_quantizer, dim, nlist)
index.train(train_data)
index.add(vectors)
```

Check the is_trained flag:
```python
print(index.is_trained)  # True after training
```

Training is NOT required for:

- IndexFlatL2, IndexFlatIP (exact search, no clustering)
- IndexHNSWFlat (graph-based, built incrementally)
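The 39/256/1000 rule of thumb translates into a trivial helper for sizing your training sample (the function name is mine):

```python
def recommended_training_size(nlist, level="good"):
    """Training-set size per the 39 / 256 / 1000 * nlist rule of thumb."""
    factors = {"minimum": 39, "good": 256, "full": 1000}
    return factors[level] * nlist

print(recommended_training_size(100, "minimum"))  # 3900 (FAISS warns below this)
print(recommended_training_size(100))             # 25600
```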
Fix 4: Matching Query and Index Precision
All vectors must be float32, C-contiguous NumPy arrays. FAISS doesn't accept float64, integer, or non-contiguous arrays.

```python
import numpy as np
import faiss

# WRONG — float64 (default NumPy dtype)
vectors = np.random.rand(1000, 1536)  # Default float64
index.add(vectors)  # TypeError or weird behavior

# WRONG — non-contiguous (from slicing or transpose)
vectors = large_matrix[::2]  # Every other row — may be non-contiguous
index.add(vectors)

# CORRECT — float32, contiguous
vectors = np.ascontiguousarray(vectors.astype(np.float32))
index.add(vectors)
```

Convert PyTorch tensors:
```python
import torch
import numpy as np

tensor = torch.randn(1000, 1536)
vectors = tensor.cpu().numpy().astype(np.float32)
vectors = np.ascontiguousarray(vectors)
index.add(vectors)
```

Batch additions for memory efficiency:
```python
def add_in_batches(index, vectors, batch_size=10000):
    for i in range(0, len(vectors), batch_size):
        batch = vectors[i:i + batch_size].astype(np.float32)
        batch = np.ascontiguousarray(batch)
        index.add(batch)

add_in_batches(index, my_large_matrix)
```

Fix 5: GPU Acceleration
GPU FAISS is dramatically faster for large-batch queries. But not all index types support GPU, and the speedup varies.
```python
import faiss

# CPU index first
cpu_index = faiss.IndexFlatL2(dim)

# Move to GPU
res = faiss.StandardGpuResources()  # Default GPU resources
gpu_index = faiss.index_cpu_to_gpu(res, 0, cpu_index)  # GPU 0

# Add and search as usual — all on GPU now
gpu_index.add(vectors.astype(np.float32))
distances, indices = gpu_index.search(queries.astype(np.float32), k=10)
```

Multi-GPU for very large indexes:

```python
gpu_index = faiss.index_cpu_to_all_gpus(cpu_index)
# Automatically shards across all available GPUs
```

GPU index types — not all indexes work on GPU:
| Index | GPU Support |
|---|---|
| IndexFlatL2, IndexFlatIP | Yes (fastest on GPU) |
| IndexIVFFlat, IndexIVFPQ | Yes |
| IndexHNSW* | No — HNSW is CPU-only |
| IndexLSH | No |
Move back to CPU for serialization:
```python
cpu_index = faiss.index_gpu_to_cpu(gpu_index)
faiss.write_index(cpu_index, "my_index.faiss")
```

Common Mistake: Trying to save a GPU index directly with write_index. GPU indexes must be moved back to CPU first. Otherwise FAISS either errors or saves an unusable file.
Fix 6: Saving and Loading Indexes
```python
import faiss

# Save
faiss.write_index(index, "my_index.faiss")

# Load
index = faiss.read_index("my_index.faiss")

# Query as normal
distances, indices = index.search(queries, k=10)
```

With metadata mapping (FAISS only stores vectors, not your IDs or payload):
```python
import faiss
import numpy as np
import pickle

class VectorStore:
    def __init__(self, dim):
        self.index = faiss.IndexFlatL2(dim)
        self.id_to_payload = {}  # Maps FAISS position → your data

    def add(self, vectors, ids, payloads):
        start = self.index.ntotal
        self.index.add(vectors.astype(np.float32))
        for i, (id_, payload) in enumerate(zip(ids, payloads)):
            self.id_to_payload[start + i] = {"id": id_, "payload": payload}

    def search(self, query, k=10):
        distances, indices = self.index.search(query.astype(np.float32), k)
        results = []
        for i, idx in enumerate(indices[0]):
            if idx == -1:  # No more results
                break
            results.append({
                "distance": float(distances[0][i]),
                **self.id_to_payload[int(idx)],
            })
        return results

    def save(self, path):
        faiss.write_index(self.index, f"{path}.faiss")
        with open(f"{path}.pkl", "wb") as f:
            pickle.dump(self.id_to_payload, f)

    def load(self, path):
        self.index = faiss.read_index(f"{path}.faiss")
        with open(f"{path}.pkl", "rb") as f:
            self.id_to_payload = pickle.load(f)

store = VectorStore(1536)
store.add(vectors, ids=[...], payloads=[...])
store.save("my_store")
```

IndexIDMap wrapper for attaching integer IDs:
```python
index = faiss.IndexFlatL2(dim)
index_with_ids = faiss.IndexIDMap(index)

ids = np.array([1001, 1002, 1003], dtype=np.int64)
vectors = np.random.rand(3, dim).astype(np.float32)
index_with_ids.add_with_ids(vectors, ids)

distances, retrieved_ids = index_with_ids.search(query, k=10)
# retrieved_ids contains 1001, 1002, 1003 (not 0, 1, 2)
```

Fix 7: Distance Thresholds and Result Quality
```python
distances, indices = index.search(query, k=10)

# Filter by threshold (lower distance = more similar for L2)
threshold = 0.5
filtered = [
    (idx, dist)
    for idx, dist in zip(indices[0], distances[0])
    if dist < threshold and idx != -1
]
```

Range search — find all vectors within a distance:
```python
# FAISS 1.6+ — the second argument is the search radius
lims, distances, indices = index.range_search(query, 0.5)
# lims[i] to lims[i+1] gives the range in distances/indices for query i
```

Not all indexes support range search — IndexFlatL2 and IndexIVF* do, HNSW does not.
Score interpretation:

- IndexFlatL2 / IndexIVF*: lower distance = more similar (squared L2)
- IndexFlatIP: higher score = more similar
- After normalization + IP: score is cosine similarity in [-1, 1]
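For unit vectors the two scores are tied together: squared L2 distance equals 2 - 2 * cosine. A quick NumPy check of that identity:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=128)
b = rng.normal(size=128)
a /= np.linalg.norm(a)  # unit-normalize, as faiss.normalize_L2 would
b /= np.linalg.norm(b)

l2_sq = float(np.sum((a - b) ** 2))  # what IndexFlatL2 reports
cos = float(a @ b)                   # what IndexFlatIP reports for unit vectors

print(np.isclose(l2_sq, 2 - 2 * cos))  # True
```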
Convert L2 distance to cosine similarity (only valid for normalized vectors):

```python
# For unit-normalized vectors only
cosine_similarity = 1 - (l2_distance_squared / 2)
```

Fix 8: Combining with LangChain and LlamaIndex
FAISS is a default backend for many higher-level libraries — using it through those is often easier than direct FAISS.
LangChain:

```python
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Build index
texts = ["doc 1", "doc 2", "doc 3"]
vector_store = FAISS.from_texts(texts, embedding=embeddings)

# Search
results = vector_store.similarity_search("query", k=5)

# Save and load
vector_store.save_local("faiss_index")
loaded = FAISS.load_local(
    "faiss_index",
    embeddings,
    allow_dangerous_deserialization=True,  # Required in newer versions
)
```

For LangChain setup and common errors, see LangChain Python not working.
LlamaIndex:

```python
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.faiss import FaissVectorStore
import faiss

dim = 1536
faiss_index = faiss.IndexFlatL2(dim)
vector_store = FaissVectorStore(faiss_index=faiss_index)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

# Persist
index.storage_context.persist(persist_dir="./faiss_storage")
```

For LlamaIndex setup patterns, see LlamaIndex not working.
Still Not Working?
FAISS vs Managed Vector Databases
- FAISS — Library, runs in-process. Best for embedded use cases, research, or when you want full control.
- Chroma — Embedded vector database built on HNSW (hnswlib), adds persistence and metadata. See ChromaDB not working.
- Qdrant / Pinecone — Dedicated vector databases with filtering, namespaces, APIs. See Qdrant not working and Pinecone not working.
- pgvector — Vector in Postgres, best for SQL integration. See pgvector not working.
FAISS is often the right building block for custom systems; managed databases are better when you want batteries included.
Benchmarking Index Types
```python
import time

def benchmark_search(index, queries, k=10, n_runs=100):
    # Warm up
    for _ in range(5):
        index.search(queries[:1], k)
    start = time.perf_counter()
    for _ in range(n_runs):
        index.search(queries, k)
    elapsed = time.perf_counter() - start
    qps = (n_runs * len(queries)) / elapsed
    print(f"QPS: {qps:.1f}")

# Compare
for name, index in [("Flat", flat), ("HNSW", hnsw), ("IVF", ivf)]:
    print(name)
    benchmark_search(index, queries)
```

Memory-Efficient Indexes with Product Quantization
For billion-scale datasets:
```python
# Compress to 32 bytes per vector (from 6144 bytes for 1536 float32 dims)
pq = faiss.IndexPQ(dim, 32, 8)  # M=32 subquantizers, 8 bits each
pq.train(train_vectors)
pq.add(vectors)

# Typically combined with IVF for even better memory/speed
ivfpq = faiss.IndexIVFPQ(quantizer, dim, 1024, 32, 8)  # nlist=1024, M=32, 8 bits
ivfpq.train(train_vectors)
ivfpq.add(vectors)
```

Debugging Zero Results
If search returns -1 indices or empty results:
- Check index.ntotal — how many vectors are in the index
- Confirm vectors added successfully (no silent failures)
- Check query dtype is float32
- For IVF indexes, increase nprobe — too low misses clusters containing relevant results
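The checklist above can be bundled into a pre-search validator. A sketch (the function name and messages are mine):

```python
import numpy as np

def query_problems(query, index_dim, index_ntotal):
    """Return likely reasons a FAISS search yields empty or -1 results."""
    problems = []
    if index_ntotal == 0:
        problems.append("index is empty (ntotal == 0)")
    q = np.asarray(query)
    if q.ndim != 2:
        problems.append(f"query must be 2-D (n_queries, dim), got {q.ndim}-D")
    elif q.shape[1] != index_dim:
        problems.append(f"dimension mismatch: query {q.shape[1]} vs index {index_dim}")
    if q.dtype != np.float32:
        problems.append(f"dtype should be float32, got {q.dtype}")
    return problems

# Example: a float64 query against an empty index flags both issues
print(query_problems(np.random.rand(1, 1536), index_dim=1536, index_ntotal=0))
```

Run it before index.search while debugging; an empty list means the obvious failure modes are ruled out.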
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.