Fix: FAISS Not Working — Import Errors, Index Selection, and GPU Setup

How to fix FAISS errors — ImportError cannot import name swigfaiss, faiss-gpu vs faiss-cpu install, IndexFlatL2 slow on large data, IVF training required, index serialization write_index, and dimension mismatch.

The Error

You install FAISS and the import crashes:

ImportError: cannot import name 'swigfaiss' from 'faiss'
ModuleNotFoundError: No module named 'faiss'

Or GPU FAISS doesn’t detect your CUDA device:

RuntimeError: No GPU resources available
AssertionError: GPU index not supported: faiss.IndexFlatL2

Or search is unbearably slow on a million vectors:

index = faiss.IndexFlatL2(1536)
index.add(big_matrix)
# Query takes 5 seconds on 1M vectors — not usable

Or you train an IVF index without enough data:

WARNING: clustering 100 points to 256 centroids
RuntimeError: Quantizer is not trained

Or you save an index and it won’t load:

TypeError: write_index() missing 1 required positional argument

FAISS (Facebook AI Similarity Search) is a widely used foundation for vector search: frameworks such as LangChain and LlamaIndex ship FAISS-backed vector stores. It’s extremely fast but opinionated: you must pick the right index type for your scale, train partitioned and compressed indexes before adding data, and pass vectors as float32. This guide covers each failure mode.

Why This Happens

FAISS is a C++ library with Python bindings generated by SWIG. The pip packages (faiss-cpu, faiss-gpu) are separate — installing one doesn’t install the other. The GPU version requires a matching CUDA installation and specific NVIDIA driver versions.

Unlike higher-level libraries, FAISS gives you direct control over the index structure: flat (exact, slow), IVF (partitioned, needs training), HNSW (graph-based, very fast), and product quantization (compressed, lossy). Picking wrong means either slow queries or poor recall.

Diagnostic Timeline

FAISS failures rarely look like errors. They look like bad results, slow queries, or mysterious crashes. Here is how an experienced engineer narrows down a FAISS issue.

Minute 0 — First guess: rebuild the index. Recall looks wrong, so you delete the index file and re-add all vectors. The new index has the same problem. Rebuilding rarely fixes FAISS because the index structure is rarely the issue — vector preparation almost always is.

Minute 5 — Verify dtype and contiguity. Run vectors.dtype and vectors.flags['C_CONTIGUOUS'] on what you actually pass to index.add(). FAISS stores float32 only. Depending on the FAISS version, the Python wrapper either rejects float64, integer, or non-contiguous input with a TypeError or assertion, or converts it on every call, which costs a hidden copy. Strided slices and transposes (x[::2], x.T) produce non-contiguous views. Wrap in np.ascontiguousarray(x, dtype=np.float32) before any add or search so the conversion happens once, where you can see it.
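The dtype and contiguity check can be wrapped in one small helper. This is a minimal sketch, not part of FAISS: the name prepare_vectors and its dim argument are made up for illustration, and it only relies on NumPy.

```python
import numpy as np

def prepare_vectors(x, dim):
    """Return x as a 2-D, C-contiguous float32 array of width dim."""
    arr = np.asarray(x)
    if arr.ndim == 1:
        arr = arr.reshape(1, -1)  # treat a single vector as a batch of one
    if arr.ndim != 2 or arr.shape[1] != dim:
        raise ValueError(f"expected shape (n, {dim}), got {arr.shape}")
    # One explicit conversion: copies only if dtype or layout is wrong
    return np.ascontiguousarray(arr, dtype=np.float32)

raw = np.arange(24, dtype=np.float64).reshape(6, 4)
view = raw[::2]  # strided view: float64 and not C-contiguous
clean = prepare_vectors(view, dim=4)
print(view.flags["C_CONTIGUOUS"], clean.flags["C_CONTIGUOUS"], clean.dtype)
```

Calling the helper on both the indexed vectors and the queries keeps the two code paths identical, which is usually where notebook and production diverge.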

Minute 10 — Check normalization for cosine. If you wanted cosine similarity but used IndexFlatIP without calling faiss.normalize_L2() on both index vectors and queries, the scores are dot products, not cosines. Magnitude differences swamp angle differences and the top-k becomes “longest vectors” instead of “most similar.” This is the most common cause of “the same FAISS code worked in my notebook but ranks differently in production” — the notebook had normalized, production did not.
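The "longest vectors win" effect is easy to reproduce without FAISS. The sketch below uses plain NumPy matrix products as a stand-in for what an inner-product index scores; l2_normalize is a hypothetical helper that mimics faiss.normalize_L2 but returns a copy instead of editing in place.

```python
import numpy as np

def l2_normalize(x):
    """NumPy stand-in for faiss.normalize_L2 (returns a copy, not in-place)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)  # guard against zero vectors

docs = np.array([[10.0, 0.0],    # long vector, different direction from the query
                 [0.6, 0.8]],    # short vector, same direction as the query
                dtype=np.float32)
query = np.array([[0.6, 0.8]], dtype=np.float32)

raw_scores = query @ docs.T                              # unnormalized inner product
cos_scores = l2_normalize(query) @ l2_normalize(docs).T  # cosine similarity

best_raw = int(np.argmax(raw_scores))   # the long vector wins on magnitude
best_cos = int(np.argmax(cos_scores))   # the aligned vector wins on angle
print(best_raw, best_cos)
```

If the two rankings disagree on your own data the same way, normalization is missing on at least one side.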

Minute 20 — IVF training set audit. If you train an IVF index on too few vectors, the centroids barely cover your space and recall collapses. Check index.is_trained and verify training used at least 39 * nlist vectors of the same distribution as your data. Training on random noise or a non-representative sample produces useless quantizers.

Minute 30 — GPU vs CPU index mismatch. An IndexHNSWFlat cannot run on GPU. A GPU index cannot be serialized with write_index directly. Mixing them is the cause of AssertionError: GPU index not supported and “saved index file is unusable” reports. Check faiss.get_num_gpus() and confirm the index type you picked is GPU-compatible before moving anything to GPU.

Minute 45 — nprobe too low for IVF. The default nprobe=1 searches only one cluster, so on a 100-cluster index each query scans roughly 1% of the vectors and misses any true neighbor that was assigned to an adjacent cluster. Raise nprobe to 10-32 for production and compare the results against an exact flat index. If near-identical queries return wildly different neighbors, the cluster boundaries are biting and nprobe is the lever.
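One way to pick nprobe is to measure recall against exact results for a sample of queries. The function below is a sketch under that assumption: recall_at_k is a hypothetical helper, and the two id arrays stand in for the second return value of index.search() on an exact flat index and on the IVF index.

```python
import numpy as np

def recall_at_k(approx_ids, exact_ids):
    """Fraction of exact top-k neighbors that the approximate search also returned."""
    approx_ids = np.asarray(approx_ids)
    exact_ids = np.asarray(exact_ids)
    hits = 0
    total = 0
    for approx_row, exact_row in zip(approx_ids, exact_ids):
        exact_set = {int(i) for i in exact_row if i != -1}  # -1 marks "no result"
        hits += len(exact_set.intersection(int(i) for i in approx_row))
        total += len(exact_set)
    return hits / total if total else 0.0

exact = np.array([[3, 7, 9], [1, 2, 4]])    # e.g. ids from an exact flat index
approx = np.array([[3, 9, 5], [1, 2, 4]])   # e.g. ids from IVF at some nprobe
print(recall_at_k(approx, exact))
```

Sweeping nprobe over a few values and stopping where recall flattens out gives a setting grounded in your own data rather than a fixed rule.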

Fix 1: Installing FAISS

CPU install (recommended for most):

pip install faiss-cpu

GPU install — requires CUDA:

pip install faiss-gpu   # For CUDA 11.x

# Or via conda (often easier for GPU)
conda install -c pytorch faiss-gpu

Verify install:

import faiss
print(faiss.__version__)
# e.g. 1.8.0

# For GPU, check GPU availability
print(faiss.get_num_gpus())   # 0 if CPU-only build or no GPU

Common install errors:

ImportError: cannot import name 'swigfaiss' from 'faiss'

Usually caused by an incomplete install or mixed environments. Fix:

pip uninstall faiss faiss-cpu faiss-gpu
pip install --no-cache-dir faiss-cpu

On Apple Silicon (M1/M2/M3), faiss-cpu works but has no GPU support:

pip install faiss-cpu
# Apple's Metal GPU is not supported — falls back to CPU

For Windows, only CPU version is officially supported:

pip install faiss-cpu
# faiss-gpu Windows wheels aren't published

Common Mistake: Installing both faiss-cpu and faiss-gpu in the same environment. They conflict — the imports resolve to one or the other unpredictably. Pick one and stick with it.
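To see which FAISS distributions an environment actually contains, the standard library's importlib.metadata can be queried directly. This is a small sketch; installed_faiss_distributions is an illustrative name, and the default list only covers the package names mentioned in this guide.

```python
from importlib import metadata

def installed_faiss_distributions(names=("faiss", "faiss-cpu", "faiss-gpu")):
    """Return {distribution name: version} for each FAISS package pip can see."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # not installed in this environment
    return found

found = installed_faiss_distributions()
if len(found) > 1:
    print("Conflicting FAISS packages:", found)
elif found:
    print("Single FAISS package:", found)
else:
    print("No FAISS package installed in this environment")
```

Run it with the same interpreter your application uses; a different virtual environment or kernel will report a different answer.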

Fix 2: Choosing the Right Index Type

FAISS offers many index types. The choice depends on dataset size and recall/speed trade-off:

Index	Description	Best for
`IndexFlatL2`	Exact brute-force, L2 distance	<100k vectors, or when perfect recall is required
`IndexFlatIP`	Exact brute-force, inner product	Same as above with cosine-equivalent search (normalize first)
`IndexIVFFlat`	Inverted file with flat storage	100k–10M, needs training
`IndexIVFPQ`	Inverted file + product quantization	10M+, memory-constrained
`IndexHNSWFlat`	HNSW graph, full precision	100k–10M, incremental inserts, no training
`IndexHNSWPQ`	HNSW + product quantization	Very large datasets with memory limits
`IndexLSH`	Locality-sensitive hashing	Rarely used; lower quality

Flat index (exact search):

import faiss
import numpy as np

dim = 1536
index = faiss.IndexFlatL2(dim)   # L2 distance, exact

# Add vectors (must be float32)
vectors = np.random.rand(10000, dim).astype(np.float32)
index.add(vectors)

# Search
queries = np.random.rand(5, dim).astype(np.float32)
distances, indices = index.search(queries, k=10)   # Top 10 per query

print(distances.shape)   # (5, 10)
print(indices.shape)     # (5, 10)

For cosine similarity, normalize vectors and use inner product:

# Normalize to unit length
faiss.normalize_L2(vectors)   # In-place

index = faiss.IndexFlatIP(dim)   # Inner product = cosine for unit vectors
index.add(vectors)

# Normalize query too
faiss.normalize_L2(queries)
distances, indices = index.search(queries, k=10)
# distances now contain cosine similarity (higher = more similar)

HNSW index — great default for medium-to-large datasets:

index = faiss.IndexHNSWFlat(dim, 32)   # M=32 (graph connectivity)
index.hnsw.efConstruction = 64   # Build-time quality
index.hnsw.efSearch = 32          # Query-time recall (higher = slower, better recall)

index.add(vectors)
distances, indices = index.search(queries, k=10)

IVF index — needs training but scales to large datasets:

nlist = 100   # Number of clusters; rule of thumb: sqrt(n_vectors)
quantizer = faiss.IndexFlatL2(dim)   # Coarse quantizer: assigns each vector to its nearest centroid
index = faiss.IndexIVFFlat(quantizer, dim, nlist)

# TRAIN first — requires representative sample
train_vectors = vectors[:10000]   # At least 39 * nlist vectors recommended
index.train(train_vectors)

# Then add
index.add(vectors)

# Query
index.nprobe = 10   # How many clusters to search at query time
distances, indices = index.search(queries, k=10)

Pro Tip: Start with IndexFlatL2 while prototyping. Move to IndexHNSWFlat when query time becomes a bottleneck (usually around 100k+ vectors). Only use IVFPQ when memory, not speed, becomes the limiting factor. Premature optimization with IVF/PQ is a common source of worse-than-necessary recall.

Fix 3: Training IVF Indexes

RuntimeError: Quantizer is not trained

IVF and PQ indexes cluster vectors into groups at build time. Training requires representative data — all-zeros vectors or constant data breaks the clustering.

import faiss
import numpy as np

dim = 1536
nlist = 100
quantizer = faiss.IndexFlatL2(dim)
index = faiss.IndexIVFFlat(quantizer, dim, nlist)

# WRONG — training on random noise
train_data = np.random.rand(1000, dim).astype(np.float32)
# No error, but clustering quality suffers — use real embeddings

# CORRECT — use a sample of your actual data
real_data = load_real_embeddings(sample_size=50000)
index.train(real_data)

# Training size rule of thumb:
# - Minimum: 39 * nlist (FAISS warns below this)
# - Good: 256 * nlist
# - Beyond 256 * nlist: k-means subsamples to 256 points per centroid
#   by default, so extra training vectors are mostly ignored
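The per-centroid rule of thumb can be checked before calling train(). The helper below is a sketch in plain Python; training_size_status and its labels are invented for illustration, and the 39 and 256 thresholds are taken from the rule of thumb above.

```python
def training_size_status(n_train, nlist, minimum=39, good=256):
    """Classify a training-set size against the per-centroid rule of thumb."""
    if n_train < nlist:
        return "too few"          # fewer training vectors than centroids
    if n_train < minimum * nlist:
        return "below minimum"    # expect a clustering warning
    if n_train < good * nlist:
        return "acceptable"
    return "good"

print(training_size_status(100, 256))       # the "100 points to 256 centroids" case
print(training_size_status(10_000, 100))
print(training_size_status(50_000, 100))
```

If the status is below "acceptable", either gather more training vectors or lower nlist.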

Training on GPU (much faster for large training sets):

# Move quantizer to GPU for training
res = faiss.StandardGpuResources()
cpu_quantizer = faiss.IndexFlatL2(dim)
gpu_quantizer = faiss.index_cpu_to_gpu(res, 0, cpu_quantizer)

index = faiss.IndexIVFFlat(gpu_quantizer, dim, nlist)
index.train(train_data)
index.add(vectors)

Check the is_trained flag:

print(index.is_trained)   # True after training

Training is NOT required for:

IndexFlatL2, IndexFlatIP (exact search, no clustering)
IndexHNSWFlat (graph-based, built incrementally)

Fix 4: Matching Query and Index Precision

All vectors must be float32, C-contiguous NumPy arrays. Depending on the FAISS version, float64, integer, or non-contiguous input either raises an error or is copied and converted on every call.

import numpy as np
import faiss

# WRONG — float64 (default NumPy dtype)
vectors = np.random.rand(1000, 1536)   # Default float64
index.add(vectors)   # TypeError on older FAISS; a hidden float32 copy on newer

# WRONG — non-contiguous (from slicing or transpose)
vectors = large_matrix[::2]   # Every other row: a strided, non-contiguous view
index.add(vectors)

# CORRECT — float32, contiguous
vectors = np.ascontiguousarray(vectors.astype(np.float32))
index.add(vectors)

Convert PyTorch tensors:

import torch
import numpy as np

tensor = torch.randn(1000, 1536)
vectors = tensor.detach().cpu().numpy().astype(np.float32)   # detach() is required if the tensor tracks gradients
vectors = np.ascontiguousarray(vectors)
index.add(vectors)

Batch additions for memory efficiency:

def add_in_batches(index, vectors, batch_size=10000):
    for i in range(0, len(vectors), batch_size):
        batch = vectors[i:i + batch_size].astype(np.float32)
        batch = np.ascontiguousarray(batch)
        index.add(batch)

add_in_batches(index, my_large_matrix)

Fix 5: GPU Acceleration

GPU FAISS is dramatically faster for large-batch queries. But not all index types support GPU, and the speedup varies.

import faiss
import numpy as np

# CPU index first
cpu_index = faiss.IndexFlatL2(dim)

# Move to GPU
res = faiss.StandardGpuResources()   # Default GPU resources
gpu_index = faiss.index_cpu_to_gpu(res, 0, cpu_index)   # GPU 0

# Add and search as usual — all on GPU now
gpu_index.add(vectors.astype(np.float32))
distances, indices = gpu_index.search(queries.astype(np.float32), k=10)

Multi-GPU for very large indexes:

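gpu_index = faiss.index_cpu_to_all_gpus(cpu_index)
# Default: replicates the whole index on every available GPU (more throughput).
# To split the data across GPUs instead, pass a faiss.GpuMultipleClonerOptions
# with shard = True as the co argument.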

GPU index types — not all indexes work on GPU:

Index	GPU Support
`IndexFlatL2`, `IndexFlatIP`	Yes (fastest on GPU)
`IndexIVFFlat`, `IndexIVFPQ`	Yes
`IndexHNSW*`	No — HNSW is CPU-only
`IndexLSH`	No

Move back to CPU for serialization:

cpu_index = faiss.index_gpu_to_cpu(gpu_index)
faiss.write_index(cpu_index, "my_index.faiss")

Common Mistake: Trying to save a GPU index directly with write_index. GPU indexes must be moved back to CPU first. Otherwise FAISS either errors or saves an unusable file.

Fix 6: Saving and Loading Indexes

import faiss

# Save
faiss.write_index(index, "my_index.faiss")

# Load
index = faiss.read_index("my_index.faiss")

# Query as normal
distances, indices = index.search(queries, k=10)

With metadata mapping (FAISS only stores vectors, not your IDs or payload):

import faiss
import numpy as np
import pickle

class VectorStore:
    def __init__(self, dim):
        self.index = faiss.IndexFlatL2(dim)
        self.id_to_payload = {}   # Maps FAISS position → your data

    def add(self, vectors, ids, payloads):
        start = self.index.ntotal
        self.index.add(vectors.astype(np.float32))
        for i, (id_, payload) in enumerate(zip(ids, payloads)):
            self.id_to_payload[start + i] = {"id": id_, "payload": payload}

    def search(self, query, k=10):
        distances, indices = self.index.search(query.astype(np.float32), k)
        results = []
        for i, idx in enumerate(indices[0]):
            if idx == -1:   # No more results
                break
            results.append({
                "distance": float(distances[0][i]),
                **self.id_to_payload[int(idx)],
            })
        return results

    def save(self, path):
        faiss.write_index(self.index, f"{path}.faiss")
        with open(f"{path}.pkl", "wb") as f:
            pickle.dump(self.id_to_payload, f)

    def load(self, path):
        self.index = faiss.read_index(f"{path}.faiss")
        with open(f"{path}.pkl", "rb") as f:
            self.id_to_payload = pickle.load(f)

store = VectorStore(1536)
store.add(vectors, ids=[...], payloads=[...])
store.save("my_store")

IndexIDMap wrapper for attaching integer IDs:

index = faiss.IndexFlatL2(dim)
index_with_ids = faiss.IndexIDMap(index)

ids = np.array([1001, 1002, 1003], dtype=np.int64)
vectors = np.random.rand(3, dim).astype(np.float32)
index_with_ids.add_with_ids(vectors, ids)

query = np.random.rand(1, dim).astype(np.float32)
distances, retrieved_ids = index_with_ids.search(query, k=3)
# retrieved_ids contains 1001, 1002, 1003 in some order (not 0, 1, 2)

Fix 7: Distance Thresholds and Result Quality

distances, indices = index.search(query, k=10)

# Filter by threshold (lower distance = more similar for L2)
threshold = 0.5
filtered = [
    (idx, dist)
    for idx, dist in zip(indices[0], distances[0])
    if dist < threshold and idx != -1
]

Range search — find all vectors within a distance:

# FAISS 1.6+
lims, distances, indices = index.range_search(query, 0.5)   # radius, passed positionally; squared L2 for L2 indexes
# lims[i] to lims[i+1] gives the range in distances/indices for query i

Not all indexes support range search. IndexFlatL2 and IndexIVF* do; HNSW support depends on the FAISS version, so test it on the release you run.
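The flat lims/distances/indices layout is easier to work with once it is split per query. The sketch below uses hand-made arrays in that layout rather than real search output, and split_range_results is a hypothetical helper name.

```python
import numpy as np

def split_range_results(lims, distances, indices):
    """Turn the flat range-search output into one (ids, distances) pair per query."""
    results = []
    for i in range(len(lims) - 1):
        start, end = int(lims[i]), int(lims[i + 1])
        results.append((indices[start:end], distances[start:end]))
    return results

# Stand-in arrays: query 0 matched two vectors, query 1 none, query 2 one
lims = np.array([0, 2, 2, 3])
distances = np.array([0.10, 0.42, 0.07], dtype=np.float32)
indices = np.array([17, 4, 99])

per_query = split_range_results(lims, distances, indices)
for qi, (ids, dists) in enumerate(per_query):
    print(qi, ids.tolist(), dists.tolist())
```

Note that lims has one more entry than there are queries, and a query with no matches shows up as two equal consecutive values.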

Score interpretation:

IndexFlatL2 / IndexIVF*: lower distance = more similar (squared L2)
IndexFlatIP: higher score = more similar
After normalization + IP: score is cosine similarity [-1, 1]

Convert L2 distance to cosine similarity (only valid for normalized vectors):

# For unit-normalized vectors only
cosine_similarity = 1 - (l2_distance_squared / 2)
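The conversion follows from expanding the squared distance: for unit vectors, ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a·b = 2 - 2 cos. A quick numerical check with NumPy, using random vectors as stand-ins for real embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=(5, 8)).astype(np.float32)
b = rng.normal(size=(5, 8)).astype(np.float32)

# Unit-normalize both sides; the identity does not hold otherwise
a /= np.linalg.norm(a, axis=1, keepdims=True)
b /= np.linalg.norm(b, axis=1, keepdims=True)

l2_squared = np.sum((a - b) ** 2, axis=1)   # squared L2 distance per pair
cosine = np.sum(a * b, axis=1)              # inner product of unit vectors

identity_holds = np.allclose(cosine, 1 - l2_squared / 2, atol=1e-5)
print(identity_holds)
```

The same check on unnormalized vectors fails, which is a quick way to confirm whether a pipeline is normalizing where you think it is.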

Fix 8: Combining with LangChain and LlamaIndex

FAISS is a default backend for many higher-level libraries — using it through those is often easier than direct FAISS.

LangChain:

from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Build index
texts = ["doc 1", "doc 2", "doc 3"]
vector_store = FAISS.from_texts(texts, embedding=embeddings)

# Search
results = vector_store.similarity_search("query", k=5)

# Save and load
vector_store.save_local("faiss_index")
loaded = FAISS.load_local(
    "faiss_index",
    embeddings,
    allow_dangerous_deserialization=True,   # Required in newer versions
)

For LangChain setup and common errors, see LangChain Python not working.

LlamaIndex:

from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.faiss import FaissVectorStore
import faiss

dim = 1536
faiss_index = faiss.IndexFlatL2(dim)
vector_store = FaissVectorStore(faiss_index=faiss_index)

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

# Persist
index.storage_context.persist(persist_dir="./faiss_storage")

For LlamaIndex setup patterns, see LlamaIndex not working.

Still Not Working?

FAISS vs Managed Vector Databases

FAISS — Library, runs in-process. Best for embedded use cases, research, or when you want full control.
Chroma — Built on an HNSW index (hnswlib rather than FAISS), adds persistence and metadata. See ChromaDB not working.
Qdrant / Pinecone — Dedicated vector databases with filtering, namespaces, APIs. See Pinecone not working.
pgvector — Vector in Postgres, best for SQL integration.

FAISS is often the right building block for custom systems; managed databases are better when you want batteries included.

Benchmarking Index Types

import time

def benchmark_search(index, queries, k=10, n_runs=100):
    # Warm up
    for _ in range(5):
        index.search(queries[:1], k)

    start = time.perf_counter()
    for _ in range(n_runs):
        index.search(queries, k)
    elapsed = time.perf_counter() - start

    qps = (n_runs * len(queries)) / elapsed
    print(f"QPS: {qps:.1f}")

# Compare
for name, index in [("Flat", flat), ("HNSW", hnsw), ("IVF", ivf)]:
    print(name)
    benchmark_search(index, queries)

Memory-Efficient Indexes with Product Quantization

For billion-scale datasets:

# Compress to 32 bytes per vector (from 6144 bytes for 1536 float32 dims)
pq = faiss.IndexPQ(dim, 32, 8)   # M=32 subquantizers, 8 bits each
pq.train(train_vectors)
pq.add(vectors)

# Typically combined with IVF for even better memory/speed
# Constructor arguments are positional: quantizer, dim, nlist, M, nbits
ivfpq = faiss.IndexIVFPQ(quantizer, dim, 1024, 32, 8)
ivfpq.train(train_vectors)
ivfpq.add(vectors)
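The memory saving is simple arithmetic, sketched here in plain Python. The two helper names are made up for illustration, and the figures cover only the vector codes, not IVF id lists or other index overhead.

```python
import math

def flat_bytes_per_vector(dim):
    """Uncompressed float32 storage: 4 bytes per dimension."""
    return 4 * dim

def pq_bytes_per_vector(m, nbits):
    """PQ code size: m sub-quantizer codes of nbits each, rounded up to whole bytes."""
    return math.ceil(m * nbits / 8)

dim, m, nbits = 1536, 32, 8
assert dim % m == 0, "dim must be divisible by the number of sub-quantizers"

flat = flat_bytes_per_vector(dim)
code = pq_bytes_per_vector(m, nbits)
print(flat, code, flat // code)   # bytes uncompressed, bytes as PQ code, compression ratio
```

Raising m improves accuracy at the cost of a proportionally larger code, so this calculation is worth redoing before settling on parameters.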

Debugging Zero Results

If search returns -1 indices or empty results:

Check index.ntotal — how many vectors are in the index
Confirm vectors added successfully (no silent failures)
Check query dtype is float32
For IVF indexes, increase nprobe — too low misses clusters containing relevant results
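The checklist above can be turned into a small diagnostic. This is a sketch: diagnose_empty_results is a hypothetical helper that reads the ntotal, d, is_trained, and (for IVF) nprobe attributes, and the example runs against a stand-in object with the same attribute names so it works without FAISS installed.

```python
import numpy as np
from types import SimpleNamespace

def diagnose_empty_results(index, queries):
    """Return a list of likely reasons a search came back with -1 ids."""
    problems = []
    if index.ntotal == 0:
        problems.append("index is empty (ntotal == 0)")
    if not getattr(index, "is_trained", True):
        problems.append("index is not trained")
    if queries.dtype != np.float32:
        problems.append(f"query dtype is {queries.dtype}, expected float32")
    if queries.ndim != 2 or queries.shape[1] != index.d:
        problems.append(f"query shape {queries.shape} does not match dim {index.d}")
    nprobe = getattr(index, "nprobe", None)   # only IVF indexes have nprobe
    if nprobe is not None and nprobe <= 1:
        problems.append("nprobe is 1; try a larger value")
    return problems

# Stand-in object, not a real FAISS index
fake_index = SimpleNamespace(ntotal=0, d=4, is_trained=True, nprobe=1)
problems = diagnose_empty_results(fake_index, np.zeros((1, 4)))
for problem in problems:
    print(problem)
```

An empty list does not prove the index is healthy; it only rules out the common causes listed here.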

Memory Spikes During Index Build

Building an HNSW index can use 2-3x the final index size in peak memory because graph construction holds candidate neighbor lists. A 4 GB final index can spike to 12 GB during build. If your build crashes with OOM even though the finished index would fit in memory, this is the likely cause. Lower efConstruction and M to reduce build memory, or add vectors in smaller batches with repeated index.add() calls (IndexHNSWFlat does not implement add_with_ids unless it is wrapped in IndexIDMap).

Concurrent Search Returning Inconsistent Results

FAISS indexes are not thread-safe for concurrent writes during reads. If your service hot-swaps the index while requests are in flight, queries can return half-old, half-new results or crash with segfaults. Use a read-write lock or double-buffer the index — build the new one off to the side, then atomically swap the pointer when ready.
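The double-buffer approach can be sketched with the standard library alone. IndexHolder is a hypothetical wrapper, not a FAISS class, and FakeIndex stands in for a real index so the example runs anywhere; it assumes the new index is fully built and never modified after the swap.

```python
import threading

class IndexHolder:
    """Holds the live index; readers never see a half-built replacement."""

    def __init__(self, index):
        self._index = index
        self._lock = threading.Lock()

    def search(self, queries, k):
        with self._lock:
            index = self._index           # grab the current reference
        return index.search(queries, k)   # search outside the lock

    def swap(self, new_index):
        """Publish a fully built index; returns the old one so it can be released."""
        with self._lock:
            old, self._index = self._index, new_index
        return old

class FakeIndex:
    """Stand-in with a search method, so the sketch runs without FAISS."""
    def __init__(self, label):
        self.label = label

    def search(self, queries, k):
        return self.label

holder = IndexHolder(FakeIndex("v1"))
before = holder.search(None, 10)
old = holder.swap(FakeIndex("v2"))
after = holder.search(None, 10)
print(before, after, old.label)
```

In-flight searches keep their reference to the old index until they finish, so the old index should only be freed once those requests have drained.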

Index Loaded But `ntotal` Is Zero

If faiss.read_index() succeeds but index.ntotal == 0, the file is likely a quantizer-only IVF index whose data was never added, or it was saved before index.add() completed. The save/load cycle does not warn about this. Confirm the source pipeline calls write_index after add, not after train.
