
Fix: FAISS Not Working — Import Errors, Index Selection, and GPU Setup

FixDevs ·

Quick Answer

How to fix common FAISS errors: ImportError: cannot import name 'swigfaiss', choosing between faiss-cpu and faiss-gpu, slow IndexFlatL2 search on large datasets, required IVF training, index serialization with write_index, and dimension mismatches.

The Error

You install FAISS and the import crashes:

ImportError: cannot import name 'swigfaiss' from 'faiss'
ModuleNotFoundError: No module named 'faiss'

Or GPU FAISS doesn’t detect your CUDA device:

RuntimeError: No GPU resources available
AssertionError: GPU index not supported: faiss.IndexHNSWFlat

Or search is unbearably slow on a million vectors:

index = faiss.IndexFlatL2(1536)
index.add(big_matrix)
# Query takes 5 seconds on 1M vectors — not usable

Or you train an IVF index without enough data:

WARNING: clustering 100 points to 256 centroids
RuntimeError: Quantizer is not trained

Or you save an index and it won’t load:

TypeError: write_index() missing 1 required positional argument

FAISS (Facebook AI Similarity Search) is a foundational library for modern vector search, and many higher-level tools build on it. It is extremely fast but opinionated: you must pick the right index type for your scale, configure training for compressed indexes, and match the numeric precision of your vectors. This guide covers each failure mode in turn.

Why This Happens

FAISS is a C++ library with Python bindings generated by SWIG. The pip packages (faiss-cpu, faiss-gpu) are separate — installing one doesn’t install the other. The GPU version requires a matching CUDA installation and specific NVIDIA driver versions.

Unlike higher-level libraries, FAISS gives you direct control over the index structure: flat (exact, slow), IVF (partitioned, needs training), HNSW (graph-based, very fast), and product quantization (compressed, lossy). Picking wrong means either slow queries or poor recall.
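Because faiss, faiss-cpu, and faiss-gpu are separate pip distributions, a quick way to diagnose mixed environments is to query pip metadata directly. A stdlib-only sketch (the helper name installed_faiss_dists is mine, not a FAISS API):

```python
from importlib import metadata

def installed_faiss_dists():
    """Return (name, version) for every FAISS pip distribution present."""
    found = []
    for dist in ("faiss-cpu", "faiss-gpu", "faiss"):
        try:
            found.append((dist, metadata.version(dist)))
        except metadata.PackageNotFoundError:
            pass
    return found

# More than one entry means conflicting installs: uninstall all, reinstall one.
print(installed_faiss_dists())
```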

Fix 1: Installing FAISS

CPU install (recommended for most):

pip install faiss-cpu

GPU install — requires CUDA:

pip install faiss-gpu   # For CUDA 11.x

# Or via conda (often easier for GPU)
conda install -c pytorch faiss-gpu

Verify install:

import faiss
print(faiss.__version__)
# 1.8.0

# For GPU, check GPU availability
print(faiss.get_num_gpus())   # 0 if CPU-only build or no GPU

Common install errors:

ImportError: cannot import name 'swigfaiss' from 'faiss'

Usually caused by an incomplete install or mixed environments. Fix:

pip uninstall faiss faiss-cpu faiss-gpu
pip install --no-cache-dir faiss-cpu

On Apple Silicon (M1/M2/M3), faiss-cpu works but has no GPU support:

pip install faiss-cpu
# Apple's Metal GPU is not supported — falls back to CPU

For Windows, only CPU version is officially supported:

pip install faiss-cpu
# faiss-gpu Windows wheels aren't published

Common Mistake: Installing both faiss-cpu and faiss-gpu in the same environment. They conflict — the imports resolve to one or the other unpredictably. Pick one and stick with it.

Fix 2: Choosing the Right Index Type

FAISS offers many index types. The choice depends on dataset size and recall/speed trade-off:

Index           | Description                           | Best for
IndexFlatL2     | Exact brute-force, L2 distance        | <100k vectors, or when perfect recall is required
IndexFlatIP     | Exact brute-force, inner product      | Same as above with cosine-equivalent search (normalize first)
IndexIVFFlat    | Inverted file with flat storage       | 100k–10M, needs training
IndexIVFPQ      | Inverted file + product quantization  | 10M+, memory-constrained
IndexHNSWFlat   | HNSW graph, full precision            | 100k–10M, incremental inserts, no training
IndexHNSWPQ     | HNSW + product quantization           | Very large datasets with memory limits
IndexLSH        | Locality-sensitive hashing            | Rarely used; lower quality

Flat index (exact search):

import faiss
import numpy as np

dim = 1536
index = faiss.IndexFlatL2(dim)   # L2 distance, exact

# Add vectors (must be float32)
vectors = np.random.rand(10000, dim).astype(np.float32)
index.add(vectors)

# Search
queries = np.random.rand(5, dim).astype(np.float32)
distances, indices = index.search(queries, k=10)   # Top 10 per query

print(distances.shape)   # (5, 10)
print(indices.shape)     # (5, 10)

For cosine similarity, normalize vectors and use inner product:

# Normalize to unit length
faiss.normalize_L2(vectors)   # In-place

index = faiss.IndexFlatIP(dim)   # Inner product = cosine for unit vectors
index.add(vectors)

# Normalize query too
faiss.normalize_L2(queries)
distances, indices = index.search(queries, k=10)
# distances now contain cosine similarity (higher = more similar)

HNSW index — great default for medium-to-large datasets:

index = faiss.IndexHNSWFlat(dim, 32)   # M=32 (graph connectivity)
index.hnsw.efConstruction = 64   # Build-time quality
index.hnsw.efSearch = 32          # Query-time recall (higher = slower, better recall)

index.add(vectors)
distances, indices = index.search(queries, k=10)

IVF index — needs training but scales to large datasets:

nlist = 100   # Number of clusters; rule of thumb: sqrt(n_vectors)
quantizer = faiss.IndexFlatL2(dim)   # The quantizer defines how centroids are computed
index = faiss.IndexIVFFlat(quantizer, dim, nlist)

# TRAIN first — requires representative sample
train_vectors = vectors[:10000]   # At least 39 * nlist vectors recommended
index.train(train_vectors)

# Then add
index.add(vectors)

# Query
index.nprobe = 10   # How many clusters to search at query time
distances, indices = index.search(queries, k=10)

Pro Tip: Start with IndexFlatL2 while prototyping. Move to IndexHNSWFlat when query time becomes a bottleneck (usually around 100k+ vectors). Only use IVFPQ when memory, not speed, becomes the limiting factor. Premature optimization with IVF/PQ is a common source of worse-than-necessary recall.

Fix 3: Training IVF Indexes

RuntimeError: Quantizer is not trained

IVF and PQ indexes cluster vectors into groups at build time. Training requires representative data — all-zeros vectors or constant data breaks the clustering.

import faiss
import numpy as np

dim = 1536
nlist = 100
quantizer = faiss.IndexFlatL2(dim)
index = faiss.IndexIVFFlat(quantizer, dim, nlist)

# WRONG — training on random noise
train_data = np.random.rand(1000, dim).astype(np.float32)
# No error, but clustering quality suffers — use real embeddings

# CORRECT — use a sample of your actual data
real_data = load_real_embeddings(sample_size=50000)
index.train(real_data)

# Training size rule of thumb:
# - Minimum: 39 * nlist (FAISS warns below this)
# - Good: 256 * nlist
# - Full: 1000 * nlist if you have the data

Training on GPU (much faster for large training sets):

# Move quantizer to GPU for training
res = faiss.StandardGpuResources()
cpu_quantizer = faiss.IndexFlatL2(dim)
gpu_quantizer = faiss.index_cpu_to_gpu(res, 0, cpu_quantizer)

index = faiss.IndexIVFFlat(gpu_quantizer, dim, nlist)
index.train(train_data)
index.add(vectors)

Check the is_trained flag to confirm:

print(index.is_trained)   # True after training

Training is NOT required for:

  • IndexFlatL2, IndexFlatIP (exact search, no clustering)
  • IndexHNSWFlat (graph-based, built incrementally)

Fix 4: Matching Query and Index Precision

All vectors must be float32 contiguous NumPy arrays. FAISS doesn’t accept float64, integer, or non-contiguous arrays.

import numpy as np
import faiss

# WRONG — float64 (default NumPy dtype)
vectors = np.random.rand(1000, 1536)   # Default float64
index.add(vectors)   # TypeError or weird behavior

# WRONG — non-contiguous (from slicing or transpose)
vectors = large_matrix[::2]   # Every other row — may be non-contiguous
index.add(vectors)

# CORRECT — float32, contiguous
vectors = np.ascontiguousarray(vectors.astype(np.float32))
index.add(vectors)

Convert PyTorch tensors:

import torch
import numpy as np

tensor = torch.randn(1000, 1536)
vectors = tensor.cpu().numpy().astype(np.float32)
vectors = np.ascontiguousarray(vectors)
index.add(vectors)

Batch additions for memory efficiency:

def add_in_batches(index, vectors, batch_size=10000):
    for i in range(0, len(vectors), batch_size):
        batch = vectors[i:i + batch_size].astype(np.float32)
        batch = np.ascontiguousarray(batch)
        index.add(batch)

add_in_batches(index, my_large_matrix)

Fix 5: GPU Acceleration

GPU FAISS is dramatically faster for large-batch queries. But not all index types support GPU, and the speedup varies.

import faiss

# CPU index first
cpu_index = faiss.IndexFlatL2(dim)

# Move to GPU
res = faiss.StandardGpuResources()   # Default GPU resources
gpu_index = faiss.index_cpu_to_gpu(res, 0, cpu_index)   # GPU 0

# Add and search as usual — all on GPU now
gpu_index.add(vectors.astype(np.float32))
distances, indices = gpu_index.search(queries.astype(np.float32), k=10)

Multi-GPU for very large indexes:

gpu_index = faiss.index_cpu_to_all_gpus(cpu_index)
# Automatically shards across all available GPUs

GPU index types — not all indexes work on GPU:

Index                     | GPU Support
IndexFlatL2, IndexFlatIP  | Yes (fastest on GPU)
IndexIVFFlat, IndexIVFPQ  | Yes
IndexHNSW*                | No — HNSW is CPU-only
IndexLSH                  | No

Move back to CPU for serialization:

cpu_index = faiss.index_gpu_to_cpu(gpu_index)
faiss.write_index(cpu_index, "my_index.faiss")

Common Mistake: Trying to save a GPU index directly with write_index. GPU indexes must be moved back to CPU first. Otherwise FAISS either errors or saves an unusable file.

Fix 6: Saving and Loading Indexes

import faiss

# Save
faiss.write_index(index, "my_index.faiss")

# Load
index = faiss.read_index("my_index.faiss")

# Query as normal
distances, indices = index.search(queries, k=10)

With metadata mapping (FAISS only stores vectors, not your IDs or payload):

import faiss
import numpy as np
import pickle

class VectorStore:
    def __init__(self, dim):
        self.index = faiss.IndexFlatL2(dim)
        self.id_to_payload = {}   # Maps FAISS position → your data

    def add(self, vectors, ids, payloads):
        start = self.index.ntotal
        self.index.add(vectors.astype(np.float32))
        for i, (id_, payload) in enumerate(zip(ids, payloads)):
            self.id_to_payload[start + i] = {"id": id_, "payload": payload}

    def search(self, query, k=10):
        distances, indices = self.index.search(query.astype(np.float32), k)
        results = []
        for i, idx in enumerate(indices[0]):
            if idx == -1:   # No more results
                break
            results.append({
                "distance": float(distances[0][i]),
                **self.id_to_payload[int(idx)],
            })
        return results

    def save(self, path):
        faiss.write_index(self.index, f"{path}.faiss")
        with open(f"{path}.pkl", "wb") as f:
            pickle.dump(self.id_to_payload, f)

    def load(self, path):
        self.index = faiss.read_index(f"{path}.faiss")
        with open(f"{path}.pkl", "rb") as f:
            self.id_to_payload = pickle.load(f)

store = VectorStore(1536)
store.add(vectors, ids=[...], payloads=[...])
store.save("my_store")

IndexIDMap wrapper for attaching integer IDs:

index = faiss.IndexFlatL2(dim)
index_with_ids = faiss.IndexIDMap(index)

ids = np.array([1001, 1002, 1003], dtype=np.int64)
vectors = np.random.rand(3, dim).astype(np.float32)
index_with_ids.add_with_ids(vectors, ids)

query = np.random.rand(1, dim).astype(np.float32)
distances, retrieved_ids = index_with_ids.search(query, k=3)
# retrieved_ids contains values from ids (1001, 1002, 1003), not positions 0, 1, 2

Fix 7: Distance Thresholds and Result Quality

distances, indices = index.search(query, k=10)

# Filter by threshold (lower distance = more similar for L2)
threshold = 0.5
filtered = [
    (idx, dist)
    for idx, dist in zip(indices[0], distances[0])
    if dist < threshold and idx != -1
]

Range search — find all vectors within a distance:

# FAISS 1.6+
lims, distances, indices = index.range_search(query, radius=0.5)
# lims[i] to lims[i+1] gives the range in distances/indices for query i

Not all indexes support range search — IndexFlatL2 and IndexIVF* do, HNSW does not.

Score interpretation:

  • IndexFlatL2 / IndexIVF*: lower distance = more similar (squared L2)
  • IndexFlatIP: higher score = more similar
  • After normalization + IP: score is cosine similarity [-1, 1]

Convert L2 distance to cosine similarity (only valid for normalized vectors):

# For unit-normalized vectors only
cosine_similarity = 1 - (l2_distance_squared / 2)
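This follows from ||a - b||^2 = 2 - 2·cos(a, b) for unit vectors. A quick numerical check of the identity, NumPy only, no FAISS required:

```python
import numpy as np

# Two random unit-normalized float32 vectors, like normalized embeddings.
rng = np.random.default_rng(0)
a = rng.random(1536).astype(np.float32)
b = rng.random(1536).astype(np.float32)
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)

l2_sq = float(np.sum((a - b) ** 2))   # squared L2, as IndexFlatL2 returns
cosine = float(a @ b)                 # inner product = cosine for unit vectors

print(abs((1 - l2_sq / 2) - cosine) < 1e-4)   # True
```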

Fix 8: Combining with LangChain and LlamaIndex

FAISS is a default backend for many higher-level libraries — using it through those is often easier than direct FAISS.

LangChain:

from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Build index
texts = ["doc 1", "doc 2", "doc 3"]
vector_store = FAISS.from_texts(texts, embedding=embeddings)

# Search
results = vector_store.similarity_search("query", k=5)

# Save and load
vector_store.save_local("faiss_index")
loaded = FAISS.load_local(
    "faiss_index",
    embeddings,
    allow_dangerous_deserialization=True,   # Required in newer versions
)

For LangChain setup and common errors, see LangChain Python not working.

LlamaIndex:

from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.faiss import FaissVectorStore
import faiss

dim = 1536
faiss_index = faiss.IndexFlatL2(dim)
vector_store = FaissVectorStore(faiss_index=faiss_index)

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

# Persist
index.storage_context.persist(persist_dir="./faiss_storage")

For LlamaIndex setup patterns, see LlamaIndex not working.

Still Not Working?

FAISS vs Managed Vector Databases

  • FAISS — Library, runs in-process. Best for embedded use cases, research, or when you want full control.
  • Chroma — Wraps FAISS/HNSW, adds persistence and metadata. See ChromaDB not working.
  • Qdrant / Pinecone — Dedicated vector databases with filtering, namespaces, APIs. See Qdrant not working and Pinecone not working.
  • pgvector — Vector in Postgres, best for SQL integration. See pgvector not working.

FAISS is often the right building block for custom systems; managed databases are better when you want batteries included.

Benchmarking Index Types

import time

def benchmark_search(index, queries, k=10, n_runs=100):
    # Warm up
    for _ in range(5):
        index.search(queries[:1], k)

    start = time.perf_counter()
    for _ in range(n_runs):
        index.search(queries, k)
    elapsed = time.perf_counter() - start

    qps = (n_runs * len(queries)) / elapsed
    print(f"QPS: {qps:.1f}")

# Compare
for name, index in [("Flat", flat), ("HNSW", hnsw), ("IVF", ivf)]:
    print(name)
    benchmark_search(index, queries)

Memory-Efficient Indexes with Product Quantization

For billion-scale datasets:

# Compress to 32 bytes per vector (from 6144 bytes for 1536 float32 dims)
pq = faiss.IndexPQ(dim, 32, 8)   # M=32 subquantizers, 8 bits each
pq.train(train_vectors)
pq.add(vectors)

# Typically combined with IVF for even better memory/speed
ivfpq = faiss.IndexIVFPQ(quantizer, dim, 1024, 32, 8)   # nlist=1024, M=32, 8 bits
ivfpq.train(train_vectors)
ivfpq.add(vectors)

Debugging Zero Results

If search returns -1 indices or empty results:

  1. Check index.ntotal — how many vectors are in the index
  2. Confirm vectors added successfully (no silent failures)
  3. Check query dtype is float32
  4. For IVF indexes, increase nprobe — too low misses clusters containing relevant results

FixDevs

Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.
