AI, ML & LLM Errors
41 articles in this category
AI/ML errors live at the intersection of fragile dependencies (PyTorch + CUDA + Python version), API-shape changes (LangChain ships breaking changes often, and model providers update without notice), and resource limits (memory, rate limits, context windows).
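Because so many of these failures trace back to a version mismatch, it can help to record what is actually installed before reading any specific article. The following is a minimal sketch, not a prescribed diagnostic: `report_versions` is a hypothetical helper name, and the package list (`torch`, `langchain`, `openai`) is only an illustrative assumption based on the libraries mentioned above.

```python
import platform
from importlib import metadata


def report_versions(packages):
    """Map each distribution name to its installed version, or None if absent.

    `report_versions` is a hypothetical helper for illustration only.
    """
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed package list; swap in whatever your project depends on.
    for name, version in report_versions(["torch", "langchain", "openai"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Pasting this output into a bug report or comparing it between a working and a broken environment is usually faster than guessing which upgrade changed behavior.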
Recommended starting points:
- OpenAI API not working — auth, rate limits, and streaming.
- LangChain Python not working — chain composition and v0.1 vs v0.2 changes.
- PyTorch CUDA out of memory — VRAM tuning.
All articles (41)
Fix: AWS Bedrock Not Working — Model Access, IAM, Converse API, Streaming, and Cross-Region
How to fix AWS Bedrock errors — AccessDeniedException for model access, bedrock vs bedrock-runtime client, Converse vs InvokeModel API, streaming with ConverseStream, regional availability, and Knowledge Bases setup.
Fix: Cloudflare Workers AI Not Working — AI Binding, Model IDs, Streaming, and Vectorize Integration
How to fix Cloudflare Workers AI errors — env.AI binding setup, model ID format, text-generation streaming with ReadableStream, AI Gateway, Vectorize embeddings, region availability, and Neuron-based pricing.
Fix: Langfuse Not Working — SDK Init, Tracing Generations, LangChain Wrapper, and Self-Hosted Setup
How to fix Langfuse errors — Python/JS SDK init, trace/span/generation hierarchy, LangChain CallbackHandler, OpenAI wrapper, missing usage/cost data, prompt management, and self-hosted Postgres setup.
Fix: Outlines Not Working — Backend Setup, Pydantic Schemas, Regex, Choice, and Slow Sampling
How to fix Python Outlines errors — model backend missing, JSON schema vs Pydantic, regex pattern compilation slow, choice list timing, vLLM/Transformers/Ollama wiring, and streaming structured outputs.
Fix: DSPy Not Working — LM Configuration, Signatures, Modules, Optimizers, and Cache Surprises
How to fix DSPy errors — no LM configured, signature field types, ChainOfThought vs Predict, optimizer (MIPROv2) setup, retrieval module wiring, async usage, and cache invalidation between runs.
Fix: Instructor Not Working — Validation Loops, Mode Mismatch, Streaming, and Anthropic / Gemini Issues
How to fix Python Instructor errors — ValidationError loops, max_retries exhausted, mode=Mode.TOOLS vs JSON, partial streaming type errors, Anthropic and Gemini client patching, token usage tracking.
Fix: LiteLLM Not Working — Model Name Format, API Keys, Streaming, and Fallback Errors
How to fix LiteLLM errors — BadRequestError model not found, missing API key env vars, streaming chunk differences, fallback model not triggering, async drop_params, and proxy server 401.
Fix: Milvus Not Working — Connection Errors, Schema Setup, and Index Build Failures
How to fix Milvus errors — pymilvus connection refused localhost 19530, collection schema mismatch, index not built before search, partition not found, embedded vs standalone vs cluster, and flush before search.
Fix: ChromaDB Not Working — Persistent Client, Collection Errors, and Embedding Function Issues
How to fix ChromaDB errors — persistent client not saving data, collection already exists error, dimension mismatch in embeddings, embedding function required, HTTP client connection refused, and memory growing unbounded.
Fix: CrewAI Not Working — Agent Delegation, Task Context, and LLM Configuration Errors
How to fix CrewAI errors — LLM not configured ValidationError, agent delegation loop, task context not passed between agents, tool output truncated, process hierarchical vs sequential, and memory not persisting across runs.
Fix: Dask Not Working — Scheduler Errors, Out of Memory, and Delayed Not Computing
How to fix Dask errors — KilledWorker out of memory, client cannot connect to scheduler, delayed not computing, DataFrame partition size wrong, map_partitions TypeError, diagnostics dashboard not showing, and version mismatch.
Fix: FAISS Not Working — Import Errors, Index Selection, and GPU Setup
How to fix FAISS errors — ImportError cannot import name swigfaiss, faiss-gpu vs faiss-cpu install, IndexFlatL2 slow on large data, IVF training required, index serialization write_index, and dimension mismatch.
Fix: Gradio Not Working — Share Link, Queue Timeout, and Component Errors
How to fix Gradio errors — share link not working, queue timeout, component not updating, Blocks layout mistakes, flagging permission denied, file upload size limit, and HuggingFace Spaces deployment failures.
Fix: Jupyter Notebook Not Working — Kernel Dead, Module Not Found, and Widget Errors
How to fix Jupyter errors — kernel fails to start or dies, ModuleNotFoundError despite pip install, matplotlib plots not showing, ipywidgets not rendering in JupyterLab, port already in use, and jupyter command not found.
Fix: LangGraph Not Working — State Errors, Checkpointer Setup, and Cyclic Graph Failures
How to fix LangGraph errors — state not updating between nodes, checkpointer thread_id required, StateGraph compile error, conditional edges not routing, streaming events missing, recursion limit exceeded, and interrupt handling.
Fix: LightGBM Not Working — Installation Errors, Categorical Features, and Training Issues
How to fix LightGBM errors — ImportError libomp libgomp not found, do not support special JSON characters in feature name, categorical feature index out of range, num_leaves vs max_depth overfitting, early stopping callback changes, and GPU build errors.
Fix: LlamaIndex Not Working — Import Errors, Vector Store Issues, and Query Engine Failures
How to fix LlamaIndex errors — ImportError llama_index.core module not found, ServiceContext deprecated use Settings instead, vector store index not persisting, query engine returns irrelevant results, and LlamaIndex 0.10 migration.
Fix: Matplotlib Not Working — Plots Not Showing, Blank Output, and Figure Layout Problems
How to fix Matplotlib errors — plot not displaying, blank figure, RuntimeError main thread not in main loop, tight_layout UserWarning, overlapping subplots, savefig saving blank image, backend errors, and figure/axes confusion.
Fix: MLflow Not Working — Tracking URI, Artifact Store, and Model Registry Errors
How to fix MLflow errors — no tracking server, artifact path not accessible, model version not found, experiment not found, MLFLOW_TRACKING_URI not set, autolog not recording metrics, and MLflow UI showing no runs.
Fix: NumPy Not Working — Broadcasting Error, dtype Mismatch, and Array Shape Problems
How to fix NumPy errors — ValueError operands could not be broadcast together, setting an array element with a sequence, integer overflow, axis confusion, view vs copy bugs, NaN handling, and NumPy 1.24+ removed type aliases.
Fix: ONNX Not Working — Conversion Errors, Runtime Provider Issues, and Dynamic Shape Problems
How to fix ONNX errors — torch.onnx.export unsupported operator, ONNX Runtime CUDA provider not found, InvalidArgument input shape mismatch, dynamic axes not working, IR version mismatch, and opset version conflicts.
Fix: Optuna Not Working — Trial Pruned, Storage Errors, and Search Space Problems
How to fix Optuna errors — TrialPruned stops too early, RDB storage locked or not saving, suggest methods raise ValueError, parallel study workers deadlock, integration callbacks not reporting, and best trial not reproducible.
Fix: Pinecone Not Working — Index Creation, Serverless vs Pod, and Python SDK v3 Migration
How to fix Pinecone errors — ApiException 401 unauthorized, index not found, dimension mismatch, serverless spec required, Python SDK v3 breaking changes, namespace confusion, and upsert rate limit 429.
Fix: Polars Not Working — AttributeError, InvalidOperationError, and ShapeError
How to fix Polars errors — AttributeError groupby not found, InvalidOperationError from Python lambdas, ShapeError broadcasting mismatch, lazy vs eager collect confusion, type casting failures, and ColumnNotFoundError in with_columns.
Fix: Qdrant Not Working — Connection Errors, Collection Setup, and Filter Syntax Issues
How to fix Qdrant errors — connection refused to localhost 6333, collection not found create_collection, vector size mismatch, filter must match schema, payload index missing slow queries, and timeout on large batch uploads.
Fix: Ray Not Working — Cluster Init, Object Store Memory, and Actor Lifecycle Errors
How to fix Ray errors — ray.init connection refused, object store full ObjectStoreFullError, worker died unexpectedly, serialization PickleError for remote function, Ray Tune trials fail, Ray cluster version mismatch, and actor ReferenceError.
Fix: Seaborn Not Working — FutureWarning, FacetGrid Errors, and Figure-Level Confusion
How to fix Seaborn errors — FutureWarning use of palette without hue, figure-level vs axes-level function confusion, FacetGrid layout issues, tight_layout with seaborn, seaborn 0.13 breaking changes, and ci parameter deprecated.
Fix: scikit-learn Not Working — NotFittedError, NaN Input, Pipeline, and ConvergenceWarning
How to fix scikit-learn errors — NotFittedError call fit before predict, ValueError Input contains NaN, could not convert string to float, Pipeline ColumnTransformer mistakes, cross-validation leakage, n_jobs hanging on Windows, and ConvergenceWarning.
Fix: Streamlit Not Working — Session State, Cache, and Rerun Problems
How to fix Streamlit errors — session state KeyError state not persisting, @st.cache deprecated migrate to cache_data cache_resource, file upload resetting, slow app loading on every interaction, secrets not loading, and widget rerun loops.
Fix: TensorFlow Not Working — OOM, Shape Mismatch, GPU Not Found, and Keras Errors
How to fix TensorFlow errors — GPU not detected CUDA library missing, ResourceExhaustedError OOM, InvalidArgumentError shape mismatch, NaN loss, @tf.function AutoGraph failures, and Keras 3 breaking changes in TF 2.16+.
Fix: vLLM Not Working — CUDA OOM, Model Loading, and API Server Errors
How to fix vLLM errors — CUDA out of memory during model load, tokenizer mismatch with HuggingFace, tensor parallel size does not match GPU count, KV cache exceeds memory, OpenAI API compatibility issues, and max_model_len too large.
Fix: Weights & Biases (wandb) Not Working — Login Errors, Init Hangs, and Sync Failures
How to fix wandb errors — API key not set login failed, wandb init hangs, offline mode sync, artifact upload failure, run not showing in dashboard, image logging size limit, and sweep agent not starting.
Fix: OpenAI Whisper Not Working — FFmpeg Missing, GPU Slow, and Language Detection Errors
How to fix Whisper errors — FFmpeg not found audio file load failed, CUDA out of memory on large model, slow CPU transcription, language detected incorrectly, hallucinations on silence, faster-whisper migration, and timestamp accuracy.
Fix: XGBoost Not Working — Feature Name Mismatch, GPU Errors, and Early Stopping Changes
How to fix XGBoost errors — feature names mismatch, XGBoostError GPU training fails, use_label_encoder deprecated, eval_metric warning, early stopping moved to callback, ValueError for DMatrix, and sklearn API confusion.
Fix: Hugging Face Transformers Not Working — OSError, CUDA OOM, and Generation Errors
How to fix Hugging Face Transformers errors — OSError can't load tokenizer, gated repo access, CUDA out of memory with device_map auto, bitsandbytes not installed, tokenizer padding mismatch, pad_token_id warning, and LoRA adapter loading failures.
Fix: LangChain Python Not Working — ImportError, Pydantic, and Deprecated Classes
How to fix LangChain Python errors — ImportError from package split, Pydantic v2 compatibility, AgentExecutor deprecated, ConversationBufferMemory removed, LCEL output type mismatches, and tool calling failures.
Fix: Ollama Not Working — Connection Refused, Model Not Found, GPU Not Detected
How to fix Ollama errors — connection refused when the daemon isn't running, model not found, GPU not detected falling back to CPU, port 11434 already in use, VRAM exhausted, and API access from other machines.
Fix: OpenAI API Not Working — RateLimitError, 401, 429, and Connection Issues
How to fix OpenAI API errors — RateLimitError (429), AuthenticationError (401), APIConnectionError, context length exceeded, model not found, and SDK v0-to-v1 migration mistakes.
Fix: PyTorch Not Working — CUDA Out of Memory, Device Mismatch, and NaN Loss
How to fix PyTorch errors — CUDA out of memory, expected all tensors on same device, CUDA device-side assert triggered, torch.cuda.is_available() False, inplace gradient errors, DataLoader Windows crash, dtype mismatch, and NaN loss.
Fix: pandas SettingWithCopyWarning — A value is trying to be set on a copy
How to fix pandas SettingWithCopyWarning — understanding chained indexing, using .loc correctly, Copy-on-Write in pandas 2.x, and when the warning indicates a real bug vs a false alarm.
Fix: pandas merge() Key Error and Duplicate Columns (_x, _y)
How to fix pandas merge and join errors — KeyError on merge key, duplicate _x/_y columns, unexpected row counts, suffixes, and how to validate merge results.