Fix: ONNX Not Working — Conversion Errors, Runtime Provider Issues, and Dynamic Shape Problems
Quick Answer
How to fix ONNX errors — torch.onnx.export unsupported operator, ONNX Runtime CUDA provider not found, InvalidArgument input shape mismatch, dynamic axes not working, IR version mismatch, and opset version conflicts.
The Error
You export a PyTorch model to ONNX and the converter chokes:
torch.onnx.errors.UnsupportedOperatorError: Exporting the operator 'aten::custom_op'
to ONNX opset version 17 is not supported.
Or ONNX Runtime starts but ignores your GPU:
session = ort.InferenceSession("model.onnx")
print(session.get_providers())
# ['CPUExecutionProvider'] # Even though you installed onnxruntime-gpu
Or inference fails with a shape mismatch:
InvalidArgument: Got invalid dimensions for input: input_ids.
Expected: {1, 512}, received: {1, 768}
Or the exported model requires a fixed batch size and you need variable batch:
RuntimeError: Input 'input' got shape [3, 224, 224, 3] but expected [1, 224, 224, 3]
Or the model loads but outputs garbage:
# PyTorch output: [0.87, 0.12, 0.01]
# ONNX output: [0.34, 0.33, 0.33] # Same input, wrong result
ONNX is the standard interchange format for ML models — trained in PyTorch/TensorFlow, deployed with ONNX Runtime, TensorRT, or CoreML. The conversion process introduces subtle bugs: unsupported operators, wrong opset versions, incorrect dynamic axes, and precision mismatches. This guide covers each failure mode.
Why This Happens
ONNX defines a fixed set of operators per opset version. PyTorch/TensorFlow have thousands of operations — most map cleanly to ONNX, but custom ops, certain complex indexing patterns, and newer operators don’t. The exporter picks an opset and silently fails on unsupported ops, often with misleading error messages.
ONNX Runtime uses “execution providers” (CPU, CUDA, TensorRT, OpenVINO). Each provider has its own installation and runtime requirements. Installing onnxruntime-gpu doesn’t automatically enable CUDA — the provider must be listed when creating the session, and the matching CUDA toolkit version must be present.
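Because onnxruntime and onnxruntime-gpu install the same Python module, the package name alone doesn't tell you whether CUDA is usable. A small sketch of a runtime check (the helper name is ours; it degrades gracefully when onnxruntime isn't installed):

```python
import importlib.util

def cuda_provider_available() -> bool:
    # onnxruntime and onnxruntime-gpu both install the "onnxruntime" module,
    # so the reliable check is the provider list at runtime, not the
    # installed package name.
    if importlib.util.find_spec("onnxruntime") is None:
        return False
    import onnxruntime as ort
    return "CUDAExecutionProvider" in ort.get_available_providers()

print(cuda_provider_available())
```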
Fix 1: Exporting a PyTorch Model to ONNX
import torch
import torch.onnx
model = MyModel()
model.eval() # Export mode — disables dropout, batchnorm updates
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(
model,
dummy_input,
"model.onnx",
opset_version=17, # ONNX operator set version
do_constant_folding=True, # Optimize constants
input_names=["input"],
output_names=["output"],
dynamic_axes={
"input": {0: "batch_size"}, # Allow variable batch at dim 0
"output": {0: "batch_size"},
},
)
Opset version table:
| Opset | PyTorch compatibility |
|---|---|
| 11 | Legacy; minimum for ORT 1.2+ |
| 13 | Good default for older PyTorch |
| 14-16 | Current production sweet spot |
| 17 | PyTorch 1.13+ |
| 18-20 | PyTorch 2.x |
| 21+ | PyTorch 2.3+ (newest ops) |
Use higher opsets if you need newer operators; older opsets for compatibility with legacy runtimes:
torch.onnx.export(model, dummy_input, "model.onnx", opset_version=17)
For PyTorch 2.x, use the new dynamo exporter (better than the legacy exporter):
import torch
# New dynamo-based exporter (PyTorch 2.1+)
torch.onnx.dynamo_export(model, dummy_input).save("model.onnx")
# Or with options
onnx_program = torch.onnx.dynamo_export(
model,
dummy_input,
export_options=torch.onnx.ExportOptions(dynamic_shapes=True),
)
Unsupported operators:
UnsupportedOperatorError: Exporting the operator 'aten::grid_sampler'
to ONNX opset version 11 is not supported.
Solutions in order:
- Raise opset version — newer opsets support more operators
- Use a supported alternative — e.g., replace custom indexing with gather or scatter
- Register a custom ONNX function:
from torch.onnx import register_custom_op_symbolic
def custom_op_export(g, *args):
return g.op("custom::MyOp", *args)
register_custom_op_symbolic("mylib::my_op", custom_op_export, opset_version=17)
Common Mistake: Exporting a model still in training mode (with dropout active and batchnorm in update mode). This produces incorrect ONNX output because randomness and running statistics don't match evaluation. Always call model.eval() before export.
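To see why this matters, a minimal demonstration (our example, assuming PyTorch is installed; Dropout is just the simplest mode-dependent layer):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.ones(1, 4)

model.train()
# Dropout is stochastic in train mode: repeated forward passes differ,
# so an export taken here captures one random dropout mask as constants.

model.eval()
with torch.no_grad():
    out_a = model(x)
    out_b = model(x)
# In eval mode dropout is the identity, so outputs are reproducible —
# exactly what an exported inference graph should be.
assert torch.equal(out_a, out_b)
```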
Fix 2: ONNX Runtime Provider Setup
import onnxruntime as ort
session = ort.InferenceSession("model.onnx")
print(session.get_providers()) # ['CPUExecutionProvider'] even with GPU installed
The provider must be explicitly requested, and the correct package must be installed.
Install the right package:
# CPU only
pip install onnxruntime
# NVIDIA GPU
pip install onnxruntime-gpu
# Can't have both in the same env
pip uninstall onnxruntime # Remove CPU before installing GPU
pip install onnxruntime-gpu
Specify providers when creating the session:
import onnxruntime as ort
providers = [
("CUDAExecutionProvider", {
"device_id": 0,
"arena_extend_strategy": "kNextPowerOfTwo",
"gpu_mem_limit": 4 * 1024 * 1024 * 1024, # 4 GB
}),
"CPUExecutionProvider", # Fallback if CUDA fails
]
session = ort.InferenceSession("model.onnx", providers=providers)
print(session.get_providers()) # ['CUDAExecutionProvider', 'CPUExecutionProvider']
Available providers (require matching install):
| Provider | Install | Use case |
|---|---|---|
| CPUExecutionProvider | onnxruntime | Default everywhere |
| CUDAExecutionProvider | onnxruntime-gpu | NVIDIA GPUs |
| TensorrtExecutionProvider | onnxruntime-gpu + TRT | NVIDIA, higher perf than CUDA |
| OpenVINOExecutionProvider | onnxruntime-openvino | Intel CPUs/GPUs |
| CoreMLExecutionProvider | onnxruntime on macOS | Apple Silicon |
| DmlExecutionProvider | onnxruntime-directml | Windows DirectML (any GPU) |
Verify provider is actually used:
session = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
# Check provider list
print(session.get_providers())
# If it fell back to CPU silently, CUDA setup is broken
assert "CUDAExecutionProvider" in session.get_providers(), "GPU not available!"
CUDA version requirements:
| onnxruntime-gpu | CUDA | cuDNN |
|---|---|---|
| 1.20 | 12.4 | 9.1 |
| 1.18 | 12.2 | 8.9 |
| 1.17 | 11.8 or 12.2 | 8.9 |
| 1.16 | 11.8 | 8.9 |
Use nvidia-smi to check your driver/CUDA version.
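If you want the same check from Python, here is a stdlib-only sketch (the helper name is ours):

```python
import shutil
import subprocess
from typing import Optional

def nvidia_driver_version() -> Optional[str]:
    # Returns the driver version reported by nvidia-smi, or None when no
    # NVIDIA driver is present (CPU-only machines, macOS, most CI runners).
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    return result.stdout.strip() or None

print(nvidia_driver_version())
```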
Pro Tip: Always include CPUExecutionProvider as a fallback in your providers list. If CUDA fails to initialize (wrong version, missing library, out of memory), the session falls back to CPU instead of crashing. Production systems should degrade gracefully when GPU resources are unavailable.
Fix 3: Dynamic Shapes and Batch Sizes
A model exported with dummy_input shape (1, 3, 224, 224) rejects any other batch size without dynamic_axes:
# WRONG — fixed shape
torch.onnx.export(model, dummy_input, "model.onnx")
# Later
session.run(None, {"input": batch_of_8_images}) # InvalidArgument: expected batch 1
Fix — declare dynamic axes:
torch.onnx.export(
model,
dummy_input,
"model.onnx",
input_names=["input"],
output_names=["output"],
dynamic_axes={
"input": {0: "batch_size"}, # Batch dimension is dynamic
"output": {0: "batch_size"},
},
opset_version=17,
)
Multiple dynamic dimensions:
dynamic_axes={
"input": {
0: "batch_size",
2: "height",
3: "width",
},
"output": {0: "batch_size"},
}
Verify shapes in the ONNX model:
import onnx
model = onnx.load("model.onnx")
for inp in model.graph.input:
print(f"{inp.name}: {[dim.dim_value or dim.dim_param for dim in inp.type.tensor_type.shape.dim]}")
# input: ['batch_size', 3, 224, 224]If dimensions show integers (like 1), they’re fixed. If they show strings (like 'batch_size'), they’re dynamic.
Fix a model with wrong dynamic axes using onnx tools:
import onnx
from onnx.tools import update_model_dims
model = onnx.load("model.onnx")
updated_model = update_model_dims.update_inputs_outputs_dims(
model,
input_dims={"input": ["batch", 3, 224, 224]}, # Set batch dimension to dynamic
output_dims={"output": ["batch", 1000]},
)
onnx.save(updated_model, "model_dynamic.onnx")
Fix 4: Input/Output Shape Mismatch at Runtime
InvalidArgument: Got invalid dimensions for input: input.
Expected: {1, 3, 224, 224}, received: {1, 224, 224, 3}
This is usually a layout (channels-first vs channels-last) mismatch.
import numpy as np
# PyTorch / ONNX convention: NCHW (batch, channels, height, width)
img_nchw = np.random.rand(1, 3, 224, 224).astype(np.float32)
# TensorFlow convention: NHWC (batch, height, width, channels) — doesn't match
img_nhwc = np.random.rand(1, 224, 224, 3).astype(np.float32)
# Convert NHWC data before feeding an NCHW model:
img_converted = np.transpose(img_nhwc, (0, 3, 1, 2)) # Shape: (1, 3, 224, 224)
Check expected input types and shapes:
import onnxruntime as ort
session = ort.InferenceSession("model.onnx")
for inp in session.get_inputs():
print(f"Name: {inp.name}")
print(f"Shape: {inp.shape}")
print(f"Type: {inp.type}")
# Name: input
# Shape: ['batch_size', 3, 224, 224]
# Type: tensor(float)
Type matching — tensor(float) means float32, not float64:
# WRONG
img = np.random.rand(1, 3, 224, 224) # Default float64
# CORRECT
img = np.random.rand(1, 3, 224, 224).astype(np.float32)
Run inference:
outputs = session.run(
None, # None = all outputs
{"input": img}, # Dict: input_name → numpy array
)
# outputs is a list, one per output
prediction = outputs[0]
Specify which outputs to compute (faster for multi-output models):
outputs = session.run(
["logits", "attention_weights"], # Only these two
{"input": img},
)
Fix 5: Verifying Export Correctness
After export, always verify the ONNX model produces the same output as the original:
import torch
import onnxruntime as ort
import numpy as np
model = MyModel().eval()
dummy_input = torch.randn(1, 3, 224, 224)
# PyTorch inference
with torch.no_grad():
pytorch_output = model(dummy_input).numpy()
# ONNX inference
session = ort.InferenceSession("model.onnx")
onnx_output = session.run(None, {"input": dummy_input.numpy()})[0]
# Compare
diff = np.abs(pytorch_output - onnx_output).max()
print(f"Max diff: {diff:.6f}")
assert diff < 1e-4, "ONNX output diverges from PyTorch!"
Common Mistake: Exporting and deploying without verifying. Small differences (1e-6) are fine — floating-point ops aren't identical across frameworks. Large differences (>1e-3) indicate bugs like wrong training/eval mode, custom op miscompilation, or unsupported operator fallbacks producing wrong results silently.
Validate ONNX model structure:
import onnx
model = onnx.load("model.onnx")
onnx.checker.check_model(model) # Raises if model is malformed
# IR version, opset version
print(f"IR version: {model.ir_version}")
print(f"Producer: {model.producer_name} {model.producer_version}")
for opset in model.opset_import:
print(f"Opset: {opset.domain} v{opset.version}")
Fix 6: Performance and Optimization
Graph optimization levels:
import onnxruntime as ort
sess_options = ort.SessionOptions()
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
# Options: ORT_DISABLE_ALL, ORT_ENABLE_BASIC, ORT_ENABLE_EXTENDED, ORT_ENABLE_ALL
session = ort.InferenceSession("model.onnx", sess_options=sess_options)
Thread tuning for CPU:
sess_options = ort.SessionOptions()
sess_options.intra_op_num_threads = 4 # Parallelism within a single op
sess_options.inter_op_num_threads = 1 # Parallelism across ops (keep low)
sess_options.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL
Save optimized model for faster future loads:
sess_options.optimized_model_filepath = "model_optimized.onnx"
session = ort.InferenceSession("model.onnx", sess_options=sess_options)
# Optimized version saved to disk; use it directly next time
Quantization for smaller models and faster CPU inference:
from onnxruntime.quantization import quantize_dynamic, QuantType
quantize_dynamic(
model_input="model.onnx",
model_output="model_int8.onnx",
weight_type=QuantType.QInt8,
)
# 4x smaller, often 2-4x faster on CPU, slight accuracy drop
IO binding — avoid Python overhead for high-throughput inference:
io_binding = session.io_binding()
# Bind input (can be on GPU directly for zero-copy)
io_binding.bind_input(
name="input",
device_type="cuda",
device_id=0,
element_type=np.float32,
shape=(1, 3, 224, 224),
buffer_ptr=input_buffer_ptr,
)
# Bind output
io_binding.bind_output(name="output", device_type="cuda", device_id=0)
# Run
session.run_with_iobinding(io_binding)
Fix 7: Converting from Other Frameworks
TensorFlow/Keras to ONNX with tf2onnx:
pip install tf2onnx
# SavedModel format
python -m tf2onnx.convert --saved-model my_model/ --output model.onnx --opset 17
# Keras H5 format
python -m tf2onnx.convert --keras model.h5 --output model.onnx --opset 17
For TensorFlow-specific issues that come up during conversion, see TensorFlow not working.
scikit-learn to ONNX with skl2onnx:
from skl2onnx import to_onnx
import numpy as np
model = train_sklearn_model()
onnx_model = to_onnx(
model,
X_train[:1].astype(np.float32),
target_opset=17,
)
with open("model.onnx", "wb") as f:
f.write(onnx_model.SerializeToString())
For scikit-learn pipeline patterns that export to ONNX, see scikit-learn not working.
HuggingFace Transformers to ONNX:
pip install optimum[onnxruntime]
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer
# Export and load in one step
model = ORTModelForSequenceClassification.from_pretrained(
"distilbert-base-uncased",
export=True,
)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
inputs = tokenizer("Hello world", return_tensors="pt")
outputs = model(**inputs)
For HuggingFace-specific patterns, see HuggingFace Transformers not working.
Fix 8: Debugging Common ONNX Issues
Inspect model graph:
# Command-line tool
pip install onnx
python -c "import onnx; m = onnx.load('model.onnx'); print(onnx.helper.printable_graph(m.graph))"
# Or use Netron (visual inspection)
pip install netron
netron model.onnx # Opens browser with graph visualization
Check for unsupported ops before export:
import torch
# Test export with dry run
try:
torch.onnx.export(model, dummy_input, "test.onnx", verbose=True)
except Exception as e:
print(f"Export failed: {e}")
Trace vs script export — affects model capture fidelity:
# Tracing: runs the model once with dummy input, captures operations
# May miss conditional branches that depend on input values
torch.onnx.export(model, dummy_input, "model.onnx")
# Scripting: statically analyzes model code, handles control flow
scripted = torch.jit.script(model)
torch.onnx.export(scripted, dummy_input, "model.onnx")
Use scripting when your model has if statements or loops that depend on input values.
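As an illustration of the difference, this toy module (our example, assuming PyTorch) branches on the input's values. torch.jit.script preserves both branches, while a trace would freeze whichever branch the dummy input happened to take:

```python
import torch
import torch.nn as nn

class Gated(nn.Module):
    def forward(self, x):
        # Data-dependent branch: tracing records only one side of this if.
        if x.sum() > 0:
            return x * 2
        return x - 1

scripted = torch.jit.script(Gated())
print(scripted(torch.ones(2)))   # takes the x * 2 branch
print(scripted(-torch.ones(2)))  # takes the x - 1 branch
```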
Still Not Working?
ONNX vs TensorRT vs CoreML
- ONNX Runtime — Cross-platform, decent performance. Default for portable deployment.
- TensorRT — NVIDIA only, highest performance via fused kernels and INT8 quantization. Use for production NVIDIA inference.
- CoreML — Apple devices only, Metal-optimized, smallest deployment size on iOS.
- OpenVINO — Intel CPUs and GPUs, strong for edge/embedded deployment.
Start with ONNX Runtime for broad compatibility, then specialize if you need maximum performance on a specific platform.
Model Size and Memory
Large models (>2GB) may not load in ONNX Runtime without the external data format:
import onnx
# Save with external data — weights in separate files
onnx.save(model, "model.onnx", save_as_external_data=True, all_tensors_to_one_file=True, location="model.weights")
PyTorch Model Export Issues
For PyTorch-specific problems during export (training mode, custom layers, gradient checkpointing conflicts), see PyTorch not working.
vLLM and ONNX
For LLM inference, vLLM typically outperforms ONNX Runtime for transformer models due to paged attention and continuous batching. ONNX is better for smaller non-LLM models or when you need cross-platform deployment. For vLLM setup, see vLLM not working.
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.