
Fix: ONNX Not Working — Conversion Errors, Runtime Provider Issues, and Dynamic Shape Problems

FixDevs ·

Quick Answer

How to fix ONNX errors — torch.onnx.export unsupported operator, ONNX Runtime CUDA provider not found, InvalidArgument input shape mismatch, dynamic axes not working, IR version mismatch, and opset version conflicts.

The Error

You export a PyTorch model to ONNX and the converter chokes:

torch.onnx.errors.UnsupportedOperatorError: Exporting the operator 'aten::custom_op'
to ONNX opset version 17 is not supported.

Or ONNX Runtime starts but ignores your GPU:

session = ort.InferenceSession("model.onnx")
print(session.get_providers())
# ['CPUExecutionProvider']   # Even though you installed onnxruntime-gpu

Or inference fails with a shape mismatch:

InvalidArgument: Got invalid dimensions for input: input_ids.
Expected: {1, 512}, received: {1, 768}

Or the exported model requires a fixed batch size and you need variable batch:

RuntimeError: Input 'input' got shape [3, 224, 224, 3] but expected [1, 224, 224, 3]

Or the model loads but outputs garbage:

# PyTorch output: [0.87, 0.12, 0.01]
# ONNX output:    [0.34, 0.33, 0.33]   # Same input, wrong result

ONNX is the standard interchange format for ML models — trained in PyTorch/TensorFlow, deployed with ONNX Runtime, TensorRT, or CoreML. The conversion process introduces subtle bugs: unsupported operators, wrong opset versions, incorrect dynamic axes, and precision mismatches. This guide covers each failure mode.

Why This Happens

ONNX defines a fixed set of operators per opset version. PyTorch/TensorFlow have thousands of operations — most map cleanly to ONNX, but custom ops, certain complex indexing patterns, and newer operators don’t. The exporter picks an opset and silently fails on unsupported ops, often with misleading error messages.
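The gating can be sketched as a lookup: every ONNX operator has a minimum opset version, and any op whose minimum exceeds your export target fails. The table below is a tiny illustrative subset (the full registry is in the ONNX operator docs); the helper name is made up for this sketch:

```python
# Minimum opset for a few real ONNX operators (illustrative subset)
MIN_OPSET = {"GridSample": 16, "LayerNormalization": 17, "Gelu": 20}

def unsupported_ops(ops_used, target_opset):
    """Return the ops that the chosen opset cannot express."""
    return [op for op in ops_used if MIN_OPSET.get(op, 1) > target_opset]

print(unsupported_ops(["Conv", "GridSample", "Gelu"], 17))   # ['Gelu']
```

This is why "raise the opset version" is usually the first fix: at opset 17 the export above only trips on `Gelu`, and opset 20 would clear it.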

ONNX Runtime uses “execution providers” (CPU, CUDA, TensorRT, OpenVINO). Each provider has its own installation and runtime requirements. Installing onnxruntime-gpu doesn’t automatically enable CUDA — the provider must be listed when creating the session, and the matching CUDA toolkit version must be present.

Fix 1: Exporting a PyTorch Model to ONNX

import torch
import torch.onnx

model = MyModel()
model.eval()   # Export mode — disables dropout, batchnorm updates

dummy_input = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    opset_version=17,              # ONNX operator set version
    do_constant_folding=True,      # Optimize constants
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={
        "input": {0: "batch_size"},   # Allow variable batch at dim 0
        "output": {0: "batch_size"},
    },
)

Opset version table:

Opset   PyTorch compatibility
11      Legacy; minimum for ORT 1.2+
13      Good default for older PyTorch
14-16   Current production sweet spot
17      PyTorch 1.13+
18-20   PyTorch 2.x
21+     PyTorch 2.3+ (newest ops)

Use higher opsets if you need newer operators; older opsets for compatibility with legacy runtimes:

torch.onnx.export(model, dummy_input, "model.onnx", opset_version=17)

For PyTorch 2.x, try the newer dynamo-based exporter, which captures more models than the legacy TorchScript tracer (from PyTorch 2.5 it is also exposed as torch.onnx.export(..., dynamo=True)):

import torch

# New dynamo-based exporter (PyTorch 2.1+)
torch.onnx.dynamo_export(model, dummy_input).save("model.onnx")

# Or with options
onnx_program = torch.onnx.dynamo_export(
    model,
    dummy_input,
    export_options=torch.onnx.ExportOptions(dynamic_shapes=True),
)

Unsupported operators:

UnsupportedOperatorError: Exporting the operator 'aten::grid_sampler'
to ONNX opset version 11 is not supported.

Solutions in order:

  1. Raise opset version — newer opsets support more operators
  2. Use a supported alternative — e.g., replace custom indexing with gather or scatter
  3. Register a custom ONNX function:
from torch.onnx import register_custom_op_symbolic

def custom_op_export(g, *args):
    return g.op("custom::MyOp", *args)

register_custom_op_symbolic("mylib::my_op", custom_op_export, opset_version=17)

Common Mistake: Exporting a model still in training mode (with dropout, batchnorm in update mode). This produces incorrect ONNX output because randomness and running statistics don’t match evaluation. Always call model.eval() before export.

Fix 2: ONNX Runtime Provider Setup

import onnxruntime as ort

session = ort.InferenceSession("model.onnx")
print(session.get_providers())   # ['CPUExecutionProvider'] even with GPU installed

The provider must be explicitly requested, and the correct package must be installed.

Install the right package:

# CPU only
pip install onnxruntime

# NVIDIA GPU
pip install onnxruntime-gpu

# Can't have both in the same env
pip uninstall onnxruntime   # Remove CPU before installing GPU
pip install onnxruntime-gpu

Specify providers when creating the session:

import onnxruntime as ort

providers = [
    ("CUDAExecutionProvider", {
        "device_id": 0,
        "arena_extend_strategy": "kNextPowerOfTwo",
        "gpu_mem_limit": 4 * 1024 * 1024 * 1024,   # 4 GB
    }),
    "CPUExecutionProvider",   # Fallback if CUDA fails
]

session = ort.InferenceSession("model.onnx", providers=providers)
print(session.get_providers())   # ['CUDAExecutionProvider', 'CPUExecutionProvider']

Available providers (require matching install):

Provider                    Install                      Use case
CPUExecutionProvider        onnxruntime                  Default everywhere
CUDAExecutionProvider       onnxruntime-gpu              NVIDIA GPUs
TensorrtExecutionProvider   onnxruntime-gpu + TensorRT   NVIDIA, higher perf than CUDA
OpenVINOExecutionProvider   onnxruntime-openvino         Intel CPUs/GPUs
CoreMLExecutionProvider     onnxruntime (macOS build)    Apple Silicon
DmlExecutionProvider        onnxruntime-directml         Windows DirectML (any GPU)

Verify provider is actually used:

session = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])

# Check provider list
print(session.get_providers())

# If it fell back to CPU silently, CUDA setup is broken
assert "CUDAExecutionProvider" in session.get_providers(), "GPU not available!"

CUDA version requirements:

onnxruntime-gpu   CUDA           cuDNN
1.20              12.4           9.1
1.18              12.2           8.9
1.17              11.8 or 12.2   8.9
1.16              11.8           8.9

Use nvidia-smi to check your driver/CUDA version.

Pro Tip: Always include CPUExecutionProvider as a fallback in your providers list. If CUDA fails to initialize (wrong version, missing library, out of memory), the session falls back to CPU instead of crashing. Production systems should degrade gracefully when GPU resources are unavailable.

Fix 3: Dynamic Shapes and Batch Sizes

A model exported with dummy_input shape (1, 3, 224, 224) rejects any other batch size without dynamic_axes:

# WRONG — fixed shape
torch.onnx.export(model, dummy_input, "model.onnx")

# Later
session.run(None, {"input": batch_of_8_images})   # InvalidArgument: expected batch 1

Fix — declare dynamic axes:

torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={
        "input": {0: "batch_size"},           # Batch dimension is dynamic
        "output": {0: "batch_size"},
    },
    opset_version=17,
)

Multiple dynamic dimensions:

dynamic_axes={
    "input": {
        0: "batch_size",
        2: "height",
        3: "width",
    },
    "output": {0: "batch_size"},
}

Verify shapes in the ONNX model:

import onnx

model = onnx.load("model.onnx")
for inp in model.graph.input:
    print(f"{inp.name}: {[dim.dim_value or dim.dim_param for dim in inp.type.tensor_type.shape.dim]}")
# input: ['batch_size', 3, 224, 224]

If dimensions show integers (like 1), they’re fixed. If they show strings (like 'batch_size'), they’re dynamic.
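A tiny pure-Python helper (the name is made up for this sketch) makes that check explicit for the shape lists you get back from onnx or session.get_inputs():

```python
def describe_dims(shape):
    """Label each ONNX dimension: ints are fixed, strings are dynamic."""
    return [
        f"{d} (dynamic)" if isinstance(d, str) else f"{d} (fixed)"
        for d in shape
    ]

print(describe_dims(["batch_size", 3, 224, 224]))
# ['batch_size (dynamic)', '3 (fixed)', '224 (fixed)', '224 (fixed)']
```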

Fix a model with wrong dynamic axes using onnx tools:

import onnx
from onnx.tools import update_model_dims

model = onnx.load("model.onnx")
updated_model = update_model_dims.update_inputs_outputs_dims(
    model,
    input_dims={"input": ["batch", 3, 224, 224]},   # Set batch dimension to dynamic
    output_dims={"output": ["batch", 1000]},
)
onnx.save(updated_model, "model_dynamic.onnx")

Fix 4: Input/Output Shape Mismatch at Runtime

InvalidArgument: Got invalid dimensions for input: input.
Expected: {1, 3, 224, 224}, received: {1, 224, 224, 3}

This is usually a layout (channels-first vs channels-last) mismatch.

import numpy as np

# PyTorch / ONNX convention: NCHW (batch, channels, height, width)
# TensorFlow convention:     NHWC (batch, height, width, channels)
img_nhwc = np.random.rand(1, 224, 224, 3).astype(np.float32)

# Convert NHWC to NCHW before feeding the ONNX model:
img_nchw = np.transpose(img_nhwc, (0, 3, 1, 2))   # Shape: (1, 3, 224, 224)
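If inputs may arrive in either layout, a guard that only transposes when needed avoids double-converting. This is a sketch (hypothetical helper) that assumes 3-channel images, which is what disambiguates the two layouts:

```python
import numpy as np

def to_nchw(img):
    """Return a channels-first view of a 4-D image batch; no-op if already NCHW.

    Assumes 3-channel images; other channel counts would be ambiguous."""
    if img.ndim == 4 and img.shape[-1] == 3 and img.shape[1] != 3:
        return np.ascontiguousarray(np.transpose(img, (0, 3, 1, 2)))
    return img

x = np.zeros((2, 224, 224, 3), dtype=np.float32)   # NHWC batch
print(to_nchw(x).shape)   # (2, 3, 224, 224)
```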

Check expected input types and shapes:

import onnxruntime as ort

session = ort.InferenceSession("model.onnx")

for inp in session.get_inputs():
    print(f"Name: {inp.name}")
    print(f"Shape: {inp.shape}")
    print(f"Type: {inp.type}")

# Name: input
# Shape: ['batch_size', 3, 224, 224]
# Type: tensor(float)

Type matching: tensor(float) means float32, not float64:

# WRONG
img = np.random.rand(1, 3, 224, 224)   # Default float64

# CORRECT
img = np.random.rand(1, 3, 224, 224).astype(np.float32)
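A pre-flight check against the spec from session.get_inputs() turns ORT's terse InvalidArgument into a readable message. This helper is hypothetical (not part of onnxruntime) and only covers the two most common tensor types:

```python
import numpy as np

DTYPE_MAP = {"tensor(float)": np.float32, "tensor(int64)": np.int64}

def check_input(arr, expected_shape, expected_type="tensor(float)"):
    """Validate dtype and shape before session.run; string dims are dynamic."""
    want = DTYPE_MAP[expected_type]
    if arr.dtype != want:
        raise TypeError(f"dtype {arr.dtype}, expected {np.dtype(want).name}")
    if arr.ndim != len(expected_shape):
        raise ValueError(f"rank {arr.ndim}, expected {len(expected_shape)}")
    for i, (got, want_dim) in enumerate(zip(arr.shape, expected_shape)):
        if isinstance(want_dim, int) and got != want_dim:
            raise ValueError(f"dim {i}: got {got}, expected {want_dim}")

check_input(np.zeros((8, 3, 224, 224), np.float32), ["batch_size", 3, 224, 224])
print("input OK")
```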

Run inference:

outputs = session.run(
    None,                          # None = all outputs
    {"input": img},                # Dict: input_name → numpy array
)

# outputs is a list, one per output
prediction = outputs[0]

Specify which outputs to compute (faster for multi-output models):

outputs = session.run(
    ["logits", "attention_weights"],   # Only these two
    {"input": img},
)

Fix 5: Verifying Export Correctness

After export, always verify the ONNX model produces the same output as the original:

import torch
import onnxruntime as ort
import numpy as np

model = MyModel().eval()
dummy_input = torch.randn(1, 3, 224, 224)

# PyTorch inference
with torch.no_grad():
    pytorch_output = model(dummy_input).numpy()

# ONNX inference
session = ort.InferenceSession("model.onnx")
onnx_output = session.run(None, {"input": dummy_input.numpy()})[0]

# Compare
diff = np.abs(pytorch_output - onnx_output).max()
print(f"Max diff: {diff:.6f}")
assert diff < 1e-4, "ONNX output diverges from PyTorch!"

Common Mistake: Exporting and deploying without verifying. Small differences (1e-6) are fine — floating-point ops aren’t identical across frameworks. Large differences (>1e-3) indicate bugs like wrong training/eval mode, custom op miscompilation, or unsupported operator fallbacks producing wrong results silently.
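When triaging a divergence, looking at more than the max helps: a uniformly large mean diff suggests a mode or weight problem, while a few large outliers point at a single misbehaving op. A small numpy sketch (hypothetical helper):

```python
import numpy as np

def diff_report(a, b):
    """Summarize elementwise divergence between two model outputs."""
    d = np.abs(np.asarray(a, dtype=np.float64) - np.asarray(b, dtype=np.float64))
    return {
        "max": float(d.max()),
        "mean": float(d.mean()),
        "frac_over_1e-3": float((d > 1e-3).mean()),   # share of "bad" elements
    }

print(diff_report([0.87, 0.12, 0.01], [0.8701, 0.1199, 0.01]))
```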

Validate ONNX model structure:

import onnx

model = onnx.load("model.onnx")
onnx.checker.check_model(model)   # Raises if model is malformed

# IR version, opset version
print(f"IR version: {model.ir_version}")
print(f"Producer: {model.producer_name} {model.producer_version}")
for opset in model.opset_import:
    print(f"Opset: {opset.domain} v{opset.version}")

Fix 6: Performance and Optimization

Graph optimization levels:

import onnxruntime as ort

sess_options = ort.SessionOptions()
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
# Options: ORT_DISABLE_ALL, ORT_ENABLE_BASIC, ORT_ENABLE_EXTENDED, ORT_ENABLE_ALL

session = ort.InferenceSession("model.onnx", sess_options=sess_options)

Thread tuning for CPU:

sess_options = ort.SessionOptions()
sess_options.intra_op_num_threads = 4   # Parallelism within a single op
sess_options.inter_op_num_threads = 1   # Parallelism across ops (keep low)
sess_options.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL
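A common starting point for intra_op_num_threads is one thread per physical core. The heuristic below is a sketch that assumes 2-way SMT (hyperthreading), which os.cpu_count cannot detect by itself; benchmark on your own hardware:

```python
import os

def pick_intra_op_threads():
    """Heuristic: one thread per physical core, assuming 2-way SMT."""
    logical = os.cpu_count() or 1
    return max(1, logical // 2)

# sess_options.intra_op_num_threads = pick_intra_op_threads()
print(pick_intra_op_threads())
```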

Save optimized model for faster future loads:

sess_options.optimized_model_filepath = "model_optimized.onnx"
session = ort.InferenceSession("model.onnx", sess_options=sess_options)
# Optimized version saved to disk; use it directly next time

Quantization for smaller models and faster CPU inference:

from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    model_input="model.onnx",
    model_output="model_int8.onnx",
    weight_type=QuantType.QInt8,
)

# 4x smaller, often 2-4x faster on CPU, slight accuracy drop

IO binding — avoid Python overhead for high-throughput inference:

import numpy as np

io_binding = session.io_binding()

# Bind input (can be on GPU directly for zero-copy)
# input_buffer_ptr is the address of a pre-allocated CUDA buffer
io_binding.bind_input(
    name="input",
    device_type="cuda",
    device_id=0,
    element_type=np.float32,
    shape=(1, 3, 224, 224),
    buffer_ptr=input_buffer_ptr,
)

# Bind output
io_binding.bind_output(name="output", device_type="cuda", device_id=0)

# Run
session.run_with_iobinding(io_binding)

Fix 7: Converting from Other Frameworks

TensorFlow/Keras to ONNX with tf2onnx:

pip install tf2onnx
# SavedModel format
python -m tf2onnx.convert --saved-model my_model/ --output model.onnx --opset 17

# Keras H5 format
python -m tf2onnx.convert --keras model.h5 --output model.onnx --opset 17

For TensorFlow-specific issues that come up during conversion, see TensorFlow not working.

scikit-learn to ONNX with skl2onnx:

from skl2onnx import to_onnx
import numpy as np

model = train_sklearn_model()
onnx_model = to_onnx(
    model,
    X_train[:1].astype(np.float32),
    target_opset=17,
)
with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

For scikit-learn pipeline patterns that export to ONNX, see scikit-learn not working.

HuggingFace Transformers to ONNX:

pip install optimum[onnxruntime]
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

# Export and load in one step
model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    export=True,
)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

inputs = tokenizer("Hello world", return_tensors="pt")
outputs = model(**inputs)

For HuggingFace-specific patterns, see HuggingFace Transformers not working.

Fix 8: Debugging Common ONNX Issues

Inspect model graph:

# Command-line tool
pip install onnx
python -c "import onnx; m = onnx.load('model.onnx'); print(onnx.helper.printable_graph(m.graph))"

# Or use Netron (visual inspection)
pip install netron
netron model.onnx   # Opens browser with graph visualization

Check for unsupported ops before export:

import torch

# Dry-run export to surface unsupported operators early
try:
    torch.onnx.export(model, dummy_input, "test.onnx", verbose=True)
except Exception as e:
    print(f"Export failed: {e}")

Trace vs script export — affects model capture fidelity:

# Tracing: runs the model once with dummy input, captures operations
# May miss conditional branches that depend on input values
torch.onnx.export(model, dummy_input, "model.onnx")

# Scripting: statically analyzes model code, handles control flow
scripted = torch.jit.script(model)
torch.onnx.export(scripted, dummy_input, "model.onnx")

Use scripting when your model has if statements or loops that depend on input values.
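A framework-free toy shows why: tracing records only the branch taken for the example input, so the untaken branch never reaches the ONNX graph, while scripting keeps both:

```python
def forward(x):
    # Data-dependent control flow: a trace captured with a positive
    # example records only the first branch; scripting keeps both.
    if x > 0:
        return "branch A"
    return "branch B"

print(forward(1.0))    # branch A (what a trace would capture)
print(forward(-1.0))   # branch B (lost if the model was traced with x > 0)
```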

Still Not Working?

ONNX vs TensorRT vs CoreML

  • ONNX Runtime — Cross-platform, decent performance. Default for portable deployment.
  • TensorRT — NVIDIA only, highest performance via fused kernels and INT8 quantization. Use for production NVIDIA inference.
  • CoreML — Apple devices only, Metal-optimized, smallest deployment size on iOS.
  • OpenVINO — Intel CPUs and GPUs, strong for edge/embedded deployment.

Start with ONNX Runtime for broad compatibility, then specialize if you need maximum performance on a specific platform.

Model Size and Memory

ONNX files are protobuf messages, which cap out at 2GB per file, so large models (>2GB) may not load in ONNX Runtime without the external data format:

import onnx

# Save with external data — weights in separate files
onnx.save(
    model,
    "model.onnx",
    save_as_external_data=True,
    all_tensors_to_one_file=True,
    location="model.weights",
)

PyTorch Model Export Issues

For PyTorch-specific problems during export (training mode, custom layers, gradient checkpointing conflicts), see PyTorch not working.

vLLM and ONNX

For LLM inference, vLLM typically outperforms ONNX Runtime for transformer models due to paged attention and continuous batching. ONNX is better for smaller non-LLM models or when you need cross-platform deployment. For vLLM setup, see vLLM not working.

FixDevs

Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.