Fix: scalene Not Working — Web UI, GPU Profiling, and AI Suggestion Errors
Quick Answer
How to fix scalene errors — scalene command not found, web UI port conflict, no GPU detected, profile.json empty, AI optimize requires OpenAI key, native code not attributed, and Jupyter integration.
The Error
You install scalene and the command isn’t found:
$ scalene my_script.py
bash: scalene: command not found
Or the web UI fails to open:
$ scalene my_script.py
# Profile completes but web UI doesn't open in browser
Or GPU profiling shows nothing:
$ scalene --gpu my_script.py
# Profile shows CPU and memory but GPU column is all zeros
Or AI optimization suggestions don’t work:
$ scalene --cli my_script.py
# Output mentions "ask AI" but no actual suggestions appear
Or native code (NumPy, PyTorch) shows as opaque blocks:
# my_script.py
import numpy as np
arr = np.dot(big_matrix, big_matrix.T)
# Scalene flags numpy.dot as slow but doesn't show WHY
scalene is a combined profiler: it measures CPU time (Python and native separately), memory (with leak detection), GPU usage, and energy consumption in a single run. Its headline feature is the AI-assisted optimization mode, in which an OpenAI model reads the profiled code and suggests faster alternatives. It’s heavier than py-spy but gives more dimensions of insight. This guide covers the common setup issues.
Why This Happens
scalene installs via pip but writes its scripts to the user-script directory, which may not be on PATH. The web UI opens automatically when scalene finishes — works locally but fails in headless environments (CI, remote SSH, Docker). GPU profiling requires NVIDIA libraries (pynvml) which aren’t installed by default and require a CUDA-enabled GPU. The AI optimization feature integrates with OpenAI’s API and needs an API key in env vars.
Fix 1: Installation and PATH
pip install scalene
# Or for the latest with GPU support
pip install "scalene[gpu]"
Verify the command is on PATH:
which scalene
# /usr/local/bin/scalene or ~/.local/bin/scalene or similar
If “command not found”:
# Find where scalene was installed
python -m pip show scalene | grep Location
# Add the scripts dir to PATH
export PATH="$HOME/.local/bin:$PATH"
# Or run via Python module
python -m scalene my_script.py
Python module form works regardless of PATH:
python -m scalene my_script.py
python -m scalene --cli my_script.py
python -m scalene --html --outfile profile.html my_script.py
Common Mistake: Installing scalene globally via pip but running scripts inside a virtualenv. The venv may not have scalene installed, so scalene resolves to the global binary, which profiles with the global Python rather than your venv’s Python. Always pip install scalene inside the venv you’re profiling.
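The venv mismatch described above can be checked from inside the interpreter you intend to profile with. The following is a minimal sketch using only the standard library; the function name `scalene_install_report` is hypothetical and not part of Scalene.

```python
import importlib.util
import shutil
import sys
import sysconfig


def scalene_install_report() -> dict:
    """Report whether (and where) scalene is visible to this interpreter.

    Hypothetical helper for illustration; not part of Scalene.
    """
    spec = importlib.util.find_spec("scalene")
    return {
        # The interpreter that would actually run the profiled script.
        "python": sys.executable,
        # True when `python -m scalene` can work with this interpreter.
        "module_importable": spec is not None,
        # Path of the `scalene` launcher found on PATH, or None.
        "launcher_on_path": shutil.which("scalene"),
        # Where this interpreter installs console scripts (non --user installs).
        "scripts_dir": sysconfig.get_path("scripts"),
    }


if __name__ == "__main__":
    for key, value in scalene_install_report().items():
        print(f"{key}: {value}")
```

If `module_importable` is false while `launcher_on_path` points somewhere outside the venv, the launcher belongs to a different Python than the one running your code.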
Fix 2: Output Modes
scalene has multiple output modes for different contexts:
# Default: opens web UI in browser
scalene my_script.py
# CLI mode (terminal output, no browser)
scalene --cli my_script.py
# HTML file for sharing or CI artifacts
scalene --html --outfile profile.html my_script.py
# JSON for tooling integration
scalene --json --outfile profile.json my_script.py
# All three at once
scalene --cli --html --json --outfile profile my_script.py
CLI mode output:
Memory usage: ...
│   │   │                                       │ Time % │ Memory % │
│   │ 1 │ import numpy as np                    │        │          │
│   │ 2 │                                       │        │          │
│ N │ 3 │ data = np.random.rand(1_000_000, 100) │ 2% ▆   │ 120 MB ▆ │
│ N │ 4 │ result = data @ data.T                │ 87% █  │ 80 MB ▆  │
Column meanings:
- Time %: % of total CPU time
- Memory %: % of total memory allocated
- N indicator: line uses native (C) code
- P indicator: line uses Python
- Black square: pure CPU; orange: GPU; etc.
Pro Tip: For CI integration, use --cli and pipe to a file:
scalene --cli my_script.py > profile.txt 2>&1
Then upload profile.txt as a CI artifact. The web UI is great for local exploration but useless in headless CI environments.
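A wrapper script can choose the output mode automatically instead of relying on everyone remembering `--cli`. The sketch below is one possible heuristic, not Scalene behavior: the helper names `is_headless` and `scalene_args` are hypothetical, and the environment variables it inspects (`CI`, `SSH_CONNECTION`, `DISPLAY`, `WAYLAND_DISPLAY`) are common conventions rather than guarantees.

```python
import os
import sys


def is_headless(env=None) -> bool:
    """Heuristic guess at whether a browser can be opened from this session."""
    env = os.environ if env is None else env
    if env.get("CI"):               # set by most CI services
        return True
    if env.get("SSH_CONNECTION"):   # remote shell session
        return True
    if sys.platform.startswith("linux"):
        # No X11 or Wayland display means there is nowhere to open a browser.
        return not (env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))
    return False


def scalene_args(script, env=None):
    """Build a scalene command line, adding --cli when the session is headless."""
    args = [sys.executable, "-m", "scalene"]
    if is_headless(env):
        args.append("--cli")
    args.append(script)
    return args


if __name__ == "__main__":
    print(" ".join(scalene_args("my_script.py")))
```

Pass the resulting list to `subprocess.run` to launch the profile; using `sys.executable -m scalene` also sidesteps the PATH issue from Fix 1.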
Fix 3: GPU Profiling
scalene --gpu my_script.py
Requirements:
- NVIDIA GPU (no AMD/Intel support yet)
- pynvml installed: pip install nvidia-ml-py or pip install "scalene[gpu]"
- NVIDIA drivers loaded
Verify GPU access:
nvidia-smi # Should list your GPUs
python -c "import pynvml; pynvml.nvmlInit(); print(pynvml.nvmlDeviceGetCount())"
Common Mistake: Running scalene --gpu on a machine without NVIDIA hardware. Scalene runs but reports 0% GPU usage everywhere, with no error to explain why. Verify GPU access first, and pass --gpu only when a GPU is actually present.
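The one-line check above raises an exception when pynvml or the driver is missing. A slightly more forgiving version, sketched below, returns 0 in those cases so a launcher script can decide whether to pass `--gpu`. The helper names `visible_nvidia_gpus` and `gpu_aware_command` are hypothetical; the pynvml calls are the same ones used in the check above, plus `nvmlShutdown` and the `NVMLError` exception type.

```python
import sys


def visible_nvidia_gpus() -> int:
    """Return how many GPUs NVML can see, or 0 when NVML is unusable."""
    try:
        import pynvml  # provided by the nvidia-ml-py package
    except ImportError:
        return 0
    try:
        pynvml.nvmlInit()
        try:
            return int(pynvml.nvmlDeviceGetCount())
        finally:
            pynvml.nvmlShutdown()
    except pynvml.NVMLError:
        # Covers a missing driver or library as well as query failures.
        return 0


def gpu_aware_command(script: str) -> list:
    """Build a scalene command that includes --gpu only when a GPU is visible."""
    cmd = [sys.executable, "-m", "scalene", "--cli"]
    if visible_nvidia_gpus() > 0:
        cmd.append("--gpu")
    cmd.append(script)
    return cmd


if __name__ == "__main__":
    print(f"GPUs visible to NVML: {visible_nvidia_gpus()}")
    print(" ".join(gpu_aware_command("my_script.py")))
```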
For multi-GPU systems:
CUDA_VISIBLE_DEVICES=0 scalene --gpu my_script.py
# Profile only GPU 0
For PyTorch GPU profiling that goes deeper than scalene’s high-level stats, see PyTorch not working.
Fix 4: AI-Assisted Optimization
Scalene’s AI feature reads the profile and suggests optimized code:
# Set API key
export OPENAI_API_KEY=sk-...
# Run the profile; the AI suggestions are offered in the web UI
scalene my_script.py
In the web UI, click “🧠 Optimize” next to a hot line. Scalene sends that code (with context) to OpenAI and shows the suggested optimization. In --cli mode there is nothing to click, so no suggestions appear.
Azure OpenAI alternative:
# Point Scalene at an Azure OpenAI endpoint instead
export OPENAI_API_BASE=https://your-azure.openai.azure.com/
export OPENAI_API_KEY=your-azure-key
Or, if your Scalene version exposes a provider option (check scalene --help), select the provider explicitly:
scalene --ai-provider azure my_script.py
Common Mistake: Expecting AI suggestions to work without an OpenAI key. The feature is opt-in: you must set OPENAI_API_KEY for it to function. Without the key, the “Optimize” button does nothing.
For OpenAI API integration patterns that affect AI tooling like scalene, see OpenAI API not working.
Fix 5: Profiling Specific Code (Decorators)
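By default, scalene reports on the program you ran rather than on every library it imports (--profile-all widens the scope). To focus on specific functions:
from scalene import scalene_profiler

@scalene_profiler.profile
def slow_function():
    data = [i ** 2 for i in range(1_000_000)]
    return sum(data)

slow_function()
Then run normally. --profile-only is not needed here, because it filters by file path, not by decorator. (Scalene’s own documentation uses a bare @profile decorator that it injects at run time, so use that form if the import above fails in your version.)
scalene my_script.py
Or profile only specific modules: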
scalene --profile-only "my_module" my_script.py
# Only lines in my_module.py appear in the profile
Exclude noise (third-party libs that dominate the profile):
scalene --profile-exclude "site-packages" my_script.py
# Excludes all third-party code
Pro Tip: For large applications, always set --profile-only to your project’s package. Without it, the profile can be dominated by framework internals (FastAPI, SQLAlchemy, asyncio) and your actual hot code is buried. Restricting the profile makes the signal much clearer.
Fix 6: Memory Profiling
scalene’s memory profiling tracks both Python objects and native allocations:
scalene --cli my_script.py
# Memory column shows allocations per line
Memory leak detection:
scalene --memory-leak-detector my_script.py
# Highlights allocations that grow without being freed
Sampling and report size:
# Sampling (default): low overhead, may miss small allocations
scalene my_script.py
# Trim the report to lines with non-zero measurements
scalene --reduced-profile my_script.py
Common Mistake: Comparing scalene’s memory numbers to top or htop. The numbers measure different things: scalene tracks allocations attributable to specific Python lines, while top shows total RSS (resident set size), including caches and shared libs. Expect scalene’s numbers to be smaller, because they are allocations, not total memory in use.
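The gap between per-line allocations and RSS can be seen without Scalene at all. The sketch below uses the standard library's `tracemalloc` as a stand-in for line-attributed allocation tracking (this is not how Scalene measures memory) and `resource.getrusage` for the process-wide peak RSS. It assumes a Unix system, since `resource` is unavailable on Windows; the helper names are hypothetical.

```python
import resource  # Unix only
import sys
import tracemalloc

MB = 1024 * 1024


def traced_mb(make_object) -> float:
    """MB of Python-level allocations still live after building one object."""
    tracemalloc.start()
    try:
        obj = make_object()  # keep a reference so the memory stays allocated
        current, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return current / MB


def peak_rss_mb() -> float:
    """Peak resident set size of the whole process, in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    return peak / MB if sys.platform == "darwin" else peak / 1024


if __name__ == "__main__":
    allocated = traced_mb(lambda: bytearray(20 * MB))
    print(f"attributed to the allocating line: {allocated:.1f} MB")
    print(f"peak RSS of the whole process:     {peak_rss_mb():.1f} MB")
```

The second number is typically larger because it also counts the interpreter itself, imported modules, and shared libraries, none of which belong to a single line of your script.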
For memory-specific deep dives, see memray not working — memray gives more detail for pure memory questions; scalene’s strength is combining memory with CPU.
Fix 7: Jupyter Integration
%load_ext scalene
Then, in a new cell:
%%scalene
def slow_thing():
    data = [i ** 2 for i in range(1_000_000)]
    return sum(data)
slow_thing()
The %%scalene cell magic profiles the whole cell. Output is displayed inline.
For longer cells:
%%scalene
import numpy as np
arr = np.random.rand(10_000_000)
result = arr.sum()
print(result)
Cell-level vs line-level: %%scalene is the cell magic and profiles the whole cell; the line magic, %scrun, profiles a single statement (for example, %scrun slow_thing()). Both show line-by-line breakdowns.
For Jupyter setup patterns that affect profiling, see Jupyter not working.
Fix 8: Reducing Profiling Overhead
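# Default: ~5-15% overhead
scalene my_script.py
# CPU (and GPU) time only; skipping memory profiling lowers overhead
scalene --cpu-only my_script.py
# Smaller report: only lines with non-zero measurements (sampling is unchanged)
scalene --reduced-profile my_script.py
# Custom CPU sampling interval, in seconds
scalene --cpu-sampling-rate 0.05 my_script.py
# Larger interval = fewer samples, less overhead
# Change how often memory is sampled
scalene --allocation-sampling-window 1024 my_script.py
# The window is a size in bytes; a larger window means fewer allocation samples (see scalene --help for the default)
Profile vs measure: for measuring production performance, sample less often. For deep debugging, sample more often.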
Still Not Working?
scalene vs py-spy vs memray
- scalene — All-in-one: CPU + memory + GPU + energy + AI suggestions. Best for “what’s slow AND why?”
- py-spy — CPU only, sampling, attach to running processes. Best for production diagnostics. See py-spy not working.
- memray — Memory only, allocation tracking. Best for finding leaks. See memray not working.
scalene is the heaviest of the three but provides the most context. Use it for “this is slow, let me understand why”; use py-spy for “this is hung, what’s blocking?”; use memray for “this is leaking, where?”
Distinguishing Python vs Native Time
One of scalene’s most distinctive features is the “% time in C” column, which shows how much time was spent in native code (NumPy, PyTorch internals) versus Python:
Time % Python │ Time % C │ Line
5%            │ 85%      │ result = np.dot(a, b)
85%           │ 5%       │ result = [x * y for x in a for y in b]
The numpy line spends most of its time in C (good: moving to numpy was the right optimization). The pure-Python list comprehension spends most of its time in the interpreter, so the work is interpreter-bound.
Pro Tip: When optimizing, look at the Python % column. If it’s high, you have room to optimize by moving to NumPy/Cython/Rust. If C % is high, your code is already calling fast native code — further speedup needs algorithmic improvements, not language tricks.
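To see the Python-versus-native split on something small, the sketch below computes the same sum two ways using only the standard library, with the built-in `sum` standing in for a NumPy call. The function names are hypothetical. Under scalene, one would expect the explicit loop to show up mostly as Python time and the `sum(range(n))` line mostly as native time, though the exact percentages depend on the machine and Python version.

```python
import time


def interpreter_sum(n: int) -> int:
    """Sum 0..n-1 with an explicit loop: every iteration runs Python bytecode."""
    total = 0
    for i in range(n):
        total += i
    return total


def native_sum(n: int) -> int:
    """Same result, but the loop runs inside the C implementation of sum()."""
    return sum(range(n))


def timed(fn, n: int):
    """Return (result, elapsed seconds) for a single call of fn(n)."""
    start = time.perf_counter()
    result = fn(n)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    for fn in (interpreter_sum, native_sum):
        result, seconds = timed(fn, 5_000_000)
        print(f"{fn.__name__:>16}: {result} in {seconds:.3f}s")
```

Save it as a script and run it under `scalene --cli` to compare the two columns for each function.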
CI Integration
# .github/workflows/profile.yml
- name: Profile benchmark
  run: |
    pip install scalene
    scalene --cli --outfile profile.txt benchmarks/main.py
- uses: actions/upload-artifact@v4
  with:
    name: profile
    path: profile.txt
Compare profile.txt across commits to spot performance regressions.
Working with Tests
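scalene --cli --outfile profile.txt --- -m pytest tests/slow_test.py
--- (three dashes) separates scalene’s own flags from the arguments handed on to Python; here -m pytest runs pytest as a module under the profiler. Use this whenever the profiled command has its own flags, and confirm the form your version expects with scalene --help.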
For pytest patterns that benefit from profiling, see pytest fixture not found.
When scalene Doesn’t Help
If scalene shows everything is fast but your app is slow:
- External I/O: DB queries, HTTP calls. Use APM (Datadog, Honeycomb, Sentry) or strace/tcpdump.
- Lock contention: GIL, asyncio event loop saturation. Use py-spy’s dump mode.
- GPU memory bandwidth: actual compute is fast but waiting on memory transfers. Use NVIDIA Nsight.
For broader async event loop investigation, see Python asyncio not running.
Energy Consumption Profiling
scalene can estimate energy use per code path (RAPL on supported Intel/AMD CPUs):
scalene --cli my_script.py
# Look for Energy column when available
The energy column shows joules consumed per line, which is useful for teams optimizing for cost or carbon. It requires Linux and supported hardware (Intel Sandy Bridge or newer, AMD Zen). On macOS and Windows, the column is omitted. If no Energy column appears on supported hardware either, your Scalene version may not include the feature; check scalene --help.
Multi-Process Profiling
scalene doesn’t have a direct equivalent to --subprocesses from py-spy, but you can profile parent and children separately:
# Parent
scalene --cli --outfile parent.txt my_script.py
# For workers, instrument them to write per-process profiles
# Use os.getpid() to differentiate output files
For multiprocessing patterns that affect profiling strategy, see Python multiprocessing not working.
Programmatic Use
from scalene import scalene_profiler
scalene_profiler.start()
# Code to profile
do_heavy_work()
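scalene_profiler.stop()
Useful when you want to profile only a portion of a long-running app: call start() and stop() around the specific section. Launch the script under scalene with --off so that profiling stays disabled until start() runs.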
Comparing Profiles Across Runs
scalene doesn’t have built-in diff support, but you can compare JSON outputs:
scalene --json --outfile before.json my_script.py
# Make changes
scalene --json --outfile after.json my_script.py
# Manual diff or use a custom script to compare line-by-line metrics
python compare_profiles.py before.json after.json
Regression detection in CI follows this pattern: store a baseline JSON, run on every PR, and alert if any line’s CPU% or memory regresses by more than a threshold.
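The compare_profiles.py script above is left to the reader; one possible shape is sketched below. It rests on an assumption about the JSON layout: a top-level `files` mapping whose entries contain a `lines` list, with each line carrying `lineno`, `n_cpu_percent_python`, and `n_cpu_percent_c`. Check those field names against a profile produced by your Scalene version before relying on them. The function names and the 5-point default threshold are hypothetical.

```python
import json
from pathlib import Path

# Assumed field names in Scalene's JSON output; verify against a real profile.
CPU_KEYS = ("n_cpu_percent_python", "n_cpu_percent_c")


def load_profile(path) -> dict:
    """Read one profile written with `scalene --json --outfile ...`."""
    return json.loads(Path(path).read_text())


def line_cpu(profile: dict) -> dict:
    """Map (filename, lineno) to the total CPU % recorded for that line."""
    totals = {}
    for filename, info in profile.get("files", {}).items():
        for line in info.get("lines", []):
            cpu = sum(line.get(key, 0.0) for key in CPU_KEYS)
            totals[(filename, line["lineno"])] = cpu
    return totals


def regressions(before: dict, after: dict, threshold: float = 5.0) -> list:
    """List (filename, lineno, delta) where CPU % grew by more than threshold."""
    old, new = line_cpu(before), line_cpu(after)
    found = []
    for (filename, lineno), cpu in new.items():
        delta = cpu - old.get((filename, lineno), 0.0)
        if delta > threshold:
            found.append((filename, lineno, delta))
    # Largest regression first.
    return sorted(found, key=lambda item: item[2], reverse=True)


# Usage with two saved profiles:
#   for name, lineno, delta in regressions(load_profile("before.json"),
#                                          load_profile("after.json")):
#       print(f"{name}:{lineno} +{delta:.1f} CPU points")
```

Matching on line numbers is the weak point of this approach: inserting a line above a hot spot shifts every key below it, so a real CI check may want to match on the source text of each line instead.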
Visualization Modes
In the web UI, click column headers to sort by:
- CPU % (Python or native)
- Memory (current or peak)
- GPU %
- Line execution count
Different sorts reveal different bottlenecks — sorting by memory finds allocators, by CPU finds compute hotspots, by execution count finds tight loops.
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.