
Fix: scalene Not Working — Web UI, GPU Profiling, and AI Suggestion Errors

FixDevs ·

Quick Answer

How to fix scalene errors — scalene command not found, web UI port conflict, no GPU detected, profile.json empty, AI optimize requires OpenAI key, native code not attributed, and Jupyter integration.

The Error

You install scalene and the command isn’t found:

$ scalene my_script.py
bash: scalene: command not found

Or the web UI fails to open:

$ scalene my_script.py
# Profile completes but web UI doesn't open in browser

Or GPU profiling shows nothing:

$ scalene --gpu my_script.py
# Profile shows CPU and memory but GPU column is all zeros

Or AI optimization suggestions don’t work:

$ scalene --cli my_script.py
# Output mentions "ask AI" but no actual suggestions appear

Or native code (NumPy, PyTorch) shows as opaque blocks:

# my_script.py
import numpy as np
big_matrix = np.random.rand(2_000, 2_000)
arr = np.dot(big_matrix, big_matrix.T)
# Scalene flags numpy.dot as slow but doesn't show WHY

scalene is a modern all-in-one profiler. In a single run it measures CPU time (Python and native code separately), memory (with leak detection), GPU usage, and the volume of data copied between Python and native code. Its headline feature is AI-assisted optimization: an LLM, such as one of OpenAI's GPT models, reads the profile and proposes faster code. It's heavier than py-spy but gives more dimensions of insight. This guide covers the common setup issues.

Why This Happens

pip places the scalene command in the environment's scripts directory; for a --user install that is the user scripts directory (for example ~/.local/bin on Linux), which is often not on PATH. The web UI opens automatically when scalene finishes, which works locally but fails in headless environments (CI, remote SSH, Docker). GPU profiling relies on NVIDIA's NVML bindings (pynvml), so it needs an NVIDIA GPU with working drivers. The AI optimization feature sends code to an LLM provider such as OpenAI's API and needs an API key for that provider.

Fix 1: Installation and PATH

pip install scalene
# GPU profiling uses NVIDIA's NVML bindings; install them if `import pynvml` fails
pip install nvidia-ml-py

Verify the command is on PATH:

which scalene
# /usr/local/bin/scalene  or  ~/.local/bin/scalene  or similar

If “command not found”:

# Find the user base directory used by pip --user installs
python -m site --user-base
# Add its bin directory to PATH (Linux/macOS)
export PATH="$(python -m site --user-base)/bin:$PATH"

# Or run via Python module
python -m scalene my_script.py

Python module form works regardless of PATH:

python -m scalene my_script.py
python -m scalene --cli my_script.py
python -m scalene --html --outfile profile.html my_script.py

Common Mistake: Installing scalene globally via pip but running scripts inside a virtualenv. The venv may not have scalene installed, so scalene resolves to the global binary that profiles the global Python — not your venv’s Python. Always pip install scalene inside the venv you’re profiling.
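To confirm which interpreter and which scalene you are actually using, a quick standard-library check can help. This is a sketch: it uses importlib.util.find_spec, shutil.which, and sys.prefix, and it treats "the scalene command lives under sys.prefix" as the sign of a match, which is an assumption that holds for ordinary venvs.

```python
import importlib.util
import shutil
import sys
from pathlib import Path


def scalene_environment_report() -> dict:
    """Report whether scalene is importable here and where the `scalene` command resolves."""
    spec = importlib.util.find_spec("scalene")
    cli_path = shutil.which("scalene")
    prefix = Path(sys.prefix).resolve()
    cli_in_this_env = False
    if cli_path is not None:
        # Assumption: in a normal venv, entry points live in <prefix>/bin (Scripts on Windows)
        cli_in_this_env = prefix in Path(cli_path).resolve().parents
    return {
        "python": sys.executable,
        "scalene_importable": spec is not None,
        "scalene_cli": cli_path,
        "cli_matches_this_python": cli_in_this_env,
    }


if __name__ == "__main__":
    for key, value in scalene_environment_report().items():
        print(f"{key}: {value}")
```

Run it with the venv's python: if scalene_importable is False or cli_matches_this_python is False, install scalene into that venv or use python -m scalene.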

Fix 2: Output Modes

scalene has multiple output modes for different contexts:

# Default: opens web UI in browser
scalene my_script.py

# CLI mode (terminal output, no browser)
scalene --cli my_script.py

# HTML file for sharing or CI artifacts
scalene --html --outfile profile.html my_script.py

# JSON for tooling integration
scalene --json --outfile profile.json my_script.py

# Need several formats? Run scalene once per format
# (combining --cli, --html and --json in one run is not a documented usage)

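CLI mode output (illustrative and abbreviated; the exact columns vary by scalene version):

    Line │ Time % │ Memory  │
       1 │        │         │ import numpy as np
       2 │        │         │
       3 │    2%  │ ~800 MB │ data = np.random.rand(1_000_000, 100)
       4 │   87%  │         │ result = data.T @ data

Column meanings:

  • Time: share of total run time spent on the line. Current versions split it into Python (interpreter), native (C/C++ extension code such as NumPy), and system columns
  • Memory: memory allocated on the line. Current versions show the Python share of allocations, the peak, and a small timeline of memory use
  • Copy (MB/s): volume of data copied, for example between Python and native memory
  • GPU columns appear only when scalene detects a supported GPU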

Pro Tip: For CI integration, use --cli and pipe to a file:

scalene --cli my_script.py > profile.txt 2>&1

Then upload profile.txt as a CI artifact. Web UI is great for local exploration but useless in headless CI environments.

Fix 3: GPU Profiling

scalene --gpu my_script.py

Requirements:

  • NVIDIA GPU (no AMD/Intel support yet)
  • The NVML Python bindings (the pynvml module): pip install nvidia-ml-py if import pynvml fails
  • NVIDIA drivers loaded

Verify GPU access:

nvidia-smi   # Should list your GPUs
python -c "import pynvml; pynvml.nvmlInit(); print(pynvml.nvmlDeviceGetCount())"

Common Mistake: Running scalene --gpu on a machine without NVIDIA hardware. Scalene runs but reports 0% GPU usage everywhere — no error, just useless output. Verify GPU access first; only use --gpu flag when you actually have one.
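A slightly more thorough version of the one-liner check, as a sketch using only the documented pynvml calls nvmlInit, nvmlDeviceGetCount, nvmlDeviceGetHandleByIndex, nvmlDeviceGetName, and nvmlShutdown. It returns the GPU names NVML can see and an empty list when the bindings, the driver, or the hardware is missing; an empty list is a good signal to leave off --gpu.

```python
def nvidia_gpu_names() -> list:
    """Return NVIDIA GPU names visible to NVML, or an empty list if NVML is unusable."""
    try:
        import pynvml  # provided by the nvidia-ml-py package
    except ImportError:
        return []
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        # No driver or no NVIDIA hardware: GPU columns would carry no useful data
        return []
    try:
        names = []
        for index in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(index)
            name = pynvml.nvmlDeviceGetName(handle)
            # Older bindings return bytes, newer ones return str
            names.append(name.decode() if isinstance(name, bytes) else name)
        return names
    finally:
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    gpus = nvidia_gpu_names()
    if gpus:
        print("NVML sees:", ", ".join(gpus))
    else:
        print("No NVIDIA GPU visible to NVML; skip --gpu")
```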

For multi-GPU systems:

CUDA_VISIBLE_DEVICES=0 scalene --gpu my_script.py
# Restricts the profiled program to GPU 0 (the variable controls which GPUs CUDA exposes to the program)

For PyTorch GPU profiling that goes deeper than scalene’s high-level stats, see PyTorch not working.

Fix 4: AI-Assisted Optimization

Scalene’s AI feature reads the profile and suggests optimized code:

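# Set the API key in the shell that launches scalene
export OPENAI_API_KEY=sk-...

# AI proposals are shown in the web UI, not in --cli output
scalene my_script.py

In the web UI, make sure the API key field in the AI settings is filled in (paste the key there if it is not picked up from the environment), then click "🧠 Optimize" next to a hot line. Scalene sends that code, with surrounding context, to the provider and shows the suggested optimization.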

Other providers:

Besides OpenAI, recent scalene releases let you choose other back ends, such as Azure OpenAI, Amazon Bedrock, or a local model served by Ollama, in the web UI's AI settings, where you also enter the endpoint and credentials. Provider selection happens in the web UI rather than through a command-line flag, so check the web UI of your installed version for the available options.

Common Mistake: Expecting AI suggestions to work without any provider configured. The feature is opt-in: without an API key (or a configured local model), the "Optimize" button does nothing.

For OpenAI API integration patterns that affect AI tooling like scalene, see OpenAI API not working.

Fix 5: Profiling Specific Code (Decorators)

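By default, scalene reports only code in your program's directory; library code is left out unless you pass --profile-all. To narrow the report further to specific functions, decorate them with @profile. Scalene provides this name itself, so no import is needed:

# my_script.py
@profile
def slow_function():
    data = [i ** 2 for i in range(1_000_000)]
    return sum(data)

slow_function()

Then run normally:

scalene my_script.py
# Only functions decorated with @profile appear in the report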
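Scalene's README documents a bare @profile decorator that scalene makes available without an import when it runs your script. Under plain python that name does not exist, so a decorated script fails with NameError. A common guard, the same one used with line_profiler, defines a no-op fallback; it is plain Python rather than a scalene API, shown here as a sketch:

```python
try:
    profile  # defined by scalene (no import needed) when it runs this script
except NameError:
    def profile(func):
        """No-op stand-in so the script also runs under plain `python`."""
        return func


@profile
def slow_function():
    data = [i ** 2 for i in range(1_000_000)]
    return sum(data)


if __name__ == "__main__":
    print(slow_function())
```

Because the guard only runs when the name is missing, scalene's own decorator is left untouched when the script runs under scalene.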

Or profile only specific modules:

scalene --profile-only "my_module" my_script.py
# Only lines in my_module.py appear in the profile

When you include library code with --profile-all, exclude the noise (third-party libs that dominate the profile):

scalene --profile-all --profile-exclude "site-packages" my_script.py
# Profiles everything except files whose paths contain site-packages

Pro Tip: For large applications, restrict the report to your project's package with --profile-only, especially if you enable --profile-all. Otherwise framework internals (FastAPI, SQLAlchemy, asyncio) can dominate the report and bury your actual hot code.

Fix 6: Memory Profiling

scalene’s memory profiling tracks both Python objects and native allocations:

scalene --cli my_script.py
# Memory column shows allocations per line

Memory leak detection:

scalene --memory-leak-detector my_script.py
# Highlights allocations that grow without being freed

Sampling and report size:

# Sampling (default): low overhead; allocations are sampled, so small ones may not show up
scalene my_script.py

# Shorter report: only lines with non-zero activity (this trims the output, not the overhead)
scalene --reduced-profile my_script.py

Common Mistake: Comparing scalene’s memory numbers to top or htop. The numbers measure different things — scalene tracks allocations attributable to specific Python lines; top shows total RSS (resident set size) including caches and shared libs. Expect scalene’s numbers to be smaller — they’re allocations, not total memory in use.
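The gap between "memory allocated" and RSS is easy to see even without scalene. This sketch uses the standard library's tracemalloc as a stand-in for allocation tracking (it is not how scalene samples) and resource.getrusage for peak RSS. It is Unix-only, and ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.

```python
import resource
import sys
import tracemalloc


def peak_rss_bytes() -> int:
    """Peak resident set size of this process (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports kilobytes; macOS reports bytes
    return peak if sys.platform == "darwin" else peak * 1024


def compare_allocations_to_rss() -> tuple:
    """Return (peak traced Python allocations, peak process RSS), both in bytes."""
    tracemalloc.start()
    data = [str(i) * 10 for i in range(200_000)]  # Python allocations we can attribute
    _, traced_peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del data
    return traced_peak, peak_rss_bytes()


if __name__ == "__main__":
    traced, rss = compare_allocations_to_rss()
    print(f"peak traced Python allocations: {traced / 2**20:.1f} MiB")
    print(f"peak RSS of the whole process:  {rss / 2**20:.1f} MiB")
```

The RSS figure also includes the interpreter, loaded libraries, and allocator caches, which is why it rarely matches a per-line allocation profile.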

For memory-specific deep dives, see memray not working — memray gives more detail for pure memory questions; scalene’s strength is combining memory with CPU.

Fix 7: Jupyter Integration

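# Cell 1: load the extension once per kernel
%load_ext scalene

# Cell 2: define the function, then profile one statement with the %scrun line magic
def slow_thing():
    data = [i ** 2 for i in range(1_000_000)]
    return sum(data)

%scrun slow_thing()

The %scrun magic profiles a single statement. Output is displayed inline.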

For longer cells:

%%scalene
import numpy as np

arr = np.random.rand(10_000_000)
result = arr.sum()
print(result)

Line vs cell magic: %scrun profiles one statement; %%scalene, placed on the first line of a cell, profiles the whole cell. Both show line-by-line breakdowns.

For Jupyter setup patterns that affect profiling, see Jupyter not working.

Fix 8: Reducing Profiling Overhead

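# Default sampling settings
scalene my_script.py

# Shorter report: only lines with non-zero activity are listed (sampling, and so overhead, is unchanged)
scalene --reduced-profile my_script.py

# CPU sampling interval in seconds (default: every 0.01 s)
scalene --cpu-sampling-rate 0.05 my_script.py
# Larger value = fewer samples, less overhead, coarser data

# Memory sampling window in bytes (default: roughly 10 MB of allocations)
scalene --allocation-sampling-window 20971520 my_script.py
# Larger window = memory sampled less often, less overhead

Profile vs measure: for long, production-like runs, sample less often (larger intervals and windows); for deep debugging of a short run, sample more often. Check scalene --help for the defaults of your installed version.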
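The only reliable way to know what a given setting costs on your workload is to time it. A minimal sketch, assuming scalene is installed in the same interpreter: time the script once on its own and once under python -m scalene --cli, then compute the relative slowdown. The script path comes from the command line, and the flag set is just an example.

```python
import subprocess
import sys
import time


def wall_time(cmd: list) -> float:
    """Run a command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def overhead_percent(baseline_s: float, profiled_s: float) -> float:
    """Relative slowdown of the profiled run, in percent."""
    return (profiled_s - baseline_s) / baseline_s * 100.0


if __name__ == "__main__" and len(sys.argv) > 1:
    script = sys.argv[1]  # e.g. my_script.py
    base = wall_time([sys.executable, script])
    profiled = wall_time(
        [sys.executable, "-m", "scalene", "--cli", "--cpu-sampling-rate", "0.05", script]
    )
    print(f"overhead: {overhead_percent(base, profiled):.1f}%")
```

Usage: python measure_overhead.py my_script.py. A single run is noisy, so repeat it a few times before comparing settings.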

Still Not Working?

scalene vs py-spy vs memray

  • scalene — All-in-one: CPU + memory + GPU + copy volume + AI suggestions. Best for “what’s slow AND why?”
  • py-spy — CPU only, sampling, attach to running processes. Best for production diagnostics. See py-spy not working.
  • memray — Memory only, allocation tracking. Best for finding leaks. See memray not working.

scalene is the heaviest of the three but provides the most context. Use it for “this is slow, let me understand why”; use py-spy for “this is hung, what’s blocking?”; use memray for “this is leaking, where?”

Distinguishing Python vs Native Time

A distinctive scalene feature is splitting each line's time into Python and native columns, so you can see how much time went to native code (NumPy, PyTorch internals) versus the Python interpreter:

Time % Python │ Time % native │ Line
    5%        │     85%       │ result = np.dot(a, b)
   85%        │      5%       │ result = sum(x * y for x, y in zip(a, b))

The NumPy line spends most of its time in native code, which is the point of moving work into NumPy. The pure-Python version spends most of its time in the interpreter executing bytecode for every element, which is exactly where there is room to optimize.

Pro Tip: When optimizing, look at the Python % column. If it’s high, you have room to optimize by moving to NumPy/Cython/Rust. If C % is high, your code is already calling fast native code — further speedup needs algorithmic improvements, not language tricks.
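To see the Python/native split on your own machine without NumPy, profile two standard-library versions of the same dot product. The generator-expression version executes bytecode for every element, while sum(map(operator.mul, a, b)) keeps the per-element loop inside C builtins, so scalene should attribute most of that line's time to native code. The file name dot_demo.py and the list sizes are arbitrary, and exact percentages will vary.

```python
# dot_demo.py: run with `scalene --cli dot_demo.py` and compare the Python vs native columns
import math
import operator
import random


def dot_python(a, b):
    # Interpreter-bound: the loop body runs as Python bytecode
    return sum(x * y for x, y in zip(a, b))


def dot_builtin(a, b):
    # map() and operator.mul do the per-element work in C
    return sum(map(operator.mul, a, b))


if __name__ == "__main__":
    random.seed(0)
    a = [random.random() for _ in range(1_000_000)]
    b = [random.random() for _ in range(1_000_000)]
    slow = dot_python(a, b)
    fast = dot_builtin(a, b)
    print("results agree:", math.isclose(slow, fast))
```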

CI Integration

# .github/workflows/profile.yml (steps excerpt)
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
  with:
    python-version: "3.12"
- name: Profile benchmark
  run: |
    pip install scalene
    scalene --cli --outfile profile.txt benchmarks/main.py
- uses: actions/upload-artifact@v4
  with:
    name: profile
    path: profile.txt

Compare profile.txt across commits to spot performance regressions.

Working with Tests

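scalene --cli --outfile profile.txt --profile-all --profile-only "tests" "$(which pytest)" --- tests/slow_test.py

Scalene runs a Python script, so point it at the pytest entry-point script found by which pytest. --- (three dashes) separates scalene's own options from the arguments passed to the profiled program. By default scalene only reports code next to the profiled script (here, the environment's bin directory), so --profile-all plus --profile-only narrows the report to your tests; add your package name to the comma-separated --profile-only list to include the code under test.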

For pytest patterns that benefit from profiling, see pytest fixture not found.

When scalene Doesn’t Help

If scalene shows everything is fast but your app is slow:

  1. External I/O — DB queries, HTTP calls. Use APM (Datadog, Honeycomb, Sentry) or strace/tcpdump.
  2. Lock contention — GIL, asyncio event loop saturation. Use py-spy’s dump mode.
  3. GPU memory bandwidth — actual compute is fast but waiting on memory transfers. Use NVIDIA Nsight.

For broader async event loop investigation, see Python asyncio not running.

Energy Consumption Profiling

Energy reporting (for example, per-line joules from Intel RAPL counters) is not among the metrics scalene documents, which cover CPU time, memory, GPU, and copy volume. Before relying on energy numbers, run scalene --help on your installed version and look for an energy option; if there is none, use a dedicated energy-measurement tool alongside scalene.

Multi-Process Profiling

scalene doesn’t have a direct equivalent to --subprocesses from py-spy, but you can profile parent and children separately:

# Parent
scalene --cli --outfile parent.txt my_script.py

# Workers: run each worker entry point under scalene with its own output file
# (a worker index or os.getpid() keeps the file names distinct)

For multiprocessing patterns that affect profiling strategy, see Python multiprocessing not working.
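One way to follow that advice is to launch each worker as its own process under scalene, with an index in the output file name. This sketch only builds and starts the command lines; worker.py, its --shard argument, and the worker-N.json naming are hypothetical, and --- is the separator scalene's usage text gives between its own options and the profiled program's arguments (check scalene --help on your version).

```python
import subprocess
import sys


def worker_commands(script: str, n_workers: int) -> list:
    """Build one scalene invocation per worker, each writing its own JSON profile."""
    commands = []
    for index in range(n_workers):
        commands.append([
            sys.executable, "-m", "scalene",
            "--json", "--outfile", f"worker-{index}.json",
            script,
            "---", "--shard", str(index),  # hypothetical worker argument
        ])
    return commands


def run_workers(script: str, n_workers: int) -> None:
    # Start all workers, then wait for each one to finish and write its profile
    procs = [subprocess.Popen(cmd) for cmd in worker_commands(script, n_workers)]
    for proc in procs:
        proc.wait()


if __name__ == "__main__" and len(sys.argv) == 3:
    run_workers(sys.argv[1], int(sys.argv[2]))
```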

Programmatic Use

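from scalene import scalene_profiler

def do_heavy_work():
    # Placeholder workload: replace with the code you want to measure
    return sum(i * i for i in range(5_000_000))

scalene_profiler.start()   # begin profiling here
do_heavy_work()
scalene_profiler.stop()    # stop profiling here

Launch the script with scalene --off my_script.py so profiling starts disabled and only the region between start() and stop() is measured. This is useful when you want to profile just one portion of a long-running app.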

Comparing Profiles Across Runs

scalene doesn’t have built-in diff support, but you can compare JSON outputs:

scalene --json --outfile before.json my_script.py
# Make changes
scalene --json --outfile after.json my_script.py

# Manual diff or use a custom script to compare line-by-line metrics
python compare_profiles.py before.json after.json

Regression detection in CI follows this pattern — store a baseline JSON, run on every PR, alert if any line’s CPU% or memory regresses by more than a threshold.
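A possible shape for the hypothetical compare_profiles.py mentioned above. It assumes the JSON layout of recent scalene releases: a top-level "files" mapping, each entry holding a "lines" list whose items carry "lineno", "n_cpu_percent_python", and "n_cpu_percent_c". Those key names are an assumption, so open one of your own profile.json files and adjust them if they differ. The 5-point threshold is arbitrary.

```python
import json
import sys

# Assumed scalene JSON keys; verify against a profile.json from your scalene version
CPU_KEYS = ("n_cpu_percent_python", "n_cpu_percent_c")


def cpu_by_line(profile: dict) -> dict:
    """Map (filename, lineno) -> total CPU percent (Python + native) for that line."""
    totals = {}
    for filename, file_data in profile.get("files", {}).items():
        for line in file_data.get("lines", []):
            cpu = sum(float(line.get(key, 0.0)) for key in CPU_KEYS)
            totals[(filename, line["lineno"])] = cpu
    return totals


def regressions(before: dict, after: dict, threshold: float = 5.0) -> list:
    """Lines whose CPU share grew by more than `threshold` percentage points."""
    old, new = cpu_by_line(before), cpu_by_line(after)
    found = []
    for key, new_cpu in new.items():
        delta = new_cpu - old.get(key, 0.0)
        if delta > threshold:
            found.append((key[0], key[1], round(delta, 2)))
    return sorted(found, key=lambda item: item[2], reverse=True)


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f_before, open(sys.argv[2]) as f_after:
        problems = regressions(json.load(f_before), json.load(f_after))
    for filename, lineno, delta in problems:
        print(f"{filename}:{lineno} CPU share +{delta} points")
    sys.exit(1 if problems else 0)
```

Matching by line number is crude: edits that shift lines will show up as spurious changes, and because the percentages are shares of total time, a slowdown on one line also lowers the share of every other line.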

Visualization Modes

In the web UI, click column headers to sort by:

  • CPU % (Python or native)
  • Memory (current or peak)
  • GPU %
  • Line execution count

Different sorts reveal different bottlenecks — sorting by memory finds allocators, by CPU finds compute hotspots, by execution count finds tight loops.


FixDevs

Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.
