Fix: py-spy Not Working — Attach Permission, Empty Output, and Native Frame Errors
Quick Answer
How to fix py-spy errors — Operation not permitted ptrace, flamegraph blank, missing native code frames, top mode shows no Python frames, dump command empty, and subprocess inheritance.
The Error
You try to attach py-spy to a running process and get permission denied:
$ py-spy dump --pid 12345
Error: Operation not permitted: ptrace failed
Or the generated flamegraph is empty or shows only <idle>:
$ py-spy record -o profile.svg --pid 12345 --duration 30
# After 30 seconds, profile.svg shows almost nothing
Or py-spy reports that it cannot read the interpreter state:
Error: Unable to get python interpreter state, can't profile
Or native (C extension) frames don’t appear:
$ py-spy record -o profile.svg python my_script.py
# Flamegraph shows numpy.zeros calls but no breakdown of what numpy does internally
Or subprocess profiling misses child processes:
$ py-spy record -o profile.svg python my_script.py
# Script spawns worker processes; py-spy only profiles the parent
py-spy is a widely used Python CPU profiler: written in Rust, sampling-based (low overhead), and able to attach to already-running processes without a restart. It’s the right tool for “production is slow, why?” debugging, but ptrace permission issues, sampling-vs-deterministic confusion, and native frame handling produce specific failures. This guide covers each.
Why This Happens
py-spy works by reading the memory of a running Python process from the outside: process_vm_readv on Linux, vm_read on macOS, ReadProcessMemory on Windows. On Linux this access is subject to the same permission checks as ptrace. It samples the interpreter’s call stack at a fixed rate (default 100 Hz). No instrumentation, no code changes, just observation. Because it’s sample-based, very fast functions may not appear in profiles; only frames captured at sample points show up.
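To make the idea of a "sample" concrete, here is a minimal in-process sketch. It is not how py-spy is implemented (py-spy inspects the process from outside); it only illustrates what a single stack sample contains. The names capture_stack, inner, and outer are made up for this example.

```python
import sys

def capture_stack(frame):
    """Return function names from the outermost caller down to the given frame."""
    names = []
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back
    return list(reversed(names))

def inner():
    # sys._getframe() returns the frame that is executing right now
    return capture_stack(sys._getframe())

def outer():
    return inner()

stack = outer()
print(" -> ".join(stack[-2:]))  # the two innermost frames of this one sample
```

A sampling profiler repeats this kind of snapshot many times per second and counts how often each stack appears. A function that starts and finishes between two snapshots never shows up.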
Attaching to another process is a privileged operation on Linux. Default kernel hardening (Yama LSM, ptrace_scope=1) lets a process trace only its own descendants, so attaching to an arbitrary process requires the CAP_SYS_PTRACE capability, root, or a relaxed kernel setting.
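As a rough sketch, the current Yama setting can be read and explained from Python before you attempt an attach. The helper names read_ptrace_scope and describe_ptrace_scope are hypothetical. The file path is the standard procfs location on Linux kernels with Yama enabled, and the sketch returns None where that file does not exist.

```python
from pathlib import Path

# Meanings of the Yama ptrace_scope values on Linux
PTRACE_SCOPE_MEANINGS = {
    0: "a process may trace any other process owned by the same user",
    1: "a process may only trace its own descendants",
    2: "only processes with CAP_SYS_PTRACE may attach",
    3: "attaching is disabled entirely",
}

def read_ptrace_scope(path="/proc/sys/kernel/yama/ptrace_scope"):
    """Return the ptrace_scope value, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None

def describe_ptrace_scope(value):
    if value is None:
        return "ptrace_scope not available (non-Linux system or Yama not enabled)"
    return PTRACE_SCOPE_MEANINGS.get(value, f"unknown value {value}")

print(describe_ptrace_scope(read_ptrace_scope()))
```

If this reports a value of 1 or higher, attaching by PID as an unprivileged user is expected to fail until one of the fixes in Fix 2 is applied.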
Fix 1: Installation and Basic Use
pip install py-spy
# Or via cargo for the latest
cargo install py-spy
Three main commands:
# Profile a new process — outputs flamegraph SVG
py-spy record -o profile.svg python my_script.py
# Live top view (like Linux top, but for Python)
py-spy top --pid 12345
# One-shot stack dump for all threads
py-spy dump --pid 12345
Common Mistake: Running py-spy record against a one-shot script and expecting the flamegraph to show the script’s logic. py-spy samples, so a script that runs in 100 ms gets only about 10 samples at the default 100 Hz and produces a useless flamegraph. Always profile workloads that run for at least several seconds, ideally minutes.
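The arithmetic behind that advice is simple enough to sketch. The helper below is illustrative only (expected_samples is not part of py-spy) and assumes the profiler takes samples at a steady rate for the whole run.

```python
def expected_samples(runtime_seconds, rate_hz=100):
    """Approximate number of stack samples collected over a run."""
    return round(runtime_seconds * rate_hz)

for runtime in (0.1, 5, 60):
    print(f"{runtime:>5} s at 100 Hz -> ~{expected_samples(runtime)} samples")
```

A few thousand samples give a flamegraph with stable proportions. Ten samples mostly reflect where the interpreter happened to be at those instants.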
Profile until the script exits:
py-spy record -o profile.svg -- python my_script.py
# Note the `--` before python
Profile for a fixed duration:
py-spy record -o profile.svg --duration 60 --pid 12345
# Records 60 seconds of samples from PID 12345
Fix 2: Permission Denied on Attach
$ py-spy dump --pid 12345
Error: Permission denied: cannot read process memory
Or:
ptrace: Operation not permitted (os error 1)
Linux blocks ptrace of non-child processes by default for security.
Quick fix (sudo):
sudo py-spy dump --pid 12345
sudo py-spy record -o profile.svg --pid 12345 --duration 30
Permanent fix (relax ptrace scope):
# Check current setting
cat /proc/sys/kernel/yama/ptrace_scope
# 0 = ptrace any process
# 1 = ptrace only descendants (default on Ubuntu/Debian)
# 2 = require CAP_SYS_PTRACE
# 3 = ptrace disabled
# Temporarily allow ptrace any process
sudo sysctl kernel.yama.ptrace_scope=0
# Permanently
echo "kernel.yama.ptrace_scope = 0" | sudo tee /etc/sysctl.d/10-ptrace.conf
sudo sysctl -p /etc/sysctl.d/10-ptrace.conf
Reduce security risk (grant just py-spy the capability):
sudo setcap cap_sys_ptrace=eip "$(which py-spy)"
# Now py-spy can attach to any process without sudo
This is safer than ptrace_scope=0: only py-spy gets the elevated permission, not every process you run.
Docker containers (add the capability):
docker run --cap-add SYS_PTRACE ...
Or, where security policies rule that out for the app container, use a sidecar container:
services:
  app:
    image: myapp
  py-spy:
    image: python:3.12
    pid: "service:app"  # Share PID namespace
    cap_add: [SYS_PTRACE]
    volumes: ["./output:/output"]  # So the profile survives the container
    # python:3.12 does not ship py-spy, so install it before recording
    command: sh -c "pip install py-spy && py-spy record -o /output/profile.svg --pid 1 --duration 60"
Pro Tip: Set cap_sys_ptrace on the py-spy binary once via setcap. After that, profile any process without sudo or kernel config changes. The security risk is bounded to py-spy specifically, not your shell or all binaries.
Fix 3: Sampling Rate
py-spy record -o profile.svg --rate 250 --pid 12345
# Sample at 250 Hz instead of default 100
Higher rates = more detail but more overhead:
| Rate | Overhead | Use case |
|---|---|---|
| 1 Hz | 0.01% | Very long observations (hours), low overhead |
| 100 Hz (default) | ~1% | General profiling |
| 500 Hz | ~5% | Short bursts, fine-grained |
| 1000 Hz | ~10% | Very fast functions |
For production profiling where overhead matters, stick to 100 Hz or less. For debugging brief operations, 500-1000 Hz catches more.
Sampling vs deterministic profilers:
- Sampling (py-spy, scalene): periodic snapshots, low overhead, may miss fast events
- Deterministic (cProfile): records every call, high overhead, captures everything
py-spy’s sampling means very brief functions called rarely won’t show up. For “what’s slow in this hot loop?”, py-spy is perfect; for “did this function get called at all?”, cProfile is better.
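For the "did this function get called at all?" question, a minimal cProfile sketch looks like the following. The functions tiny, workload, and count_calls are invented for illustration; cProfile.Profile, runcall, and pstats.Stats are standard-library APIs.

```python
import cProfile
import pstats

def tiny():
    return 1

def workload(n):
    return sum(tiny() for _ in range(n))

def count_calls(func_name, target, *args):
    """Run target under cProfile and return how often func_name was called."""
    profiler = cProfile.Profile()
    profiler.runcall(target, *args)
    stats = pstats.Stats(profiler)
    # stats.stats maps (filename, lineno, name) -> (cc, nc, tt, ct, callers)
    return sum(
        entry[1]
        for (_, _, name), entry in stats.stats.items()
        if name == func_name
    )

print(count_calls("tiny", workload, 1000))
```

Every call is counted no matter how short it is, which is exactly what a sampler cannot guarantee. The price is per-call overhead, so this approach suits tests and local runs better than production processes.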
Fix 4: Native Frames (C Extensions)
py-spy record -o profile.svg --native --pid 12345
--native resolves C extension frames (NumPy, PyTorch, lxml, etc.) instead of showing them as opaque blocks.
Without --native:
my_func
numpy.dot
<native code> ← Just a black box
With --native:
my_func
numpy.dot
__dgemm_kernel_avx2 ← Actual BLAS function
matmul_internal
Common Mistake: Profiling NumPy/PyTorch code without --native and concluding the bottleneck is “matrix multiplication”, when the real question is “which BLAS kernel” or “is GEMM blocking on memory bandwidth.” Native frames reveal the actual hot spots.
--native overhead — adds 10-50% to sampling overhead because it resolves symbols at each sample. For long-running production profiles, leave native off; for targeted hot-path investigations, turn it on.
For PyTorch profiling that benefits from native frames, see PyTorch not working.
Fix 5: Subprocess and Multi-Process Profiling
py-spy record -o profile.svg --subprocesses python my_script.py
--subprocesses follows child processes spawned via os.fork, subprocess.Popen, multiprocessing. Each subprocess gets its own sample stream in the same flamegraph.
Without --subprocesses:
# my_script.py
from multiprocessing import Pool

def slow_function(n):
    return sum(i * i for i in range(n * 1000))  # placeholder CPU-bound work

if __name__ == "__main__":
    with Pool(4) as p:
        p.map(slow_function, range(1000))
py-spy record -o profile.svg python my_script.py
# Profile shows only the parent process; workers are invisible
With --subprocesses:
py-spy record -o profile.svg --subprocesses python my_script.py
# Profile includes all worker processes
For containerized workloads:
py-spy record -o profile.svg --pid 12345 --subprocesses --duration 60
For multiprocessing patterns that affect profiling, see Python multiprocessing not working.
Fix 6: Top Mode (Live)
py-spy top --pid 12345
Live TUI like Linux top but for Python. It shows which functions are using CPU right now:
%Own %Total Function (filename)
80.0% 80.0% slow_function (my_script.py)
15.0% 95.0% process_data (my_script.py)
5.0% 5.0% <built-in method>
Sort modes (press the key while top is running):
| Key | Action |
|---|---|
| 1 | Sort by %Own |
| 2 | Sort by %Total |
| 3 | Sort by OwnTime |
| 4 | Sort by TotalTime |
| Ctrl-C | Quit |
top is great for “is this still happening?” queries. Attach to a running process for a few seconds, see what’s hot, detach.
Pro Tip: Use py-spy top as a quick diagnostic before deeper analysis. If top shows your hot function consistently, you have a CPU problem worth profiling further. If top shows everything is idle but the process is slow, your bottleneck is I/O — switch to other tools (strace for syscalls, iotop for disk).
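The CPU-bound versus waiting distinction can also be checked from inside a test script by comparing CPU time with wall-clock time. This is a sketch with made-up helpers (cpu_fraction, busy, waiting). time.sleep stands in for I/O or lock waits.

```python
import time

def cpu_fraction(func, *args):
    """Return CPU time / wall time for one call (near 0 means mostly waiting)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func(*args)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used if wall_used > 0 else 0.0

def busy(n=200_000):
    return sum(i * i for i in range(n))

def waiting(seconds=0.05):
    time.sleep(seconds)  # stands in for I/O or lock waits

print(f"busy:    {cpu_fraction(busy):.2f}")
print(f"waiting: {cpu_fraction(waiting):.2f}")
```

A fraction close to 1 suggests a CPU problem that py-spy can explain. A fraction close to 0 points to waiting, where syscall and I/O tools are more informative.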
Fix 7: Dump for Stuck/Hung Processes
py-spy dump --pid 12345
Prints the current Python stack of every thread as an instant snapshot, with no sampling. Perfect for hung processes:
Thread 0x7f8b2c19c700 (active+gil): "MainThread"
fetch_data (my_module.py:42)
main (my_module.py:78)
<module> (my_script.py:5)
Thread 0x7f8b1a3fd700 (idle): "Thread-1"
wait (threading.py:312)
join (threading.py:355)
main (my_module.py:80)
(active+gil) = currently executing Python; (idle) = waiting (I/O, lock, sleep).
Common Mistake: Using top on a hung process. py-spy top shows CPU usage — a deadlocked process shows 0% CPU and you learn nothing. Use dump for hangs; it shows where every thread is parked.
Dump a stuck process from a remote machine:
ssh prod-server "sudo py-spy dump --pid 12345"
Output is plain text, so it is easy to grep and share with teammates.
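Because the dump is plain text, a few lines of Python can group frames by thread for further filtering. This sketch assumes the layout shown in the sample above (a Thread header line followed by frame lines). parse_dump is a hypothetical helper, and real output may carry extra header lines, which the sketch skips.

```python
import re

# Matches lines such as: Thread 0x7f8b2c19c700 (active+gil): "MainThread"
THREAD_HEADER = re.compile(r'^Thread (\S+) \(([^)]*)\)(?:: "(.*)")?$')

def parse_dump(text):
    """Group frame lines under their thread header; returns a list of dicts."""
    threads = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        match = THREAD_HEADER.match(line)
        if match:
            thread_id, state, name = match.groups()
            threads.append({"id": thread_id, "state": state, "name": name, "frames": []})
        elif threads:
            # Lines before the first thread header are ignored
            threads[-1]["frames"].append(line)
    return threads

SAMPLE = '''\
Thread 0x7f8b2c19c700 (active+gil): "MainThread"
    fetch_data (my_module.py:42)
    main (my_module.py:78)
Thread 0x7f8b1a3fd700 (idle): "Thread-1"
    wait (threading.py:312)
    join (threading.py:355)
'''

for thread in parse_dump(SAMPLE):
    print(thread["name"], thread["state"], "->", thread["frames"][0])
```

The first frame listed for each thread is where it is currently parked, which is usually the line to look at for a hang.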
Fix 8: Reading Flamegraphs
py-spy generates SVG flamegraphs — interactive in any browser.
py-spy record -o profile.svg --pid 12345 --duration 60
open profile.svg # macOS: opens in default browser
Flamegraph anatomy:
- X-axis (width) = total sample time at that function
- Y-axis (depth) = call stack — deeper = nested calls
- Color = arbitrary, helps distinguish adjacent frames
- Click a frame = zoom in to that subtree
- Search box = highlight matching frames
To find bottlenecks:
- Look for wide blocks at the leaf end of the stack (the tips farthest from the root): these are the functions doing the actual work
- Look for stacks that recur — same function called from many paths is a candidate for caching
- Ignore tall narrow towers — they’re deep but not consuming much time
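Reverse flamegraph (icicle graph), with the root at the top and leaves at the bottom. Check how your SVG is drawn first, since py-spy releases commonly render it root-at-top already, and confirm the flag below with py-spy record --help, as it is not among the options listed in the py-spy README:
py-spy record -o profile.svg --flamegraph-direction down --pid 12345
Useful when you want to start from “what’s calling X?” rather than “where does X live?”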
Speedscope format for richer analysis:
py-spy record -o profile.json --format speedscope --pid 12345
Then open in speedscope.app for an interactive timeline, multiple visualization modes, and filtering.
Still Not Working?
py-spy vs scalene vs cProfile
- py-spy — Sampling, low overhead, attach to running processes. Best for production debugging.
- scalene — Sampling + CPU + memory + GPU. Slightly more overhead than py-spy. Best when you want everything in one tool.
- cProfile — Deterministic stdlib profiler, captures every call. Best for unit-test-style profiling of specific code.
- memray — Memory profiling specifically. See memray not working.
For “production is slow,” start with py-spy. For “this test is slow,” use cProfile or pytest-benchmark.
Profiling pytest Tests
py-spy record -o test-profile.svg -- pytest tests/slow_test.py
Or use pytest-py-spy:
pip install pytest-py-spy
pytest --py-spy tests/
For pytest patterns that surface in profiles, see pytest fixture not found.
Profiling Async Code
Async code is tricky to profile — many tasks share the event loop. py-spy handles asyncio reasonably:
py-spy record --pid 12345 -o asyncio-profile.svg
The flamegraph shows asyncio.events.run_forever at the top, with coroutines underneath. Look for _run_once and the coroutines it calls.
For asyncio-specific debugging patterns, see Python asyncio not running.
Profiling Production with Minimal Risk
# 30 seconds of sampling at 50 Hz on a single worker
py-spy record -o profile.svg --pid <worker-pid> --duration 30 --rate 50
50 Hz with 30 second duration: ~0.5% CPU overhead, low risk of affecting production traffic. For longer observations, lower the rate further.
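If you script production profiling, one possible safeguard is to build the command in code and cap the sampling rate. This is a sketch: build_record_command and the max_rate_hz cap are assumptions for illustration, and only the record flags already used in this article appear in the command.

```python
import shlex

def build_record_command(pid, duration_s=30, rate_hz=50,
                         output="profile.svg", max_rate_hz=100):
    """Build a py-spy record command line, capping the sampling rate."""
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    rate = min(rate_hz, max_rate_hz)  # never exceed the agreed production cap
    return [
        "py-spy", "record",
        "-o", output,
        "--pid", str(pid),
        "--duration", str(duration_s),
        "--rate", str(rate),
    ]

command = build_record_command(12345, rate_hz=500)
print(shlex.join(command))
```

The returned list can be passed to subprocess.run(command, check=True) on a host where py-spy is installed and has permission to attach.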
Pro tip for Docker: Don’t run py-spy inside the same container as the app for production profiling. Use a sidecar container that shares the PID namespace — keeps py-spy off the production hot path.
Combining with logging / metrics
Production profiling complements logs and metrics — not replaces them. Use py-spy when:
- Metrics show CPU is high
- Logs don’t reveal the cause
- You need to see what code path is responsible
For structured logging that complements profiling, see Loguru not working.
Output Formats
py-spy record -o profile.svg --pid 12345 # Flamegraph SVG (default)
py-spy record -o profile.json --format speedscope --pid 12345 # Speedscope JSON
py-spy record -o profile.raw --format raw --pid 12345 # Raw sample data
Raw format is useful for custom analysis: feed it into your own scripts to compute custom metrics.
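As an example of such a script, the sketch below counts samples per leaf frame. It assumes the raw file uses the collapsed-stack layout (semicolon-separated frames followed by a sample count on each line), so inspect a real file before relying on it. The sample lines and the self_samples helper are invented for illustration.

```python
from collections import Counter

def self_samples(collapsed_lines):
    """Count samples per leaf frame (the function that was actually running)."""
    counts = Counter()
    for line in collapsed_lines:
        line = line.strip()
        if not line:
            continue
        stack, _, count = line.rpartition(" ")
        if not stack or not count.isdigit():
            continue  # skip lines that do not look like "stack count"
        leaf = stack.split(";")[-1]
        counts[leaf] += int(count)
    return counts

SAMPLE = [
    "main (app.py:10);process_data (app.py:22);slow_function (app.py:40) 80",
    "main (app.py:10);process_data (app.py:22) 15",
    "main (app.py:10);slow_function (app.py:40) 5",
]

for frame, samples in self_samples(SAMPLE).most_common():
    print(f"{samples:>4}  {frame}")
```

With a real file, pass the open file object instead of SAMPLE, since iterating over it yields one line at a time.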
Live Profiling for FastAPI / Uvicorn
# Find the worker PID
ps aux | grep uvicorn
# Profile
py-spy top --pid <worker-pid>
Watch the workers handle requests in real time. For Uvicorn worker configuration, see Uvicorn not working.
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.