Fix: memray Not Working — Tracking Errors, Flamegraph Empty, and Native Allocations
Quick Answer
How to fix memray errors: memray run command not found, flamegraph shows no data, native stack frames missing, live mode TUI broken, attaching to a running process fails, and pytest integration.
The Error
You install memray and run a script, but all you get is a binary file you can’t read:
$ pip install memray
$ memray run my_script.py
# Output: memray-my_script.py.12345.bin
$ cat memray-my_script.py.12345.bin
# Binary garbage: how do I read this?
Or the generated flamegraph is empty:
$ memray flamegraph memray-my_script.py.12345.bin
# Writes an HTML report, but the flamegraph is just one tiny block at the top
Or the native call stacks from C extensions are missing:
import numpy as np
arr = np.zeros(1_000_000_000) # Allocates ~8 GB
# The report pins ~8 GB on this line, but shows nothing about where inside NumPy it was allocated
Or live mode TUI doesn’t work in your terminal:
$ memray run --live my_script.py
# Terminal goes blank, no display, or weird characters
Or attaching to a running process fails:
$ memray attach 12345
# Error: cannot attach (ptrace permissions)
memray is the heavyweight Python memory profiler: written by Bloomberg, it tracks every allocation (Python and native), supports live monitoring of running processes, and generates flamegraphs. The Bloomberg engineering quality shows: the tooling is excellent. But the default workflow follows a “track first, view later” pattern that confuses developers used to live profilers, and native (C/C++) stack frames require an explicit opt-in. This guide covers each.
Why This Happens
memray records allocations to a binary file while the program runs. The file contains the call stack and size of every allocation; turning it into a human-readable view (flamegraph, summary, tree) is a separate memray <command> step. New users expect a “run and see results” workflow like py-spy; memray’s “run, then analyze” model takes adjustment.
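Native allocations are the other surprise. memray hooks the system allocator (malloc, free, and related calls) for the whole process, so memory allocated by C/Rust extensions like NumPy and PyTorch is recorded by default. What isn’t recorded by default is the native call stack: without --native, each allocation is attributed only to the Python frames that led to it, so you can’t see which C/C++ code inside the extension made it.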
Fix 1: Basic Recording and Viewing
# Record allocations
memray run my_script.py
# Generates: memray-my_script.py.<pid>.bin
# Quick summary
memray summary memray-my_script.py.12345.bin
# Flamegraph (HTML)
memray flamegraph memray-my_script.py.12345.bin
# Opens memray-flamegraph-my_script.py.12345.html
# Allocation tree
memray tree memray-my_script.py.12345.bin
# Stats
memray stats memray-my_script.py.12345.bin
The 3-step workflow:
- Run the script with memray run to record a .bin file
- Open the .bin file with a reporter command (summary, flamegraph, tree, stats)
- Browse the report
Specify output file:
memray run -o my_profile.bin my_script.py
memray flamegraph my_profile.bin
Common Mistake: Looking for live output during memray run. The recording mode shows no progress bar and no in-terminal memory stats; apart from a short note about where the capture file is written (and suggested report commands at the end), it just generates the binary file. For live monitoring, use --live (covered below).
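Because memray separates recording from reporting, it can help to script both steps. A minimal sketch, assuming the memray CLI is on your PATH; it only uses memray run and memray flamegraph with -o (output path) and -f (overwrite an existing file), and the names profile_it.py, profile.bin and profile.html are arbitrary choices:

```python
"""Record a script with memray, then render a flamegraph in one go."""
import subprocess
import sys
from pathlib import Path


def build_commands(script, capture, extra_args=()):
    """Return the record command and the report command as argument lists."""
    capture = Path(capture)
    html = capture.with_suffix(".html")
    record = ["memray", "run", "-f", "-o", str(capture), str(script), *extra_args]
    report = ["memray", "flamegraph", "-f", "-o", str(html), str(capture)]
    return record, report


def main(argv):
    if not argv:
        sys.exit("usage: python profile_it.py SCRIPT [ARGS...]")
    script, *extra = argv
    record, report = build_commands(script, "profile.bin", extra)
    subprocess.run(record, check=True)  # step 1: write profile.bin
    subprocess.run(report, check=True)  # step 2: write profile.html
    print("Open profile.html in a browser")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Usage: python profile_it.py my_script.py arg1 arg2. Arguments after the script name are passed through to the script unchanged.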
Profile a module/script with args:
memray run -m my_package.main --arg1 value1
memray run my_script.py arg1 arg2
Fix 2: Native Allocations
import numpy as np
import torch
arr = np.zeros(100_000_000) # 800 MB native alloc
tensor = torch.zeros(50_000_000)  # 200 MB native alloc
Without --native, memray still records these allocations, but it attributes them to the Python line that made the call; the C/C++ frames inside NumPy and PyTorch are not shown.
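Enable native stack tracking at record time:
memray run --native my_script.py
memray flamegraph memray-my_script.py.12345.bin
The reporters use the native frames stored in the capture file; there is no separate --native flag on memray flamegraph. Native mode records the C/C++ call stack behind each allocation, so you can see where inside these libraries memory is allocated: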
- NumPy / SciPy array allocations
- PyTorch tensor allocations
- pandas DataFrame internal buffers
- Anything a C/C++/Rust extension allocates through the standard allocator (malloc, calloc, realloc)
Overhead: native tracking adds roughly a 2-5x slowdown over Python-only profiling, because the native stack has to be unwound for every allocation. Worth it when debugging C-extension memory; skip it for pure-Python profiling.
Pro Tip: For ML / data science workloads (PyTorch, TensorFlow, pandas, NumPy), use --native. Without it, the multi-GB tensor allocations that dominate actual memory use are pinned to a single Python line, with no view of which library operation made them. The slowdown is acceptable for debugging sessions.
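To see what --native changes without installing NumPy or PyTorch, you can use a C extension from the standard library. A sketch, assuming zlib as the stand-in: zlib is a C module, and the buffer zlib.decompress returns is built in C code, much like a NumPy array. The file name native_demo.py is arbitrary:

```python
"""Dependency-free stand-in for a C-extension allocation."""
import zlib


def inflate(nbytes):
    """Return nbytes of zeros produced by zlib's C decompressor."""
    blob = zlib.compress(bytes(nbytes))  # small: zeros compress extremely well
    return zlib.decompress(blob)         # the large output buffer is built in C


if __name__ == "__main__":
    data = inflate(200 * 1024 * 1024)    # about 200 MiB, allocated from C code
    print(len(data))
```

Record it twice, with memray run -o plain.bin native_demo.py and memray run --native -o native.bin native_demo.py, then render a flamegraph from each capture. The native capture should additionally show C-level frames (zlib and CPython internals) beneath the Python line that called zlib.decompress.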
Fix 3: Empty Flamegraph
If memray flamegraph shows just one tiny block, your script either ran too briefly or didn’t allocate significantly.
Make sure the profiled code does real work:
# my_script.py
def actually_do_work():
    data = [i ** 2 for i in range(10_000_000)]
    return sum(data)
actually_do_work()
# Add more work if needed: a microsecond-long script has almost nothing to track
Use --leaks mode to focus on leaked allocations:
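memray flamegraph --leaks memray-script.bin
This shows only allocations that were still unfreed when tracking stopped, which focuses the flamegraph on actual leaks. Record with PYTHONMALLOC=malloc set (or with --trace-python-allocators); otherwise memory that pymalloc keeps cached in its own pools can be reported as leaks.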
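Use a temporary-allocation report for short-lived allocations. The threshold is a reporter option, so it goes on memray flamegraph (or memray table), not on memray run:
memray run --trace-python-allocators -o script.bin my_script.py
memray flamegraph --temporary-allocation-threshold 1024 script.bin
This reports allocations that are freed shortly after being made (here, before 1024 other allocations happen), which is useful for finding code paths that thrash the allocator.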
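A script with obvious churn gives the temporary-allocation report something clear to find. In this hypothetical example, churny_total builds and discards a list for every row, while lean_total streams values through a generator; both return the same number. The 50-element lists are small enough to be served by pymalloc, which is why recording with --trace-python-allocators matters here. A --temporary-allocation-threshold report on the capture should make the list comprehension in churny_total stand out.

```python
"""Allocator churn: many short-lived lists that are freed almost immediately."""


def churny_total(rows):
    total = 0
    for row in rows:
        squares = [x * x for x in row]   # temporary list, freed every iteration
        total += sum(squares)
    return total


def lean_total(rows):
    # A generator expression avoids materializing a list per row
    return sum(x * x for row in rows for x in row)


if __name__ == "__main__":
    data = [list(range(50)) for _ in range(200_000)]
    assert churny_total(data) == lean_total(data)
```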
Common Mistake: Profiling a short script (< 100ms) and concluding memray is broken. A very brief script may make so few allocations that the flamegraph has nothing meaningful to show. Add more work, or profile a longer test or workload.
Fix 4: Live Mode TUI
memray run --live my_script.py
Live mode opens a terminal UI showing allocations in real time as the script runs. Useful for long-running scripts.
Live mode controls:
| Key | Action |
|---|---|
| t | Switch between Total/Own memory views |
| ← → | Navigate sort columns |
| s | Toggle ordering |
| q | Quit |
TUI doesn’t render properly — usually a terminal compatibility issue:
# Try different terminal types
TERM=xterm-256color memray run --live my_script.py
TERM=screen memray run --live my_script.py
Or run live mode in a separate process:
# Terminal 1
memray run --live-remote -p 9000 my_script.py
# Terminal 2
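memray live 9000
--live-remote serves the live data on the specified port instead of drawing the TUI in the same terminal; memray live connects to that port from a second terminal (for example, a second SSH session on the same machine).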
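A script whose memory changes slowly gives the live view something to show. This hypothetical sawtooth workload grows a list of 5 MiB buffers over about 10 seconds, releases it, and repeats ten times; run it with memray run --live sawtooth.py, or with the --live-remote / memray live pair.

```python
"""Hypothetical slow workload for trying memray's live mode."""
import time


def sawtooth(steps=20, chunk=5 * 1024 * 1024, delay=0.5):
    """Grow to steps * chunk bytes, then free everything; return the peak."""
    buffers = []
    peak = 0
    for _ in range(steps):
        buffers.append(bytearray(chunk))   # grow by one chunk per step
        peak = max(peak, sum(len(b) for b in buffers))
        time.sleep(delay)
    buffers.clear()                        # drop all buffers at the end of the round
    return peak


if __name__ == "__main__":
    for _ in range(10):                    # about 100 seconds in total
        sawtooth()
```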
Fix 5: Attach to Running Process
# Find PID
ps aux | grep python
# Attach
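memray attach 12345
Required tools and permissions:
memray attach injects the tracker into the target process through a debugger, so gdb (or lldb) must be installed. On Linux, attaching also needs ptrace permission: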
# Either run as root
sudo memray attach 12345
# Or enable ptrace for unprivileged processes
sudo sysctl kernel.yama.ptrace_scope=0
# Or let the target opt in from its own code: prctl(PR_SET_PTRACER, ...)
Stop tracking with:
memray detach 12345
# Or bound the session up front: memray attach --duration 60 12345
Common Mistake: Attaching to a process and getting “ptrace permission denied” without realizing it’s a kernel security setting. On Ubuntu and many other distributions, kernel.yama.ptrace_scope defaults to 1, which only lets a process ptrace its own descendants. For arbitrary processes, set it to 0 (less secure) or use sudo.
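Before reaching for sudo, it helps to know which Yama mode the machine is in. A small Linux-only sketch that reads /proc/sys/kernel/yama/ptrace_scope; the one-line descriptions summarize the four modes documented for the Yama security module, and the function name is hypothetical:

```python
"""Check the Yama ptrace setting before running `memray attach` (Linux only)."""
from pathlib import Path

YAMA_PATH = Path("/proc/sys/kernel/yama/ptrace_scope")

MEANINGS = {
    0: "classic: any process with the same uid can be attached",
    1: "restricted: only descendants can be attached (use sudo or scope 0)",
    2: "admin-only: attaching needs CAP_SYS_PTRACE (e.g. sudo)",
    3: "disabled: no process can be attached until reboot",
}


def ptrace_scope(path=YAMA_PATH):
    """Return the scope as an int, or None if Yama is not present."""
    try:
        return int(Path(path).read_text().strip())
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    scope = ptrace_scope()
    if scope is None:
        print("No Yama ptrace_scope (not Linux, or Yama not enabled)")
    else:
        print(f"ptrace_scope={scope}: {MEANINGS.get(scope, 'unknown value')}")
```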
Attach + live mode:
memray attach --live 12345
This combines attach with the live TUI to peek into a running production-ish process’s memory pattern.
Fix 6: pytest Integration
pip install pytest-memray
# test_my_code.py
import pytest


def compute_something():
    # Hypothetical stand-in for the code under test
    return sum(range(1_000))


@pytest.mark.limit_memory("100 MB")
def test_memory_use():
    # Fails if the test's peak allocated memory exceeds 100 MB
    data = [i ** 2 for i in range(1_000_000)]
    assert sum(data) > 0


@pytest.mark.limit_leaks("1 KB")
def test_no_leaks():
    # Fails if more than 1 KB is still unfreed at the end of the test from any single location
    # (run with PYTHONMALLOC=malloc for reliable leak checks)
    result = compute_something()
    assert result
pytest --memray  # Enable memray for all tests, prints summary
pytest --memray test_my_code.py::test_memory_use  # Profile one test
Common Mistake: Setting overly tight limits like limit_memory("10 MB") on tests that legitimately need more. The test then fails not because of a bug but because the limit was unrealistic. Profile the test first to know its actual memory baseline, then add 50% headroom for the limit.
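A sketch of the baseline-plus-headroom rule from the tip above. It swaps in the standard library's tracemalloc as a quick measurement (this is not what pytest-memray does internally): tracemalloc only sees Python-allocator memory, so for tests dominated by C extensions, take the baseline from the pytest --memray summary instead. Treating 1 MB as 1024 * 1024 bytes and rounding up to whole MB are assumptions; the helper names are hypothetical.

```python
"""Suggest a limit_memory value from a measured peak plus headroom."""
import math
import tracemalloc

MB = 1024 * 1024


def suggest_limit(peak_bytes, headroom=0.5):
    """Format a pytest-memray limit string, rounded up to whole MB."""
    return f"{math.ceil(peak_bytes * (1 + headroom) / MB)} MB"


def measure_peak(func, *args, **kwargs):
    """Run func once and return the peak traced Python memory in bytes."""
    tracemalloc.start()
    try:
        func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    def workload():
        data = [i ** 2 for i in range(1_000_000)]
        return sum(data)

    print(suggest_limit(measure_peak(workload)))
```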
For pytest fixture patterns that work with memray, see pytest fixture not found.
Fix 7: CI Integration and Regression Detection
# .github/workflows/memory.yml
name: Memory Regression Check
on: [push, pull_request]
jobs:
  memory:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install memray pytest pytest-memray
      - run: pytest --memray --memray-bin-path=memray-reports
      # Upload the captures even when a memory limit fails the test step
      - if: always()
        uses: actions/upload-artifact@v4
        with:
          name: memray-reports
          path: memray-reports/
Compare two profiles to detect regressions:
memray stats baseline.bin
memray stats new.bin
# memray has no built-in diff command: compare peak memory and top allocators between the two runs
For continuous monitoring in production-like environments, periodic profiling jobs catch slow memory creep before it hits prod.
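For a pass/fail gate in CI, a short script can compare the peak memory of a baseline capture with a new one. A sketch that assumes memray’s Python API exposes FileReader(path).metadata.peak_memory in bytes (check this against your installed memray version); the comparison itself is plain Python, and the 10% tolerance and script name check_peak.py are arbitrary:

```python
"""Fail CI when peak memory grows too much between two memray captures."""
import sys


def regression(baseline_peak, new_peak, tolerance=0.10):
    """Return the relative growth if it exceeds tolerance, else None."""
    if baseline_peak <= 0:
        raise ValueError("baseline peak must be positive")
    growth = (new_peak - baseline_peak) / baseline_peak
    return growth if growth > tolerance else None


def read_peak(path):
    # Imported lazily: memray is only needed when reading real capture files
    from memray import FileReader
    return FileReader(path).metadata.peak_memory


if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.exit("usage: python check_peak.py BASELINE.bin NEW.bin")
    base, new = read_peak(sys.argv[1]), read_peak(sys.argv[2])
    growth = regression(base, new)
    if growth is not None:
        sys.exit(f"Peak memory grew {growth:.0%}: {base} -> {new} bytes")
    print(f"OK: {base} -> {new} bytes")
```

sys.exit with a message prints it to stderr and exits with status 1, which fails the CI step.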
Fix 8: Reading the Flamegraph
The flamegraph’s columns/rows mean different things than you might think:
- X-axis (width) = memory allocated by that call site and everything it calls (by default, measured at the point of peak memory usage)
- Y-axis (depth) = call stack — deeper = more nested
- Color = arbitrary, distinguishes adjacent frames
To find leaks:
- Generate a --leaks flamegraph
- Look for wide blocks deep in the stack
- Wide block = lots of memory allocated, never freed
To find hot allocators:
memray summary memray-script.bin
Shows the top allocating functions by total bytes. Often surprising: Pydantic validation, JSON serialization, and pandas DataFrame construction commonly dominate.
Pro Tip: memray’s tree mode is often more useful than the flamegraph for digging into allocations:
memray tree memray-script.bin
It’s a navigable tree where you can drill down into call paths: expand a function to see what it allocated. For tracking down a specific leak source, tree is faster than scanning a flamegraph.
Still Not Working?
memray vs py-spy vs cProfile
- memray — Memory profiling. Best for finding leaks and high allocators.
- py-spy — CPU profiling. Sample-based, low overhead, attach without restart. Best for understanding where time goes.
- cProfile — Stdlib CPU profiler. Higher overhead but built-in.
- scalene — Memory + CPU + GPU. Newer, full-featured.
For memory-specific debugging, memray wins. For CPU + memory combined, scalene is worth a look.
Tracking Python Allocators Only
memray run --trace-python-allocators my_script.py
Records each call to Python’s small-object allocator (pymalloc) individually. Without it, memray only sees the large arenas pymalloc requests from the system, so small-object churn doesn’t show up in regular allocation tracking.
Profiling Multi-Process Applications
memray run --follow-fork my_script.py
# Tracks child processes spawned via fork()
Each child gets its own .bin file. Multiprocessing apps whose workers are forked (Gunicorn workers, Celery’s prefork pool) need this flag to see their workers’ allocations.
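A minimal fork-based pool to try the flag on. Hypothetical workload: each worker allocates its buffer inside the child process, so those allocations only appear in the per-child capture files when the script is recorded with memray run --follow-fork. The sketch forces the fork start method explicitly, since children started with spawn are fresh interpreters that the flag does not follow.

```python
"""Fork-based worker pool for trying `memray run --follow-fork`."""
import multiprocessing as mp


def work(n):
    buf = bytearray(n * 1024 * 1024)   # n MiB, allocated inside the child
    return len(buf)


def run_pool(sizes):
    ctx = mp.get_context("fork")       # --follow-fork follows fork(), not spawn
    with ctx.Pool(processes=2) as pool:
        return pool.map(work, sizes)


if __name__ == "__main__":
    print(run_pool([10, 20, 30, 40]))
```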
For multiprocessing patterns that interact with memory profiling, see Python multiprocessing not working.
Large .bin Files
For long-running profiles, the .bin can be gigabytes:
memray run --aggregate my_script.py
--aggregate records aggregated stats instead of every allocation: a much smaller file, but less detail.
Profiling Tests / FastAPI / Django
memray run --aggregate -o tests.bin -m pytest tests/
memray flamegraph tests.bin
# FastAPI request handler
memray run --aggregate -o api.bin -m uvicorn app:app
# Then send requests, press Ctrl+C, and analyze api.bin
For FastAPI patterns where memory profiling matters (large response payloads, file uploads), see FastAPI dependency injection error.
Combining with Loguru / Structlog Logging
For long-running services, periodic memory snapshots via logging:
import resource
import sys
def log_memory():
    usage = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is in kilobytes on Linux but in bytes on macOS
    usage_mb = usage / (1024 * 1024) if sys.platform == "darwin" else usage / 1024
    print(f"Peak memory: {usage_mb:.0f} MB")
For Loguru-based logging patterns, see Loguru not working.
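For the “periodic” part, a daemon thread can log the peak every few seconds. A sketch using the stdlib logging module (Unix only, since it relies on resource); the message is pre-formatted with an f-string, so a Loguru logger can be passed in place of the stdlib one. The function names are hypothetical.

```python
"""Log peak RSS every `interval` seconds from a daemon thread (Unix only)."""
import logging
import resource
import sys
import threading


def peak_rss_mb():
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports kilobytes, macOS reports bytes
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def start_memory_logger(interval, logger=None):
    """Start logging in the background; call .set() on the returned event to stop."""
    logger = logger or logging.getLogger("memory")
    stop = threading.Event()

    def loop():
        while not stop.wait(interval):   # wait() returns True once stop is set
            logger.info(f"Peak memory: {peak_rss_mb():.0f} MB")

    threading.Thread(target=loop, daemon=True).start()
    return stop
```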
Inline Python Use (Context Manager)
For profiling a specific block of code without external CLI:
import memray
with memray.Tracker("output.bin"):
    # Code in here is profiled
    data = [i ** 2 for i in range(1_000_000)]
    total = sum(data)  # stand-in for your real processing
# Tracking stops at context exit
# Then analyze:
# memray flamegraph output.bin
This pattern is useful for profiling specific functions inside a larger app without restarting under memray run.
Tracking Stack Depth
Default stack depth is 50 frames — may not show enough for deeply nested code:
memray run --max-stack-depth 100 my_script.py
Higher depth gives more context but produces larger .bin files. 100 is enough for most apps; raise to 200+ for deep recursion or complex frameworks.
Custom Memory Allocators (PyTorch, JAX)
PyTorch’s CUDA allocator and JAX’s allocators are outside libc’s malloc — memray’s --native doesn’t catch them. For GPU memory:
# PyTorch
import torch
print(torch.cuda.memory_summary()) # PyTorch's own memory report
# JAX
import jax
print(jax.devices()[0].memory_stats())
For PyTorch GPU memory issues, the built-in torch.cuda.memory_summary() is the right tool; memray only sees CPU memory.
When to Reach for memray vs Alternatives
- Memory leak in long-running service: memray with --leaks mode
- High memory at peak: memray full profile, look for largest allocators
- OOM kill in CI: memray with --aggregate to keep file size small
- Native extension suspected: memray with --native
- General “is my code slow” question: py-spy first; memory profiling is secondary
For PyTorch-specific memory debugging, see PyTorch not working. For NumPy/Pandas patterns that often dominate memory, see NumPy not working.
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.