Fix: memray Not Working — Tracking Errors, Flamegraph Empty, and Native Allocations

FixDevs ·

Quick Answer

How to fix memray errors — memray run command not found, flamegraph shows no data, native allocations not tracked, live mode TUI broken, attach to running process fails, and pytest integration.

The Error

You install memray and run a script, but all you get is a binary file you can’t read:

$ pip install memray
$ memray run my_script.py
# Output: memray-my_script.py.12345.bin
$ cat memray-my_script.py.12345.bin
# Binary garbage — how do I read this?

Or the generated flamegraph is empty:

$ memray flamegraph memray-my_script.py.12345.bin
# Writes an HTML report, but the flamegraph is just one tiny block at the top

Or allocations made inside C extensions show up with no native call stack:

import numpy as np
arr = np.zeros(1_000_000_000)   # Requests ~8 GB
# memray attributes the allocation to this Python line only; the C frames inside NumPy are missing

Or live mode TUI doesn’t work in your terminal:

$ memray run --live my_script.py
# Terminal goes blank, no display, or weird characters

Or attaching to a running process fails:

$ memray attach 12345
# Error: cannot attach — ptrace permissions

memray is a heavyweight Python memory profiler written by Bloomberg. It tracks every allocation (from Python code and from native extensions), supports live monitoring of running processes, and generates flamegraphs. The tooling is excellent, but the default workflow follows a “track first, view later” pattern that confuses developers used to live profilers, and native (C/C++) stack frames only appear after an explicit opt-in. This guide covers each problem in turn.

Why This Happens

memray records allocations to a binary file during the program run. The file contains the call stacks and sizes for every alloc — converting it to a human-readable view (flamegraph, summary, tree) happens as a separate memray <command> step. New users expect a “run and see results” workflow like py-spy; memray’s “run, then analyze” model takes adjustment.

Allocations made by native extensions (C/C++/Rust code in NumPy, PyTorch, and similar) are recorded by default, but memray attributes them to the Python line that triggered them. The native call stack underneath is only captured when you pass --native to memray run.

Fix 1: Basic Recording and Viewing

# Record allocations
memray run my_script.py
# Generates: memray-my_script.py.<pid>.bin

# Quick summary
memray summary memray-my_script.py.12345.bin

# Flamegraph (HTML)
memray flamegraph memray-my_script.py.12345.bin
# Writes memray-flamegraph-my_script.py.12345.html

# Allocation tree
memray tree memray-my_script.py.12345.bin

# Stats
memray stats memray-my_script.py.12345.bin

The 3-step workflow:

  1. Run with memray run
  2. Open the .bin file with a viewer command
  3. Browse the report

Specify output file:

memray run -o my_profile.bin my_script.py
memray flamegraph my_profile.bin
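The record-then-report workflow can also be scripted. The sketch below is one way to do it, assuming memray is installed in the current interpreter so that `python -m memray` works; the helper names `build_commands` and `run_workflow` are hypothetical, not part of memray.

```python
import shlex
import subprocess
import sys


def build_commands(script, capture="my_profile.bin"):
    """Return the argv lists for one recording step and two report steps."""
    memray = [sys.executable, "-m", "memray"]
    return [
        memray + ["run", "-o", capture, script],   # step 1: record
        memray + ["summary", capture],             # step 2: terminal summary
        memray + ["flamegraph", capture],          # step 3: HTML report
    ]


def run_workflow(script, capture="my_profile.bin"):
    """Run each step in order and stop at the first failure."""
    for argv in build_commands(script, capture):
        print("$", shlex.join(argv))
        subprocess.run(argv, check=True)
```

Calling `run_workflow("my_script.py")` performs all three steps. Note that `memray run -o` should refuse to overwrite an existing capture unless you add `-f`, so delete or rename old captures between runs.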

Common Mistake: Looking for live output during memray run. The recording mode runs silently — no progress bar, no in-terminal stats, just generates the binary file. For live monitoring, use --live (covered below).

Profile a module/script with args:

memray run -m my_package.main --arg1 value1
memray run my_script.py arg1 arg2

Fix 2: Native Allocations

import numpy as np
import torch

arr = np.zeros(100_000_000)   # 800 MB native alloc
tensor = torch.zeros(50_000_000)   # 200 MB native alloc

Without --native, memray still records these allocations but attributes them to the Python lines above. The C/C++ frames inside NumPy and PyTorch that actually requested the memory are invisible.

Enable native tracking:

memray run --native my_script.py
memray flamegraph memray-my_script.py.12345.bin

The --native flag belongs to memray run; reports generated from that capture include the native frames automatically. With it, memray records the native stack at each allocation, so the report shows which C-level function is responsible inside:

  • NumPy / SciPy array allocations
  • PyTorch tensor allocations
  • pandas DataFrame internal buffers
  • Any other C/C++/Rust extension that allocates through the system allocator

Overhead: native tracking slows the program and produces larger capture files. The slowdown depends on the workload; 2-5x over Python-only profiling is a rough guide. Worth it when debugging C-extension memory; skip it for pure-Python profiling.

Pro Tip: For ML / data science workloads (PyTorch, TensorFlow, pandas, NumPy), use --native whenever Python-level attribution is too coarse. Without it, you see that a line of Python allocated several GB but not which routine inside the extension did it. The slowdown is acceptable for debugging sessions.
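Before trusting a report, it helps to know roughly how large the buffers should be. The following is a small arithmetic sketch for the two allocations in the example above; `buffer_megabytes` is a hypothetical helper, and the element sizes assume the default dtypes (float64 for np.zeros, float32 for torch.zeros).

```python
def buffer_megabytes(n_elements, bytes_per_element):
    """Size of a dense numeric buffer in decimal megabytes (10**6 bytes)."""
    return n_elements * bytes_per_element / 1_000_000


# np.zeros defaults to float64 (8 bytes); torch.zeros defaults to float32 (4 bytes).
numpy_mb = buffer_megabytes(100_000_000, 8)
torch_mb = buffer_megabytes(50_000_000, 4)
print(f"NumPy array: {numpy_mb:.0f} MB, PyTorch tensor: {torch_mb:.0f} MB")
```

If a report shows a somewhat smaller figure for the same line, check whether it uses 1024-based units before suspecting missing data.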

Fix 3: Empty Flamegraph

If memray flamegraph shows just one tiny block, your script either ran too briefly or didn’t allocate significantly.

Force longer profiling:

# my_script.py
def actually_do_work():
    data = [i ** 2 for i in range(10_000_000)]
    return sum(data)

actually_do_work()
# Add more work if needed: a script that finishes almost instantly allocates too little to chart

Use --leaks mode to focus on leaked allocations:

memray flamegraph --leaks memray-script.bin

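This shows only allocations that weren’t freed by program end, which focuses the flamegraph on actual leaks. Python’s pymalloc allocator holds on to freed memory for reuse, so leak reports are more accurate when the recording was made with PYTHONMALLOC=malloc set in the environment.

Use --temporary-allocations for short-lived allocs:

memray flamegraph --temporary-allocations memray-script.bin

This reports allocations that were freed almost immediately after being made, which is useful for finding code paths that thrash the allocator. It is a reporter option, not a memray run option. --temporary-allocation-threshold N widens the definition to allocations freed after at most N other allocations; N counts allocations, not bytes.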

Common Mistake: Profiling a short script (< 100ms) and concluding memray is broken. Very brief scripts make so few allocations that the flamegraph has almost nothing to show. Add more work, or profile a longer test or workload.
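To check that the reports work at all, it can help to profile a workload with known behavior. The script below is a hypothetical test workload (none of these names come from memray): `churn` creates short-lived lists that should appear in a temporary-allocations report, and `leak` keeps buffers alive in a module-level list so they should appear under --leaks.

```python
retained = []  # module-level list keeps references alive until exit


def churn(rounds=50, size=100_000):
    """Build and discard large lists: short-lived (temporary) allocations."""
    total = 0
    for _ in range(rounds):
        scratch = [i * 2 for i in range(size)]
        total += len(scratch)
    return total


def leak(chunks=20, chunk_bytes=1_000_000):
    """Append buffers to a global list so they are never freed."""
    added = 0
    for _ in range(chunks):
        retained.append(bytearray(chunk_bytes))
        added += chunk_bytes
    return added


if __name__ == "__main__":
    print("temporary elements built:", churn())
    print("bytes deliberately retained:", leak())
```

Save it as a script, record it with memray run, and the flamegraph should show two clearly separate call paths.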

Fix 4: Live Mode TUI

memray run --live my_script.py

Live mode opens a terminal UI showing allocations in real time as the script runs. Useful for long-running scripts.

Live mode controls:

Key   Action
t     Switch between Total/Own memory views
      Navigate sort columns
s     Toggle ordering
q     Quit

TUI doesn’t render properly — usually a terminal compatibility issue:

# Try different terminal types
TERM=xterm-256color memray run --live my_script.py
TERM=screen memray run --live my_script.py

Or run live mode in a separate process:

# Terminal 1
memray run --live-remote -p 9000 my_script.py

# Terminal 2
memray live 9000

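--live-remote makes the tracked process wait for a viewer on the specified port; memray live then connects to that port from a second terminal. This also works when both commands run inside SSH sessions on the same host.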
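A launcher script can pick between the two live modes automatically. This is a sketch under the assumption that a missing or "dumb" TERM, or a non-TTY stdout, means the TUI will not render; `live_mode_args` is a hypothetical helper, and the flags it returns are the memray run options discussed above.

```python
import os
import sys


def live_mode_args(port=9000, stream=None, term=None):
    """Pick `memray run` flags based on whether a usable terminal is attached."""
    stream = sys.stdout if stream is None else stream
    term = os.environ.get("TERM", "") if term is None else term
    if stream.isatty() and term not in ("", "dumb"):
        return ["--live"]
    # No interactive terminal: expose the data on a port for `memray live`.
    return ["--live-remote", "--live-port", str(port)]
```

The returned list can be spliced into a command such as `["memray", "run", *live_mode_args(), "my_script.py"]`.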

Fix 5: Attach to Running Process

# Find PID
ps aux | grep python

# Attach
memray attach 12345

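Required permissions:

memray attach injects code through a debugger, so gdb or lldb must be installed. On Linux, attaching also needs ptrace permission:

# Either run as root
sudo memray attach 12345

# Or enable ptrace for unprivileged processes
sudo sysctl kernel.yama.ptrace_scope=0

# Or per-process: the target calls prctl(PR_SET_PTRACER, ...) to allow a specific tracer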

Detach with:

memray detach 12345

Common Mistake: Attaching to a process and getting “ptrace permission denied” without realizing it’s a kernel security setting. With the common default of kernel.yama.ptrace_scope=1, a process may only ptrace its own descendants. To attach to an arbitrary process, set it to 0 (less secure) or use sudo.
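You can inspect the current Yama setting before attempting an attach. The sketch below reads the standard procfs location and translates the value; the function names are hypothetical, and on systems without Yama (or on macOS) the file simply will not exist.

```python
from pathlib import Path

PTRACE_SCOPE_FILE = Path("/proc/sys/kernel/yama/ptrace_scope")

MEANINGS = {
    0: "classic: any process may attach to others owned by the same user",
    1: "restricted: a process may only attach to its own descendants",
    2: "admin-only: attaching requires CAP_SYS_PTRACE",
    3: "disabled: no process may attach, and the setting is locked until reboot",
}


def read_ptrace_scope(path=PTRACE_SCOPE_FILE):
    """Return the Yama ptrace_scope value, or None if the file is unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def describe(value):
    """Translate a ptrace_scope value into a one-line explanation."""
    if value is None:
        return "Yama ptrace_scope not found (non-Linux system or Yama disabled)"
    return MEANINGS.get(value, f"unrecognized value {value}")


if __name__ == "__main__":
    print(describe(read_ptrace_scope()))
```

A result of 1 or higher explains why an unprivileged attach to an unrelated process is refused.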

Attach + live mode:

memray attach 12345

Without -o, memray attach opens the live TUI for the target process, which lets you peek into the memory pattern of a running production-ish process. Pass -o <file> to record to a capture file instead.

Fix 6: pytest Integration

pip install pytest-memray

# test_my_code.py
import pytest

def compute_something():
    # Placeholder for the code under test
    return [i ** 2 for i in range(1_000)]

@pytest.mark.limit_memory("100 MB")
def test_memory_use():
    # Test fails if it uses more than 100 MB
    data = [i ** 2 for i in range(1_000_000)]
    assert sum(data) > 0

@pytest.mark.limit_leaks("1 KB")
def test_no_leaks():
    # Test fails if any single call site leaks more than 1 KB
    result = compute_something()
    assert result

pytest --memray   # Enable memray for all tests, prints summary
pytest --memray test_my_code.py::test_memory_use   # Profile one test

Common Mistake: Setting overly tight limits like limit_memory("10 MB") on tests that legitimately need more. The test fails not because of a bug but because the limit was unrealistic. Profile the test first to know its actual memory baseline, then add 50% headroom for the limit.
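The headroom rule above is simple arithmetic. A minimal sketch, assuming you have already measured the test’s baseline in megabytes from a pytest --memray run; `suggested_limit` is a hypothetical helper, not part of pytest-memray.

```python
import math


def suggested_limit(baseline_mb, headroom=0.5):
    """Return a limit string: the measured baseline plus fractional headroom, rounded up."""
    if baseline_mb <= 0:
        raise ValueError("baseline_mb must be positive")
    limit = math.ceil(baseline_mb * (1 + headroom))
    return f"{limit} MB"


print(suggested_limit(64))      # whole-number baseline
print(suggested_limit(37.2))    # fractional baselines round up
```

The returned string can be pasted into the limit_memory marker. Re-measure after large refactors, since a stale baseline makes the limit meaningless.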

For pytest fixture patterns that work with memray, see pytest fixture not found.

Fix 7: CI Integration and Regression Detection

# .github/workflows/memory.yml
name: Memory Regression Check

on: [push, pull_request]

jobs:
  memory:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install memray pytest pytest-memray
      - run: pytest --memray --memray-bin-path=memray-reports
      - uses: actions/upload-artifact@v4
        with:
          name: memray-reports
          path: memray-reports/

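Compare two profiles to detect regressions. memray’s documented subcommands do not include a compare step, so generate the same report for both captures and compare the numbers:

memray stats baseline.bin
memray stats new.bin
# Compare total memory allocated and the top allocation sites between the two runs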
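A regression gate for CI could be sketched as follows. It assumes you have already extracted one memory figure in bytes from each run’s report (how you extract it is up to you); `check_regression` and the 10% tolerance are illustrative choices, not memray features.

```python
def check_regression(baseline_bytes, current_bytes, tolerance=0.10):
    """Return (ok, percent_change); ok is False when growth exceeds tolerance."""
    if baseline_bytes <= 0:
        raise ValueError("baseline_bytes must be positive")
    change = (current_bytes - baseline_bytes) / baseline_bytes
    return change <= tolerance, change * 100


ok, percent = check_regression(100_000_000, 125_000_000)
print(f"within tolerance: {ok}, change: {percent:+.1f}%")
```

A CI step can exit non-zero when `ok` is False. A tolerance is needed because allocation totals vary slightly between runs.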

For continuous monitoring in production-like environments, periodic profiling jobs catch slow memory creep before it hits prod.

Fix 8: Reading the Flamegraph

The flamegraph’s axes mean different things than you might think:

  • X-axis (width) = memory allocated by that frame and the frames it calls; horizontal position carries no meaning
  • Y-axis (depth) = call stack; deeper means more nested
  • Color = arbitrary, distinguishes adjacent frames

To find leaks:

  1. Generate --leaks flamegraph
  2. Look for wide blocks deep in the stack
  3. Wide block = lots of memory allocated, never freed

To find hot allocators:

memray summary memray-script.bin

Shows top N allocators by total bytes. Often surprising — Pydantic validation, JSON serialization, and pandas DataFrame construction commonly dominate.

Pro Tip: memray’s tree mode is often more useful than the flamegraph for digging into allocations:

memray tree memray-script.bin

It’s a navigable tree where you can drill down into call paths. Click a function to see what it allocated. For tracking down a specific leak source, tree is faster than scanning a flamegraph.

Still Not Working?

memray vs py-spy vs cProfile

  • memray — Memory profiling. Best for finding leaks and high allocators.
  • py-spy — CPU profiling. Sample-based, low overhead, attach without restart. Best for understanding where time goes.
  • cProfile — Stdlib CPU profiler. Higher overhead but built-in.
  • scalene — Memory + CPU + GPU. Newer, full-featured.

For memory-specific debugging, memray wins. For CPU + memory combined, scalene is worth a look.

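Tracking Python Allocators

memray run --trace-python-allocators my_script.py

By default memray only sees pymalloc when it requests a fresh block of memory from the system. This flag additionally records every individual request to Python’s own allocators. It slows the run and enlarges the capture, but it exposes small-object churn that doesn’t show up in regular allocation tracking.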
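To get a feel for which objects fall under pymalloc, you can compare object sizes against CPython’s small-object threshold of 512 bytes. This is a rough sketch only: `likely_pymalloc` is a hypothetical helper, and sys.getsizeof reports an object’s own footprint, which approximates but does not exactly equal the size of the allocator request.

```python
import sys

PYMALLOC_MAX_BYTES = 512  # CPython's small-object threshold


def likely_pymalloc(obj):
    """Rough check: is this object's own footprint small enough for pymalloc?"""
    return sys.getsizeof(obj) <= PYMALLOC_MAX_BYTES


samples = {"int": 1, "short str": "hello", "empty dict": {}, "10 kB bytes": bytes(10_000)}
for label, obj in samples.items():
    print(f"{label:12} {sys.getsizeof(obj):>6} bytes  pymalloc={likely_pymalloc(obj)}")
```

Code that creates millions of objects in the small category is where --trace-python-allocators tends to reveal the most.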

Profiling Multi-Process Applications

memray run --follow-fork my_script.py
# Tracks child processes spawned via fork()

Each child gets its own .bin file. Multiprocessing apps (Celery, Gunicorn workers) need this flag to see their workers’ allocations.

For multiprocessing patterns that interact with memory profiling, see Python multiprocessing not working.

Large .bin Files

For long-running profiles, the .bin can be gigabytes:

memray run --aggregate my_script.py

--aggregate records aggregated stats instead of every allocation — much smaller file, less detail.

Profiling Tests / FastAPI / Django

# memray run expects a script path or -m <module>, not an arbitrary command
memray run --aggregate -o profile.bin -m pytest tests/
memray flamegraph profile.bin

# FastAPI app served by uvicorn
memray run --aggregate -o profile.bin -m uvicorn app:app
# Then send requests; press Ctrl+C; analyze

For FastAPI patterns where memory profiling matters (large response payloads, file uploads), see FastAPI dependency injection error.

Combining with Loguru / Structlog Logging

For long-running services, periodic memory snapshots via logging:

import resource
import sys

def log_memory():
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is in kilobytes on Linux but in bytes on macOS
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    print(f"Peak memory: {peak / divisor:.0f} MB")

For Loguru-based logging patterns, see Loguru not working.

Inline Python Use (Context Manager)

For profiling a specific block of code without external CLI:

import memray

with memray.Tracker("output.bin"):
    # Code in here is profiled
    data = [i ** 2 for i in range(1_000_000)]
    total = sum(data)   # stand-in for your own processing

# Tracking stops at context exit
# Then analyze:
#   memray flamegraph output.bin

This pattern is useful for profiling specific functions inside a larger app without restarting under memray run.
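One practical wrinkle: as far as the memray API documentation describes, the tracker does not overwrite an existing capture file by default, so a hard-coded "output.bin" fails on the second run. A small sketch of one workaround is a per-run filename; `unique_capture_path` and its naming scheme are hypothetical.

```python
import os
import time
from pathlib import Path


def unique_capture_path(directory=".", prefix="memray-block"):
    """Build a capture filename that includes a timestamp and the PID."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    return Path(directory) / f"{prefix}.{stamp}.{os.getpid()}.bin"


# Intended use (requires memray to be installed):
#   with memray.Tracker(unique_capture_path()):
#       ...code to profile...
print(unique_capture_path())
```

Including the PID keeps captures from parallel workers apart even when they start in the same second.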

Tracking Stack Depth

Deeply nested code can produce very tall stacks in reports. The documented options of memray run do not include a stack-depth setting, so treat the flag below as unverified and confirm it with memray run --help on your installed version before relying on it:

memray run --max-stack-depth 100 my_script.py

If your version does accept it, a higher depth gives more context but produces larger .bin files.

Custom Memory Allocators (PyTorch, JAX)

PyTorch’s CUDA allocator and JAX’s allocators are outside libc’s malloc — memray’s --native doesn’t catch them. For GPU memory:

# PyTorch
import torch
print(torch.cuda.memory_summary())   # PyTorch's own memory report

# JAX
import jax
print(jax.devices()[0].memory_stats())

For PyTorch GPU memory issues, the built-in torch.cuda.memory_summary() is the right tool — memray only sees CPU memory.

When to Reach for memray vs Alternatives

  • Memory leak in long-running service — memray with --leaks mode
  • High memory at peak — memray full profile, look for largest allocators
  • OOM kill in CI — memray with --aggregate to keep file size small
  • Native extension suspected — memray with --native
  • General “is my code slow” question — py-spy first; memory profiling is secondary

For PyTorch-specific memory debugging, see PyTorch not working. For NumPy/Pandas patterns that often dominate memory, see NumPy not working.

FixDevs

Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.
