Fix: Modal Not Working — App vs Stub, Image Build, Volumes, GPU Selection, and Cold Starts
Quick Answer
How to fix Modal Labs errors — modal.App vs modal.Stub deprecation, image dependencies missing, Volume vs NetworkFileSystem, GPU type mismatch, .remote vs .local invocation, web endpoint URL, and cold start tuning.
The Error
You copy an older Modal example and the Stub syntax fails:
import modal
stub = modal.Stub("my-app")  # DeprecationWarning on older releases, AttributeError once Stub is removed
@stub.function()
def hello():
    print("hi")
Or modal run fails because the image is missing a dependency:
ModuleNotFoundError: No module named 'transformers'
Or you request a GPU and Modal says it's unavailable:
modal.exception.ResourceExhaustedError: A100 capacity is unavailable in the region
Or .local() runs on your machine instead of on Modal:
result = my_function.local(...)
# Runs in your terminal, not in the cloud.
Why This Happens
Modal is Python-native serverless: you write Python, decorate it, and Modal handles container builds, GPU allocation, scaling, and queues. Most issues map to one of:
- Stub → App rename. Modal 0.62+ renamed Stub to App. Transition versions accept both, but newer releases drop Stub, so new code should use App. Older tutorials still use Stub.
- Images are declarative. You build an image with modal.Image.debian_slim().pip_install(...) chains. The image is built remotely, so a missing dependency has to be added to the image. Installing it locally is not enough.
- Invocation styles. .remote(...) runs on Modal, .local(...) runs in your local Python, and .map(...) runs many calls in parallel on Modal. Mixing them produces confusing results.
- GPU types differ by availability. gpu="a100" may not be available, while gpu="any" lets Modal pick. Pinning a specific GPU can fail at runtime when capacity runs out.
Fix 1: Use modal.App (Not Stub)
import modal
app = modal.App("my-app")  # New name
@app.function()
def hello():
    print("hi")
@app.local_entrypoint()
def main():
    hello.remote()
Older code that still uses Stub only runs on Modal releases that keep the deprecated alias. Recent releases have dropped it:
# Equivalent on transition releases, but only App is supported going forward:
app = modal.App("my-app")
stub = modal.Stub("my-app")  # Deprecated alias
Run from CLI:
modal run my_app.py
# or
modal run my_app.py::main  # Specify entry point
For deployment (persistent functions accessible via API):
modal deploy my_app.py
run is one-shot. deploy makes the functions callable from anywhere with API credentials.
Pro Tip: Use modal serve my_app.py during development. It runs a temporary copy of the app, including its web endpoints, with live reload. Saving a file triggers an automatic redeploy, and the app stops when you exit.
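If a codebase has many Stub references, a rough text rewrite can handle the common cases before you review the changes by hand. The sketch below is a naive regex pass and not a Modal tool. It assumes the old object was bound to a variable literally named stub, and the function name migrate_stub_to_app is hypothetical.

```python
import re


def migrate_stub_to_app(source: str) -> str:
    """Naively rewrite common modal.Stub idioms to modal.App.

    Hypothetical helper: text-based only, so review the diff afterwards.
    """
    # modal.Stub(...) -> modal.App(...)
    source = re.sub(r"\bmodal\.Stub\b", "modal.App", source)
    # Rename the conventional `stub` variable (and @stub.function, @stub.cls, ...)
    # to `app`, matching whole words only.
    source = re.sub(r"\bstub\b", "app", source)
    return source


old = 'stub = modal.Stub("my-app")\n@stub.function()\ndef hello():\n    print("hi")\n'
print(migrate_stub_to_app(old))
```

The pass renames every whole-word stub. Check for collisions with an existing app variable before you run it.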
Fix 2: Build the Image With Your Dependencies
Modal runs each function in its own container. The container’s image must include every package your function imports:
import modal
image = (
    modal.Image.debian_slim(python_version="3.12")
    .apt_install("git")  # System packages first: they change least often
    .pip_install("transformers", "torch", "numpy")
    .env({"MY_VAR": "value"})
)
app = modal.App("my-app", image=image)
@app.function()
def use_transformers():
    from transformers import pipeline  # Now available
    pipe = pipeline("sentiment-analysis")
    return pipe("I love this!")
The image is built remotely the first time you run. Subsequent runs reuse the cached image.
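A ModuleNotFoundError sometimes appears only on Modal. In that case, a quick check from inside the function can list which imports the image is actually missing. This plain-Python sketch uses importlib.util.find_spec. The helper name missing_modules is hypothetical, and the helper only checks top-level module names.

```python
import importlib.util


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be imported here."""
    # find_spec returns None when no importable module with that name exists.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Inside a Modal function you might call this with your real dependencies,
# e.g. missing_modules(["transformers", "torch", "numpy"]).
print(missing_modules(["json", "os", "surely_not_installed_pkg_123"]))
```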
For project-local code:
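image = (
    modal.Image.debian_slim()
    .pip_install("transformers")
    .add_local_dir("./my_package", remote_path="/root/my_package")
)
add_local_dir adds a local directory to the container. Keep it as the last step of the chain: by default the files are attached when the container starts, and Modal rejects build steps placed after it. For Python source you want to call from a function: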
import sys
@app.function(image=image)
def my_func():
    sys.path.append("/root")  # If using add_local_dir
    from my_package import thing
    return thing()
Common Mistake: Installing pip packages outside the image chain:
# Wrong — only installs in your local venv:
# pip install transformers
# Right — installs in the Modal image:
image = modal.Image.debian_slim().pip_install("transformers")
Fix 3: Pick the Right GPU
Modal supports several GPU types. As of 2025-2026:
@app.function(gpu="any") # Anything available (cheapest)
@app.function(gpu="T4") # T4 (16 GB)
@app.function(gpu="L4") # L4 (24 GB)
@app.function(gpu="A10G") # A10G (24 GB)
@app.function(gpu="A100") # A100 (40 GB)
@app.function(gpu="A100-80GB") # A100 80 GB
@app.function(gpu="H100")  # H100 (80 GB)
For multi-GPU, append a count to the type string. Older releases used modal.gpu.A100(count=4):
@app.function(gpu="A100:4")
def train():
    # 4x A100 in one container
    pass
If your preferred GPU isn't available (ResourceExhaustedError), Modal will queue or fail. Two strategies:
# Fallback list (Modal tries in order):
@app.function(gpu=["H100", "A100-80GB", "A100"])
def train():
    pass
# Or just "any" and detect inside:
@app.function(gpu="any")
def train_any():
    import torch
    print(torch.cuda.get_device_name(0))
Pro Tip: Don't pin to H100 unless you actually need its specific features, such as FP8 support or NVLink for multi-GPU jobs. T4 and A10G cost less and are usually available sooner.
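You can also derive the fallback list from a memory requirement instead of writing it by hand. A small helper can filter the types listed above. This is a sketch. The memory figures come from this article's list, the helper gpu_fallbacks is hypothetical, and it assumes the "TYPE:COUNT" string form for multi-GPU requests.

```python
# Memory per card, copied from the list above; treat these as approximate.
GPU_MEMORY_GB = {
    "T4": 16,
    "L4": 24,
    "A10G": 24,
    "A100": 40,
    "A100-80GB": 80,
    "H100": 80,
}


def gpu_fallbacks(min_memory_gb: int, count: int = 1) -> list[str]:
    """Build an ordered gpu=[...] fallback list, smallest suitable card first.

    Hypothetical helper: min_memory_gb is the memory needed on *each* GPU.
    """
    names = [name for name, mem in GPU_MEMORY_GB.items() if mem >= min_memory_gb]
    if not names:
        raise ValueError(f"No listed GPU has {min_memory_gb} GB of memory")
    if count > 1:
        # Assumed "TYPE:COUNT" string form for multi-GPU requests.
        return [f"{name}:{count}" for name in names]
    return names


print(gpu_fallbacks(30))           # ['A100', 'A100-80GB', 'H100']
print(gpu_fallbacks(70, count=2))  # ['A100-80GB:2', 'H100:2']
```

Pass the result straight through, e.g. @app.function(gpu=gpu_fallbacks(30)). Listing the smallest suitable card first trades some speed for better availability, which matches the tip above. Reverse the list if you prefer the fastest card.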
Fix 4: .remote(), .local(), .map()
@app.function()
def square(x: int) -> int:
    return x * x
@app.local_entrypoint()
def main():
    # Runs on Modal:
    result = square.remote(5)
    print(result)  # 25
    # Runs locally in your terminal:
    result = square.local(5)
    print(result)  # 25, but didn't use Modal
    # Runs many in parallel on Modal:
    results = list(square.map(range(10)))
    print(results)  # [0, 1, 4, 9, ..., 81]
- .remote(): single call on Modal. Returns the result.
- .local(): runs in your local Python process (for testing).
- .map(): batch parallel. Returns a generator of results.
- .spawn(): fire-and-forget. Returns a FunctionCall handle that you can poll later, for example with .get().
For thousands of parallel calls:
results = list(square.map(range(10000), order_outputs=False))
order_outputs=False lets results return as they finish, which is faster. The default is True, which yields results in input order and can be slower when call durations vary widely.
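You can see the difference between ordered and unordered results locally. The sketch below is a plain-Python analogy that uses concurrent.futures instead of Modal. Executor.map yields results in input order, like the default. as_completed yields them in finish order, like order_outputs=False. slow_square is a hypothetical stand-in for a remote function.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_square(x: int) -> int:
    # Larger inputs finish sooner, so finish order differs from input order.
    time.sleep(0.05 / (x + 1))
    return x * x


def ordered(xs: list[int]) -> list[int]:
    with ThreadPoolExecutor(max_workers=len(xs)) as pool:
        # Results come back in input order, like .map() with the default settings.
        return list(pool.map(slow_square, xs))


def unordered(xs: list[int]) -> list[int]:
    with ThreadPoolExecutor(max_workers=len(xs)) as pool:
        futures = [pool.submit(slow_square, x) for x in xs]
        # Results come back as each call finishes, like order_outputs=False.
        return [f.result() for f in as_completed(futures)]


print(ordered([0, 1, 2, 3, 4]))    # [0, 1, 4, 9, 16]
print(unordered([0, 1, 2, 3, 4]))  # same values, in finish order
```

Unordered results lose the pairing between inputs and outputs. If you need to match them up, return the input alongside the result, for example as an (x, x * x) tuple.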
Common Mistake: Calling square(5) directly (no .remote() or .local()). This calls the bare function in your local Python — same as .local() but without making it explicit.
Fix 5: Volumes for Persistent Storage
Containers are ephemeral. For persistent data, use modal.Volume:
volume = modal.Volume.from_name("my-vol", create_if_missing=True)
@app.function(volumes={"/data": volume})
def write_data():
    with open("/data/file.txt", "w") as f:
        f.write("hello")
    volume.commit()  # Persist changes
@app.function(volumes={"/data": volume})
def read_data():
    volume.reload()  # Get latest state
    with open("/data/file.txt") as f:
        return f.read()
Two important calls:
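- volume.commit(): persists writes to the Volume immediately, so other containers can see them. Modal also commits in the background and when a container shuts down. An explicit commit is still the reliable way to make writes visible right away.
- volume.reload(): pulls fresh state before reading. Without it, a running container may read stale data and miss changes that other containers committed after it started.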
For caching model weights:
weights_vol = modal.Volume.from_name("model-weights", create_if_missing=True)
@app.function(
    volumes={"/cache": weights_vol},
    image=image,
)
def inference(prompt: str):
    import os
    os.environ["HF_HOME"] = "/cache/huggingface"  # Set before importing transformers
    # First call downloads the weights (roughly 16 GB in 16-bit) to /cache; later calls reuse them.
    # This repo is gated on Hugging Face, so the function also needs an access token (e.g. via a Modal Secret).
    from transformers import pipeline
    pipe = pipeline("text-generation", model="meta-llama/Meta-Llama-3-8B")
    return pipe(prompt)
The first inference downloads the model. Subsequent inferences, even in fresh containers, reuse the Volume.
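The download-once pattern behind the weights Volume can be sketched in plain Python: check the cache path first and only fetch on a miss. In this sketch, cache_dir stands in for a mounted Volume path such as /cache. cached_fetch and the fetch callback are hypothetical. On Modal, you would still call volume.commit() after a miss so other containers see the new file promptly.

```python
import tempfile
from pathlib import Path
from typing import Callable


def cached_fetch(cache_dir: str, name: str, fetch: Callable[[], bytes]) -> bytes:
    """Return the bytes for `name`, calling `fetch` only on a cache miss."""
    path = Path(cache_dir) / name
    if path.exists():
        return path.read_bytes()  # Cache hit: no download
    data = fetch()
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp name first, then rename, so a half-written file never
    # appears under the final name (rename is atomic on a local POSIX filesystem).
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_bytes(data)
    tmp.replace(path)
    return data


calls = []


def fake_download() -> bytes:
    calls.append(1)
    return b"weights"


with tempfile.TemporaryDirectory() as cache:
    cached_fetch(cache, "model.bin", fake_download)
    cached_fetch(cache, "model.bin", fake_download)
    print(len(calls))  # 1: the second call was served from the cache
```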
Pro Tip: Use a separate Volume for each large dataset/model. Volumes have per-Volume read/write caches, so isolating them gives the best cold-start times.
Fix 6: Web Endpoints
Expose a function as an HTTPS endpoint:
@app.function(image=image)
@modal.web_endpoint(method="POST")  # Renamed @modal.fastapi_endpoint in newer releases
def predict(payload: dict):
    result = my_model.predict(payload["input"])  # my_model: your loaded model
    return {"result": result}
After modal deploy, the URL is in the deploy output. Or via CLI:
modal app list
modal app show my-app
For FastAPI integration (more powerful):
from fastapi import FastAPI
web_app = FastAPI()
@web_app.post("/predict")
def predict(payload: dict):
    return {"result": my_model.predict(payload["input"])}
@app.function(image=image)  # The image must include fastapi
@modal.asgi_app()
def fastapi_app():
    return web_app
This exposes the FastAPI app at one Modal-assigned URL. All routes work as in any FastAPI deploy.
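Common Mistake: Calling a protected web endpoint without the right headers. Modal's web endpoints can be public or can require proxy auth:
@app.function(image=image)
@modal.web_endpoint(method="POST", requires_proxy_auth=True)
def secure_endpoint(payload: dict):
    ...
requires_proxy_auth=True blocks unauthenticated callers at the Modal edge. Callers must send a proxy auth token's ID and secret in the Modal-Key and Modal-Secret request headers.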
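A client that calls a requires_proxy_auth endpoint has to attach the token's ID and secret as headers. The standard-library sketch below only builds the request. The URL and token values are placeholders, and build_proxy_auth_request is a hypothetical helper. It assumes the endpoint accepts a JSON body, as the predict example above does.

```python
import json
import urllib.request


def build_proxy_auth_request(
    url: str, payload: dict, token_id: str, token_secret: str
) -> urllib.request.Request:
    """Build a JSON POST request carrying Modal proxy auth headers."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Modal-Key": token_id,         # Proxy auth token ID
            "Modal-Secret": token_secret,  # Proxy auth token secret
        },
    )


req = build_proxy_auth_request(
    "https://example--secure-endpoint.modal.run",  # Placeholder URL
    {"input": "hello"},
    "token-id-placeholder",
    "token-secret-placeholder",
)
# To actually send it: urllib.request.urlopen(req) returns the HTTP response.
```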
Fix 7: Secrets
Don’t put API keys in your code. Use Modal Secrets:
modal secret create openai-secret OPENAI_API_KEY=sk-...
Or create it in the dashboard.
Reference in functions:
@app.function(
    image=image,
    secrets=[modal.Secret.from_name("openai-secret")],
)
def call_openai(prompt: str):
    import os
    from openai import OpenAI
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    return client.chat.completions.create(...)
The secret's env vars are injected at runtime. They are not baked into your image, and they stay out of logs unless your code prints them.
For multiple secrets:
secrets=[
    modal.Secret.from_name("openai-secret"),
    modal.Secret.from_name("anthropic-secret"),
    modal.Secret.from_name("aws-creds"),
]
For dynamic secrets (per-deployment overrides):
import os
modal.Secret.from_dict({"DB_URL": os.environ["LOCAL_DB_URL"]})
from_dict is fine for development, but don't commit hard-coded secret values that way.
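Secrets arrive as plain environment variables, so a missing or misnamed Secret usually surfaces as a KeyError deep inside your code. A small guard at the top of the function fails earlier and gives a clearer message. require_env is a hypothetical helper and not part of Modal.

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the requested environment variables, or fail with a clear error."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: "
            + ", ".join(missing)
            + ". Is the matching Modal Secret attached to this function?"
        )
    return {name: os.environ[name] for name in names}


os.environ["OPENAI_API_KEY"] = "sk-test"  # Simulate an injected Secret for the demo
print(require_env("OPENAI_API_KEY"))
```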
Fix 8: Cold Starts and container_idle_timeout
Each cold start downloads the image, starts the container, and runs your function. For latency-sensitive workloads, keep containers warm:
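@app.function(
    image=image,
    gpu="A10G",
    container_idle_timeout=300,  # Keep idle containers alive for 5 minutes
    allow_concurrent_inputs=10,  # One container handles up to 10 concurrent calls
    keep_warm=1,  # Always have 1 container ready
)
def predict(payload: dict):
    ...
Three controls:
- container_idle_timeout: how long a container sits idle before being shut down.
- allow_concurrent_inputs: concurrent requests per container. Higher values mean fewer cold starts but more memory pressure.
- keep_warm: number of always-running containers. These cost money even when idle, but requests they serve skip the cold start entirely.
Newer Modal releases rename these controls. container_idle_timeout became scaledown_window, keep_warm became min_containers, and allow_concurrent_inputs moved to the @modal.concurrent(max_inputs=...) decorator. Check which names your installed version accepts.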
For GPU functions, even keep_warm=1 is expensive. Use it for production endpoints; for batch jobs, accept cold starts.
Pro Tip: To measure cold-start time, time the first call after a deploy (or after idle containers have scaled down). Then compare it with an immediate second call, which should hit a warm container.
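A short timing loop makes the cold/warm difference concrete. This is a plain-Python sketch. time_calls is hypothetical, and on Modal you would pass something like lambda: predict.remote(payload) instead of the simulated function used here.

```python
import time
from typing import Callable


def time_calls(fn: Callable[[], object], n: int = 3) -> list[float]:
    """Call fn n times in a row and return each call's wall-clock duration in seconds."""
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


state = {"warm": False}


def simulated_call() -> None:
    # Simulates a cold start: only the first call pays a startup delay.
    if not state["warm"]:
        time.sleep(0.2)
        state["warm"] = True


timings = time_calls(simulated_call)
print(f"first (cold): {timings[0]:.3f}s, later (warm): {max(timings[1:]):.3f}s")
```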
Still Not Working?
A few less-obvious failures:
- ImportError despite pip_install. Check that the function actually uses the image that contains the package, because an image= on @app.function overrides the App-level default. Image steps are also cached layer by layer, and changing one step rebuilds it and every step after it. Order chain steps from least-changing to most-changing to keep rebuilds fast.
- Function exceeds its timeout. Modal's default function timeout is 300 seconds (5 minutes). Raise it via @app.function(timeout=3600) (1 hour). The maximum is 24 hours.
- GPU function runs on CPU. No gpu= is set. The decorator must include gpu="any" or a specific type.
- add_local_dir not picking up changes. By default, add_local_dir uploads the files when you run or deploy instead of baking them into the image. A deployed app therefore keeps the files from its last modal deploy, so redeploy after editing, or use modal serve during development. With copy=True, the files become an image layer that rebuilds when they change.
- Volumes diverge across regions. Volumes are region-scoped. Functions in different regions reading the same Volume name access different physical volumes. Pin the function region or use a different storage backend.
- modal token set fails in CI. CI environments need the API token via the environment variables MODAL_TOKEN_ID and MODAL_TOKEN_SECRET. Generate them from the dashboard.
- modal.exception.InvalidError: function does not exist. Either the function isn't deployed yet (run modal deploy) or the name in the lookup doesn't match. Use exact names, e.g. modal.Function.from_name("my-app", "my-func") (Function.lookup in older releases).
- Stub.cls deprecated. Modal moved from Stub.cls to App.cls along with Stub → App. Change @stub.cls() to @app.cls().
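A simplified model of layer caching shows why step order matters. In this model, each layer's ID depends on its own step and on every step before it. It illustrates the general idea only and is not Modal's actual hashing scheme. layer_ids is hypothetical.

```python
import hashlib


def layer_ids(steps: list[str]) -> list[str]:
    """Give each build step an ID derived from itself and all earlier steps."""
    ids = []
    parent = ""
    for step in steps:
        parent = hashlib.sha256(f"{parent}\n{step}".encode()).hexdigest()[:12]
        ids.append(parent)
    return ids


base = ["apt_install git", "pip_install torch", "pip_install my-lib==1.0"]
edited_last = base[:2] + ["pip_install my-lib==1.1"]
edited_first = ["apt_install git curl"] + base[1:]

# Changing the last step keeps the earlier cached layers...
print(layer_ids(base)[:2] == layer_ids(edited_last)[:2])  # True
# ...but changing the first step invalidates every layer after it.
print(any(a == b for a, b in zip(layer_ids(base), layer_ids(edited_first))))  # False
```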
For related Python deployment and ML serving issues, see vLLM not working, AWS Lambda timeout, PyTorch not working, and Docker no space left on device.
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.