Fix: Fly.io Deploy Not Working — fly.toml, Machines, Volumes, Secrets, and Internal DNS
Quick Answer
How to fix Fly.io errors — fly.toml app vs name confusion, machines API vs legacy apps, Dockerfile build failures, volume per-region, secrets staging, fly proxy for local access, and internal IPv6 routing.
The Error
You run fly deploy and the build fails halfway through:
==> Building image
Error: failed to fetch an image or build from source: error building:
exit code 1
Or the deploy succeeds but the app crashes immediately:
$ fly status
# State: machine started, then exited
$ fly logs
[error] PORT environment variable not set
Or you create a volume and the machine can’t see it:
$ fly volumes create my_data --region nrt --size 10
$ fly deploy
# Container starts but /data is empty.
Or fly secrets set succeeds but the app doesn’t see the variable:
$ fly secrets set OPENAI_API_KEY=sk-...
$ fly ssh console -C "env | grep OPENAI"
OPENAI_API_KEY=
# Empty.
Why This Happens
Fly.io runs your apps in Firecracker microVMs (“Machines”) in regions worldwide. Most deploy issues map to one of:
- fly.toml is the contract. It declares the app name, primary region, builder, ports, mounts, and health checks. Bugs in fly.toml cause subtle deploy failures.
- Machines vs Apps v1. Older Fly used “apps” with Nomad scheduling. New deploys use Machines (Firecracker VMs). Some tutorials still reference Nomad-style commands. Stick to Machines (fly deploy defaults to it).
- Volumes are region-specific. A volume in nrt (Tokyo) can’t attach to a machine in iad (Virginia). One volume = one machine.
- Secrets apply on restart. A running machine never picks up a changed secret in place. fly secrets set normally restarts machines for you, but a change made with --stage stays queued until the next deploy or fly secrets deploy.
Fix 1: Write a Working fly.toml
app = "my-app"
primary_region = "nrt"
[build]
# Builder is auto-detected from Dockerfile or buildpacks
# Or explicit:
# builder = "paketobuildpacks/builder:base"
# dockerfile = "Dockerfile.prod"
[env]
NODE_ENV = "production"
PORT = "8080"
[http_service]
internal_port = 8080
force_https = true
auto_stop_machines = "stop" # Stop idle machines to save money
auto_start_machines = true
min_machines_running = 0
[[http_service.checks]]
interval = "10s"
timeout = "2s"
grace_period = "5s"
method = "GET"
path = "/health"
[[vm]]
cpu_kind = "shared"
cpus = 1
memory_mb = 256
Key sections:
- app: your unique Fly app name.
- primary_region: where new machines spawn by default.
- http_service: exposes HTTP, terminates TLS, handles the force_https redirect.
- http_service.checks: health checks. If they fail, Fly marks the machine unhealthy and stops sending traffic.
- vm: sizing per machine. shared-cpu-1x with 256 MB is the cheapest tier.
Pro Tip: Generate a starter fly.toml with fly launch --no-deploy. It auto-detects your stack (Node, Python, Go, Rust) and writes a sensible default. Then edit before deploying.
Fix 2: Inspect Build Failures
When fly deploy fails during build:
fly deploy --verbose
Verbose mode prints the full Dockerfile build log. The most common failures:
- Missing files in build context. .dockerignore excludes them. Check what’s getting sent: tar -czf - . --exclude-from=.dockerignore | tar tz | head.
- Network issues fetching deps. npm install or pip install times out. Add retries or a build-time cache.
- Image too large. Fly’s free tier has size limits. Use multi-stage builds to ship only the final artifact:
# Build stage
FROM node:20-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Runtime stage
FROM node:20-alpine
WORKDIR /app
COPY --from=build /app/dist ./dist
COPY --from=build /app/node_modules ./node_modules
COPY package*.json ./
EXPOSE 8080
CMD ["node", "dist/server.js"]
For Buildpacks or Nixpacks instead of Dockerfile:
[build]
builder = "paketobuildpacks/builder:base"
Buildpacks auto-detect your stack. Slower than a tuned Dockerfile but no Dockerfile needed.
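When the build is slow or the image is too large, it can help to see which files dominate the build context before tuning the Dockerfile. The following is a minimal sketch, not part of flyctl: largest_files is a hypothetical helper name, and it does not interpret .dockerignore, so it only approximates what actually gets sent.

```python
import os


def largest_files(root, limit=10):
    """Return (size_bytes, relative_path) for the biggest files under root."""
    sizes = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip .git, which is normally excluded from the build context anyway.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # do not follow or measure symlinks
            sizes.append((os.path.getsize(path), os.path.relpath(path, root)))
    # Largest first, truncated to the requested number of entries.
    return sorted(sizes, reverse=True)[:limit]
```

Calling largest_files(".") from the project root lists candidates to add to .dockerignore or to leave out of the runtime stage.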
Fix 3: Pick the Right Port
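The container’s process must listen on internal_port (declared in fly.toml). Fly does not set PORT on its own; the value comes from [env] in fly.toml (or from your image), so read it with a fallback that matches internal_port: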
# Python example:
import os
port = int(os.environ.get("PORT", 8080))
app.run(host="0.0.0.0", port=port)
Critical: bind to 0.0.0.0, not 127.0.0.1. Fly’s network requires accepting connections on the public interface inside the VM. Loopback-only servers are invisible to Fly’s proxy.
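To make the binding rule concrete, here is a minimal standard-library sketch of a server that listens on all interfaces and answers the /health path used by the health check in the fly.toml above. The names HealthHandler and make_server are illustrative, and the 8080 fallback is an assumption that should match your internal_port.

```python
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answers GET /health with 200 so a health check can pass."""

    def do_GET(self):
        if self.path == "/health":
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet; a real app should log requests


def make_server(port=None):
    """Create a server bound to all interfaces on PORT (default 8080)."""
    if port is None:
        port = int(os.environ.get("PORT", "8080"))
    # "0.0.0.0" rather than "127.0.0.1", so connections from outside
    # the loopback interface are accepted.
    return ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)
```

Running make_server().serve_forever() as the container's main process keeps it alive and reachable on the configured port.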
For Node/Express:
const port = process.env.PORT || 8080;
app.listen(port, "0.0.0.0", () => {
console.log(`listening on ${port}`);
});
If health checks fail with “connection refused,” the process either isn’t listening or is bound to localhost only.
Common Mistake: Setting PORT=3000 in [env] of fly.toml and a Dockerfile EXPOSE 8080. Fly uses internal_port from http_service; the EXPOSE and PORT env should match it. Pick one number and use it everywhere.
Fix 4: Volumes Are Per-Region, Per-Machine
Create a volume in a specific region:
fly volumes create my_data --region nrt --size 10
Attach in fly.toml (volume names take letters, digits, and underscores, so use my_data rather than my-data):
[[mounts]]
source = "my_data"
destination = "/data"
initial_size = "10gb"
When you deploy, Fly attaches the volume to a machine in the same region. If you scale to 2 machines, you need 2 volumes (one per machine):
fly volumes create my_data --region nrt --size 10 # Creates a new volume each time
fly volumes create my_data --region nrt --size 10
fly scale count 2
Common Mistake: Expecting one volume to be shared across machines. Volumes are local SSD attached to a single machine. For shared storage, use Tigris (Fly’s S3-compatible object storage) or LiteFS for distributed SQLite.
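The one-volume-per-machine rule reduces to simple bookkeeping per region. As a rough sketch (volumes_to_create is a hypothetical helper, and the inputs are whatever you read off fly volumes list and your scaling plan), the number of volumes still to create can be worked out like this:

```python
from collections import Counter


def volumes_to_create(machines_per_region, volume_regions):
    """Return how many extra volumes each region needs for one per machine.

    machines_per_region: desired machine count keyed by region code.
    volume_regions: one region code per existing volume.
    """
    existing = Counter(volume_regions)
    return {
        region: wanted - existing[region]
        for region, wanted in machines_per_region.items()
        if wanted > existing[region]
    }
```

With two machines planned in nrt and a single existing nrt volume, the result is {"nrt": 1}: one more fly volumes create before scaling.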
For databases that need consistent storage:
[[mounts]]
source = "postgres_data"
destination = "/var/lib/postgresql/data"
Then pin the app to a single machine (no auto-scale) or use Fly Postgres (managed).
Fix 5: Secrets and Staging
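Machines read secrets at boot, so a running app doesn’t see new values until its machines restart. By default fly secrets set triggers that restart itself; with --stage the change stays queued until you apply it:
fly secrets set OPENAI_API_KEY=sk-... --stage
# Secret is staged but not yet live. Apply it with:
fly secrets deploy
# Or ship it with the next release:
fly deploy
Set multiple at once (avoids multiple restarts):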
fly secrets set \
OPENAI_API_KEY=sk-... \
DATABASE_URL=postgres://... \
STRIPE_KEY=sk_test_...
For dev:
# Import from .env file:
fly secrets import < .env
To list (names only, values hidden):
fly secrets list
To remove:
fly secrets unset OPENAI_API_KEY
Pro Tip: Use a separate Fly app per environment (my-app-dev, my-app-prod). Secrets are per-app, so there is no risk of accidentally pushing prod secrets to dev.
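An empty or missing secret is easier to diagnose when the app refuses to start than when it fails on the first API call. The following is a minimal sketch of a startup check; missing_secrets and require_secrets are hypothetical helper names, and the list of required variables is whatever your app needs.

```python
import os


def missing_secrets(required, environ=None):
    """Return the names in `required` that are unset or blank."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name, "").strip()]


def require_secrets(required, environ=None):
    """Raise at startup if any required secret is unset or blank."""
    missing = missing_secrets(required, environ)
    if missing:
        raise RuntimeError("missing secrets: " + ", ".join(missing))
```

Calling require_secrets(["OPENAI_API_KEY", "DATABASE_URL"]) at the top of the entry point turns a silent empty value into a clear line in fly logs.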
Fix 6: Local Access via fly proxy
For databases and internal services that aren’t HTTP-exposed:
# Connect to your Fly Postgres locally:
fly proxy 5432:5432 -a my-postgres-app
# Now psql can connect:
psql postgres://user:pass@localhost:5432/dbname
fly proxy opens a WireGuard tunnel from your laptop into your org’s private network and forwards the local port to the internal service. Useful for one-off psql, redis-cli, mongosh sessions.
For Redis:
fly proxy 6379:6379 -a my-redis-app
redis-cli -h localhost -p 6379
For SSH into a running machine:
fly ssh console
# Or specific machine:
fly ssh console --machine <machine-id>
Common Mistake: Trying to connect to <app>.fly.dev:5432. Fly’s external HTTPS endpoint only proxies HTTP. For TCP services, use fly proxy or attach a public IPv4/IPv6 with proper port config.
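When a client cannot connect through the tunnel, a quick first check is whether anything is accepting TCP connections on the local port at all. This is a minimal sketch using only the standard library; port_is_open is a hypothetical helper name, and it says nothing about whether the service behind the port is healthy.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

With fly proxy 5432:5432 running in another terminal, port_is_open("localhost", 5432) should return True; False suggests the proxy is not running or is bound to a different local port.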
Fix 7: Internal IPv6 Networking
Fly’s internal network is IPv6-only by default. App-to-app calls use .internal DNS:
# From app A, calling app B:
import httpx
response = httpx.get("http://my-app-b.internal:8080/api/health")
<app-name>.internal resolves to the private IPv6 addresses of the app’s started machines on the org’s private network.
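Because the private network is IPv6, clients that resolve addresses themselves must ask for AAAA records and bracket the address when building a URL. The sketch below uses only the standard library; resolve_ipv6 and ipv6_url are hypothetical helper names, and the lookup only returns results when run from inside a machine on the same private network.

```python
import socket


def resolve_ipv6(host, port):
    """Return the distinct IPv6 addresses for host, or [] if lookup fails."""
    try:
        infos = socket.getaddrinfo(
            host, port, family=socket.AF_INET6, type=socket.SOCK_STREAM
        )
    except OSError:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    return sorted({info[4][0] for info in infos})


def ipv6_url(address, port, path="/"):
    """Build an HTTP URL for a literal IPv6 address (brackets required)."""
    return f"http://[{address}]:{port}{path}"
```

From inside app A, resolve_ipv6("my-app-b.internal", 8080) shows which addresses the name currently resolves to, which helps when a request hangs or lands on an unexpected machine.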
For region-specific routing:
# Hit a machine in a specific region:
"http://nrt.my-app-b.internal:8080"
# Address machines in every region:
"http://global.my-app-b.internal:8080" # DNS returns all machines; it does not load-balance
Common Mistake: Trying to call my-app-b.fly.dev from inside another Fly app. That round-trips through the public edge, which is slow and wastes bandwidth. Use .internal for app-to-app.
For Postgres connections inside the org:
postgres://user:pass@<pg-app>.internal:5432/dbname
Fix 8: Scale and Auto-Stop
auto_stop_machines = "stop" stops idle machines to save money:
[http_service]
auto_stop_machines = "stop" # or "off" to never stop
auto_start_machines = true # Start on incoming traffic
min_machines_running = 0 # Number to keep always running
A stopped machine incurs no compute charges but adds a ~1-2s cold start when a request arrives. For latency-sensitive apps, set min_machines_running = 1.
Scale manually:
fly scale count 3 # 3 machines total
fly scale count 1 --region nrt # 1 machine in Tokyo
fly scale count 2 --region nrt --region iad # 2 each in nrt and iad
fly scale vm shared-cpu-2x --memory 1024 # Resize VMs
Scale by region:
fly scale count 2 --region nrt
fly scale count 1 --region iad
fly scale count 1 --region fra
Pro Tip: Use fly logs --region nrt to filter logs per region when debugging multi-region issues.
Still Not Working?
A few less-obvious failures:
- fly launch overwrites your fly.toml. Use fly launch --no-deploy and review the generated file before deploying. Or skip launch if you already have a working config.
- deploy succeeds but fly status shows “no machines.” The Dockerfile’s CMD exits immediately. Make sure your process keeps running (don’t exec a one-shot command).
- Free tier bandwidth exceeded. Fly’s free allowance covers basic apps. Heavy traffic or large image pulls eat into it. Check usage in the dashboard.
- fly deploy is slow even with cache. The image is huge (GB+). Use --build-only to inspect the image, multi-stage to slim it, or use --remote-only to skip local Docker.
- App can’t connect to managed Postgres. Use the internal hostname (<pg-app>.internal:5432), not the public hostname. Check the connection string in the output of Fly’s attach command.
- Sudden Error: machines updated; the machine cannot be ssh'd into during deploy. Deploy is in progress; wait for it to finish. Force with --no-deploy if you’re just sshing for diagnostics.
- hostsync.fly.dev lookup fails. Internal DNS is region-aware; sometimes a region has issues. Try nslookup my-app.internal from a different region or check Fly’s status page.
- LiteFS errors after deploy. LiteFS needs a leader/replica config in litefs.yml and Consul or static lease. Without it, all nodes try to be the leader and writes fail. Pin one machine as the primary.
For related deployment and edge computing issues, see Cloudflare D1 not working, Docker Compose service failed to build, Heroku h10 app crashed, and Postgres connection refused.
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.