
Fix: Fly.io Deploy Not Working — fly.toml, Machines, Volumes, Secrets, and Internal DNS

FixDevs ·

Quick Answer

How to fix Fly.io errors: fly.toml app vs name confusion, Machines API vs legacy apps, Dockerfile build failures, per-region volumes, secrets staging, fly proxy for local access, and internal IPv6 routing.

The Error

You run fly deploy and the build fails halfway through:

==> Building image
Error: failed to fetch an image or build from source: error building: 
exit code 1

Or the deploy succeeds but the app crashes immediately:

$ fly status
# State: machine started, then exited
$ fly logs
[error] PORT environment variable not set

Or you create a volume and the machine can’t see it:

$ fly volumes create my-data --region nrt --size 10
$ fly deploy
# Container starts but /data is empty.

Or fly secrets set succeeds but the app doesn’t see the variable:

$ fly secrets set OPENAI_API_KEY=sk-...
$ fly ssh console -C "env | grep OPENAI"
OPENAI_API_KEY=
# Empty.

Why This Happens

Fly.io runs your apps in Firecracker microVMs (“Machines”) in regions worldwide. Most deploy issues map to one of:

  • fly.toml is the contract. It declares the app name, primary region, builder, ports, mounts, and health checks. Bugs in fly.toml cause subtle deploy failures.
  • Machines vs Apps v1. Older Fly used “apps” with Nomad scheduling. New deploys use Machines (Firecracker VMs). Some tutorials still reference Nomad-style commands. Stick to Machines (fly deploy defaults to it).
  • Volumes are region-specific. A volume in nrt (Tokyo) can’t attach to a machine in iad (Virginia), and each volume attaches to only one machine at a time.
  • Secrets can be staged. fly secrets set normally redeploys your machines so they pick up the change, but fly secrets set --stage only records it; running machines keep the old environment until you run fly secrets deploy or the next fly deploy.

Fix 1: Write a Working fly.toml

app = "my-app"
primary_region = "nrt"

[build]
  # Builder is auto-detected from Dockerfile or buildpacks
  # Or explicit:
  # builder = "paketobuildpacks/builder:base"
  # dockerfile = "Dockerfile.prod"

[env]
  NODE_ENV = "production"
  PORT = "8080"

[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = "stop"  # Stop idle machines to save money
  auto_start_machines = true
  min_machines_running = 0

  [[http_service.checks]]
    interval = "10s"
    timeout = "2s"
    grace_period = "5s"
    method = "GET"
    path = "/health"

[[vm]]
  cpu_kind = "shared"
  cpus = 1
  memory_mb = 256

Key sections:

  • app — your unique Fly app name.
  • primary_region — where new machines spawn by default.
  • http_service — exposes HTTP, terminates TLS, handles force_https redirect.
  • http_service.checks — health checks. If they fail, Fly marks the machine unhealthy and stops sending traffic.
  • vm — sizing per machine. shared-cpu-1x with 256 MB is the cheapest tier.

Pro Tip: Generate a starter fly.toml with fly launch --no-deploy. It auto-detects your stack (Node, Python, Go, Rust) and writes a sensible default. Then edit before deploying.

Fix 2: Inspect Build Failures

When fly deploy fails during build:

fly deploy --verbose

Verbose mode prints the full Dockerfile build log. The most common failures:

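  • Missing files in build context. .dockerignore excludes them. Get a rough list of what’s being sent with tar -czf - --exclude-from=.dockerignore . | tar -tzf - | head (put --exclude-from before the path, since GNU tar ignores it afterwards; tar’s patterns only approximate .dockerignore rules).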
  • Network issues fetching deps. npm install or pip install times out. Add retries or a build-time cache.
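  • Image too large. Large images push slowly and can hit Fly’s image size limits. Use multi-stage builds to ship only the final artifact:

# Build stage: install all deps (including dev) and compile
FROM node:20-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Drop devDependencies so only runtime deps are copied forward
RUN npm prune --omit=dev

# Runtime stage: only the compiled output and production deps
FROM node:20-alpine
WORKDIR /app
ENV NODE_ENV=production
COPY --from=build /app/dist ./dist
COPY --from=build /app/node_modules ./node_modules
COPY package*.json ./
# Must match internal_port in fly.toml
EXPOSE 8080
CMD ["node", "dist/server.js"]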
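If you’d rather not rely on tar’s pattern matching, a small Python script can list roughly what lands in the build context. This is a sketch, not Docker’s actual matcher: it uses fnmatch, so * can match across / (Docker’s does not), and it only understands comments, plain patterns, and ! exceptions. The function names are hypothetical.

```python
import fnmatch
import os
from pathlib import Path


def load_patterns(context: Path) -> list[str]:
    """Read .dockerignore, skipping blank lines and comments."""
    ignore_file = context / ".dockerignore"
    if not ignore_file.exists():
        return []
    patterns = []
    for line in ignore_file.read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_ignored(rel_path: str, patterns: list[str]) -> bool:
    """Approximate .dockerignore: last matching pattern wins, '!' re-includes."""
    parts = rel_path.split("/")
    # A pattern like "node_modules" also excludes everything under that directory.
    prefixes = ["/".join(parts[: i + 1]) for i in range(len(parts))]
    ignored = False
    for pattern in patterns:
        negate = pattern.startswith("!")
        pat = (pattern[1:] if negate else pattern).strip().strip("/")
        if any(fnmatch.fnmatchcase(p, pat) for p in prefixes):
            ignored = not negate
    return ignored


def build_context_files(context: str = ".") -> list[str]:
    """Return the relative paths that would (roughly) be sent to the builder."""
    root = Path(context)
    patterns = load_patterns(root)
    files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = (Path(dirpath) / name).relative_to(root).as_posix()
            if not is_ignored(rel, patterns):
                files.append(rel)
    return sorted(files)


if __name__ == "__main__":
    for path in build_context_files("."):
        print(path)
```

Run it from the directory you deploy from; if a file your Dockerfile COPYs is missing from the list, a .dockerignore pattern is the likely culprit.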

For Buildpacks instead of a Dockerfile:

[build]
  builder = "paketobuildpacks/builder:base"

Buildpacks auto-detect your stack. Builds are usually slower than with a tuned Dockerfile, but you don’t have to write one.

Fix 3: Pick the Right Port

The container’s process must listen on internal_port (declared in fly.toml). Fly doesn’t set PORT for you; define it in [env] (as in the fly.toml above) so your code and internal_port agree:

# Python (Flask) example:
import os
from flask import Flask

app = Flask(__name__)

if __name__ == "__main__":
    port = int(os.environ.get("PORT", 8080))
    app.run(host="0.0.0.0", port=port)

Critical: bind to 0.0.0.0, not 127.0.0.1. Fly’s proxy reaches your process through the VM’s network interface, so a server bound only to loopback is invisible to it.

For Node/Express:

const express = require("express");
const app = express();
const port = Number(process.env.PORT) || 8080;
app.listen(port, "0.0.0.0", () => {
  console.log(`listening on ${port}`);
});

If health checks fail with “connection refused,” the process either isn’t listening or is bound to localhost only.
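For a dependency-free way to see both points (binding to 0.0.0.0 and answering the health check), here is a sketch using only Python’s standard library. It assumes the /health path from the fly.toml in Fix 1 and reads PORT with an 8080 fallback; Handler and make_server are illustrative names, and a real app would use its framework’s server instead.

```python
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Answer the health check path from fly.toml; everything else is 404.
        if self.path == "/health":
            body = b"ok"
            self.send_response(200)
        else:
            body = b"not found"
            self.send_response(404)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int) -> ThreadingHTTPServer:
    # 0.0.0.0, not 127.0.0.1, so Fly's proxy can reach the process.
    return ThreadingHTTPServer(("0.0.0.0", port), Handler)


if __name__ == "__main__":
    port = int(os.environ.get("PORT", "8080"))
    server = make_server(port)
    print(f"listening on 0.0.0.0:{port}")
    server.serve_forever()
```

Locally, curl http://localhost:8080/health should return ok; if it does but Fly’s checks still fail, compare the port and path with fly.toml.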

Common Mistake: Setting PORT=3000 in [env] of fly.toml and a Dockerfile EXPOSE 8080. Fly uses internal_port from http_service; the EXPOSE and PORT env should match it. Pick one number and use it everywhere.

Fix 4: Volumes Are Per-Region, Per-Machine

Create a volume in a specific region:

fly volumes create my-data --region nrt --size 10

Attach in fly.toml:

[[mounts]]
  source = "my-data"
  destination = "/data"
  initial_size = "10gb"
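Because an unattached volume fails silently (the app just writes to the machine’s ephemeral root filesystem), it can help to refuse to start when the mount is missing. A minimal sketch, assuming the /data destination above; require_mount and the DATA_DIR override are hypothetical names, not Fly conventions.

```python
import os
import sys


def require_mount(path: str) -> None:
    """Exit early if `path` is not a mount point (e.g. the volume didn't attach)."""
    if not os.path.ismount(path):
        sys.exit(f"{path} is not a mounted volume; refusing to write to the ephemeral root filesystem")


if __name__ == "__main__":
    # DATA_DIR is a hypothetical override; /data matches the [[mounts]] destination.
    require_mount(os.environ.get("DATA_DIR", "/data"))
    print("volume mounted; starting app")
```

Failing fast here shows up clearly in fly logs, instead of as data that disappears on the next deploy.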

When you deploy, Fly attaches the volume to a machine in the same region. If you scale to 2 machines, you need 2 volumes (one per machine):

fly volumes create my-data --region nrt --size 10  # Creates a new volume each time
fly volumes create my-data --region nrt --size 10
fly scale count 2

Common Mistake: Expecting one volume to be shared across machines. Volumes are local SSD storage attached to a single machine. For shared storage, use Tigris (S3-compatible object storage available through Fly) or LiteFS for distributed SQLite.

For databases that need consistent storage:

[[mounts]]
  source = "postgres_data"
  destination = "/var/lib/postgresql/data"

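Then pin the app to a single machine (no auto-scale), or use a Fly Postgres cluster instead of hand-rolling one. Note that classic Fly Postgres (fly postgres create) is not a fully managed service.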

Fix 5: Secrets and Staging

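By default, fly secrets set triggers a redeploy so your machines restart with the new value. With --stage, it only records the change, and running machines keep their old environment until you deploy:

fly secrets set --stage OPENAI_API_KEY=sk-...
# Staged only; machines still run with the old environment.

fly secrets deploy
# Or:
fly deploy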

Set multiple at once (avoids multiple restarts):

fly secrets set \
  OPENAI_API_KEY=sk-... \
  DATABASE_URL=postgres://... \
  STRIPE_KEY=sk_test_...

To load a whole .env file:

# Import from .env file:
fly secrets import < .env
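fly secrets import reads NAME=VALUE pairs from stdin. If your .env uses export prefixes, comments, or quoted values and you’re unsure how they’ll be parsed, you can normalize it first. This hypothetical clean_env helper is a sketch that handles single-line values only; it doesn’t understand inline comments or multi-line values.

```python
import re
import sys

# Optional "export ", a shell-style name, "=", then the rest of the line.
LINE = re.compile(r"^(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*)$")


def clean_env(text: str) -> list[str]:
    """Normalize .env text to plain NAME=VALUE lines."""
    out = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = LINE.match(line)
        if not match:
            raise ValueError(f"can't parse line: {raw!r}")
        name, value = match.groups()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        out.append(f"{name}={value}")
    return out


if __name__ == "__main__":
    # Usage: python clean_env.py < .env | fly secrets import
    print("\n".join(clean_env(sys.stdin.read())))
```

Raising on unparseable lines is deliberate: a silently dropped secret is harder to debug than a failed import.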

To list (names only, values hidden):

fly secrets list

To remove:

fly secrets unset OPENAI_API_KEY

Pro Tip: Use a separate Fly app per environment (my-app-dev, my-app-prod). Secrets are per-app — no risk of accidentally pushing prod secrets to dev.

Fix 6: Local Access via fly proxy

For databases and internal services that aren’t HTTP-exposed:

# Connect to your Fly Postgres locally:
fly proxy 5432:5432 -a my-postgres-app

# Now psql can connect:
psql postgres://user:pass@localhost:5432/dbname

fly proxy forwards a local port over a WireGuard tunnel into your organization’s private network, so you can reach the service without exposing it publicly. Useful for one-off psql, redis-cli, and mongosh sessions.
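When a script starts fly proxy in the background, it helps to wait until the local port accepts connections before running psql. A small sketch with Python’s socket module; wait_for_port is a hypothetical helper, and a successful connection only shows that the local end of the proxy is listening, not that the remote service is healthy.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0) -> bool:
    """Return True once host:port accepts TCP connections, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.2)  # nothing listening yet; retry shortly
    return False


if __name__ == "__main__":
    # Run after starting: fly proxy 5432:5432 -a my-postgres-app
    if wait_for_port("localhost", 5432):
        print("local proxy port is up; run psql now")
    else:
        print("nothing listening on localhost:5432")
```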

For Redis:

fly proxy 6379:6379 -a my-redis-app
redis-cli -h localhost -p 6379

For SSH into a running machine:

fly ssh console
# Or specific machine:
fly ssh console --machine <machine-id>

Common Mistake: Trying to connect to <app>.fly.dev:5432. The public endpoint only forwards ports you declare in fly.toml (typically 80 and 443 for http_service). For TCP services, use fly proxy, or declare the port in a [[services]] block and allocate a public IP.

Fix 7: Internal IPv6 Networking

Fly’s internal network is IPv6-only by default. App-to-app calls use .internal DNS:

# From app A, calling app B:
import httpx
response = httpx.get("http://my-app-b.internal:8080/api/health")

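<app-name>.internal resolves to the private IPv6 addresses (AAAA records) of the app’s machines. It is plain DNS, not a load balancer: your HTTP client connects to one of the returned addresses. To receive this traffic, the target app must listen on IPv6 (bind to :: rather than 0.0.0.0).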
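Because .internal is plain DNS, a client sees every address the name resolves to. The sketch below resolves AAAA records with socket.getaddrinfo and shows that IPv6 literals need square brackets in URLs. It only resolves from inside Fly’s private network (a machine, or a WireGuard-connected laptop); internal_addresses and url_for are illustrative names.

```python
import socket


def internal_addresses(hostname: str, port: int) -> list[str]:
    """Resolve IPv6 addresses for a name like my-app-b.internal, deduplicated."""
    infos = socket.getaddrinfo(hostname, port, socket.AF_INET6, socket.SOCK_STREAM)
    addrs: list[str] = []
    for _family, _type, _proto, _canon, sockaddr in infos:
        addr = sockaddr[0]  # IPv6 sockaddr is (host, port, flowinfo, scope_id)
        if addr not in addrs:
            addrs.append(addr)
    return addrs


def url_for(addr: str, port: int, path: str = "/") -> str:
    """IPv6 literals must be wrapped in brackets inside a URL."""
    return f"http://[{addr}]:{port}{path}"


if __name__ == "__main__":
    # Only works from inside the private network (e.g. via fly ssh console).
    for addr in internal_addresses("my-app-b.internal", 8080):
        print(url_for(addr, 8080, "/api/health"))
```

Listing the addresses this way is handy when you want to hit every machine (for example, to check each one’s health) rather than whichever one your client picks.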

For region-specific routing:

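# Hit a machine in a specific region:
"http://nrt.my-app-b.internal:8080"

# Hit only the nearest machine:
"http://top1.nearest.of.my-app-b.internal:8080"

# .internal DNS does not load-balance. For load-balanced private traffic,
# allocate a Flycast address (fly ips allocate-v6 --private) and use:
"http://my-app-b.flycast"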

Common Mistake: Trying to call my-app-b.fly.dev from inside another Fly app. That round-trips through the public edge, adding latency and public-network overhead. Use .internal for app-to-app calls.

For Postgres connections inside the org:

postgres://user:pass@<pg-app>.internal:5432/dbname

Fix 8: Scale and Auto-Stop

auto_stop_machines = "stop" stops idle machines to save money:

[http_service]
  auto_stop_machines = "stop"   # or "off" to never stop
  auto_start_machines = true     # Start on incoming traffic
  min_machines_running = 0       # Number to keep always running

A stopped machine isn’t billed for CPU or RAM (attached volumes still are), but the first request after a stop pays a ~1-2s cold start while the machine boots. For latency-sensitive apps, set min_machines_running = 1.

Scale manually:

fly scale count 3                  # 3 machines total
fly scale count 1 --region nrt     # 1 machine in Tokyo
fly scale count 2 --region nrt --region iad  # 2 each in nrt and iad
fly scale vm shared-cpu-2x --memory 1024  # Resize VMs

Scale by region:

fly scale count 2 --region nrt
fly scale count 1 --region iad
fly scale count 1 --region fra

Pro Tip: Use fly logs --region nrt to filter logs per region when debugging multi-region issues.

Still Not Working?

A few less-obvious failures:

  • fly launch overwrites your fly.toml. Use fly launch --no-deploy and review the generated file before deploying. Or skip launch if you already have a working config.
  • deploy succeeds but fly status shows the machines as stopped. The Dockerfile’s CMD exits immediately. Make sure your process keeps running in the foreground (don’t exec a one-shot command).
  • Free tier bandwidth exceeded. Fly’s free allowance covers basic apps. Heavy traffic or large image pulls eat into it. Check usage in the dashboard.
  • fly deploy is slow even with cache. The image is huge (GB+). Use --build-only to inspect the image, multi-stage to slim it, or use --remote-only to skip local Docker.
  • App can’t connect to Fly Postgres. Use the internal hostname (<pg-app>.internal:5432), not the public hostname. Check the connection string printed by fly postgres attach (it is also stored in the app’s DATABASE_URL secret).
  • Sudden Error: machines updated; the machine cannot be ssh'd into during deploy. A deploy is in progress; wait for it to finish, then run fly ssh console again.
  • hostsync.fly.dev lookup fails. Internal DNS is region-aware; sometimes a region has issues. Try nslookup my-app.internal from a different region or check Fly’s status page.
  • LiteFS errors after deploy. LiteFS needs a leader/replica config in litefs.yml and Consul or static lease. Without it, all nodes try to be the leader and writes fail. Pin one machine as the primary.

For related deployment and edge computing issues, see Cloudflare D1 not working, Docker Compose service failed to build, Heroku h10 app crashed, and Postgres connection refused.


FixDevs

Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.

