Fix: LiteFS Not Working — Consul Lease, Primary Election, Halt Locks, and Replica Reads

FixDevs ·

Quick Answer

How to fix LiteFS errors — primary not elected, Consul lease setup, static lease single-node mode, halt locks for cross-node writes, replica seeing stale data, mount path mismatch, and LiteFS Cloud sync.

The Error

You deploy LiteFS to Fly.io with multiple machines and writes fail:

SQLITE_READONLY_DBMOVED: attempt to write a readonly database

Or the primary never elects:

$ fly logs
[litefs] waiting for consul connection
[litefs] error: consul: dial tcp: connection refused

Or a replica reads stale data immediately after a write:

# On the primary:
cur.execute("INSERT INTO users (name) VALUES (?)", ["Alice"])
conn.commit()

# On a replica seconds later:
cur.execute("SELECT name FROM users WHERE name = ?", ["Alice"])
# Returns no rows.

Or the FUSE mount fails to start:

[litefs] error: cannot mount: fuse: device not found

Why This Happens

LiteFS is a FUSE filesystem that intercepts SQLite writes and replicates them across nodes. Most issues map to one of:

  • No primary election. LiteFS needs a coordination mechanism to pick one writer. Two backends: Consul (Fly’s built-in) or static lease (one-node mode for testing). Without one, no writes happen.
  • Replicas are read-only. A replica that receives a write request must forward it to the primary, or your app must route writes itself. The SQLITE_READONLY_DBMOVED error means a replica tried to commit a write directly.
  • Replication is asynchronous by default. A replica may be milliseconds (or seconds) behind. For “read your own writes,” use halt locks or pin reads to the primary.
  • FUSE requires kernel support. Fly’s machines ship with FUSE enabled, but local Docker testing usually doesn’t. LiteFS needs cap_add: SYS_ADMIN and /dev/fuse access.
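The four causes above can be turned into a quick triage helper. This is only a sketch: the diagnose function and its substring table are hypothetical, and they match the example messages shown in this article rather than every message LiteFS can emit.

```python
# Hypothetical triage table: substrings taken from the example errors in this article.
KNOWN_SYMPTOMS = [
    ("consul", "No primary election: check the Consul lease (Fix 1)"),
    ("readonly database", "Write hit a replica: route writes to the primary (Fix 3)"),
    ("fuse", "FUSE unavailable: check /dev/fuse and SYS_ADMIN (Fix 5)"),
]


def diagnose(log_line: str) -> str:
    """Return a likely cause for a log line, or a fallback hint."""
    lowered = log_line.lower()
    for needle, hint in KNOWN_SYMPTOMS:
        if needle in lowered:
            return hint
    return "Unrecognized: check replication lag (Fix 8)"
```

Feed it lines from fly logs to decide which fix below to read first.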

Fix 1: Configure Consul Lease (Multi-Node)

Fly provides a free Consul cluster per app. Use it for LiteFS leases:

fly consul attach

This injects FLY_CONSUL_URL into your app’s env. LiteFS reads it automatically.
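If you want your app to fail fast when the variable is missing, a startup check along these lines may help. It is a minimal sketch: consul_url_problem is a hypothetical helper, and it only checks that the value looks like an http(s) URL, not that Consul is reachable.

```python
import os
from urllib.parse import urlparse


def consul_url_problem(env=None):
    """Return a description of what is wrong with FLY_CONSUL_URL, or None if it looks usable."""
    env = os.environ if env is None else env
    url = env.get("FLY_CONSUL_URL", "")
    if not url:
        return "FLY_CONSUL_URL is not set; run `fly consul attach`"
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return "FLY_CONSUL_URL is set but does not look like an http(s) URL"
    return None
```

Call it once at boot and log the result before LiteFS-dependent code runs.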

In litefs.yml:

fuse:
  dir: "/litefs"
  allow-other: true

data:
  dir: "/var/lib/litefs"

proxy:
  addr: ":8080"
  target: "localhost:8081"
  db: "my-app.db"

lease:
  type: "consul"
  candidate: ${FLY_REGION == PRIMARY_REGION}
  promote: true
  advertise-url: "http://${HOSTNAME}.vm.${FLY_APP_NAME}.internal:20202"

  consul:
    url: "${FLY_CONSUL_URL}"
    key: "litefs/my-app"

Three lease fields:

  • candidate — whether this node can become primary. Restrict to PRIMARY_REGION for predictable failover.
  • promote — auto-promote if no primary exists.
  • advertise-url — how replicas reach this node. Fly’s .vm.<app>.internal DNS works.
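To reason about which machines will be candidates, it can help to mirror the two expressions from the config in plain code. LiteFS evaluates them itself; the helpers below are a hypothetical model for debugging. They treat an unset FLY_REGION as "not a candidate", which is an assumption of this sketch rather than documented LiteFS behavior.

```python
def is_candidate(env):
    """Model of `candidate: ${FLY_REGION == PRIMARY_REGION}`."""
    region = env.get("FLY_REGION")
    # Assumption: an unset region never counts as a match.
    return bool(region) and region == env.get("PRIMARY_REGION")


def advertise_url(env, port=20202):
    """Model of the advertise-url template from the config above."""
    return f"http://{env['HOSTNAME']}.vm.{env['FLY_APP_NAME']}.internal:{port}"
```

Running both against the environment of each machine shows at a glance whether more (or fewer) nodes are eligible for primary than you intended.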

Set PRIMARY_REGION in fly.toml:

[env]
  PRIMARY_REGION = "nrt"

Pro Tip: Pick the primary region closest to where most of your writes originate. Reads are served from every region, but every write goes through the primary, so a primary far from your users adds latency to each write.

Fix 2: Use Static Lease for Single-Node Dev

For local testing or single-machine deployments where you don’t need failover:

lease:
  type: "static"
  advertise-url: "http://localhost:20202"
  candidate: true
  hostname: "primary"

type: "static" skips Consul entirely. The single node is always primary.

This is the simplest setup for single-machine deployments (fly scale count 1) or local Docker. For production with high availability, switch to consul.

Common Mistake: Mixing a static lease with a multi-node deploy. If two nodes both run with type: "static" and candidate: true, each believes it is the primary and their writes conflict.

Fix 3: Route Writes Through the Primary

A replica that receives a write request must forward it. LiteFS provides a built-in HTTP proxy:

proxy:
  addr: ":8080"
  target: "localhost:8081"
  db: "my-app.db"

This makes LiteFS listen on :8080 and forward to your app on :8081. The proxy automatically routes writes (PUT/POST/PATCH/DELETE) to the primary; reads stay local.
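The routing rule just described fits in a few lines. This is a model of the rule as stated here, not the proxy's actual code; route and WRITE_METHODS are hypothetical names.

```python
WRITE_METHODS = {"PUT", "POST", "PATCH", "DELETE"}


def route(method, is_primary):
    """Return "primary" if the request should be forwarded, else "local"."""
    if method.upper() in WRITE_METHODS and not is_primary:
        return "primary"
    return "local"
```

One practical consequence: a GET handler that writes to the database is treated as a read, runs on the replica, and fails there. Keep writes behind PUT, POST, PATCH, or DELETE.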

Your fly.toml should expose :8080 (LiteFS) to the public, with your app on the internal :8081:

[http_service]
  internal_port = 8080   # LiteFS proxy port
  force_https = true

# Your app binds to localhost:8081

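For apps that don’t use HTTP (background workers, queues), you need a different routing mechanism:

# In your app:
import requests

PRIMARY_FILE = "/litefs/.primary"  # exists only on replicas

def get_primary():
    """Return the primary's hostname, or None when this node is the primary."""
    try:
        with open(PRIMARY_FILE) as f:
            return f.read().strip()
    except FileNotFoundError:
        return None

def write_user(conn, name):
    primary = get_primary()
    if primary is None:
        # Local write: this node is the primary
        conn.execute("INSERT INTO users (name) VALUES (?)", [name])
        conn.commit()
    else:
        # Forward to the primary; /api/users is a placeholder for your own endpoint
        requests.post(f"http://{primary}/api/users", json={"name": name}, timeout=10)

LiteFS exposes a .primary file in the mount directory. On a replica it contains the primary’s hostname; on the primary itself the file does not exist.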

Fix 4: Halt Locks for Synchronous Writes

For “read your own writes” semantics on a replica, use a halt lock. It pauses writes on the primary so that this node can commit locally, then hands control back when you release it:

import requests

HALT_URL = "http://localhost:20202/api/v1/dbs/my-app.db/halt"

# Acquire halt lock (HTTP):
requests.post(HALT_URL)

try:
    cur.execute("INSERT INTO users ...")
    conn.commit()
finally:
    # Release:
    requests.delete(HALT_URL)

While the halt lock is held, the primary accepts no other writes and your transaction is applied on this node, so a local read sees it immediately. The change is sent back to the primary, and the other replicas catch up through normal replication.

For Go apps, the superfly/litefs-go library provides ergonomic helpers:

import "github.com/superfly/litefs-go"

err := litefs.WithHalt(ctx, dbPath, func() error {
    _, err := db.ExecContext(ctx, "INSERT INTO ...")
    return err
})

Note: Halt locks block writes from other connections during the hold. Don’t hold them across slow operations. Use only for the few cases where stale reads are unacceptable.
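In Python, a context manager keeps the hold as short as the note above recommends and guarantees the release. This is a sketch, not an official client: halt_lock is a hypothetical helper, and the endpoint path is copied from the HTTP example earlier in this section, so verify it against your LiteFS version.

```python
from contextlib import contextmanager

# Assumption: endpoint path taken from the HTTP example in this article.
HALT_URL = "http://localhost:20202/api/v1/dbs/{db}/halt"


@contextmanager
def halt_lock(http, db_name):
    """Hold the halt lock for the duration of a `with` block.

    `http` is any object with requests-style post() and delete() methods,
    such as the `requests` module or a `requests.Session`.
    """
    url = HALT_URL.format(db=db_name)
    http.post(url, timeout=5).raise_for_status()
    try:
        yield
    finally:
        # Always release, even if the write inside the block raised.
        http.delete(url, timeout=5)
```

Use it as with halt_lock(requests, "my-app.db"): around the execute and commit calls only, and keep network calls or slow work outside the block.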

Fix 5: FUSE in Docker / Local Dev

For local LiteFS testing in Docker:

# docker-compose.yml
services:
  app:
    image: my-app
    cap_add:
      - SYS_ADMIN
    devices:
      - /dev/fuse
    security_opt:
      - apparmor:unconfined
    volumes:
      - litefs:/var/lib/litefs
    command: litefs mount

volumes:
  litefs:

Three Docker requirements:

  • cap_add: SYS_ADMIN — FUSE needs admin capability.
  • devices: /dev/fuse — the FUSE device.
  • apparmor:unconfined — AppArmor on Ubuntu hosts blocks FUSE by default.

On Fly.io, these are set automatically — you don’t configure them in fly.toml.
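Before blaming LiteFS, you can confirm from inside the container that the FUSE device is actually visible. A minimal preflight sketch follows; fuse_problems is a hypothetical helper, and it cannot detect a missing SYS_ADMIN capability, only a missing or inaccessible device.

```python
import os


def fuse_problems(device="/dev/fuse"):
    """Return a list of human-readable problems with the FUSE device, empty if none."""
    if not os.path.exists(device):
        return [f"{device} is missing: pass it with `devices:` in docker-compose.yml"]
    if not os.access(device, os.R_OK | os.W_OK):
        return [f"{device} exists but is not readable and writable by this user"]
    return []
```

Run it in an entrypoint script ahead of litefs mount so the "device not found" case produces a clearer message.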

Common Mistake: Using restart: unless-stopped with litefs mount. If LiteFS crashes, Docker restarts it but the FUSE mount is stale. Use restart: on-failure with a max retry count.

Fix 6: LiteFS Cloud for Hosted Replication

LiteFS Cloud (Fly’s hosted service) offers:

  • Point-in-time backups
  • Cross-region replication without managing Consul
  • A managed primary

To use it, keep your existing lease configuration in litefs.yml:

lease:
  type: "consul"
  # consul: { url: ..., key: ... }
  advertise-url: ...

Then create a LiteFS Cloud cluster and store its token as a secret:

fly litefs-cloud create
fly secrets set LITEFS_CLOUD_TOKEN=<token>

LiteFS Cloud manages backups and snapshots. Restore:

fly litefs-cloud snapshots
fly litefs-cloud restore --snapshot=<id>

Pro Tip: Even if you use Consul lease, enable LiteFS Cloud for backups. SQLite without backups is one disk failure from total loss.

Fix 7: Mount Path and App Configuration

LiteFS mounts at fuse.dir. Your app reads/writes through this path:

fuse:
  dir: "/litefs"

import sqlite3

conn = sqlite3.connect("/litefs/my-app.db")
# Writes go through LiteFS, replicate to other nodes.

Don’t open the underlying file directly (/var/lib/litefs/dbs/my-app.db/database). That’s LiteFS’s internal storage; writes bypass replication and corrupt the cluster.
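A small guard at startup can catch a misconfigured path before any write bypasses replication. This is a sketch: is_under_mount is a hypothetical helper, and the comparison is purely lexical, so it does not resolve symlinks or ".." segments.

```python
from pathlib import PurePosixPath


def is_under_mount(db_path, mount_dir="/litefs"):
    """True if db_path is located inside the LiteFS FUSE mount directory."""
    return PurePosixPath(mount_dir) in PurePosixPath(db_path).parents
```

For example, refuse to start when is_under_mount(DATABASE_PATH) is false, where DATABASE_PATH is whatever setting your app reads its SQLite path from.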

For dynamic DB names, use a naming pattern inside the mount:

fuse:
  dir: "/litefs"

conn = sqlite3.connect(f"/litefs/tenant-{tenant_id}.db")

Each .db file gets its own replication channel. LiteFS handles them independently.
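Because the tenant ID ends up in a file path, it is worth validating before use. The helper below is a hypothetical sketch; the allowed character set and the 64-character limit are assumptions you should adapt to your own ID format.

```python
import re

# Assumption: tenant IDs are short and limited to letters, digits, "_" and "-".
_TENANT_ID = re.compile(r"[A-Za-z0-9_-]{1,64}")


def tenant_db_path(tenant_id, mount_dir="/litefs"):
    """Build the per-tenant database path, rejecting IDs that could escape the mount."""
    tenant_id = str(tenant_id)
    if not _TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"unsafe tenant id: {tenant_id!r}")
    return f"{mount_dir}/tenant-{tenant_id}.db"
```

Pass the result to sqlite3.connect instead of formatting the path inline.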

Note: LiteFS doesn’t currently support cross-DB transactions (an INSERT into users.db + audit.db in one transaction isn’t atomic across the two). For multi-tenant patterns, isolate concerns or accept eventual consistency between DBs.

Fix 8: Monitoring Replication Lag

Check primary status:

curl http://localhost:20202/api/v1/dbs/my-app.db

Returns JSON with current TXID, primary hostname, replica positions.

For Prometheus:

http:
  addr: ":20202"
  # Exposes /metrics

Key metrics:

  • litefs_db_position_replica vs litefs_db_position_primary — gap shows replication lag.
  • litefs_subscriber_count — number of replicas connected to this primary.
  • litefs_halt_lock_active — whether halt is held.

Set up alerts for position_primary - position_replica > N to catch stuck replicas.
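If you would rather compute lag yourself, you can compare transaction IDs read on each node. The sketch below assumes a position string of the form hex TXID, slash, hex checksum, which is the format of the per-database -pos file in the mount (for example /litefs/my-app.db-pos) as far as I know; confirm it on your version. parse_pos and txid_lag are hypothetical helpers.

```python
def parse_pos(text):
    """Split a position string like "000000000000001b/8b5ed0b6ea5b6b36" into (txid, checksum)."""
    txid_hex, checksum = text.strip().split("/", 1)
    return int(txid_hex, 16), checksum


def txid_lag(primary_pos, replica_pos):
    """Number of transactions the replica is behind the primary (never negative)."""
    primary_txid, _ = parse_pos(primary_pos)
    replica_txid, _ = parse_pos(replica_pos)
    return max(0, primary_txid - replica_txid)
```

A lag that keeps growing while the primary's TXID advances points to a disconnected replica rather than a slow one.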

Pro Tip: A replica that’s hours behind isn’t a slow replica — it’s a disconnected one. Check litefs_subscriber_count first; if it’s zero, the replica isn’t getting the WAL stream at all.

Still Not Working?

A few less-obvious failures:

  • failed to acquire lease. Consul connectivity broken. Verify fly consul attach ran. Check FLY_CONSUL_URL is set in the app’s env.
  • Primary stays in old region after a regional outage. Consul retains the lease for ttl seconds (default 10s) even after the holder dies. Either wait or manually expire via Consul KV.
  • Replication stops after vacuum. SQLite VACUUM rewrites the entire DB. LiteFS handles this but it’s expensive and can pause replication for the duration. Schedule vacuums during low-traffic windows.
  • PRAGMA journal_mode=DELETE ignored. LiteFS requires WAL mode. Default is WAL; trying to switch to DELETE or TRUNCATE fails silently.
  • Read replica writes succeed locally but never propagate. You wrote to the underlying file instead of through the FUSE mount. Fix the path in your app config.
  • Backups via sqlite3 .backup fail. Use the LiteFS-aware backup pattern: stop writes (halt lock), copy the file, release. Or use LiteFS Cloud snapshots.
  • SQLITE_BUSY under load. SQLite’s single-writer constraint plus LiteFS’s primary forwarding adds latency. For high-write workloads, consider Postgres instead.
  • App restarts but data is gone. The volume isn’t mounted. litefs.yml’s data.dir must be on a persistent volume ([[mounts]] in fly.toml).

For related Fly.io, SQLite, and distributed-data issues, see Fly deploy not working, SQLite database is locked, Electric SQL not working, and Postgres connection refused.

FixDevs

Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.
