Fix: LiteFS Not Working — Consul Lease, Primary Election, Halt Locks, and Replica Reads
Quick Answer
How to fix LiteFS errors — primary not elected, Consul lease setup, static lease single-node mode, halt locks for cross-node writes, replica seeing stale data, mount path mismatch, and LiteFS Cloud sync.
The Error
You deploy LiteFS to Fly.io with multiple machines and writes fail:
SQLITE_READONLY_DBMOVED: attempt to write a readonly database
Or the primary never elects:
$ fly logs
[litefs] waiting for consul connection
[litefs] error: consul: dial tcp: connection refused
Or a replica reads stale data immediately after a write:
# On the primary:
cur.execute("INSERT INTO users (name) VALUES (?)", ["Alice"])
conn.commit()
# On a replica seconds later:
# (LAST_INSERT_ROWID() is per-connection, so look the row up by value instead.)
cur.execute("SELECT name FROM users WHERE name = ?", ["Alice"])
# Returns no rows.
Or the FUSE mount fails to start:
[litefs] error: cannot mount: fuse: device not found
Why This Happens
LiteFS is a FUSE filesystem that intercepts SQLite writes and replicates them across nodes. Most issues map to one of:
- No primary election. LiteFS needs a coordination mechanism to pick one writer. Two backends: Consul (Fly’s built-in) or static lease (one-node mode for testing). Without one, no writes happen.
- Replicas are read-only. A replica that receives a write request must forward it to the primary, or your app must route writes itself. The SQLITE_READONLY_DBMOVED error means a replica tried to commit a write directly.
- Replication is asynchronous by default. A replica may be milliseconds (or seconds) behind. For “read your own writes,” use halt locks or pin reads to the primary.
- FUSE requires kernel support. Fly’s machines ship with FUSE enabled, but local Docker testing usually doesn’t. LiteFS needs cap_add: SYS_ADMIN and /dev/fuse access.
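To route around the read-only replica case, an app first has to recognise it. The following is a minimal sketch, assuming the driver surfaces SQLite's standard "attempt to write a readonly database" message. The helper name is_readonly_error is hypothetical, and the demo simulates a replica by opening an ordinary file in read-only mode rather than going through LiteFS.

```python
import os
import sqlite3
import tempfile


def is_readonly_error(exc: Exception) -> bool:
    """Return True if exc looks like SQLite's read-only write rejection."""
    return (
        isinstance(exc, sqlite3.OperationalError)
        and "readonly database" in str(exc).lower()
    )


def demo() -> bool:
    """Simulate a replica with a read-only connection and try to write."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    rw = sqlite3.connect(path)
    rw.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    rw.commit()
    rw.close()

    # mode=ro stands in for a LiteFS replica (assumption for illustration).
    ro = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
    try:
        ro.execute("INSERT INTO users (name) VALUES (?)", ["Alice"])
        return False
    except sqlite3.OperationalError as exc:
        return is_readonly_error(exc)
    finally:
        ro.close()
```

An app could use a check like this to decide when to forward a write to the primary instead of surfacing the error to the user.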
Fix 1: Configure Consul Lease (Multi-Node)
Fly provides a free Consul cluster per app. Use it for LiteFS leases:
fly consul attach
This injects FLY_CONSUL_URL into your app’s env. LiteFS reads it automatically.
In litefs.yml:
fuse:
  dir: "/litefs"
  allow-other: true
data:
  dir: "/var/lib/litefs"
proxy:
  addr: ":8080"
  target: "localhost:8081"
  db: "my-app.db"
lease:
  type: "consul"
  candidate: ${FLY_REGION == PRIMARY_REGION}
  promote: true
  advertise-url: "http://${HOSTNAME}.vm.${FLY_APP_NAME}.internal:20202"
  consul:
    url: "${FLY_CONSUL_URL}"
    key: "litefs/my-app"
Three lease fields:
- candidate. Whether this node can become primary. Restrict to PRIMARY_REGION for predictable failover.
- promote. Auto-promote if no primary exists.
- advertise-url. How replicas reach this node. Fly’s .vm.<app>.internal DNS works.
Set PRIMARY_REGION in fly.toml:
[env]
PRIMARY_REGION = "nrt"
Pro Tip: Pick the primary region closest to where most writes originate. Reads are served from every region, but every write goes through the primary, so a distant primary adds latency to each write.
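The candidate expression is just a comparison of two environment variables. As a rough illustration, the same rule can be mirrored in application code, for example to log at startup whether the current machine is eligible to become primary. This sketch assumes FLY_REGION and PRIMARY_REGION are set as described above; is_candidate is a hypothetical helper, not part of LiteFS.

```python
import os
from typing import Mapping


def is_candidate(env: Mapping[str, str] = os.environ) -> bool:
    """Mirror `${FLY_REGION == PRIMARY_REGION}`: True only in the primary region."""
    region = env.get("FLY_REGION")
    primary = env.get("PRIMARY_REGION")
    # Treat missing variables as "not a candidate" instead of letting
    # None == None count as a match.
    return region is not None and region == primary
```

Passing the environment in as a mapping keeps the function easy to exercise locally, where the Fly variables are not set.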
Fix 2: Use Static Lease for Single-Node Dev
For local testing or single-machine deployments where you don’t need failover:
lease:
  type: "static"
  advertise-url: "http://localhost:20202"
  candidate: true
  hostname: "primary"
Setting type: "static" skips Consul entirely. The single node is always primary.
This is the simplest setup for single-machine deployments (fly scale count 1) or local Docker. For production with HA, switch to consul.
Common Mistake: Mixing static lease and multi-node deploys. Two nodes that both run type: "static" with candidate: true each act as primary, so their writes conflict.
Fix 3: Route Writes Through the Primary
A replica that receives a write request must forward it. LiteFS provides a built-in HTTP proxy:
proxy:
  addr: ":8080"
  target: "localhost:8081"
  db: "my-app.db"
This makes LiteFS listen on :8080 and forward to your app on :8081. The proxy automatically routes writes (PUT/POST/PATCH/DELETE) to the primary; reads stay local.
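The routing rule described above can be sketched as a small decision function. This illustrates the rule as stated in this article, not LiteFS's actual implementation; route_request and its return labels are hypothetical.

```python
WRITE_METHODS = {"PUT", "POST", "PATCH", "DELETE"}


def route_request(method: str, node_is_primary: bool) -> str:
    """Return "primary" or "local" for an incoming HTTP request."""
    if method.upper() in WRITE_METHODS and not node_is_primary:
        return "primary"  # A replica received a write: hand it to the primary.
    return "local"  # Reads anywhere, and writes on the primary, stay local.
```

One consequence of routing by HTTP method: a GET handler that writes to the database runs on whichever node received it, so it will fail on a replica. Keep GET handlers read-only.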
Your fly.toml should expose :8080 (LiteFS) to the public, with your app on the internal :8081:
[http_service]
internal_port = 8080 # LiteFS proxy port
force_https = true
# Your app binds to localhost:8081
For apps that don’t use HTTP (background workers, queues), you need a different routing mechanism:
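# In your app:
import requests

PRIMARY_FILE = "/litefs/.primary"  # Present on replicas only; holds the primary's hostname

def get_primary():
    """Return the primary's hostname, or None when this node is the primary."""
    try:
        with open(PRIMARY_FILE) as f:
            return f.read().strip()
    except FileNotFoundError:
        return None

def write_user(conn, name):
    primary = get_primary()
    if primary is None:
        # Local write
        conn.execute("INSERT INTO users (name) VALUES (?)", [name])
        conn.commit()
    else:
        # Forward to the primary
        requests.post(f"http://{primary}/api/users", json={"name": name})
LiteFS writes the primary’s hostname to a .primary file in the mount directory on replica nodes; the file is absent on the primary.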
Fix 4: Halt Locks for Synchronous Writes
A halt lock lets the current node write even when it is a replica: LiteFS briefly halts write transactions on the primary, applies your transaction locally, and ships it back to the primary. Because the write lands on the local node first, that node reads its own write immediately:
import requests
# Acquire halt lock (HTTP):
requests.post("http://localhost:20202/api/v1/dbs/my-app.db/halt")
try:
    cur.execute("INSERT INTO users ...")
    conn.commit()
finally:
    # Release:
    requests.delete("http://localhost:20202/api/v1/dbs/my-app.db/halt")
While the halt lock is held, the primary accepts no other writes. After release, normal writes resume and the other nodes catch up through regular replication.
For Go apps, the superfly/litefs-go library provides ergonomic helpers:
import "github.com/superfly/litefs-go"
// WithHalt takes the database path and a callback; it does not take a context.
err := litefs.WithHalt(dbPath, func() error {
	_, err := db.ExecContext(ctx, "INSERT INTO ...")
	return err
})
Note: Halt locks block writes from other connections during the hold. Don’t hold them across slow operations. Use only for the few cases where stale reads are unacceptable.
Fix 5: FUSE in Docker / Local Dev
For local LiteFS testing in Docker:
# docker-compose.yml
services:
  app:
    image: my-app
    cap_add:
      - SYS_ADMIN
    devices:
      - /dev/fuse
    security_opt:
      - apparmor:unconfined
    volumes:
      - litefs:/var/lib/litefs
    command: litefs mount
volumes:
  litefs:
Three Docker requirements:
- cap_add: SYS_ADMIN. FUSE needs admin capability.
- devices: /dev/fuse. The FUSE device.
- apparmor:unconfined. AppArmor on Ubuntu hosts blocks FUSE by default.
On Fly.io, these are set automatically — you don’t configure them in fly.toml.
Common Mistake: Using restart: unless-stopped with litefs mount. If LiteFS crashes, Docker restarts it but the FUSE mount is stale. Use restart: on-failure with a max retry count.
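When the mount fails in a container, it helps to confirm early that the FUSE device is actually visible. The following is a small sketch of such a preflight check; fuse_device_available is a hypothetical helper, and it only checks that the device node exists and is readable and writable, not that the SYS_ADMIN capability is granted.

```python
import os


def fuse_device_available(path: str = "/dev/fuse") -> bool:
    """Return True if the device node exists and is readable and writable."""
    return os.path.exists(path) and os.access(path, os.R_OK | os.W_OK)
```

Calling this from a container entrypoint before litefs mount, and exiting with a clear message when it returns False, turns a cryptic mount error into an obvious configuration hint.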
Fix 6: LiteFS Cloud for Hosted Replication
LiteFS Cloud (Fly’s hosted service) offers:
- Point-in-time backups
- Cross-region replication without managing Consul
- A managed primary
To use:
lease:
  type: "consul"
  # consul: { url: ..., key: ... }
  advertise-url: ...
# Or with LiteFS Cloud:
# Configure via fly secrets set LITEFS_CLOUD_TOKEN=...

fly litefs-cloud create
fly secrets set LITEFS_CLOUD_TOKEN=<token>
LiteFS Cloud manages backups and snapshots. Restore:
fly litefs-cloud snapshots
fly litefs-cloud restore --snapshot=<id>
Pro Tip: Even if you use Consul lease, enable LiteFS Cloud for backups. SQLite without backups is one disk failure from total loss.
Fix 7: Mount Path and App Configuration
LiteFS mounts at fuse.dir. Your app reads/writes through this path:
fuse:
  dir: "/litefs"

import sqlite3
conn = sqlite3.connect("/litefs/my-app.db")
# Writes go through LiteFS, replicate to other nodes.
Don’t open the underlying file directly (/var/lib/litefs/dbs/my-app.db/database). That’s LiteFS’s internal storage; writes bypass replication and corrupt the cluster.
For dynamic DB names, use a file naming pattern directly under the mount directory:
fuse:
  dir: "/litefs"

conn = sqlite3.connect(f"/litefs/tenant-{tenant_id}.db")
Each .db file gets its own replication channel. LiteFS handles them independently.
Note: LiteFS doesn’t currently support cross-DB transactions (an INSERT into users.db + audit.db in one transaction isn’t atomic across the two). For multi-tenant patterns, isolate concerns or accept eventual consistency between DBs.
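Because the tenant ID ends up in a file path, it is worth validating before use. The following is a minimal sketch of a path builder for the per-tenant pattern above; tenant_db_path, MOUNT_DIR, and the allowed character set are assumptions for illustration and should be adapted to your own ID format.

```python
import re
from pathlib import PurePosixPath

MOUNT_DIR = PurePosixPath("/litefs")  # Must match fuse.dir in litefs.yml
_TENANT_ID = re.compile(r"[A-Za-z0-9_-]+")


def tenant_db_path(tenant_id: str) -> str:
    """Build the per-tenant database path directly under the LiteFS mount."""
    if not _TENANT_ID.fullmatch(tenant_id):
        # Reject separators and dots so the file cannot leave the mount root.
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return str(MOUNT_DIR / f"tenant-{tenant_id}.db")
```

Usage would look like sqlite3.connect(tenant_db_path(tenant_id)), which keeps every database as a flat file inside the mount.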
Fix 8: Monitoring Replication Lag
Check primary status:
curl http://localhost:20202/api/v1/dbs/my-app.db
Returns JSON with current TXID, primary hostname, replica positions.
For Prometheus:
http:
  addr: ":20202"
# Exposes /metrics
Key metrics:
- litefs_db_position_replica vs litefs_db_position_primary. The gap shows replication lag.
- litefs_subscriber_count. Number of replicas connected to this primary.
- litefs_halt_lock_active. Whether halt is held.
Set up alerts for position_primary - position_replica > N to catch stuck replicas.
Pro Tip: A replica that’s hours behind isn’t a slow replica — it’s a disconnected one. Check litefs_subscriber_count first; if it’s zero, the replica isn’t getting the WAL stream at all.
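Replication lag is the difference between two transaction IDs. The following is a minimal sketch, assuming each position is available as a string of the form <hex TXID>/<hex checksum>. That format is an assumption here (it matches what LiteFS is understood to write to its per-database -pos file); verify it against your LiteFS version and adapt the parsing to whatever your metrics source returns. Both function names are hypothetical.

```python
def parse_position(text: str) -> int:
    """Parse "<hex txid>/<hex checksum>" and return the TXID as an int."""
    txid_hex, _, _checksum = text.strip().partition("/")
    return int(txid_hex, 16)


def replication_lag(primary_pos: str, replica_pos: str) -> int:
    """Transactions the replica is behind the primary (0 if caught up)."""
    return max(0, parse_position(primary_pos) - parse_position(replica_pos))
```

A lag that keeps growing while the primary is taking writes points at a disconnected replica, not a slow one.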
Still Not Working?
A few less-obvious failures:
- failed to acquire lease. Consul connectivity broken. Verify fly consul attach ran. Check FLY_CONSUL_URL is set in the app’s env.
- Primary stays in old region after a regional outage. Consul retains the lease for ttl seconds (default 10s) even after the holder dies. Either wait or manually expire via Consul KV.
- Replication stops after vacuum. SQLite VACUUM rewrites the entire DB. LiteFS handles this but it’s expensive and can pause replication for the duration. Schedule vacuums during low-traffic windows.
- PRAGMA journal_mode=DELETE ignored. LiteFS requires WAL mode. Default is WAL; trying to switch to DELETE or TRUNCATE fails silently.
- Read replica writes succeed locally but never propagate. You wrote to the underlying file instead of through the FUSE mount. Fix the path in your app config.
- Backups via sqlite3 .backup fail. Use the LiteFS-aware backup pattern: stop writes (halt lock), copy the file, release. Or use LiteFS Cloud snapshots.
- SQLITE_BUSY under load. SQLite’s single-writer constraint plus LiteFS’s primary forwarding adds latency. For high-write workloads, consider Postgres instead.
- App restarts but data is gone. The volume isn’t mounted. litefs.yml’s data.dir must be on a persistent volume ([[mounts]] in fly.toml).
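For the SQLITE_BUSY case, retrying briefly before giving up often smooths over short lock contention. The following is a minimal sketch, assuming Python's sqlite3 driver, which reports a busy database as an OperationalError containing "locked". execute_with_retry and its default attempt count and delays are hypothetical choices, not LiteFS recommendations.

```python
import sqlite3
import time


def execute_with_retry(conn, sql, params=(), attempts=5, base_delay=0.05):
    """Run a write, retrying with exponential backoff while the DB is locked."""
    for attempt in range(attempts):
        try:
            with conn:  # Commits on success, rolls back on error.
                return conn.execute(sql, params)
        except sqlite3.OperationalError as exc:
            # Re-raise anything that is not lock contention, and give up
            # after the final attempt.
            if "locked" not in str(exc).lower() or attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

Setting a connection-level timeout as well, as in sqlite3.connect(path, timeout=5.0), lets SQLite itself wait for the lock before the retry loop is needed.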
For related Fly.io, SQLite, and distributed-data issues, see Fly deploy not working, SQLite database is locked, Electric SQL not working, and Postgres connection refused.
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.