How do I fix "Go Deadlock — all goroutines are asleep, deadlock!"?

How to fix Go channel deadlocks: unbuffered vs buffered channels, missing goroutines, select statements, closing channels, sync primitives, and finding the stuck goroutine with stack dumps, goleak, and the race detector.

Fix: Go Deadlock — all goroutines are asleep, deadlock!

The Problem

A Go program crashes with the deadlock error:

fatal error: all goroutines are asleep - deadlock!

goroutine 1 [chan receive]:
main.main()
        /app/main.go:12 +0x28
exit status 2

Or, in a program where other goroutines keep running (a server, for example), a function like this one hangs indefinitely instead of crashing:

func process(data []int) []int {
    ch := make(chan int)
    var wg sync.WaitGroup

    for _, v := range data {
        wg.Add(1)
        go func(n int) {
            defer wg.Done()
            ch <- n * 2   // Blocks: nobody is receiving from ch yet
        }(v)
    }

    wg.Wait()       // Blocks forever: every goroutine is stuck on ch <- and never calls wg.Done()
    close(ch)       // Never reached

    var results []int
    for v := range ch { // Never reached
        results = append(results, v)
    }
    return results
}

Or a goroutine blocks forever waiting for a channel that’s never written to:

result := <-ch   // Blocks forever if nothing sends to ch

Why This Happens

A deadlock occurs when every goroutine in the program is blocked, each waiting on a channel or mutex operation that no other goroutine can satisfy. The Go runtime checks for this condition and aborts with a fatal error (which recover() cannot catch) rather than hanging silently; that is the "all goroutines are asleep" message in the stack trace. The check fires only when the entire program is stuck. Partial deadlocks, where some other goroutine is still running or waiting on something the runtime can’t rule out (a pending timer, a network listener), escape the check and surface later as goroutine leaks and memory growth.

The first conceptual hurdle is the difference between buffered and unbuffered channels. An unbuffered channel is a rendezvous point: the sender blocks until a receiver arrives, and vice versa. A buffered channel with capacity N accepts sends without a receiver until it holds N values; the next send blocks until something is received. Most deadlocks come from assuming a channel will buffer when it won’t, or from starting senders when no receiver will ever run. The second hurdle is ownership: a channel that receivers range over needs one clearly defined sending side that closes it. Closing an already-closed channel panics, so both ends can’t share that job; if neither end closes it, for range loops over it never terminate.
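A small runnable sketch of both points, using hypothetical helper names (trySend, fillCount, closeTwice): a non-blocking send via select shows how many values a channel accepts when nothing is receiving, and recover() captures the panic from a second close.

```go
package main

import "fmt"

// trySend attempts a send without blocking and reports whether it succeeded.
func trySend(ch chan<- int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		return false // would block: buffer full, or unbuffered with no receiver
	}
}

// fillCount counts how many non-blocking sends succeed on a fresh channel
// of the given capacity when no goroutine is receiving.
func fillCount(capacity int) int {
	ch := make(chan int, capacity)
	n := 0
	for trySend(ch, n) {
		n++
	}
	return n
}

// closeTwice closes a channel twice and returns the recovered panic message.
func closeTwice() (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	ch := make(chan int)
	close(ch)
	close(ch) // second close panics
	return "no panic"
}

func main() {
	fmt.Println(fillCount(0)) // 0: unbuffered, no receiver
	fmt.Println(fillCount(3)) // 3: the fourth send would block
	fmt.Println(closeTwice()) // close of closed channel
}
```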

Common causes:

Sending to an unbuffered channel with no receiver — unbuffered channels block the sender until a receiver is ready. If no goroutine is reading, the sender blocks forever.
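Reading from a channel that will never receive data: if the only writer returns early (on an error path, say) without sending or closing, the reader blocks forever. If the writer closes the channel instead, the read returns the zero value immediately.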
wg.Wait() before starting the reader — wg.Wait() blocks until all goroutines finish. If goroutines block on sending to a channel that nobody reads, wg.Wait() never returns.
Circular channel dependency — goroutine A waits for goroutine B to send, goroutine B waits for goroutine A to send.
Not closing a channel being ranged — for v := range ch blocks after the last item until the channel is closed.
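sync.Mutex locked twice: Go mutexes are not reentrant, so calling Lock() while the same goroutine already holds the lock deadlocks. Switching to sync.RWMutex does not help (a recursive RLock can also deadlock once a writer is waiting); restructure the locking instead.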
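For the second cause above (a writer that returns without sending), one defensive pattern is to make every exit path send exactly once, carrying an error when the work fails. A sketch; startWork and result are hypothetical names, and the buffer of 1 lets the worker finish even if the caller stops listening.

```go
package main

import (
	"errors"
	"fmt"
)

// result carries either a value or an error, so the worker can report
// failure instead of silently returning.
type result struct {
	val int
	err error
}

// startWork launches a hypothetical worker that fails for negative input.
// Every return path sends exactly once.
func startWork(n int) <-chan result {
	out := make(chan result, 1) // buffered: the send never blocks
	go func() {
		if n < 0 {
			out <- result{err: errors.New("negative input")}
			return
		}
		out <- result{val: n * 2}
	}()
	return out
}

func main() {
	for _, n := range []int{21, -1} {
		r := <-startWork(n) // cannot block forever: every path sends
		if r.err != nil {
			fmt.Println("error:", r.err)
			continue
		}
		fmt.Println("value:", r.val)
	}
}
```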

In Production: Incident Lens

Channel deadlocks behave differently in production than in development. A small program that deadlocks fails loudly: the runtime detects the all-goroutines-asleep state and exits with a fatal error, and a deadlocked test typically hangs until go test’s -timeout fires and dumps every goroutine’s stack. In production, the deadlocked goroutines are a tiny fraction of the goroutines in the process. Health checks pass. HTTP handlers serve requests. The runtime never trips its detector because thousands of other goroutines are still running. Instead, leaked goroutines accumulate, memory grows, and the service eventually runs out of memory.

Blast radius. The blast radius is “service memory grows until restart.” Each deadlocked goroutine holds its stack (2 KB minimum, more if the stack grew) plus anything it references. A handler that leaks one goroutine per request can accumulate gigabytes of leaked memory before the OOM killer fires. If the service is behind a load balancer with health checks that succeed on /health, the LB keeps sending traffic to the leaking instance until the kernel terminates the process.

Monitoring signal. Track go_goroutines from the Prometheus client library as a primary health signal. Healthy services hold a relatively stable goroutine count proportional to in-flight work. A monotonically rising goroutine count is a leak, usually a goroutine blocked forever on a channel send or receive. Alert on the derivative: if goroutine count grows by more than X per minute for Y consecutive minutes, page on-call. Pair this with process_resident_memory_bytes: when goroutine count and RSS both climb in lockstep, you have a goroutine leak rather than a slow cache fill.
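The derivative rule can be sketched as a pure function. This is a hypothetical in-process version (risingFor, maxGrowth, and windows are illustrative stand-ins for X and Y); in production the rule normally lives in your alerting system, not in the service.

```go
package main

import "fmt"

// risingFor reports whether each of the last `windows` deltas between
// consecutive samples exceeded maxGrowth. samples are goroutine counts
// taken at a fixed interval (for example, runtime.NumGoroutine() per minute).
func risingFor(samples []int, maxGrowth, windows int) bool {
	if windows <= 0 || len(samples) < windows+1 {
		return false // not enough history to decide
	}
	tail := samples[len(samples)-windows-1:]
	for i := 1; i < len(tail); i++ {
		if tail[i]-tail[i-1] <= maxGrowth {
			return false // growth stalled or reversed: not a sustained climb
		}
	}
	return true
}

func main() {
	steady := []int{120, 135, 118, 130, 125}
	leaking := []int{120, 180, 250, 310, 390}
	fmt.Println(risingFor(steady, 20, 3))  // false
	fmt.Println(risingFor(leaking, 20, 3)) // true
}
```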

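Recovery sequence. When goroutine count is climbing, capture every goroutine’s stack before restarting. /debug/pprof/goroutine?debug=2 (available once the service imports net/http/pprof, which registers its handlers on http.DefaultServeMux) prints every goroutine without stopping the process; ?debug=1 groups identical stacks and prefixes each group with a count. Sending SIGQUIT also dumps all stacks to stderr, but then exits the process. Group by the line the goroutines are blocked on: the offending channel operation is whichever line appears thousands of times. Restart the instance to free the memory, then patch the leak. Without the stack dump, you have no way to localize which channel is stuck.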

Postmortem preventive. Add goleak.VerifyTestMain(m) to every test package — it catches goroutines that outlive a test, which is the most reliable way to spot the bug at PR time. For services that handle long-lived connections, run periodic runtime.NumGoroutine() checks in load tests and fail the build if the count drifts upward.

Fix 1: Match Senders and Receivers

Every channel send needs a corresponding receive, either concurrent or buffered:

// DEADLOCK — unbuffered channel, send blocks, no concurrent receiver
func bad() {
    ch := make(chan int)
    ch <- 42         // Blocks — no one is receiving
    fmt.Println(<-ch)
}

// FIX 1 — use a buffered channel (send doesn't block if buffer has space)
func fix1() {
    ch := make(chan int, 1)  // Buffer of 1
    ch <- 42                 // Doesn't block — buffered
    fmt.Println(<-ch)        // Reads from buffer
}

// FIX 2 — send from a goroutine (concurrent send + receive)
func fix2() {
    ch := make(chan int)
    go func() {
        ch <- 42   // Goroutine blocks here until main receives
    }()
    fmt.Println(<-ch)   // Unblocks the goroutine
}

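The fundamental rule: a send on an unbuffered channel completes only when another goroutine receives it, so a single goroutine can never send and then receive on the same unbuffered channel.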

Fix 2: Fix the Fan-Out / Collect Pattern

Collecting results from multiple goroutines is a common deadlock source:

// DEADLOCK — wg.Wait() blocks before reading from ch
// Goroutines are blocked on ch <- (no receiver), wg.Wait() never returns
func collectBad(data []int) []int {
    ch := make(chan int)
    var wg sync.WaitGroup

    for _, v := range data {
        wg.Add(1)
        go func(n int) {
            defer wg.Done()
            ch <- n * 2    // BLOCKS — no one reading ch yet
        }(v)
    }

    wg.Wait()   // Never returns: every goroutine is stuck on its send
    close(ch)

    var results []int
    for v := range ch {
        results = append(results, v)
    }
    return results
}

// FIX — close the channel AFTER wg.Wait() using a separate goroutine
func collectGood(data []int) []int {
    ch := make(chan int, len(data))  // Buffered — senders don't block
    var wg sync.WaitGroup

    for _, v := range data {
        wg.Add(1)
        go func(n int) {
            defer wg.Done()
            ch <- n * 2    // Buffered — doesn't block
        }(v)
    }

    // Wait for all sends, then close channel to signal collector
    go func() {
        wg.Wait()
        close(ch)   // Close after all sends complete
    }()

    var results []int
    for v := range ch {   // Reads until channel is closed
        results = append(results, v)
    }
    return results
}

Alternative — collect without a channel:

func collectWithMutex(data []int) []int {
    var mu sync.Mutex
    var results []int
    var wg sync.WaitGroup

    for _, v := range data {
        wg.Add(1)
        go func(n int) {
            defer wg.Done()
            result := n * 2
            mu.Lock()
            results = append(results, result)
            mu.Unlock()
        }(v)
    }

    wg.Wait()
    return results
}

Fix 3: Use select with a Default or Done Channel

select prevents blocking on a single channel operation:

// BLOCKS FOREVER if nothing ever sends on ch
func bad(ctx context.Context, ch <-chan int) int {
    return <-ch   // ctx is ignored; this receive can block indefinitely
}

// CORRECT: use select to also wait on cancellation and a timeout
func good(ctx context.Context, ch <-chan int) (int, bool) {
    select {
    case value := <-ch:
        return value, true
    case <-ctx.Done():
        return 0, false   // Context cancelled — stop waiting
    case <-time.After(5 * time.Second):
        return 0, false   // Timeout
    }
}

// Non-blocking send/receive with default
func nonBlockingSend(ch chan<- int, value int) bool {
    select {
    case ch <- value:
        return true    // Sent successfully
    default:
        return false   // Channel full or no receiver — skip
    }
}

func nonBlockingReceive(ch <-chan int) (int, bool) {
    select {
    case v := <-ch:
        return v, true
    default:
        return 0, false   // No data available
    }
}

Fix 4: Always Close Channels from the Sender

Channels should be closed by the sender (writer), not the receiver:

// DEADLOCK — channel never closed, range blocks forever
func producerBad() <-chan int {
    ch := make(chan int)
    go func() {
        for i := 0; i < 5; i++ {
            ch <- i
        }
        // MISSING: close(ch)
    }()
    return ch
}

func consumerBad() {
    ch := producerBad()
    for v := range ch {   // Blocks after 5 items — channel never closed
        fmt.Println(v)
    }
}

// CORRECT — sender closes the channel when done
func producerGood() <-chan int {
    ch := make(chan int)
    go func() {
        defer close(ch)   // Always close when done sending
        for i := 0; i < 5; i++ {
            ch <- i
        }
    }()
    return ch
}

// Once a channel is closed and its buffer drained, a receive returns the zero value and ok == false
v, ok := <-ch
if !ok {
    // Channel is closed
}

Don’t close a channel from the receiver side: if the sender then sends on it, the sender panics:

// PANIC — sending to a closed channel panics
ch := make(chan int, 10)
close(ch)
ch <- 1   // panic: send on closed channel

Fix 5: Fix Mutex Deadlocks

sync.Mutex deadlocks when the same goroutine tries to lock it twice:

var mu sync.Mutex

// DEADLOCK — Lock() called twice in same goroutine
func bad() {
    mu.Lock()
    defer mu.Unlock()
    anotherFunc()   // Calls mu.Lock() — deadlock
}

func anotherFunc() {
    mu.Lock()         // Deadlock — already locked by bad()
    defer mu.Unlock()
    // ...
}

// FIX 1: don't hold the lock while calling functions that also lock
func good() {
    mu.Lock()
    localCopy := sharedData   // Copy shared state while locked
    mu.Unlock()               // Release before calling anotherFunc
    anotherFunc()             // No lock held, so anotherFunc can acquire mu
    use(localCopy)            // hypothetical helper that works on the copy
}

// FIX 2 — restructure so mutex is only locked at one level
func goodAlternative() {
    data := getDataWithoutLock()   // No lock
    mu.Lock()
    defer mu.Unlock()
    sharedData = processData(data)   // Only hold lock for the write
}
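One common way to apply FIX 2 in practice is to split each operation into an exported method that takes the lock and an unexported helper that assumes the lock is already held. A sketch with a hypothetical Counter type:

```go
package main

import (
	"fmt"
	"sync"
)

// Counter: exported methods lock, *Locked helpers assume the caller holds mu.
type Counter struct {
	mu sync.Mutex
	n  int
}

// incLocked must only be called with c.mu held.
func (c *Counter) incLocked() { c.n++ }

// Inc locks once and delegates to the helper.
func (c *Counter) Inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.incLocked()
}

// AddTwice reuses the helper under a single Lock. Calling c.Inc() here
// instead would deadlock, because sync.Mutex is not reentrant.
func (c *Counter) AddTwice() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.incLocked()
	c.incLocked()
}

func (c *Counter) Value() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

func main() {
	var c Counter
	c.Inc()
	c.AddTwice()
	fmt.Println(c.Value()) // 3
}
```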

Detect lock order issues (AB-BA deadlock):

// POTENTIAL DEADLOCK — goroutine 1 locks A then B
//                      goroutine 2 locks B then A
var mutexA, mutexB sync.Mutex

// Goroutine 1
mutexA.Lock()
mutexB.Lock()   // Waits for goroutine 2 to release B
mutexB.Unlock()
mutexA.Unlock()

// Goroutine 2 (concurrent)
mutexB.Lock()
mutexA.Lock()   // Waits for goroutine 1 to release A → DEADLOCK
mutexA.Unlock()
mutexB.Unlock()

// FIX — always acquire locks in the same order
// Both goroutines: lock A first, then B

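Fix 6: Debug Deadlocks with Stack Dumps and the Race Detector

The race detector does not detect deadlocks. It finds data races (unsynchronized access to shared memory), which often come from the same missing or misplaced synchronization that causes deadlocks, so it is worth running alongside the techniques below: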

# Run with race detector
go run -race main.go
go test -race ./...

# Build with race detector (for staging/canary deployments; race builds use noticeably more CPU and memory)
go build -race -o myapp

For channel-specific deadlock debugging, add timeouts:

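// Imports needed: errors, fmt, runtime, time
// Instead of blocking forever, add a timeout to identify the stuck operation.
// fn keeps running after a timeout; the buffered done channel lets it exit once it returns.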
func withTimeout(fn func() error) error {
    done := make(chan error, 1)
    go func() {
        done <- fn()
    }()

    select {
    case err := <-done:
        return err
    case <-time.After(10 * time.Second):
        // Dump goroutine stack traces to identify the deadlock
        buf := make([]byte, 1<<20)
        n := runtime.Stack(buf, true)
        fmt.Printf("TIMEOUT — goroutine stacks:\n%s\n", buf[:n])
        return errors.New("operation timed out")
    }
}

Print goroutine stacks on SIGQUIT:

# Send SIGQUIT to a running Go program to dump all goroutine stacks
kill -SIGQUIT <pid>

# Or in tests
go test -v -timeout 30s ./...
# -timeout causes go test to panic with stack dump after 30s

Fix 7: Channel Direction in Function Signatures

Using typed channel directions prevents accidental misuse:

// Bidirectional — any goroutine can send or receive
ch := make(chan int)

// Send-only — function can only send, not receive
func producer(ch chan<- int) {
    ch <- 42
    // <-ch  // Compile error — can't receive on send-only channel
}

// Receive-only — function can only receive, not send
func consumer(ch <-chan int) {
    fmt.Println(<-ch)   // Use the received value (an unused variable would not compile)
    // ch <- 42  // Compile error — can't send on receive-only channel
}

// Pattern — pipeline
func generateNumbers(count int) <-chan int {
    ch := make(chan int)
    go func() {
        defer close(ch)
        for i := 0; i < count; i++ {
            ch <- i
        }
    }()
    return ch  // Returns receive-only channel — caller can't accidentally send
}

func doubleValues(in <-chan int) <-chan int {
    out := make(chan int)
    go func() {
        defer close(out)
        for v := range in {
            out <- v * 2
        }
    }()
    return out
}

// Usage
numbers := generateNumbers(10)
doubled := doubleValues(numbers)
for v := range doubled {
    fmt.Println(v)
}

Still Not Working?

Partial deadlocks the runtime can’t see: Go’s deadlock detector only fires if ALL goroutines are blocked. If one goroutine is still running (e.g., a background timer), Go won’t detect the deadlock. Use goleak in tests to detect goroutine leaks:

go get go.uber.org/goleak

func TestMain(m *testing.M) {
    goleak.VerifyTestMain(m)   // Fails if any goroutines leak after tests
}

select with nil channels — a receive or send on a nil channel blocks forever. In select, a nil channel case is simply never selected (useful for disabling a case conditionally):

var ch chan int   // nil channel
select {
case v := <-ch:  // This case is never selected — nil channel blocks forever
    fmt.Println(v)
case <-time.After(1 * time.Second):
    fmt.Println("timeout")
}
// Prints "timeout" — nil channel in select is effectively disabled

Deadlock in test code — Go tests run with a timeout (default 10 minutes). If a test deadlocks, it eventually times out with a goroutine dump. Use go test -timeout 10s to get faster feedback during debugging.

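Result send skipped on an early exit: if a goroutine sends its result or calls close(ch) at the end of the function, an early return or a panic skips that line and the receiver waits forever. Deferred calls do run during a panic, but only those registered before the panic, and an unrecovered panic in any goroutine crashes the whole program anyway. Register defer close(ch) (or a deferred send) at the top of the goroutine, before the risky work, and call recover() inside that deferred function if a panic should become an error the receiver can handle.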
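A sketch of the defer-first pattern, assuming the task reports failure as an error; runSafely is a hypothetical helper. The deferred function is registered before the task runs, so exactly one value reaches the receiver whether the task succeeds, returns an error, or panics.

```go
package main

import "fmt"

// runSafely runs task in a goroutine and always delivers exactly one error:
// nil on success, the task's error, or an error built from a recovered panic.
func runSafely(task func() error) <-chan error {
	errc := make(chan error, 1) // buffered: the deferred send never blocks
	go func() {
		var err error
		defer func() {
			if r := recover(); r != nil {
				err = fmt.Errorf("task panicked: %v", r)
			}
			errc <- err // runs on every exit path
		}()
		err = task()
	}()
	return errc
}

func main() {
	fmt.Println(<-runSafely(func() error { return nil }))
	fmt.Println(<-runSafely(func() error { panic("boom") }))
}
```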

Goroutine blocked on a channel nobody else references: if the only remaining reference to a channel is held by a goroutine blocked on it, nothing can ever send on, receive from, or close that channel. The garbage collector does not reclaim blocked goroutines, so it stays parked for the life of the process. Pass channels explicitly into goroutines and make sure the side responsible for sending or closing is still reachable.

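select with time.After in a loop: each time.After(d) call creates a new timer. Before Go 1.23, a timer was not garbage collected until it fired, even if the select had already returned via another case, so tight loops with long durations piled up timers and showed up as heap growth (timers are not goroutines, so they don’t appear in the goroutine profile). Go 1.23 made unreferenced timers collectable immediately. In hot loops, creating one time.NewTimer, resetting it each iteration, and calling timer.Stop() when done is still cheaper than allocating a new timer per iteration.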