Fix: Go Goroutine Leak — Goroutines That Never Exit
Quick Answer
How to find and fix goroutine leaks in Go — detecting leaks with pprof and goleak, blocked channel patterns, context cancellation, and goroutine lifecycle management.
The Problem
A Go service’s memory and goroutine count grow indefinitely:
# pprof output — goroutine count keeps climbing
goroutine profile: total 14382
# After 1 hour of traffic, this number should stabilize; instead it keeps growing
Or in application logs, the goroutine count keeps increasing:
runtime.NumGoroutine(): 100 # On startup
runtime.NumGoroutine(): 1500 # After 10 minutes
runtime.NumGoroutine(): 8200 # After 1 hour
Or a test catches a leak:
--- FAIL: TestHandleRequest (0.12s)
goroutine_leak_test.go:45: found unexpected goroutines:
[Goroutine 18 in state chan receive, with main.processItems on top of the stack]
Or the service eventually OOM-crashes or becomes unresponsive after running for hours.
Why This Happens
A goroutine leak occurs when a goroutine is started but never exits. Unlike memory allocated with make or new, goroutines are not garbage collected when unreachable. They only exit when their function returns, when runtime.Goexit() is called, or when the program itself terminates. A single leaked goroutine costs a minimum of 2 KB of stack space, but a goroutine that is blocked on I/O or holding references to heap objects prevents those objects from being collected too. Over hours of traffic, leaked goroutines compound: 10 leaks per request at 100 req/s produces 3.6 million leaked goroutines in a single hour.
The most common causes fall into a few patterns. Blocked channel operations are the leading culprit. A goroutine waits on <-ch but no one ever sends to ch or closes it, so the goroutine blocks forever. The inverse is also common: a goroutine sends to an unbuffered channel, but the receiver has already exited. Missing context cancellation is the second most frequent cause. A goroutine running a long loop or waiting on I/O checks ctx.Done(), but the caller never cancels the context when it finishes, so the goroutine runs indefinitely.
Less obvious sources include goroutines started inside HTTP handlers that outlive the request, time.After called inside a loop (each iteration allocates a new timer, and before Go 1.23 that timer is not garbage collected until it fires, so this is a timer buildup rather than a true goroutine leak), goroutines blocked on a sync.Mutex that will never be unlocked, and goroutines spawned per event in a message consumer that block on downstream services. In all cases, the root issue is the same: the goroutine has no exit path.
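The blocked-receive pattern described above can be reproduced in a few lines. This is a minimal sketch, not code from a real service: leakOne and countLeaked are hypothetical helper names, and the goroutine count is read with runtime.NumGoroutine().

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// leakOne starts a goroutine that blocks on a receive and returns the
// channel it waits on. Until someone sends on or closes that channel,
// the goroutine has no exit path.
func leakOne() chan struct{} {
	ch := make(chan struct{})
	go func() {
		<-ch
	}()
	return ch
}

// countLeaked starts n blocked goroutines, records how many extra goroutines
// exist, then closes every channel and records the count again.
func countLeaked(n int) (during, after int) {
	base := runtime.NumGoroutine()
	chans := make([]chan struct{}, 0, n)
	for i := 0; i < n; i++ {
		chans = append(chans, leakOne())
	}
	during = runtime.NumGoroutine() - base
	for _, ch := range chans {
		close(ch) // closing unblocks the receive, so the goroutine returns
	}
	// Give the released goroutines a moment to finish exiting.
	for i := 0; i < 100 && runtime.NumGoroutine() > base; i++ {
		time.Sleep(10 * time.Millisecond)
	}
	after = runtime.NumGoroutine() - base
	return during, after
}

func main() {
	during, after := countLeaked(50)
	fmt.Printf("blocked goroutines: %d, after close: %d\n", during, after)
}
```

If the close loop is removed, the extra goroutines stay alive for the rest of the process, which is exactly the growth pattern shown in the logs above.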
Platform and Environment Differences
Goroutine leak behavior changes across operating systems, container runtimes, and CI environments in ways that make detection harder.
GOMAXPROCS and CPU detection. On bare-metal Linux and macOS, GOMAXPROCS defaults to the number of logical CPUs reported by the OS. Inside a Docker container with a CPU limit (e.g., --cpus=2), Go releases before 1.25 ignore the cgroup CPU quota and set GOMAXPROCS to the host's CPU count (8 or 16 on a typical host), so the runtime tries to run more threads in parallel than the container's quota allows. Go 1.25 and later take the cgroup CPU bandwidth limit into account on Linux when choosing the default. For older Go versions, the automaxprocs package from Uber applies the same adjustment at startup. Docker Desktop on macOS and WSL2-backed Docker on Windows run containers inside a Linux VM, so the Linux behavior applies there as well.
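A quick way to see what the runtime decided inside a given container is to print both values at startup. This is a small sketch using only the standard library; schedulerInfo is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"runtime"
)

// schedulerInfo reports the CPU count the runtime sees and the current
// GOMAXPROCS setting. Passing 0 to runtime.GOMAXPROCS reads the value
// without changing it.
func schedulerInfo() (numCPU, maxProcs int) {
	return runtime.NumCPU(), runtime.GOMAXPROCS(0)
}

func main() {
	numCPU, maxProcs := schedulerInfo()
	fmt.Printf("NumCPU=%d GOMAXPROCS=%d\n", numCPU, maxProcs)
}
```

If GOMAXPROCS is much larger than the container's CPU limit, setting the GOMAXPROCS environment variable explicitly is the simplest override.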
Profiling tools per platform. net/http/pprof works everywhere, but the visual flamegraph experience differs. On macOS, go tool pprof -http=:8080 opens a browser-based UI that requires graphviz installed via Homebrew. On Linux, the same command works but the system clipboard integration and SVG rendering depend on the desktop environment. fgprof (a wall-clock profiler) reveals goroutines blocked on I/O that standard CPU profiling misses, but it adds overhead and should not run in production on resource-constrained containers. go tool trace captures goroutine scheduling events with nanosecond precision and is the best tool for diagnosing intermittent leaks, but trace files grow quickly and can exceed available memory in long-running CI jobs.
Docker CPU limit vs GOMAXPROCS mismatch. When a container has a 2-CPU limit but GOMAXPROCS is 8, the Go scheduler runs goroutines on up to 8 threads at once, the container burns through its CPU quota early in each scheduling period, and the kernel throttles it. The resulting latency spikes mask leak symptoms: the service appears slow rather than leaking. Monitor goroutine count separately from CPU usage inside containers. Use runtime.NumGoroutine() and export it as a Prometheus metric.
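The Prometheus client is a third-party dependency, so the sketch below swaps in the standard library's expvar package to show the same idea: publish runtime.NumGoroutine() as a value that is recomputed on every read. The metric name "goroutines" and the helper goroutineMetric are assumptions for illustration.

```go
package main

import (
	"expvar"
	"fmt"
	"runtime"
)

func init() {
	// expvar.Func is re-evaluated on every read, so the value is always current.
	expvar.Publish("goroutines", expvar.Func(func() any {
		return runtime.NumGoroutine()
	}))
}

// goroutineMetric returns the published value as JSON text.
func goroutineMetric() string {
	return expvar.Get("goroutines").String()
}

func main() {
	// In a server that already serves http.DefaultServeMux, importing expvar
	// also exposes this value under the "goroutines" key at /debug/vars.
	fmt.Println("goroutines:", goroutineMetric())
}
```

A scraper or alert rule can then watch this single number for steady growth, independent of CPU usage.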
CI timeout hiding leaks. In CI pipelines (GitHub Actions, GitLab CI, Jenkins), test suites run with tight timeouts. A goroutine leak test using goleak.VerifyTestMain may pass if the leaked goroutine is still in its startup phase when the check runs. Increase the goleak poll interval or add explicit time.Sleep before verification in flaky CI environments. On GitHub Actions specifically, the default runner has 2 vCPUs, which changes goroutine scheduling compared to a developer’s 8-core laptop — leaks that manifest under concurrent load may not appear in CI at all.
Fix 1: Detect Leaks with pprof
The net/http/pprof package exposes goroutine stack traces over HTTP:
// main.go: add pprof endpoints
package main
import (
	"log"
	"net/http"
	_ "net/http/pprof" // Side-effect import registers handlers on http.DefaultServeMux
)
func main() {
	// pprof endpoints on a separate, local-only port (don't expose to the public)
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()
	// ... rest of your app
}
# View all running goroutines
go tool pprof http://localhost:6060/debug/pprof/goroutine
# Interactive mode
(pprof) top10 # Top 10 functions by number of goroutines
(pprof) list main. # Annotated source for functions matching 'main.'
# Save and compare snapshots (detect growth)
curl http://localhost:6060/debug/pprof/goroutine > goroutines_before.pb
# ... run some requests ...
curl http://localhost:6060/debug/pprof/goroutine > goroutines_after.pb
go tool pprof -diff_base goroutines_before.pb goroutines_after.pb
# Quick text dump of all goroutines (quote the URL so the shell leaves '?' alone)
curl 'http://localhost:6060/debug/pprof/goroutine?debug=2'
Monitor goroutine count in production:
import (
	"log/slog"
	"runtime"
	"time"
)
func monitorGoroutines(interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for range ticker.C {
		count := runtime.NumGoroutine()
		slog.Info("goroutine count", "count", count)
		if count > 10000 {
			slog.Warn("goroutine count exceeds threshold, possible leak", "count", count)
		}
	}
}
Fix 2: Use goleak in Tests
The goleak package detects goroutine leaks in unit tests automatically:
go get go.uber.org/goleak
package mypackage_test
import (
	"context"
	"testing"
	"go.uber.org/goleak"
)
func TestMain(m *testing.M) {
	// Verify no goroutines are leaked across all tests in the package
	goleak.VerifyTestMain(m)
}
func TestHandleRequest(t *testing.T) {
	defer goleak.VerifyNone(t) // Verify no leaks after this specific test
	handler := NewRequestHandler()
	handler.Handle(context.Background(), testRequest())
	// goleak will fail the test if any goroutines spawned here are still running
}
goleak checks goroutine state at the end of each test. If goroutines started during the test are still running, the test fails with a stack trace showing where the leaked goroutine was created.
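To get a feel for the kind of report such a check produces, the standard library can generate the same style of per-goroutine stack dump. The sketch below is not goleak's implementation; goroutineDump, blockedReceiver, and findBlockedReceiver are hypothetical names used only for this illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
	"strings"
	"time"
)

// goroutineDump returns the stacks of all live goroutines in the same text
// format as /debug/pprof/goroutine?debug=2.
func goroutineDump() string {
	var buf bytes.Buffer
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 2); err != nil {
		return ""
	}
	return buf.String()
}

// blockedReceiver is a deliberately leaky function for the demonstration.
//
//go:noinline
func blockedReceiver(ch <-chan int) {
	<-ch
}

// findBlockedReceiver starts blockedReceiver on a channel nobody sends to
// and returns its stack once the dump shows it parked in "chan receive".
func findBlockedReceiver() string {
	ch := make(chan int)
	go blockedReceiver(ch)
	defer close(ch) // release the goroutine when the demonstration is over
	for i := 0; i < 100; i++ {
		// Each goroutine's stack is separated from the next by a blank line.
		for _, stack := range strings.Split(goroutineDump(), "\n\n") {
			if strings.Contains(stack, "blockedReceiver(") &&
				strings.Contains(stack, "[chan receive") {
				return stack
			}
		}
		time.Sleep(10 * time.Millisecond) // goroutine may not have parked yet
	}
	return ""
}

func main() {
	fmt.Println(findBlockedReceiver())
}
```

The state in square brackets and the function at the top of the stack are usually enough to identify which channel operation is stuck.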
Fix 3: Fix Blocked Channel Patterns
The most common leak — goroutines waiting on channels that will never receive a value:
// LEAKY: senders block forever if the receiver stops reading from results
func processItems(items []Item) {
	results := make(chan Result) // Unbuffered channel
	for _, item := range items {
		go func(item Item) {
			result := process(item)
			results <- result // If the receiver exits early, this goroutine blocks forever
		}(item)
	}
	// If this returns early (error, timeout), goroutines above are stuck trying to send
	for range items {
		result := <-results
		if err := handleResult(result); err != nil {
			return // Returns here, but goroutines are still trying to send
		}
	}
}
// FIXED: use a done channel or context to signal goroutines to exit
func processItems(ctx context.Context, items []Item) ([]Result, error) {
	results := make(chan Result, len(items)) // Buffered: goroutines never block on send
	for _, item := range items {
		go func(item Item) {
			select {
			case <-ctx.Done():
				return // Context cancelled: exit without sending
			case results <- process(item):
				// Sent successfully
			}
		}(item)
	}
	var collected []Result
	for range items {
		select {
		case <-ctx.Done():
			return nil, ctx.Err()
		case result := <-results:
			collected = append(collected, result)
		}
	}
	return collected, nil
}
Always close channels when done writing:
func producer(ch chan<- int) {
	defer close(ch) // Closing unblocks all receivers waiting on <-ch
	for i := 0; i < 10; i++ {
		ch <- i
	}
}
func consumer(ch <-chan int) {
	for v := range ch { // range exits when ch is closed
		fmt.Println(v)
	}
	// Goroutine exits cleanly after channel is closed
}
Fix 4: Use Context for Goroutine Lifecycle
Pass context to all goroutines that do I/O or long-running work. Cancel the context when the caller is done:
// LEAKY: goroutine runs forever because it has no exit signal
func startWorker() {
	go func() {
		for {
			msg := fetchMessage() // Blocks until a message arrives
			process(msg)
			// No way to stop this goroutine
		}
	}()
}
// FIXED: goroutine exits when context is cancelled
func startWorker(ctx context.Context) {
	go func() {
		for {
			select {
			case <-ctx.Done():
				log.Println("Worker stopping:", ctx.Err())
				return // Clean exit
			default:
				msg, err := fetchMessageWithContext(ctx)
				if err != nil {
					if ctx.Err() != nil {
						return // Context cancelled during fetch: exit
					}
					log.Println("Fetch error:", err)
					continue
				}
				process(msg)
			}
		}
	}()
}
// Caller controls the goroutine's lifetime
func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel() // Cancels the context (and stops the worker) when main exits
	startWorker(ctx)
	// ... rest of main
}
For HTTP handlers, the request context is automatically cancelled when the client disconnects or the request times out:
func handleRequest(w http.ResponseWriter, r *http.Request) {
	ctx := r.Context() // Cancelled when handler returns or client disconnects
	// Pass ctx to goroutines so they stop when the request is done
	go func() {
		select {
		case <-ctx.Done():
			return // Client disconnected: stop background work
		case result := <-doBackgroundWork(ctx):
			log.Println("Background work done:", result)
		}
	}()
}
Fix 5: Fix time.After Leaks in Loops
time.After allocates a new timer on every call. Before Go 1.23, that timer is not garbage collected until it fires, even if the surrounding select has already moved on; Go 1.23 and later can collect unreferenced timers immediately. No goroutine is created per timer, but in a hot loop on older Go versions the pending timers pile up and look like a leak:
// LEAKY (before Go 1.23): allocates a new timer on every iteration
func processWithTimeout(items []Item) {
	for _, item := range items {
		select {
		case result := <-process(item):
			handle(result)
		case <-time.After(5 * time.Second): // New timer each iteration, held until it fires
			log.Println("Timeout")
		}
	}
}
// FIXED: reuse a single timer
func processWithTimeout(items []Item) {
	const timeout = 5 * time.Second
	timer := time.NewTimer(timeout)
	defer timer.Stop() // Release the timer when done
	for _, item := range items {
		select {
		case result := <-process(item):
			if !timer.Stop() {
				<-timer.C // Drain the channel if Stop() returns false
			}
			handle(result)
		case <-timer.C:
			log.Println("Timeout processing item")
		}
		timer.Reset(timeout) // Timer is stopped or drained here, so Reset is safe
	}
}
Common Mistake: Forgetting to drain timer.C after timer.Stop(). If Stop() returns false, the timer already fired and its channel has a value. Before Go 1.23, the next Reset() won't work correctly until the channel is drained. From Go 1.23 on, Stop and Reset no longer leave a stale value behind, so the drain is no longer needed.
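A different way to bound each wait, sketched here as an alternative to the timer reuse above rather than as the article's method, is context.WithTimeout: its cancel function releases the underlying timer as soon as the wait finishes. receiveWithin and demo are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// receiveWithin waits for one value from ch, giving up after d or when the
// parent context is cancelled, whichever comes first.
func receiveWithin(parent context.Context, ch <-chan int, d time.Duration) (int, error) {
	ctx, cancel := context.WithTimeout(parent, d)
	defer cancel() // releases the timer as soon as this function returns
	select {
	case v := <-ch:
		return v, nil
	case <-ctx.Done():
		return 0, ctx.Err()
	}
}

// demo receives from a channel that already holds a value, then from one
// that never delivers, and returns the value and the timeout error.
func demo() (int, error) {
	ready := make(chan int, 1)
	ready <- 7
	v, _ := receiveWithin(context.Background(), ready, time.Second)

	empty := make(chan int)
	_, err := receiveWithin(context.Background(), empty, 20*time.Millisecond)
	return v, err
}

func main() {
	v, err := demo()
	fmt.Println(v, err)
}
```

Because the defer lives in a function that is called once per item, each timer is released before the next iteration starts, on any Go version.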
Fix 6: Use sync.WaitGroup to Track and Wait for Goroutines
sync.WaitGroup ensures all goroutines finish before the parent function returns:
// LEAKY: goroutines continue after function returns
func processAll(items []Item) {
	for _, item := range items {
		go processItem(item) // Fire and forget: goroutines outlive the function
	}
	// Function returns immediately and the goroutines are orphaned
}
// FIXED: wait for all goroutines to finish
func processAll(ctx context.Context, items []Item) error {
	var wg sync.WaitGroup
	errCh := make(chan error, len(items)) // Buffered: goroutines don't block on send
	for _, item := range items {
		wg.Add(1)
		go func(item Item) {
			defer wg.Done()
			if err := processItem(ctx, item); err != nil {
				errCh <- err
			}
		}(item)
	}
	// Wait for all goroutines to finish
	wg.Wait()
	close(errCh)
	// Collect errors
	var errs []error
	for err := range errCh {
		errs = append(errs, err)
	}
	if len(errs) > 0 {
		return errors.Join(errs...)
	}
	return nil
}
With errgroup for cleaner error handling:
import "golang.org/x/sync/errgroup"
func processAll(ctx context.Context, items []Item) error {
	g, ctx := errgroup.WithContext(ctx)
	for _, item := range items {
		item := item // Capture loop variable (needed before Go 1.22)
		g.Go(func() error {
			return processItem(ctx, item)
		})
	}
	return g.Wait() // Waits for all goroutines; returns first non-nil error
}
errgroup.WithContext cancels the context when any goroutine returns an error, signalling all other goroutines to stop and preventing the leak when one goroutine fails.
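The cancel-on-first-error behavior can be approximated with the standard library alone, which makes the mechanism easier to see. This is a simplified sketch, not errgroup's actual implementation; runAll and demo are hypothetical names.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// runAll runs every task in its own goroutine, cancels the shared context on
// the first error, waits for all tasks to return, and reports that error.
func runAll(ctx context.Context, tasks ...func(context.Context) error) error {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()
	var (
		wg       sync.WaitGroup
		once     sync.Once
		firstErr error
	)
	for _, task := range tasks {
		wg.Add(1)
		go func(task func(context.Context) error) {
			defer wg.Done()
			if err := task(ctx); err != nil {
				once.Do(func() {
					firstErr = err
					cancel() // tell the sibling tasks to stop
				})
			}
		}(task)
	}
	wg.Wait() // no task outlives this function
	return firstErr
}

// demo runs one failing task next to two tasks that only return once the
// context is cancelled.
func demo() error {
	fail := func(ctx context.Context) error { return errors.New("boom") }
	waitForCancel := func(ctx context.Context) error {
		<-ctx.Done() // exits only because the failing task cancelled the context
		return ctx.Err()
	}
	return runAll(context.Background(), waitForCancel, fail, waitForCancel)
}

func main() {
	fmt.Println(demo())
}
```

Without the cancel call, the two waiting tasks would block forever and wg.Wait would never return, which is the leak the shared context prevents.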
Fix 7: Worker Pool Pattern to Bound Goroutine Count
Instead of spawning one goroutine per task (unbounded growth), use a fixed-size worker pool:
func processWithPool(ctx context.Context, items []Item, workerCount int) error {
	jobs := make(chan Item, len(items))     // Buffered so the producer below never blocks
	results := make(chan error, len(items)) // Buffered so workers never block on send
	// Start fixed number of workers
	var wg sync.WaitGroup
	for i := 0; i < workerCount; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for item := range jobs { // Workers exit when jobs channel is closed
				select {
				case <-ctx.Done():
					return
				default:
					results <- processItem(ctx, item)
				}
			}
		}()
	}
	// Send all jobs
	for _, item := range items {
		jobs <- item
	}
	close(jobs) // Signal workers there are no more jobs
	// Wait for workers to finish, then close results
	go func() {
		wg.Wait()
		close(results)
	}()
	// Collect results
	var errs []error
	for err := range results {
		if err != nil {
			errs = append(errs, err)
		}
	}
	if err := ctx.Err(); err != nil {
		errs = append(errs, err) // Some items may have been skipped after cancellation
	}
	if len(errs) > 0 {
		return errors.Join(errs...)
	}
	return nil
}
// Usage
err := processWithPool(ctx, items, runtime.NumCPU())
Still Not Working?
Check for goroutines blocked on mutex: a goroutine waiting on a locked sync.Mutex is harder to spot than a blocked channel. In a goroutine dump it shows up in the sync.Mutex.Lock state (semacquire on older Go versions). pprof's mutex profile shows where lock contention occurs, but it stays empty unless the program enables it with runtime.SetMutexProfileFraction:
curl 'http://localhost:6060/debug/pprof/mutex?debug=1'
Check for goroutines in syscall state: goroutines making blocking system calls (DNS resolution, file I/O without context) can block indefinitely:
curl 'http://localhost:6060/debug/pprof/goroutine?debug=2' | grep -A 5 "syscall"
Use context-aware versions of blocking operations: net.DefaultResolver.LookupHost(ctx, ...) instead of net.LookupHost(...).
Long-lived HTTP connections — http.Client connections stay open in the pool. If the pool grows unboundedly, set transport limits:
transport := &http.Transport{
	MaxIdleConns:        100,
	MaxIdleConnsPerHost: 10,
	IdleConnTimeout:     90 * time.Second,
}
client := &http.Client{Transport: transport}
Goroutines blocked on DNS resolution: on Linux, pure-Go DNS resolution (GODEBUG=netdns=go) uses goroutines for lookups. If /etc/resolv.conf points to a slow or unreachable DNS server, lookup goroutines accumulate. Switch to cgo resolver (GODEBUG=netdns=cgo) or fix the DNS server. On macOS, the cgo resolver is the default because the system DNS resolution path requires it.
Leaked goroutines inside third-party libraries — libraries that start background goroutines (gRPC health checkers, database connection pool managers, Kafka consumers) may leak if you do not call their Close() or Stop() method. Use defer client.Close() immediately after creation and verify with goleak in integration tests.
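The shape of the problem is the same in most such libraries: a constructor starts a background goroutine, and only Close() tells it to stop. The sketch below uses a made-up poller type (not any real library's API) to show why skipping Close() leaks and what a well-behaved Close() does.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// poller stands in for a client that does periodic background work.
type poller struct {
	stop chan struct{} // closed by Close to request shutdown
	done chan struct{} // closed by the goroutine when it has exited
	once sync.Once
}

// newPoller starts the background goroutine. Callers must call Close.
func newPoller(interval time.Duration) *poller {
	p := &poller{stop: make(chan struct{}), done: make(chan struct{})}
	go func() {
		defer close(p.done)
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-p.stop:
				return
			case <-ticker.C:
				// periodic work (health check, refresh, ...) would go here
			}
		}
	}()
	return p
}

// Close stops the background goroutine and waits until it has exited.
// Calling it more than once is safe.
func (p *poller) Close() {
	p.once.Do(func() { close(p.stop) })
	<-p.done
}

// closedCleanly creates a poller, closes it twice, and reports whether the
// background goroutine has signalled its exit.
func closedCleanly() bool {
	p := newPoller(time.Millisecond)
	time.Sleep(5 * time.Millisecond)
	p.Close()
	p.Close() // second call is a no-op
	select {
	case <-p.done:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println("closed cleanly:", closedCleanly())
}
```

When a library's Close() blocks until its goroutines are gone, as this one does, a leak check that runs right after the deferred Close() sees a clean state.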
For related Go issues, see Fix: Go Context Deadline Exceeded, Fix: Go Channel Deadlock, Fix: Go Nil Pointer Dereference, and Fix: Go Map Concurrent Access.
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.