Fix: GitHub Actions Job Timeout — Workflow Cancelled or Stuck After 6 Hours
Quick Answer
How to fix GitHub Actions timeout issues — job-level and step-level timeouts, stuck processes, self-hosted runner timeouts, debugging hanging jobs, and timeout best practices.
The Problem
A GitHub Actions workflow is cancelled with a timeout error:
Error: The operation was canceled.
The job running on runner GitHub Actions X has exceeded the maximum execution time of 360 minutes.
Or a specific step hangs indefinitely without producing output:
Run npm test
npm test
shell: /usr/bin/bash -e {0}
... (no output for 45 minutes, then cancelled)
Or a self-hosted runner job times out much sooner than expected:
Error: The runner has received a shutdown signal. This can happen when the runner service is stopped, or a manually started runner is canceled.
Or a workflow that previously finished in 5 minutes now takes hours.
Why This Happens
GitHub Actions combines a hard execution limit with several common ways for a job to get stuck. The defaults are generous, generous enough that a single stuck job can burn hundreds of billable minutes before anyone notices.
The headline limit is the 360-minute (6-hour) maximum for a job on GitHub-hosted runners, which is also the default value of timeout-minutes. Without an explicit timeout-minutes, a hung job runs until that ceiling is reached, which is rarely what you want. A step-level timeout-minutes cannot keep a step alive past the job's limit, because the job clock keeps running, so set the job timeout to your worst-case successful run plus a small margin and use step timeouts to fail earlier on steps you know can hang.
The second common cause is interactive prompts in tools that were never designed for CI: apt-get install without -y, npm init, pip uninstall without -y, aws configure, gh auth login, or git push against an HTTPS remote with no stored credentials. In a non-TTY environment, any of these can sit waiting for input that never arrives. The process does not crash, does not time out on its own, and may produce no output, so GitHub's runner only notices when the job-level timeout fires.
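A minimal sketch of a job that makes the usual prompt-prone tools fail fast instead of waiting. GIT_TERMINAL_PROMPT, PIP_NO_INPUT (pip's environment-variable form of --no-input), TF_INPUT, and DEBIAN_FRONTEND are real switches for the tools named in the comments; the tree package and the 10-minute limit are placeholders to adapt.

```yaml
# Sketch: make prompt-prone tools error out instead of waiting on stdin.
env:
  GIT_TERMINAL_PROMPT: "0"   # git: fail instead of asking for credentials
  PIP_NO_INPUT: "1"          # pip: same effect as --no-input
  TF_INPUT: "false"          # Terraform: same effect as -input=false

jobs:
  setup:
    runs-on: ubuntu-latest
    timeout-minutes: 10        # backstop in case something still blocks
    steps:
      - name: Install system packages without prompts
        # -y answers apt's confirmation, DEBIAN_FRONTEND silences debconf dialogs,
        # and </dev/null makes any other read see end-of-file instead of waiting
        run: |
          sudo apt-get update
          sudo DEBIAN_FRONTEND=noninteractive apt-get install -y tree < /dev/null
```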
The third is test runners that keep the process alive. Jest and Mocha (v4 and later) do not call process.exit() when the run finishes; they wait for the Node.js event loop to drain, and Vitest and pytest can hang the same way when workers or non-daemon threads are left running. An unclosed database pool, a forgotten setInterval, a websocket subscription, an OpenTelemetry exporter, or a worker thread that never resolves will keep the process alive indefinitely. The tests appear to pass (you see the green output), and then the runner sits idle until the timeout fires.
The fourth is deadlock between dependent jobs. A needs: chain that loops back on itself is rejected at parse time, but more subtle cases — job A waiting for an external resource that job B is supposed to create, but job B was filtered out by an if: condition — pass validation and then hang forever on the wait.
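One way to keep a cross-job wait from turning into a silent six-hour hang is to give the wait its own deadline. In this sketch, READY_URL, the health-check endpoint, and the 300-second budget are hypothetical; the point is that the step fails with a clear error when the resource never appears, for example because the job that creates it was skipped by its if: condition.

```yaml
# Sketch: bounded wait for a resource another job is expected to create.
# READY_URL is a hypothetical endpoint; substitute whatever you wait on.
- name: Wait for staging environment (max 5 minutes)
  timeout-minutes: 6   # backstop in case the loop itself misbehaves
  env:
    READY_URL: https://staging.example.com/healthz
  run: |
    deadline=$(( $(date +%s) + 300 ))
    until curl --fail --silent --max-time 10 "$READY_URL" > /dev/null; do
      if [ "$(date +%s)" -ge "$deadline" ]; then
        echo "::error::$READY_URL not ready after 300s; did the job that creates it run?"
        exit 1
      fi
      echo "Waiting for $READY_URL ..."
      sleep 10
    done
```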
Finally, self-hosted runners introduce their own failure modes. The runner process can crash, the underlying VM can be evicted, the network can partition. GitHub’s queue re-dispatches the job to the next available runner, and the timeout clock typically restarts, so an unstable fleet can cycle through several runners before any single job actually completes or fails.
Version History: Timeout Controls in GitHub Actions
The way you express timeouts in GitHub Actions has evolved across the platform’s lifetime, and the defaults you’ll see in legacy workflows differ from what’s documented today.
- General availability (November 2019): when GitHub Actions launched, the only timeout knob was the job-level timeout-minutes field. The default ceiling for hosted runners was already 360 minutes, with step-level timeouts inheriting that bound. Self-hosted runners had no built-in upper limit beyond what the runner process enforced.
- 2020, step-level timeout-minutes: GitHub added per-step timeouts so a single slow step (database seed, integration suite) could be bounded without lowering the job ceiling. This was the first knob that let you fail fast on one stuck step without tightening the limit for the whole job.
- April 2021, jobs.<job_id>.concurrency and cancel-in-progress: the concurrency block landed, allowing you to declare a logical group and cancel previous in-flight runs when a new one starts. This is the cleanest way to avoid stacking timeouts: a new push to a PR cancels that PR's previous, possibly stuck, run instead of letting both burn six hours.
- September 2021, workflow-level concurrency: the same primitive was promoted to the top of the workflow file, so you don't need to repeat it on every job.
- 2022, reusable workflows (workflow_call): the caller and the called workflow are timed separately. A job that calls a reusable workflow with uses: does not accept timeout-minutes, so the limits that apply are the timeout-minutes values on the jobs inside the called workflow; nothing on the caller side cascades into the callee.
- 2023, runner re-queue policy clarification: GitHub documented that when a self-hosted runner crashes mid-job, the queue retries on the next available runner, and the timeout clock restarts from zero on the new attempt. Combined with an unstable fleet, this can produce wall-clock durations two to three times the configured timeout.
- 2024, larger runners and ARM64 hosted runners: none of the timeout defaults changed, but larger machines made it more obvious when a workflow was hanging on I/O versus genuinely slow. ARM64 runners also exposed compatibility issues that surface as silent hangs (missing arm64 binaries, falling back to QEMU emulation).
- 2024-2025, step_summary and step-level continue-on-error handling: failures from steps that were cancelled by timeout now appear distinctly in the run summary, making it easier to tell "step ran to completion and failed" from "step was killed by the timeout."
The 360-minute default has been stable since launch and is not configurable upward on hosted runners. If you genuinely need longer than six hours, the only option is self-hosted runners.
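A minimal sketch of such a job, assuming a self-hosted runner registered with the linux and x64 labels; the 12-hour value and the benchmark script are illustrative. Self-hosted jobs have a ceiling of their own (GitHub documents 5 days of execution time per job), so an explicit timeout-minutes still matters there.

```yaml
jobs:
  long-benchmark:
    # Hosted runners cap jobs at 360 minutes; self-hosted runners allow longer jobs
    runs-on: [self-hosted, linux, x64]
    timeout-minutes: 720            # 12 hours, still well below the platform limit
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/run-benchmarks.sh   # hypothetical long-running script
```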
Fix 1: Set Explicit Timeouts
Set timeout-minutes at the job or step level to fail fast instead of hanging:
jobs:
  build:
    runs-on: ubuntu-latest
    timeout-minutes: 15  # Job-level: cancel if not done in 15 minutes
    steps:
      - uses: actions/checkout@v4
      - name: Install dependencies
        timeout-minutes: 5  # Step-level: fail this step after 5 minutes
        run: npm ci
      - name: Run tests
        timeout-minutes: 10
        run: npm test
      - name: Deploy
        timeout-minutes: 5
        run: ./deploy.sh
Recommended timeout strategy:
# Set the job timeout comfortably above the slowest normal run (roughly 1.5-2x)
# Set step timeouts for known slow steps
# Fail fast: even a 10-minute timeout on a 3-minute job beats the 360-minute default
# (runs-on and steps omitted for brevity)
jobs:
  test:
    timeout-minutes: 20  # Normally finishes in 8-12 minutes
  deploy:
    needs: test
    timeout-minutes: 10  # Deployment should take < 5 minutes
Fix 2: Fix Test Suites That Don’t Exit
The most common cause of hanging CI jobs is a test runner that doesn’t exit after tests complete:
# Jest: use --forceExit as a fallback (but fix the root cause)
- name: Run tests
  run: npx jest --forceExit
# Or set a timeout on the test run itself
- name: Run tests
  run: timeout 300 npm test  # Linux: kill after 300 seconds
The root-cause fix is to close open handles:
// jest.config.js
module.exports = {
  // Report the handles that keep the process alive so you can fix them properly
  detectOpenHandles: true,
  // Temporary escape hatch while you fix them (forces exit after the run):
  // forceExit: true,
};
// Common open handle fixes in test files
// (db, server, myTimer and subscription stand in for whatever your tests create)
afterAll(async () => {
  await db.close();                                       // Close DB connection/pool
  await new Promise((resolve) => server.close(resolve));  // http.Server#close takes a callback, not a promise
  clearTimeout(myTimer);                                  // Clear pending timers
  subscription.unsubscribe();                             // Unsubscribe from event streams
});
Mocha:
# --exit forces Mocha to quit after tests, even with open handles
# (quote the glob so Mocha expands it; plain bash treats ** like *)
npx mocha --exit 'tests/**/*.test.js'
# --timeout sets the per-test timeout in milliseconds
npx mocha --timeout 10000 'tests/**/*.test.js'
Fix 3: Prevent Interactive Prompts in CI
Tools that ask for confirmation will hang forever in CI. Always pass non-interactive flags:
steps:
  - name: Install Python packages
    run: pip install --no-input -r requirements.txt
    # --no-input: never prompt for confirmation
  - name: Install npm packages
    run: npm ci
    # npm ci never prompts and installs exactly what package-lock.json specifies
  - name: Run database migrations
    run: |
      # Keep only the line for your stack
      # Django: --no-input suppresses interactive prompts
      python manage.py migrate --no-input
      # Rails
      RAILS_ENV=production bundle exec rails db:migrate
      # Flyway
      flyway migrate -url="$DB_URL" -user="$DB_USER" -password="$DB_PASS"
  - name: Deploy with Terraform
    env:
      TF_INPUT: "false"  # Disable all Terraform interactive prompts
    run: terraform apply -auto-approve
  - name: Docker build
    run: |
      # docker build never prompts; --no-cache only rules out stale layers,
      # at the cost of a slower build
      docker build --no-cache -t myapp .
Print a heartbeat so a silent step still shows that it is alive, and when its real output stopped:
- name: Run tests with heartbeat
  timeout-minutes: 20
  run: |
    # Run tests in the background and print a timestamp every 30s
    npm test &
    TEST_PID=$!
    while kill -0 "$TEST_PID" 2>/dev/null; do
      echo "Still running at $(date)..."
      sleep 30
    done
    # wait returns the test exit code, so a failing suite still fails the step
    wait "$TEST_PID"
Fix 4: Debug Hanging Jobs with tmate
Connect to a running GitHub Actions runner to debug interactively:
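- name: Setup tmate session for debugging
  uses: mxschmitt/action-tmate@v3
  # Only open a session if an earlier step failed (a step-level timeout counts;
  # a job-level timeout cancels the job, so this step is skipped then)
  if: ${{ failure() }}
  with:
    limit-access-to-actor: true  # Only the user who triggered the run can connect, using their GitHub SSH keys
  timeout-minutes: 15
Or add a conditional debug step: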
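- name: Debug
  uses: mxschmitt/action-tmate@v3
  # inputs.debug_enabled must be declared under on.workflow_dispatch.inputs, e.g.
  #   debug_enabled: { type: boolean, description: "Open a tmate session", default: false }
  if: ${{ github.event_name == 'workflow_dispatch' && inputs.debug_enabled }}
  timeout-minutes: 30
Log more context before the hanging step: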
- name: Pre-test diagnostics
  run: |
    echo "=== System Info ==="
    free -h
    df -h
    ps aux | head -20
    echo "=== Network ==="
    netstat -tlnp 2>/dev/null || ss -tlnp
    echo "=== Environment ==="
    # -i also hides lowercase names such as api_token
    env | grep -v -i -E "(TOKEN|SECRET|PASSWORD|KEY)"
Fix 5: Configure Self-Hosted Runner Timeouts
Self-hosted runners have different timeout behavior and common failure modes:
# Increase timeout for self-hosted runners (they're often slower)
jobs:
  build:
    runs-on: self-hosted
    timeout-minutes: 60  # Self-hosted runners can be slower
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          # Shallow clone (also actions/checkout's default) keeps checkout fast
          fetch-depth: 1
Runner configuration for reliability:
# Run the runner as a service so it restarts automatically
# (instead of running it manually)
# On Linux with systemd
cd ~/actions-runner
sudo ./svc.sh install
sudo ./svc.sh start
# Check runner status
sudo ./svc.sh status
# The runner will restart automatically if it crashes
Clean up stale files between runs on self-hosted runners:
jobs:
  build:
    runs-on: self-hosted
    steps:
      - name: Clean workspace
        run: |
          # Remove files from previous runs that may cause issues
          git clean -fdx || true
          docker system prune -f || true
Fix 6: Optimize Slow Workflows
If the workflow finishes but takes too long, optimize before hitting timeouts:
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Cache dependencies to avoid re-downloading
      - name: Cache node modules
        uses: actions/cache@v4
        with:
          path: ~/.npm
          key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
          restore-keys: |
            ${{ runner.os }}-node-
      - run: npm ci
      # Jest already runs test files in parallel; --maxWorkers caps the worker count
      - name: Run tests
        run: npx jest --maxWorkers=4
  # Alternatively, split long test suites across parallel jobs
  # (each job needs its own checkout and install)
  test-unit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm test -- --testPathPattern="unit"
  test-integration:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm test -- --testPathPattern="integration"
Use conditional steps to skip expensive work when it isn't needed:
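- name: Build Docker image
  # Only build on main branch pushes; skip for PRs
  if: github.ref == 'refs/heads/main'
  run: docker build -t myapp .
- name: Run E2E tests
  # Only run if a file under src/ changed in the pushed head commit.
  # head_commit.modified is an array of paths, so contains() on the raw array
  # needs an exact element match; join() turns it into one searchable string.
  # Only the last commit is checked, and new files are listed in head_commit.added.
  if: contains(join(github.event.head_commit.modified, ','), 'src/')
  run: npx playwright test
Still Not Working?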
Job is queued but never starts — check if you’ve hit your concurrent job limit. Free GitHub accounts are limited to 20 concurrent jobs. If all runners are busy, new jobs wait in the queue. Check the Actions tab for queued jobs.
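The operation was canceled immediately: something else in the run usually cancelled the job. In a matrix, strategy.fail-fast (on by default) cancels the remaining legs as soon as one leg fails, and a concurrency group with cancel-in-progress: true cancels a run when a newer one starts. A missing secret or a failed needs: dependency looks different: a missing secret expands to an empty string and surfaces as an ordinary step failure, while a job whose dependency failed is marked as skipped rather than cancelled. Check the other jobs in the same run first.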
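A sketch of a matrix job with fail-fast turned off, so each leg runs to its own conclusion and reports its own result instead of being cancelled when another leg fails. The OS list and npm commands are illustrative.

```yaml
jobs:
  test:
    runs-on: ${{ matrix.os }}
    timeout-minutes: 20
    strategy:
      # Default is true: the first failing leg cancels the others,
      # which then report "The operation was canceled."
      fail-fast: false
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm test
```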
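Step timeout doesn’t stop the process cleanly: when a step times out, the runner stops it much as it stops a cancelled run. It sends SIGINT, then SIGTERM if the process is still alive about 7.5 seconds later, and force-kills the process tree about 2.5 seconds after that. A process that traps those signals can be cut off mid-cleanup, and a background process that detached from the step's process tree can outlive the step. To control the shutdown yourself, wrap the command in timeout with --kill-after:
# Send SIGTERM at 300s, then SIGKILL 30s later if still running
timeout --kill-after=30s 300s npm test
Timeouts in reusable workflows (workflow_call): limits do not cascade from the caller into the called workflow, and a job that calls a reusable workflow with uses: does not accept timeout-minutes at all. The limits that apply are the timeout-minutes values on the individual jobs inside the called workflow, so set one on every job there.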
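A sketch of where the limit goes, assuming a called workflow at .github/workflows/integration.yml (the file name and the test:integration script are hypothetical). The timeout lives on the job inside the called workflow; the caller, shown only as comments, references it without a timeout of its own.

```yaml
# .github/workflows/integration.yml (called workflow; name is illustrative)
on:
  workflow_call:

jobs:
  integration:
    runs-on: ubuntu-latest
    timeout-minutes: 30                 # the limit that applies to this job
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm run test:integration   # hypothetical script

# .github/workflows/ci.yml (caller), shown as comments:
# jobs:
#   integration:
#     uses: ./.github/workflows/integration.yml
#     # no timeout-minutes here; the called workflow's jobs carry their own
```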
Matrix job timing out only on one OS — a matrix run on Linux finishes in 4 minutes while Windows hits the 10-minute timeout. The Windows runner has slower disk I/O and npm extraction can take 5-10x as long. Either bump the timeout for the Windows leg specifically using matrix.include, or cache ~/.npm and ~\AppData\Local\npm-cache per OS. The same effect appears with pip on Windows because of file-locking on antivirus-scanned wheels.
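A sketch of both options together: the Windows leg gets its own limit through matrix.include, and actions/setup-node's cache: npm option restores the npm cache with a key that includes the OS. The extra matrix key name (timeout) and the 10/25-minute values are assumptions to tune for your suite.

```yaml
jobs:
  test:
    runs-on: ${{ matrix.os }}
    # 10 minutes by default; a matrix entry can override it with its own value
    timeout-minutes: ${{ matrix.timeout || 10 }}
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest]
        include:
          # Adds timeout: 25 to the existing windows-latest combination
          - os: windows-latest
            timeout: 25
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm            # restores the npm cache, keyed per OS
      - run: npm ci
      - run: npm test
```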
Concurrency cancellations re-using a stale timeout — when concurrency.cancel-in-progress: true cancels an older run and a new one starts immediately, the new run gets a fresh timeout-minutes clock. If your job depends on cleanup from a previous run (build cache priming, integration test database), the new run can time out waiting for that cleanup to finish. Move cleanup into a separate job that runs unconditionally with if: always().
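A sketch of that layout, assuming the setup and cleanup can be expressed as scripts (both script paths are hypothetical). The workflow-level concurrency group cancels superseded runs, and the cleanup job's if: always() keeps it running even when the job it follows failed or was cancelled.

```yaml
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  integration:
    runs-on: ubuntu-latest
    timeout-minutes: 20
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/integration-tests.sh    # hypothetical script

  cleanup:
    needs: integration
    if: always()               # run even if integration failed or was cancelled
    runs-on: ubuntu-latest
    timeout-minutes: 5         # keep cleanup from becoming the next hang
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/drop-test-database.sh   # hypothetical script
```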
Composite action steps inherit timeouts incorrectly: timeout-minutes set on a composite action's outer uses: step does not propagate into the action's internal steps. Each step inside the composite action runs without a timeout unless the action bounds it itself, for example by wrapping its run: commands in timeout. If you don't own the action, you cannot wrap a uses: step in a shell command, so keep a tight job-level timeout-minutes as the backstop and set cancel-in-progress: true on the workflow so a newer run replaces a stuck one.
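For a composite action you own, one option is to bound each run: command with timeout inside the action itself, so the limit travels with the action wherever it is used. In this sketch the action name and the seed-db.sh script are hypothetical; github.action_path points at the action's own directory.

```yaml
# action.yml of a composite action (sketch; seed-db.sh is hypothetical)
name: Seed test database
description: Seed the test database with a hard time limit
runs:
  using: composite
  steps:
    - name: Seed
      shell: bash
      # SIGTERM after 5 minutes, SIGKILL 30 seconds later if still running.
      # timeout exits non-zero (124, or 137 if SIGKILL was needed), failing the step.
      run: timeout --kill-after=30s 300s "${{ github.action_path }}/seed-db.sh"
```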
For related GitHub Actions issues, see Fix: GitHub Actions Process Completed Exit Code 1, Fix: GitHub Actions Cache Not Working, Fix: GitHub Actions Artifacts Not Working, and Fix: GitHub Actions Runner Failed.
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.