Fix: GitHub Actions Job Timeout — Workflow Cancelled or Stuck After 6 Hours

FixDevs

Quick Answer

How to fix GitHub Actions timeout issues — job-level and step-level timeouts, stuck processes, self-hosted runner timeouts, debugging hanging jobs, and timeout best practices.

The Problem

A GitHub Actions workflow is cancelled with a timeout error:

Error: The operation was canceled.
The job running on runner GitHub Actions X has exceeded the maximum execution time of 360 minutes.

Or a specific step hangs indefinitely without producing output:

Run npm test
  npm test
  shell: /usr/bin/bash -e {0}
... (no output for 45 minutes, then cancelled)

Or a self-hosted runner job times out much sooner than expected:

Error: The runner has received a shutdown signal. This can happen when the runner service is stopped, or a manually started runner is canceled.

Or a workflow that previously finished in 5 minutes now takes hours.

Why This Happens

GitHub Actions enforces hard limits, and jobs get stuck for a handful of recurring reasons. The platform’s defaults are generous, enough that a stuck job can eat through a paid plan’s monthly minute budget before anyone notices.

The headline limit is the 360-minute (6-hour) job timeout on hosted runners. Without an explicit timeout-minutes, a hung job runs until that ceiling is reached, which is rarely what you want. Step timeouts are inherited from the job and there is no way to set a higher per-step timeout than the surrounding job, so you should always set the job timeout to a value that bounds your worst-case successful run plus a small margin.

The second common cause is interactive prompts in tools that were never designed for CI. apt-get install without -y, npm init, pip install with conflicting versions, aws configure, gh auth login, git push against an HTTPS remote with no stored credentials — every one of these will block on stdin in a non-TTY environment. The process will not crash, will not time out on its own, and will produce no output. GitHub’s runner only notices when the job-level timeout fires.

The third is test runners that hold the event loop open. Jest, Mocha, and Vitest default to waiting for the event loop to drain rather than calling process.exit(), and pytest with --asyncio-mode=auto can stall in a similar way on a task that never finishes. An unclosed database pool, a forgotten setInterval, a websocket subscription, an OpenTelemetry exporter, or a worker thread that never resolves will keep the process alive indefinitely. Tests appear to “pass” (you see the green output) and then the runner sits idle until the timeout fires.

The fourth is deadlock between dependent jobs. A needs: chain that loops back on itself is rejected at parse time, but more subtle cases — job A waiting for an external resource that job B is supposed to create, but job B was filtered out by an if: condition — pass validation and then hang forever on the wait.
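One way to keep a wait on an external resource from hanging is to give the wait itself a fixed attempt budget. The following is a minimal sketch, not a definitive implementation: the helper name wait_for and its argument order are hypothetical, and the check command is whatever probe fits your resource.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a check command until it succeeds or the
# attempt budget runs out, so a missing resource fails the step quickly
# instead of blocking until the job-level timeout.
# Usage: wait_for <max_attempts> <sleep_seconds> <command...>
wait_for() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    if "$@"; then
      return 0
    fi
    echo "attempt ${attempt}/${max_attempts} failed: $*" >&2
    if ((attempt < max_attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}
```

A step could call it as `wait_for 30 10 curl -fsS "$HEALTH_URL"`, where HEALTH_URL is a placeholder for your own endpoint. That bounds the wait to about five minutes and exits non-zero if the resource never appears.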

Finally, self-hosted runners introduce their own failure modes. The runner process can crash, the underlying VM can be evicted, the network can partition. GitHub’s queue re-dispatches the job to the next available runner, and the timeout clock typically restarts, so an unstable fleet can cycle through several runners before any single job actually completes or fails.

Version History: Timeout Controls in GitHub Actions

The way you express timeouts in GitHub Actions has evolved across the platform’s lifetime, and the defaults you’ll see in legacy workflows differ from what’s documented today.

  • General availability (November 2019) — when GitHub Actions launched, the only timeout knob was the job-level timeout-minutes field. The default ceiling for hosted runners was already 360 minutes, with step-level timeouts inheriting that bound. Self-hosted runners had no built-in upper limit beyond what the runner process enforced.
  • 2020 — step-level timeout-minutes — GitHub added per-step timeouts so a single slow step (database seed, integration suite) could be bounded without lowering the job ceiling. This was the first knob that let you “fail fast on a stuck step, keep going on the rest of the job.”
  • April 2021 — jobs.<job_id>.concurrency and cancel-in-progress — the concurrency block landed, allowing you to declare a logical group and cancel previous in-flight runs when a new one starts. This is the cleanest way to avoid stacking timeouts: a PR push cancels the previous PR’s stuck run instead of letting both burn six hours.
  • September 2021 — workflow-level concurrency — the same primitive was promoted to the top of the workflow file, so you don’t need to repeat it on every job.
  • 2022 — reusable workflows (workflow_call) — a job that calls a reusable workflow with uses: does not accept timeout-minutes, so the caller cannot impose its own deadline on the callee. The timeouts that apply are the ones set on the jobs and steps inside the called workflow: a callee job with timeout-minutes: 60 keeps running until its own clock fires, then the caller picks up the result.
  • 2023 — runner re-queue policy clarification — GitHub documented that when a self-hosted runner crashes mid-job, the queue retries on the next available runner, and the timeout clock restarts from zero on the new attempt. Combined with an unstable fleet, this can produce wall-clock durations that exceed the configured timeout by a factor of 2-3x.
  • 2024 — larger runners and ARM64 hosted runners — none of the timeout defaults changed, but larger machines made it more obvious when a workflow was hanging on I/O versus genuinely slow. ARM64 runners also exposed compatibility issues that surface as silent hangs (missing arm64 binaries, falling back to QEMU emulation).
  • 2024-2025 — step_summary and step-level continue-on-error handling — failures from steps that were cancelled by timeout now appear distinctly in the run summary, making it easier to tell “step ran to completion and failed” from “step was killed by the timeout.”

The 360-minute default has been stable since launch and is not configurable upward on hosted runners. If you genuinely need longer than six hours, the only option is self-hosted runners.

Fix 1: Set Explicit Timeouts

Set timeout-minutes at the job or step level to fail fast instead of hanging:

jobs:
  build:
    runs-on: ubuntu-latest
    timeout-minutes: 15  # Job-level: cancel if not done in 15 minutes

    steps:
      - uses: actions/checkout@v4

      - name: Install dependencies
        timeout-minutes: 5   # Step-level: fail this step after 5 minutes
        run: npm ci

      - name: Run tests
        timeout-minutes: 10
        run: npm test

      - name: Deploy
        timeout-minutes: 5
        run: ./deploy.sh

Recommended timeout strategy:

# Set a job timeout that's 20-30% longer than the expected duration
# Set step timeouts for known slow steps
# Fail fast — a 10-minute timeout on a 3-minute job is reasonable

jobs:
  test:
    timeout-minutes: 20  # Normally finishes in 8-12 minutes

  deploy:
    timeout-minutes: 10  # Deployment should take < 5 minutes
    needs: test
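The "worst case plus a margin" rule can be turned into a small calculation. This is a sketch under stated assumptions: the helper name suggest_timeout is hypothetical, and the inputs (slowest successful run in minutes, margin in percent) are numbers you read off your own run history.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive a timeout-minutes value from the slowest
# successful run plus a percentage margin, rounded up to a whole minute.
# Usage: suggest_timeout <worst_case_minutes> <margin_percent>
suggest_timeout() {
  local worst=$1 margin=$2
  # Integer ceiling of worst * (100 + margin) / 100
  echo $(( (worst * (100 + margin) + 99) / 100 ))
}

# Slowest green run took 12 minutes, 25% margin: prints 15
suggest_timeout 12 25
```

Rounding up matters because timeout-minutes is expressed in whole minutes. A looser value such as the 20 used above is still fine if your run times vary a lot.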

Fix 2: Fix Test Suites That Don’t Exit

The most common cause of hanging CI jobs is a test runner that doesn’t exit after tests complete:

# Jest — use --forceExit as a fallback (but fix the root cause)
- name: Run tests
  run: npx jest --forceExit

# Or set a timeout on the test run itself
- name: Run tests
  run: timeout 300 npm test  # Linux: kill after 300 seconds

Root cause fix — close open handles:

// jest.config.js
module.exports = {
  // Detect open handles so you can fix them properly
  detectOpenHandles: true,

  // Or force exit if you can't fix all handles immediately
  forceExit: true,
};

// Common open handle fixes in test files
afterAll(async () => {
  await db.close();            // Close the DB connection after all tests
  // http.Server#close takes a callback, so wrap it in a Promise to await shutdown
  await new Promise((resolve) => server.close(resolve));
  clearTimeout(myTimer);       // Clear pending timers
  subscription.unsubscribe();  // Unsubscribe from event streams
});

Mocha:

# --exit forces Mocha to quit after tests, even with open handles
# Quote the glob so Mocha expands ** itself instead of the shell
npx mocha --exit 'tests/**/*.test.js'

# --timeout sets per-test timeout (milliseconds)
npx mocha --timeout 10000 'tests/**/*.test.js'

Fix 3: Prevent Interactive Prompts in CI

Tools that ask for confirmation will hang forever in CI. Always pass non-interactive flags:

steps:
  - name: Install Python packages
    run: pip install --no-input -r requirements.txt
    # --no-input: never prompt for confirmation

  - name: Install npm packages
    run: npm ci
    # npm ci is already non-interactive; npm install may prompt

  - name: Run database migrations
    run: |
      # Django — no interactive prompts
      python manage.py migrate --no-input

      # Rails
      RAILS_ENV=production bundle exec rails db:migrate

      # Flyway
      flyway migrate -url=$DB_URL -user=$DB_USER -password=$DB_PASS

  - name: Deploy with Terraform
    env:
      TF_INPUT: "false"  # Disable all Terraform interactive prompts
    run: terraform apply -auto-approve

  - name: Docker build
    run: |
      # docker build never prompts; --no-cache only forces a clean rebuild,
      # which is slower, so keep it for cases where a stale layer is suspected
      docker build --no-cache -t myapp .
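For tools that have no non-interactive flag, one option is to run them with stdin redirected from /dev/null, so any prompt reads end-of-file immediately instead of waiting. This is a minimal sketch: the helper name run_noninteractive is hypothetical, and whether a given tool exits or falls back to a default on end-of-file depends on the tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command with stdin closed so a prompt
# reads end-of-file immediately instead of waiting for input.
run_noninteractive() {
  "$@" </dev/null
}

# Demo: `read` stands in for a tool that prompts on stdin.
if run_noninteractive bash -c 'read -r answer'; then
  echo "got input"
else
  echo "prompt hit EOF and failed fast"
fi
```

In the demo, `read` fails as soon as it sees end-of-file, so the step ends in well under a second rather than waiting for the job timeout.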

Detect hanging jobs by checking for output:

- name: Run tests with heartbeat
  run: |
    # Run tests in background, print progress every 30s
    npm test &
    TEST_PID=$!
    while kill -0 $TEST_PID 2>/dev/null; do
      echo "Still running at $(date)..."
      sleep 30
    done
    wait $TEST_PID

Fix 4: Debug Hanging Jobs with tmate

Connect to a running GitHub Actions runner to debug interactively:

- name: Setup tmate session for debugging
  uses: mxschmitt/action-tmate@v3
  if: ${{ failure() }}  # Only open session on failure
  timeout-minutes: 15   # Step-level key, not an action input
  with:
    limit-access-to-actor: true  # Only the user who triggered the run can connect

Or add a conditional debug step:

- name: Debug
  uses: mxschmitt/action-tmate@v3
  if: ${{ github.event_name == 'workflow_dispatch' && inputs.debug_enabled }}
  timeout-minutes: 30

Log more context before the hanging step:

- name: Pre-test diagnostics
  run: |
    echo "=== System Info ==="
    free -h
    df -h
    ps aux | head -20

    echo "=== Network ==="
    netstat -tlnp 2>/dev/null || ss -tlnp

    echo "=== Environment ==="
    env | grep -v -E "(TOKEN|SECRET|PASSWORD|KEY)"

Fix 5: Configure Self-Hosted Runner Timeouts

Self-hosted runners have different timeout behavior and common failure modes:

# Increase timeout for self-hosted runners (they're often slower)
jobs:
  build:
    runs-on: self-hosted
    timeout-minutes: 60  # Self-hosted runners can be slower

    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          # Shallow clone for faster checkout on self-hosted
          # (fetch-depth: 1 is already the default; shown explicitly)
          fetch-depth: 1

Runner configuration for reliability:

# Run the runner as a service so it restarts automatically
# (instead of running it manually)

# On Linux with systemd
cd ~/actions-runner
sudo ./svc.sh install
sudo ./svc.sh start

# Check runner status
sudo ./svc.sh status

# The runner will restart automatically if it crashes

Clean up stale files between runs on self-hosted runners:

jobs:
  build:
    runs-on: self-hosted
    steps:
      - name: Clean workspace
        run: |
          # Remove files from previous runs that may cause issues
          git clean -fdx || true
          docker system prune -f || true

Fix 6: Optimize Slow Workflows

If the workflow finishes but takes too long, optimize before hitting timeouts:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Cache dependencies to avoid re-downloading
      - name: Cache node modules
        uses: actions/cache@v4
        with:
          path: ~/.npm
          key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
          restore-keys: |
            ${{ runner.os }}-node-

      - run: npm ci

      # Run tests in parallel
      - name: Run tests
        run: npx jest --maxWorkers=4

  # Split long test suites across parallel jobs
  # Each job starts on a fresh runner, so it needs its own checkout and install
  test-unit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm test -- --testPathPattern="unit"

  test-integration:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm test -- --testPathPattern="integration"

Conditional steps — skip expensive work when not needed:

- name: Build Docker image
  # Only build on main branch pushes — skip for PRs
  if: github.ref == 'refs/heads/main'
  run: docker build -t myapp .

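- name: Detect source changes
  # The push payload in Actions omits modified-file lists, so use a paths filter (third-party action)
  id: changes
  uses: dorny/paths-filter@v3
  with:
    filters: |
      src:
        - 'src/**'

- name: Run E2E tests
  # Only run if source files changed
  if: steps.changes.outputs.src == 'true'
  run: npx playwright test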

Still Not Working?

Job is queued but never starts — check if you’ve hit your concurrent job limit. Free GitHub accounts are limited to 20 concurrent jobs. If all runners are busy, new jobs wait in the queue. Check the Actions tab for queued jobs.

The operation was canceled immediately — a required secret or environment variable is missing, causing an early exit. Or a sibling matrix job failed and the default fail-fast behavior cancelled this one. A job whose needs: dependency failed is skipped rather than cancelled. Check the job that ran before.

Step timeout doesn’t stop the process cleanly — when a step times out, GitHub sends SIGTERM followed by SIGKILL. If the process catches SIGTERM but doesn’t exit, it gets killed after a grace period. Some processes spawn children that aren’t killed. Use timeout with --kill-after:

# Send SIGTERM at 300s, SIGKILL at 330s if still running
timeout --kill-after=30s 300s npm test
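When a step wraps its command in timeout, the exit code tells you whether the command failed on its own or was stopped at the deadline. The sketch below assumes GNU coreutils timeout, as shipped on the Ubuntu hosted runners. The wrapper name run_with_deadline and the 5-second kill grace period are hypothetical choices.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around GNU coreutils `timeout`: run a command with
# a deadline and report whether it finished, failed, or was stopped.
# Usage: run_with_deadline <seconds> <command...>
run_with_deadline() {
  local seconds=$1
  shift
  local status=0
  # SIGTERM at the deadline, SIGKILL 5 seconds later if still alive.
  timeout --kill-after=5s "${seconds}s" "$@" || status=$?
  case $status in
    0)   echo "finished" ;;
    124) echo "timed out" ;;
    137) echo "killed with SIGKILL" ;;
    *)   echo "failed with exit code $status" ;;
  esac
  return "$status"
}
```

For example, `run_with_deadline 300 npm test` returns 124 when the suite is still running at the deadline, which a later step can check to tell a hang from an ordinary test failure.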

Workflow dispatch with workflow_call timeout — timeout-minutes exists only at the job and step level; there is no workflow-level timeout. A job that calls a reusable workflow with uses: does not accept timeout-minutes either, so set timeouts on the jobs and steps inside the called workflow.

Matrix job timing out only on one OS — a matrix run on Linux finishes in 4 minutes while Windows hits the 10-minute timeout. The Windows runner has slower disk I/O and npm extraction can take 5-10x as long. Either bump the timeout for the Windows leg specifically using matrix.include, or cache ~/.npm and ~\AppData\Local\npm-cache per OS. The same effect appears with pip on Windows because of file-locking on antivirus-scanned wheels.

Concurrency cancellations re-using a stale timeout — when concurrency.cancel-in-progress: true cancels an older run and a new one starts immediately, the new run gets a fresh timeout-minutes clock. If your job depends on cleanup from a previous run (build cache priming, integration test database), the new run can time out waiting for that cleanup to finish. Move cleanup into a separate job that runs unconditionally with if: always().

Composite action steps inherit timeouts incorrectly — timeout-minutes set on a composite action’s outer uses: step bounds the step as a whole but does not propagate into the action’s internal steps. Each step inside the composite action runs without its own timeout unless you set it inside the action’s own YAML. If you don’t own the action, you cannot wrap a uses: step in a timeout shell command, because only run: steps execute through a shell; rely on the outer step timeout combined with cancel-in-progress: true on the workflow.

For related GitHub Actions issues, see Fix: GitHub Actions Process Completed Exit Code 1, Fix: GitHub Actions Cache Not Working, Fix: GitHub Actions Artifacts Not Working, and Fix: GitHub Actions Runner Failed.

FixDevs

Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.