
Fix: GitHub Actions Runner Failed to Start or Connect

FixDevs


Quick Answer

Fix GitHub Actions self-hosted runner failures including connection issues, version mismatches, and registration problems with step-by-step solutions.

The Error

You set up a self-hosted runner for GitHub Actions and see one of these messages:

Error: The self-hosted runner lost communication with the server.
Could not resolve host: github.com
Runner connect error: The HTTP request timed out after 00:01:00.

Or the runner appears offline in your repository’s Settings > Actions > Runners page, even though you believe it’s running.

Why This Happens

Self-hosted runners maintain a persistent long-poll HTTPS connection to GitHub’s job-dispatch service. The runner agent (Runner.Listener) keeps that socket open, waits for a job, leases it, then hands the work to a worker process (Runner.Worker). When the listener can’t reach GitHub, can’t authenticate, or can’t be matched to a queued job, the runner shows offline and your workflow sits in a “Waiting for a runner” state indefinitely. The error message you see is downstream of the failure — the root cause is almost always one of four categories: network reachability, runner agent state, label or group misrouting, or host resource exhaustion.

GitHub regularly updates the runner application and stops accepting connections from versions that fall outside the supported window. The agent has auto-update logic, but that update path itself depends on the runner being able to reach GitHub at the moment a new release ships. If your runner was offline when an update was pushed, it can be stuck on a version GitHub no longer accepts, which then prevents it from coming back online — a chicken-and-egg loop that requires a manual upgrade.

There’s also a class of failures that has nothing to do with the runner itself: the GitHub-side configuration can silently keep jobs from ever reaching the runner. Organization spending limits, restricted runner groups, missing labels, and disabled Actions at the repo or org level all produce the same surface symptom (“job pending, runner idle”) but require completely different fixes. The diagnostic timeline below walks through how to separate these cases in order of likelihood.

Diagnostic Timeline

Use this sequence the moment a self-hosted runner stops picking up jobs. Each step takes under a minute and rules out one root cause.

  • Minute 0 — Confirm the runner row in Settings > Actions > Runners. Green dot with “Idle” means the agent is connected and waiting for work; the issue is label/group routing or repo permissions, not the runner. Red dot with “Offline” means the agent itself can’t talk to GitHub — jump to network and version checks.
  • Minute 1 — Compare the workflow’s runs-on: value against the runner’s labels. Open the failing workflow file, note the label(s), then click into the runner row and compare. A typo such as runs-on: self-hosted-linux, when the runner actually carries the separate labels self-hosted and linux, targets a runner that doesn’t exist, so the job queues indefinitely.
  • Minute 2 — Check organization billing and runner groups. Org Settings > Billing > Plans and add-ons shows whether the account has hit its Actions spending limit or has a billing problem. Self-hosted runners don’t consume Actions minutes, but a billing lock on the account can still stop jobs from being dispatched. Org Settings > Actions > Runner groups shows whether the runner’s group is restricted to specific repositories.
  • Minute 3 — On the runner host, check the listener process. Run ps aux | grep Runner.Listener (or Get-Process Runner.Listener on Windows). If the process is missing, the service crashed; if it’s running but offline, the agent thinks it’s connected but GitHub disagrees — usually a version or token mismatch.
  • Minute 4 — Tail the diagnostic log. tail -50 _diag/Runner_*.log shows the exact handshake. A “401 Unauthorized” points at an expired or revoked registration; “Connect timeout” points at network or DNS; “Version not supported” points at an upgrade.
  • Minute 5 — Check disk and inode usage. df -h and df -i. A full disk silently kills the worker after job start, which looks identical to a connection drop in the UI.
  • Minute 6 — Restart the service interactively, not as a daemon. sudo ./svc.sh stop && ./run.sh. Running the agent in the foreground surfaces handshake errors that the systemd journal sometimes truncates.
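Minutes 3 to 5 can be bundled into one shell function. This is a sketch, assuming a Linux host with procps (for pgrep), that it runs from the runner’s install directory, and that the agent writes its logs to _diag/Runner_*.log as described above. The names runner_triage and latest_runner_log are hypothetical helpers, not part of the runner package.

```shell
# Hypothetical helpers; assumed to run on Linux from the runner install directory.

# Print the newest Runner_*.log under a runner directory (default: current dir).
latest_runner_log() {
  dir=${1:-.}
  # ls -t sorts newest first; the error for a missing glob match is discarded
  ls -t "$dir"/_diag/Runner_*.log 2>/dev/null | head -n 1
}

# Minutes 3 to 5: listener process, recent handshake log lines, disk and inodes.
runner_triage() {
  dir=${1:-.}
  if pgrep -f Runner.Listener >/dev/null 2>&1; then
    echo "listener: running"
  else
    echo "listener: NOT running (service crashed or was stopped)"
  fi

  log=$(latest_runner_log "$dir")
  if [ -n "$log" ]; then
    echo "last 50 lines of $log:"
    tail -n 50 "$log"
  else
    echo "no Runner_*.log found under $dir/_diag"
  fi

  # Block and inode usage of the filesystem that holds the runner directory
  df -h "$dir"
  df -i "$dir"
}
```

Source the file, then call runner_triage from the runner directory, or pass that directory as the first argument.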

If you reach minute 6 without a clear cause, you’re almost certainly looking at a corporate proxy doing TLS inspection, a DNS split-horizon issue, or an outbound firewall change that the runner host can’t see. Move to Fix 1.

Fix 1: Check Network Connectivity

The runner needs outbound HTTPS access to several GitHub domains. Test connectivity from the runner machine:

curl -v https://github.com
curl -v https://api.github.com
curl -v https://codeload.github.com
curl -v https://objects.githubusercontent.com

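Any HTTP status in the output means DNS, TCP, and TLS all succeeded for that host; a bare hostname may legitimately answer with a redirect or an error page. What you’re looking for is a DNS failure, a connection timeout, or a certificate error. If any host fails that way, check your firewall rules. The runner communicates exclusively over HTTPS (port 443).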
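To tell those failure types apart without reading verbose output, curl’s exit code is enough. The sketch below assumes the standard curl exit codes (5 proxy resolution, 6 host resolution, 7 connect failure, 28 timeout, 35 TLS handshake, 60 certificate verification); classify_curl_exit and check_github_hosts are hypothetical names.

```shell
# Hypothetical helper: map a curl exit code to the likely failure class.
classify_curl_exit() {
  case "$1" in
    0)     echo "ok" ;;
    5)     echo "proxy-dns" ;;   # could not resolve the proxy host
    6)     echo "dns" ;;         # could not resolve the target host
    7)     echo "connect" ;;     # TCP connection failed
    28)    echo "timeout" ;;     # operation timed out
    35|60) echo "tls" ;;         # TLS handshake or certificate verification failed
    *)     echo "other" ;;
  esac
}

# Probe each host once; any HTTP status means the network path works.
check_github_hosts() {
  for host in github.com api.github.com codeload.github.com objects.githubusercontent.com; do
    # -o /dev/null discards the body; -w prints only the status code ("000" on failure)
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 15 "https://$host")
    rc=$?
    echo "$host: $(classify_curl_exit "$rc") (curl exit $rc, HTTP $code)"
  done
}
```

Run check_github_hosts on the runner machine; a "tls" result on a network with a corporate proxy usually points at SSL inspection rather than a firewall block.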

For runners behind a corporate proxy:

export https_proxy=http://proxy.company.com:8080
export http_proxy=http://proxy.company.com:8080
export no_proxy=localhost,127.0.0.1

To persist these settings across restarts, add them to the runner’s .env file (in the runner directory) as plain KEY=value lines, without export.
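Editing .env by hand tends to leave duplicate keys after a few attempts. A minimal sketch of an idempotent update, assuming the KEY=value line format described above; set_env_var is a hypothetical helper.

```shell
# Hypothetical helper: set KEY=value in a runner .env file, replacing any existing entry.
set_env_var() {
  file=$1 key=$2 value=$3
  touch "$file"
  tmp="$file.tmp.$$"
  # Keep every line except an existing entry for this key (grep exits 1 if nothing is kept)
  grep -v "^$key=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  mv "$tmp" "$file"
}
```

For example, set_env_var .env https_proxy http://proxy.company.com:8080, repeated for http_proxy and no_proxy, then restart the service so the agent picks up the new values.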

Pro Tip: GitHub publishes its IP ranges via the meta API at https://api.github.com/meta. Use the actions key to find the IP ranges your firewall needs to allow.
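The meta response is a JSON object whose keys hold lists of CIDR ranges, mixing IPv4 and IPv6. A sketch for pulling one key and splitting it by family, assuming jq is installed; meta_ranges and filter_cidrs are hypothetical names.

```shell
# Hypothetical helper: list the CIDRs under one key of the meta API (needs curl and jq).
meta_ranges() {
  key=${1:-actions}
  curl -fsSL https://api.github.com/meta | jq -r --arg k "$key" '.[$k][]'
}

# Hypothetical helper: keep only IPv4 ("v4") or IPv6 ("v6") CIDRs from stdin.
filter_cidrs() {
  case "$1" in
    v4) grep -v ':' ;;   # IPv6 CIDRs always contain a colon
    v6) grep ':' ;;
  esac
}
```

For example, meta_ranges actions | filter_cidrs v4 > actions-ipv4.txt. The ranges change over time, so regenerate firewall rules periodically rather than once.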

Fix 2: Update the Runner Version

GitHub requires runners to be within a certain version range. Check your current version:

./run.sh --version

Compare it with the latest release on the actions/runner releases page. If you’re behind, update; a runner with auto-update disabled that isn’t updated within 30 days of a new release stops receiving jobs:

# Stop the runner
sudo ./svc.sh stop

# Download and extract the latest version
curl -o actions-runner-linux-x64.tar.gz -L \
  https://github.com/actions/runner/releases/download/v2.XXX.X/actions-runner-linux-x64-2.XXX.X.tar.gz
tar xzf actions-runner-linux-x64.tar.gz

# Restart
sudo ./svc.sh start

The runner has auto-update capability, but it sometimes fails if the runner process isn’t running when an update is published.
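The version check and download URL can be scripted. This sketch assumes ./run.sh --version prints a bare x.y.z version, a Linux x64 host, and the release-asset naming shown above; latest_runner_version, version_lt, and runner_tarball_url are hypothetical helpers.

```shell
# Hypothetical helpers for the manual upgrade path (Linux x64 assumed).

# Latest runner version from the GitHub releases API, without the leading "v".
latest_runner_version() {
  curl -fsSL https://api.github.com/repos/actions/runner/releases/latest \
    | sed -n 's/.*"tag_name": *"v\([^"]*\)".*/\1/p'
}

# Exit 0 if version $1 is older than version $2, comparing x.y.z numerically.
version_lt() {
  [ "$1" = "$2" ] && return 1
  oldest=$(printf '%s\n%s\n' "$1" "$2" | sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)
  [ "$oldest" = "$1" ]
}

# Download URL following the asset naming used in the upgrade commands above.
runner_tarball_url() {
  echo "https://github.com/actions/runner/releases/download/v$1/actions-runner-linux-x64-$1.tar.gz"
}
```

A typical check: current=$(./run.sh --version), latest=$(latest_runner_version), then if version_lt "$current" "$latest" succeeds, fetch runner_tarball_url "$latest" with the stop, extract, start sequence above. The numeric sort avoids the string-comparison trap where 2.9.0 would look newer than 2.10.0.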

Fix 3: Re-register the Runner

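Registration tokens expire after 1 hour, so ./config.sh fails if you use a stale one. Once configured, the runner authenticates with its own stored credentials; if those are revoked (for example, the runner was deleted in the UI, or GitHub removed it automatically after more than 14 days without connecting), it won’t reconnect and must be re-registered: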

# Remove existing registration
./config.sh remove --token YOUR_REMOVAL_TOKEN

# Generate a new token from:
# Settings > Actions > Runners > New self-hosted runner

# Re-configure
./config.sh --url https://github.com/OWNER/REPO --token NEW_TOKEN

For organization-level runners, use the organization settings page instead. You can also generate tokens via the GitHub API:

curl -X POST \
  -H "Authorization: token YOUR_PAT" \
  https://api.github.com/repos/OWNER/REPO/actions/runners/registration-token
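The same call can be wrapped so that it covers both repository and organization runners and feeds the token straight into config.sh. A sketch, assuming a personal access token in a GITHUB_PAT environment variable (a hypothetical name) and the JSON response’s token field; the helper names are hypothetical.

```shell
# Hypothetical helpers around the registration-token REST endpoint.

# Endpoint for a repository ("OWNER/REPO") or an organization ("ORG").
registration_token_endpoint() {
  case "$1" in
    */*) echo "https://api.github.com/repos/$1/actions/runners/registration-token" ;;
    *)   echo "https://api.github.com/orgs/$1/actions/runners/registration-token" ;;
  esac
}

# Extract the "token" field from the JSON response on stdin (no jq needed).
extract_token() {
  sed -n 's/.*"token": *"\([^"]*\)".*/\1/p'
}

# Request a fresh token; GITHUB_PAT is assumed to be set in the environment.
new_registration_token() {
  curl -fsSL -X POST \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: Bearer $GITHUB_PAT" \
    "$(registration_token_endpoint "$1")" | extract_token
}
```

For example: ./config.sh --url https://github.com/OWNER/REPO --token "$(new_registration_token OWNER/REPO)". Generating the token right before configuring sidesteps the one-hour expiry.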

Fix 4: Fix Label and Group Mismatches

Jobs target runners using labels. If your workflow specifies a label the runner doesn’t have, the job queues forever:

# Workflow expects this label
runs-on: self-hosted-gpu

# But runner was configured with
# ./config.sh --labels self-hosted,linux,x64

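Check runner labels in Settings > Actions > Runners. You can add a custom label from the runner’s page there; from the command line, remove the registration and re-register with the extra label:

# Remove the old registration, then re-register with the additional label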
./config.sh remove --token TOKEN
./config.sh --url https://github.com/OWNER/REPO \
  --token NEW_TOKEN \
  --labels self-hosted,linux,x64,self-hosted-gpu
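A job is only routed to a runner that carries every label in runs-on, so a quick subset check catches mismatches before a job sits in the queue. A sketch, assuming you paste both label sets as comma-separated strings (a YAML list like [self-hosted, linux] becomes self-hosted,linux); missing_labels is a hypothetical helper.

```shell
# Hypothetical helper: print the runs-on labels the runner lacks, comma-separated.
# $1: labels from the workflow's runs-on; $2: labels the runner has.
missing_labels() {
  wanted=$1 have=$2
  missing=""
  old_ifs=$IFS
  IFS=,                                   # split the wanted list on commas
  for label in $wanted; do
    case ",$have," in
      *",$label,"*) ;;                    # runner has this label
      *) missing="$missing${missing:+,}$label" ;;
    esac
  done
  IFS=$old_ifs
  echo "$missing"
}
```

For the example above, missing_labels self-hosted-gpu self-hosted,linux,x64 prints self-hosted-gpu; empty output means the runner matches.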

Common Mistake: Runner groups (enterprise/organization feature) can restrict which repositories a runner serves. If your runner is in a group that doesn’t include your repository, jobs won’t be routed to it. Check organization Settings > Actions > Runner groups.

Fix 5: Fix Docker-Based Runner Issues

If you run the GitHub Actions runner inside a Docker container, several issues can arise:

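# Common mistake: no USER directive, so the container runs as root,
# which the runner refuses by default
FROM ubuntu:22.04
# (install the runner and its dependencies here)
# Fix: create an unprivileged user and switch to it
RUN useradd --create-home runner
USER runner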

The runner won’t start as root unless you set RUNNER_ALLOW_RUNASROOT=1:

docker run -e RUNNER_ALLOW_RUNASROOT=1 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  your-runner-image

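For workflows that run Docker commands, mount the host’s Docker socket. Strictly speaking this is Docker-outside-of-Docker rather than Docker-in-Docker: jobs talk to the host daemon, so bind-mount paths they pass to docker are resolved on the host, which is why shared directories such as /tmp are mounted at the same path on both sides. Access to the socket is effectively root access on the host: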

docker run -v /var/run/docker.sock:/var/run/docker.sock \
  -v /tmp:/tmp \
  your-runner-image

Make sure the runner container has enough disk space for workspace files and Docker layer caching.
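A container entrypoint can fail fast with a clear message instead of letting the runner exit cryptically. This is a sketch: runner_user_ok and docker_socket_ok are hypothetical names, and treating any non-empty RUNNER_ALLOW_RUNASROOT as permission is an assumption about how the runner’s own check behaves.

```shell
# Hypothetical entrypoint guards for a containerized runner.

# Exit 0 if the runner may start: not root, or root explicitly allowed.
# $1: numeric user id (normally "$(id -u)"); $2: value of RUNNER_ALLOW_RUNASROOT
runner_user_ok() {
  [ "$1" != "0" ] && return 0
  # Assumption: any non-empty value counts as "allowed"
  [ -n "$2" ]
}

# Exit 0 if the Docker socket exists and this user can write to it.
docker_socket_ok() {
  sock=${1:-/var/run/docker.sock}
  [ -S "$sock" ] && [ -w "$sock" ]
}
```

In an entrypoint: if ! runner_user_ok "$(id -u)" "${RUNNER_ALLOW_RUNASROOT:-}"; then echo "refusing to start as root" >&2; exit 1; fi, and a similar check with docker_socket_ok before starting jobs that need Docker.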

Fix 6: Address Resource Limits

The runner may crash or hang if the machine runs out of resources. Check:

# Memory
free -h

# Disk space
df -h

# CPU
top -bn1 | head -5

# Check if runner process is alive
ps aux | grep Runner.Listener

Common resource issues:

  • Disk full: Old checkouts, build outputs, and Docker images accumulate. Clean up with docker system prune -af and clear the runner’s _work directory while no job is running.
  • Memory exhaustion: The runner itself uses ~200MB, but your workflows may need much more. Monitor with dmesg | grep -i oom to check for OOM kills.
  • Too many concurrent jobs: Each runner processes one job at a time. To run jobs in parallel you register several runners on the same machine, which then needs enough resources for all of their jobs at once.
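The disk and OOM checks can run from cron so that a filling disk is caught before it kills a job. A sketch, assuming POSIX df -P output (capacity in column 5, mount point in column 6, no spaces in mount points); full_mounts and oom_kills are hypothetical names.

```shell
# Hypothetical helper: print mount points at or above a usage threshold.
# Reads "df -P" output on stdin; $1 is the threshold in percent (default 90).
full_mounts() {
  threshold=${1:-90}
  awk -v t="$threshold" 'NR > 1 { use = $5; sub(/%/, "", use); if (use + 0 >= t) print $6 }'
}

# Hypothetical helper: count kernel log lines mentioning OOM (dmesg may need sudo).
oom_kills() {
  dmesg 2>/dev/null | grep -ci 'oom'
}
```

For example, df -P | full_mounts 85 lists every filesystem at 85% or more; a non-zero oom_kills count after a failed job suggests the workflow, not the runner, ran out of memory.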

Fix 7: Fix GITHUB_TOKEN Permissions

Every workflow run gets an automatically generated GITHUB_TOKEN. If its permissions are too restrictive, steps that interact with the repository may fail:

permissions:
  contents: read
  packages: write
  issues: write

For organization repositories with restrictive default permissions, set permissions explicitly in your workflow:

jobs:
  build:
    runs-on: self-hosted
    permissions:
      contents: write
      pull-requests: write

Check your organization settings under Settings > Actions > General > Workflow permissions. “Read repository contents” is the most restrictive default and may block operations like pushing commits or creating releases.
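The same default can be read from the REST API, which helps when checking many repositories. This sketch assumes the GET /repos/OWNER/REPO/actions/permissions/workflow endpoint and its default_workflow_permissions field ("read" or "write"); default_token_permissions is a hypothetical helper, and GITHUB_PAT and OWNER/REPO are placeholders.

```shell
# Hypothetical helper: extract default_workflow_permissions from the JSON on stdin.
default_token_permissions() {
  sed -n 's/.*"default_workflow_permissions": *"\([a-z]*\)".*/\1/p'
}

# Usage (placeholders: GITHUB_PAT, OWNER/REPO):
# curl -fsSL -H "Authorization: Bearer $GITHUB_PAT" \
#   https://api.github.com/repos/OWNER/REPO/actions/permissions/workflow \
#   | default_token_permissions
```

A result of read means jobs that push commits or create releases need an explicit permissions block like the one above.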

Fix 8: Debug Using Runner Logs

The runner writes detailed logs that reveal exactly why it can’t connect:

# Service logs (if installed as service)
journalctl -u actions.runner.OWNER-REPO.RUNNER_NAME -f

# Or read the newest log files directly
tail -n 100 "$(ls -t _diag/Runner_*.log | head -n 1)"
tail -n 100 "$(ls -t _diag/Worker_*.log | head -n 1)"

Look for these key messages:

  • "Authentication failed" — Token expired or invalid. Re-register.
  • "Http response code: Unauthorized" — PAT or app token lacks required scopes.
  • "Connect timeout" — Network issue. Check firewall and DNS.
  • "Version not supported" — Runner too old. Update.
  • "No free disk space" — Clean up the _work directory.
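The list above maps directly onto a first-match lookup. A sketch that uses those message strings as fixed patterns; the exact wording in your runner version may differ, and diagnose_runner_log is a hypothetical helper. When a log contains several of the messages, the one listed first wins.

```shell
# Hypothetical helper: map the key log messages above to a next step.
# $1: path to a Runner_*.log or Worker_*.log file.
diagnose_runner_log() {
  log=$1
  if grep -q "Authentication failed" "$log"; then
    echo "re-register: token expired or invalid"
  elif grep -q "Http response code: Unauthorized" "$log"; then
    echo "check token scopes"
  elif grep -q "Connect timeout" "$log"; then
    echo "check firewall and DNS"
  elif grep -q "Version not supported" "$log"; then
    echo "update the runner"
  elif grep -q "No free disk space" "$log"; then
    echo "clean up _work"
  else
    echo "no known pattern; read the full log"
  fi
}
```

For example: diagnose_runner_log "$(ls -t _diag/Runner_*.log | head -n 1)".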

Enable debug logging by setting these values to true as repository secrets or variables (Settings > Secrets and variables > Actions):

ACTIONS_RUNNER_DEBUG=true
ACTIONS_STEP_DEBUG=true

Still Not Working?

  • Check if GitHub is down. Visit githubstatus.com before deep-diving into your configuration.

  • Verify DNS resolution. Run nslookup github.com from the runner machine. Corporate DNS servers sometimes block or redirect GitHub domains.

  • Check TLS certificates. Corporate proxies that perform SSL inspection can break the runner’s HTTPS connection. Add your corporate CA certificate to the runner’s trust store at the OS level so the .NET runtime that ships with the agent picks it up.

  • Try running interactively. Stop the service and run ./run.sh directly. This shows real-time errors that the service logs might not capture.

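  • Check Docker image compatibility. If using a container-based runner, ensure the base image has all required dependencies (libicu, libssl, git). The runner package ships bin/installdependencies.sh, which installs the runtime libraries on common Linux distributions.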

  • Monitor the runner process. Use systemctl status 'actions.runner.*' (quoted so the shell doesn’t expand the glob) to check whether the service is actually running or crashed silently.

  • Check organization spending limits. Even self-hosted runners can be blocked when an organization hits its monthly Actions storage or data transfer cap. Org Settings > Billing > Plans and add-ons shows usage. Bumping the limit immediately frees pending jobs without restarting anything.

  • Confirm Actions is enabled at every level. Repo Settings > Actions > General, then Org Settings > Actions > General. A “Disabled” setting at the org level overrides every repo and silently queues jobs. The runner stays “Idle” because it’s healthy — there just isn’t a job allowed to reach it.

  • Look for ephemeral runner exhaustion. If you registered the runner with --ephemeral, it accepts exactly one job and de-registers. A workflow that uses runs-on: self-hosted after that finds zero matching runners. Add a re-registration loop or switch to a non-ephemeral configuration if you need persistent capacity.

  • Audit _work/_temp ownership. A previous job that ran as root can leave files the runner user can’t delete on the next checkout. The next job fails before any of your steps execute. sudo chown -R runner:runner _work (substituting the account the runner service runs as) resolves it.

  • Check for IPv6 surprises. Some runner hosts resolve GitHub endpoints to IPv6 addresses by default but only have IPv4 outbound through the corporate firewall. The TCP connection silently times out instead of failing fast. Force IPv4 by setting precedence ::ffff:0:0/96 100 in /etc/gai.conf or by editing the firewall to allow IPv6 egress on 443.

  • Watch for clock skew. TLS handshakes fail with cryptic errors when the runner clock drifts more than five minutes from real time. Enable chronyd or systemd-timesyncd and confirm with timedatectl status. Hosts that have been suspended or paused (common with VM-based runners) frequently come back with a stale clock.
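Three of the host-level checks above can be scripted. A sketch, assuming a Linux host with GNU find, a gai.conf-style file, and systemd’s timedatectl; foreign_owned, prefer_ipv4, and clock_synced are hypothetical helpers, and writing /etc/gai.conf requires root.

```shell
# Hypothetical helpers for the ownership, IPv4-preference, and clock checks.

# List files under a directory not owned by the given user (name or numeric uid).
foreign_owned() {
  find "$1" ! -user "$2" -print
}

# Add the IPv4-preference rule unless an uncommented copy is already present.
prefer_ipv4() {
  conf=${1:-/etc/gai.conf}
  if ! grep -Eq '^[[:space:]]*precedence[[:space:]]+::ffff:0:0/96[[:space:]]+100' "$conf" 2>/dev/null; then
    echo 'precedence ::ffff:0:0/96 100' >> "$conf"
  fi
}

# Print "yes" or "no" depending on whether the clock is NTP-synchronized.
clock_synced() {
  timedatectl show -p NTPSynchronized --value 2>/dev/null
}
```

For example, a non-empty result from foreign_owned _work runner means the chown above is needed, and clock_synced printing no on a freshly resumed VM points at the clock-skew case.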


FixDevs

Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.

