Fix: Kubernetes HPA Not Scaling — HorizontalPodAutoscaler Shows Unknown or Doesn't Scale

FixDevs

Quick Answer

How to fix Kubernetes HorizontalPodAutoscaler issues — metrics-server not installed, CPU requests not set, unknown metrics, scale-down delay, custom metrics, and KEDA.

The Problem

A Kubernetes HorizontalPodAutoscaler shows <unknown> for the current metric value:

kubectl get hpa
# NAME      REFERENCE            TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
# my-hpa    Deployment/my-app    <unknown>/50%   2         10        2          5m

Or the HPA doesn’t scale up even when the application is clearly overloaded:

# CPU usage visible in kubectl top pods
kubectl top pods
# NAME                      CPU(cores)   MEMORY(bytes)
# my-app-7d9f8b6c4-xk2p9   950m         256Mi

# But the HPA never adds replicas
kubectl describe hpa my-hpa
# Warning  FailedGetResourceMetric  unable to fetch metrics from resource metrics API

Or the HPA scales up but never scales back down, leaving excess replicas running.

In the worst case, this surfaces as a production outage: a traffic spike hits, the existing pods saturate, latency climbs past timeout thresholds, requests fail with 503 errors, and the HPA — which should be adding replicas — sits at <unknown> because metrics-server is broken or the deployment has no CPU requests. The blast radius is the entire service under load, recovery requires a manual kubectl scale, and the post-mortem reveals that autoscaling was never actually working.

Why This Happens

HPA relies on the metrics API to make scaling decisions. Common failure causes:

  • metrics-server not installed: HPA’s default CPU and memory metrics come from metrics-server. Without it, every resource metric shows <unknown>.
  • No CPU requests on the container: HPA calculates CPU utilization as current usage / requested CPU. If resources.requests.cpu is not set, HPA can’t calculate a percentage and shows <unknown>.
  • metrics-server not accessible: metrics-server scrapes the kubelet’s resource metrics endpoint on each node. In some setups (kubeadm, kind, minikube), the kubelet’s serving certificate isn’t trusted, requiring --kubelet-insecure-tls.
  • Scale-down cooldown: by default, HPA applies a 5-minute (300-second) stabilization window before scaling down to avoid flapping. Replicas won’t decrease immediately after load drops.
  • Wrong metric target type: Utilization (a percentage of the container’s request) and AverageValue (an absolute per-pod quantity) have different meanings and requirements.
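
The missing-requests cause is easiest to see with the arithmetic written out. The Python sketch below is a simplified model, not the controller's code: the function name and the use of None to stand in for <unknown> are illustrative assumptions.

```python
from typing import Optional


def cpu_utilization_percent(usage_millicores: int,
                            request_millicores: Optional[int]) -> Optional[float]:
    """Return usage as a percentage of the CPU request, or None if no request is set."""
    if not request_millicores:
        # No resources.requests.cpu: there is no denominator,
        # so the HPA reports <unknown> instead of a percentage.
        return None
    return 100.0 * usage_millicores / request_millicores


print(cpu_utilization_percent(950, 200))   # 475.0
print(cpu_utilization_percent(950, None))  # None
```

A pod using 950m against a 200m request reports 475% utilization, while the same pod without a request has no percentage to report at all.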

A more subtle failure mode: the HPA reports correct metrics in kubectl get hpa but never actually scales because of constraints outside the metrics pipeline. The deployment is already at maxReplicas, a ResourceQuota rejects the new pods, the cluster has no spare node capacity for them, or the HPA controller’s reconcile loop is throttled because the API server is overloaded. These are silent failures: kubectl describe hpa shows the controller wants to scale to 8 pods, but currentReplicas remains stuck at 4.

There is also a measurement-vs-reality gap. HPA averages CPU usage across all pods. If one pod is hot at 95% and three sibling pods are idle at 5%, the average is 27.5%, well below a 50% target, and HPA does not scale. The hot pod keeps timing out while the cluster looks healthy from the autoscaler’s perspective.
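
To make the averaging effect concrete, here is a small Python sketch using the documented HPA formula, ceil(currentReplicas × currentValue / targetValue). The function names are illustrative, and the model ignores tolerance, pod readiness, and minReplicas.

```python
import math


def average_utilization(per_pod_percent):
    """Mean utilization across pods, the value the HPA compares to its target."""
    return sum(per_pod_percent) / len(per_pod_percent)


def desired_replicas(current_replicas, current_value, target_value):
    """Documented HPA formula: ceil(currentReplicas * currentValue / targetValue)."""
    return math.ceil(current_replicas * current_value / target_value)


pods = [95, 5, 5, 5]                          # one hot pod, three idle pods
avg = average_utilization(pods)
print(avg)                                    # 27.5
print(desired_replicas(len(pods), avg, 50))   # 3
```

With a 50% target, the raw recommendation is 3 replicas, so the autoscaler would rather remove a pod than add one while the hot pod is saturated.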

Fix 1: Install metrics-server

CPU and memory HPA requires metrics-server. Verify it’s installed and working:

# Check if metrics-server is installed
kubectl get deployment metrics-server -n kube-system

# If not found, install with Helm
helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
helm upgrade --install metrics-server metrics-server/metrics-server \
  --namespace kube-system

# Or with kubectl
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify metrics are working
kubectl top nodes
kubectl top pods -A

For kubeadm, kind, minikube — add --kubelet-insecure-tls:

# The default metrics-server deployment fails in clusters where
# kubelet serving certificates aren't signed by the cluster CA

# Patch the deployment to add the insecure flag
kubectl patch deployment metrics-server -n kube-system \
  --type='json' \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'

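# Or pass the flag through the Helm chart's args value
# (quoted so the shell does not treat the brackets as a glob pattern):
helm upgrade --install metrics-server metrics-server/metrics-server \
  --namespace kube-system \
  --set 'args[0]=--kubelet-insecure-tls'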

For minikube:

minikube addons enable metrics-server

Fix 2: Set CPU Requests on the Container

HPA requires resources.requests.cpu to calculate utilization percentage:

# WRONG — no resource requests
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    spec:
      containers:
        - name: my-app
          image: my-app:latest
          # No resources section → HPA shows <unknown>

---
# CORRECT — set CPU requests (and ideally limits)
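apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app               # must match the pod template labels below
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:latest
          resources:
            requests:
              cpu: "200m"       # 200 millicores = 0.2 CPU cores; HPA utilization is measured against this
              memory: "256Mi"
            limits:
              cpu: "1000m"      # 1 CPU core maximum
              memory: "512Mi"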

Create the HPA targeting CPU utilization:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50  # Scale when avg CPU > 50% of requests
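
To see how a target like this becomes a replica count, the sketch below models the documented formula, desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), together with the default 10% tolerance and the minReplicas/maxReplicas clamp. It is a simplified Python model under those assumptions, not the controller's implementation: the function name and parameters are illustrative, and it ignores pod readiness and missing metrics.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas, max_replicas, tolerance=0.1):
    """Simplified HPA calculation for a single Utilization metric."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Within the default 10% tolerance: keep the current replica count.
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Clamp the result to the bounds set on the HPA.
    return max(min_replicas, min(max_replicas, wanted))


print(desired_replicas(2, 90, 50, 2, 10))    # 4
print(desired_replicas(2, 52, 50, 2, 10))    # 2  (inside the tolerance band)
print(desired_replicas(2, 475, 50, 2, 10))   # 10 (capped at maxReplicas)
```

The last call matches a pod using 950m against a 200m request: the formula asks for 19 replicas, but the HPA stops at maxReplicas.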

Or with kubectl autoscale:

# Create HPA targeting 50% CPU utilization
kubectl autoscale deployment my-app --cpu-percent=50 --min=2 --max=10

# Verify
kubectl get hpa my-app
kubectl describe hpa my-app

Fix 3: Configure Scale Behavior to Prevent Flapping

The default scale-down policy is conservative. Customize it for your use case:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60   # Act on the lowest recommendation from the last 60s
      policies:
        - type: Pods
          value: 4                     # Add at most 4 pods at once
          periodSeconds: 60
        - type: Percent
          value: 100                   # Or double the current count
          periodSeconds: 60
      selectPolicy: Max               # Use the policy that allows more scaling
    scaleDown:
      stabilizationWindowSeconds: 300  # Act on the highest recommendation from the last 5 min (the default)
      policies:
        - type: Pods
          value: 1                     # Remove at most 1 pod at a time
          periodSeconds: 120
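
The interaction between the two scaleUp policies and selectPolicy can be sketched in a few lines of Python. This is a simplified model with illustrative names: it treats the current replica count as the count at the start of the period and ignores maxReplicas.

```python
import math


def scale_up_limit(current_replicas, policies, select_policy="Max"):
    """Highest replica count the scaleUp policies allow within one period."""
    limits = []
    for policy in policies:
        if policy["type"] == "Pods":
            # Absolute cap: add at most `value` pods.
            limits.append(current_replicas + policy["value"])
        elif policy["type"] == "Percent":
            # Relative cap: add at most `value` percent of the current count.
            limits.append(current_replicas
                          + math.ceil(current_replicas * policy["value"] / 100))
    return max(limits) if select_policy == "Max" else min(limits)


policies = [{"type": "Pods", "value": 4}, {"type": "Percent", "value": 100}]
print(scale_up_limit(3, policies))           # 7  (Pods policy wins)
print(scale_up_limit(10, policies))          # 20 (Percent policy wins)
print(scale_up_limit(10, policies, "Min"))   # 14
```

With selectPolicy: Max, small deployments grow by the fixed pod count and large ones by the percentage, whichever allows the bigger step.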

Aggressive scale-down (for cost savings):

behavior:
  scaleDown:
    stabilizationWindowSeconds: 60  # Shorter wait
    policies:
      - type: Percent
        value: 50          # Remove up to 50% of pods at once
        periodSeconds: 60
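
The effect of stabilizationWindowSeconds on scale-down can be modelled as taking the highest recommendation seen inside the window. The Python sketch below is an illustration under that assumption: the function name and the (timestamp, replicas) history format are hypothetical, not part of any Kubernetes API.

```python
def stabilized_scale_down(recommendations, window_seconds, now):
    """Pick the highest recommendation seen inside the stabilization window.

    recommendations: list of (timestamp_seconds, replicas) tuples that
    includes the most recent recommendation, so the window is never empty.
    """
    recent = [replicas for ts, replicas in recommendations
              if now - ts <= window_seconds]
    return max(recent)


history = [(0, 10), (30, 8), (60, 4), (90, 4)]
print(stabilized_scale_down(history, 300, now=90))  # 10
print(stabilized_scale_down(history, 60, now=90))   # 8
```

With the 300-second default the deployment stays at 10 replicas even though the latest recommendation is 4. A 60-second window lets it drop to 8 sooner, at the cost of reacting to shorter dips in load.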

Prevent scale-down entirely (for critical services):

behavior:
  scaleDown:
    selectPolicy: Disabled  # Never scale down (manual only)

Fix 4: Use Multiple Metrics

Scale on both CPU and memory, or combine with custom metrics:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    # CPU utilization
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
    # Memory utilization
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70
    # Custom metric from Prometheus (requires Prometheus Adapter)
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"  # Scale when each pod handles > 100 req/s

Note: When multiple metrics are defined, HPA scales to satisfy ALL of them — it uses the metric that requires the most replicas. This is conservative by design.
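
A short Python sketch of that rule: compute a proposal per metric with the documented formula, then take the largest. The function names and the sample values are illustrative assumptions, and the model ignores tolerance and replica bounds.

```python
import math


def replicas_for_metric(current_replicas, current_value, target_value):
    """Documented HPA formula applied to one metric."""
    return math.ceil(current_replicas * current_value / target_value)


def desired_replicas(current_replicas, metrics):
    """metrics: dict of name -> (current_value, target_value)."""
    proposals = {name: replicas_for_metric(current_replicas, cur, tgt)
                 for name, (cur, tgt) in metrics.items()}
    # The HPA acts on the metric that asks for the most replicas.
    return max(proposals.values()), proposals


metrics = {
    "cpu": (45, 60),                          # below target
    "memory": (84, 70),                       # above target
    "http_requests_per_second": (250, 100),   # far above target
}
print(desired_replicas(4, metrics))
# (10, {'cpu': 3, 'memory': 5, 'http_requests_per_second': 10})
```

CPU alone would allow a scale-down to 3, but the request-rate metric asks for 10, so 10 wins.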

Fix 5: Set Up Custom Metrics with Prometheus Adapter

For application-level metrics (queue depth, request rate), use the Prometheus Adapter:

# Install Prometheus Adapter
# Adjust prometheus.url (and prometheus.port) to match your Prometheus service
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus-adapter prometheus-community/prometheus-adapter \
  --namespace monitoring \
  --set prometheus.url=http://prometheus-server.monitoring.svc.cluster.local

# Configure the adapter to expose a custom metric
# In the prometheus-adapter Helm values (rendered into the adapter's ConfigMap):
rules:
  custom:
    - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: { resource: "namespace" }
          pod: { resource: "pod" }
      name:
        matches: "^(.*)_total$"
        as: "${1}_per_second"
      # Sum per pod so that each pod reports a single value
      metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'

# HPA using the custom metric
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "50"
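
To show what the HPA does with this metric, the Python sketch below approximates the per-pod rate from two counter samples and then applies an AverageValue target. It is a rough model with illustrative names and made-up sample values: Prometheus' rate() also handles counter resets and extrapolation, which are ignored here.

```python
import math


def per_second_rate(first_sample, last_sample):
    """Approximate rate(): counter increase divided by elapsed seconds.

    Each sample is a (timestamp_seconds, counter_value) tuple.
    """
    (t0, v0), (t1, v1) = first_sample, last_sample
    return (v1 - v0) / (t1 - t0)


def replicas_for_average_value(per_pod_rates, target_per_pod):
    """AverageValue target: total across pods divided by the per-pod target."""
    return math.ceil(sum(per_pod_rates) / target_per_pod)


rates = [per_second_rate((0, 1000), (120, 10600)),   # 80.0 req/s
         per_second_rate((0, 500), (120, 7700))]     # 60.0 req/s
print(rates)                                   # [80.0, 60.0]
print(replicas_for_average_value(rates, 50))   # 3
```

Two pods averaging 70 req/s against a 50 req/s target produce a recommendation of 3 replicas.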

Fix 6: Use KEDA for Event-Driven Autoscaling

KEDA (Kubernetes Event-Driven Autoscaling) scales based on external event sources like Kafka, Redis, AWS SQS, and more:

# Install KEDA
helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda --namespace keda --create-namespace

# Scale based on Redis queue length
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: redis-scaledobject
spec:
  scaleTargetRef:
    name: my-worker
  minReplicaCount: 0     # Scale to zero when queue is empty
  maxReplicaCount: 30
  triggers:
    - type: redis
      metadata:
        address: redis:6379
        listName: jobs
        listLength: "10"   # 1 pod per 10 items in queue

---
# Scale based on Kafka consumer lag
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer
spec:
  scaleTargetRef:
    name: kafka-worker
  minReplicaCount: 1
  maxReplicaCount: 50
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka:9092
        consumerGroup: my-group
        topic: events
        lagThreshold: "100"  # 1 pod per 100 unprocessed messages

Note: KEDA supports scale-to-zero, which the built-in HPA doesn’t. This is useful for batch workloads or workers that should be inactive when there’s no work.
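
As a rough mental model of how a threshold such as listLength or lagThreshold maps to a replica count, consider the Python sketch below. It is an approximation with illustrative names, not KEDA's implementation: it ignores the cooldown period before scaling to zero and the polling interval.

```python
import math


def keda_target_replicas(queue_length, per_pod_threshold, min_replicas, max_replicas):
    """Approximate replica count for a queue-length style trigger."""
    if queue_length == 0:
        # Nothing to process: fall back to the minimum (0 allows scale-to-zero).
        return min_replicas
    wanted = math.ceil(queue_length / per_pod_threshold)
    # At least one pod while there is work, never more than the maximum.
    return max(max(min_replicas, 1), min(max_replicas, wanted))


print(keda_target_replicas(0, 10, 0, 30))      # 0
print(keda_target_replicas(95, 10, 0, 30))     # 10
print(keda_target_replicas(1000, 10, 0, 30))   # 30
print(keda_target_replicas(0, 100, 1, 50))     # 1
```

The first three calls use the Redis example's bounds (0 to 30 replicas, 10 items per pod). The last uses the Kafka example, which keeps one consumer running even with no lag.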

Fix 7: The 503 Cascade Incident Playbook

When traffic spikes and your HPA does not respond, treat it as a P1 incident and work through this sequence:

Step 1: Manual scale immediately. Do not wait for HPA to recover. Buy yourself headroom:

# Raise the HPA bounds first so a working HPA does not scale your change back down
kubectl patch hpa my-hpa --type=merge -p '{"spec":{"minReplicas":20,"maxReplicas":50}}'

# Then scale the deployment directly instead of waiting for the HPA to act
kubectl scale deployment my-app --replicas=20

Step 2: Diagnose the metrics gap. Check kubectl get hpa and kubectl describe hpa for the divergence between currentReplicas and desiredReplicas:

kubectl get hpa my-hpa -o yaml | grep -E "currentReplicas|desiredReplicas|conditions"
# If currentReplicas < desiredReplicas, the scale operation itself is failing.
# Common causes: insufficient cluster capacity, ResourceQuota exceeded.
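
If you prefer to check the divergence programmatically, the Python sketch below reads the output of kubectl get hpa my-hpa -o json and reports the replica gap plus any failing AbleToScale or ScalingActive condition. The function name is illustrative and the sample document is hypothetical, trimmed to the status fields used here.

```python
import json


def scaling_gap(hpa_json):
    """Return (desired - current replicas, failing conditions) from HPA JSON."""
    status = json.loads(hpa_json)["status"]
    gap = status.get("desiredReplicas", 0) - status.get("currentReplicas", 0)
    problems = [
        c["type"] + ": " + c.get("reason", "")
        for c in status.get("conditions", [])
        if c["type"] in ("AbleToScale", "ScalingActive") and c["status"] != "True"
    ]
    return gap, problems


sample = json.dumps({  # hypothetical status for illustration
    "status": {
        "currentReplicas": 4,
        "desiredReplicas": 8,
        "conditions": [
            {"type": "AbleToScale", "status": "True", "reason": "SucceededRescale"},
            {"type": "ScalingActive", "status": "True", "reason": "ValidMetricFound"},
        ],
    }
})
print(scaling_gap(sample))  # (4, [])
```

A positive gap with no failing conditions means the HPA itself is healthy and the problem lies in creating or scheduling the pods.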

Step 3: Check the metrics pipeline. If metrics show <unknown>, work top-down:

# Is metrics-server healthy?
kubectl -n kube-system get pods -l k8s-app=metrics-server
kubectl -n kube-system logs deployment/metrics-server --tail=50

# Is the metrics API responding?
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods" | head

# Does the target deployment have CPU requests?
kubectl get deployment my-app -o jsonpath='{.spec.template.spec.containers[*].resources.requests}'

Step 4: Look for the load-balancing fairness issue. If average CPU is moderate but some pods are saturated, your load balancer is sticky or your hash function is hot-spotting. HPA cannot fix a load-balancing problem — adding pods makes it worse because new pods sit idle while old pods stay hot. Check pod-level CPU with kubectl top pods and look for outliers.
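
Spotting outliers by eye gets tedious with many pods. The Python sketch below parses kubectl top pods text and flags pods using more than twice the median CPU. The threshold factor, the function names, and all pod names in the sample except the first are assumptions made for illustration.

```python
from statistics import median


def parse_top_pods(output):
    """Parse `kubectl top pods` text into {pod_name: cpu_millicores}."""
    usage = {}
    for line in output.strip().splitlines()[1:]:   # skip the header row
        name, cpu = line.split()[:2]
        # kubectl prints CPU as millicores ("950m") or whole cores ("2")
        usage[name] = int(cpu[:-1]) if cpu.endswith("m") else int(cpu) * 1000
    return usage


def hot_pods(usage, factor=2.0):
    """Pods using more than `factor` times the median CPU."""
    mid = median(usage.values())
    return sorted(name for name, cpu in usage.items() if cpu > factor * mid)


sample = """
NAME                      CPU(cores)   MEMORY(bytes)
my-app-7d9f8b6c4-xk2p9    950m         256Mi
my-app-7d9f8b6c4-abcde    60m          120Mi
my-app-7d9f8b6c4-fghij    40m          118Mi
my-app-7d9f8b6c4-klmno    50m          121Mi
"""
print(hot_pods(parse_top_pods(sample)))  # ['my-app-7d9f8b6c4-xk2p9']
```

The median is used instead of the mean so that a single saturated pod does not hide itself by dragging the baseline up.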

Monitoring this proactively. Add alerts on the divergence itself, not on the absolute replica count:

# PrometheusRule
- alert: HPAScalingStuck
  expr: |
    kube_horizontalpodautoscaler_status_desired_replicas
      - kube_horizontalpodautoscaler_status_current_replicas > 0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "HPA {{ $labels.horizontalpodautoscaler }} wants more pods than it has"

This alert fires when HPA wants to scale but cannot, which is the precursor to a 503 cascade. Catching it before the cascade gives you minutes to react instead of seconds.

Blast radius: When HPA fails to scale, every request to the service competes for the same finite pod capacity. Latency rises non-linearly as CPU saturates, downstream calls time out, and clients retry — adding more load. Within a couple of minutes, what should have been a transient spike becomes a sustained outage that requires manual intervention to break.

Recovery: Manual scale is always the right first move. Fix the metrics pipeline second. Address load-balancing fairness third. Only after the service is stable should you tune HPA behavior to prevent recurrence.

Still Not Working?

HPA events show FailedComputeMetricsReplicas: if kubectl describe hpa reports FailedComputeMetricsReplicas, read the message in the Events section. Common messages:

  • "unable to fetch metrics from resource metrics API" → metrics-server not running
  • "missing request for cpu" → no CPU requests on pod spec
  • "invalid metrics" → wrong metric type or name in the HPA spec

HPA scales up but Cluster Autoscaler doesn’t add nodes — if all pods are Pending after HPA scales up, the cluster may be out of capacity. Check if Cluster Autoscaler is installed and configured. HPA and Cluster Autoscaler work together: HPA scales pods, Cluster Autoscaler scales nodes.

Verify the metrics API is accessible:

# Test the metrics API directly
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods"

# Check metrics-server logs
kubectl logs -n kube-system deployment/metrics-server

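HPA shows correct metrics but doesn’t scale: check minReplicas and maxReplicas. If REPLICAS already equals maxReplicas, HPA can’t scale up further, and it never scales below minReplicas. A PodDisruptionBudget is not the cause of a stuck scale-down: it only limits evictions made through the Eviction API, such as node drains, and does not stop a Deployment from removing pods.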

ResourceQuota blocks new pods: if the namespace has a ResourceQuota for CPU or memory, HPA can raise the replica count, but the API server rejects the new pods at admission, so they are never created. The ReplicaSet events show an "exceeded quota" error. kubectl describe namespace <ns> shows the quota and current usage. Increase the quota or move the workload to a namespace with headroom.

Slow scale because of stabilizationWindowSeconds — the default scaleUp.stabilizationWindowSeconds is 0 (immediate) but scaleDown is 300 seconds. If your HPA scaled up but is sticky on the way down, that is expected behavior — lower the window only if flapping is acceptable.

HPA conflicts with VPA: running both Horizontal and Vertical Pod Autoscaler against CPU or memory on the same deployment can produce thrashing. VPA changes pod resource requests, which changes the utilization percentage HPA computes, which triggers HPA, which in turn changes the usage VPA observes. Use one or the other for CPU and memory on a given workload; combining them is only safe when HPA scales on custom or external metrics.
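
The feedback loop is easy to see with numbers. In this Python sketch (illustrative function name and values), the pod's real CPU usage never changes, yet the utilization the HPA sees swings across a 60% target purely because the request changed.

```python
def utilization_percent(usage_millicores, request_millicores):
    """Utilization as the HPA computes it: usage relative to the request."""
    return 100 * usage_millicores / request_millicores


usage = 300  # millicores actually consumed; constant throughout
print(utilization_percent(usage, 200))  # 150.0, above a 60% target: HPA adds pods
print(utilization_percent(usage, 600))  # 50.0, below target once the request is raised
```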

For related Kubernetes issues, see Fix: Kubernetes CrashLoopBackOff, Fix: Kubernetes Pod Pending, Fix: Kubernetes OOMKilled, and Fix: Kubernetes Resource Quota Exceeded.

FixDevs

Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.
