Fix: Kubernetes HPA Not Scaling — HorizontalPodAutoscaler Shows Unknown or Doesn't Scale
Quick Answer
How to fix Kubernetes HorizontalPodAutoscaler issues — metrics-server not installed, CPU requests not set, unknown metrics, scale-down delay, custom metrics, and KEDA.
The Problem
A Kubernetes HorizontalPodAutoscaler shows <unknown> for the current metric value:
kubectl get hpa
# NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
# my-hpa Deployment/my-app <unknown>/50% 2 10 2 5m
Or the HPA doesn’t scale up even when the application is clearly overloaded:
# CPU usage visible in kubectl top pods
kubectl top pods
# NAME CPU(cores) MEMORY(bytes)
# my-app-7d9f8b6c4-xk2p9 950m 256Mi
# But HPA still shows 1 replica and won't scale
kubectl describe hpa my-hpa
# Warning FailedGetResourceMetric unable to fetch metrics from resource metrics API
Or the HPA scales up but never scales back down, leaving excess replicas running.
In the worst case, this surfaces as a production outage: a traffic spike hits, the existing pods saturate, latency climbs past timeout thresholds, requests fail with 503 errors, and the HPA — which should be adding replicas — sits at <unknown> because metrics-server is broken or the deployment has no CPU requests. The blast radius is the entire service under load, recovery requires a manual kubectl scale, and the post-mortem reveals that autoscaling was never actually working.
Why This Happens
HPA relies on the metrics API to make scaling decisions. Common failure causes:
- metrics-server not installed: HPA’s default CPU and memory metrics require metrics-server in the cluster. Without it, all metrics show <unknown>.
- No CPU requests on the container: HPA calculates CPU utilization as current usage / requested CPU. If resources.requests.cpu is not set, HPA can’t calculate a percentage and shows <unknown>.
- metrics-server not accessible: metrics-server uses kubelet’s resource endpoints. In some setups (kubeadm, kind, minikube), the kubelet’s serving certificate isn’t trusted, requiring --kubelet-insecure-tls.
- Scale-down cooldown: by default, HPA waits 5 minutes before scaling down to avoid flapping. Replicas won’t decrease immediately after load drops.
- Wrong metric target type: Utilization (percentage) vs AverageValue (absolute) have different meanings and requirements.
A more subtle failure mode: the HPA reports correct metrics in kubectl get hpa but the running replica count never changes because of constraints outside the metrics pipeline. The deployment is already at maxReplicas, a ResourceQuota rejects the new pods, the cluster has no spare node capacity for them, or the HPA controller’s reconcile loop is throttled because the API server is overloaded. These are silent failures: kubectl describe hpa shows the controller wants to scale to 8 pods, but only 4 are actually serving traffic.
There is also a measurement-vs-reality gap. HPA averages CPU usage across all pods. If one pod is hot at 95% and three sibling pods are idle at 5%, the average is (95 + 5 + 5 + 5) / 4 = 27.5%, well below a typical 50% target, and HPA does not scale. The hot pod keeps timing out while the cluster looks healthy from the autoscaler’s perspective.
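To make the averaging concrete, here is a minimal shell sketch of the replica formula given in the Kubernetes documentation, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), applied to the case of one pod at 95% and three at 5%. The function names are illustrative. The 10% tolerance is assumed to be the controller default and is not something this article states. The real controller also handles unready pods and missing metrics, which this sketch ignores.

```shell
# Sketch only: approximates the documented HPA formula, not the controller itself.
# Usage: hpa_desired_replicas CURRENT_REPLICAS CURRENT_UTIL_PCT TARGET_UTIL_PCT
hpa_desired_replicas() {
  awk -v n="$1" -v cur="$2" -v tgt="$3" 'BEGIN {
    ratio = cur / tgt
    diff = ratio - 1
    if (diff < 0) diff = -diff
    # Assumed default tolerance of 10%: small deviations change nothing.
    if (diff <= 0.1) { print n; exit }
    raw = n * ratio
    desired = int(raw)
    if (desired < raw) desired++   # round up (ceil)
    print desired
  }'
}

# Average of per-pod utilization percentages (each relative to its CPU request).
avg_utilization() {
  printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.1f\n", sum / NR }'
}

avg="$(avg_utilization 95 5 5 5)"   # one hot pod, three idle pods
echo "average utilization: ${avg}%"
echo "desired replicas at a 50% target: $(hpa_desired_replicas 4 "$avg" 50)"
```

With these inputs the average is 27.5% and the formula proposes 3 replicas, one fewer than the 4 running, even though one pod is saturated. The autoscaler only ever sees the average.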
Fix 1: Install metrics-server
HPAs that scale on CPU or memory require metrics-server. Verify it’s installed and working:
# Check if metrics-server is installed
kubectl get deployment metrics-server -n kube-system
# If not found, install with Helm
helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
helm upgrade --install metrics-server metrics-server/metrics-server \
--namespace kube-system
# Or with kubectl
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Verify metrics are working
kubectl top nodes
kubectl top pods -A
For kubeadm, kind, or minikube clusters, add --kubelet-insecure-tls:
# The default metrics-server deployment fails in clusters where
# kubelet serving certificates aren't signed by the cluster CA
# Patch the deployment to add the insecure flag
kubectl patch deployment metrics-server -n kube-system \
--type='json' \
-p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'
# Or in Helm values:
helm upgrade --install metrics-server metrics-server/metrics-server \
--namespace kube-system \
--set 'args[0]=--kubelet-insecure-tls'
For minikube:
minikube addons enable metrics-server
Fix 2: Set CPU Requests on the Container
HPA requires resources.requests.cpu to calculate utilization percentage:
# WRONG: no resource requests
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    spec:
      containers:
        - name: my-app
          image: my-app:latest
          # No resources section → HPA shows <unknown>
---
# CORRECT: set CPU requests (and ideally limits)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    spec:
      containers:
        - name: my-app
          image: my-app:latest
          resources:
            requests:
              cpu: "200m"      # 200 millicores = 0.2 CPU cores
              memory: "256Mi"
            limits:
              cpu: "1000m"     # 1 CPU core maximum
              memory: "512Mi"
Create the HPA targeting CPU utilization:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # Scale when avg CPU > 50% of requests
Or with kubectl autoscale:
# Create HPA targeting 50% CPU utilization
kubectl autoscale deployment my-app --cpu-percent=50 --min=2 --max=10
# Verify
kubectl get hpa my-app
kubectl describe hpa my-app
Fix 3: Configure Scale Behavior to Prevent Flapping
The default scale-down policy is conservative. Customize it for your use case:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60   # Scale up only to the lowest recommendation from the last 60s
      policies:
        - type: Pods
          value: 4                     # Add at most 4 pods per 60s
          periodSeconds: 60
        - type: Percent
          value: 100                   # Or at most double the current count per 60s
          periodSeconds: 60
      selectPolicy: Max                # Use the policy that allows more scaling
    scaleDown:
      stabilizationWindowSeconds: 300  # Scale down only to the highest recommendation from the last 5 min
      policies:
        - type: Pods
          value: 1                     # Remove at most 1 pod per 120s
          periodSeconds: 120
Aggressive scale-down (for cost savings):
behavior:
  scaleDown:
    stabilizationWindowSeconds: 60   # Shorter wait
    policies:
      - type: Percent
        value: 50                    # Remove up to 50% of pods at once
        periodSeconds: 60
Prevent scale-down entirely (for critical services):
behavior:
  scaleDown:
    selectPolicy: Disabled           # Never scale down (manual only)
Fix 4: Use Multiple Metrics
Scale on both CPU and memory, or combine with custom metrics:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    # CPU utilization
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
    # Memory utilization
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70
    # Custom metric from Prometheus (requires Prometheus Adapter)
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"   # Scale when each pod handles > 100 req/s
Note: When multiple metrics are defined, HPA computes a replica count for each metric and uses the largest one, so every target is satisfied. This is conservative by design.
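The selection rule in the note can be sketched in a few lines of shell. This is a hypothetical helper, not a kubectl feature: it takes the HPA bounds and one proposed replica count per metric, keeps the largest proposal, and clamps it to minReplicas and maxReplicas. The example numbers are made up for illustration.

```shell
# Illustrative helper: pick the final replica count from per-metric proposals.
# Usage: hpa_pick_replicas MIN MAX PROPOSAL...
hpa_pick_replicas() {
  local min="$1" max="$2"
  shift 2
  local best=0 p
  for p in "$@"; do
    if [ "$p" -gt "$best" ]; then best="$p"; fi
  done
  # Clamp to the HPA bounds.
  if [ "$best" -lt "$min" ]; then best="$min"; fi
  if [ "$best" -gt "$max" ]; then best="$max"; fi
  echo "$best"
}

# CPU proposes 5, memory proposes 3, requests-per-second proposes 12; bounds are 2..20.
hpa_pick_replicas 2 20 5 3 12
```

Here the request-rate metric wins and the result is 12. A single noisy metric can therefore keep the replica count high even when CPU and memory are idle.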
Fix 5: Set Up Custom Metrics with Prometheus Adapter
For application-level metrics (queue depth, request rate), use the Prometheus Adapter:
# Install Prometheus Adapter
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus-adapter prometheus-community/prometheus-adapter \
--namespace monitoring \
--set prometheus.url=http://prometheus-server.monitoring.svc.cluster.local
# Configure the adapter to expose a custom metric
# In the prometheus-adapter Helm values (rendered into its ConfigMap):
rules:
  custom:
    - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: { resource: "namespace" }
          pod: { resource: "pod" }
      name:
        matches: "^(.*)_total$"
        as: "${1}_per_second"
      # Aggregate per pod so each pod yields a single series
      metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
# HPA using the custom metric
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
spec:
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "50"
Fix 6: Use KEDA for Event-Driven Autoscaling
KEDA (Kubernetes Event-Driven Autoscaling) scales based on external event sources like Kafka, Redis, AWS SQS, and more:
# Install KEDA
helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda --namespace keda --create-namespace
# Scale based on Redis queue length
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: redis-scaledobject
spec:
  scaleTargetRef:
    name: my-worker
  minReplicaCount: 0   # Scale to zero when queue is empty
  maxReplicaCount: 30
  triggers:
    - type: redis
      metadata:
        address: redis:6379
        listName: jobs
        listLength: "10"   # 1 pod per 10 items in queue
---
# Scale based on Kafka consumer lag
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer
spec:
  scaleTargetRef:
    name: kafka-worker
  minReplicaCount: 1
  maxReplicaCount: 50
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka:9092
        consumerGroup: my-group
        topic: events
        lagThreshold: "100"   # 1 pod per 100 unprocessed messages
Note: KEDA supports scale-to-zero, which the built-in HPA doesn’t. This is useful for batch workloads or workers that should be inactive when there’s no work.
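As a rough sizing aid, the "1 pod per N items" comments above can be turned into arithmetic. The sketch below is a simplification under stated assumptions: it treats the threshold as a per-pod target, rounds up, and clamps to the min and max replica counts. It ignores KEDA's polling interval, cooldown period, and activation settings, and the function name is illustrative.

```shell
# Illustrative arithmetic only; KEDA and the HPA it manages do the real work.
# Usage: queue_replicas QUEUE_LENGTH ITEMS_PER_POD MIN MAX
queue_replicas() {
  local queue_len="$1" per_pod="$2" min="$3" max="$4"
  # Integer ceiling of queue_len / per_pod.
  local want=$(( (queue_len + per_pod - 1) / per_pod ))
  if [ "$want" -lt "$min" ]; then want="$min"; fi
  if [ "$want" -gt "$max" ]; then want="$max"; fi
  echo "$want"
}

queue_replicas 95 10 0 30     # backlog of 95 jobs, 10 per pod
queue_replicas 0 10 0 30      # empty queue, scale to zero allowed
queue_replicas 1000 10 0 30   # large backlog, capped at the maximum
```

Running the numbers like this before choosing maxReplicaCount shows how large a backlog the workers can absorb before the cap is hit.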
Fix 7: The 503 Cascade Incident Playbook
When traffic spikes and your HPA does not respond, treat it as a P1 incident and work through this sequence:
Step 1: Manual scale immediately. Do not wait for HPA to recover. Buy yourself headroom:
# Override HPA temporarily by scaling the deployment directly
kubectl scale deployment my-app --replicas=20
# HPA will fight the manual scale once it recovers, so also raise its bounds
kubectl patch hpa my-hpa --type=merge -p '{"spec":{"minReplicas":20,"maxReplicas":50}}'
Step 2: Diagnose the metrics gap. Check kubectl get hpa and kubectl describe hpa for the divergence between currentReplicas and desiredReplicas:
kubectl get hpa my-hpa -o yaml | grep -E "currentReplicas|desiredReplicas|conditions"
# If currentReplicas < desiredReplicas, the scale operation itself is failing.
# Common causes: insufficient cluster capacity, quota exceeded.
Step 3: Check the metrics pipeline. If metrics show <unknown>, work top-down:
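# Is metrics-server healthy? (k8s-app is the label set by the upstream manifest;
# Helm installs typically use app.kubernetes.io/name=metrics-server instead)
kubectl -n kube-system get pods -l k8s-app=metrics-server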
kubectl -n kube-system logs deployment/metrics-server --tail=50
# Is the metrics API responding?
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods" | head
# Does the target deployment have CPU requests?
kubectl get deployment my-app -o jsonpath='{.spec.template.spec.containers[*].resources.requests}'
Step 4: Look for the load-balancing fairness issue. If average CPU is moderate but some pods are saturated, your load balancer is sticky or your hash function is hot-spotting. HPA cannot fix a load-balancing problem: new pods sit idle while the old pods stay hot, so adding replicas does not relieve them. Check pod-level CPU with kubectl top pods and look for outliers.
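Spotting outliers by eye is error-prone during an incident, so a small filter can help. The sketch below reads the output of kubectl top pods --no-headers on stdin and prints pods whose CPU exceeds a multiple of the mean. The function name, the default factor of 2, and the app=my-app label in the usage comment are assumptions for illustration, not something the article prescribes.

```shell
# Illustrative filter: flag pods whose CPU is more than FACTOR times the mean.
# Expects lines shaped like "NAME CPU MEMORY", e.g. "my-app-abc 950m 256Mi".
# Usage: kubectl top pods -l app=my-app --no-headers | hot_pods 2
hot_pods() {
  local factor="${1:-2}"
  awk -v factor="$factor" '
    {
      cpu = $2
      if (cpu ~ /m$/) { sub(/m$/, "", cpu); mcpu = cpu + 0 }   # millicores
      else            { mcpu = cpu * 1000 }                    # whole cores
      name[NR] = $1; val[NR] = mcpu; sum += mcpu
    }
    END {
      if (NR == 0) exit
      mean = sum / NR
      for (i = 1; i <= NR; i++)
        if (val[i] > factor * mean) print name[i], val[i] "m"
    }'
}
```

If this prints anything while the HPA reports moderate average utilization, the problem is likely traffic distribution rather than replica count.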
Monitoring this proactively. Add alerts on the divergence itself, not on the absolute replica count:
# PrometheusRule
- alert: HPAScalingStuck
  expr: |
    (kube_horizontalpodautoscaler_status_desired_replicas
      - kube_horizontalpodautoscaler_status_current_replicas) > 0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "HPA {{ $labels.horizontalpodautoscaler }} wants more pods than it has"
This alert fires when HPA wants to scale but cannot, which is the precursor to a 503 cascade. Catching it before the cascade gives you minutes to react instead of seconds.
Blast radius: When HPA fails to scale, every request to the service competes for the same finite pod capacity. Latency rises non-linearly as CPU saturates, downstream calls time out, and clients retry — adding more load. Within a couple of minutes, what should have been a transient spike becomes a sustained outage that requires manual intervention to break.
Recovery: Manual scale is always the right first move. Fix the metrics pipeline second. Address load-balancing fairness third. Only after the service is stable should you tune HPA behavior to prevent recurrence.
Still Not Working?
kubectl describe hpa reports FailedComputeMetricsReplicas: check the Events section for the underlying message. Common messages:
- "unable to fetch metrics from resource metrics API" → metrics-server not running
- "missing request for cpu" → no CPU requests on pod spec
- "invalid metrics" → wrong metric type or name in the HPA spec
HPA scales up but Cluster Autoscaler doesn’t add nodes: if the new pods stay Pending after HPA scales up, the cluster may be out of capacity. Check if Cluster Autoscaler is installed and configured. HPA and Cluster Autoscaler work together: HPA scales pods, Cluster Autoscaler scales nodes.
Verify the metrics API is accessible:
# Test the metrics API directly
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods"
# Check metrics-server logs
kubectl logs -n kube-system deployment/metrics-server
HPA shows correct metrics but doesn’t scale: check minReplicas and maxReplicas. If REPLICAS already equals maxReplicas, HPA can’t scale up further. If replicas stay high after load drops, check whether minReplicas, the scale-down stabilization window, or a scaleDown policy with selectPolicy: Disabled is holding them there.
ResourceQuota blocks new pods: if the namespace has a ResourceQuota for CPU or memory, HPA can raise the replica count but the API server rejects the new pods at admission once the quota is exhausted. kubectl describe namespace <ns> shows the quota and current usage. Increase the quota or move the workload to a namespace with headroom.
Slow scale because of stabilizationWindowSeconds — the default scaleUp.stabilizationWindowSeconds is 0 (immediate) but scaleDown is 300 seconds. If your HPA scaled up but is sticky on the way down, that is expected behavior — lower the window only if flapping is acceptable.
HPA conflicts with VPA: running both Horizontal and Vertical Pod Autoscaler on the same deployment can produce thrashing when both act on CPU or memory. VPA changes pod resource requests, which changes the HPA utilization percentage, which triggers HPA, which triggers VPA. Do not let both act on the same resource metric for a given workload; if you need both, drive HPA from a custom or external metric instead.
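The feedback loop is easier to see with numbers. The sketch below computes utilization as usage divided by request, the same ratio HPA uses for Utilization targets. The 300m usage and the 200m and 600m requests are hypothetical values chosen for illustration, and the function name is not part of any tool.

```shell
# Illustrative arithmetic: CPU utilization as a percentage of the request.
# Usage: cpu_utilization_pct USAGE_MILLICORES REQUEST_MILLICORES
cpu_utilization_pct() {
  awk -v used="$1" -v req="$2" 'BEGIN { printf "%.0f\n", used / req * 100 }'
}

cpu_utilization_pct 300 200   # before VPA acts: request is 200m
cpu_utilization_pct 300 600   # after VPA raises the request to 600m
```

The pod's actual usage never changed, yet the reported utilization drops from 150% to 50%. Against a 60% target that flips the HPA from wanting more replicas to wanting fewer.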
For related Kubernetes issues, see Fix: Kubernetes CrashLoopBackOff, Fix: Kubernetes Pod Pending, Fix: Kubernetes OOMKilled, and Fix: Kubernetes Resource Quota Exceeded.
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.