Fix: Kubernetes HPA Not Scaling — HorizontalPodAutoscaler Shows Unknown or Doesn't Scale
Quick Answer
In most cases the HPA shows <unknown> because metrics-server isn't installed or the target pods have no CPU requests set. Fix those first (Fixes 1 and 2); then tune scale-down behavior, add custom metrics via the Prometheus Adapter, or use KEDA for event-driven scaling.
The Problem
A Kubernetes HorizontalPodAutoscaler shows <unknown> for the current metric value:
kubectl get hpa
# NAME     REFERENCE           TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
# my-hpa   Deployment/my-app   <unknown>/50%   2         10        2          5m

Or the HPA doesn't scale up even when the application is clearly overloaded:
# CPU usage visible in kubectl top pods
kubectl top pods
# NAME                      CPU(cores)   MEMORY(bytes)
# my-app-7d9f8b6c4-xk2p9    950m         256Mi

# But HPA still shows 1 replica and won't scale
kubectl describe hpa my-hpa
# Warning  FailedGetScale  unable to fetch metrics from resource metrics API

Or the HPA scales up but never scales back down, leaving excess replicas running.
Why This Happens
HPA relies on the metrics API to make scaling decisions. Common failure causes (a quick triage sketch follows the list):
- metrics-server not installed — HPA's default CPU and memory metrics require metrics-server in the cluster. Without it, all metrics show <unknown>.
- No CPU requests on the container — HPA calculates CPU utilization as current usage / requested CPU. If resources.requests.cpu is not set, HPA can't calculate a percentage and shows <unknown>.
- metrics-server not accessible — metrics-server uses kubelet's resource endpoints. In some setups (kubeadm, kind, minikube), the kubelet's serving certificate isn't trusted, requiring --kubelet-insecure-tls.
- Scale-down cooldown — by default, HPA waits 5 minutes before scaling down to avoid flapping. Replicas won't decrease immediately after load drops.
- Wrong metric target type — Utilization (percentage) vs AverageValue (absolute) have different meanings and requirements.
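A quick way to see which of these applies — a triage sketch, assuming the my-app Deployment and my-hpa HPA names used throughout this article:

# 1. Is the resource metrics API registered and Available?
kubectl get apiservice v1beta1.metrics.k8s.io

# 2. Does the target workload declare CPU requests? (empty output = missing)
kubectl get deployment my-app \
  -o jsonpath='{.spec.template.spec.containers[*].resources.requests}'

# 3. What does the HPA itself complain about? (check Events at the bottom)
kubectl describe hpa my-hpa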
Fix 1: Install metrics-server
HPA scaling on CPU and memory requires metrics-server. Verify it's installed and working:
# Check if metrics-server is installed
kubectl get deployment metrics-server -n kube-system
# If not found, install with Helm
helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
helm upgrade --install metrics-server metrics-server/metrics-server \
--namespace kube-system
# Or with kubectl
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Verify metrics are working
kubectl top nodes
kubectl top pods -A

For kubeadm, kind, minikube — add --kubelet-insecure-tls:
# The default metrics-server deployment fails in clusters where
# kubelet serving certificates aren't signed by the cluster CA
# Patch the deployment to add the insecure flag
kubectl patch deployment metrics-server -n kube-system \
--type='json' \
-p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'
# Or in Helm values:
helm upgrade --install metrics-server metrics-server/metrics-server \
--namespace kube-system \
--set args[0]="--kubelet-insecure-tls"

For minikube:
minikube addons enable metrics-server
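Whichever install route you used, wait for metrics-server to finish rolling out and confirm metrics are flowing (the first scrape can take up to a minute):

kubectl -n kube-system rollout status deployment/metrics-server
kubectl top nodes   # should print per-node CPU/memory once metrics are available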
Fix 2: Set CPU Requests on the Container

HPA requires resources.requests.cpu to calculate a utilization percentage:
# WRONG — no resource requests
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    spec:
      containers:
        - name: my-app
          image: my-app:latest
          # No resources section → HPA shows <unknown>
---
# CORRECT — set CPU requests (and ideally limits)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    spec:
      containers:
        - name: my-app
          image: my-app:latest
          resources:
            requests:
              cpu: "200m"       # 200 millicores = 0.2 CPU cores
              memory: "256Mi"
            limits:
              cpu: "1000m"      # 1 CPU core maximum
              memory: "512Mi"

Create the HPA targeting CPU utilization:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # Scale when avg CPU > 50% of requests

Or with kubectl autoscale:
# Create HPA targeting 50% CPU utilization
kubectl autoscale deployment my-app --cpu-percent=50 --min=2 --max=10
# Verify
kubectl get hpa my-app
kubectl describe hpa my-app
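For reference, the HPA controller derives the desired replica count from the ratio of observed to target metric value — this is the algorithm documented for the autoscaler:

desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue)

# Worked example with the 50% target above:
# 2 replicas averaging 90% CPU → ceil(2 × 90 / 50) = ceil(3.6) = 4 replicas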
Fix 3: Configure Scale Behavior to Prevent Flapping

The default scale-down policy is conservative. Customize it for your use case:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60    # Wait 60s before scaling up again
      policies:
        - type: Pods
          value: 4                      # Add at most 4 pods at once
          periodSeconds: 60
        - type: Percent
          value: 100                    # Or double the current count
          periodSeconds: 60
      selectPolicy: Max                 # Use the policy that allows more scaling
    scaleDown:
      stabilizationWindowSeconds: 300   # Wait 5 min before scaling down
      policies:
        - type: Pods
          value: 1                      # Remove at most 1 pod at a time
          periodSeconds: 120

Aggressive scale-down (for cost savings):
behavior:
  scaleDown:
    stabilizationWindowSeconds: 60   # Shorter wait
    policies:
      - type: Percent
        value: 50                    # Remove up to 50% of pods at once
        periodSeconds: 60

Prevent scale-down entirely (for critical services):
behavior:
  scaleDown:
    selectPolicy: Disabled   # Never scale down (manual only)
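To confirm the behavior settings take effect, watch the HPA while load changes — a sketch using the my-hpa name from above (event wording may vary slightly by Kubernetes version):

# Watch replicas and targets update in real time
kubectl get hpa my-hpa --watch

# Scaling decisions show up as events
kubectl describe hpa my-hpa
# Normal  SuccessfulRescale  New size: 4; reason: cpu resource utilization above target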
Fix 4: Use Multiple Metrics

Scale on both CPU and memory, or combine with custom metrics:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    # CPU utilization
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
    # Memory utilization
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70
    # Custom metric from Prometheus (requires Prometheus Adapter)
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"   # Scale when each pod handles > 100 req/s

Note: When multiple metrics are defined, HPA scales to satisfy ALL of them — it uses the metric that requires the most replicas. This is conservative by design.
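With multiple metrics configured, kubectl get hpa lists the current/target pairs side by side, so you can see which metric is driving scaling (illustrative output — kubectl truncates the column after two metrics):

kubectl get hpa my-hpa
# NAME     REFERENCE           TARGETS                        MINPODS   MAXPODS   REPLICAS
# my-hpa   Deployment/my-app   45%/60%, 52%/70% + 1 more...   2         20        5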
Fix 5: Set Up Custom Metrics with Prometheus Adapter
For application-level metrics (queue depth, request rate), use the Prometheus Adapter:
# Install Prometheus Adapter
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus-adapter prometheus-community/prometheus-adapter \
  --namespace monitoring \
  --set prometheus.url=http://prometheus-server.monitoring.svc.cluster.local

# Configure the adapter to expose a custom metric
# In prometheus-adapter ConfigMap:
rules:
  custom:
    - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: { resource: "namespace" }
          pod: { resource: "pod" }
      name:
        matches: "^(.*)_total$"
        as: "${1}_per_second"
      metricsQuery: 'rate(<<.Series>>{<<.LabelMatchers>>}[2m])'
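Before pointing an HPA at the new metric, confirm the adapter actually exposes it (jq is optional, just for readability):

# List the custom metrics the adapter registered
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq '.resources[].name'

# Query the metric for pods in a namespace
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/http_requests_per_second"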
# HPA using the custom metric
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
spec:
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "50"
Fix 6: Use KEDA for Event-Driven Autoscaling

KEDA (Kubernetes Event-Driven Autoscaling) scales based on external event sources like Kafka, Redis, AWS SQS, and more:
# Install KEDA
helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda --namespace keda --create-namespace

# Scale based on Redis queue length
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: redis-scaledobject
spec:
  scaleTargetRef:
    name: my-worker
  minReplicaCount: 0     # Scale to zero when queue is empty
  maxReplicaCount: 30
  triggers:
    - type: redis
      metadata:
        address: redis:6379
        listName: jobs
        listLength: "10"   # 1 pod per 10 items in queue
---
# Scale based on Kafka consumer lag
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer
spec:
  scaleTargetRef:
    name: kafka-worker
  minReplicaCount: 1
  maxReplicaCount: 50
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka:9092
        consumerGroup: my-group
        topic: events
        lagThreshold: "100"   # 1 pod per 100 unprocessed messages

Note: KEDA supports scale-to-zero, which the built-in HPA doesn't. This is useful for batch workloads or workers that should be inactive when there's no work.
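KEDA manages a regular HPA under the hood, so you can verify a ScaledObject was picked up like this (a sketch; column output varies by KEDA version):

kubectl get scaledobject
# READY/ACTIVE columns show whether the trigger source is reachable and firing

# KEDA creates an HPA named keda-hpa-<scaledobject-name> behind the scenes
kubectl get hpa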
Still Not Working?
HPA events show FailedComputeMetricsReplicas — if kubectl describe hpa reports this warning, check the Events section for the underlying cause. Common messages:
"unable to fetch metrics from resource metrics API"→ metrics-server not running"missing request for cpu"→ no CPU requests on pod spec"invalid metrics"→ wrong metric type or name in the HPA spec
HPA scales up but Cluster Autoscaler doesn’t add nodes — if all pods are Pending after HPA scales up, the cluster may be out of capacity. Check if Cluster Autoscaler is installed and configured. HPA and Cluster Autoscaler work together: HPA scales pods, Cluster Autoscaler scales nodes.
Verify the metrics API is accessible:
# Test the metrics API directly
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods"

# Check metrics-server logs
kubectl logs -n kube-system deployment/metrics-server

HPA shows correct metrics but doesn't scale — check minReplicas and maxReplicas. If REPLICAS already equals maxReplicas, the HPA can't scale up further. If replicas won't come down, remember the default 5-minute scale-down stabilization window (see Fix 3); note that a PodDisruptionBudget doesn't block HPA scale-down — PDBs only apply to evictions such as node drains.
For related Kubernetes issues, see Fix: Kubernetes CrashLoopBackOff and Fix: Kubernetes Pod Pending.