Karpenter Overprovisioning

Eliminate Node Scaling Lag on EKS with Karpenter Overprovisioning

When traffic spikes, node autoscaling lag can turn what should be a few seconds of scale-up into downtime or degraded performance.

Lag Between HPA and Node Provisioning

Here’s what happens during a sudden traffic spike in EKS:

  1. Request volume increases.
  2. Prometheus or the Kubernetes Metrics Server picks up metrics such as CPU usage or request counts.
  3. The Horizontal Pod Autoscaler (HPA) spins up new pods.

But each new pod needs CPU and memory. If existing nodes don’t have spare capacity, those pods sit in a Pending state.

This triggers the cluster autoscaler or Karpenter, which then:

  • Requests a new VM from your cloud provider,
  • Waits for the VM to boot and join the cluster,
  • Initializes system components like CNI, CSI, and DaemonSets,
  • Finally schedules the pod.

⏱️ This whole process can take several minutes. That’s a big problem if your app needs to respond within seconds.
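
For context, the HPA in step 3 is typically driven by a resource metric. A minimal sketch might look like the following; the target Deployment name and the thresholds are illustrative, not taken from any particular setup:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                    # illustrative workload
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # add replicas once average CPU crosses 70%

Every replica this HPA adds still needs free CPU and memory on some node; when there is none, the multi-minute provisioning path above kicks in.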

Karpenter Static NodePools — The Less Ideal Approach

You could mitigate the “pods waiting for new nodes” problem by maintaining a constant minimum capacity in the cluster. But that means paying for idle resources during low traffic — a direct hit to cost.

Karpenter intentionally avoided the classic “min node” model for a long time. Its core philosophy is to provision capacity on-demand based on unschedulable pod signals, not keep nodes alive unnecessarily.

Recent versions introduced static NodePools, which let you define a fixed minimum capacity, but they come with notable constraints. In practice, this often just normalizes idle capacity rather than solving the root issue.

Static NodePool Limitations

As the docs note, static NodePools come with several constraints:

  • A NodePool can’t switch between static and dynamic once set
  • Only limits.nodes is allowed in the limits section
  • The weight field is disabled
  • Nodes aren’t considered for consolidation
  • Scale operations bypass node disruption budgets (but respect PDBs)

If you rely on the capabilities these constraints take away (consolidation, weighting, flexible limits), you might be better off with the classic cluster autoscaler.

A well-designed scaling strategy shouldn’t rely on raising the minimum node count as a safety buffer. Instead, it should manage scaling delays and placement bottlenecks — metric latency, pod startup time, node provisioning, topology spread — through proper mechanisms. Static NodePools as a primary solution often become a permanent cost compromise rather than an optimization.

Overprovisioning & Headroom

So if we shouldn’t keep a minimum number of nodes, how do we avoid the lag when new pods need new nodes?

The answer is simple: keep “dummy” pods sleeping in the cluster. These are pause containers that don’t do real work; they just request CPU and memory without actually using them.

Overprovisioning keeps slack capacity in the cluster that real pods can be scheduled onto immediately. We implement this using a built-in Kubernetes feature: Pod Priority and Preemption.

We deploy low-priority placeholder pods across the cluster. They request resources and reserve capacity. When real workloads scale up and new pods arrive, the scheduler preempts (evicts) these placeholder pods to make room — no waiting for new nodes.

Once evicted, the placeholder pods go back to Pending. Karpenter sees them, provisions new nodes, and the headroom refills itself. This way, critical workload pods start faster on existing nodes, using the space freed by the placeholders.

Negative PriorityClass

A negative priority pushes these pods to the back of the scheduling and preemption queue, marking them as sacrificial.

The official docs recommend a negative priority for overprovisioning placeholders:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: placeholder
value: -1000
globalDefault: false
description: "Negative priority for placeholder pods to enable overprovisioning."

value: The lower the value, the less important the pod. Workload pods that don’t set a priorityClassName default to priority 0, which outranks the -1000 placeholders, so they can preempt them.

Pause (Dummy) Deployment

A few important details about the dummy pods:

  • They should spread across nodes (use anti-affinity) to create usable headroom.
  • Their resource requests define how much buffer you want.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: capacity-reservation
  namespace: overprovisioning
spec:
  replicas: 6   # headroom target
  selector:
    matchLabels:
      app.kubernetes.io/name: capacity-placeholder
  template:
    metadata:
      labels:
        app.kubernetes.io/name: capacity-placeholder
      annotations:
        kubernetes.io/description: "Capacity reservation"
    spec:
      priorityClassName: placeholder
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app.kubernetes.io/name: capacity-placeholder
              topologyKey: topology.kubernetes.io/hostname
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.6
        resources:
          requests:
            cpu: "200m"
            memory: "512Mi"
          limits:
            memory: "512Mi"

Headroom is controlled by:

  • replicas (number of placeholders)
  • Resource requests per pod
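
With the manifest above, that works out to 6 × 200m CPU and 6 × 512Mi of memory, i.e. roughly 1.2 vCPU and 3 GiB of headroom spread across the cluster.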

Key Karpenter Consideration

Important behavior: Some autoscalers, Karpenter included, may treat preferred anti-affinity rules as if they were required when making scaling decisions. That means your placeholder replica count can effectively act like a minimum node count.

Yes, dummy pods cost money. So why not just set a static minimum node count instead and call it a day?

Fair question. But dummy pods are renewable buffers. When your app scales and placeholders get evicted, the Deployment recreates them, maintaining the buffer for the next scaling event. If multiple apps scale at different times, the placeholder system adapts. A static minimum node count can’t do that — it’s just fixed, permanent overhead.

Overprovisioning with placeholder pods gives you a self-healing buffer: capacity that is always ready, and that yields instantly to real workloads the moment they need it.