Maximizing Pod Density on EKS with VPC CNI Prefix Delegation
When you run Kubernetes on AWS with EKS, you have a choice in how pod networking is handled. Some teams opt for alternative CNI plugins like Cilium or Calico, which use overlay networking — pods get IPs from a separate private address space, traffic is encapsulated, and pods are not directly addressable on the VPC network.
Amazon EKS however ships with its own CNI plugin by default — Amazon VPC CNI — and it takes a fundamentally different approach. Every pod gets a real, routable VPC IP address. No overlays, no encapsulation, no NAT between pods. A pod on one node can talk directly to a pod on another node using its VPC IP, the same way two EC2 instances would communicate. This makes EKS networking operationally clean — your VPC security groups, flow logs, route tables, and network ACLs all work natively with pod traffic without any additional abstraction layer to reason about.
But this design decision has a direct consequence — every pod IP must come from your VPC’s IP address space. And since VPC IPs are a finite resource tied to your subnet CIDRs, how VPC CNI manages those IPs becomes increasingly important as your cluster grows.
By default, VPC CNI assigns individual IP addresses to ENI slots on each node. Each EC2 instance can have multiple ENIs attached, and each ENI supports multiple IP slots — but both of these numbers are hard limits defined by AWS per instance type. For example, an m5.xlarge supports up to 4 ENIs with 15 IP slots each. Subtracting the 1 primary IP per ENI and adding 2 for host-networked system pods (aws-node and kube-proxy), that gives you (4 × (15 − 1)) + 2 = 58 maximum pods. One slot holds one IP, and one IP goes to one pod. This works well at small scale, but as your workloads grow denser you start hitting these hard per-node limits — and the only way around them in secondary IP mode is to use larger instance types with more ENI and IP capacity.
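The per-node limit follows directly from the two instance-type limits. A quick shell sketch of the formula — the values below are examples for an m5.xlarge; substitute the limits AWS publishes for your instance type:

```shell
# Secondary IP mode max-pods formula: each ENI reserves its primary IP for
# the node itself, and 2 is added for host-networked system pods.
MAX_ENIS=4        # example value — look up your instance type's limit
IPS_PER_ENI=15    # example value
echo $(( MAX_ENIS * (IPS_PER_ENI - 1) + 2 ))   # 58
```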
Prefix delegation changes this model. Instead of assigning a single IP to each ENI slot, it assigns an entire /28 prefix — a block of 16 consecutive IP addresses. Same slot, 16 times the capacity. The result is a dramatic increase in pod density per node and a much larger warm IP pool available at any moment, without requiring additional ENIs to be attached.
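The same arithmetic shows the jump in raw capacity when each slot yields 16 IPs instead of 1 (again using example m5.xlarge-style limits; note that the kubelet's own max-pods setting, commonly capped around 110 on smaller instance types, still bounds what a node will actually schedule):

```shell
# Raw IP capacity in prefix delegation mode: every non-primary slot on
# every ENI holds a /28, i.e. 16 addresses.
MAX_ENIS=4        # example value for your instance type
SLOTS_PER_ENI=15  # example value
echo $(( MAX_ENIS * (SLOTS_PER_ENI - 1) * 16 ))   # 896
```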
This matters because ENI attachment is not instant. In environments with dynamic workloads — spot nodes being replaced, Karpenter scaling up, HPA triggering rapid pod creation — the time it takes to attach new ENIs and populate the IP pool can be the difference between a pod starting successfully and failing with a networking error.
It is less relevant if you are running large pods with high resource requests, where pod count per node is naturally low. And it is off the table entirely if you are using Security Groups per Pod — the two features are mutually incompatible.
How VPC CNI Assigns IPs by Default
To understand why prefix delegation matters, you first need a clear picture of how VPC CNI manages IP addresses today. This is the foundation everything else builds on.
How ENIs and IP Slots Work
When an EC2 node joins your cluster, it comes with one ENI already attached by AWS — the primary ENI. VPC CNI’s ipamd daemon immediately takes ownership of this ENI and starts managing its IP slots as a pool to assign to pods. As that pool fills up, ipamd attaches additional secondary ENIs to the node to expand IP capacity.
Each EC2 instance type has two hard limits defined by AWS:
- Maximum number of ENIs that can be attached
- Maximum number of IP addresses per ENI
This is an important distinction — AWS provisions the primary ENI, and ipamd manages everything after that. It's the kind of detail that matters when you're troubleshooting ENI attachment issues or IAM permission errors, because the permissions required for secondary ENI attachment are exactly what the IRSA role we set up in the previous article provides.
What Prefix Delegation Changes
From 1 IP per Slot to /28 per Slot
Prefix delegation changes a single but fundamental aspect of how ipamd requests IP capacity from AWS. Instead of requesting individual IP addresses for each ENI slot, it requests an entire /28 CIDR prefix — a contiguous block of 16 IP addresses — per slot.
From AWS’s perspective, a /28 prefix consumes one ENI slot regardless of how many of those 16 IPs are actually assigned to pods. From ipamd’s perspective, a single ENI attachment operation now yields 16 times the IP capacity compared to secondary IP mode.
The ENI and slot limits imposed by the instance type remain exactly the same. What changes is the yield per slot.
Is it Safe to Enable on a Running Cluster?
This is the question that matters most before making any change to the networking layer of a production cluster. The answer is yes — with a clear understanding of what actually happens during and after the change.
Mixed State — Old ENIs vs Prefix ENIs on the Same Node
When you enable prefix delegation on a running cluster, existing nodes do not get replaced and existing pods are not affected. What changes is ipamd’s behavior going forward on each node.
Specifically, ipamd begins requesting /28 prefixes for any new ENI attachments from that point on. ENIs that were already attached to a node before the change remain in secondary IP mode — their slots continue holding individual IPs and serving existing pods normally. This creates a mixed state on existing nodes where some ENIs use secondary IPs and some use prefixes, depending on when they were attached relative to the configuration change.
This mixed state is explicitly supported by AWS and is the expected transition path. It is not an edge case or an unsupported configuration — it is how the rollout is designed to work.
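One way to see the mixed state on a node is to look at each ENI's Ipv4Prefixes field — prefix-mode ENIs have entries there, secondary-IP-mode ENIs do not. A sketch using jq over describe-network-interfaces-shaped output; the JSON here is sample data, and in practice you would pipe in the real output of `aws ec2 describe-network-interfaces --filters "Name=attachment.instance-id,Values=<instance-id>"`:

```shell
# Classify each ENI by mode: an ENI with Ipv4Prefixes entries is in prefix
# mode; one carrying only individual private IPs is in secondary IP mode.
cat <<'EOF' | jq -r '.NetworkInterfaces[] | .NetworkInterfaceId + ": " + (if ((.Ipv4Prefixes // []) | length) > 0 then "prefix mode" else "secondary IP mode" end)'
{"NetworkInterfaces":[
  {"NetworkInterfaceId":"eni-0aaa","Ipv4Prefixes":[{"Ipv4Prefix":"10.10.128.0/28"}]},
  {"NetworkInterfaceId":"eni-0bbb","PrivateIpAddresses":[{"PrivateIpAddress":"10.10.1.5"}]}
]}
EOF
```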
How Existing Pods Are Unaffected
From a running pod’s perspective, nothing changes. A pod that was assigned an IP from a secondary ENI slot before prefix delegation was enabled continues using that IP. ipamd does not revoke or reassign existing pod IPs during the transition. The pod’s network namespace, routes, and VPC reachability remain completely intact.
The change only affects how new IP capacity is added to the node going forward. Existing capacity is untouched.
Incompatibility with Security Groups per Pod ❗
There is one hard incompatibility to be aware of. If your cluster uses Security Groups per Pod — enabled via ENABLE_POD_ENI: true on the VPC CNI DaemonSet — prefix delegation cannot be used. The two features rely on different ENI assignment models that are architecturally incompatible. Security Groups per Pod requires dedicated ENIs per pod for granular security group assignment, while prefix delegation aggregates multiple pod IPs onto shared ENI slots.
Before enabling prefix delegation, verify this setting:
```shell
kubectl get daemonset aws-node -n kube-system \
  -o jsonpath='{.spec.template.spec.containers[0].env}' | \
  jq '.[] | select(.name=="ENABLE_POD_ENI")'
```
If the value is false or the field is absent, you are safe to proceed.
Instance Type Compatibility
Prefix delegation is not available on all EC2 instance types. Before enabling it, you need to verify that every instance type in your NodePools supports it — a single incompatible node can result in ipamd failing to assign prefixes on that node, causing pod scheduling failures.
The Nitro Requirement
Prefix delegation requires AWS Nitro-based instances. Nitro is AWS’s hypervisor and hardware platform that underpins all modern EC2 instance types. The enhanced networking capabilities that Nitro provides are what make ENI prefix attachment possible at the hardware level.
The practical rule is straightforward — in the general-purpose, compute, and memory families, any instance type from the 5th generation onwards is Nitro-based and supports prefix delegation, and the burstable family joined with t3. Earlier generations do not.
Unsupported instance types include:
- t2 — not Nitro
- m4, c4, r4 — 4th generation, not Nitro
- m3, c3, r3 — 3rd generation, not Nitro
Supported instance types include all 5th generation and newer across all families — m5, m6i, m6a, m7i, c5, c6i, c6a, c7i, r5, r6i, t3, t4g, and their variants. Graviton-based instances (m6g, c6g, r6g etc.) are fully supported as well.
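If you want an authoritative check rather than relying on naming conventions, the EC2 API reports the hypervisor per instance type, and Nitro-based types should return "nitro" in that field. A sketch (requires AWS credentials; the instance types listed are just examples):

```shell
# Nitro-based types report "nitro" in the Hypervisor field; older
# Xen-based generations report "xen" and do not support prefix delegation.
aws ec2 describe-instance-types \
  --instance-types m5.xlarge c4.large t3.medium \
  --query 'InstanceTypes[].[InstanceType,Hypervisor]' \
  --output table
```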
Enabling and Verifying
With instance type compatibility confirmed across all NodePools, enabling prefix delegation is a single addon configuration update. Since VPC CNI is already running as a managed addon at this point — as covered in the previous article — the update goes through the EKS addon API rather than a direct DaemonSet edit.
```shell
aws eks update-addon \
  --cluster-name <cluster-name> \
  --addon-name vpc-cni \
  --resolve-conflicts PRESERVE \
  --configuration-values '{"env":{"AWS_VPC_ENI_MTU":"9001","WARM_ENI_TARGET":"1","WARM_PREFIX_TARGET":"1","AWS_VPC_K8S_CNI_LOGLEVEL":"DEBUG","AWS_VPC_K8S_PLUGIN_LOG_LEVEL":"DEBUG","ENABLE_PREFIX_DELEGATION":"true"}}' \
  --region <region>
```
A few important points about this command:
--resolve-conflicts PRESERVE is correct here. Unlike the initial migration where OVERWRITE was required, this is an update to an already managed addon. PRESERVE ensures your existing custom configuration values are not reset to AWS defaults during the update.
The full set of configuration values must be included in every update-addon call. The --configuration-values parameter is not additive — it replaces the entire configuration. Omitting a previously set value like AWS_VPC_ENI_MTU from this call would reset it to the addon default.
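Because the parameter replaces rather than merges, a safe pattern is to read the current configuration first and add the new key to it. A sketch with jq — here the CURRENT variable stands in for the output of `aws eks describe-addon --cluster-name <cluster-name> --addon-name vpc-cni --query 'addon.configurationValues' --output text`:

```shell
# Merge the new setting into the existing configuration instead of
# retyping the whole document by hand. CURRENT holds sample data here.
CURRENT='{"env":{"AWS_VPC_ENI_MTU":"9001","WARM_ENI_TARGET":"1"}}'
MERGED=$(echo "$CURRENT" | jq -c '.env.ENABLE_PREFIX_DELEGATION = "true"')
echo "$MERGED"
# {"env":{"AWS_VPC_ENI_MTU":"9001","WARM_ENI_TARGET":"1","ENABLE_PREFIX_DELEGATION":"true"}}
```

The merged document can then be passed straight to `--configuration-values "$MERGED"`, so nothing previously set is silently reset to a default.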
Verifying /28 Prefixes via AWS CLI
Once the addon reaches ACTIVE, confirm that prefix delegation is actually active on your nodes by checking the ENI configuration at the AWS level:
```shell
# Get a node instance ID
kubectl get nodes -o jsonpath='{.items[0].spec.providerID}' | sed 's|.*/||'

# Check ENI prefixes on that instance
aws ec2 describe-instances \
  --instance-ids <instance-id> \
  --query 'Reservations[].Instances[].NetworkInterfaces[].Ipv4Prefixes' \
  --region <region>
```
A successful output looks like:
```json
[
    [
        {
            "Ipv4Prefix": "10.10.128.0/28"
        }
    ]
]
```
The presence of a /28 prefix in the ENI configuration confirms ipamd is operating in prefix delegation mode on that node. Check a few nodes across different NodePools to get a broader confirmation.
Conclusion
When Prefix Delegation Is and Isn’t the Right Choice
Prefix delegation is a targeted solution to a specific problem — the IP capacity and warm pool replenishment limitations of secondary IP mode under dynamic, high-churn workloads. It delivers the most value when your cluster runs high pod density workloads, experiences burst scheduling pressure, or cycles nodes frequently through spot interruptions and Karpenter consolidation.
It is not a universal recommendation. If your pods are large and resource-heavy, your per-node pod count is naturally low and the IP layer is never the bottleneck — enabling prefix delegation adds operational complexity without meaningful benefit. Similarly, if your subnets are small, the /28 prefix allocation pattern consumes address space in larger chunks and requires careful subnet capacity planning before enabling.
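For the subnet-sizing concern, a rough upper bound is easy to compute: a subnet of prefix length N holds at most 2^(28 − N) /28 blocks, before accounting for the addresses AWS reserves in every subnet and for fragmentation (a /28 can only be carved out of a contiguous, aligned, unused block):

```shell
# Upper bound on /28 prefixes per subnet. Planning figure only — AWS
# reserves 5 addresses in each subnet, and fragmentation from existing
# secondary IPs reduces the usable count further.
SUBNET_PREFIX_LEN=24   # e.g. a /24 subnet
echo $(( 1 << (28 - SUBNET_PREFIX_LEN) ))   # 16
```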
When the conditions are right, prefix delegation is one of the highest-leverage configuration changes you can make to a production EKS cluster. A single command unlocks a fundamentally different IP capacity model, the rollout is non-disruptive, and the operational overhead going forward is zero.