Claude Code for EKS Karpenter (2026)
Managing Amazon EKS clusters with Karpenter doesn’t have to be complex. With Claude Code, you can automate Karpenter provisioning, optimize node lifecycle management, and build reproducible infrastructure workflows. This guide walks you through practical examples to integrate Claude Code into your EKS Karpenter operations.
Understanding the EKS Karpenter Workflow
Karpenter is an open-source Kubernetes autoscaler that dynamically provisions nodes based on workload requirements. It replaces the traditional Kubernetes Cluster Autoscaler (CAS) with faster, more cost-effective node provisioning. Understanding the differences helps you see where Claude Code adds the most value:
| Feature | Cluster Autoscaler | Karpenter |
|---|---|---|
| Node provisioning speed | 2–5 minutes | 30–60 seconds |
| Instance selection | Fixed node groups | Any instance type matching constraints |
| Spot diversification | Manual node group per type | Automatic across hundreds of types |
| Bin packing | Limited | Native consolidation |
| AWS integration | ASG-based | Direct EC2 Fleet API |
| Config complexity | Node group YAML | NodePool + EC2NodeClass YAML |
When combined with Claude Code, you can create intelligent automation scripts that handle everything from initial cluster setup to ongoing node lifecycle management. Claude Code excels at generating the YAML configurations, validating policies, identifying misconfigured resources, and writing reusable shell scripts for routine operations.
The typical Karpenter workflow involves four phases:
- Bootstrap. Install Karpenter, create IAM roles, and configure EC2NodeClass
- Provision. Define NodePools that match your workload profiles
- Operate. Monitor node health, costs, and scaling events
- Optimize. Tune consolidation policies and right-size instance selection
Claude Code can assist at every phase. Let’s walk through each one.
Setting Up Claude Code for EKS Karpenter
Before diving into workflows, ensure Claude Code is configured with appropriate AWS credentials and kubectl access. You’ll need:
- AWS CLI configured with appropriate IAM permissions (EKS, EC2, IAM, SNS, Cost Explorer)
- kubectl configured for your EKS cluster
- Karpenter installed on your cluster (v1.0+ for the stable v1 API)
- helm and eksctl (helpful but not required)
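A quick way to confirm the tooling is present before you start. This is a minimal sketch; `check_tool` is a helper name invented here, not a standard command:

```bash
# check_tool: report whether a CLI dependency is on PATH (illustrative helper)
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "MISSING: $1"
  fi
}

for tool in aws kubectl helm eksctl; do
  check_tool "$tool"
done
```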
Once Claude Code has access to your terminal and file system, you can begin creating automation scripts. Start by verifying your cluster connectivity and Karpenter health:
```bash
# Authenticate to the cluster
aws eks update-kubeconfig --name your-cluster-name --region us-west-2

# Verify node and pod status
kubectl get nodes -o wide
kubectl get pods -n karpenter

# Check Karpenter version and controller logs
kubectl get deployment karpenter -n karpenter -o jsonpath='{.spec.template.spec.containers[0].image}'
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter --tail=50
```
If Karpenter is not yet installed, ask Claude Code to generate the full bootstrap commands for your account and region. Claude Code can produce the complete eksctl or helm installation sequence, including the required IAM role for service accounts (IRSA) setup:
```bash
# Ask Claude Code to generate this for your specific account
export KARPENTER_VERSION="1.1.1"
export CLUSTER_NAME="production-eks"
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export AWS_REGION="us-west-2"

# Install Karpenter via Helm.
# If using IRSA, also pass the controller role annotation, e.g.:
#   --set "serviceAccount.annotations.eks\.amazonaws\.com/role-arn=arn:aws:iam::${AWS_ACCOUNT_ID}:role/<KarpenterControllerRole>"
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version "${KARPENTER_VERSION}" \
  --namespace karpenter \
  --create-namespace \
  --set "settings.clusterName=${CLUSTER_NAME}" \
  --set "settings.interruptionQueue=${CLUSTER_NAME}" \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi \
  --wait
```
Creating Karpenter EC2NodeClasses and NodePools with Claude Code
One of the most powerful applications of Claude Code is generating Karpenter configurations from a plain-language description. Instead of memorizing the exact YAML schema, you describe your requirements conversationally.
EC2NodeClass
The EC2NodeClass defines the AWS-specific settings that NodePools reference:

```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2023  # Amazon Linux 2023
  role: KarpenterNodeRole  # IAM role attached to provisioned nodes
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: production
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: production
  amiSelectorTerms:
    - alias: al2023@latest  # Always use the latest AL2023 AMI
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 50Gi
        volumeType: gp3
        iops: 3000
        throughput: 125
        encrypted: true
  userData: |
    #!/bin/bash
    # Karpenter handles node bootstrap itself on AL2023 (via nodeadm), so do
    # not call /etc/eks/bootstrap.sh here; custom userData is merged in.
    echo "vm.max_map_count=262144" >> /etc/sysctl.conf
    sysctl -p
  tags:
    Environment: production
    ManagedBy: karpenter
```
Claude Code can generate this file after you describe your VPC tagging scheme, the IAM role name, and any custom userData requirements. It will also validate that your subnet and security group selector tags exist in your account.
NodePool for Compute-Intensive Workloads
Earlier Karpenter releases used a Provisioner CRD (karpenter.sh/v1alpha5); in the stable v1 API it has been replaced by NodePool. For a NodePool targeting compute-intensive workloads, you might ask Claude Code to generate a configuration that:
- Uses On-Demand instances for reliability
- Targets specific instance types
- Sets appropriate consolidation policies
- Recycles empty and long-lived nodes automatically
Claude Code produces the complete YAML:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: compute-workload
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["c5.2xlarge", "c5.4xlarge", "c6i.2xlarge", "c6i.4xlarge", "m5.2xlarge"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      expireAfter: 24h  # Recycle nodes after 24h for security patching
  limits:
    cpu: 1000
    memory: 1000Gi
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 5m  # Remove empty nodes after 5 minutes
```

Note the use of `expireAfter: 24h` (the v1 replacement for the legacy ttlSecondsUntilExpired). This ensures nodes are regularly recycled, which keeps them on the latest AMI and reduces the risk of long-lived configuration drift. Claude Code can help you choose an appropriate expiry based on your patching policy.
Building Automated Node Pool Management
NodePools allow you to define different node characteristics for various workload types. A well-structured production environment typically uses three or more NodePools with different cost and performance profiles:
General Purpose NodePool

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose
spec:
  template:
    metadata:
      labels:
        workload-type: general
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["m", "c"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["4"]  # Prefer newer generations
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
      nodeClassRef:
        group: karpenter.k8s.aws  # v1 API uses group/kind/name
        kind: EC2NodeClass
        name: default
      taints: []  # No taints: accepts all workloads
      expireAfter: 720h  # 30 days
  limits:
    cpu: 500
    memory: 2000Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized  # v1 name for WhenUnderutilized
    consolidateAfter: 30s
```
Memory-Optimized NodePool

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: memory-optimized
spec:
  template:
    metadata:
      labels:
        workload-type: memory
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["r"]  # r6i, r7i, r8g families
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["5"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      taints:
        - key: workload-type
          value: memory
          effect: NoSchedule  # Only accept pods that tolerate this taint
  limits:
    cpu: 200
    memory: 4000Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 60s
```
Spot NodePool for Batch Workloads

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-batch
spec:
  template:
    metadata:
      labels:
        workload-type: spot-batch
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-size
          operator: NotIn
          values: ["nano", "micro", "small"]  # Avoid tiny instances
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]  # Both architectures for flexibility
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      taints:
        - key: workload-type
          value: spot-batch
          effect: NoSchedule
  limits:
    cpu: 1000
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 10s
    budgets:
      - nodes: "10%"  # Never disrupt more than 10% of nodes at once
```
Claude Code can generate these configurations and also create corresponding Helm values files for GitOps deployments. Ask it to produce a Helmfile or Kustomize overlay that manages all three NodePools as a single unit.
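As an example of that single-unit approach, a Kustomize entry point might look like the following (the file layout is an assumption for illustration):

```yaml
# kustomization.yaml — manages all three NodePools as one unit (illustrative)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - nodepools/general-purpose.yaml
  - nodepools/memory-optimized.yaml
  - nodepools/spot-batch.yaml
```

Applying `kubectl apply -k .` then creates or updates all three pools atomically from version control.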
Matching Workloads to NodePools
Pods select a NodePool through nodeSelector or nodeAffinity. Here is a deployment using the memory-optimized pool:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-cache
spec:
  replicas: 3
  selector:
    matchLabels:
      app: redis-cache
  template:
    metadata:
      labels:
        app: redis-cache
    spec:
      nodeSelector:
        workload-type: memory  # Matches the NodePool label
      tolerations:
        - key: workload-type
          value: memory
          effect: NoSchedule  # Required to schedule on the tainted pool
      containers:
        - name: redis
          image: redis:7-alpine
          resources:
            requests:
              memory: "4Gi"
              cpu: "500m"
            limits:
              memory: "4Gi"
```
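If you prefer nodeAffinity over nodeSelector (for example, so you can later soften the rule to a preference), an equivalent stanza under the pod spec would be:

```yaml
# Hard-requirement nodeAffinity form, equivalent to the nodeSelector above
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: workload-type
              operator: In
              values: ["memory"]
```

The toleration is still required either way; affinity only tells the scheduler where the pod wants to go, while the taint controls what the pool accepts.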
Implementing Cost Optimization Workflows
Karpenter’s consolidation capability is one of its strongest cost levers. Claude Code can help you implement comprehensive strategies at multiple levels.
Right-Sizing Recommendations
Ask Claude Code to analyze your current workload patterns and recommend NodePool configurations:
```bash
# Collect current node usage data
kubectl top nodes --sort-by=cpu > node-utilization.txt
kubectl get nodes -o json | jq '
  .items[] | {
    name: .metadata.name,
    instance_type: .metadata.labels["node.kubernetes.io/instance-type"],
    capacity_type: .metadata.labels["karpenter.sh/capacity-type"],
    cpu_capacity: .status.capacity.cpu,
    memory_capacity: .status.capacity.memory
  }
' > node-inventory.json

# Collect pod resource requests per node (all containers, not just the first)
kubectl get pods -A -o json | jq '
  [.items[] | select(.spec.nodeName != null) | {
    node: .spec.nodeName,
    pod: .metadata.name,
    namespace: .metadata.namespace,
    cpu_requests: [.spec.containers[] | (.resources.requests.cpu // "0")],
    memory_requests: [.spec.containers[] | (.resources.requests.memory // "0")]
  }]
' > pod-requests.json
```
Feed these files to Claude Code:

```text
Review node-utilization.txt, node-inventory.json, and pod-requests.json.
Identify:
1. Nodes where CPU usage is below 20%; these are consolidation candidates
2. Any instance types being provisioned that are consistently oversized
3. Workloads that have no resource requests set (they block effective bin packing)
Recommend specific NodePool changes to reduce our monthly compute cost.
```
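The 20% screen from step 1 is easy to preview locally before involving the cluster; this sketch runs the same filter over a fabricated `kubectl top nodes`-style sample:

```bash
# Flag nodes under 20% CPU from `kubectl top nodes`-style output.
# The sample data below is fabricated for illustration.
sample='NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
node-a   312m         15%    2Gi             25%
node-b   1800m        90%    6Gi             75%'

echo "$sample" | awk 'NR > 1 {
  cpu = $3; sub(/%/, "", cpu)                 # strip the % sign
  if (cpu + 0 < 20)                           # numeric comparison
    print $1 " is a consolidation candidate (" $3 " CPU)"
}'
# Prints: node-a is a consolidation candidate (15% CPU)
```

Swap the heredoc sample for `kubectl top nodes` output when running against a real cluster.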
Budget Alerts and Automation
Create a Claude Code script that monitors Karpenter-driven costs and triggers alerts:
```bash
#!/bin/bash
# karpenter-cost-monitor.sh: run daily via cron or GitHub Actions
set -euo pipefail

CLUSTER_NAME="production-eks"
BUDGET_THRESHOLD="${COST_THRESHOLD:-1000}"
SNS_TOPIC_ARN="${COST_ALERT_SNS_ARN:?}"

# Get current month's EC2 costs for nodes tagged with our NodePools.
# Note: Cost Explorer's SERVICE dimension value for EC2 instances is
# "Amazon Elastic Compute Cloud - Compute".
COST=$(aws ce get-cost-and-usage \
  --time-period "Start=$(date +%Y-%m-01),End=$(date +%Y-%m-%d)" \
  --granularity MONTHLY \
  --metrics UnblendedCost \
  --filter '{
    "And": [
      {"Dimensions": {"Key": "SERVICE", "Values": ["Amazon Elastic Compute Cloud - Compute"]}},
      {"Tags": {"Key": "karpenter.sh/nodepool", "Values": ["general-purpose", "memory-optimized", "spot-batch"]}}
    ]
  }' \
  --query "ResultsByTime[0].Total.UnblendedCost.Amount" \
  --output text)

echo "Current month Karpenter EC2 cost: \$$COST"

# python3 handles the float comparison portably
if python3 -c "import sys; sys.exit(0 if float('$COST') > $BUDGET_THRESHOLD else 1)"; then
  MESSAGE="Karpenter EC2 cost for ${CLUSTER_NAME} has reached \$${COST} this month, exceeding threshold of \$${BUDGET_THRESHOLD}."
  echo "ALERT: $MESSAGE"
  aws sns publish \
    --topic-arn "$SNS_TOPIC_ARN" \
    --subject "Karpenter Cost Alert: ${CLUSTER_NAME}" \
    --message "$MESSAGE"
fi
```
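One way to schedule karpenter-cost-monitor.sh is a GitHub Actions workflow; the workflow file and secret names below are assumptions for illustration, not part of the setup above:

```yaml
# .github/workflows/karpenter-cost-monitor.yml (illustrative)
name: karpenter-cost-monitor
on:
  schedule:
    - cron: "0 8 * * *"  # daily at 08:00 UTC
jobs:
  monitor:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run cost monitor
        env:
          COST_THRESHOLD: "1000"
          COST_ALERT_SNS_ARN: ${{ secrets.COST_ALERT_SNS_ARN }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: ./karpenter-cost-monitor.sh
```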
Spot Interruption Handling
Karpenter automatically handles Spot interruption notices via the SQS interruption queue, but you should also configure Pod Disruption Budgets to prevent too many pods from being evicted at once:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb
spec:
  minAvailable: 2  # Always keep at least 2 replicas running during disruptions
  selector:
    matchLabels:
      app: api-server
```
Ask Claude Code to audit all your Deployments and generate PDB manifests for any that are missing one:
```text
Review all Deployments in the default and production namespaces.
For any Deployment with 2 or more replicas that does not have a PodDisruptionBudget,
generate a PDB that keeps at least 50% of replicas available at all times.
Output each PDB as a separate YAML file named {deployment-name}-pdb.yaml.
```
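As a sketch of what those generated files contain, here is a hypothetical helper that emits a keep-at-least-half PDB for a given deployment name and replica count:

```bash
# make_pdb: emit a PDB keeping ceil(replicas/2) pods available (illustrative)
make_pdb() {
  local name="$1" replicas="$2"
  local min=$(( (replicas + 1) / 2 ))  # integer ceil(replicas / 2)
  cat <<EOF
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ${name}-pdb
spec:
  minAvailable: ${min}
  selector:
    matchLabels:
      app: ${name}
EOF
}

# A deployment with 5 replicas gets minAvailable: 3
make_pdb api-server 5
```

This assumes the pods carry an `app: <name>` label matching the deployment name; adjust the selector if your labeling scheme differs.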
Disaster Recovery and Migration
Claude Code can also assist with disaster recovery scenarios involving Karpenter. A solid backup and validation workflow ensures you can recover quickly from misconfigurations or cluster failures.
Backup All Karpenter Resources
```bash
#!/usr/bin/env bash
# karpenter-backup.sh
set -euo pipefail

BACKUP_DIR="karpenter-backup-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$BACKUP_DIR"

# Export all Karpenter custom resources
echo "Exporting NodePools..."
kubectl get nodepools -o yaml > "$BACKUP_DIR/nodepools.yaml"

echo "Exporting EC2NodeClasses..."
kubectl get ec2nodeclasses -o yaml > "$BACKUP_DIR/ec2nodeclasses.yaml"

echo "Exporting NodeClaims (currently provisioned nodes)..."
kubectl get nodeclaims -o yaml > "$BACKUP_DIR/nodeclaims.yaml"

# Also export Karpenter controller config
echo "Exporting Karpenter configmaps..."
kubectl get configmap -n karpenter -o yaml > "$BACKUP_DIR/karpenter-configmaps.yaml"

echo "Backup saved to $BACKUP_DIR/"
tar czf "${BACKUP_DIR}.tar.gz" "$BACKUP_DIR/"
echo "Archive: ${BACKUP_DIR}.tar.gz"
```
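Backups accumulate quickly; a retention sweep (assuming the archive naming convention used above) can run in the same cron job:

```bash
# Prune backup archives older than 30 days, matching the naming scheme above
find . -maxdepth 1 -name 'karpenter-backup-*.tar.gz' -mtime +30 -print -delete
```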
Validate Cluster State Before Maintenance
Before applying Karpenter updates or making significant changes, validate the cluster is in a healthy state:
```bash
#!/usr/bin/env bash
# pre-maintenance-check.sh
set -euo pipefail

echo "=== Karpenter Controller Health ==="
kubectl get deployment karpenter -n karpenter
kubectl get pods -n karpenter -o wide
echo ""

echo "=== NodePool Status ==="
kubectl get nodepools
echo ""

echo "=== NodeClaim Status ==="
kubectl get nodeclaims
echo ""

echo "=== Recent Karpenter Events ==="
kubectl get events -n karpenter --sort-by='.lastTimestamp' | tail -20
echo ""

echo "=== Nodes Not Ready ==="
kubectl get nodes | grep -v " Ready " || echo "All nodes are Ready"
echo ""

echo "=== Pending Pods ==="
kubectl get pods -A --field-selector=status.phase=Pending | head -20 || echo "No pending pods"
```
Pipe the output to Claude Code and ask:
```text
Review the pre-maintenance check output and tell me:
1. Is the cluster in a healthy state to apply a Karpenter upgrade?
2. Are there any NodeClaims in an unexpected state?
3. Are the pending pods caused by resource constraints or node taints?
```
Rolling Back a Bad NodePool Configuration
If a NodePool configuration causes nodes to fail to provision, you can restore from backup:
```bash
# Identify the broken NodePool
kubectl describe nodepool general-purpose | tail -30
kubectl get events --field-selector reason=FailedProvisioning -n karpenter

# Roll back to the last known-good configuration
kubectl apply -f karpenter-backup-20260314-120000/nodepools.yaml

# Verify recovery
kubectl get nodepools
kubectl get nodeclaims
```
Monitoring Karpenter with Prometheus and Grafana
Karpenter exposes a Prometheus metrics endpoint. Claude Code can generate the ServiceMonitor and alerting rules needed to ingest these metrics:
```yaml
# karpenter-service-monitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: karpenter
  namespace: karpenter
  labels:
    release: prometheus  # Must match your Prometheus operator selector label
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: karpenter
  endpoints:
    - port: http-metrics
      interval: 30s
      path: /metrics
```
Key Karpenter metrics to alert on:
| Metric | Alert Condition | Meaning |
|---|---|---|
| `karpenter_nodes_total` | Rapid increase | Possible runaway scaling |
| `karpenter_provisioner_scheduling_duration_seconds` | p99 > 60s | Provisioning is slow |
| `karpenter_interruption_actions_performed_total` | Spike | High Spot interruption rate |
| `karpenter_deprovisioning_actions_performed_total` | Stops increasing | Consolidation is stalled |
| `karpenter_nodeclaims_terminating` | Stuck > 10 min | Node drain issue |
Ask Claude Code to generate a PrometheusRule for these alerts, then a Grafana dashboard JSON that visualizes node count, cost per hour, and consolidation efficiency over time.
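A minimal sketch of one such rule, assuming the Prometheus operator CRDs are installed and that the histogram variant of the scheduling-duration metric is exposed:

```yaml
# karpenter-alerts.yaml (illustrative; verify metric names against your Karpenter version)
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: karpenter-alerts
  namespace: karpenter
  labels:
    release: prometheus  # Must match your Prometheus operator selector label
spec:
  groups:
    - name: karpenter
      rules:
        - alert: KarpenterProvisioningSlow
          expr: histogram_quantile(0.99, rate(karpenter_provisioner_scheduling_duration_seconds_bucket[5m])) > 60
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Karpenter p99 provisioning latency above 60s"
```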
Best Practices for Claude Code and Karpenter
To get the most out of your Claude Code and Karpenter integration:
- Version Control Everything
Store all Karpenter configurations in Git and use Claude Code to generate diff reviews before applying changes. A useful pre-apply workflow:
```bash
# Preview what changes will be applied
kubectl diff -f nodepools/ 2>/dev/null | head -100
```
Paste the diff into Claude Code and ask it to flag any changes that could disrupt running workloads.
- Always Use Dry Run Mode
Never apply Claude Code-generated Kubernetes manifests without previewing them first:
```bash
kubectl apply --dry-run=client -f nodepool-update.yaml
kubectl apply --dry-run=server -f nodepool-update.yaml  # Validates against live API
```
- Implement Budget Limits on Every NodePool
A misconfigured workload can cause Karpenter to provision thousands of nodes. Always set limits in your NodePool definitions and configure AWS Service Quotas alerts as a second safety layer.
- Use Weighted NodePools for Spot Fallback
Rather than a single mixed NodePool, use weight to define a preference order, with On-Demand acting as the fallback when Spot capacity is unavailable:
```yaml
# In your Spot NodePool spec:
spec:
  weight: 100  # Higher weight = tried first by Karpenter

# In your On-Demand fallback NodePool spec:
spec:
  weight: 1  # Lower weight = used only when Spot capacity is unavailable
```
- Monitor Continuously
Use Claude Code to generate monitoring dashboards and alerting rules for Karpenter metrics. Schedule the pre-maintenance health check script to run daily and pipe its output into a Slack channel for passive visibility.
- Document Your Workflows
Have Claude Code generate documentation for your Karpenter setup as you build your automation. A simple prompt at the end of each work session:
```text
Review the YAML files I've created today for Karpenter and write a
RUNBOOK.md section explaining: what each NodePool is for,
how workloads select them, and the steps to add a new NodePool type.
```
Troubleshooting Common Issues
Claude Code is particularly useful for troubleshooting because it can correlate information from multiple sources (controller logs, events, node status, and pod conditions) at once.
Pods Stuck in Pending
```bash
# Collect diagnostic data
kubectl describe pod <pending-pod-name> -n <namespace> > pod-describe.txt
kubectl get events -n karpenter --sort-by='.lastTimestamp' > karpenter-events.txt
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter --tail=100 > karpenter-logs.txt
```
Feed all three files to Claude Code:
```text
A pod has been Pending for 10 minutes. Review pod-describe.txt,
karpenter-events.txt, and karpenter-logs.txt and explain why
Karpenter is not provisioning a node for this pod.
Is it a resource request issue, a NodePool selector mismatch,
a taint/toleration problem, or an AWS capacity issue?
```
Node Consolidation Not Working
```bash
kubectl get nodepools -o yaml > nodepools.yaml
kubectl get nodeclaims -o wide > nodeclaims.txt
kubectl top nodes > node-utilization.txt
```
```text
Review nodepools.yaml, nodeclaims.txt, and node-utilization.txt.
I have nodes that appear underutilized but are not being consolidated.
What conditions would prevent Karpenter from consolidating these nodes?
Check for: PodDisruptionBudgets, do-not-disrupt annotations,
non-evictable system pods, or consolidation policy settings.
```
Conclusion
Claude Code transforms Karpenter management from manual configuration to intelligent automation. By using Claude Code’s ability to understand infrastructure requirements and generate valid Kubernetes manifests, you can streamline your EKS operations while maintaining best practices.
The workflows in this guide cover the full operational lifecycle: bootstrapping a Karpenter installation, defining NodePools for different workload profiles, implementing cost optimization and alerting, setting up monitoring, and handling disaster recovery. Each workflow is designed to be composable: start with simple NodePool generation and progressively build more sophisticated automation as your Karpenter deployment grows.
The key is to treat Claude Code as a collaborative partner: describe your requirements clearly, review generated configurations before applying them (always use --dry-run), and let automation handle the repetitive tasks while you focus on architecture decisions. With this approach, you get the speed and cost efficiency of Karpenter without the operational toil of managing complex Kubernetes infrastructure manually.
Try it: Paste your error into our Error Diagnostic for an instant fix.
Related Reading
- Claude Code for EKS IRSA Workflow
- AI Assisted Architecture Design Workflow Guide
- AI Assisted Code Review Workflow Best Practices
Built by theluckystrike. More at zovo.one
Know your costs → Use our Claude Code Cost Calculator to estimate your monthly spend.