The Kubernetes Cost Optimisation Checklist: 47 Ways to Cut Your K8s Bill in 2025

Ajeet Kumar · Platform Engineering Lead, codetoday.io · June 2025 · 15 min read

We took a client from roughly $45,000/month to roughly $18,000/month on EKS in 3 months, a 60% reduction, without touching a single feature. Every item below was verified in a production cluster. Run the kubectl commands, see the savings.

TL;DR — Do These 5 First

#1 Set resource requests on every pod (stops wasted reserved capacity)
#10 Enable Karpenter or Cluster Autoscaler with scale-to-zero node groups
#11 Move non-critical workloads to Spot/Preemptible instances
#12 Enable arm64 (Graviton3 on AWS) for stateless services: about 20% cheaper at comparable performance
#31 Audit cross-AZ traffic: it's $0.01/GB in each direction and usually the biggest hidden cost

60%: Typical cost reduction achievable
47: Actionable checklist items
3 mo: Time to full optimisation
$27K: Monthly saving on $45K cluster

Part 1: Right-Sizing — Items 1–9

Resource requests are the foundation of Kubernetes scheduling. Pods without requests get scheduled on nodes as if they need zero resources — then they're evicted when actual usage exceeds node capacity. Pods with over-specified requests waste reserved node capacity you're paying for.

1. Set resource requests on every pod

Find pods missing requests:

kubectl get pods -A -o json | jq -r '
  .items[] | select(
    any(.spec.containers[]; .resources.requests == null)
  ) | "\(.metadata.namespace)/\(.metadata.name)"
'

Every pod without requests is consuming capacity you can't account for. Fix: add resources.requests to every container spec, sized from actual usage in your metrics (item 4 covers which percentile to use).

2. Set resource limits to prevent noisy-neighbour evictions

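A limit without a request is valid, but Kubernetes then sets the request equal to the limit, which usually over-reserves capacity. Set both explicitly. Use VPA (Vertical Pod Autoscaler) to auto-tune if you have many workloads:

# Install VPA from the kubernetes/autoscaler repository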
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

3. Use VPA in recommendation mode first

kubectl get vpa -A
kubectl describe vpa <name> | grep -A5 "Target:"

VPA in Off mode only recommends — it won't touch running pods. Run it for 7 days and harvest the recommendations before enabling Auto mode.

4. Right-size based on actual p95 usage, not p100

Using p99/p100 for requests means you've reserved capacity for rare spikes. Use p95 + 20% headroom. Check Prometheus:

# p95 CPU usage per container over 7d
quantile_over_time(0.95,
  rate(container_cpu_usage_seconds_total{container!=""}[5m])[7d:5m]
)

5. Enable KEDA for event-driven workloads

KEDA scales to zero for queue consumers, cron-style workloads, and event-driven services. A pod running idle waiting for SQS messages costs money — KEDA eliminates that.

helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda --namespace keda --create-namespace

6. Set memory limits carefully — OOMKill is expensive

Memory OOMKills restart pods, which causes latency spikes and potentially lost in-flight work. Set memory limits at p99 usage + 30%, not p95. CPU throttling is recoverable; OOMKill is not.
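Items 4 and 6 give two sizing rules: p95 + 20% for requests and p99 + 30% for memory limits. The sketch below applies both to a list of usage samples. It is a minimal illustration, assuming a nearest-rank percentile and made-up sample values; the function names are hypothetical and in practice the samples would come from your metrics store.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile: the smallest sample with at least q% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def sized(samples, q, headroom):
    """Percentile q of the samples plus fractional headroom (0.20 means +20%)."""
    return percentile(samples, q) * (1 + headroom)


# Hypothetical samples: 19 steady CPU readings plus one rare spike (millicores),
# and 20 memory readings (MiB).
cpu_millicores = list(range(100, 290, 10)) + [900]
memory_mib = list(range(305, 405, 5))

cpu_request = sized(cpu_millicores, 95, 0.20)   # item 4: p95 + 20%
memory_limit = sized(memory_mib, 99, 0.30)      # item 6: p99 + 30%

print(f"CPU request  ~{cpu_request:.0f}m (p100 would reserve {max(cpu_millicores)}m)")
print(f"Memory limit ~{memory_limit:.0f}Mi")
```

With these samples the single 900m spike is ignored by the p95 rule, so the request ends up far below what sizing for the maximum would reserve.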

7. Namespace resource quotas enforce cost accountability

kubectl apply -f - <<EOF
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-backend
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
EOF

8. LimitRange defaults prevent unbounded pods from teams

Set default requests/limits at namespace level so pods deployed without explicit resource specs still get sensible defaults:

kubectl apply -f - <<EOF
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-backend
spec:
  limits:
  - default:
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:
      cpu: "100m"
      memory: "128Mi"
    type: Container
EOF

9. Delete completed and evicted pods

# Delete completed pods
kubectl delete pod -A --field-selector=status.phase==Succeeded
# Delete failed pods (this includes evicted pods, whose phase is Failed)
kubectl delete pod -A --field-selector=status.phase==Failed

# Or delete only evicted pods (column 1 = namespace, column 2 = pod name)
kubectl get pod -A | grep Evicted | awk '{print $1, $2}' | \
  xargs -L1 bash -c 'kubectl delete pod "$1" -n "$0"'

Part 2: Node Optimisation — Items 10–18

10. Replace Cluster Autoscaler with Karpenter

Karpenter provisions nodes on a per-pod basis — it looks at the actual requirements of pending pods and provisions the exact right instance type, rather than scaling pre-defined node groups. On a mixed workload, Karpenter typically reduces node count by 15–30% vs Cluster Autoscaler.

helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version "0.37.0" \
  --namespace "karpenter" --create-namespace \
  --set "settings.clusterName=my-cluster" \
  --set "settings.interruptionQueue=my-cluster"

11. Use Spot/Preemptible for non-critical workloads

Spot instances (AWS) and Preemptible VMs (GCP) are typically 60–80% cheaper than on-demand, and AWS advertises discounts of up to 90%. Label your node groups and use tolerations/node selectors to route workloads appropriately.

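# Spot NodePool with Karpenter. Capacity type and instance types are
# NodePool requirements, not EC2NodeClass fields.
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: spot
spec:
  template:
    spec:
      nodeClassRef:
        name: default   # placeholder: an existing EC2NodeClass (AMI family, role, subnets, security groups)
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot"]   # or ["spot", "on-demand"] for mixed
      - key: node.kubernetes.io/instance-type
        operator: In
        values: ["m5.xlarge", "m5a.xlarge", "m4.xlarge"]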

12. Graviton3 (arm64) for stateless services — 20% savings

AWS Graviton3 instances (c7g, m7g, r7g) deliver comparable performance to x86 at ~20% lower cost. Most modern container images support multi-arch, but check that every image you run publishes an arm64 variant. Enable arm64 node groups and pin workloads with a node selector:

nodeSelector:
  kubernetes.io/arch: arm64

13. Consolidation policies in Karpenter

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    consolidationPolicy: WhenUnderutilized
    # consolidateAfter is only accepted together with WhenEmpty in the v1beta1 API

Karpenter will terminate underutilised nodes and reschedule pods on fewer, fuller nodes. This alone typically yields 10–20% node count reduction.
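To see why repacking pods onto fewer nodes reduces node count, here is a toy first-fit-decreasing packer. It is a sketch only: it is not Karpenter's actual algorithm, it considers a single resource (CPU cores), and the pod sizes and node capacity are made up for illustration.

```python
def first_fit_decreasing(pod_cpu_requests, node_capacity):
    """Pack pods onto as few equal-sized nodes as first-fit-decreasing manages.

    Returns the number of nodes used. Requests and capacity share one unit (cores).
    """
    free_per_node = []  # remaining capacity of each node opened so far
    for request in sorted(pod_cpu_requests, reverse=True):
        if request > node_capacity:
            raise ValueError(f"pod requesting {request} cannot fit on any node")
        for index, free in enumerate(free_per_node):
            if request <= free:
                free_per_node[index] = free - request
                break
        else:
            # No existing node has room, so open a new one.
            free_per_node.append(node_capacity - request)
    return len(free_per_node)


# Hypothetical workload: nine pods, 16 cores requested in total, 8-core nodes.
pods = [3, 3, 2, 2, 2, 1, 1, 1, 1]
spread_nodes = len(pods)                      # worst case: one pod per node
packed_nodes = first_fit_decreasing(pods, 8)  # after repacking

print(f"one pod per node: {spread_nodes} nodes, packed: {packed_nodes} nodes")
```

The gap between the two numbers is the capacity a consolidation pass can reclaim; real schedulers also have to respect memory, affinity rules, and disruption budgets.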

14. Set node auto-provisioning budget carefully

Don't let dev/staging environments run the same auto-scaling policies as prod. Set explicit min/max node counts per environment and enforce with separate node pools.

15. Schedule dev/staging clusters to scale to zero at night

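# Scale down at 7pm, up at 8am: 13 hours off per weeknight, plus weekends
kubectl scale deploy --all -n dev --replicas=0

# Or automate with a CronJob, paired with a scale-up job at 8am.
# Its service account needs RBAC permission to scale deployments.
# Target the dev namespace only; never scale across all namespaces.
kubectl create cronjob scale-down -n dev --schedule="0 19 * * 1-5" \
  --image=bitnami/kubectl -- kubectl scale deploy --all -n dev --replicas=0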

A 3-node dev cluster at $0.10/node-hour costs about $220/month running 24/7. Running it only from 8am to 7pm on weekdays saves roughly $147/month, about two thirds of the bill.
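The saving from an overnight-and-weekend shutdown comes from counting the hours the nodes are off. A small worked calculation, assuming the cluster runs 08:00 to 19:00 on weekdays only and that a month is 52/12 weeks (the function name is hypothetical):

```python
HOURS_PER_WEEK = 7 * 24
WEEKS_PER_MONTH = 52 / 12


def monthly_saving(nodes, price_per_node_hour, on_hours_per_weekday, weekdays=5):
    """Monthly saving from running nodes only during weekday working hours."""
    on_hours = on_hours_per_weekday * weekdays
    off_hours = HOURS_PER_WEEK - on_hours
    return nodes * price_per_node_hour * off_hours * WEEKS_PER_MONTH


# 3 nodes at $0.10/node-hour, up 11 hours per weekday (08:00 to 19:00).
saving = monthly_saving(nodes=3, price_per_node_hour=0.10, on_hours_per_weekday=11)
always_on = 3 * 0.10 * HOURS_PER_WEEK * WEEKS_PER_MONTH

print(f"always-on cost: ~${always_on:.0f}/month")
print(f"saving:         ~${saving:.0f}/month ({saving / always_on:.0%})")
```

The nodes are off for 113 of the 168 hours in a week, which is where the roughly two-thirds saving comes from.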

16. Delete idle node groups

# List nodes with their EKS managed node group label, oldest first
kubectl get nodes -L eks.amazonaws.com/nodegroup --sort-by=.metadata.creationTimestamp

Node groups with no workloads scheduled for 7+ days should be evaluated for deletion. Check with kubectl describe node for allocated resources.

17. Use Spot interruption handlers (AWS Node Termination Handler)

helm repo add eks https://aws.github.io/eks-charts
helm install aws-node-termination-handler eks/aws-node-termination-handler \
  --namespace kube-system \
  --set enableSpotInterruptionDraining=true

18. Right-size node types — avoid over-provisioned memory nodes

Check memory vs CPU utilisation ratio across your cluster. If memory is consistently 20% utilised but CPU is at 70%, you're paying for unused RAM. Swap memory-optimised instances for compute-optimised.
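One way to turn that ratio into an instance-family choice is to compare the memory-per-vCPU your workloads actually request with what each family provides. The sketch below assumes the usual AWS ratios of roughly 2, 4, and 8 GiB per vCPU for the c, m, and r families; the workload totals and the function name are hypothetical.

```python
# Approximate GiB of memory per vCPU for the main AWS instance families.
FAMILY_GIB_PER_VCPU = {
    "c (compute-optimised)": 2,
    "m (general purpose)": 4,
    "r (memory-optimised)": 8,
}


def pick_family(requested_vcpu, requested_gib):
    """Return the leanest family whose memory-per-vCPU ratio covers the workload's ratio."""
    needed = requested_gib / requested_vcpu
    for family, ratio in sorted(FAMILY_GIB_PER_VCPU.items(), key=lambda item: item[1]):
        if ratio >= needed:
            return family, needed
    # Nothing covers the ratio: fall back to the most memory-heavy family.
    return max(FAMILY_GIB_PER_VCPU, key=FAMILY_GIB_PER_VCPU.get), needed


# Hypothetical cluster-wide totals of right-sized requests.
family, needed = pick_family(requested_vcpu=140, requested_gib=200)
print(f"workload needs {needed:.2f} GiB/vCPU -> {family}")
```

Feed it request totals taken after right-sizing (Part 1); inflated requests will point you at a larger family than you need.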

Part 3: Workload Scheduling — Items 19–25

19. Bin-packing: prefer fewer, fuller nodes

# Keep spread constraints soft (ScheduleAnyway) so they never force an extra node;
# a hard DoNotSchedule constraint works against bin-packing
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: my-app

20. PodDisruptionBudgets prevent over-provisioning for HA

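Set PDBs that still allow Karpenter to consolidate. With no PDB, Karpenter can evict every replica of a service at once. With a PDB that is too strict (for example, minAvailable equal to the replica count), evictions are blocked and nodes stay up even when underutilised. With minAvailable: 1, run at least two replicas.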

kubectl apply -f - <<EOF
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: my-app
EOF
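Whether the PDB above lets a node drain depends on how many healthy replicas exist. A minimal sketch of that arithmetic for an integer minAvailable (the function name is hypothetical, and percentage values and maxUnavailable are not handled):

```python
def disruptions_allowed(healthy_replicas, min_available):
    """Voluntary evictions a PDB with an integer minAvailable permits right now."""
    return max(0, healthy_replicas - min_available)


# With minAvailable: 1, a single-replica Deployment can never be evicted,
# so the node it runs on cannot be consolidated.
for replicas in (1, 2, 3):
    allowed = disruptions_allowed(replicas, min_available=1)
    print(f"{replicas} healthy replica(s) -> {allowed} eviction(s) allowed")
```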

21. HPA with custom metrics (not just CPU)

CPU-based HPA often over-scales. Wire HPA to business metrics — requests/second, queue depth, active sessions — for tighter autoscaling.

22. Set appropriate HPA stabilization windows

spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # 5 min cooldown before scaling down
    scaleUp:
      stabilizationWindowSeconds: 30   # Fast scale-up

23. Use priority classes to protect critical workloads during consolidation

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "Critical production workloads"

24. Evict stale replicas with lifecycle hooks

Add preStop lifecycle hooks to allow graceful shutdown. Pods that don't shut down cleanly block node consolidation for up to the terminationGracePeriodSeconds duration.

25. Use Job/CronJob for batch workloads — not always-on Deployments

Model training, report generation, nightly ETL — these should be Kubernetes Jobs, not Deployments. Jobs scale to zero when done. Deployments keep a pod running 24/7 regardless of whether there's work.

Part 4: Storage Costs — Items 26–30

26. Delete orphaned PersistentVolumes

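# Find Released PVs (their PVC is gone but the volume still exists)
kubectl get pv | grep Released

# Delete them (after confirming data is no longer needed).
# With reclaim policy Retain, also delete the backing EBS volume, or it keeps billing.
kubectl delete pv <pv-name>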

27. Use gp3 EBS volumes over gp2

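gp3 is 20% cheaper per GB than gp2 and lets you configure IOPS and throughput without growing the volume. A gp3 StorageClass covers new volumes; existing gp2 volumes have to be converted in place (for example with aws ec2 modify-volume --volume-type gp3):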

# Create gp3 StorageClass
kubectl apply -f - <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  fsType: ext4
EOF

28. Set PVC reclaim policies to Delete for ephemeral workloads

Dynamically provisioned volumes default to Delete, but manually created PVs and StorageClasses configured with Retain keep the volume after the PVC is deleted, and it keeps billing. For ephemeral test workloads, set reclaimPolicy: Delete.

29. Use EFS only when shared filesystem is genuinely needed

EFS costs $0.30/GB/mo vs EBS gp3 at $0.08/GB/mo. Many teams use EFS out of habit. If workloads don't need shared filesystem access, migrate to EBS.

30. S3 for logs and artefacts, not EBS

Application logs stored on EBS- or EFS-backed PVCs cost $0.08–0.30/GB/mo. S3 Standard is $0.023/GB/mo, and Intelligent-Tiering automatically moves cold data to tiers as cheap as $0.004/GB/mo.
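To compare the storage options in this part side by side, the per-GB prices quoted above can be put in one table. This is a rough sketch: the gp2 price is derived from "gp3 is 20% cheaper", the 1,000 GB volume is hypothetical, and request, IOPS, and throughput charges are ignored.

```python
# Per-GB monthly prices in USD, as quoted in items 27 to 30.
PRICE_PER_GB_MONTH = {
    "EFS": 0.30,
    "EBS gp2": 0.10,   # derived: gp3 at $0.08 is 20% cheaper than gp2
    "EBS gp3": 0.08,
    "S3 Standard": 0.023,
    "S3 Intelligent-Tiering (cold)": 0.004,
}


def monthly_cost(gb, tier):
    """Capacity cost only; request and throughput charges are not included."""
    return gb * PRICE_PER_GB_MONTH[tier]


def monthly_saving(gb, from_tier, to_tier):
    """Monthly capacity saving from moving gb of data between two tiers."""
    return monthly_cost(gb, from_tier) - monthly_cost(gb, to_tier)


data_gb = 1_000  # hypothetical volume of data
for tier in PRICE_PER_GB_MONTH:
    print(f"{tier:<32} ${monthly_cost(data_gb, tier):>7.2f}/month")
print(f"EFS -> gp3 saves ${monthly_saving(data_gb, 'EFS', 'EBS gp3'):.2f}/month")
```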

Part 5: Networking Costs — Items 31–36

31. Audit and reduce cross-AZ traffic — biggest hidden cost

AWS charges $0.01/GB in each direction for cross-AZ data transfer, so every GB that crosses an AZ boundary costs $0.02 in total. In a busy microservices cluster this adds up to thousands per month. Check your VPC Flow Logs:

# AWS CLI: list the flow logs that exist, then query the log data itself
# (CloudWatch Logs Insights or Athena) to find the top cross-AZ talkers
aws ec2 describe-flow-logs \
  --query 'FlowLogs[*].[FlowLogId,ResourceId,LogDestinationType]'

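Fix: enable topology-aware routing on each Service so traffic prefers same-AZ endpoints. It is set as a Service annotation:

# Kubernetes 1.27+; older clusters use service.kubernetes.io/topology-aware-hints: auto
service.kubernetes.io/topology-mode: Auto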

32. Reduce NAT Gateway traffic with VPC endpoints

NAT Gateway charges $0.045/GB processed. Traffic to AWS services (S3, ECR, DynamoDB, SSM) going through NAT Gateway can be moved to VPC endpoints: gateway endpoints for S3 and DynamoDB are free, and interface endpoints cost about $0.01/GB plus a small hourly charge per AZ:

# terraform: S3 Gateway endpoint (free!)
resource "aws_vpc_endpoint" "s3" {
  vpc_id       = aws_vpc.main.id
  service_name = "com.amazonaws.us-east-1.s3"
  route_table_ids = [aws_route_table.private.id]
}
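Items 31 and 32 both come down to a per-GB rate multiplied by monthly volume. The sketch below estimates each, assuming cross-AZ traffic is billed at $0.01/GB on both the sending and the receiving side and counting only the per-GB NAT processing charge. The cross-AZ volume is hypothetical; the NAT volumes are the before and after figures from the case study later in this article.

```python
CROSS_AZ_USD_PER_GB_EACH_WAY = 0.01  # item 31
NAT_USD_PER_GB = 0.045               # item 32


def cross_az_cost(gb_per_month):
    """Cross-AZ transfer, billed on both the sending and the receiving side."""
    return gb_per_month * CROSS_AZ_USD_PER_GB_EACH_WAY * 2


def nat_processing_cost(gb_per_month):
    """Per-GB NAT Gateway processing charge only (hourly gateway charge excluded)."""
    return gb_per_month * NAT_USD_PER_GB


print(f"cross-AZ, 100,000 GB/month: ${cross_az_cost(100_000):,.2f}")
print(f"NAT, 4,200 GB/month:        ${nat_processing_cost(4_200):,.2f}")
print(f"NAT, 820 GB/month:          ${nat_processing_cost(820):,.2f}")
```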

33. Cache ECR pulls with pull-through cache

Every pod that lands on a node without the image cached pulls it from the registry. With NAT Gateway, each pull costs money. Use ECR pull-through cache to cache Docker Hub and public ECR images in your private ECR.

34. Set imagePullPolicy to IfNotPresent on non-dev workloads

imagePullPolicy: IfNotPresent  # Don't pull if already cached locally

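Always makes the kubelet contact the registry on every container start to check for a newer image. Layers already on the node are not downloaded again, but with mutable tags such as latest any change triggers a fresh pull, which means extra network traffic and data transfer cost.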

35. Use internal load balancers for internal services

Services that only communicate with other services in the VPC don't need an Internet-facing ALB. Internal ALBs are the same price but eliminate NAT Gateway hops.

36. Consolidate Ingresses — fewer ALBs

AWS ALB (Application Load Balancer) charges $0.008/LCU-hour + $0.0225/ALB-hour. A cluster with 20 services, each with their own ALB, pays ~$16.40/ALB/month × 20 ≈ $330/month in ALB fixed costs alone, before any LCU charges. Use a single Ingress controller with path-based routing.
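The fixed ALB cost is the hourly rate times the hours in a month. A quick check of that arithmetic, assuming a 730-hour month and ignoring LCU charges, which depend on traffic:

```python
HOURS_PER_MONTH = 730
ALB_USD_PER_HOUR = 0.0225


def alb_fixed_monthly(alb_count):
    """Hourly ALB charge only; LCU charges come on top and vary with traffic."""
    return alb_count * ALB_USD_PER_HOUR * HOURS_PER_MONTH


per_alb = alb_fixed_monthly(1)
twenty_albs = alb_fixed_monthly(20)
print(f"1 ALB:   ${per_alb:.2f}/month")
print(f"20 ALBs: ${twenty_albs:.2f}/month")
print(f"20 -> 1 saves ${twenty_albs - per_alb:.2f}/month")
```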

Part 6: Observability Costs — Items 37–41

37. Reduce Prometheus scrape frequency for non-critical metrics

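Many setups scrape every 15s (Prometheus's built-in default is 1m). For low-traffic services, 60s is fine. Moving 70% of targets from 15s to 60s cuts ingested samples to roughly half (0.3 + 0.7 × 0.25 ≈ 0.48 of the original), assuming targets expose similar numbers of series.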
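The effect of slowing some targets can be estimated before touching any config. A minimal sketch, assuming every target exposes a similar number of series so that sample volume scales with scrape frequency (the function name is hypothetical):

```python
def relative_sample_volume(fraction_slowed, old_interval_s, new_interval_s):
    """Fraction of the original sample volume left after slowing part of the targets."""
    unchanged = 1 - fraction_slowed
    slowed = fraction_slowed * old_interval_s / new_interval_s
    return unchanged + slowed


# 70% of targets moved from a 15s interval to 60s or to 30s.
to_60s = relative_sample_volume(0.7, 15, 60)
to_30s = relative_sample_volume(0.7, 15, 30)
print(f"15s -> 60s on 70% of targets: {to_60s:.1%} of original samples")
print(f"15s -> 30s on 70% of targets: {to_30s:.1%} of original samples")
```

Merely doubling the interval on the same targets leaves about two thirds of the volume, so the size of the step matters as much as how many targets you change.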

38. Set metric retention limits

# Prometheus retention (default: 15d)
--storage.tsdb.retention.time=7d
--storage.tsdb.retention.size=10GB

Most operational questions are answered with 7 days of metrics. Reducing from 15d to 7d halves Prometheus disk usage.

39. Sample logs at the source — not after ingestion

Sending every log line to CloudWatch Logs or Datadog and filtering after ingestion is expensive. Configure Fluent Bit to sample/drop debug logs at source:

[FILTER]
    Name    grep
    Match   *
    Exclude log level:debug

40. Reduce trace sampling rate for high-volume services

For services handling 10K req/s, 100% sampling generates enormous trace volumes. Use adaptive sampling: 100% for errors, 1% for success paths. This alone can cut APM costs by 80%.
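The "100% for errors, 1% for success" rule can be sketched as a keep/drop decision plus the expected fraction of traces retained. This is a simplified fixed-rate illustration rather than any tracing vendor's API: it assumes the decision is made after the outcome of the request is known (tail-based sampling), and the function names and the 2% error rate are hypothetical.

```python
import random


def keep_trace(is_error, success_rate=0.01, rng=random.random):
    """Keep every error trace and a fixed fraction of successful ones."""
    return is_error or rng() < success_rate


def expected_kept_fraction(error_rate, success_rate=0.01):
    """Expected share of all traces retained under the rule above."""
    return error_rate + (1 - error_rate) * success_rate


# Hypothetical service where 2% of requests fail.
kept = expected_kept_fraction(error_rate=0.02)
print(f"traces kept: {kept:.2%} of total volume")
```

Trace volume and APM cost do not always move in lockstep, since many vendors also bill per host or per indexed span.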

41. Use Thanos or Cortex for long-term metrics storage on S3

Storing metrics in S3 via Thanos costs ~$0.023/GB vs $0.10–0.30/GB in managed monitoring services. For large clusters, this represents 70–90% cost reduction for historical metrics.

Part 7: Container Image Sizes — Items 42–44

42. Use distroless or scratch base images

A full Ubuntu base image is ~70MB. A distroless base image is ~20MB (Java variants are larger because they bundle a JRE). A scratch Go binary is 5–15MB. Smaller images = faster pulls = less ECR bandwidth cost and faster pod cold starts.

43. Multi-stage builds eliminate build tooling from runtime images

# Multi-stage: build stage has full SDK, runtime is minimal
FROM golang:1.22 AS build
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 go build -o /bin/app ./cmd/app

FROM gcr.io/distroless/static-debian12
COPY --from=build /bin/app /app
ENTRYPOINT ["/app"]

44. Enable layer caching in CI — don't rebuild layers that didn't change

# GitHub Actions with layer cache (the gha cache backend needs Buildx)
- uses: docker/setup-buildx-action@v3
- uses: docker/build-push-action@v5
  with:
    cache-from: type=gha
    cache-to: type=gha,mode=max

Part 8: Namespace Chargeback — Items 45–47

45. Install OpenCost for per-namespace cost visibility

helm repo add opencost https://opencost.github.io/opencost-helm-chart
helm install opencost opencost/opencost \
  --namespace opencost --create-namespace

OpenCost provides real-time cost per namespace, deployment, label, and team. Without visibility, teams can't be accountable. This is the prerequisite for everything else.

46. Label all resources with team, environment, and cost-centre

kubectl label namespace team-backend \
  team=backend \
  env=production \
  cost-centre=eng-platform

OpenCost and Kubecost attribute cost by Kubernetes labels, while AWS Cost Explorer relies on AWS cost allocation tags, so use the same keys in both. Without consistent labelling, you can't allocate costs to teams.

47. Monthly FinOps review — make cost visible to eng teams

The most impactful change is cultural, not technical. Put a $X/day cost counter on the team's dashboard. When engineers see that their batch job runs 24/7 instead of on-demand and costs $800/month, they fix it. Invisible costs don't get optimised.

Real World: $45K → $18K in 3 Months

A Series C e-commerce client came to us with a $45,000/month EKS bill for a cluster serving 50 million monthly users. Their engineering team knew K8s well but had never done a formal FinOps pass. Here's what we found and fixed:

Starting State (Month 0)

• 47 nodes (m5.2xlarge, on-demand) — avg 23% CPU, 31% memory utilisation
• 0 pods with resource requests/limits
• NAT Gateway processing: 4.2TB/month ($189/month)
• ECR pulls: 850GB/month via NAT ($38/month)
• CloudWatch Logs: 1.2TB/month ($600/month)
• 23 individual ALBs — one per microservice ($520/month in fixed costs)
• Dev/staging running 24/7 (same spec as prod)

Month 1: Foundation (Savings: $8,200/month)

  • Set resource requests on all 340 pods (3 days of work with VPA recommendations)
  • Deployed Karpenter, replaced 3 managed node groups. Node count dropped from 47 to 31
  • Switched all nodes to mixed Spot/On-demand (70/30 split)
  • Added Spot interruption handler — zero incidents in first month
  • Scaled dev/staging to zero nights + weekends via CronJob

Month 2: Networking & Storage (Savings: $11,400/month)

  • Added topology-aware routing — cross-AZ traffic reduced 65%
  • Deployed VPC endpoints for S3, ECR, DynamoDB, SSM — NAT Gateway processing dropped 80%
  • Consolidated 23 ALBs to 3 (by environment) using Ingress path routing — $450/month saved
  • Migrated all gp2 EBS to gp3 — 20% storage cost reduction
  • Deleted 47 orphaned PVs totalling 1.8TB

Month 3: Observability & Polish (Savings: $7,600/month)

  • Reduced CloudWatch Logs ingestion 70% with Fluent Bit source filtering
  • Moved Prometheus long-term storage to S3 via Thanos — 3x cheaper than managed APM
  • Reduced trace sampling from 100% to adaptive (100% errors, 2% success)
  • Deployed OpenCost — teams can now see their own costs daily
  • Migrated 12 stateless services to Graviton3 (arm64) — 18% compute saving

Metric | Before | After | Saving
Monthly AWS bill | $45,200 | $18,100 | $27,100 (60%)
Node count | 47 | 21 avg | 55% fewer nodes
Spot usage | 0% | 68% |
NAT Gateway (GB/mo) | 4,200 | 820 | 80% reduction
CloudWatch Logs (GB/mo) | 1,200 | 360 | 70% reduction
ALB count | 23 | 3 | $450/mo fixed costs

Full 47-Item Checklist

# | Item | Category | Effort | Typical Saving
1 | Set resource requests on all pods | Right-sizing | Med | 10–20%
2 | Set resource limits | Right-sizing | Low | 5%
3 | Deploy VPA in recommendation mode | Right-sizing | Med | 10–15%
4 | Use p95 + 20% for request sizing | Right-sizing | Low | 8%
5 | KEDA for event-driven scale-to-zero | Right-sizing | Med | 15–30%
6 | Memory limits at p99+30% | Right-sizing | Low | Stability
7 | Namespace ResourceQuotas | Right-sizing | Low | Governance
8 | LimitRange defaults per namespace | Right-sizing | Low | Safety net
9 | Delete completed/evicted pods | Right-sizing | Low | Hygiene
10 | Replace CA with Karpenter | Nodes | High | 15–30%
11 | Spot/Preemptible for non-critical | Nodes | Med | 60–80%/node
12 | Graviton3 arm64 stateless services | Nodes | Med | 20%
13 | Karpenter consolidation policies | Nodes | Low | 10–20%
14 | Per-env min/max node counts | Nodes | Low | Governance
15 | Scale dev/staging to zero at night | Nodes | Low | $200–500/mo
16 | Delete idle node groups | Nodes | Low | Variable
17 | Spot interruption handler | Nodes | Low | Stability
18 | Right-size node instance types | Nodes | Med | 10–15%
19 | Bin-packing topology spread | Scheduling | Low | 10%
20 | PodDisruptionBudgets for consolidation | Scheduling | Low | Enables 10–13
21 | Custom metric HPA | Scheduling | Med | 10–20%
22 | HPA stabilization windows | Scheduling | Low | 5%
23 | Priority classes for prod workloads | Scheduling | Low | Safety
24 | Lifecycle preStop hooks | Scheduling | Low | Stability
25 | Jobs not Deployments for batch | Scheduling | Med | Variable
26 | Delete orphaned PVs | Storage | Low | Variable
27 | Migrate gp2 → gp3 | Storage | Low | 20%
28 | Reclaim policy Delete for ephemeral | Storage | Low | Hygiene
29 | EFS only for shared filesystem | Storage | Med | Variable
30 | S3 for logs/artefacts not EBS | Storage | Med | 70–90%
31 | Topology-aware routing (same-AZ) | Networking | Med | 50–80% xAZ cost
32 | VPC endpoints for AWS services | Networking | Low | $100–500/mo
33 | ECR pull-through cache | Networking | Low | ECR bandwidth
34 | imagePullPolicy IfNotPresent | Networking | Low | 5%
35 | Internal LBs for internal services | Networking | Low | NAT savings
36 | Consolidate Ingresses / ALBs | Networking | Med | $100–500/mo
37 | Reduce Prometheus scrape frequency | Observability | Low | 30–50% storage
38 | Set metric retention limits | Observability | Low | 30–50%
39 | Sample logs at source (Fluent Bit) | Observability | Med | 40–70%
40 | Adaptive trace sampling | Observability | Med | 60–80% APM
41 | Thanos/Cortex for S3 metric storage | Observability | High | 70–90%
42 | Distroless/scratch base images | Images | Med | Startup speed
43 | Multi-stage builds | Images | Med | Pull bandwidth
44 | Layer caching in CI | Images | Low | CI minutes
45 | OpenCost for cost visibility | Chargeback | Low | Enables all
46 | Label all resources (team, env, cc) | Chargeback | Med | Accountability
47 | Monthly FinOps review with eng teams | Chargeback | Low | Cultural driver

Want us to run this checklist on your cluster?

We do a 2-week K8s FinOps audit — identify savings, prioritise the top 10 items, and build a reduction roadmap. No obligation, transparent pricing.
