The Kubernetes Cost Optimisation Checklist: 47 Ways to Cut Your K8s Bill in 2025

Ajeet Kumar · Platform Engineering Lead, codetoday.io · June 2025 · 15 min read

We took a client from $45,000/month to $18,200/month on EKS in 3 months — a 60% reduction — without touching a single feature. Every item below was verified in a production cluster. Run the kubectl commands, see the savings.

TL;DR — Do These 5 First

#1 Set resource requests on every pod (stops wasted reserved capacity)
#10 Enable Karpenter or Cluster Autoscaler with scale-to-zero node groups
#11 Move non-critical workloads to Spot/Preemptible instances
#12 Enable arm64 (Graviton3 on AWS) for stateless services: roughly 20% cheaper at comparable performance
#31 Audit cross-AZ traffic: it's billed at $0.01/GB in each direction and is often the biggest hidden cost

60% · Typical cost reduction achievable
47 · Actionable checklist items
3 mo · Time to full optimisation
$27K · Monthly saving on $45K cluster

Tags: Kubernetes · FinOps · Karpenter · EKS · Spot Instances · OpenCost

Part 1: Right-Sizing — Items 1–9

Resource requests are the foundation of Kubernetes scheduling. Pods without requests are scheduled as if they need zero resources, land in the BestEffort QoS class, and are the first to be evicted when a node comes under resource pressure. Pods with over-specified requests reserve node capacity that you pay for but never use.

1. Set resource requests on every pod

Find pods missing requests:

kubectl get pods -A -o json | jq -r '
  .items[] | select(
    any(.spec.containers[]; .resources.requests == null)
  ) | "\(.metadata.namespace)/\(.metadata.name)"
'

Every pod without requests consumes capacity you can't account for. Fix: add resources.requests to every container spec. Start from actual usage in your metrics, then tighten towards the p95-based sizing in item 4.
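If jq is not available, the same check can be sketched in Python. This is a minimal illustration, not the author's tooling: the function name and the sample data are hypothetical, and the sample only mimics the shape of `kubectl get pods -A -o json` output.

```python
import json

def pods_missing_requests(pod_list):
    """Return "namespace/name" for every pod with a container lacking resources.requests."""
    missing = []
    for pod in pod_list.get("items", []):
        containers = pod.get("spec", {}).get("containers", [])
        # A pod counts as missing if ANY of its containers has no requests set.
        if any(not (c.get("resources") or {}).get("requests") for c in containers):
            meta = pod["metadata"]
            missing.append(f"{meta['namespace']}/{meta['name']}")
    return missing

# Hypothetical sample shaped like `kubectl get pods -A -o json` output.
SAMPLE = json.loads("""
{"items": [
  {"metadata": {"namespace": "team-backend", "name": "api-1"},
   "spec": {"containers": [{"name": "api",
                            "resources": {"requests": {"cpu": "100m"}}}]}},
  {"metadata": {"namespace": "team-backend", "name": "worker-1"},
   "spec": {"containers": [{"name": "worker", "resources": {}}]}}
]}
""")

print(pods_missing_requests(SAMPLE))
```

Against a real cluster you would load the JSON from standard input (for example with json.load(sys.stdin)) instead of the embedded sample.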

2. Set resource limits to prevent noisy-neighbour evictions

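A limit without a request is valid, but Kubernetes then sets the request equal to the limit, which usually over-reserves capacity. Set both explicitly. Use VPA (Vertical Pod Autoscaler) to auto-tune if you have many workloads:

# Install VPA (from the kubernetes/autoscaler project)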
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

3. Use VPA in recommendation mode first

kubectl get vpa -A
kubectl describe vpa <name> | grep -A5 "Target:"

VPA in Off mode only recommends — it won't touch running pods. Run it for 7 days and harvest the recommendations before enabling Auto mode.

4. Right-size based on actual p95 usage, not p100

Using p99/p100 for requests means you've reserved capacity for rare spikes. Use p95 + 20% headroom. Check Prometheus:

# p95 CPU usage per container over 7d
quantile_over_time(0.95,
  rate(container_cpu_usage_seconds_total{container!=""}[5m])[7d:5m]
)
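The sizing rule itself is easy to sketch once you have exported usage samples. The example below assumes a nearest-rank percentile and hypothetical usage numbers; the function names are illustrative and not part of any Kubernetes or Prometheus API.

```python
def percentile(samples, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    if not samples:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples)
    rank = -(-pct * len(ordered) // 100)  # ceiling division without floats
    return ordered[max(rank, 1) - 1]

def recommended_request(samples, pct=95, headroom=0.20):
    """Size a request as the pct-th percentile of usage plus fractional headroom."""
    return percentile(samples, pct) * (1 + headroom)

# Hypothetical CPU usage in millicores: steady load plus one rare spike.
usage = [100 + 5 * i for i in range(19)] + [900]

print(round(recommended_request(usage, pct=95)))   # sized for normal load
print(round(recommended_request(usage, pct=100)))  # sized for the spike
```

With this sample, the p100-based request is several times larger than the p95-based one, which is the reserved-but-idle capacity the item warns about.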

5. Enable KEDA for event-driven workloads

KEDA scales to zero for queue consumers, cron-style workloads, and event-driven services. A pod running idle waiting for SQS messages costs money — KEDA eliminates that.

helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda --namespace keda --create-namespace

6. Set memory limits carefully — OOMKill is expensive

Memory OOMKills restart pods, which causes latency spikes and potentially lost in-flight work. Set memory limits at p99 usage + 30%, not p95. CPU throttling is recoverable; OOMKill is not.

7. Namespace resource quotas enforce cost accountability

kubectl apply -f - <<EOF
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-backend
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
EOF

8. LimitRange defaults prevent unbounded pods from teams

Set default requests/limits at namespace level so pods deployed without explicit resource specs still get sensible defaults:

kubectl apply -f - <<EOF
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-backend
spec:
  limits:
  - default:
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:
      cpu: "100m"
      memory: "128Mi"
    type: Container
EOF

9. Delete completed and evicted pods

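# Delete completed and failed pods (evicted pods have phase Failed, so the second command covers them)
kubectl delete pod -A --field-selector=status.phase==Succeeded
kubectl delete pod -A --field-selector=status.phase==Failed

# Or target evicted pods only; bash -c receives the namespace as $0 and the pod name as $1
kubectl get pod -A --no-headers | grep Evicted | awk '{print $1, $2}' | \
  xargs -L1 bash -c 'kubectl delete pod $1 -n $0'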

Part 2: Node Optimisation — Items 10–18

10. Replace Cluster Autoscaler with Karpenter

Karpenter provisions nodes directly from pod requirements: it looks at what pending pods actually need and launches instance types that fit, rather than scaling pre-defined node groups. On a mixed workload, Karpenter typically reduces node count by 15–30% vs Cluster Autoscaler.

helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version "0.37.0" \
  --namespace "karpenter" --create-namespace \
  --set "settings.clusterName=my-cluster" \
  --set "settings.interruptionQueue=my-cluster"

11. Use Spot/Preemptible for non-critical workloads

Spot instances (AWS) and Preemptible VMs (GCP) are 60–90% cheaper than on-demand. Label your node groups and use tolerations/node selectors to route workloads appropriately.

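# Spot NodePool with Karpenter: capacity type and instance types are NodePool
# requirements, not EC2NodeClass fields (amiFamily stays on the EC2NodeClass)
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: spot
spec:
  template:
    spec:
      # Assumes an existing EC2NodeClass named "default"
      nodeClassRef: {apiVersion: karpenter.k8s.aws/v1beta1, kind: EC2NodeClass, name: default}
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot"]   # or ["spot", "on-demand"] for mixed
      - key: node.kubernetes.io/instance-type
        operator: In
        values: ["m5.xlarge", "m5a.xlarge", "m4.xlarge"]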

12. Graviton3 (arm64) for stateless services — 20% savings

AWS Graviton3 instances (c7g, m7g, r7g) are priced roughly 20% below comparable x86 instances and perform comparably for most stateless services. Most modern container images are published multi-arch. Enable arm64 node groups and pin workloads to them with a node selector:

nodeSelector:
  kubernetes.io/arch: arm64

13. Consolidation policies in Karpenter

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    consolidationPolicy: WhenUnderutilized
    # consolidateAfter is only accepted with WhenEmpty in v1beta1, so it is omitted here

Karpenter will terminate underutilised nodes and reschedule pods on fewer, fuller nodes. This alone typically yields 10–20% node count reduction.

14. Set node auto-provisioning budget carefully

Don't let dev/staging environments run the same auto-scaling policies as prod. Set explicit min/max node counts per environment and enforce with separate node pools.

15. Schedule dev/staging clusters to scale to zero at night

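# Scale down at 7pm, up at 8am: 13 hours off every weeknight, plus weekends
kubectl scale deploy --all -n dev --replicas=0

# Or use a CronJob to automate (its service account needs RBAC permission to scale deployments)
kubectl create cronjob scale-down -n dev --schedule="0 19 * * 1-5" \
  --image=bitnami/kubectl -- kubectl scale deploy --all -n dev --replicas=0

A 3-node dev cluster at $0.10/node-hour is off for about 113 hours a week on this schedule, which saves roughly $150/month, provided the autoscaler removes the emptied nodes.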
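The saving from an off-hours schedule is plain arithmetic, sketched below under stated assumptions: the cluster runs only 8am to 7pm on weekdays, a month is 52/12 weeks, and the nodes are actually removed while workloads are scaled to zero. Function names are illustrative.

```python
HOURS_PER_WEEK = 7 * 24

def off_hours_per_week(up_hour=8, down_hour=19, workdays=5):
    """Hours per week the cluster is off when it only runs up_hour..down_hour on workdays."""
    on_hours = workdays * (down_hour - up_hour)
    return HOURS_PER_WEEK - on_hours

def monthly_saving(nodes, price_per_node_hour, off_hours):
    """Node-hours avoided per month, priced at the on-demand hourly rate."""
    weeks_per_month = 52 / 12
    return nodes * price_per_node_hour * off_hours * weeks_per_month

off = off_hours_per_week()
print(off)                                   # weekly hours with no nodes running
print(round(monthly_saving(3, 0.10, off)))   # dollars per month for 3 nodes at $0.10/hour
```

Under these assumptions the schedule removes 113 of the 168 hours in a week, so about two thirds of the dev cluster's node cost.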

16. Delete idle node groups

kubectl get nodes -L eks.amazonaws.com/nodegroup --sort-by=.metadata.creationTimestamp

Node groups with no workloads scheduled for 7+ days should be evaluated for deletion. Check the allocated resources on each node with kubectl describe node.

17. Use Spot interruption handlers (AWS Node Termination Handler)

helm repo add eks https://aws.github.io/eks-charts
helm install aws-node-termination-handler eks/aws-node-termination-handler \
  --namespace kube-system \
  --set enableSpotInterruptionDraining=true

18. Right-size node types — avoid over-provisioned memory nodes

Check memory vs CPU utilisation ratio across your cluster. If memory is consistently 20% utilised but CPU is at 70%, you're paying for unused RAM. Swap memory-optimised instances for compute-optimised.

Part 3: Workload Scheduling — Items 19–25

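19. Bin-packing: prefer fewer, fuller nodes

Topology spread constraints spread pods out; they do not pack them. Packing comes from Karpenter consolidation (item 13) or from a scheduler scoring strategy such as MostAllocated. What you control per workload is keeping spread constraints soft, so that they never force extra nodes:

# Soft topology spread: ScheduleAnyway keeps spreading best-effort so it does not block packing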
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: my-app
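To see why packing matters for the bill, here is a minimal first-fit-decreasing sketch. It is a one-dimensional illustration (CPU only) with hypothetical pod sizes; it is not how kube-scheduler or Karpenter is implemented, both of which weigh memory and many other constraints.

```python
def first_fit_decreasing(pod_cpu_requests, node_cpu_capacity):
    """Return how many nodes first-fit-decreasing needs for the given CPU requests (millicores)."""
    free_per_node = []  # remaining capacity on each node opened so far
    for request in sorted(pod_cpu_requests, reverse=True):
        if request > node_cpu_capacity:
            raise ValueError(f"pod requesting {request}m cannot fit on any node")
        for i, free in enumerate(free_per_node):
            if request <= free:
                free_per_node[i] = free - request
                break
        else:
            # No existing node has room, so open a new one.
            free_per_node.append(node_cpu_capacity - request)
    return len(free_per_node)

pods = [2000, 1500, 1500, 1000, 1000, 500, 500]  # hypothetical requests, 8000m in total
print(first_fit_decreasing(pods, node_cpu_capacity=4000))
```

The seven pods fit on two 4000m nodes when packed, whereas a strict one-pod-per-node spread would need seven.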

20. PodDisruptionBudgets prevent over-provisioning for HA

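Set PDBs that leave Karpenter room to consolidate. A PDB that allows zero disruptions (for example, minAvailable equal to the replica count) blocks node drains, so nodes stay up even when underutilised; with no PDB at all, consolidation can evict every replica of a service at once.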

kubectl apply -f - <<EOF
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: my-app
EOF
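Whether a PDB like this one blocks consolidation depends on the replica count. A small sketch of the arithmetic, assuming all replicas are healthy (the function name is illustrative; the result corresponds to the ALLOWED DISRUPTIONS column of kubectl get pdb):

```python
def allowed_disruptions(healthy_replicas, min_available):
    """Voluntary evictions a minAvailable PDB permits right now (never negative)."""
    return max(0, healthy_replicas - min_available)

for replicas in (1, 2, 3):
    evictable = allowed_disruptions(replicas, min_available=1)
    print(f"{replicas} replica(s), minAvailable=1 -> {evictable} evictable")
```

With a single replica, minAvailable: 1 leaves nothing evictable, so the node hosting that pod cannot be drained voluntarily; from two replicas upwards the same PDB lets consolidation proceed one pod at a time.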

21. HPA with custom metrics (not just CPU)

CPU-based HPA often over-scales. Wire HPA to business metrics — requests/second, queue depth, active sessions — for tighter autoscaling.

22. Set appropriate HPA stabilization windows

spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # 5 min cooldown before scaling down (the default)
    scaleUp:
      stabilizationWindowSeconds: 30   # Short scale-up delay (default is 0, i.e. immediate)
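The scale-down window works by remembering recent recommendations and acting on the highest one. The sketch below illustrates that rule with hypothetical data; it is a simplification for intuition, not the controller's actual code.

```python
def stabilized_scale_down(recommendations, now, window_seconds=300):
    """Pick the replica count for a scale-down.

    recommendations: list of (timestamp_seconds, desired_replicas) tuples.
    The highest recommendation inside the window wins, so a brief dip in
    load cannot shrink the workload immediately.
    """
    recent = [replicas for ts, replicas in recommendations
              if 0 <= now - ts <= window_seconds]
    if not recent:
        raise ValueError("no recommendations inside the window")
    return max(recent)

history = [(0, 10), (60, 6), (120, 4), (300, 3)]  # hypothetical (time, desired replicas)
for now in (300, 330, 450):
    print(now, stabilized_scale_down(history, now))
```

A longer window keeps more replicas running after a peak, which protects against flapping but also delays the saving, so it is worth tuning per workload.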

23. Use priority classes to protect critical workloads during consolidation

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "Critical production workloads"

24. Evict stale replicas with lifecycle hooks

Add preStop lifecycle hooks to allow graceful shutdown. Pods that don't shut down cleanly block node consolidation for up to the terminationGracePeriodSeconds duration.

25. Use Job/CronJob for batch workloads — not always-on Deployments

Model training, report generation, nightly ETL — these should be Kubernetes Jobs, not Deployments. Jobs scale to zero when done. Deployments keep a pod running 24/7 regardless of whether there's work.

Part 4: Storage Costs — Items 26–30

26. Delete orphaned PersistentVolumes

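# Find Released PVs (their PVC was deleted, but the volume still exists)
kubectl get pv | grep Released

# Delete them (after confirming data is no longer needed).
# With reclaim policy Retain this removes only the PV object; delete the underlying EBS volume separately.
kubectl delete pv <pv-name>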

27. Use gp3 EBS volumes over gp2

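gp3 is 20% cheaper than gp2 per GB and lets you configure IOPS and throughput independently of volume size. A new StorageClass only affects newly provisioned volumes; existing gp2 volumes have to be converted separately (for example with aws ec2 modify-volume):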

# Create gp3 StorageClass
kubectl apply -f - <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  fsType: ext4
EOF

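28. Set volume reclaim policies to Delete for ephemeral workloads

Dynamically provisioned volumes default to Delete, but manually created PVs and StorageClasses configured with reclaimPolicy: Retain keep the volume after the PVC is deleted, and it keeps billing. For ephemeral test workloads, set reclaimPolicy: Delete on the StorageClass.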

29. Use EFS only when shared filesystem is genuinely needed

EFS costs $0.30/GB/mo vs EBS gp3 at $0.08/GB/mo. Many teams use EFS out of habit. If workloads don't need shared filesystem access, migrate to EBS.

30. S3 for logs and artefacts, not EBS

Application logs stored on EBS PVCs cost $0.08–0.30/GB/mo. S3 Standard is $0.023/GB/mo with Intelligent-Tiering moving cold data to $0.004/GB/mo automatically.

Part 5: Networking Costs — Items 31–36

31. Audit and reduce cross-AZ traffic — biggest hidden cost

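AWS charges $0.01/GB in each direction for cross-AZ data transfer, so every gigabyte that crosses a zone boundary costs $0.02 in total. In a busy microservices cluster this adds up to thousands per month. Start by confirming that VPC Flow Logs are enabled, then query the log records (for example with Athena or CloudWatch Logs Insights) to find the top cross-AZ talkers:

# AWS CLI: list the flow logs configured in this account and region
aws ec2 describe-flow-logs \
  --query 'FlowLogs[*].[FlowLogId,ResourceId,LogDestinationType]'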

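Fix: use topology-aware routing so pods prefer same-AZ endpoints. On Kubernetes 1.27+ the Service annotation is service.kubernetes.io/topology-mode; older clusters use the deprecated hints annotation:

service.kubernetes.io/topology-mode: Auto         # Kubernetes 1.27+
service.kubernetes.io/topology-aware-hints: auto  # older clusters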
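To size the problem before changing routing, a rough estimate helps. The sketch assumes the $0.01/GB rate applies in each direction (so each gigabyte crossing zones is billed twice) and uses a hypothetical 50 TB/month of cross-AZ traffic; the 65% reduction is the figure reported in the case study later in this article.

```python
def cross_az_monthly_cost(gb_per_month, rate_per_gb=0.01, billed_directions=2):
    """Monthly cross-AZ transfer cost; each GB is billed on egress and on ingress."""
    return gb_per_month * rate_per_gb * billed_directions

traffic_gb = 50_000  # hypothetical: 50 TB/month crossing AZ boundaries
before = cross_az_monthly_cost(traffic_gb)
after = cross_az_monthly_cost(traffic_gb * (1 - 0.65))

print(round(before), round(after), round(before - after))
```

Plug in the volume from your own flow-log analysis; the result scales linearly with traffic.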

32. Reduce NAT Gateway traffic with VPC endpoints

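NAT Gateway charges $0.045/GB processed. Traffic to AWS services (S3, ECR, DynamoDB, SSM) that goes through the NAT Gateway can use VPC endpoints instead: gateway endpoints for S3 and DynamoDB are free, and interface endpoints for services such as ECR and SSM cost about $0.01/GB plus an hourly charge per AZ: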

# terraform: S3 Gateway endpoint (free!)
resource "aws_vpc_endpoint" "s3" {
  vpc_id       = aws_vpc.main.id
  service_name = "com.amazonaws.us-east-1.s3"
  route_table_ids = [aws_route_table.private.id]
}
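Gateway endpoints are free, but interface endpoints carry an hourly charge, so they only pay off above a certain traffic volume. The break-even sketch below assumes $0.01 per endpoint per AZ per hour and a 730-hour month; both are assumptions to check against current AWS pricing for your region, and the function name is illustrative.

```python
def interface_endpoint_breakeven_gb(azs, nat_per_gb=0.045, endpoint_per_gb=0.01,
                                    endpoint_per_az_hour=0.01, hours_per_month=730):
    """Monthly GB above which one interface endpoint is cheaper than routing via NAT."""
    fixed_monthly = azs * endpoint_per_az_hour * hours_per_month
    saving_per_gb = nat_per_gb - endpoint_per_gb
    return fixed_monthly / saving_per_gb

for azs in (1, 2, 3):
    print(azs, "AZ(s):", round(interface_endpoint_breakeven_gb(azs)), "GB/month")
```

Services that move less than the break-even volume through NAT are cheaper left as they are.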

33. Cache ECR pulls with pull-through cache

Every pod cold-start pulls a container image. With NAT Gateway, each pull costs money. Use ECR pull-through cache to cache Docker Hub and public ECR images in your private ECR.

34. Set imagePullPolicy to IfNotPresent on non-dev workloads

imagePullPolicy: IfNotPresent  # Don't pull if already cached locally

Setting Always forces a pull on every pod creation — unnecessary network traffic and ECR data transfer cost.

35. Use internal load balancers for internal services

Services that only communicate with other services in the VPC don't need an Internet-facing ALB. Internal ALBs are the same price but eliminate NAT Gateway hops.

36. Consolidate Ingresses — fewer ALBs

AWS ALB (Application Load Balancer) charges $0.008/LCU-hour + $0.0225/ALB-hour. The hourly charge alone is about $16.40 per ALB per month (730 hours), so a cluster with 20 services, each with its own ALB, pays roughly $330/month in fixed costs before any LCU charges. Use a single Ingress controller with path-based routing.
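The fixed part of the ALB bill can be checked with a few lines of arithmetic. This sketch uses the $0.0225/ALB-hour rate quoted above and assumes a 730-hour month; LCU charges depend on traffic and are left out.

```python
HOURS_PER_MONTH = 730  # average month

def alb_fixed_monthly_cost(alb_count, hourly_rate=0.0225):
    """Hourly ALB charges per month, excluding traffic-dependent LCU charges."""
    return alb_count * hourly_rate * HOURS_PER_MONTH

one = alb_fixed_monthly_cost(1)
twenty = alb_fixed_monthly_cost(20)
print(f"1 ALB: ${one:.2f}/month, 20 ALBs: ${twenty:.2f}/month, "
      f"saved by consolidating to one: ${twenty - one:.2f}/month")
```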

Part 6: Observability Costs — Items 37–41

37. Reduce Prometheus scrape frequency for non-critical metrics

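Many setups scrape every 15s. For low-traffic services, 60s is fine. Moving 70% of targets from 15s to 60s cuts ingested samples by about half (roughly 52%, assuming targets expose similar numbers of series), with a similar reduction in Prometheus storage.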
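The effect of slowing down part of the fleet can be estimated before touching any scrape config. The sketch assumes every target exposes a similar number of series, which is rarely exactly true, so treat the result as a rough guide.

```python
def relative_sample_volume(fraction_moved, old_interval_s, new_interval_s):
    """Sample volume after the change, as a fraction of the original volume."""
    unchanged = 1 - fraction_moved
    slowed = fraction_moved * old_interval_s / new_interval_s
    return unchanged + slowed

to_60s = relative_sample_volume(0.70, old_interval_s=15, new_interval_s=60)
to_30s = relative_sample_volume(0.70, old_interval_s=15, new_interval_s=30)
print(f"70% of targets at 60s: {1 - to_60s:.1%} fewer samples")
print(f"70% of targets at 30s: {1 - to_30s:.1%} fewer samples")
```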

38. Set metric retention limits

# Prometheus retention (default: 15d)
--storage.tsdb.retention.time=7d
--storage.tsdb.retention.size=10GB

Most operational questions are answered with 7 days of metrics. Reducing from 15d to 7d halves Prometheus disk usage.

39. Sample logs at the source — not after ingestion

Sending every log line to CloudWatch Logs or Datadog and filtering after ingestion is expensive. Configure Fluent Bit to sample/drop debug logs at source:

[FILTER]
    Name    grep
    Match   *
    Exclude log level:debug

40. Reduce trace sampling rate for high-volume services

For services handling 10K req/s, 100% sampling generates enormous trace volumes. Use adaptive sampling: 100% for errors, 1% for success paths. This alone can cut APM costs by 80%.
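How much volume such a policy removes depends on the error rate. A minimal sketch, assuming a hypothetical 1% error rate; keeping every error implies the decision is made once the outcome is known (tail-based sampling), and the function names are illustrative rather than any tracing library's API.

```python
import random

def kept_fraction(error_rate, success_sample_rate, error_sample_rate=1.0):
    """Expected share of traces retained under per-outcome sampling rates."""
    return error_rate * error_sample_rate + (1 - error_rate) * success_sample_rate

def keep_trace(is_error, success_sample_rate, rng=random.random):
    """Per-trace decision: always keep errors, keep successes with the given probability."""
    return is_error or rng() < success_sample_rate

share = kept_fraction(error_rate=0.01, success_sample_rate=0.01)
print(f"kept: {share:.2%} of traces, about {10_000 * share:.0f} traces/s at 10K req/s")
```

A service with a higher error rate keeps proportionally more traces, so the saving shrinks exactly when something is going wrong, which is usually the desired behaviour.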

41. Use Thanos or Cortex for long-term metrics storage on S3

Storing metrics in S3 via Thanos costs ~$0.023/GB vs $0.10–0.30/GB in managed monitoring services. For large clusters, this represents 70–90% cost reduction for historical metrics.

Part 7: Container Image Sizes — Items 42–44

42. Use distroless or scratch base images

A full Ubuntu base image is ~70MB. The distroless base image is ~20MB, and a scratch image holding a static Go binary is 5–15MB. Smaller images mean faster pulls, less ECR bandwidth cost, and faster pod cold starts.

43. Multi-stage builds eliminate build tooling from runtime images

# Multi-stage: build stage has full SDK, runtime is minimal
FROM golang:1.22 AS build
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 go build -o /bin/app ./cmd/app

FROM gcr.io/distroless/static-debian12
COPY --from=build /bin/app /app
ENTRYPOINT ["/app"]

44. Enable layer caching in CI — don't rebuild layers that didn't change

# GitHub Actions with layer cache
- uses: docker/build-push-action@v5
  with:
    cache-from: type=gha
    cache-to: type=gha,mode=max

Part 8: Namespace Chargeback — Items 45–47

45. Install OpenCost for per-namespace cost visibility

helm repo add opencost https://opencost.github.io/opencost-helm-chart
helm install opencost opencost/opencost \
  --namespace opencost --create-namespace

OpenCost provides real-time cost per namespace, deployment, label, and team. Without visibility, teams can't be accountable. This is the prerequisite for everything else.

46. Label all resources with team, environment, and cost-centre

kubectl label namespace team-backend \
  team=backend \
  env=production \
  cost-centre=eng-platform

OpenCost and Kubecost attribute cost from Kubernetes labels, while AWS Cost Explorer relies on AWS cost allocation tags, so keep the two consistent. Without consistent labelling, you can't allocate costs to teams.
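Once namespaces carry a team label, rolling costs up per team is a simple aggregation. The sketch below uses hypothetical namespaces and dollar figures and does not call the OpenCost API; anything unlabelled lands in an "unallocated" bucket, which is a useful number to drive towards zero.

```python
from collections import defaultdict

def cost_by_label(namespace_costs, namespace_labels, label_key="team"):
    """Sum per-namespace costs by the value of one label; unlabelled costs go to 'unallocated'."""
    totals = defaultdict(float)
    for namespace, cost in namespace_costs.items():
        owner = namespace_labels.get(namespace, {}).get(label_key, "unallocated")
        totals[owner] += cost
    return dict(totals)

# Hypothetical monthly costs (USD) and namespace labels.
costs = {"team-backend": 4200.0, "team-backend-batch": 800.0, "sandbox": 650.0}
labels = {
    "team-backend": {"team": "backend", "env": "production"},
    "team-backend-batch": {"team": "backend", "env": "production"},
}

print(cost_by_label(costs, labels))
```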

47. Monthly FinOps review — make cost visible to eng teams

The most impactful change is cultural, not technical. Put a $X/day cost counter on the team's dashboard. When engineers see that their batch job runs 24/7 instead of on-demand and costs $800/month, they fix it. Invisible costs don't get optimised.

Real World: $45K → $18K in 3 Months

A Series C e-commerce client came to us with a $45,000/month EKS bill for a cluster serving 50 million monthly users. Their engineering team knew K8s well but had never done a formal FinOps pass. Here's what we found and fixed:

Starting State (Month 0)

• 47 nodes (m5.2xlarge, on-demand) — avg 23% CPU, 31% memory utilisation
• 0 pods with resource requests/limits
• NAT Gateway processing: 4.2TB/month ($189/month)
• ECR pulls: 850GB/month via NAT ($38/month)
• CloudWatch Logs: 1.2TB/month ($600/month)
• 23 individual ALBs — one per microservice ($520/month in fixed costs)
• Dev/staging running 24/7 (same spec as prod)

Month 1: Foundation (Savings: $8,200/month)

• Set resource requests on all 340 pods (3 days of work with VPA recommendations)
• Deployed Karpenter, replaced 3 managed node groups. Node count dropped from 47 to 31
• Switched all nodes to mixed Spot/On-demand (70/30 split)
• Added Spot interruption handler: zero incidents in first month
• Scaled dev/staging to zero nights + weekends via CronJob

Month 2: Networking & Storage (Savings: $11,400/month)

• Added topology-aware routing: cross-AZ traffic reduced 65%
• Deployed VPC endpoints for S3, ECR, DynamoDB, SSM: NAT Gateway processing dropped 80%
• Consolidated 23 ALBs to 3 (by environment) using Ingress path routing: $450/month saved
• Migrated all gp2 EBS to gp3: 20% storage cost reduction
• Deleted 47 orphaned PVs totalling 1.8TB

Month 3: Observability & Polish (Savings: $7,600/month)

• Reduced CloudWatch Logs ingestion 70% with Fluent Bit source filtering
• Moved Prometheus long-term storage to S3 via Thanos: 3x cheaper than managed APM
• Reduced trace sampling from 100% to adaptive (100% errors, 2% success)
• Deployed OpenCost: teams can now see their own costs daily
• Migrated 12 stateless services to Graviton3 (arm64): 18% compute saving

Metric	Before	After	Saving
Monthly AWS bill	$45,200	$18,100	$27,100 (60%)
Node count	47	21 avg	55% fewer nodes
Spot usage	0%	68%	—
NAT Gateway (GB/mo)	4,200	820	80% reduction
CloudWatch Logs (GB/mo)	1,200	360	70% reduction
ALB count	23	3	$450/mo saved in fixed costs

Full 47-Item Checklist

#	Item	Category	Effort	Typical Saving
1	Set resource requests on all pods	Right-sizing	Med	10–20%
2	Set resource limits	Right-sizing	Low	5%
3	Deploy VPA in recommendation mode	Right-sizing	Med	10–15%
4	Use p95 + 20% for request sizing	Right-sizing	Low	8%
5	KEDA for event-driven scale-to-zero	Right-sizing	Med	15–30%
6	Memory limits at p99+30%	Right-sizing	Low	Stability
7	Namespace ResourceQuotas	Right-sizing	Low	Governance
8	LimitRange defaults per namespace	Right-sizing	Low	Safety net
9	Delete completed/evicted pods	Right-sizing	Low	Hygiene
10	Replace CA with Karpenter	Nodes	High	15–30%
11	Spot/Preemptible for non-critical	Nodes	Med	60–80%/node
12	Graviton3 arm64 stateless services	Nodes	Med	20%
13	Karpenter consolidation policies	Nodes	Low	10–20%
14	Per-env min/max node counts	Nodes	Low	Governance
15	Scale dev/staging to zero at night	Nodes	Low	$200–500/mo
16	Delete idle node groups	Nodes	Low	Variable
17	Spot interruption handler	Nodes	Low	Stability
18	Right-size node instance types	Nodes	Med	10–15%
19	Bin-packing topology spread	Scheduling	Low	10%
20	PodDisruptionBudgets for consolidation	Scheduling	Low	Enables 10-13
21	Custom metric HPA	Scheduling	Med	10–20%
22	HPA stabilization windows	Scheduling	Low	5%
23	Priority classes for prod workloads	Scheduling	Low	Safety
24	Lifecycle preStop hooks	Scheduling	Low	Stability
25	Jobs not Deployments for batch	Scheduling	Med	Variable
26	Delete orphaned PVs	Storage	Low	Variable
27	Migrate gp2 → gp3	Storage	Low	20%
28	Reclaim policy Delete for ephemeral	Storage	Low	Hygiene
29	EFS only for shared filesystem	Storage	Med	Variable
30	S3 for logs/artefacts not EBS	Storage	Med	70–90%
31	Topology-aware routing (same-AZ)	Networking	Med	50–80% xAZ cost
32	VPC endpoints for AWS services	Networking	Low	$100–500/mo
33	ECR pull-through cache	Networking	Low	ECR bandwidth
34	imagePullPolicy IfNotPresent	Networking	Low	5%
35	Internal LBs for internal services	Networking	Low	NAT savings
36	Consolidate Ingresses / ALBs	Networking	Med	$100–500/mo
37	Reduce Prometheus scrape frequency	Observability	Low	30–50% storage
38	Set metric retention limits	Observability	Low	30–50%
39	Sample logs at source (Fluent Bit)	Observability	Med	40–70%
40	Adaptive trace sampling	Observability	Med	60–80% APM
41	Thanos/Cortex for S3 metric storage	Observability	High	70–90%
42	Distroless/scratch base images	Images	Med	Startup speed
43	Multi-stage builds	Images	Med	Pull bandwidth
44	Layer caching in CI	Images	Low	CI minutes
45	OpenCost for cost visibility	Chargeback	Low	Enables all
46	Label all resources (team, env, cc)	Chargeback	Med	Accountability
47	Monthly FinOps review with eng teams	Chargeback	Low	Cultural driver

Want us to run this checklist on your cluster?

We do a 2-week K8s FinOps audit — identify savings, prioritise the top 10 items, and build a reduction roadmap. No obligation, transparent pricing.

Book a free K8s cost audit
