Kubernetes Design & Code Review Checklist

1. Architecture Review

Cluster Design

[ ] Multi-node cluster (avoid single point of failure)
[ ] Separate environments (dev/staging/prod)
[ ] Proper namespace strategy
[ ] ResourceQuota configured
[ ] LimitRange configured
[ ] RBAC enabled
[ ] Network Policies enabled
[ ] Audit logging enabled
[ ] High availability control plane

Namespace Design

[ ] One namespace per application/domain
[ ] Environment isolation
[ ] Consistent naming convention
[ ] Quotas per namespace

Example:

production-payment
production-order
staging-payment
staging-order
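
A per-namespace quota for one of the namespaces above might look like the following sketch. The quota name and every numeric value are illustrative assumptions, not recommendations from this checklist; size them from observed usage.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota            # hypothetical name
  namespace: production-payment
spec:
  hard:
    requests.cpu: "8"            # total CPU requests allowed in the namespace
    requests.memory: 16Gi
    limits.cpu: "16"             # total CPU limits allowed in the namespace
    limits.memory: 32Gi
    pods: "50"                   # maximum number of Pods
```

Once a quota covers CPU or memory, Pods that omit the matching requests or limits are rejected, which is one reason to pair it with a LimitRange that supplies defaults.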

2. Workload Review

Workload Type Selection

Requirement	Kubernetes Resource
Stateless API	Deployment
Background Worker	Deployment
Database	StatefulSet
Cache (persistent or clustered)	StatefulSet
Scheduled Task	CronJob
One-time Task	Job
Daemon on every node	DaemonSet

Checklist:

[ ] Correct workload type selected
[ ] One responsibility per workload
[ ] Horizontal scaling supported

3. Deployment Review

Deployment Configuration

[ ] replicas > 1 in production
[ ] RollingUpdate strategy used
[ ] revisionHistoryLimit configured
[ ] Proper labels
[ ] Proper selectors

strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 1
    maxSurge: 1

Labels

Required labels:

labels:
  app: payment-api
  version: v1.2.0
  env: prod
  team: backend

Checklist:

[ ] app
[ ] version
[ ] env
[ ] team

4. Container Review

Container Image

Checklist:

[ ] No latest tag
[ ] Immutable version tag
[ ] Trusted registry
[ ] Vulnerability scan performed

Bad:

image: api:latest

Good:

image: api:v1.3.5

Security Context

Checklist:

[ ] runAsNonRoot
[ ] readOnlyRootFilesystem
[ ] allowPrivilegeEscalation=false
[ ] capabilities dropped

securityContext:
  runAsNonRoot: true
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL

5. Resource Management

CPU & Memory

Checklist:

[ ] CPU request
[ ] CPU limit
[ ] Memory request
[ ] Memory limit

resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"

Autoscaling

Checklist:

[ ] HPA configured
[ ] minReplicas defined
[ ] maxReplicas defined
[ ] CPU target configured
[ ] Memory target configured

Example:

minReplicas: 2
maxReplicas: 10
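
The two fields above belong to a HorizontalPodAutoscaler. A fuller sketch, assuming a Deployment named payment-api and illustrative utilization targets of 70% CPU and 80% memory (both values are assumptions), could be:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payment-api              # assumed to match the Deployment name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of the CPU request (illustrative)
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80   # percent of the memory request (illustrative)
```

Utilization targets are computed against container requests, so the HPA only behaves sensibly when the CPU and memory requests from the previous section are set.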

6. Health Checks

Liveness Probe

Checklist:

[ ] Configured
[ ] Fast endpoint

livenessProbe:
  httpGet:
    path: /health
    port: 8080

Readiness Probe

Checklist:

[ ] Configured
[ ] Service traffic blocked until ready

readinessProbe:
  httpGet:
    path: /ready
    port: 8080

Startup Probe

Checklist:

[ ] Configured for slow startup applications
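
A startup probe for a slow-starting container might be sketched as follows. The path and port reuse the liveness example; the period and threshold are assumptions to tune per application.

```yaml
startupProbe:
  httpGet:
    path: /health
    port: 8080
  periodSeconds: 10
  failureThreshold: 30   # allows up to 30 x 10s = 300s for startup
```

Liveness and readiness probes do not run until the startup probe succeeds, so a generous threshold here avoids restart loops without loosening the liveness probe itself.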

7. Networking Review

Service Review

Checklist:

[ ] ClusterIP for internal services
[ ] LoadBalancer only when required
[ ] NodePort avoided

Ingress Review

Checklist:

[ ] TLS enabled
[ ] HTTPS redirect enabled
[ ] Rate limiting configured
[ ] WAF considered
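
As one possible shape for the TLS, redirect, and rate-limit items, the sketch below assumes the ingress-nginx controller; the annotations are specific to that controller, and the host, Secret name, Service name, and rate value are hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: payment-api
  annotations:
    # ingress-nginx annotations; other controllers use different settings
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/limit-rps: "20"   # illustrative value
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - payments.example.com     # hypothetical host
      secretName: payments-tls     # hypothetical TLS Secret
  rules:
    - host: payments.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: payment-api  # hypothetical Service
                port:
                  number: 80
```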

Network Policies

Checklist:

[ ] Default deny policy
[ ] Explicit allow rules
[ ] Namespace isolation

kind: NetworkPolicy
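
A default-deny policy plus one explicit allow rule could be sketched as below. The namespaces follow the naming example from section 1; the policy names, the app label, and port 8080 are assumptions.

```yaml
# Deny all ingress and egress for every Pod in the namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production-payment
spec:
  podSelector: {}          # empty selector matches all Pods
  policyTypes:
    - Ingress
    - Egress
---
# Allow only Pods from production-order to reach payment-api on 8080
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-order-to-payment
  namespace: production-payment
spec:
  podSelector:
    matchLabels:
      app: payment-api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: production-order
      ports:
        - protocol: TCP
          port: 8080
```

Denying egress by default also blocks DNS lookups, so a real setup usually needs an additional egress rule for the cluster DNS service. Policies take effect only when the cluster's network plugin enforces them.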

8. Storage Review

Persistent Volumes

Checklist:

[ ] Dynamic provisioning
[ ] StorageClass used
[ ] Backup strategy exists
[ ] Recovery tested

Stateful Applications

Checklist:

[ ] StatefulSet used
[ ] PVC attached
[ ] Data persistence verified
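
A minimal StatefulSet with a per-replica PVC might look like this sketch. The workload name, image, mount path, StorageClass, and size are all hypothetical placeholders.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: orders-db                  # hypothetical workload
spec:
  serviceName: orders-db           # headless Service assumed to exist
  replicas: 3
  selector:
    matchLabels:
      app: orders-db
  template:
    metadata:
      labels:
        app: orders-db
    spec:
      containers:
        - name: db
          image: registry.example.com/orders-db:v1.0.0   # hypothetical image
          volumeMounts:
            - name: data
              mountPath: /var/lib/data                   # hypothetical path
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd   # hypothetical StorageClass
        resources:
          requests:
            storage: 20Gi
```

Each replica gets its own PVC from the template, and that claim is kept when the Pod is rescheduled, which is what the data persistence check should verify.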

9. Configuration Management

ConfigMap

Checklist:

[ ] Only non-sensitive data
[ ] Version controlled
[ ] Environment specific

Secret Management

Checklist:

[ ] No secrets in Git
[ ] No secrets in ConfigMap
[ ] Rotation process defined
[ ] External secret manager preferred
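
Whichever tool populates the Secret, the workload can reference it rather than embed the value. A container-level sketch, with a hypothetical Secret name and key:

```yaml
env:
  - name: DB_PASSWORD
    valueFrom:
      secretKeyRef:
        name: payment-db-credentials   # hypothetical Secret
        key: password                  # hypothetical key
```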

10. Security Review

RBAC

Checklist:

[ ] Least privilege principle
[ ] Dedicated ServiceAccounts
[ ] No cluster-admin usage

Bad:

cluster-admin

Good:

Role
RoleBinding
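
To make the Role and RoleBinding option concrete, the sketch below grants one dedicated ServiceAccount read-only access to ConfigMaps in a single namespace. The object names and the chosen permissions are assumptions for illustration.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: payment-api
  namespace: production-payment
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: configmap-reader
  namespace: production-payment
rules:
  - apiGroups: [""]                # core API group
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: payment-api-configmap-reader
  namespace: production-payment
subjects:
  - kind: ServiceAccount
    name: payment-api
    namespace: production-payment
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: configmap-reader
```

`kubectl auth can-i list configmaps --as=system:serviceaccount:production-payment:payment-api -n production-payment` is one way to confirm the binding grants what was intended and nothing more.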

Pod Security

Checklist:

[ ] Non-root containers
[ ] No privileged mode
[ ] Seccomp profile
[ ] AppArmor profile
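
A Pod spec fragment covering the non-root, non-privileged, and seccomp items might look like the sketch below. The container name is hypothetical. AppArmor is left out because how it is configured depends on the Kubernetes version and on node support.

```yaml
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault         # container runtime's default seccomp profile
  containers:
    - name: api                    # hypothetical container name
      image: api:v1.3.5
      securityContext:
        privileged: false
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
```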

Supply Chain Security

Checklist:

[ ] Image signing
[ ] SBOM generated
[ ] Vulnerability scanning

11. Reliability Review

High Availability

Checklist:

[ ] Multiple replicas
[ ] Pod anti-affinity
[ ] Multi-zone deployment

podAntiAffinity:
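
A fuller anti-affinity sketch, assuming Pods labeled app: payment-api and nodes that carry the standard hostname and zone labels:

```yaml
affinity:
  podAntiAffinity:
    # Hard rule: never place two replicas on the same node
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: payment-api
        topologyKey: kubernetes.io/hostname
    # Soft rule: prefer spreading replicas across zones
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: payment-api
          topologyKey: topology.kubernetes.io/zone
```

The hard rule leaves replicas Pending when there are fewer eligible nodes than replicas, so some teams keep both rules as preferences.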

Pod Disruption Budget

Checklist:

[ ] PDB configured

minAvailable: 1
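
In a complete manifest, the minAvailable setting above sits inside a PodDisruptionBudget. A sketch, assuming Pods labeled app: payment-api:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payment-api
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: payment-api
```

A PDB limits voluntary disruptions such as node drains; it does not protect against node crashes. With a single replica, minAvailable: 1 blocks drains entirely.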

Graceful Shutdown

Checklist:

[ ] SIGTERM handled
[ ] preStop hook configured
[ ] terminationGracePeriodSeconds set
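
The shutdown items can be combined as in the sketch below. The 10-second sleep and 45-second grace period are illustrative assumptions, and the preStop command assumes the image ships a shell.

```yaml
spec:
  terminationGracePeriodSeconds: 45    # covers preStop plus SIGTERM handling
  containers:
    - name: api                        # hypothetical container name
      image: api:v1.3.5
      lifecycle:
        preStop:
          exec:
            # Short pause so endpoints and load balancers stop sending traffic
            command: ["sh", "-c", "sleep 10"]
```

The grace period starts counting when termination begins, so it has to exceed the preStop delay plus the time the application needs to finish in-flight requests after receiving SIGTERM.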

12. Observability Review

Logging

Checklist:

[ ] Centralized logging
[ ] Structured JSON logs
[ ] Correlation ID support

Metrics

Checklist:

[ ] CPU metrics
[ ] Memory metrics
[ ] Request metrics
[ ] Error metrics
[ ] Business metrics

Tracing

Checklist:

[ ] Distributed tracing enabled
[ ] Request correlation supported

13. CI/CD Review

Deployment Pipeline

Checklist:

[ ] Automated build
[ ] Automated test
[ ] Automated deployment
[ ] Rollback support

GitOps

Checklist:

[ ] Git as source of truth
[ ] Pull-based deployment
[ ] Drift detection enabled

Deployment Strategies

Checklist:

[ ] Rolling deployment
[ ] Canary deployment
[ ] Blue-Green deployment

14. Cost Optimization

Checklist:

[ ] Requests properly sized
[ ] HPA configured
[ ] Cluster Autoscaler configured
[ ] Spot instances evaluated
[ ] Unused resources removed

15. Disaster Recovery

Backup

Checklist:

[ ] Database backup
[ ] Persistent volume backup
[ ] Secret backup
[ ] Configuration backup

Recovery

Checklist:

[ ] Restore procedure documented
[ ] Recovery tested regularly
[ ] Recovery Time Objective (RTO) defined
[ ] Recovery Point Objective (RPO) defined

16. Production Readiness Scorecard

Category	Target Score
Security	9/10+
Reliability	9/10+
Scalability	9/10+
Observability	9/10+
Maintainability	9/10+
Cost Optimization	8/10+
Disaster Recovery	8/10+

Final Production Review Questions

[ ] Will it survive a Pod crash?
[ ] Will it survive a Node crash?
[ ] Will it survive a Zone failure?
[ ] Can it scale automatically?
[ ] Can it be deployed with zero downtime?
[ ] Can it be rolled back safely?
[ ] Is it secure by default?
[ ] Is it observable?
[ ] Can another engineer maintain it?
[ ] Can it run at 3 AM without waking me up?

If all answers are YES, the Kubernetes platform/workload is considered Production Ready.
