Purpose: For platform engineers, provides component-level tuning recommendations to optimize cluster throughput and latency at scale.
etcd Tuning
| Parameter | Default | Recommended (Large) | Effect |
|---|
--quota-backend-bytes | 2 GB | 8 GB | Prevents quota alarm at scale |
--auto-compaction-retention | 5m | 10m | Reduces compaction pressure |
--snapshot-count | 10,000 | 50,000 | Fewer snapshot I/O events |
| Disk type | — | NVMe SSD | etcd is latency-sensitive; p99 < 10ms required |
API Server Tuning
| Parameter | Default | Recommended (Large) | Effect |
|---|
--max-requests-inflight | 400 | 800 | Higher concurrent read capacity |
--max-mutating-requests-inflight | 200 | 400 | Higher write throughput |
--watch-cache-sizes | default | Increase for Pods, Services | Reduces etcd round-trips |
Kubelet Tuning
| Parameter | Default | Recommended | Effect |
|---|
--max-pods | 110 | 110 | Keep at validated maximum |
--kube-api-qps | 50 | 100 | Faster node status updates |
--kube-api-burst | 100 | 200 | Burst capacity for registration |
--serialize-image-pulls | true | false | Parallel image pulls |
FluxCD Tuning
| Parameter | Default | Recommended (Large) | Effect |
|---|
--concurrent (source-controller) | 2 | 8 | Parallel source fetches |
--concurrent (kustomize-controller) | 4 | 12 | Parallel reconciliations |
--concurrent (helm-controller) | 4 | 8 | Parallel Helm installs |
--requeue-dependency | 30s | 15s | Faster dependency resolution |
- Kyverno: Set
--backgroundScan=false for clusters above 5,000 pods; rely on admission-time enforcement
- Prometheus: Use recording rules to pre-aggregate high-cardinality metrics
- Loki: Increase
ingester.chunk-idle-period to reduce chunk flush frequency
Applying Tuning
All tuning is applied via Kustomize overlays in the customer GitOps repository. See Customizing Services for the overlay pattern.