Fleet Upgrades

In Development (Q4 2026)

This feature is currently in development. Fleet upgrade orchestration described here is subject to change.

Purpose: Explains to platform engineers and operators how to coordinate Kubernetes and platform service upgrades safely across a fleet of clusters.

Upgrade Strategy

Fleet upgrades use a wave-based promotion model:

Wave 1: Canary cluster (dev/staging)
↓ validate (automated health checks)
Wave 2: Non-critical production clusters
↓ validate (SLO checks, 24h soak)
Wave 3: Critical production clusters
↓ validate (full regression)
Wave 4: Regulated/air-gapped clusters

Fleet Upgrade Plan

apiVersion: fleet.opencenter.cloud/v1alpha1
kind: FleetUpgradePlan
metadata:
  name: k8s-1-34-upgrade
spec:
  targetVersion: "1.34"
  waves:
    - name: canary
      clusterSelector:
        matchLabels:
          upgrade-wave: canary
      validation:
        healthCheckDuration: 1h
        automated: true
    - name: production-standard
      clusterSelector:
        matchLabels:
          upgrade-wave: standard
      validation:
        healthCheckDuration: 24h
        automated: false # Manual gate
    - name: production-critical
      clusterSelector:
        matchLabels:
          upgrade-wave: critical
      validation:
        healthCheckDuration: 48h
        automated: false # Manual gate
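
To make the clusterSelector fields concrete, here is a minimal Python sketch of how clusters could be grouped into waves by label. It assumes clusterSelector.matchLabels follows the usual Kubernetes semantics (every listed key/value pair must be present on the cluster). The cluster names, the extra env label, and the helper names matches and assign_waves are hypothetical and not part of the fleet API.

```python
from typing import Dict, List, Tuple

Labels = Dict[str, str]


def matches(selector: Labels, labels: Labels) -> bool:
    """Return True if every key/value pair in the selector is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def assign_waves(
    waves: List[Tuple[str, Labels]], clusters: Dict[str, Labels]
) -> Dict[str, List[str]]:
    """Map each wave name to the sorted names of the clusters its selector matches."""
    return {
        wave_name: sorted(
            name for name, labels in clusters.items() if matches(selector, labels)
        )
        for wave_name, selector in waves
    }


# Hypothetical inventory: cluster name -> labels.
clusters = {
    "dev-1": {"upgrade-wave": "canary", "env": "dev"},
    "prod-eu-1": {"upgrade-wave": "standard", "env": "prod"},
    "prod-us-1": {"upgrade-wave": "standard", "env": "prod"},
    "payments-1": {"upgrade-wave": "critical", "env": "prod"},
}

# Selectors copied from the plan above, in wave order.
waves = [
    ("canary", {"upgrade-wave": "canary"}),
    ("production-standard", {"upgrade-wave": "standard"}),
    ("production-critical", {"upgrade-wave": "critical"}),
]

assignment = assign_waves(waves, clusters)
for wave_name, members in assignment.items():
    print(wave_name, members)
```

A cluster that carries no upgrade-wave label matches none of these selectors and would be left out of the plan under this interpretation.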

Validation Gates

Between waves, the system validates:

| Check | Automated | Blocks Next Wave |
| --- | --- | --- |
| Node health (Ready status) | Yes | Yes |
| Pod restart rate | Yes | Yes (if > threshold) |
| FluxCD reconciliation success | Yes | Yes |
| Prometheus alert firing | Yes | Yes (critical severity) |
| Custom health endpoint | Yes | Configurable |
| Manual approval | No | Yes (if configured) |
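
The gate logic in the table can be sketched roughly as follows. This is an illustration only, not the orchestrator's implementation: the CheckResult type, the evaluate_gate function, the check names, and the restart threshold value are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CheckResult:
    name: str
    passed: bool
    blocking: bool = True  # mirrors the "Blocks Next Wave" column


def evaluate_gate(results: List[CheckResult]) -> Tuple[bool, List[str]]:
    """Return (gate_open, names of the blocking checks that failed)."""
    blockers = [r.name for r in results if r.blocking and not r.passed]
    return (len(blockers) == 0, blockers)


RESTART_THRESHOLD = 5          # hypothetical restarts per hour
observed_restarts = 3          # hypothetical measurement
firing_severities = ["warning"]  # only critical alerts block the gate

results = [
    CheckResult("node-health", passed=True),
    CheckResult("pod-restart-rate", passed=observed_restarts <= RESTART_THRESHOLD),
    CheckResult("flux-reconciliation", passed=True),
    CheckResult("prometheus-alerts", passed="critical" not in firing_severities),
    # Custom endpoint configured here as non-blocking, so its failure is ignored.
    CheckResult("custom-health-endpoint", passed=False, blocking=False),
    # Manual approval is configured and has not been granted yet.
    CheckResult("manual-approval", passed=False, blocking=True),
]

gate_open, blockers = evaluate_gate(results)
print(gate_open, blockers)
```

In this example every automated check passes or is non-blocking, so the only thing holding the next wave is the pending manual approval.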

Rollback

If validation fails:

  • Wave halts automatically
  • Affected clusters remain at current version
  • Alert fires to fleet operators
  • Manual rollback available via opencenter fleet upgrade rollback
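
The halt behaviour above can be sketched as a simple promotion loop. This is one reading of the steps, under the assumption that clusters in later waves are never touched once a wave fails and that nothing is rolled back automatically. The run_plan function, its callbacks, the cluster names, and the starting version "1.33" are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def run_plan(
    waves: List[Tuple[str, List[str]]],
    upgrade: Callable[[str], None],
    validate: Callable[[str], bool],
    alert: Callable[[str], None],
) -> Dict[str, str]:
    """Upgrade waves in order and stop at the first wave that fails validation."""
    status: Dict[str, str] = {}
    for index, (wave_name, members) in enumerate(waves):
        for cluster in members:
            upgrade(cluster)
        if not validate(wave_name):
            status[wave_name] = "halted"
            alert(f"wave {wave_name} failed validation")
            # Later waves never start, so their clusters keep the current version.
            for later_name, _ in waves[index + 1:]:
                status[later_name] = "not-started"
            return status
        status[wave_name] = "promoted"
    return status


versions = {"dev-1": "1.33", "prod-1": "1.33", "payments-1": "1.33"}
alerts: List[str] = []


def upgrade_cluster(cluster: str) -> None:
    versions[cluster] = "1.34"


plan = [
    ("canary", ["dev-1"]),
    ("production-standard", ["prod-1"]),
    ("production-critical", ["payments-1"]),
]

# Simulate a validation failure in the second wave.
status = run_plan(
    plan,
    upgrade=upgrade_cluster,
    validate=lambda wave: wave != "production-standard",
    alert=alerts.append,
)
print(status, versions, alerts)
```

Rolling back the clusters in the halted wave stays a manual operator decision in this sketch, which matches the manual rollback command listed above.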

Service Upgrades

Platform service upgrades (gitops-base tag bumps) follow the same wave model:

  • Hub updates fleet GitOps repo with new tag
  • FleetKustomizations propagate to clusters per wave schedule
  • Each cluster's FluxCD reconciles the new service versions
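
As a rough illustration of "per wave schedule", the sketch below computes which gitops-base tag each cluster would be expected to reconcile once the new tag has been released through a given wave. The desired_tags function, the tag values, and the cluster names are hypothetical; how FleetKustomizations actually encode the schedule is not specified here.

```python
from typing import Dict, List, Optional


def desired_tags(
    wave_order: List[str],
    cluster_wave: Dict[str, str],
    released_through: Optional[str],
    old_tag: str,
    new_tag: str,
) -> Dict[str, str]:
    """Return the tag each cluster should run, given the last wave released so far."""
    cutoff = -1 if released_through is None else wave_order.index(released_through)
    return {
        cluster: new_tag if wave_order.index(wave) <= cutoff else old_tag
        for cluster, wave in cluster_wave.items()
    }


wave_order = ["canary", "production-standard", "production-critical"]
cluster_wave = {
    "dev-1": "canary",
    "prod-1": "production-standard",
    "payments-1": "production-critical",
}

# Hypothetical tags; the new tag has been released through the second wave.
tags = desired_tags(
    wave_order, cluster_wave, "production-standard", "v1.4.0", "v1.5.0"
)
print(tags)
```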