Uninstall / Decommission

Purpose: Shows operators how to safely remove openCenter components and fully decommission a cluster, including infrastructure teardown and Git cleanup.

Prerequisites

Cluster admin access (kubectl configured)
SSH access to nodes (for drain/cleanup)
Terraform/OpenTofu state accessible (for infrastructure destruction)
Git access to the customer GitOps repository

Decommission Checklist

Complete this checklist in order. Every step after the backup is destructive, and nothing on the cluster can be recovered once the infrastructure is torn down in Step 6.

Step 1 — Backup Before Decommission

# Create a final backup (set the name once so both commands agree, even across midnight)
BACKUP_NAME="final-backup-$(date +%Y%m%d)"
velero backup create "$BACKUP_NAME" \
  --include-namespaces '*' \
  --wait

# Verify backup completed
velero backup describe "$BACKUP_NAME"

# Export the backup metadata
opencenter cluster validate --generate-debug-config
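
`velero backup describe` prints the backup phase but leaves it to the operator to read it. The sketch below is one way to stop the checklist unless the backup reports `Completed`. The function name `require_completed_backup` and the `VELERO_NS` variable are hypothetical, and the sketch assumes Velero runs in its default `velero` namespace and that the phase is read from the `Backup` custom resource.

```shell
# Hypothetical guard: succeed only if the named Velero backup is in phase "Completed".
# Assumes the Velero namespace is "velero" unless VELERO_NS says otherwise.
require_completed_backup() {
  name="$1"
  phase="$(kubectl get backups.velero.io "$name" \
    -n "${VELERO_NS:-velero}" -o jsonpath='{.status.phase}')"
  if [ "$phase" != "Completed" ]; then
    echo "backup $name is in phase '${phase:-unknown}', not Completed" >&2
    return 1
  fi
  echo "backup $name completed"
}
```

For example, `require_completed_backup "final-backup-$(date +%Y%m%d)" || exit 1` at the end of Step 1 keeps a partially failed backup from being mistaken for a good one.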

Step 2 — Remove FluxCD

Suspending FluxCD prevents it from recreating resources you delete.

# Suspend all Kustomizations
flux suspend kustomization --all -n flux-system

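# Delete FluxCD Kustomizations. The controller prunes managed resources only when
# spec.prune is true and the Kustomization is not suspended, so the suspend above
# avoids cascading deletes. (prune is a Kustomization field, not a kubectl flag.)
kubectl delete kustomizations.kustomize.toolkit.fluxcd.io --all -n flux-system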

# Remove FluxCD controllers
flux uninstall --silent

# Verify
kubectl get namespace flux-system
# Should show Terminating or NotFound
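
Because the suspend is what keeps later deletes from being reconciled away, it can help to confirm that nothing was missed before running the delete above. The sketch below assumes all Kustomizations live in `flux-system`, as in the commands in this step. `unsuspended_kustomizations` is a hypothetical helper name.

```shell
# Hypothetical check: print the names of Kustomizations in flux-system
# whose spec.suspend is not "true" (unset counts as not suspended).
unsuspended_kustomizations() {
  kubectl get kustomizations.kustomize.toolkit.fluxcd.io -n flux-system \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.suspend}{"\n"}{end}' |
    awk -F'\t' '$1 != "" && $2 != "true" { print $1 }'
}
```

An empty result means every Kustomization is suspended. Any name printed should be suspended before it is deleted.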

Step 3 — Remove Platform Services

Remove services in reverse dependency order:

# Remove monitoring stack
kubectl delete namespace monitoring --timeout=120s

# Remove Keycloak
kubectl delete namespace keycloak --timeout=120s

# Remove cert-manager
kubectl delete namespace cert-manager --timeout=120s

# Remove Kyverno
kubectl delete namespace kyverno --timeout=120s

# Remove network services
kubectl delete namespace metallb-system --timeout=120s 2>/dev/null || true

# Remove remaining platform namespaces
for ns in $(kubectl get namespaces -l opencenter.io/managed=true -o name); do
  kubectl delete "$ns" --timeout=120s
done

# Remove CRDs installed by platform services
# (-r skips the delete when no CRD matches)
kubectl get crds -o name | grep -E '(fluxcd|cert-manager|kyverno|metallb)' | xargs -r kubectl delete
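
The substring pattern above also matches any unrelated CRD whose name happens to contain one of those words. Since deleting a CRD deletes every object of that kind, previewing the list first is cheap insurance. The sketch below is a stricter filter that matches on the API group suffix. The group list (`fluxcd.io`, `cert-manager.io`, `kyverno.io`, `metallb.io`) is an assumption about what the platform installs, and `platform_crds` is a hypothetical name.

```shell
# Hypothetical filter: read CRD names (as printed by `kubectl get crds -o name`)
# on stdin and keep only those whose API group ends in an assumed platform group.
platform_crds() {
  # "|| true" so an empty match is not treated as an error by the caller
  grep -E '\.(fluxcd\.io|cert-manager\.io|kyverno\.io|metallb\.io)$' || true
}
```

Run `kubectl get crds -o name | platform_crds` to review the list, then append `| xargs -r kubectl delete` once it looks right.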

Step 4 — Drain Nodes

# Cordon all worker nodes
kubectl get nodes -l '!node-role.kubernetes.io/control-plane' -o name | \
  xargs -I{} kubectl cordon {}

# Drain workers (evicts pods)
kubectl get nodes -l '!node-role.kubernetes.io/control-plane' -o name | \
  xargs -I{} kubectl drain {} --ignore-daemonsets --delete-emptydir-data --timeout=300s
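
With `xargs`, a drain that fails on one node typically does not stop the remaining nodes from being drained. If you would rather stop at the first failure and investigate, a plain loop is one option. The sketch below uses the same selector and drain flags as the commands above. `drain_workers` is a hypothetical name.

```shell
# Hypothetical helper: drain worker nodes one at a time and stop at the first failure.
drain_workers() {
  nodes="$(kubectl get nodes -l '!node-role.kubernetes.io/control-plane' -o name)" || return 1
  for node in $nodes; do
    echo "draining $node"
    if ! kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data --timeout=300s; then
      echo "drain failed for $node; stopping" >&2
      return 1
    fi
  done
}
```

A common cause of a failed drain is a PodDisruptionBudget that blocks eviction. Stopping early makes it clear which node is affected.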

Step 5 — Remove Kubernetes

On each node (or via Ansible from the bastion):

# Reset kubeadm
kubeadm reset -f

# Remove Kubernetes packages
apt-get remove -y kubeadm kubelet kubectl 2>/dev/null || \
yum remove -y kubeadm kubelet kubectl 2>/dev/null || true

# Clean up directories
rm -rf /etc/kubernetes /var/lib/kubelet /var/lib/etcd /etc/cni /opt/cni
rm -rf /var/lib/calico /var/run/calico

# Remove iptables rules
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
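
After the cleanup, a quick check that none of the directories listed above survived can catch a node that was skipped. The sketch below is illustrative. `leftover_k8s_paths` is a hypothetical name, and its optional root argument exists only so the function can be tried against a scratch directory.

```shell
# Hypothetical check: print each Kubernetes state directory that still exists.
# The optional first argument is a root prefix (default "/") for dry runs.
leftover_k8s_paths() {
  root="${1:-/}"
  for p in etc/kubernetes var/lib/kubelet var/lib/etcd etc/cni opt/cni \
           var/lib/calico var/run/calico; do
    [ -e "${root%/}/$p" ] && echo "/$p"
  done
  return 0
}
```

On a cleaned node, `leftover_k8s_paths` prints nothing.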

Alternatively, run Kubespray's reset playbook from the bastion:

cd ~/prod-cluster-gitops/infrastructure/clusters/prod-cluster
ansible-playbook -i inventory/hosts.yaml \
  kubespray/reset.yml \
  --become --become-user=root

Step 6 — Destroy Infrastructure

For OpenStack (OpenTofu-managed):

cd ~/prod-cluster-gitops/infrastructure/clusters/prod-cluster

# Review what will be destroyed
tofu plan -destroy

# Destroy (type 'yes' when prompted)
tofu destroy

This removes:

VMs (control plane, workers, bastion)
Networks, subnets, routers
Security groups
Floating IPs
Load balancers (Octavia)
Volumes
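
`tofu destroy` computes its own plan, so what it removes can differ from what `tofu plan -destroy` showed if anything changed in between. One option is to save the reviewed plan and apply exactly that file. The sketch below assumes it is run from the cluster's infrastructure directory. The function name `destroy_reviewed_plan`, the `destroy.tfplan` filename, and the `CLUSTER` variable are hypothetical.

```shell
# Hypothetical wrapper: save a destroy plan, ask for confirmation, apply that exact plan.
destroy_reviewed_plan() {
  plan_file="${1:-destroy.tfplan}"
  tofu plan -destroy -out="$plan_file" || return 1
  printf 'Apply %s? Type the cluster name to confirm: ' "$plan_file"
  read -r answer
  if [ "$answer" != "${CLUSTER:-prod-cluster}" ]; then
    echo "aborted" >&2
    return 1
  fi
  # Applying a saved plan file does not prompt again, hence the check above.
  tofu apply "$plan_file"
}
```

Typing the cluster name rather than `yes` is a design choice meant to make it harder to destroy the wrong environment from a stale terminal.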

For VMware (pre-provisioned VMs), decommission VMs through vCenter or your VM lifecycle tool.

Step 7 — Clean Up Git Repositories

# Option A: Delete the customer GitOps repository entirely
# (if it was cluster-specific)
gh repo delete myorg/prod-cluster-gitops --yes

# Option B: Remove the cluster overlay from a shared repo
cd ~/shared-gitops
git checkout -b decommission/prod-cluster
rm -rf applications/overlays/prod-cluster
rm -rf infrastructure/clusters/prod-cluster
git add . && git commit -m "chore: decommission prod-cluster"
git push -u origin decommission/prod-cluster
# Open PR and merge
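
For Option B, removing the two directories may still leave references to the cluster elsewhere in the shared repository, for example in a parent kustomization or a CI file. A simple search before opening the PR can surface them. `remaining_references` is a hypothetical helper, and the sketch matches the cluster name as a fixed string.

```shell
# Hypothetical check: list files (outside .git) that still mention the cluster name.
remaining_references() {
  cluster="$1"
  repo="${2:-.}"
  # find may exit non-zero when grep finds no match in a batch; that is fine here
  find "$repo" -name .git -prune -o -type f -exec grep -lF -- "$cluster" {} + || true
}
```

For example, `remaining_references prod-cluster ~/shared-gitops` should print nothing once the overlay and every pointer to it are gone.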

Step 8 — Revoke Secrets and Credentials

# Revoke the SOPS Age key (mark as compromised/retired)
# The key at ~/.config/opencenter/clusters/secrets/<org>/<cluster>/age/keys/ can be deleted

# Revoke OpenStack application credentials
openstack application credential delete prod-cluster-cred

# Revoke SSH keys
# Remove the cluster SSH key from authorized_keys on any shared infrastructure

# Revoke Git deploy keys
# Remove from repository settings (GitHub/GitLab)

# Remove local cluster configuration
opencenter cluster delete prod-cluster
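
Removing the cluster SSH key from a shared host's `authorized_keys` is easy to get wrong by hand. The sketch below is one way to script it. `remove_authorized_key` is a hypothetical helper that drops every line containing the given key material (for example, the base64 body of the public key) and leaves other entries untouched.

```shell
# Hypothetical helper: remove every line containing KEY_MATERIAL from an authorized_keys file.
remove_authorized_key() {
  key_material="$1"
  file="$2"
  tmp="$(mktemp)" || return 1
  # grep -v exits 1 when every line is filtered out; treat that as a valid result.
  grep -vF -- "$key_material" "$file" > "$tmp" || [ $? -eq 1 ] || { rm -f "$tmp"; return 1; }
  cat "$tmp" > "$file"   # overwrite in place so the file keeps its owner and mode
  rm -f "$tmp"
}
```

Back up the file first, and match on the key body rather than the trailing comment, since comments are free text and may not be unique.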

Verification

After decommission:

# Verify VMs are gone (OpenStack)
openstack server list | grep prod-cluster
# Should return empty

# Verify networks are gone
openstack network list | grep prod-cluster
# Should return empty

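# Verify DNS records removed (quote the name so the shell does not expand the *)
dig +short '*.prod-cluster.example.com'
# Should return empty (+short prints no status; drop it to see NXDOMAIN)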

# Verify local config cleaned
opencenter cluster list | grep prod-cluster
# Should not appear
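
A bare `grep` returning empty looks the same whether the resource is gone or the listing command itself failed, for example because credentials were already revoked in Step 8. The sketch below separates the two cases. `verify_gone` and the `CLUSTER` variable are hypothetical names.

```shell
# Hypothetical helper: run a listing command and fail if its output mentions the cluster.
# Exit codes: 0 = not found, 1 = still present, 2 = the listing command itself failed.
CLUSTER="${CLUSTER:-prod-cluster}"

verify_gone() {
  label="$1"; shift
  if ! out="$("$@" 2>/dev/null)"; then
    echo "ERROR: could not run check for $label" >&2
    return 2
  fi
  if printf '%s\n' "$out" | grep -q -- "$CLUSTER"; then
    echo "FAIL: $label still lists $CLUSTER" >&2
    return 1
  fi
  echo "ok: $label"
}
```

For example, `verify_gone servers openstack server list` and `verify_gone networks openstack network list` cover the first two checks above.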
