
Uninstall / Decommission

Purpose: For operators, this page documents how to safely remove openCenter components and fully decommission a cluster, including infrastructure teardown and Git cleanup.

Prerequisites

  • Cluster admin access (kubectl configured)
  • SSH access to nodes (for node reset and cleanup)
  • Terraform/OpenTofu state accessible (for infrastructure destruction)
  • Git access to the customer GitOps repository

Decommission Checklist

Complete this checklist in order. The steps from removing FluxCD onward are destructive and cannot be undone once the infrastructure is torn down.

  • Notify stakeholders and schedule maintenance window
  • Back up cluster state (Velero full backup)
  • Export any data from PersistentVolumes that must be retained
  • Remove FluxCD (stops reconciliation)
  • Remove platform services
  • Drain and cordon nodes
  • Remove Kubernetes
  • Destroy infrastructure (VMs, networks, load balancers)
  • Clean up Git repositories
  • Revoke secrets and credentials

Step 1 — Backup Before Decommission

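# Create a final backup of all namespaces (store the name once so later commands match it)
BACKUP_NAME="final-backup-$(date +%Y%m%d)"
velero backup create "$BACKUP_NAME" \
  --include-namespaces '*' \
  --wait

# Verify the backup completed (Phase should read "Completed")
velero backup describe "$BACKUP_NAME"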

# Export the backup metadata
opencenter cluster validate --generate-debug-config
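Before anything destructive runs, it helps to stop if the backup did not finish. The sketch below is a hypothetical helper (`require_backup_completed` is not part of Velero or openCenter); it assumes Velero is installed in its default `velero` namespace and reads the phase directly from the `backups.velero.io` custom resource.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return non-zero unless a Velero backup reached phase "Completed".
# Assumption: Velero runs in the "velero" namespace (override with VELERO_NAMESPACE).
require_backup_completed() {
  local name="$1"
  local ns="${VELERO_NAMESPACE:-velero}"
  local phase
  # Each backup is stored as a backups.velero.io object in the Velero namespace
  phase="$(kubectl -n "$ns" get backups.velero.io "$name" -o jsonpath='{.status.phase}' 2>/dev/null)"
  if [ "$phase" != "Completed" ]; then
    echo "backup $name is not complete (phase: ${phase:-unknown}); stop here" >&2
    return 1
  fi
  echo "backup $name completed"
}
```

For example: `require_backup_completed "final-backup-$(date +%Y%m%d)" || exit 1`. A phase such as `PartiallyFailed` also stops the run, since some resources may be missing from the backup.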

Step 2 — Remove FluxCD

Suspending FluxCD prevents it from recreating resources you delete.

# Suspend all Kustomizations
flux suspend kustomization --all -n flux-system

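# Delete the FluxCD Kustomizations. Deleting an active Kustomization with
# spec.prune: true garbage-collects every resource it manages; the suspend above
# prevents that. To be explicit, you can also patch spec.prune to false first.
kubectl delete kustomizations.kustomize.toolkit.fluxcd.io --all -n flux-system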

# Remove FluxCD controllers
flux uninstall --silent

# Verify
kubectl get namespace flux-system
# Should show Terminating or NotFound
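Because deleting an active, pruning Kustomization removes everything it manages, it is worth confirming the suspend took effect before running the delete. A minimal sketch; `unsuspended_kustomizations` is a hypothetical helper name, and it needs the Flux CRDs, so run it between `flux suspend` and `kubectl delete`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every Flux Kustomization that is NOT suspended.
# A missing spec.suspend prints as an empty field and is treated as active.
unsuspended_kustomizations() {
  kubectl get kustomizations.kustomize.toolkit.fluxcd.io -A \
    -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name} {.spec.suspend}{"\n"}{end}' |
    awk '$2 != "true" { print $1 }'
}
```

If `unsuspended_kustomizations` prints anything, suspend those objects before continuing.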

Step 3 — Remove Platform Services

Remove services in reverse dependency order:

# Remove monitoring stack
kubectl delete namespace monitoring --ignore-not-found --timeout=120s

# Remove Keycloak
kubectl delete namespace keycloak --ignore-not-found --timeout=120s

# Remove cert-manager
kubectl delete namespace cert-manager --ignore-not-found --timeout=120s

# Remove Kyverno
kubectl delete namespace kyverno --ignore-not-found --timeout=120s

# Remove network services
kubectl delete namespace metallb-system --ignore-not-found --timeout=120s

# Remove remaining platform namespaces (-o name prints "namespace/<name>")
for ns in $(kubectl get namespaces -l opencenter.io/managed=true -o name); do
  kubectl delete "$ns" --ignore-not-found --timeout=120s
done

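# Remove CRDs installed by platform services (this also deletes every custom resource of those types)
# xargs -r skips kubectl entirely when no CRD matches
kubectl get crds -o name | grep -E '(fluxcd|cert-manager|kyverno|metallb)' | xargs -r kubectl delete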
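Namespace deletions above can time out while the namespace stays in `Terminating`, typically because a remaining object carries a finalizer whose controller has already been removed. A sketch for finding them, with hypothetical helper names; the second helper prints the namespace's status conditions, which on current Kubernetes versions name the content or finalizers still blocking deletion.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list namespaces whose deletion has not finished.
terminating_namespaces() {
  kubectl get namespaces \
    -o jsonpath='{range .items[?(@.status.phase=="Terminating")]}{.metadata.name}{"\n"}{end}'
}

# Hypothetical helper: print a namespace's status conditions, which describe
# the remaining content or finalizers that block deletion.
namespace_blockers() {
  kubectl get namespace "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}: {.message}{"\n"}{end}'
}
```

For example: `for ns in $(terminating_namespaces); do echo "== $ns"; namespace_blockers "$ns"; done`.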

Step 4 — Drain Nodes

# Cordon all worker nodes
kubectl get nodes -l '!node-role.kubernetes.io/control-plane' -o name | \
  xargs -I{} kubectl cordon {}

# Drain workers (evicts pods)
kubectl get nodes -l '!node-role.kubernetes.io/control-plane' -o name | \
  xargs -I{} kubectl drain {} --ignore-daemonsets --delete-emptydir-data --timeout=300s
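To confirm the drain actually emptied the workers, a sketch with two hypothetical helpers: one lists workers that are still schedulable, the other lists pods left on a node, ignoring DaemonSet pods (which drain leaves in place with `--ignore-daemonsets`) and static mirror pods (owned by the Node object).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print worker nodes that are not cordoned (spec.unschedulable unset).
schedulable_workers() {
  kubectl get nodes -l '!node-role.kubernetes.io/control-plane' \
    -o jsonpath='{range .items[*]}{.metadata.name} {.spec.unschedulable}{"\n"}{end}' |
    awk '$2 != "true" { print $1 }'
}

# Hypothetical helper: print pods still on a node, skipping DaemonSet pods and
# static mirror pods (owner kind "Node"), which drain does not evict.
pods_left_on_node() {
  kubectl get pods -A --field-selector "spec.nodeName=$1" \
    -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name} {.metadata.ownerReferences[0].kind}{"\n"}{end}' |
    awk '$2 != "DaemonSet" && $2 != "Node" { print $1 }'
}
```

Both helpers print nothing when the drain is complete.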

Step 5 — Remove Kubernetes

On each node, as root (or via Ansible from the bastion):

# Reset kubeadm
kubeadm reset -f

# Remove Kubernetes packages (kubeadm installs usually pin them with apt-mark hold)
apt-get remove -y --allow-change-held-packages kubeadm kubelet kubectl 2>/dev/null || \
  yum remove -y kubeadm kubelet kubectl 2>/dev/null || true

# Clean up directories
rm -rf /etc/kubernetes /var/lib/kubelet /var/lib/etcd /etc/cni /opt/cni
rm -rf /var/lib/calico /var/run/calico

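# Flush all iptables rules and delete custom chains in each table
# (this also removes any host firewall rules not managed by Kubernetes)
iptables -F && iptables -t nat -F && iptables -t mangle -F && \
  iptables -X && iptables -t nat -X && iptables -t mangle -X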
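Running the per-node commands by hand on every host is error-prone. A minimal sketch, assuming the commands above are saved in a local script (hypothetically `node-reset.sh`), every node is reachable over SSH from the bastion with key-based login, and the SSH user has passwordless `sudo`; host names are placeholders. It keeps going after a failure and reports which hosts need attention.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a local reset script on each host over SSH, as root via sudo.
# Assumes key-based SSH from the bastion and passwordless sudo on every node.
reset_nodes() {
  local script="$1"
  shift
  local host failed=0
  if [ ! -r "$script" ]; then
    echo "cannot read $script" >&2
    return 2
  fi
  for host in "$@"; do
    echo "==> resetting $host"
    # BatchMode makes ssh fail instead of prompting; bash -s reads the script from stdin
    if ! ssh -o BatchMode=yes "$host" 'sudo bash -s' < "$script"; then
      echo "reset failed on $host" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

For example: `reset_nodes node-reset.sh cp-1 worker-1 worker-2`.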

Using Kubespray's reset playbook (from the bastion):

cd ~/prod-cluster-gitops/infrastructure/clusters/prod-cluster
ansible-playbook -i inventory/hosts.yaml \
  kubespray/reset.yml \
  --become --become-user=root
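Before the destructive playbook runs, a quick connectivity check catches unreachable hosts, which Ansible would otherwise skip while resetting the rest. A sketch using Ansible's standard `ping` module; `kubespray_reset` is a hypothetical wrapper, and the paths match the example above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: check SSH reachability of every inventory host with
# Ansible's ping module, then run the Kubespray reset playbook only if all answered.
kubespray_reset() {
  local inventory="${1:-inventory/hosts.yaml}"
  if ! ansible -i "$inventory" all -m ping; then
    echo "some hosts are unreachable; fix the inventory or connectivity first" >&2
    return 1
  fi
  # The playbook still asks for confirmation before resetting
  ansible-playbook -i "$inventory" kubespray/reset.yml \
    --become --become-user=root
}
```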

Step 6 — Destroy Infrastructure

For OpenStack (OpenTofu-managed):

cd ~/prod-cluster-gitops/infrastructure/clusters/prod-cluster

# Review what will be destroyed
tofu plan -destroy

# Destroy (type 'yes' when prompted)
tofu destroy

This removes:

  • VMs (control plane, workers, bastion)
  • Networks, subnets, routers
  • Security groups
  • Floating IPs
  • Load balancers (Octavia)
  • Volumes

For VMware (pre-provisioned VMs), decommission VMs through vCenter or your VM lifecycle tool.
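For the OpenStack path, `tofu destroy` re-plans on its own before asking for confirmation. Saving the reviewed plan to a file and applying that file guarantees the apply acts on exactly the plan you reviewed; OpenTofu refuses a saved plan whose state changed after it was created. A sketch; `tofu_destroy_reviewed` and the default file name `destroy.tfplan` are hypothetical, while `plan -destroy -out` and `apply <planfile>` are standard OpenTofu (and Terraform) commands.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: save the destroy plan, confirm, and apply exactly that file.
# Run it from the OpenTofu root, e.g. ~/prod-cluster-gitops/infrastructure/clusters/prod-cluster
tofu_destroy_reviewed() {
  local planfile="${1:-destroy.tfplan}"
  local answer
  # -out writes the plan (which is also printed) to disk for the apply step
  tofu plan -destroy -out="$planfile" || return 1
  read -r -p "Apply this destroy plan? Type 'yes': " answer
  if [ "$answer" != "yes" ]; then
    echo "aborted; nothing destroyed" >&2
    return 1
  fi
  # Applying a saved plan does not prompt again and skips re-planning
  tofu apply "$planfile"
}
```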

Step 7 — Clean Up Git Repositories

# Option A: Delete the customer GitOps repository entirely
# (if it was cluster-specific; gh needs the delete_repo scope: gh auth refresh -s delete_repo)
gh repo delete myorg/prod-cluster-gitops --yes

# Option B: Remove the cluster overlay from a shared repo
cd ~/shared-gitops
git checkout -b decommission/prod-cluster
# git rm stages the deletions without picking up unrelated changes in the working tree
git rm -r applications/overlays/prod-cluster infrastructure/clusters/prod-cluster
git commit -m "chore: decommission prod-cluster"
git push -u origin decommission/prod-cluster
# Open a pull request and merge it after review
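With Option B, stray references to the cluster elsewhere in the shared repository (for example a Flux Kustomization whose path points at the removed overlay) can break reconciliation after the merge. A minimal sketch for finding them before opening the pull request; `remaining_cluster_refs` is a hypothetical helper and matches the cluster name literally, so similarly named clusters (such as `prod-cluster-2`) also show up and need a human look.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every line in a repository checkout that still
# mentions the cluster name. -r recurses, -n adds line numbers, -F matches the
# name literally, and .git internals are skipped. Exits 1 when nothing is found.
remaining_cluster_refs() {
  local repo="$1" cluster="$2"
  grep -rnF --exclude-dir=.git -- "$cluster" "$repo"
}
```

For example: `remaining_cluster_refs ~/shared-gitops prod-cluster` prints `path:line:text` for each match.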

Step 8 — Revoke Secrets and Credentials

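# Retire the SOPS Age key. Age keys cannot be revoked; delete the private key from
# ~/.config/opencenter/clusters/secrets/<org>/<cluster>/age/keys/ once nothing needs it.
# After deletion, SOPS-encrypted files still in Git history can no longer be decrypted.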

# Revoke OpenStack application credentials
openstack application credential delete prod-cluster-cred

# Revoke SSH keys
# Remove the cluster SSH key from authorized_keys on any shared infrastructure

# Revoke Git deploy keys
# Remove from repository settings (GitHub/GitLab)

# Remove local cluster configuration
opencenter cluster delete prod-cluster
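Credentials are easy to miss because nothing fails when they linger. A sketch that lists application credentials and SSH key pairs visible to the current OpenStack user whose names contain the cluster name; `leftover_openstack` is hypothetical, and the name match assumes your credentials follow the `prod-cluster-*` naming used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list OpenStack application credentials and key pairs
# visible to the current user whose names contain the given cluster name.
leftover_openstack() {
  local cluster="$1"
  {
    openstack application credential list -f value -c Name | sed 's/^/application-credential /'
    openstack keypair list -f value -c Name | sed 's/^/keypair /'
  } | grep -F -- "$cluster"
}
```

For example: `leftover_openstack prod-cluster` should print nothing once Step 8 is complete.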

Verification

After decommission:

# Verify VMs are gone (OpenStack)
openstack server list | grep prod-cluster
# Should return empty

# Verify networks are gone
openstack network list | grep prod-cluster
# Should return empty

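# Verify DNS records removed (quote the name so the shell does not expand *)
dig +short '*.prod-cluster.example.com'
# Should print nothing; run without +short to see status: NXDOMAIN in the header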

# Verify local config cleaned
opencenter cluster list | grep prod-cluster
# Should not appear
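The manual checks above can be rolled into one script that exits non-zero if anything is left. A sketch with hypothetical helper names; unlike a plain `grep` pipeline, it treats a failing listing command (for example an expired token) as an error rather than as an empty result, because an authentication failure would otherwise look like a clean teardown.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a listing command and fail if any output line
# contains the cluster name. Returns 2 if the listing command itself fails,
# so an auth or network error is never mistaken for "nothing left".
assert_gone() {
  local label="$1" cluster="$2"
  shift 2
  local out hits
  if ! out="$("$@" 2>/dev/null)"; then
    echo "ERR  $label: '$*' failed" >&2
    return 2
  fi
  hits="$(printf '%s\n' "$out" | grep -F -- "$cluster")"
  if [ -n "$hits" ]; then
    echo "FAIL $label:" >&2
    printf '%s\n' "$hits" >&2
    return 1
  fi
  echo "ok   $label"
}

# Hypothetical wrapper: every check runs even if an earlier one fails.
verify_decommissioned() {
  local cluster="$1" status=0
  assert_gone servers "$cluster" openstack server list -f value -c Name || status=1
  assert_gone networks "$cluster" openstack network list -f value -c Name || status=1
  assert_gone volumes "$cluster" openstack volume list -f value -c Name || status=1
  assert_gone "opencenter config" "$cluster" opencenter cluster list || status=1
  return "$status"
}
```

For example: `verify_decommissioned prod-cluster && echo "decommission verified"`.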