How to Safely Remove a Control-Plane Node from a K3s Cluster
Decommissioning a control-plane node from a K3s cluster with embedded etcd requires care to keep the cluster stable. In this post, I'll walk through removing a control-plane node called stormrider, covering pod drainage, etcd member removal, node deletion, and the verification steps that keep the process seamless.
Background
The K3s cluster consisted of six nodes:
- lunar-probe, nebula-42, quantum-core (worker nodes)
- skyforge-77, nova-prime, stormrider (control-plane, etcd, master nodes)
The task was to remove stormrider, which ran Fedora Linux Cosmic Edition and hosted four DaemonSet-managed pods (`svclb-*`) for LoadBalancer services via K3s's klipper-lb. As an etcd member, stormrider had to be removed without breaking quorum: with three etcd members, quorum is two, so the cluster can tolerate losing exactly one member, provided that member is removed cleanly.
Prerequisites
Ensure you have:
- `kubectl` access to the cluster.
- SSH access to stormrider.
- `etcdctl` installed (e.g., v3.5.18, matching your K3s version).
- etcd certificates (e.g., `/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt`, `client.crt`, and `client.key`).
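The etcdctl commands below spell out the endpoint and TLS flags in full. If you prefer shorter invocations, etcdctl can also read them from environment variables — an optional convenience, assuming the same certificate paths as above:

```bash
# Optional: export the connection settings once so later etcdctl calls can omit the flags.
export ETCDCTL_API=3
export ETCDCTL_ENDPOINTS=https://192.168.7.11:2379,https://192.168.7.22:2379,https://192.168.7.99:2379
export ETCDCTL_CACERT=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
export ETCDCTL_CERT=/var/lib/rancher/k3s/server/tls/etcd/client.crt
export ETCDCTL_KEY=/var/lib/rancher/k3s/server/tls/etcd/client.key
```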
Step-by-Step Guide
1. Verify Cluster and Node State
Check the cluster’s nodes:
kubectl get nodes -o wide
Output (abridged for stormrider):
NAME STATUS ROLES AGE VERSION INTERNAL-IP OS-IMAGE
stormrider Ready control-plane,etcd,master 27d v1.32.3+k3s1 192.168.7.99 Fedora Linux Cosmic Edition
List pods on stormrider:
kubectl get pods --all-namespaces -o wide | grep stormrider
Output:
kube-system svclb-starlink-gateway-4f8b2c9d-xk7lp 1/1 Running 10.43.9.12 stormrider
kube-system svclb-datastream-relay-6d4e3f2a-qw5mn 2/2 Running 10.43.9.11 stormrider
kube-system svclb-comms-hub-8c9a4g3b-vz8rk 4/4 Running 10.43.9.10 stormrider
kube-system svclb-astro-core-2b7d5h4c-nj3pm 5/5 Running 10.43.9.09 stormrider
These `svclb-*` pods are managed by the DaemonSets that klipper-lb creates for LoadBalancer services.
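To double-check that these pods really come from DaemonSets (one per LoadBalancer service) rather than from a Deployment, you can list the DaemonSets themselves — a quick sanity check:

```bash
# Each LoadBalancer service gets its own svclb-* DaemonSet in kube-system.
kubectl get daemonsets -n kube-system | grep svclb
```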
2. Check etcd Health
Since stormrider is an etcd member, verify etcd health:
./etcdctl endpoint health --endpoints=https://192.168.7.11:2379,https://192.168.7.22:2379,https://192.168.7.99:2379 --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt --cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt --key=/var/lib/rancher/k3s/server/tls/etcd/client.key
Output:
https://192.168.7.22:2379 is healthy: successfully committed proposal: took = 6.512345ms
https://192.168.7.99:2379 is healthy: successfully committed proposal: took = 6.789123ms
https://192.168.7.11:2379 is healthy: successfully committed proposal: took = 7.123456ms
All three etcd endpoints (on nova-prime, skyforge-77, and stormrider) report healthy.
Get the etcd member list:
./etcdctl member list --endpoints=https://192.168.7.11:2379,https://192.168.7.22:2379,https://192.168.7.99:2379 --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt --cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt --key=/var/lib/rancher/k3s/server/tls/etcd/client.key
Output:
4a8e56cd7890ef12, started, nova-prime-3g7h8j9k, https://192.168.7.11:2380, https://192.168.7.11:2379, false
7b9f67de8901ab23, started, stormrider-5k2m3n4p, https://192.168.7.99:2380, https://192.168.7.99:2379, false
2c0d34ef9012bc45, started, skyforge-77-6q8r9s0t, https://192.168.7.22:2380, https://192.168.7.22:2379, false
Note stormrider’s member ID: `7b9f67de8901ab23`.
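If you're scripting the removal, you can capture the member ID from the same output instead of copying it by hand — a minimal sketch that simply greps the plain-text member list:

```bash
# Grab the first field (the member ID) of the line whose member name contains "stormrider".
MEMBER_ID=$(./etcdctl member list \
  --endpoints=https://192.168.7.11:2379,https://192.168.7.22:2379,https://192.168.7.99:2379 \
  --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
  --cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
  --key=/var/lib/rancher/k3s/server/tls/etcd/client.key \
  | grep stormrider | cut -d',' -f1)
echo "$MEMBER_ID"   # expected: 7b9f67de8901ab23
```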
3. Drain the Node
Drain stormrider to evict pods:
kubectl drain stormrider --ignore-daemonsets --delete-emptydir-data --force
Output:
node/stormrider already cordoned
Warning: ignoring DaemonSet-managed Pods: kube-system/svclb-starlink-gateway-4f8b2c9d-xk7lp, kube-system/svclb-datastream-relay-6d4e3f2a-qw5mn, kube-system/svclb-comms-hub-8c9a4g3b-vz8rk, kube-system/svclb-astro-core-2b7d5h4c-nj3pm
node/stormrider drained
The `svclb-*` pods are skipped, as they’re DaemonSet-managed.
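Before touching etcd, it's worth confirming that nothing except DaemonSet-managed pods is still scheduled on the node — a quick check using a field selector:

```bash
# After a clean drain, only the svclb-* DaemonSet pods should remain on stormrider.
kubectl get pods --all-namespaces --field-selector spec.nodeName=stormrider -o wide
```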
4. Remove Node from etcd
Remove stormrider from etcd:
./etcdctl member remove 7b9f67de8901ab23 --endpoints=https://192.168.7.11:2379,https://192.168.7.22:2379,https://192.168.7.99:2379 --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt --cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt --key=/var/lib/rancher/k3s/server/tls/etcd/client.key
Output:
Member 7b9f67de8901ab23 removed from cluster 12ab34cd56ef7890
Verify the member list:
./etcdctl member list --endpoints=https://192.168.7.11:2379,https://192.168.7.22:2379 --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt --cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt --key=/var/lib/rancher/k3s/server/tls/etcd/client.key
Output:
4a8e56cd7890ef12, started, nova-prime-3g7h8j9k, https://192.168.7.11:2380, https://192.168.7.11:2379, false
2c0d34ef9012bc45, started, skyforge-77-6q8r9s0t, https://192.168.7.22:2380, https://192.168.7.22:2379, false
Check etcd health:
./etcdctl endpoint health --endpoints=https://192.168.7.11:2379,https://192.168.7.22:2379 --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt --cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt --key=/var/lib/rancher/k3s/server/tls/etcd/client.key
Output:
https://192.168.7.22:2379 is healthy: successfully committed proposal: took = 5.987654ms
https://192.168.7.11:2379 is healthy: successfully committed proposal: took = 6.234567ms
The etcd cluster is stable with two nodes.
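For a slightly deeper check than `endpoint health`, `endpoint status` shows which member currently leads and how large the database is — useful after any membership change:

```bash
# One of the two remaining members should report itself as the leader.
./etcdctl endpoint status -w table \
  --endpoints=https://192.168.7.11:2379,https://192.168.7.22:2379 \
  --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
  --cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
  --key=/var/lib/rancher/k3s/server/tls/etcd/client.key
```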
5. Delete the Node
Remove stormrider from the cluster:
kubectl delete node stormrider
Output:
node "stormrider" deleted
6. Clean Up the Node
SSH into stormrider and stop K3s:
sudo systemctl stop k3s
sudo systemctl disable k3s
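If stormrider was installed with the standard K3s install script and will never rejoin the cluster, you can optionally wipe K3s from the host entirely. This assumes the script-provided helpers are present at their default locations:

```bash
# Stop all K3s processes and containers, then remove the K3s binary, services, and data.
sudo /usr/local/bin/k3s-killall.sh
sudo /usr/local/bin/k3s-uninstall.sh
```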
7. Verify the Cluster
Check nodes:
kubectl get nodes -o wide
Confirm stormrider is gone, leaving five nodes: lunar-probe, nebula-42, skyforge-77, nova-prime, quantum-core.
Verify pods:
kubectl get pods --all-namespaces -o wide
Ensure the `svclb-*` pods are still running on the remaining nodes (the DaemonSet copies that lived on stormrider are removed along with it, not rescheduled). Check services:
kubectl get svc --all-namespaces
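Since klipper-lb advertises node addresses as the external IPs of LoadBalancer services, it's also worth confirming that stormrider's IP has disappeared from every service — an empty grep result is what you want:

```bash
# No service should still list the removed node's address (192.168.7.99).
kubectl get svc --all-namespaces -o wide | grep 192.168.7.99 \
  && echo "WARNING: a service still references stormrider" \
  || echo "no services reference stormrider"
```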
Optionally, recheck etcd health:
./etcdctl endpoint health --endpoints=https://192.168.7.11:2379,https://192.168.7.22:2379 --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt --cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt --key=/var/lib/rancher/k3s/server/tls/etcd/client.key
## Post-Removal Considerations
- etcd Quorum: With two remaining etcd members, quorum requires both, so the cluster can no longer tolerate the loss of a member. Consider adding a third server node (see the join sketch after this list).
- Services: Monitor the starlink-gateway, datastream-relay, comms-hub, and astro-core services; pay particular attention to svclb-astro-core if it showed instability before the removal.
- Pods: The `svclb-*` DaemonSets keep a pod on each remaining node; verify that the LoadBalancer services are still reachable.
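If you decide to restore three-member fault tolerance, joining a replacement server node is a one-liner with the install script. A hypothetical example, assuming you pull the token from `/var/lib/rancher/k3s/server/node-token` on an existing server and point at nova-prime's API endpoint:

```bash
# Run on the new machine: join it as an additional server (control-plane + etcd) node.
curl -sfL https://get.k3s.io | K3S_TOKEN=<node-token> sh -s - server \
  --server https://192.168.7.11:6443
```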
Troubleshooting Tips
- Stuck Pods: Use `--force` with `kubectl drain` cautiously, and check for PodDisruptionBudgets or pods without controllers that block eviction.
- etcd Quorum Loss: Restore from an etcd snapshot or rejoin nodes if quorum is lost (see the snapshot sketch below).
- Service Issues: Inspect the `svclb-*` pods and the services themselves if LoadBalancers become unreachable.
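For the quorum-loss case in particular, it pays to know where K3s keeps its etcd snapshots before you need them. A sketch, assuming the default snapshot settings on an etcd-backed server (snapshots land in `/var/lib/rancher/k3s/server/db/snapshots`); a snapshot can later be restored with `k3s server --cluster-reset --cluster-reset-restore-path=<snapshot>`:

```bash
# List existing automatic snapshots, then take an on-demand one before risky changes.
sudo k3s etcd-snapshot ls
sudo k3s etcd-snapshot save --name pre-removal
```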
## Conclusion
Removing stormrider from a K3s cluster required careful steps to drain pods, remove it from etcd, delete the node, and clean up. By verifying each step, we ensured cluster stability and continued service availability. Use this guide to safely decommission control-plane nodes in K3s, keeping your cluster robust.
Happy clustering!