Kubernetes Pod Termination Process
A pod can be evicted for several reasons, for example:
- when a node is drained of all its pods,
- when it is deleted via kubectl delete,
- when the scheduler evicts pods to make room for higher-priority ones.
Eviction process
- A delete request is issued.
- The API server changes the pod's state to Terminating in etcd.
- The kubelet and the endpoints-controller start the eviction process:
  - The kubelet performs the pod eviction.
  - The endpoints-controller handles the endpoint removal process.
Both operations are asynchronous.
Kubelet sequence
The kubelet initiates a shutdown sequence for each container in the pod:
- it runs the container’s pre-stop hook (if one exists),
- it sends a SIGTERM to the container, and
- it waits for the container to terminate.
By default, this whole sequence must complete within 30 seconds, or within the number of seconds specified in the spec.terminationGracePeriodSeconds field.
If the container is still running beyond this time, kubelet waits for approximately 2 more seconds, and then kills the container forcibly by sending a SIGKILL signal.
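The timing rules above can be sketched as a small calculation. This is a simplified model of the description in this post, not the kubelet's actual implementation: the function name `termination_timeline` and its parameters are hypothetical, and it assumes the pre-stop hook finishes within the grace period.

```python
DEFAULT_GRACE_SECONDS = 30  # default grace period stated in the text
EXTRA_WAIT_SECONDS = 2      # approximate extra wait before SIGKILL, per the text


def termination_timeline(pre_stop_seconds, shutdown_seconds,
                         grace_seconds=DEFAULT_GRACE_SECONDS):
    """Return when SIGTERM, a clean exit, and SIGKILL happen, in seconds
    counted from the moment the kubelet starts the shutdown sequence.

    Simplifying assumption: the pre-stop hook finishes within the grace period.
    """
    if pre_stop_seconds > grace_seconds:
        raise ValueError("this sketch assumes the hook fits in the grace period")

    sigterm_at = pre_stop_seconds                 # SIGTERM follows the hook
    exit_at = sigterm_at + shutdown_seconds       # when the app would exit on its own
    kill_at = grace_seconds + EXTRA_WAIT_SECONDS  # forced kill deadline

    if exit_at <= kill_at:
        return {"sigterm_at": sigterm_at, "exited_at": exit_at, "sigkill_at": None}
    return {"sigterm_at": sigterm_at, "exited_at": None, "sigkill_at": kill_at}


# A 5 s hook followed by a 10 s shutdown fits in the default grace period.
print(termination_timeline(pre_stop_seconds=5, shutdown_seconds=10))
# {'sigterm_at': 5, 'exited_at': 15, 'sigkill_at': None}

# A 40 s shutdown does not, so the container is killed at about 32 s.
print(termination_timeline(pre_stop_seconds=5, shutdown_seconds=40))
# {'sigterm_at': 5, 'exited_at': None, 'sigkill_at': 32}
```

The point to notice is that the pre-stop hook and the application's own shutdown share one time budget, so a long hook leaves less time for the application to exit cleanly.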
In parallel, the endpoints-controller removes the pod’s endpoint by sending a request to the API server. The API server then notifies all the kube-proxies on the worker nodes, and each kube-proxy removes the endpoint from the iptables rules on its node.
Implications
We cannot make assumptions about which of the eviction processes will complete first. If the endpoint removal process finishes before the containers receive the SIGTERM signal, no new requests will arrive while the containers are terminating. However, if the containers start terminating before the endpoint removal process is finished, the pods will continue to receive requests. In that case, clients will get “Connection timeout” or “Connection refused” errors as responses. Because the endpoint removal must propagate to every node in the cluster before it is complete, there is a high probability that the pod eviction process completes first. LearnK8s created a neat visual representation of these two scenarios.
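To make the race concrete, here is a minimal sketch that computes how long traffic can still be routed to a pod that has already stopped serving. The function `failure_window` and the timings used are hypothetical, chosen only for illustration; real propagation delays depend on the cluster.

```python
def failure_window(stops_serving_at, endpoint_removed_at):
    """Seconds during which requests are still routed to a pod that no longer
    serves them. Both arguments are seconds since the delete request."""
    return max(0.0, endpoint_removed_at - stops_serving_at)


# The pod stops serving after 1 s, but the endpoint is only removed
# from every node after 4 s: clients may see errors for 3 s.
print(failure_window(stops_serving_at=1.0, endpoint_removed_at=4.0))   # 3.0

# The pod keeps serving for 10 s, well past the endpoint removal: no errors.
print(failure_window(stops_serving_at=10.0, endpoint_removed_at=4.0))  # 0.0
```

The window closes whenever the pod keeps serving until after the endpoint has been removed everywhere, which is what delaying the shutdown aims for.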
Tips for graceful termination
- Handle SIGTERM properly
- Tune spec.containers[].lifecycle.preStop and spec.terminationGracePeriodSeconds
- Use a timer to wait while open connections drain, and only then shut down
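As an illustration of the first and last tips, the following is a minimal sketch of a SIGTERM handler combined with a drain timer, using only the Python standard library. The class `GracefulShutdown`, its method names, and the 20-second timeout are assumptions made for this example; a real server would call `request_started` and `request_finished` from its own request-handling code.

```python
import signal
import threading
import time


class GracefulShutdown:
    """Tracks in-flight requests and waits for them to drain after SIGTERM."""

    def __init__(self, drain_timeout_seconds=20.0):
        self.drain_timeout_seconds = drain_timeout_seconds
        self.stop_requested = False  # plain flag: safe to set from a signal handler
        self._in_flight = 0
        self._cond = threading.Condition()

    def install(self):
        # signal.signal must be called from the main thread.
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame):
        # Only record the request; the main loop decides when to drain and exit.
        self.stop_requested = True

    def request_started(self):
        with self._cond:
            self._in_flight += 1

    def request_finished(self):
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def drain(self):
        """Wait until no request is in flight or the timer runs out.
        Returns True if everything drained in time, False otherwise."""
        deadline = time.monotonic() + self.drain_timeout_seconds
        with self._cond:
            while self._in_flight > 0:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return False
                self._cond.wait(remaining)
            return True
```

A main loop would typically call `install()` at startup, stop accepting new connections once `stop_requested` becomes true, then call `drain()` and exit. The drain timeout should stay below the pod's spec.terminationGracePeriodSeconds, otherwise the container may be killed before draining finishes.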