Sebastian

Posted on Jul 24, 2023

Kubernetes with K3S: How I Upgraded a Production Cluster from v1.17 to v1.25

#kubernetes #k3s

Since its inception in 2020, my Kubernetes stack has happily served this blog and my lighthouse service. While I kept updating the application code base, I stayed with the Kubernetes version installed back then: v1.17. It’s time to change that and upgrade stepwise to the most recent version. The upgrade seemed challenging, so I made some notes, which ultimately led to this blog post.

This blog post is a concise summary of upgrading a Rancher K3S cluster from v1.17 to v1.25. Read it to get a thorough, practical explanation of potential problems and their solutions.

This article originally appeared at my blog admantium.com.

K3S Upgrade Preparation

The preparation for a Kubernetes cluster upgrade depends very much on the Kubernetes platform or distribution, the cluster size, and the workloads that you host. In my case, the private cluster consists of 3 nodes and 2 applications and is based on K3S, a lightweight distribution. I have written about K3S in previous articles, such as K3S introduction or K3S installation tutorial. K3S should make upgrading nodes easy because all Kubernetes components are bundled into one binary, and replacing that binary upgrades all K8S components in one step. However, you need to be aware of version changes in the Kubernetes components themselves, and check whether they are still compatible with your application configuration, that is, the YAML manifest files that represent deployments, endpoints, and ingress definitions. This is especially true when you work with complex Helm charts - they are very likely to break between Kubernetes upgrades, which means you should upgrade them first.

With this in mind, here are the things to look for:

Kubernetes Distribution: Read the upgrade process and requirements of your distribution carefully, and consider which of these might impact you. For the K3S upgrade plan, I saw no obstacles.
K8S manifest files: Check whether upgrading to newer versions will change the apiVersion field in the YAML resource manifests, and whether structural changes happen. There is a concise API deprecation guide that lists the changes. Read it to understand the changes, and update the manifests accordingly. Upgrading manifests can be automated to a certain degree with kubectl convert (see the docs).
Helm Charts: Read the documentation of your Helm charts to identify potential version upgrade problems. The typical solution is to upgrade the Helm chart to a version compatible with the target Kubernetes version first, before upgrading Kubernetes itself. In some cases, it might be the other way around.
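The manifest check from the list above can be partly scripted. The following is a minimal sketch, assuming the manifests live as YAML files in one directory; the function name `find_removed_apis` is hypothetical, and the pattern only covers the API versions this article later updates for v1.22 (Ingress, Certificate Signing Requests, RBAC).

```shell
# Sketch: list manifest files that still use API versions removed in Kubernetes v1.22.
# The function name and the directory argument are illustrative assumptions.
find_removed_apis() {
  dir="$1"
  # -r: recurse, -l: print matching file names only, -E: extended regex.
  # The leading "[[:space:]-]*" also matches list items such as "- apiVersion: ...".
  grep -rlE '^[[:space:]-]*apiVersion:[[:space:]]*(extensions/v1beta1|networking\.k8s\.io/v1beta1|certificates\.k8s\.io/v1beta1|rbac\.authorization\.k8s\.io/v1beta1)[[:space:]]*$' "$dir"
}

# Usage: find_removed_apis ./manifests
```

A file name in the output only means the file deserves a closer look; the API deprecation guide remains the authoritative list.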

Ok, it’s time to start. For all of the following, keep in mind that I could afford some cluster downtime, which might not be the case for your upgrade journey.

Step 1: Upgrading Kubernetes from 1.17 => 1.19

The first step is only a small upgrade of two minor versions.

Backup All Manifest Files

Let’s start by creating a complete YAML manifest backup of all resources with the following command:

kb get all -A -o yaml > all.yaml
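Note that `kubectl get all` does not cover every resource type: Ingresses, ConfigMaps, Secrets, and custom resource definitions, among others, are not part of it. The sketch below dumps some of these types into separate files. It assumes `kb` is an alias for `kubectl`; the function name, the resource list, and the output directory are illustrative choices, not part of the original procedure.

```shell
# Sketch: dump additional resource types that "kubectl get all" leaves out.
# The resource list is an assumption; adjust it to what your cluster uses.
backup_resources() {
  outdir="$1"
  mkdir -p "$outdir" || return 1
  for kind in ingresses configmaps secrets persistentvolumeclaims customresourcedefinitions; do
    # One file per resource type, across all namespaces
    kubectl get "$kind" -A -o yaml > "$outdir/$kind.yaml" || return 1
  done
}

# Usage: backup_resources ./backup
```

The Secrets dump contains sensitive data, so the backup directory should be stored accordingly.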

And then see the currently installed versions:

kb get nodes

NAME         STATUS   ROLES    AGE      VERSION
k3s-server   Ready    master   2y122d   v1.17.2+k3s1
k3s-node1    Ready    <none>   2y122d   v1.17.2+k3s1
k3s-node2    Ready    <none>   2y122d   v1.17.2+k3s1

Upgrade to Kubernetes v1.18

To identify the next available minor version, the K3S GitHub releases page is the best source. Search for the next version tag, which in my case was v1.18.8+k3s1, and apply it as shown:

# server
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.18.8+k3s1 sh -

# on each node
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.18.8+k3s1 K3S_TOKEN=$SECRET sh -

The installation logs show no errors at all:

$> curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.18.8+k3s1 sh -
[INFO]  Finding release for channel v1.18.8+k3s1
[INFO]  Using v1.18.8+k3s1 as release
[INFO]  Downloading hash https://github.com/k3s-io/k3s/releases/download/v1.18.8+k3s1/sha256sum-amd64.txt
[INFO]  Downloading binary https://github.com/k3s-io/k3s/releases/download/v1.18.8+k3s1/k3s
[INFO]  Verifying binary download
[INFO]  Installing k3s to /usr/local/bin/k3s
[INFO]  Skipping installation of SELinux RPM
[INFO]  Skipping /usr/local/bin/kubectl symlink to k3s, already exists
[INFO]  Skipping /usr/local/bin/crictl symlink to k3s, already exists
[INFO]  Skipping /usr/local/bin/ctr symlink to k3s, already exists
[INFO]  Creating killall script /usr/local/bin/k3s-killall.sh
[INFO]  Creating uninstall script /usr/local/bin/k3s-uninstall.sh
[INFO]  env: Creating environment file /etc/systemd/system/k3s.service.env
[INFO]  systemd: Creating service file /etc/systemd/system/k3s.service
[INFO]  systemd: Enabling k3s unit
Created symlink /etc/systemd/system/multi-user.target.wants/k3s.service → /etc/systemd/system/k3s.service.
[INFO]  systemd: Starting k3s

And with kubectl get nodes, everything looks fine as well:

NAME         STATUS   ROLES    AGE      VERSION
k3s-server   Ready    master   2y122d   v1.18.8+k3s1
k3s-node1    Ready    <none>   2y122d   v1.18.8+k3s1
k3s-node2    Ready    <none>   2y122d   v1.18.8+k3s1

Upgrade to Kubernetes v1.19

Continuing to version v1.19.4+k3s2:

# server
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.19.4+k3s2 sh -

# on each node
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.19.4+k3s2 K3S_TOKEN=$SECRET sh -

This time, I ran into an error: node1 and node2 were marked as not ready:

k3s-server   Ready       master   2y122d   v1.19.4+k3s2
k3s-node2    NotReady    <none>   2y122d   v1.19.4+k3s2
k3s-node1    NotReady    <none>   2y122d   v1.19.4+k3s2

On node2, I executed these commands:

systemctl stop k3s
systemctl stop k3s-agent
systemctl start k3s-agent

On node1, this did not work. Checking the Kubernetes logfile, I saw this message:

kube-proxy  failed to start proxier healthz on 127.0.0.1:10256: listen tcp 127.0.0.1:10256: bind: address already in use

However, this particular error opened a rabbit hole in which I spent too much time trying different things. Finally, I simply restarted node1, and a short time after:

k3s-server   Ready    master   2y122d   v1.19.4+k3s2
k3s-node2    Ready    <none>   2y122d   v1.19.4+k3s2
k3s-node1    Ready    <none>   2y122d   v1.19.4+k3s2
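Checking the node status after each upgrade step can be scripted. The following is a minimal sketch that parses the output of `kubectl get nodes`; the function names and the default retry values are hypothetical. It treats any status other than exactly `Ready` as not ready, so a cordoned node (`Ready,SchedulingDisabled`) is reported as well.

```shell
# Sketch: print the names of nodes whose STATUS column is not exactly "Ready".
not_ready_nodes() {
  nodes=$(kubectl get nodes --no-headers) || return 1
  printf '%s\n' "$nodes" | awk 'NF && $2 != "Ready" { print $1 }'
}

# Poll until every node reports Ready, or give up after a number of attempts.
wait_for_nodes() {
  attempts="${1:-30}"   # hypothetical default: 30 tries
  delay="${2:-10}"      # hypothetical default: 10 seconds between tries
  while [ "$attempts" -gt 0 ]; do
    # Succeed only if kubectl itself worked and no node is pending
    if pending=$(not_ready_nodes) && [ -z "$pending" ]; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep "$delay"
  done
  return 1
}

# Usage: wait_for_nodes || echo "Some nodes are still not ready: $(not_ready_nodes)"
```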

Fix the Docker Registry

The private Docker registry hosted inside my Kubernetes cluster was removed during the update. Therefore, I needed to upload all images anew.

Finally, all services were back online:

NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
docker-registry            1/1     1            1           2y121d
nginx-ingress-controller   1/1     1            1           2y122d
lighthouse-redis           1/1     1            1           481d
lighthouse-scanner         3/3     3            3           2y65d
lighthouse-api             1/1     1            1           2y65d
lighthouse-web             1/1     1            1           2y65d
admantium-blog             1/1     1            1           2y90d

Step 2: Upgrading Kubernetes from 1.19 => 1.21

For the next upgrade, I decided to use the very same approach: Grab the latest patch release of the next minor version, and apply it.
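Picking the latest patch release of a minor version can be automated once the release tags are available as a plain list. The sketch below is one way to do it, assuming the tags arrive one per line on standard input and that GNU `sort -V` (version sort) is available; the function name `latest_patch` is hypothetical, and fetching the tags from the releases page is left out.

```shell
# Sketch: pick the highest patch release of one minor version from a list of tags on stdin.
latest_patch() {
  minor="$1"   # for example v1.20
  # Keep tags of that minor only, drop release candidates, version-sort, take the last one.
  grep "^${minor}\.[0-9]" | grep -v -- '-rc' | sort -V | tail -n 1
}

# Usage: latest_patch v1.20 < tags.txt
```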

Upgrade to Kubernetes v1.20

Applying version v1.20.15:

# server
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.20.15+k3s1 sh -

# on each node
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.20.15+k3s1 K3S_TOKEN=$SECRET sh -

The update was smooth, but node1 was reported as not ready, although I could connect via SSH normally. I saw several processes consuming an excessive amount of CPU. After killing them, the node became available:

  Type     Reason                   Age    From        Message
  ----     ------                   ----   ----        -------
  Normal   Starting                 2m43s  kube-proxy  Starting kube-proxy.
  Normal   Starting                 2m43s  kubelet     Starting kubelet.
  Warning  InvalidDiskCapacity      2m43s  kubelet     invalid capacity 0 on image filesystem
  Normal   NodeHasSufficientMemory  2m43s  kubelet     Node k3s-node1 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    2m43s  kubelet     Node k3s-node1 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     2m43s  kubelet     Node k3s-node1 status is now: NodeHasSufficientPID
  Normal   NodeNotReady             2m43s  kubelet     Node k3s-node1 status is now: NodeNotReady
  Normal   NodeAllocatableEnforced  2m42s  kubelet     Updated Node Allocatable limit across pods
  Normal   NodeReady                51s    kubelet     Node k3s-node1 status is now: NodeReady

Upgrade to Kubernetes v1.21

Let’s continue with v1.21.8+k3s2:

# server
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.21.8+k3s2 sh -

# on each node
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.21.8+k3s2 K3S_TOKEN=$SECRET sh -

All nodes were ready, but one of the application deployments did not work anymore.

NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
docker-registry            1/1     1            1           2y121d
nginx-ingress-controller   1/1     1            1           2y123d
lighthouse-redis           1/1     1            1           482d
lighthouse-web             1/1     1            1           2y66d
admantium-blog             1/1     1            1           2y90d
lighthouse-scanner         3/3     3            3           2y66d
lighthouse-api             0/1     1            0           2y66d
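Spotting such a deployment can be automated by comparing the two numbers in the READY column. This is a minimal sketch; the function name is hypothetical, and the `-A` flag (all namespaces) is an assumption that shifts the columns as noted in the comment.

```shell
# Sketch: list deployments whose READY column shows fewer ready than desired replicas.
unready_deployments() {
  deployments=$(kubectl get deployments -A --no-headers) || return 1
  # With -A the columns are: NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
  printf '%s\n' "$deployments" |
    awk 'NF { split($3, r, "/"); if (r[1] != r[2]) print $1 "/" $2 }'
}

# Usage: unready_deployments
```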

Fix AppArmor Error

One container, the lighthouse-api, did not start because of this error:

Error: failed to create containerd container: get apparmor_parser version: exec: "apparmor_parser": executable file not found in $PATH

This AppArmor-related error occurs in K3S v1.21, and this bug ticket provided the solution: on each node, run the following command, then reboot the node:

apt install apparmor apparmor-utils

All applications are running again.

NAME                 READY   UP-TO-DATE   AVAILABLE   AGE
lighthouse-redis     1/1     1            1           482d
docker-registry      1/1     1            1           2y122d
admantium-blog       1/1     1            1           2y91d
lighthouse-api       1/1     1            1           2y66d
lighthouse-web       1/1     1            1           2y66d
lighthouse-scanner   3/3     3            3           2y66d
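To find out ahead of an upgrade whether a node would run into the AppArmor error above, a quick check for the missing binary may help. This is a small sketch with a hypothetical function name; it should be run as root on each node, because the binary may live in a directory that is not on a regular user's PATH.

```shell
# Sketch: report which of the given binaries are missing from PATH.
missing_bins() {
  for bin in "$@"; do
    command -v "$bin" >/dev/null 2>&1 || printf '%s\n' "$bin"
  done
}

# Usage on a node: missing_bins apparmor_parser
```

An empty output means all listed binaries were found.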

Step 3: Updating Manifests for v1.22

When moving towards v1.22, several API manifests are expected to change:

Ingress: Use the new API version networking.k8s.io/v1, change the structure of backend, serviceName, and servicePort, and add a pathType field
Certificate Signing Requests: Use the new api version certificates.k8s.io/v1, and change the signing clients
RBAC: Use API version rbac.authorization.k8s.io/v1

I definitely need to update the Ingress resources, otherwise my blog and lighthouse service will stop working. Let’s update the Ingress definitions first using kubectl convert. As an example, let’s take the ingress definition for the blog.

The pre v1.22 spec is this:

kind: Ingress
metadata:
  annotations:
    cert-manager.io/cluster-issuer: admantium-blog-cert
    kubernetes.io/ingress.class: nginx
spec:
  rules:
  - host: admantium.com
    http:
      paths:
      - backend:
          serviceName: admantium-blog
          servicePort: 8080
        path: /
        pathType: ImplementationSpecific
  tls:
  - hosts:
    - admantium.com
    secretName: admantium-blog-cert
status:
  loadBalancer:
    ingress:
    - ip: 49.12.45.6

And after:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    cert-manager.io/cluster-issuer: admantium-blog-cert
    kubernetes.io/ingress.class: nginx
spec:
  rules:
  - host: admantium.com
    http:
      paths:
      - backend:
          service:
            name: admantium-blog
            port:
              number: 8080
        path: /
        pathType: ImplementationSpecific
  tls:
  - hosts:
    - admantium.com
    secretName: admantium-blog-cert
status:
  loadBalancer:
    ingress:
    - ip: 49.12.45.6

By running kb convert -f blog.yaml --output-version networking.k8s.io/v1 > blog_new.yaml && kb apply -f blog_new.yaml, the ingress resource was updated successfully. I ran similar commands for the other ingresses too, and then continued.
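With several ingress manifests, the conversion can be wrapped in a loop. The following sketch assumes that all files in the source directory are Ingress manifests, that the kubectl convert plugin is installed, and that `kb` stands for `kubectl`; the function name and both directory arguments are hypothetical.

```shell
# Sketch: convert every manifest in a directory to networking.k8s.io/v1.
# Writes the converted files to a separate directory so they can be reviewed first.
convert_ingresses() {
  src="$1"
  dst="$2"
  mkdir -p "$dst" || return 1
  for f in "$src"/*.yaml; do
    [ -e "$f" ] || continue   # the glob did not match anything
    kubectl convert -f "$f" --output-version networking.k8s.io/v1 \
      > "$dst/$(basename "$f")" || return 1
  done
}

# Usage: convert_ingresses ./ingress ./ingress_v1 && kubectl apply -f ./ingress_v1
```

Keeping the converted files separate from the originals leaves the old manifests untouched in case a rollback is needed.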

Step 4: Upgrading Helm Charts for v1.22

During the initial setup of my cluster, I used the tool arkade, see also my 2020-08-03 blog post. This tool packages essential Kubernetes helm charts with a simple installer. Using the helm binary directly, I see these installed releases:

cert-manager    cert-manager 5        2022-09-17 12:38:54.128091774 +0200 CEST deployed cert-manager-v0.14.3  v0.14.3
docker-registry default      2        2020-04-26 19:29:33.850171 +0200 CEST    deployed docker-registry-1.9.2 2.7.1
ingress-nginx   default      1        2022-08-27 12:32:50.667457391 +0200 CEST deployed ingress-nginx-4.2.3   1.3.0

And these repositories:

helm repo list
NAME          URL
ingress-nginx https://kubernetes.github.io/ingress-nginx
jetstack      https://charts.jetstack.io

All of these need to be updated.

Ingress Nginx

During an earlier update, I installed ingress-nginx, assuming it would replace the nginx-ingress release. But because of naming differences, this resulted in two separate installations:

helm list --all-namespaces
NAME            NAMESPACE    REVISION UPDATED                                  STATUS   CHART                 APP VERSION
cert-manager    cert-manager 3        2020-04-27 19:58:05.144256 +0200 CEST    deployed cert-manager-v0.12.0  v0.12.0
docker-registry default      2        2020-04-26 19:29:33.850171 +0200 CEST    deployed docker-registry-1.9.2 2.7.1
ingress-nginx   default      1        2022-08-27 10:27:22.303325075 +0200 CEST failed   ingress-nginx-4.2.3   1.3.0
nginx-ingress   default      2        2020-05-08 14:11:09.757913 +0200 CEST    deployed nginx-ingress-1.36.3  0.30.0
traefik         kube-system  3        2022-08-26 16:01:19.368774236 +0000 UTC  deployed traefik-1.81.001      1.7.19

The solution was to cleanly uninstall and reinstall the components:

helm delete nginx-ingress
helm delete ingress-nginx
helm install ingress-nginx ingress-nginx/ingress-nginx

Then the deployment was running again:

helm search repo ingress-nginx -l

NAME                        CHART VERSION APP VERSION DESCRIPTION
ingress-nginx/ingress-nginx 4.2.3         1.3.0       Ingress controller for Kubernetes using NGINX a...

Cert Manager

For the certificates, I'm using cert-manager. On its release information page, I could see that my installed version supports Kubernetes up to v1.21. The upgrade notes for cert-manager follow the same practice as upgrading Kubernetes: one minor version at a time, using the highest available patch version.

Upgrade to v0.14 and Fix CRD Error

The first upgrade resulted in an error:

kubectl delete -n cert-manager deployment cert-manager cert-manager-cainjector cert-manager-webhook

helm upgrade --set installCRDs=true --version 0.14 cert-manager jetstack/cert-manager --namespace=cert-manager

Error: UPGRADE FAILED: cannot patch "cert-manager-cainjector" with kind Deployment: Deployment.apps "cert-manager-cainjector" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/component":"cainjector", "app.kubernetes.io/instance":"cert-manager", "app.kubernetes.io/name":"cainjector"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable && cannot patch "cert-manager" with kind Deployment: Deployment.apps "cert-manager" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/component":"controller", "app.kubernetes.io/instance":"cert-manager", "app.kubernetes.io/name":"cert-manager"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable && cannot patch "cert-manager-webhook" with kind Deployment: Deployment.apps "cert-manager-webhook" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/component":"webhook", "app.kubernetes.io/instance":"cert-manager", "app.kubernetes.io/name":"webhook"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable

I found the solution in this cert-manager issue: Manually uninstall outdated CRDs, then perform the upgrade.

for i in certificates.cert-manager.io challenges.acme.cert-manager.io clusterissuers.cert-manager.io issuers.cert-manager.io orders.acme.cert-manager.io
do
  kb delete crd "$i"
done

helm upgrade --set installCRDs=true --version 0.14 cert-manager jetstack/cert-manager --namespace=cert-manager

...

NAME: cert-manager
LAST DEPLOYED: Sat Sep 17 12:38:54 2022
NAMESPACE: cert-manager
STATUS: deployed
REVISION: 5
TEST SUITE: None
NOTES:
cert-manager has been deployed successfully!

Upgrade to v1.9

Continuing all the way from 0.14 to 1.9 was unproblematic. After each upgrade, I checked the Kubernetes event messages. A typical printout was this:

kube-system    0s          Normal   LeaderElection      lease/cert-manager-controller                            cert-manager-7b8d75c477-rmtgw-external-cert-manager-controller became leader
kube-system    0s          Normal   LeaderElection      lease/cert-manager-cainjector-leader-election            cert-manager-cainjector-6cd8d7f84b-tc2vn_649414c2-b9cb-4ace-af6f-8feaa5a0f06b became leader

I was especially delighted about these messages:

default        0s          Normal   CreateCertificate   ingress/lighthouse                                       Successfully created Certificate "lighthouse-cert"
default        0s          Normal   CreateCertificate   ingress/blog                                             Successfully created Certificate "admantium-blog-cert"
default        0s          Normal   CreateCertificate   ingress/docker-registry                                  Successfully created Certificate "docker-registry"

Finally the most recent version is used:

helm list -A
NAME            NAMESPACE    REVISION UPDATED                                  STATUS   CHART                 APP VERSION
cert-manager    cert-manager 21       2022-09-18 10:35:16.958630764 +0200 CEST deployed cert-manager-v1.9.1   v1.9.1
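The one-minor-version-at-a-time procedure for cert-manager could be scripted roughly as follows. This is a sketch under assumptions: the function name is hypothetical, the helm flags are copied from the upgrade command shown earlier, and the version list in the usage line is only illustrative. Individual releases may require extra manual steps from their upgrade notes that this loop does not cover.

```shell
# Sketch: upgrade cert-manager through a list of versions, stopping at the first failure.
upgrade_cert_manager() {
  for version in "$@"; do
    echo "Upgrading cert-manager to $version"
    helm upgrade --set installCRDs=true --version "$version" \
      cert-manager jetstack/cert-manager --namespace=cert-manager || return 1
  done
}

# Usage (illustrative version list, check the cert-manager release notes):
# upgrade_cert_manager 0.15 0.16 1.0
```

Stopping at the first failure keeps the release at a known version, so the Kubernetes events can be inspected before continuing.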

Docker-Registry

The docker-registry that was installed with the tool arkade originally stems from the GitHub repo helm/charts. This repo was deprecated, as announced, and the chart moved to a new repo.

The question now is: Can I update the Helm chart from this new repo, or do I need to install from scratch? The release page lists 1.9.7 as the oldest version. Let’s try an upgrade with the --dry-run option.

helm upgrade --version 1.9.7  docker-registry docker-registry/docker-registry --namespace=default --dry-run

...
NAME: docker-registry
LAST DEPLOYED: Sun Sep 18 11:19:47 2022
NAMESPACE: default
STATUS: pending-upgrade
REVISION: 3
TEST SUITE: None
...

This looks good! All printed manifests are similar to the current ones. Let’s upgrade, and then try a docker push command:

helm upgrade --version 1.9.7  docker-registry docker-registry/docker-registry --namespace=default

docker push docker.admantium.com/lighthouse-web:0.4.2
The push refers to repository [docker.admantium.com/lighthouse-web]
f1a5039ecf29: Pushed
221ee9f09112: Pushed
70d0aad4ac8b: Pushed
b539cf60d7bb: Pushed
bdc7a32279cc: Pushed

This went well! In the same manner as before, I upgraded one minor version at a time and tried the docker push command after each step. The only notable release is the upgrade from 1.16.0 to 2.0.0, which added the ingress.spec.ingressClassName field so that the ingress resource works as before.

Helm Chart Upgrades Completed

Yes!

NAME            NAMESPACE    REVISION UPDATED                                  STATUS   CHART                 APP VERSION
cert-manager    cert-manager 21       2022-09-18 10:35:16.958630764 +0200 CEST deployed cert-manager-v1.9.1   v1.9.1
docker-registry default      14       2022-09-18 11:35:49.363746406 +0200 CEST deployed docker-registry-2.2.2 2.8.1
ingress-nginx   default      1        2022-08-27 12:32:50.667457391 +0200 CEST deployed ingress-nginx-4.2.3   1.3.0

Step 5: Upgrading Kubernetes from 1.21 => v1.25

For the final Kubernetes version upgrades, I used the even more comprehensive changelog from the Kubernetes GitHub repository. As before, I determined the most recent patch version of the next minor version to install, and continued.

Upgrading to Kubernetes v1.21.14

The upgrade command:

# server
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.21.14+k3s1 sh -

# on each node
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="agent" INSTALL_K3S_CHANNEL=v1.21.14+k3s1 K3S_TOKEN=$SECRET sh -

Again, the worker nodes were not immediately available. After manually stopping and starting k3s on the nodes, the update was successful:

systemctl stop k3s
systemctl stop k3s-agent
systemctl start k3s-agent

Upgrading to Kubernetes v1.22

The upgrade to v1.22 was not so smooth. From this version on, K3S defaults to installing traefik as the ingress controller via a Helm chart. This disrupted the pod communication. I uninstalled traefik and then added specific flags to the upgrade command, as shown:

# server
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--no-deploy traefik --disable-network-policy" INSTALL_K3S_CHANNEL=v1.22.13+k3s1 sh -

# on each node
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.22.13+k3s1 K3S_URL=https://49.12.45.6:6443 K3S_TOKEN=$SECRET sh -

All applications worked.
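A side note on the flags: K3S documents --disable as the replacement for the deprecated --no-deploy flag, so it is worth checking the release notes of the target version to see whether --no-deploy is still accepted. The sketch below only prints a server install command that uses --disable, so it can be reviewed before running it. The function name is hypothetical, and INSTALL_K3S_VERSION is the install script's variable for pinning an exact release, whereas the commands in this article pass the tag via INSTALL_K3S_CHANNEL.

```shell
# Sketch: print (not run) a server install command that uses --disable instead of --no-deploy.
k3s_server_install_cmd() {
  version="$1"   # an exact release tag, for example v1.22.13+k3s1
  printf 'curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=%s INSTALL_K3S_EXEC="--disable traefik --disable-network-policy" sh -\n' "$version"
}

# Usage: k3s_server_install_cmd v1.22.13+k3s1
```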

Upgrading to Kubernetes v1.23

The most recent v1.23 patch release at the time was v1.23.10.

# server
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--no-deploy traefik --disable-network-policy" INSTALL_K3S_CHANNEL=v1.23.10+k3s1 sh -

# on each node
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.23.10+k3s1 K3S_URL=https://49.12.45.6:6443 K3S_TOKEN=$SECRET sh -

All applications worked.

NAME         STATUS   ROLES                  AGE      VERSION
k3s-node2    Ready    <none>                 2y151d   v1.23.10+k3s1
k3s-server   Ready    control-plane,master   2y151d   v1.23.10+k3s1
k3s-node1    Ready    <none>                 2y151d   v1.23.10+k3s1

Upgrading to Kubernetes v1.24

Upgrade to v1.24.4+k3s1.

# server
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--no-deploy traefik --disable-network-policy" INSTALL_K3S_CHANNEL=v1.24.4+k3s1 sh -

# on each node
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.24.4+k3s1 K3S_URL=https://49.12.45.6:6443 K3S_TOKEN=$SECRET sh -

This one was also very smooth: no node restart was required at all.

k3s-server   Ready    control-plane,master   2y151d   v1.24.4+k3s1
k3s-node1    Ready    <none>                 2y151d   v1.24.4+k3s1
k3s-node2    Ready    <none>                 2y151d   v1.24.4+k3s1

Upgrading to Kubernetes v1.25

The final upgrade is at hand:

# server
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--no-deploy traefik --disable-network-policy" INSTALL_K3S_CHANNEL=v1.25.5+k3s1 sh -

# on each node
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.25.5+k3s1 K3S_URL=https://49.12.45.6:6443 K3S_TOKEN=$SECRET sh -

All services work. The upgrade is finished.

Conclusion

Upgrading a Kubernetes cluster to its most recent version can be an intimidating task. In this article, you saw a practical example of upgrading a K3S cluster from v1.17 to v1.25. During the update process, I encountered some errors and solved them as follows: a) if a node is not available, restart the k3s service or the complete node, b) if pod communication or incoming Ingress traffic is disrupted, check the ingress configuration and which ingress solution is installed via Helm, c) on Debian systems, be sure that apparmor and apparmor-utils are installed. In general, you update Kubernetes one minor version at a time, and you need to check three different things: a) updates to the Kubernetes API, b) updates to the Kubernetes manifests, c) updates to the Helm charts. I'm happy that the updates were successful, and that at the time of writing I had a completely up-to-date cluster.
