I'm one of the lucky folks who get to live their dream: I make money from Kubernetes development (even if it is just a one-feature ticket). Huge shout-out to my team at Ondat for making this possible. I spent a few days setting up my development environment, and hopefully the "end" result will be useful for others too.
The story in short: the feature we want to deliver requires a distributed environment; it isn't possible to test it on a single node. The Kubernetes devs did a pretty nice job documenting how to run a single-node Kubernetes cluster from source on your local machine, but what about multi-node clusters? There are several solutions for running a Kubernetes cluster on one machine, but they all rely on some sort of pre-built images (or they simply never crossed my path). I don't want to build images, push them to a registry, start a cluster, pull images, and so on. There are too many BUTs, so let's solve the problem instead!
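For reference, this is roughly how the upstream single-node flow works from a source checkout (a minimal sketch; the checkout path is hypothetical and the flags depend on your setup):
# From a Kubernetes source checkout: build the binaries and start a
# complete single-node cluster (etcd, apiserver, kubelet, ...) locally.
cd ~/go/src/k8s.io/kubernetes   # hypothetical checkout location
./hack/local-up-cluster.sh      # pass -O to reuse already-built binaries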
I started my project based on Techiescamp's Vagrant setup, thanks for the great work. I use this environment just for running Kubernetes; I compile the source code on my host Linux machine (way faster), but all the build tools are included, so if you would like to use it as a build environment, just Go ahead and don't forget to increase the master node's resources. With this setup, I'm able to test code changes on a multi-node cluster in 6 minutes on my laptop by executing 4 simple commands.
If you are not interested in the details, just follow the readme on how to start the cluster and skip the rest of the post.
Here are the things I had to change
The systemd resolver configures localhost as the resolver ...
... which created a loop, so CoreDNS wasn't able to start. I simply replaced /etc/resolv.conf with a static config. Simple but powerful.
systemctl disable systemd-resolved
systemctl stop systemd-resolved
rm -f /etc/resolv.conf
cat <<EOF > /etc/resolv.conf
nameserver 1.1.1.1
nameserver 8.8.8.8
EOF
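A quick sanity check that the loop is really gone (assuming the static nameservers above are reachable from the VM):
# Resolution should now go straight to the static nameservers,
# not back through the local stub listener.
cat /etc/resolv.conf
getent hosts kubernetes.io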
Load br_netfilter
module at boot time
modprobe br_netfilter
echo "br_netfilter" >> /etc/modules
The containerd in Ubuntu's base repo is too old for the HEAD of Kubernetes ...
... and the right version depends on the Kubernetes version. I think the easiest (though not the most secure) way to install a specific version is to download the release binary.
wget https://github.com/containerd/containerd/releases/download/v1.6.8/containerd-1.6.8-linux-amd64.tar.gz
sudo tar Czxvf /usr/local containerd-1.6.8-linux-amd64.tar.gz
wget https://raw.githubusercontent.com/containerd/containerd/main/containerd.service
sudo mv containerd.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now containerd
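A quick sanity check that the service is up and the expected version is installed:
# Both the client and server version should report 1.6.8.
ctr version
systemctl is-active containerd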
Download cfssl
curl -Lo /usr/local/bin/cfssl https://github.com/cloudflare/cfssl/releases/download/v1.5.0/cfssl_1.5.0_linux_amd64
chmod +x /usr/local/bin/cfssl
curl -Lo /usr/local/bin/cfssljson https://github.com/cloudflare/cfssl/releases/download/v1.5.0/cfssljson_1.5.0_linux_amd64
chmod +x /usr/local/bin/cfssljson
Somehow we have to share data between Kubernetes nodes
Vagrant has a built-in shared folder, but it would require syncing the data two (or N+) times. That doesn't sound great, so I created an NFS share on the master node.
if [[ $(hostname) = ${MASTER_NAME} ]]; then
mkdir -p /var/run/kubernetes
apt install -y nfs-kernel-server make
cat <<EOF > /etc/exports
/var/run/kubernetes ${MASTER_IP}/24(rw,sync,no_subtree_check,all_squash,insecure)
EOF
exportfs -a
systemctl restart nfs-kernel-server
else
apt install -y nfs-common
fi
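Once the export is in place, the workers should be able to see it (showmount comes with the nfs-common package installed above):
# Run on a worker node: lists the exports offered by the master.
showmount -e ${MASTER_IP}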
Disable firewall
systemctl disable --now ufw
ufw reset ||:
apt remove -y ufw
Shell setup
Vagrant user jumps to the source location
cat <<EOF >> /home/vagrant/.bashrc
(cd ${SOURCE} ; sudo su) ; exit
EOF
Bunch of environment variables
alias k=kubectl
export CNI_CONFIG_DIR=/tmp
export LOG_LEVEL=4
export ALLOW_PRIVILEGED=1
export ETCD_HOST=${MASTER_IP}
export API_SECURE_PORT=443
export API_HOST=${MASTER_IP}
export ADVERTISE_ADDRESS=${MASTER_IP}
export API_CORS_ALLOWED_ORIGINS=".*"
export KUBE_CONTROLLERS="*,bootstrapsigner,tokencleaner"
export KUBE_ENABLE_NODELOCAL_DNS=true
export KUBECONFIG=/var/run/kubernetes/admin.kubeconfig
export WHAT="cmd/kube-proxy cmd/kube-apiserver cmd/kube-controller-manager cmd/kubelet cmd/kubeadm cmd/kube-scheduler cmd/kubectl cmd/kubectl-convert"
export POD_CIDR="172.16.0.0/16"
export CLUSTER_CIDR="172.0.0.0/8"
export SERVICE_CLUSTER_IP_RANGE="172.17.0.0/16"
export FIRST_SERVICE_CLUSTER_IP="172.17.0.1"
export KUBE_DNS_SERVER_IP="172.17.63.254"
export GOPATH=/vagrant/github.com/kubernetes/kubernetes
export GOROOT=/opt/go
export PATH=/opt/go/bin:${SOURCE}/third_party:${SOURCE}/third_party/etcd:${SOURCE}/_output/local/bin/linux/amd64:${PATH}
Starting a one-node Kubernetes cluster is simple
Execute start
on the master node.
start() {
rm -rf /var/run/kubernetes/* ||:
KUBELET_HOST=${MASTER_IP} HOSTNAME_OVERRIDE=${MASTER_NAME} ./hack/local-up-cluster.sh -O
}
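In practice that means something like this, assuming the master VM is named master-node as in the output below and the helper functions are loaded by the shell:
# On the host: open a shell on the master VM.
vagrant ssh master-node
# Inside the VM (the bashrc above drops you into the source tree as root):
start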
Join token generation is where the magic happens :)
I had to dig deep into the kubeadm join
command to figure out how to join a new node. Execute config
on the master node.
Generate join command
kubeadm token create --print-join-command > /var/run/kubernetes/join.sh
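The generated file is a plain kubeadm join command; it typically looks something like this (the token and hash below are illustrative placeholders):
# Contents of /var/run/kubernetes/join.sh (placeholder values):
kubeadm join 192.168.56.10:443 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:<hash>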
Kubeadm bootstrap has to be able to read cluster-info
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kubeadm:bootstrap-signer-clusterinfo
  namespace: kube-public
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kubeadm:bootstrap-signer-clusterinfo
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: system:anonymous
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kubeadm:bootstrap-signer-clusterinfo
  namespace: kube-public
rules:
- apiGroups:
  - ''
  resources:
  - configmaps
  verbs:
  - get
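The snippet above is only the manifest; assuming it is saved to a file (the filename here is hypothetical), it can be applied like the later ones:
# Apply the cluster-info reader Role and RoleBinding shown above.
kubectl apply -f clusterinfo-rbac.yaml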
Create cluster-info
config map
It contains the KUBECONFIG
for joining. This config has some requirements because of JWT token signature validation, so it is better to generate a new config.
cat <<EOFI > /var/run/kubernetes/kubeconfig
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: $(base64 -iw0 /var/run/kubernetes/server-ca.crt)
    server: https://${MASTER_IP}:${API_SECURE_PORT}/
  name: ''
contexts: []
current-context: ''
kind: Config
preferences: {}
users: []
EOFI
kubectl delete cm -n kube-public cluster-info ||:
kubectl create cm -n kube-public --from-file=/var/run/kubernetes/kubeconfig cluster-info
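You can verify that the config map is readable the way kubeadm's bootstrap client will read it, i.e. without credentials (assuming your admin user is allowed to impersonate):
# The RoleBinding above should make this succeed for an anonymous user.
kubectl get cm -n kube-public cluster-info -o yaml
kubectl auth can-i get configmaps/cluster-info -n kube-public --as=system:anonymous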
Kubelet needs lots of permissions, so I just gave them all
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kubelet:operate
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kubelet:operate
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: system:anonymous
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kubelet:operate
rules:
- apiGroups:
  - '*'
  resources:
  - '*'
  verbs:
  - '*'
SECURITY ALERT !!! The kubelet is running with anonymous auth, so as you can see, I gave all rights to an unauthenticated user!!!
The bootstrap client of Kubeadm also needs some permission fixes
token_id="$(cat /var/run/kubernetes/join.sh | awk '{print $5}' | cut -d. -f1)"
cat <<EOFI | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kubeadm:bootstrap-signer-kubeadm-config
  namespace: kube-system
rules:
- apiGroups:
  - ''
  resourceNames:
  - kubeadm-config
  - kube-proxy
  - kubelet-config
  resources:
  - configmaps
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kubeadm:bootstrap-signer-kubeadm-config
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kubeadm:bootstrap-signer-kubeadm-config
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: system:bootstrap:${token_id}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kubeadm:bootstrap-signer-kubeadm-config
rules:
- apiGroups:
  - ''
  resources:
  - nodes
  verbs:
  - '*'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kubeadm:bootstrap-signer-kubeadm-config
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kubeadm:bootstrap-signer-kubeadm-config
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: system:bootstrap:${token_id}
EOFI
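A quick way to check that the bootstrap token user really got the permissions kubeadm join needs (again assuming the admin user may impersonate):
# Impersonate the bootstrap user and probe the two things kubeadm join
# reads: the kubeadm-config config map and the node objects.
kubectl auth can-i get configmaps/kubeadm-config -n kube-system --as="system:bootstrap:${token_id}"
kubectl auth can-i get nodes --as="system:bootstrap:${token_id}"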
Kubeadm fetches the ClusterConfig, so I had to prepare one
cat <<EOFI > /var/run/kubernetes/ClusterConfiguration
apiServer:
  timeoutForControlPlane: 2m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: local-up-cluster
imageRepository: registry.k8s.io
kind: ClusterConfiguration
kubernetesVersion: ${KUBE_VERSION}
networking:
  dnsDomain: cluster.local
  podSubnet: ${POD_CIDR}
  serviceSubnet: ${SERVICE_CLUSTER_IP_RANGE}
EOFI
kubectl delete cm -n kube-system kubeadm-config ||:
kubectl create cm -n kube-system --from-file=/var/run/kubernetes/ClusterConfiguration kubeadm-config
Copy various config files to the shared folder
last=$(ls /tmp -t | grep "local-up-cluster.sh." | head -1)
if [[ "${last}" ]]; then
cp -rf /tmp/${last}/* /var/run/kubernetes
else
cp -rf /tmp/kube* /var/run/kubernetes
fi
Create necessary config maps
cat /var/run/kubernetes/kube-proxy.yaml | sed -e "s/${MASTER_NAME}/''/" -e "s/${MASTER_IP}/${NODE_IP}/" > /var/run/kubernetes/config.conf
kubectl delete cm -n kube-system kube-proxy ||:
kubectl create cm -n kube-system --from-file=/var/run/kubernetes/config.conf kube-proxy
cp -f /var/run/kubernetes/kubelet.yaml /var/run/kubernetes/kubelet
kubectl delete cm -n kube-system kubelet-config ||:
kubectl create cm -n kube-system --from-file=/var/run/kubernetes/kubelet kubelet-config
Finally refresh the shared folder and set permissions
exportfs -a
chmod -R a+rw /var/run/kubernetes/*
Join a member to the cluster
Execute member
on the worker node.
Mount the shared volume if not mounted
mkdir -p /var/run/kubernetes ; mount | grep /var/run/kubernetes 1>/dev/null || mount ${MASTER_IP}:/var/run/kubernetes /var/run/kubernetes
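A quick check that the share is mounted and the artifacts produced on the master are visible:
# join.sh, admin.kubeconfig and the copied certs should show up here.
mount | grep /var/run/kubernetes
ls /var/run/kubernetes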
Create systemd service units for kube-proxy and the kubelet
cat <<EOFI > /etc/systemd/system/kube-proxy.service
[Unit]
Wants=network-online.target
After=network-online.target
[Service]
ExecStart=/vagrant/github.com/kubernetes/kubernetes/_output/local/bin/linux/amd64/kube-proxy \
--v=3 \
--config=/var/run/kubernetes/config.conf \
--master="https://${MASTER_IP}:${API_SECURE_PORT}"
Restart=on-failure
StartLimitInterval=0
RestartSec=10
[Install]
WantedBy=multi-user.target
EOFI
cat <<EOFI > /etc/systemd/system/kubelet.service
[Unit]
Wants=kube-proxy.service
After=kube-proxy.service
[Service]
ExecStart=/vagrant/github.com/kubernetes/kubernetes/_output/local/bin/linux/amd64/kubelet \
--address="${NODE_IP}" \
--hostname-override=$(hostname) \
--pod-cidr="${POD_CIDR}" \
--node-ip="${NODE_IP}" \
--register-node=true \
--v=3 \
--bootstrap-kubeconfig=/var/run/kubernetes/admin.kubeconfig \
--kubeconfig=/var/run/kubernetes/admin.kubeconfig \
--container-runtime-endpoint=unix:///var/run/containerd/containerd.sock \
--client-ca-file=/var/run/kubernetes/client-ca.crt \
--config=/var/run/kubernetes/kubelet.yaml
Restart=no
StartLimitInterval=0
RestartSec=10
[Install]
WantedBy=multi-user.target
EOFI
systemctl daemon-reload
systemctl restart kube-proxy
Finally, execute the generated join script
sh /var/run/kubernetes/join.sh
Verify Kubernetes cluster
Please follow the readme on how to start the cluster.
# k get no -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
master-node Ready <none> 3m51s v1.27.0-alpha.0.367+1fe7f09b46850f-dirty 192.168.56.10 <none> Ubuntu 22.04.1 LTS 5.15.0-57-generic containerd://1.6.8
worker-node01 Ready <none> 2m22s v1.27.0-alpha.0.367+1fe7f09b46850f-dirty 192.168.56.11 <none> Ubuntu 22.04.1 LTS 5.15.0-57-generic containerd://1.6.8
# k get po -Ao wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system calico-kube-controllers-57b57c56f-chbnq 1/1 Running 0 2m49s 192.168.0.3 master-node <none> <none>
kube-system calico-node-hlxqg 1/1 Running 0 2m49s 192.168.56.10 master-node <none> <none>
kube-system calico-node-jx4gr 1/1 Running 0 2m29s 192.168.56.11 worker-node01 <none> <none>
kube-system coredns-6846b5b5f-qqhx8 1/1 Running 0 4m27s 192.168.0.2 master-node <none> <none>
Here are the things I still have to change
This project is evolving based on my requirements, so here are some potential improvements.
The join command times out
The cluster works well, but the health check at the end of the join command doesn't.