This post originally appeared on my personal blog.
Introduction
Scaleway is a French cloud provider that historically specialised in (custom-designed) bare metal ARM servers and standard VPSes, and has recently started adding additional services: x86 bare metal servers, Load Balancers, a new and improved object storage, managed databases, a container registry, managed firewalls, and, hotly anticipated, a managed Kubernetes service, Kapsule. There's plenty of competition in the managed Kubernetes space, but Scaleway have a few potential advantages, most notably:
- the possibility to have bare metal and ARM-based node pools (not yet available in the public beta, but ARM is kind of Scaleway's specialty, so it'd be surprising if they didn't offer it)
- decent integrated ecosystem - Load Balancers, Container Registry, block and object storage, and plenty of others soon
- pricing, which is on par with Digital Ocean's (even if the smallest available node type is pretty big at 4vCPU/16GB RAM and 39 euros/month); the control plane is free, and associated services aren't expensive - load balancers are 9 euros/month, and the container registry costs 0.025 euros/GB/month for storage, with traffic free within the same region and 0.03 euros/GB/month outside it
- cluster upgrades, which aren't offered by everyone (for a long time neither Digital Ocean nor OVH offered them, and Kapsule is younger than both)
- cluster autoscaling (based on the cluster autoscaler Kubernetes project), which, again, isn't offered by everyone
- EU-based (datacenters in Paris and Amsterdam)
So, let's give a Kapsule cluster a spin and see if it lives up to my expectations.
Kapsule Overview
Currently Kapsule is in beta, so some features aren't ready yet (like cluster autoscaling or upgrades from the web UI), and for now only the Paris region is supported.
Kapsule clusters can use calico, weave, flannel or cilium for the overlay network, and upon creation Scaleway can optionally deploy the traefik (1.x, for now) or nginx ingress controllers and the Kubernetes dashboard (all of which you could of course install later on your own, via Helm or otherwise, but it's a nice touch). You also get an automatic wildcard DNS entry, *.<cluster-id>.nodes.k8s.fr-par.scw.cloud, which can come in handy for testing.
Deploying a cluster
For now, cluster deployment is possible via the API, the web console or Terraform (which sadly wasn't available when I started writing this post).
Let's deploy a small cluster via the API. First you'll need your organization ID and a token secret key, both available from the credentials page of the Scaleway console; export the latter as SCALEWAY_TOKEN for easier reuse:
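export SCALEWAY_TOKEN=<your-token-secret-key> # placeholder value - paste your own secret key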
Then, create a JSON file along these lines (updating the organization_id with yours), which is the bare minimum to create a 1.14.8 Kubernetes cluster (deliberately an older version, so we can test an upgrade to 1.15.x later on) with 2 GP1-S nodes in the default node pool, calico as the CNI, and the traefik ingress:
Warning: Creating the following resources will cost you money, as per Scaleway's pricing page.
{
  "organization_id": "xxx",
  "name": "k8s-test",
  "version": "1.14.8",
  "cni": "calico",
  "dashboard": true,
  "ingress": "traefik",
  "default_pool_commercial_type": "gp1_s",
  "default_pool_autoscaling": false,
  "default_pool_size": 2
}
Then POST it to the Scaleway API in the right region (fr-par only for now):
curl -XPOST -H "X-Auth-Token: ${SCALEWAY_TOKEN}" -d @data.json https://api.scaleway.com/k8s/v1beta2/regions/fr-par/clusters
And we get a response straight away with the cluster ID and the DNS wildcard:
{
  "autoscaler_config": {
    "balance_similar_node_groups": false,
    "estimator": "binpacking",
    "expander": "random",
    "expendable_pods_priority_cutoff": -10,
    "ignore_daemonsets_utilization": false,
    "scale_down_delay_after_add": "10m",
    "scale_down_disable": false
  },
  "cluster_ip": "",
  "cluster_port": 0,
  "cluster_url": "https://f50f0126-b994-47a9-9949-b68a3ed1335b.api.k8s.fr-par.scw.cloud:6443",
  "cni": "calico",
  "created_at": "2019-09-15T15:32:01.278854200Z",
  "current_core_count": 0,
  "current_mem_count": 0,
  "current_node_count": 0,
  "description": "",
  "dns_wildcard": "*.f50f0126-b994-47a9-9949-b68a3ed1335b.nodes.k8s.fr-par.scw.cloud",
  "id": "f50f0126-b994-47a9-9949-b68a3ed1335b",
  "name": "k8s-test",
  "organization_id": "xxx",
  "region": "fr-par",
  "status": "creating",
  "sub_status": "no_details",
  "tags": [],
  "updated_at": "2019-09-15T15:32:01.278854200Z",
  "version": "1.14.8"
}
Note down the cluster ID and store it in a variable (like CLUSTER_ID) for future use, so that you can check the status and get the kubeconfig file:
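For example, using the ID from the response above:
export CLUSTER_ID=f50f0126-b994-47a9-9949-b68a3ed1335b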
curl -XGET -H "X-Auth-Token: ${SCALEWAY_TOKEN}" "https://api.scaleway.com/k8s/v1beta2/regions/fr-par/clusters/${CLUSTER_ID}"
curl -XGET -H "X-Auth-Token: ${SCALEWAY_TOKEN}" "https://api.scaleway.com/k8s/v1beta2/regions/fr-par/clusters/${CLUSTER_ID}/kubeconfig?dl=1" > /tmp/mycluster.kubeconfig
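The status field starts at creating and flips to ready once provisioning is done; here's a minimal polling sketch using jq (my own convenience loop, not official tooling):
until [ "$(curl -s -H "X-Auth-Token: ${SCALEWAY_TOKEN}" \
  "https://api.scaleway.com/k8s/v1beta2/regions/fr-par/clusters/${CLUSTER_ID}" | jq -r .status)" = "ready" ]; do
  echo "cluster not ready yet..." # still creating
  sleep 15
done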
Once the cluster is ready, we can check connectivity with kubectl:
export KUBECONFIG=/tmp/mycluster.kubeconfig
kubectl cluster-info
Kubernetes master is running at https://f50f0126-b994-47a9-9949-b68a3ed1335b.api.k8s.fr-par.scw.cloud:6443
CoreDNS is running at https://f50f0126-b994-47a9-9949-b68a3ed1335b.api.k8s.fr-par.scw.cloud:6443/api/v1/namespaces/kube-system/services/coredns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
Testing the traefik ingress controller
Let's create a simple Ingress for the Traefik dashboard deployed for us by Scaleway, to test both the dashboard and the Traefik Ingress Controller.
First, generate a file containing the basic auth credentials with htpasswd (from the apache2-utils package on Debian/Ubuntu), using the -B option to force bcrypt instead of the horribly outdated and insecure MD5 default, and create a Kubernetes secret in the kube-system namespace based on it:
htpasswd -c -B ./auth admin
New password:
Re-type new password:
Adding password for user admin
kubectl create secret generic traefik-dash-basic-auth --from-file auth --namespace=kube-system
Then, create an Ingress object for the Traefik dashboard, swapping XXX in the host field with your cluster ID:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: traefik-web-ui
  namespace: kube-system
  annotations:
    traefik.ingress.kubernetes.io/auth-type: basic
    traefik.ingress.kubernetes.io/auth-secret: traefik-dash-basic-auth
spec:
  rules:
  - host: dashboard.XXX.nodes.k8s.fr-par.scw.cloud
    http:
      paths:
      - path:
        backend:
          serviceName: ingress-traefik
          servicePort: admin
and apply it:
kubectl apply -n=kube-system -f traefik-ingress.yml
After that, going to dashboard.XXX.nodes.k8s.fr-par.scw.cloud should get you to the traefik dashboard, protected by the username and password you created with htpasswd.
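You can also check the basic auth from the command line (swap in your cluster ID and the password you chose); without credentials the ingress should answer 401, with them 200:
curl -s -o /dev/null -w "%{http_code}\n" http://dashboard.XXX.nodes.k8s.fr-par.scw.cloud
curl -s -o /dev/null -w "%{http_code}\n" -u admin:<your-password> http://dashboard.XXX.nodes.k8s.fr-par.scw.cloud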
So our cluster is up and running, and so is the Traefik ingress.
Kapsule with Scaleway's Container Registry
Scaleway provide a managed container registry service, called simply "Container Registry". It's rather barebones - no security scanning or anything, just a basic Docker Registry separated into namespaces, which have regionally unique names, can't be recreated, and can be either publicly accessible or private. If they are private, accessing them requires Scaleway API tokens, which aren't scoped and thus give full read/write access to the Scaleway API, which isn't great. You only pay for storage (0.025 euros/GB/month) and traffic outside of the region (a Kapsule cluster in fr-par pulling from a Container Registry in fr-par costs nothing).
To use it, you only need to create a namespace (more details on that below), docker login, docker build / docker tag your image, and docker push it, like this:
docker login rg.fr-par.scw.cloud -u anyuser -p ${SCALEWAY_TOKEN}
docker build . -t rg.fr-par.scw.cloud/sofixa/golang-example-web-server-k8s:latest
docker push rg.fr-par.scw.cloud/sofixa/golang-example-web-server-k8s
To be able to use the container image from inside Kapsule, if your registry is private, you need to configure registry authentication on your cluster (see the official docs), by first creating a secret in your namespace containing the Scaleway token:
kubectl -n=<your-namespace> create secret docker-registry regcred --docker-server=rg.fr-par.scw.cloud --docker-username=anyuser --docker-password=${SCALEWAY_TOKEN} --docker-email=<your-email>
And then using imagePullSecrets in the pod template spec of your Deployments/DaemonSets/StatefulSets/Pods:
spec:
  selector:
    matchLabels:
      name: golang-test # label selector that determines which Pods belong to the DaemonSet
  template:
    metadata:
      labels:
        name: golang-test # Pod template's labels
    spec:
      containers:
      - name: golang-test
        image: rg.fr-par.scw.cloud/xxx/golang-example-web-server-k8s:1.0
      imagePullSecrets:
      - name: regcred
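After applying a manifest like this, a quick way to confirm the pull secret works is to check that the pods come up (ImagePullBackOff in the events is the usual sign it didn't), assuming the golang-test label above:
kubectl -n=<your-namespace> get pods -l name=golang-test
kubectl -n=<your-namespace> get events --sort-by=.lastTimestamp # look for ImagePullBackOff / ErrImagePull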
Kapsule with Scaleway's Load Balancer
Scaleway's Load Balancer service is an active-passive load balancer that supports multiple frontends (listeners) and multiple backends (targets), which can be of different types, with pre-defined health checks for backends like PostgreSQL, MySQL, Redis, LDAP, or plain old TCP or HTTP. It can do SSL offloading (via automatically provisioned Let's Encrypt certificates) or SSL passthrough (by configuring a frontend and backend with TCP/443).
To create a Scaleway Load Balancer from inside Kapsule, you need to create a Service of type LoadBalancer, like this:
apiVersion: v1
kind: Service
metadata:
  name: golang-test
spec:
  selector:
    name: golang-test
  ports:
  - port: 80
    targetPort: 80
  type: LoadBalancer
This creates a LB with one frontend on port 80 and one backend containing all the Kapsule nodes running your service (targeted on port 80), with round-robin load balancing (the default). The Scaleway cloud controller doesn't seem to be fully documented yet, so here is a list of the available annotations (kudos to ben from Scaleway's Community Slack for sharing them):
service.beta.kubernetes.io/scw-loadbalancer-forward-port-algorithm # the load balancing algorithm
service.beta.kubernetes.io/scw-loadbalancer-sticky-sessions # enables cookie-based session persistence
service.beta.kubernetes.io/scw-loadbalancer-sticky-sessions-cookie-name # cookie name for sticky sessions
service.beta.kubernetes.io/scw-loadbalancer-health-check-type # health check used
service.beta.kubernetes.io/scw-loadbalancer-health-check-delay # time between two consecutive health checks
service.beta.kubernetes.io/scw-loadbalancer-health-check-timeout # additional check timeout, after the connection has already been established
service.beta.kubernetes.io/scw-loadbalancer-health-check-max-retries # number of consecutive unsuccessful health checks after which the server is considered dead
service.beta.kubernetes.io/scw-loadbalancer-health-check-http-uri # URI used by the "http" health check
service.beta.kubernetes.io/scw-loadbalancer-health-check-http-method # method used by the "http" health check
service.beta.kubernetes.io/scw-loadbalancer-health-check-http-code # HTTP code the "http" health check matches against
service.beta.kubernetes.io/scw-loadbalancer-health-check-mysql-user # MySQL user used to check the connection with the "mysql" health check
service.beta.kubernetes.io/scw-loadbalancer-health-check-pgsql-user # PgSQL user used to check the connection with the "pgsql" health check
service.beta.kubernetes.io/scw-loadbalancer-send-proxy-v2 # enables PROXY protocol version 2 (must be supported by backend servers)
service.beta.kubernetes.io/scw-loadbalancer-timeout-server # maximum server connection inactivity time
service.beta.kubernetes.io/scw-loadbalancer-timeout-connect # maximum initial server connection establishment time
service.beta.kubernetes.io/scw-loadbalancer-timeout-tunnel # maximum tunnel inactivity time
service.beta.kubernetes.io/scw-loadbalancer-on-marked-down-action # what happens when a backend server is marked down
Details about them and their possible values are available on the Load Balancer API docs.
Cluster upgrades
One of the most interesting features of Kapsule is the possibility to do rolling cluster upgrades. Let's test it!
We'll create a simple service behind a Load Balancer to check whether it stays up during the upgrade, as it should. To see what's going on, the service will be based on a DaemonSet (a pod running on each node) with a simple Golang HTTP server that prints the node hostname. Along the way we'll test Scaleway's Load Balancer and Container Registry services.
Prerequisites
First, create your container registry with a similar JSON file, editing the appropriate lines:
{
  "name": "xxx",
  "description": "My awesome container registry",
  "organization_id": "xxx",
  "is_public": false
}
and POST it to Scaleway's API endpoint for Container Registry namespaces (https://api.scaleway.com/registry/v1/regions/{region}/namespaces):
curl -XPOST -H "X-Auth-Token: ${SCALEWAY_TOKEN}" -d @data.json https://api.scaleway.com/registry/v1/regions/fr-par/namespaces
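A GET on the same endpoint lists your namespaces, which lets you confirm the new one is ready (I'm assuming the collection endpoint supports GET, as elsewhere in Scaleway's API):
curl -XGET -H "X-Auth-Token: ${SCALEWAY_TOKEN}" https://api.scaleway.com/registry/v1/regions/fr-par/namespaces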
Second, clone the example code from my GitHub repository, build its Docker container, and push it to your registry:
git clone git@github.com:sofixa/golang-example-web-server-k8s.git
cd golang-example-web-server-k8s/
docker build . -t rg.fr-par.scw.cloud/xxx/golang-example-web-server-k8s:1.0
docker push rg.fr-par.scw.cloud/xxx/golang-example-web-server-k8s
Once the image is successfully uploaded to the registry, let's create a DaemonSet from it and a Load Balancer in front, using the manifest files (based on the examples from the Load Balancer and Container Registry sections) in the same repository (the DaemonSet manifest will try to use a regcred secret to authenticate to your registry, so if your registry is public, you should remove the imagePullSecrets part):
kubectl create namespace test-golang
namespace/test-golang created
kubectl apply -n=test-golang -f daemon-set.yml
daemonset.apps/golang-test created
kubectl apply -n=test-golang -f load-balancer.yml
service/golang-test created
To get the Load Balancer's external IP, get the corresponding service and copy the EXTERNAL-IP:
kubectl get svc -n=test-golang
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
golang-test LoadBalancer 10.38.179.245 51.159.25.121 80:31120/TCP 5d23h
Once the load balancer is ready, curl-ing its public IP should get you a 200 response code with a response body similar to this:
curl -v http://51.159.25.121
* Rebuilt URL to: http://51.159.25.121/
* Trying 51.159.25.121...
* TCP_NODELAY set
* Connected to 51.159.25.121 (51.159.25.121) port 80 (#0)
> GET / HTTP/1.1
> Host: 51.159.25.121
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Mon, 11 Nov 2019 20:11:57 GMT
< Content-Length: 65
< Content-Type: text/plain; charset=utf-8
<
Hello world from scw-k8s-test-default-16b716e81c7a4!
* Connection #0 to host 51.159.25.121 left intact
Notice the response, which contains the node name, composed of:
- the cluster name: k8s-test
- the node pool name: default
- the node ID: 16b716e81c7a4
The test
Now, let's set up vegeta, an HTTP load testing tool, to check whether our cluster continues to respond during a cluster upgrade. Follow the installation instructions, and do a quick test against your load balancer's IP:
echo "GET http://51.159.25.121" | vegeta attack -rate=1/s
wResult Attack
SeqCode TimestampLatencBytesOutBytesInError
Body
Timeb���6P<bAAHello world from scw-k8s-test-default-3cb12e55883e4!
d���6*<H.�AAHello world from scw-k8s-test-default-3cb12e55883e4!
Everything seems to be working (ignore the garbled characters - vegeta's output is a binary format, not meant to be read directly by humans); you can quit the process with Ctrl+C.
You can now launch a continuous "attack", in vegeta parlance, storing the results in a JSON file:
echo "GET http://51.159.25.121" | vegeta attack -rate=1/s | vegeta encode > results.json
cat-ing the file should show output like the following, which contains all sorts of information useful for a normal load test (the main purpose of vegeta); for our test case, only the response code and body (which is base64-encoded) matter:
cat results.json | jq
{
  "attack": "",
  "seq": 0,
  "code": 200,
  "timestamp": "2019-11-11T21:24:15.375938851+01:00",
  "latency": 21093314,
  "bytes_out": 0,
  "bytes_in": 65,
  "error": "",
  "body": "SGVsbG8gd29ybGQgZnJvbSBzY3ctazhzLWVsb3F1ZW50LWxlaG1hbm4tZGVmYXVsdC0zY2IxMmU1NTg4M2U0IQo="
}
A base64 decode (base64 -d) on that body gets you the same result as a bare curl:
head -n 1 results.json | jq -r .body | base64 -d
Hello world from scw-k8s-test-default-3cb12e55883e4!
With the vegeta load test running, it's time to launch the upgrade of the Kapsule cluster.
The upgrade
WARNING: Always read the full Kubernetes release notes before doing an upgrade. APIs get deprecated, breaking changes happen, stuff stops being compatible - you should always check.
Kubernetes cluster upgrades are done in two main stages:
- the control plane and all its components
- the worker node pools, a node at a time
During the control plane upgrade, it remains accessible (worst case, you might get an EOF or two).
For a worker node to be updated, it first needs to be drained of all its pods. Scaleway do this for you automatically, but for it to succeed there must be enough spare capacity on the other nodes to reschedule all the evicted pods.
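If you want to cap how many replicas of a workload can be evicted at once during such drains, a PodDisruptionBudget is the standard Kubernetes tool. A minimal sketch, using the golang-test labels from earlier (note: our test workload is a DaemonSet, whose pods are pinned to their nodes, so a PDB like this mainly helps Deployments and similar reschedulable workloads; policy/v1beta1 is the PDB API version in the 1.14/1.15 era used here):
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: golang-test-pdb
spec:
  minAvailable: 1 # keep at least one pod running during voluntary disruptions
  selector:
    matchLabels:
      name: golang-test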
Usually you can only upgrade one minor version at a time (e.g. 1.14.x to 1.15.x, but not directly to 1.16.x), and this is the case with Scaleway. In any case, doing the upgrade over the API is cooler, so that's what we'll do. All that's required is a small JSON file with the desired version (remember, only patches or a single minor version upwards are supported) and a boolean indicating whether to upgrade the node pools as well:
{
  "version": "1.15.5",
  "upgrade_pools": true
}
All that's left to do is to POST that to the /upgrade endpoint:
curl -XPOST -d @data.json -H "X-Auth-Token: ${SCALEWAY_TOKEN}" "https://api.scaleway.com/k8s/v1beta2/regions/fr-par/clusters/${CLUSTER_ID}/upgrade"
Which returns a response like this:
{
  "region": "fr-par",
  "id": "ae336b4a-7875-4150-8b1c-ed1e4c9f3b2b",
  "organization_id": "37a7df83-e2f2-43aa-a181-170a52aec2ac",
  "created_at": "2019-11-11T19:46:54.261230Z",
  "updated_at": "2019-11-11T20:49:16.331469260Z",
  "name": "k8s-test",
  "description": "",
  "cluster_ip": "",
  "cluster_port": 0,
  "current_node_count": 2,
  "status": "updating",
  "sub_status": "deploy_controlplane",
  "version": "1.15.5",
  "cni": "calico",
  "tags": [],
  "current_core_count": 8,
  "current_mem_count": 34359738368,
  "cluster_url": "https://ae336b4a-7875-4150-8b1c-ed1e4c9f3b2b.api.k8s.fr-par.scw.cloud:6443",
  "dns_wildcard": "*.ae336b4a-7875-4150-8b1c-ed1e4c9f3b2b.nodes.k8s.fr-par.scw.cloud",
  "autoscaler_config": {
    "scale_down_disable": false,
    "scale_down_delay_after_add": "10m",
    "estimator": "binpacking",
    "expander": "random",
    "ignore_daemonsets_utilization": false,
    "balance_similar_node_groups": false,
    "expendable_pods_priority_cutoff": -10
  }
}
To check the status of the upgrade, you can GET the status of the cluster and the node pools via the Scaleway API (the cluster, pools and nodes endpoints), or use kubectl get nodes:
curl -XGET -H "X-Auth-Token: ${SCALEWAY_TOKEN}" "https://api.scaleway.com/k8s/v1beta2/regions/fr-par/clusters/${CLUSTER_ID}"
{
  "region": "fr-par",
  "id": "ae336b4a-7875-4150-8b1c-ed1e4c9f3b2b",
  "organization_id": "37a7df83-e2f2-43aa-a181-170a52aec2ac",
  "created_at": "2019-11-11T19:46:54.261230Z",
  "updated_at": "2019-11-11T21:04:47.430331Z",
  "name": "k8s-test",
  "description": "",
  "cluster_ip": "",
  "cluster_port": 0,
  "current_node_count": 2,
  "status": "ready",
  "sub_status": "deploy_controlplane",
  "version": "1.15.5",
  "cni": "calico",
  "tags": [],
  "current_core_count": 8,
  "current_mem_count": 34359738368,
  "cluster_url": "https://ae336b4a-7875-4150-8b1c-ed1e4c9f3b2b.api.k8s.fr-par.scw.cloud:6443",
  "dns_wildcard": "*.ae336b4a-7875-4150-8b1c-ed1e4c9f3b2b.nodes.k8s.fr-par.scw.cloud",
  "autoscaler_config": {
    "scale_down_disable": false,
    "scale_down_delay_after_add": "10m",
    "estimator": "binpacking",
    "expander": "random",
    "ignore_daemonsets_utilization": false,
    "balance_similar_node_groups": false,
    "expendable_pods_priority_cutoff": -10
  }
}
curl -XGET -H "X-Auth-Token: ${SCALEWAY_TOKEN}" "https://api.scaleway.com/k8s/v1beta2/regions/fr-par/clusters/${CLUSTER_ID}/pools"
{
  "total_count": 1,
  "pools": [
    {
      "region": "fr-par",
      "id": "feb2c164-8805-4130-b5d3-57889dc35652",
      "cluster_id": "ae336b4a-7875-4150-8b1c-ed1e4c9f3b2b",
      "created_at": "2019-11-11T19:46:54.266926Z",
      "updated_at": "2019-11-11T21:02:09.443695Z",
      "name": "default",
      "current_node_count": 2,
      "status": "updating",
      "version": "1.15.5",
      "commercial_type": "gp1_xs",
      "autoscaling": false,
      "size": 2,
      "min_size": 2,
      "max_size": 2,
      "current_core_count": 8,
      "current_mem_count": 34359738368,
      "container_runtime": "docker",
      "autohealing": false
    }
  ]
}
curl -XGET -H "X-Auth-Token: ${SCALEWAY_TOKEN}" "https://api.scaleway.com/k8s/v1beta2/regions/fr-par/clusters/${CLUSTER_ID}/nodes"
{
  "total_count": 2,
  "nodes": [
    {
      "region": "fr-par",
      "id": "16b716e8-1c7a-4486-9861-067717cd44ea",
      "cluster_id": "ae336b4a-7875-4150-8b1c-ed1e4c9f3b2b",
      "created_at": "2019-11-11T19:47:52.240391Z",
      "updated_at": "2019-11-11T21:06:40.605505Z",
      "pool_id": "feb2c164-8805-4130-b5d3-57889dc35652",
      "status": "ready",
      "npd_status": {
        "DiskPressure": "False",
        "KernelDeadlock": "False",
        "MemoryPressure": "False",
        "NetworkUnavailable": "False",
        "PIDPressure": "False",
        "Ready": "True"
      },
      "name": "scw-k8s-test-default-16b716e81c7a4",
      "public_ip_v4": "51.158.69.231",
      "public_ip_v6": null
    },
    {
      "region": "fr-par",
      "id": "3cb12e55-883e-404e-9617-9716a1ab22aa",
      "cluster_id": "ae336b4a-7875-4150-8b1c-ed1e4c9f3b2b",
      "created_at": "2019-11-11T19:47:54.410118Z",
      "updated_at": "2019-11-11T21:06:40.732254Z",
      "pool_id": "feb2c164-8805-4130-b5d3-57889dc35652",
      "status": "notready",
      "npd_status": {
        "DiskPressure": "Unknown",
        "KernelDeadlock": "False",
        "MemoryPressure": "Unknown",
        "NetworkUnavailable": "False",
        "PIDPressure": "Unknown",
        "Ready": "Unknown"
      },
      "name": "scw-k8s-test-default-3cb12e55883e4",
      "public_ip_v4": "51.15.217.30",
      "public_ip_v6": null
    }
  ]
}
kubectl get nodes
NAME STATUS ROLES AGE VERSION
scw-k8s-test-default-16b716e81c7a4 Ready <none> 77m v1.15.5
scw-k8s-test-default-3cb12e55883e4 NotReady,SchedulingDisabled <none> 76m v1.14.8
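To follow the rolling upgrade live instead of re-running the command, kubectl can stream node state changes:
kubectl get nodes -w # watch mode, Ctrl+C to stop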
Upgrades are pretty quick (<5 minutes) if your cluster is empty, as mine was, but as usual, your mileage may vary.
However, they aren't fully non-disruptive - checking the negative results of the vegeta run, we see there were a few timeouts:
less results.json | jq '. | select(.code != 200)'
{
  "attack": "",
  "seq": 41,
  "code": 0,
  "timestamp": "2019-11-11T22:02:41.187717035+01:00",
  "latency": 30000794729,
  "bytes_out": 0,
  "bytes_in": 0,
  "error": "Get http://51.159.25.121: net/http: request canceled (Client.Timeout exceeded while awaiting headers)",
  "body": null
}
{
  "attack": "",
  "seq": 42,
  "code": 0,
  "timestamp": "2019-11-11T22:02:42.187457957+01:00",
  "latency": 30000935369,
  "bytes_out": 0,
  "bytes_in": 0,
  "error": "Get http://51.159.25.121: net/http: request canceled (Client.Timeout exceeded while awaiting headers)",
  "body": null
}
{
  "attack": "",
  "seq": 43,
  "code": 0,
  "timestamp": "2019-11-11T22:02:43.187827619+01:00",
  "latency": 30000555385,
  "bytes_out": 0,
  "bytes_in": 0,
  "error": "Get http://51.159.25.121: net/http: request canceled (Client.Timeout exceeded while awaiting headers)",
  "body": null
}
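For the aggregate picture (success ratio, latency percentiles), vegeta can also summarize the run once you stop it - recent versions of its report command auto-detect the encoded result format:
vegeta report < results.json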
It looks like the Load Balancer's health check isn't frequent enough to detect that a backend is down during upgrades. The Load Balancer API docs show that it is possible to tweak the delay between checks (though not via the web UI). As mentioned before, there are a few (not yet publicly documented) annotations we can use on our Service object for that (the available choices and detailed explanations are on the Load Balancer API page):
service.beta.kubernetes.io/scw-loadbalancer-health-check-delay # time between two consecutive health checks, in milliseconds
service.beta.kubernetes.io/scw-loadbalancer-health-check-timeout # additional check timeout, after the connection has already been established
service.beta.kubernetes.io/scw-loadbalancer-health-check-max-retries # number of consecutive unsuccessful health checks after which the server is considered dead
This is what our Service object could look like (example values, quoted because annotation values must be strings; the precise numbers will vary depending on your environment):
apiVersion: v1
kind: Service
metadata:
  name: golang-test
  annotations:
    service.beta.kubernetes.io/scw-loadbalancer-health-check-delay: "10000" # a check every 10s
    service.beta.kubernetes.io/scw-loadbalancer-health-check-timeout: "3000" # a 3s timeout, which can be dangerously low
    service.beta.kubernetes.io/scw-loadbalancer-health-check-max-retries: "2" # 2 retries
spec:
  selector:
    name: golang-test
  ports:
  - port: 80
    targetPort: 80
  type: LoadBalancer
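Re-applying the manifest is enough - the cloud controller should reconcile the existing Load Balancer with the new health check settings:
kubectl apply -n=test-golang -f load-balancer.yml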
Summary
Pros:
- quick to provision, and good provisioning tooling (API, Terraform)
- good amount of features/integrations (cluster upgrades, autohealing, optional installation of the dashboard and an ingress controller for Kapsule; automatic certs via Let's Encrypt and various health checks for Load Balancer)
- Container Registry is good enough and inexpensive (0.025 eur/GB/month for storage and 0.03 eur/GB/month traffic outside of the same region)
- Kubernetes versions come out pretty quickly (~1 week for 1.16)
- the ecosystem is good - Scaleway have managed databases, object and block storage, Load Balancers (which will soon be multi-cloud) and a few potentially very interesting services in beta/preview like Serverless Functions, VPCs, IoT Hub, AI Inference and Domains (which includes promising features like "Dynamic routing capabilities: balance traffic depending on resource health and more to come"), striking, for me, a fine balance between more basic platforms (Linode, Vultr, Digital Ocean to an extent) and full-featured clouds (AWS, GCP, Azure), while retaining pricing closer to the former
Cons:
- only relatively expensive instance types are available (starting at 40 eur/month), for now, making Kapsule less affordable compared to the main competitors - hopefully that will change soon, and we might even get bare metal nodes one day
- Container Registry auth requires a full Scaleway token, which gives full read/write API access - read-only tokens would be a cool addition
- documentation is somewhat lacking in some areas
- there's no way to monitor/get statistics from services like Load Balancer or Container Registry; for Kapsule, Heapster and metrics-server are preinstalled, so there's access to Kubernetes-level statistics, but nothing else
- there's no way to create and manage Scaleway resources (other than Load Balancers and block storage) from inside Kapsule, like with GCP's Config Connector (note: you can probably use AppsCode Kubeform for that, since it's just a Kubernetes CRD wrapper around Terraform providers, and Scaleway's Terraform provider is pretty decent)
- cluster upgrades can result in small (a few seconds) downtimes with the default configuration; some fine-tuning is required
So, in conclusion, Kapsule and the Scaleway stack are very interesting and evolving quickly, but come with some rough edges (Kapsule is still in beta, so that's completely normal). Once those are polished, Scaleway Kapsule and the rest of their ecosystem will make for a very compelling development / small-to-mid-scale cloud environment, with just the right amount of managed services and features for a bargain price, plus EU data sovereignty, which can be a requirement or a plus in some cases.