When you want to add, remove, or resize a node pool in Kubernetes, being able to quickly bootstrap those nodes and have them rejoin the cluster in a repeatable, time-efficient way is important.
The way Terraform manages state is one reason I love using it to assist in automating cluster operations like deploying nodes as part of a pool. In my Terraform variables, for example, I might have values like kube_version or count that can change, and when you plan and apply the new state, Terraform will attempt to reconcile that change with your provider.
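For instance, the relevant pieces of vars.tf might look roughly like this (the count variable and its default reappear later in this post; the kubernetes_version default here is only illustrative):
variable "kubernetes_version" {
  description = "Kubernetes version to install on the nodes."
  default     = "1.13.1"
}
variable "count" {
  default     = "3"
  description = "Number of nodes."
}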
Take the example of this Terraform project that deploys to DigitalOcean:
https://bitbucket.org/jmarhee/cousteau/src
(you can read more about using this repo in the Further Reading links at the end of this post). In kube-node.tf, we can define a pool like this:
data "template_file" "node" {
template = "${file("${path.module}/node.tpl")}"
vars {
kube_token = "${random_string.kube_init_token_a.result}.${random_string.kube_init_token_b.result}"
primary_node_ip = "${digitalocean_droplet.k8s_primary.ipv4_address}"
kube_version = "${var.kubernetes_version}"
}
}
resource "digitalocean_droplet" "k8s_node" {
name = "${format("pool1-%02d", count.index)}"
image = "ubuntu-16-04-x64"
count = "${var.count}"
size = "${var.primary_size}"
region = "${var.region}"
private_networking = "true"
ssh_keys = "${var.ssh_key_fingerprints}"
user_data = "${data.template_file.node.rendered}"
}
This creates a pool of var.count nodes, each named with the pool1 prefix.
When you run terraform plan, you get a plan that reflects this:
+ digitalocean_droplet.k8s_node
id: <computed>
backups: "false"
disk: <computed>
image: "ubuntu-16-04-x64"
ipv4_address: <computed>
ipv4_address_private: <computed>
ipv6: "false"
ipv6_address: <computed>
ipv6_address_private: <computed>
locked: <computed>
memory: <computed>
monitoring: "false"
name: "pool1-00"
price_hourly: <computed>
price_monthly: <computed>
private_networking: "true"
region: "tor1"
resize_disk: "true"
size: "4gb"
ssh_keys.#: "1"
ssh_keys: ""
status: <computed>
user_data: ""
vcpus: <computed>
volume_ids.#: <computed>
When you plan and apply with the count variable adjusted, Terraform will add or remove resources of that type accordingly.
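For example, growing the pool from 3 to 5 nodes is just a variable change (count can live in terraform.tfvars or be passed on the command line):
# Preview the change: two new droplets, nothing else touched
terraform plan -var "count=5"
# Create them; they join the cluster via their rendered user_data
terraform apply -var "count=5"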
However, one great feature of Kubernetes is the ability to cordon and drain nodes that you wish to remove from rotation, so let's say you'd like to deploy a new node pool using a new kube_version value (i.e. upgrading from v1.12 to v1.13). In node.tf, add a new pool, call it pool2, as a separate resource with its own count variable (count_pool2) so it can be scaled independently of pool1, like this:
resource "digitalocean_droplet" "k8s_node" {
name = "${format("pool2-%02d", count.index)}"
image = "ubuntu-16-04-x64"
count = "${var.count}"
size = "${var.primary_size}"
region = "${var.region}"
private_networking = "true"
ssh_keys = "${var.ssh_key_fingerprints}"
user_data = "${data.template_file.node.rendered}"
}
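One caveat worth noting: both pools render the same data.template_file.node, so changing var.kubernetes_version would also change pool1's user_data, and Terraform would want to replace those droplets to reconcile it. A minimal sketch of one way around that, using a hypothetical second variable, kubernetes_version_pool2, that only the new pool consumes:
# Hypothetical: a second rendered template pinned to the new version,
# so pool1's user_data (and therefore its droplets) stays untouched.
data "template_file" "node_pool2" {
  template = "${file("${path.module}/node.tpl")}"

  vars {
    kube_token      = "${random_string.kube_init_token_a.result}.${random_string.kube_init_token_b.result}"
    primary_node_ip = "${digitalocean_droplet.k8s_primary.ipv4_address}"
    kube_version    = "${var.kubernetes_version_pool2}"
  }
}
The pool2 resource's user_data would then reference data.template_file.node_pool2.rendered instead.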
Plan and apply, and you'll have two sets of nodes: those named pool1 and those named pool2:
# kubectl get nodes
NAME                        STATUS     ROLES    AGE     VERSION
digitalocean-k8s-pool1-00   Ready      <none>   4m45s   v1.12.3
digitalocean-k8s-pool1-01   Ready      <none>   4m42s   v1.12.3
digitalocean-k8s-pool1-02   Ready      <none>   4m48s   v1.12.3
digitalocean-k8s-pool1-03   Ready      <none>   4m54s   v1.12.3
digitalocean-k8s-pool1-04   Ready      <none>   4m51s   v1.12.3
digitalocean-k8s-pool2-00   NotReady   <none>   3m27s   v1.13.1
digitalocean-k8s-primary    Ready      master   7m1s    v1.13.1
On your Kubernetes cluster, at this point, you can run an operation like this:
kubectl cordon digitalocean-k8s-pool1-00
on each of the pool1 nodes, and then, once they are all cordoned, drain them to move the workloads to the new pool, rather than forcing Kubernetes to reschedule resources on its own (though that is a valid pattern as well):
kubectl drain digitalocean-k8s-pool1-00 --ignore-daemonsets
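A minimal sketch for doing this across the whole pool at once, assuming the node names follow the pool1 naming scheme shown above:
# Cordon every pool1 node first so nothing new is scheduled there,
# then drain them one at a time to evict the existing workloads.
for node in $(kubectl get nodes -o name | grep pool1); do
  kubectl cordon "$node"
done
for node in $(kubectl get nodes -o name | grep pool1); do
  kubectl drain "$node" --ignore-daemonsets
done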
When you then run an operation like:
kubectl describe node digitalocean-k8s-pool1-00
you should see Pods terminating, or gone completely, and see them scheduled on the pool2 nodes.
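To confirm the workloads have moved, listing Pods along with the node they were scheduled on is a quick check:
kubectl get pods --all-namespaces -o wide | grep pool2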
At this point, you can set the count variable that sizes pool1 back to 0, and then plan and apply Terraform to terminate this pool.
To make this configuration more robust, let's take the example of blue/green deployments, assume we'll always have at least two pools, and modify node.tf to look like this:
data "template_file" "node" {
template = "${file("${path.module}/node.tpl")}"
vars {
kube_token = "${random_string.kube_init_token_a.result}.${random_string.kube_init_token_b.result}"
primary_node_ip = "${digitalocean_droplet.k8s_primary.ipv4_address}"
kube_version = "${var.kubernetes_version}"
}
}
resource "digitalocean_droplet" "k8s_node_pool_blue" {
name = "${format("${var.cluster_name}-node-blue-%02d", count.index)}"
image = "ubuntu-16-04-x64"
count = "${var.count_blue}"
size = "${var.primary_size}"
region = "${var.region}"
private_networking = "true"
ssh_keys = "${var.ssh_key_fingerprints}"
user_data = "${data.template_file.node.rendered}"
}
resource "digitalocean_droplet" "k8s_node_pool_green" {
name = "${format("${var.cluster_name}-node-green-%02d", count.index)}"
image = "ubuntu-16-04-x64"
count = "${var.count_green}"
size = "${var.primary_size}"
region = "${var.region}"
private_networking = "true"
ssh_keys = "${var.ssh_key_fingerprints}"
user_data = "${data.template_file.node.rendered}"
}
and update vars.tf to handle the two pool sizes separately:
-variable "count" {
- default = "3"
- description = "Number of nodes."
+variable "count_blue" {
+ description = "Number of nodes in pool blue."
+}
+
+variable "count_green" {
+ description = "Number of nodes in pool green."
so we have a count_blue and a count_green to manage these pool sizes, and you can set them in your terraform.tfvars file to scale up and down. Your new process would be, for example: if the blue pool is active at 3 nodes, scale the green pool up to 3 as well, cordon and drain the blue pool's nodes, verify services are up on green, and then scale count_blue to 0 and apply.
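As a sketch, that cutover in terraform.tfvars might look like this (values are illustrative):
# Blue is the active pool; bring green up to match before cutting over:
count_blue  = "3"
count_green = "3"
# After cordoning and draining the blue nodes and verifying services on
# green, set count_blue = "0" and plan/apply again to retire the old pool.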
This would also be a good refactoring candidate: pools could be managed as a Terraform module, letting you run multiple types of pools (using different instance types, such as high CPU or high-performance storage) and move the options that apply to all node pools in this example into a shared definition.
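As a rough sketch of that refactor (the node-pool module and its inputs here are hypothetical; the module itself would wrap the droplet resource and node template shown above):
module "pool_highcpu" {
  source = "./modules/node-pool"

  pool_name            = "highcpu"
  node_count           = "2"
  size                 = "c-4" # e.g. a CPU-optimized droplet slug
  cluster_name         = "${var.cluster_name}"
  region               = "${var.region}"
  kubernetes_version   = "${var.kubernetes_version}"
  primary_node_ip      = "${digitalocean_droplet.k8s_primary.ipv4_address}"
  kube_token           = "${random_string.kube_init_token_a.result}.${random_string.kube_init_token_b.result}"
  ssh_key_fingerprints = "${var.ssh_key_fingerprints}"
}
Each pool then becomes a single module block, and pool-specific choices like size and node_count stay out of the shared defaults.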
Further Reading
Safely Drain a Node while Respecting Application SLOs
Writing Reusable Terraform Modules
Deploying Kubernetes (and DigitalOcean Cloud Controller Manager) on DigitalOcean with Terraform
Cousteau - Manage Kubernetes + DO CCM on DigitalOcean with Terraform
Comments
Is this option available in Google Cloud GKE??
Google Cloud does have a Terraform provider, so presumably, yes, you can adapt this to use the Google provider, or the cloud provider of your choosing, and replicate this behavior.