- Initial thoughts
- Variables
- Providers and locals
- The enclosing VPC network
- The actual Kubernetes cluster
- Docker registries
- S3 application bucket
- Outputs
- Wrapping up
- Further reading
Initial thoughts
Terraform is an infrastructure-as-code tool that lets you build, change, and version cloud and on-prem resources safely and efficiently.
An AWS Spot Instance uses spare EC2 capacity that is available for less than the On-Demand price. Because Spot Instances let you request unused EC2 capacity at steep discounts, you can lower your Amazon EC2 costs significantly. Spot Instances can be reclaimed by AWS at any time, which makes Kubernetes, with its self-healing workloads, a perfect candidate for running on such interruptible virtual machines.
Surprisingly, complete Terraform examples using multiple kinds of spot instances have been hard to find on the internet since the parameter rework of version 18 of the EKS module. This blog post is there to fill that gap.
We detail here how to deploy, using Terraform, an EKS Cluster with the following characteristics:
- A VPC network with private and public subnets using a gateway
- A Kubernetes cluster using mixed type spot instances
- Docker registries
For the file blocks below to work, you need to know the basics of Terraform and have aws-cli configured with a profile named after the cluster. If you prefer sticking to the default AWS profile, remove the --profile pieces from the code below and everything will work fine.
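If the profile does not exist yet, it can be created with the AWS CLI. A minimal sketch, assuming the cluster is named my-project (the default cluster_name used below):

```shell
# Create an AWS CLI profile named after the cluster
aws configure --profile my-project

# Verify that the profile resolves to the expected account
aws sts get-caller-identity --profile my-project
```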
Variables
Here are the variables used by most of the resources described below. All are simple values except var.aws_auth_users, which lists the AWS users granted access to the cluster.
variable "region" {
description = "Cluster region"
default = "eu-west-x"
}
variable "cluster_name" {
description = "Name of the EKS cluster"
default = "my-project"
}
variable "kubernetes_version" {
description = "Cluster Kubernetes version"
default = "1.24"
}
# Being in this list is required to see Kubernetes resources in AWS console
variable "aws_auth_users" {
description = "Developers with access to the dev K8S cluster and the container registries"
default = [
{
userarn = "arn:aws:iam::xxx:user/user.name1"
username = "user.name1"
groups = ["system:masters"]
},
{
userarn = "arn:aws:iam::xxx:user/user.name2"
username = "user.name2"
groups = ["system:masters"]
}
]
}
Providers and locals
Let's set providers and common tags.
provider "aws" {
region = var.region
# prerequisite locally: aws configure --profile <cluster-name>
profile = var.cluster_name
}
provider "kubernetes" {
host = module.eks.cluster_endpoint
cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
exec {
api_version = "client.authentication.k8s.io/v1beta1"
command = "aws"
# This requires the awscli to be installed locally where Terraform is executed
args = ["--profile", var.cluster_name, "eks", "get-token", "--cluster-name", var.cluster_name]
}
}
locals {
tags = {
Environment = "NON-PROD"
creation-date = "01/02/2023" # a variable would update the value on each tf apply
}
}
data "aws_caller_identity" "current" {}
The enclosing VPC network
Here is a sample Terraform block for the VPC network where the Kubernetes cluster will be created.
A few notes:
- a secured network is composed of private and public subnets, one of each per availability zone of the chosen region
- be careful to have enough IP addresses available in your ranges for your needs; here each subnet can fit 8,192 IPs
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "~> 3.0"
name = var.cluster_name
cidr = "10.0.0.0/16" # Last IP: 10.0.255.255
azs = ["${var.region}a", "${var.region}b", "${var.region}c"]
# use https://www.ipaddressguide.com/cidr
# /19: 8,192 IPs
private_subnets = ["10.0.0.0/19", "10.0.32.0/19", "10.0.64.0/19"] # No hole in IP ranges
public_subnets = ["10.0.96.0/19", "10.0.128.0/19", "10.0.160.0/19"] # No hole in IP ranges
enable_nat_gateway = true
single_nat_gateway = true
enable_dns_hostnames = true
public_subnet_tags = {
"kubernetes.io/cluster/${var.cluster_name}" = "shared"
"kubernetes.io/role/elb" = "1"
}
private_subnet_tags = {
"kubernetes.io/cluster/${var.cluster_name}" = "shared"
"kubernetes.io/role/internal-elb" = "1"
}
tags = local.tags
}
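Rather than hardcoding the six subnet CIDRs, Terraform's built-in cidrsubnet function can derive them from the VPC range. A sketch that yields the same /19 ranges as above:

```hcl
locals {
  vpc_cidr = "10.0.0.0/16"
  # cidrsubnet(prefix, newbits, netnum): /16 + 3 bits = /19, 8,192 IPs each
  private_subnets = [for i in range(3) : cidrsubnet(local.vpc_cidr, 3, i)]     # 10.0.0.0/19, 10.0.32.0/19, 10.0.64.0/19
  public_subnets  = [for i in range(3) : cidrsubnet(local.vpc_cidr, 3, i + 3)] # 10.0.96.0/19, 10.0.128.0/19, 10.0.160.0/19
}
```

This keeps the ranges contiguous by construction, at the cost of being slightly less explicit than literal CIDR strings.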
The actual Kubernetes cluster
Now let's define the main dish: the Kubernetes cluster.
A few notes:
- The cluster addons used are mandatory for the cluster to work properly
- The coredns addon may fail on the first apply; just apply again (its timeout is customized to avoid waiting for too long)
- AWS users with access rights to the cluster are defined in var.aws_auth_users; an example is shown in the Variables section above
- Security groups are simplified and may need adjusting to your security needs:
  - no access from the internet (except through an ingress controller)
  - full node-to-node access
  - full access from nodes to the internet
- The EC2 instances used as workers are t3 and t3a spot instances. Mixing instance types avoids node starvation when AWS runs out of one type
- The commented-out fulltime-az-a group, if uncommented and adapted to your needs, would also create on-demand instances
- Nodes are created in only one availability zone. In a production environment, use at least 2 availability zones by creating a spot-az-b group similar to spot-az-a. Zone-to-zone network traffic is not free and, in this example, potential downtime of a development environment is acceptable
module "eks" {
source = "terraform-aws-modules/eks/aws"
cluster_name = var.cluster_name
cluster_version = var.kubernetes_version
cluster_endpoint_private_access = true
cluster_endpoint_public_access = true
cluster_addons = {
coredns = {
most_recent = true
timeouts = {
create = "2m" # default 20m. Times out on first launch while being effectively created
}
}
kube-proxy = {
most_recent = true
}
vpc-cni = {
most_recent = true
}
aws-ebs-csi-driver = {
most_recent = true
}
}
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnets
# Self managed node groups will not automatically create the aws-auth configmap, so we need to do it here
create_aws_auth_configmap = true
manage_aws_auth_configmap = true
aws_auth_users = var.aws_auth_users
enable_irsa = true
node_security_group_additional_rules = {
ingress_self_all = {
description = "Node to node all ports/protocols"
protocol = "-1"
from_port = 0
to_port = 0
type = "ingress"
self = true
}
egress_all = { # by default, only https urls can be reached from inside the cluster
description = "Node all egress"
protocol = "-1"
from_port = 0
to_port = 0
type = "egress"
cidr_blocks = ["0.0.0.0/0"]
ipv6_cidr_blocks = ["::/0"]
}
}
self_managed_node_group_defaults = {
# enable discovery of autoscaling groups by cluster-autoscaler
autoscaling_group_tags = {
"k8s.io/cluster-autoscaler/enabled": true,
"k8s.io/cluster-autoscaler/${var.cluster_name}": "owned",
}
# from https://github.com/terraform-aws-modules/terraform-aws-eks/issues/2207#issuecomment-1220679414
# to avoid "waiting for a volume to be created, either by external provisioner "ebs.csi.aws.com" or manually created by system administrator"
iam_role_additional_policies = {
AmazonEBSCSIDriverPolicy = "arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy"
}
}
# possible values: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/node_groups.tf
self_managed_node_groups = {
default_node_group = {
create = false
}
# fulltime-az-a = {
# name = "fulltime-az-a"
# subnets = [module.vpc.private_subnets[0]]
# instance_type = "t3.medium"
# desired_size = 1
# bootstrap_extra_args = "--kubelet-extra-args '--node-labels=node.kubernetes.io/lifecycle=normal'"
# }
spot-az-a = {
name = "spot-az-a"
subnet_ids = [module.vpc.private_subnets[0]] # only one subnet to simplify PV usage
# availability_zones = ["${var.region}a"] # conflict with previous option. TODO try subnet_ids=null at creation (because at modification it fails)
desired_size = 2
min_size = 1
max_size = 10
bootstrap_extra_args = "--kubelet-extra-args '--node-labels=node.kubernetes.io/lifecycle=spot'"
use_mixed_instances_policy = true
mixed_instances_policy = {
instances_distribution = {
on_demand_base_capacity = 0
on_demand_percentage_above_base_capacity = 0
spot_allocation_strategy = "lowest-price" # "capacity-optimized" described here: https://aws.amazon.com/blogs/compute/introducing-the-capacity-optimized-allocation-strategy-for-amazon-ec2-spot-instances/
}
override = [
{
instance_type = "t3.xlarge"
weighted_capacity = "1"
},
{
instance_type = "t3a.xlarge"
weighted_capacity = "1"
},
]
}
}
}
tags = local.tags
}
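As noted above, a production setup should spread nodes over at least two availability zones. A sketch of the second group, to be added inside self_managed_node_groups, assuming the same sizing as spot-az-a:

```hcl
spot-az-b = {
  name       = "spot-az-b"
  subnet_ids = [module.vpc.private_subnets[1]] # second availability zone
  desired_size = 2
  min_size     = 1
  max_size     = 10
  bootstrap_extra_args       = "--kubelet-extra-args '--node-labels=node.kubernetes.io/lifecycle=spot'"
  use_mixed_instances_policy = true
  mixed_instances_policy = {
    instances_distribution = {
      on_demand_base_capacity                  = 0
      on_demand_percentage_above_base_capacity = 0
      spot_allocation_strategy                 = "lowest-price"
    }
    override = [
      { instance_type = "t3.xlarge", weighted_capacity = "1" },
      { instance_type = "t3a.xlarge", weighted_capacity = "1" },
    ]
  }
}
```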
Docker registries
Most of us deploy Kubernetes clusters to run custom applications, so here are the blocks for the AWS Docker registries (ECR).
On AWS, Kubernetes and the Docker registries integrate flawlessly; no other configuration is needed.
resource "aws_ecr_repository" "module-a" {
name = "my-app/module-a"
}
resource "aws_ecr_repository" "module-b" {
name = "my-app/module-b"
}
resource "aws_ecr_repository" "module-c" {
name = "my-app/module-c"
}
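To push images to one of these registries, the standard ECR login flow applies. A sketch, assuming the my-app/module-a repository and the profile from above; replace &lt;AWS_ACCOUNT_ID&gt; and the region with your own values:

```shell
# Authenticate Docker against ECR using the cluster profile
aws ecr get-login-password --profile my-project --region eu-west-x \
  | docker login --username AWS --password-stdin <AWS_ACCOUNT_ID>.dkr.ecr.eu-west-x.amazonaws.com

# Tag and push a locally built image
docker tag my-app/module-a:latest <AWS_ACCOUNT_ID>.dkr.ecr.eu-west-x.amazonaws.com/my-app/module-a:latest
docker push <AWS_ACCOUNT_ID>.dkr.ecr.eu-west-x.amazonaws.com/my-app/module-a:latest
```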
S3 application bucket
More often than not, projects need an S3 bucket to store files, so here is the code for a secured S3 bucket, accessible from the backend Kubernetes service account.
resource "aws_s3_bucket" "bucket" {
bucket = "${var.cluster_name}-bucket"
tags = local.tags
}
resource "aws_s3_bucket_acl" "bucket_acl" {
bucket = aws_s3_bucket.bucket.id
acl = "private"
}
data "aws_iam_policy_document" "role_policy" {
statement {
actions = ["sts:AssumeRoleWithWebIdentity"]
effect = "Allow"
condition {
test = "StringLike"
variable = "${replace(module.eks.cluster_oidc_issuer_url, "https://", "")}:sub"
values = ["system:serviceaccount:*:backend"] # system:serviceaccount:<K8S_NAMESPACE>:<K8S_SERVICE_ACCOUNT>
}
principals {
identifiers = ["arn:aws:iam::${data.aws_caller_identity.current.account_id}:oidc-provider/${replace(module.eks.cluster_oidc_issuer_url, "https://", "")}"]
type = "Federated"
}
}
}
data "aws_iam_policy_document" "s3_policy" {
statement {
actions = [
"s3:ListAllMyBuckets",
]
resources = [
"*",
]
}
statement {
actions = [
"s3:*",
]
resources = [
aws_s3_bucket.bucket.arn,
"${aws_s3_bucket.bucket.arn}/*"
]
}
}
resource "aws_iam_role" "role" {
assume_role_policy = data.aws_iam_policy_document.role_policy.json
name = "${var.cluster_name}-backend-role"
}
resource "aws_iam_policy" "policy" {
name = "${var.cluster_name}-backend-policy"
path = "/"
policy = data.aws_iam_policy_document.s3_policy.json
}
resource "aws_iam_role_policy_attachment" "attach" {
policy_arn = aws_iam_policy.policy.arn
role = aws_iam_role.role.name
}
You also need to:
- Create or use a Kubernetes service account (backend in this example) for the pods that use the bucket: on your deployment, set serviceAccount: backend
- Annotate the Kubernetes service account with: eks.amazonaws.com/role-arn: arn:aws:iam::<AWS_ACCOUNT_ID>:role/<CLUSTER_NAME>-backend-role
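As a sketch, the annotated service account manifest could look like this (the default namespace and the placeholders are assumptions to adapt to your setup):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: backend
  namespace: default
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<AWS_ACCOUNT_ID>:role/<CLUSTER_NAME>-backend-role
```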
Outputs
No specific output is needed, not even a kube.config, since aws-cli can configure access to the cluster. But for convenience, the aws-cli command to run is printed as an output.
output "update_local_context_command" {
description = "Command to update local kube context"
value = "aws --profile ${var.cluster_name} eks update-kubeconfig --name=${var.cluster_name} --alias=${var.cluster_name} --region=${var.region}"
}
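After terraform apply, the printed command can be run directly, or evaluated in one go. A sketch, assuming terraform, aws-cli and kubectl are available locally:

```shell
# Configure the local kubectl context from the Terraform output
eval "$(terraform output -raw update_local_context_command)"

# Check that the spot nodes joined the cluster
kubectl get nodes -l node.kubernetes.io/lifecycle=spot
```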
Wrapping up
Putting all these pieces of Terraform code together, you should be able to deploy a cluster with a single command in under 20 minutes.
If you think some code might be improved, please advise in the comments 🤓
Further reading
You can go further and optimize your cluster costs by having a look at FinOps EKS: 10 tips to reduce the bill up to 90% on AWS managed Kubernetes clusters.
Wondering whether it is worth using an OVHCloud managed cluster alongside your existing AWS clusters? Have a look at Managed Kubernetes: Our dev is on AWS, our prod is on OVHCloud.
Illustrations generated locally by Automatic1111 using Dark Sushi 2.5D model with Detail Tweaker LoRA