In my last blog post, I talked about how Ana Margarita Medina and I used Terraform to show off Observability-Landscape-as-Code in practice, leveraging the OpenTelemetry Demo App to do so. The Demo App showcases instrumentation of Traces and Metrics of different services written in different languages using OpenTelemetry (OTel). Our Terraform code did the following:
- Created a Kubernetes cluster
- Deployed the Demo App to Kubernetes
- Deployed OpenTelemetry Collector to Kubernetes, and configured it to send Traces and Metrics to Lightstep
- Created dashboards in Lightstep
Now, I’m a fan of beautiful code, so we organized our code using Terraform Modules. We used a module for provisioning the Kubernetes cluster, one for deploying the OTel Demo App and the OTel Collector, and one for creating the Lightstep dashboards.
We also leveraged the following Terraform Providers:
- Google Cloud Provider for spinning up a Kubernetes Cluster in Google Cloud Platform (GCP)
- The Kubernetes Provider and the Helm Provider for deploying the app to Kubernetes
- The Lightstep Provider for creating the dashboards in Lightstep
All good, right? Except for one teeeensy little problem...the last time I’d touched Terraform was in early 2021, and even then, I was just tweaking code. So I kinda had to teach myself Terraform all over again. And I hit a few snags along the way. Cue. The. Panic.
Fortunately, Google came through, and we were able to resolve the issues. In today’s blog post, I will cover THREE Terraform gotchas that Ana and I hit, and how we solved them, so that you will hopefully be spared our utter despair and panic. 😅
Let’s do this!
NOTE: If you want to follow along to see the full Terraform source code, you can check it out here. Even though the source code is specific to the Observability-Landscape-as-Code use case, the main Terraform concepts in this blog post can be ported over to other scenarios.
Gotcha #1: The Chicken-and-Egg Scenario
After creating a Kubernetes cluster, we needed to create a Kubernetes resource before we could apply the Helm chart to install the OpenTelemetry demo app. The Demo App’s Helm Chart deploys an OpenTelemetry Collector. We wanted to configure the Collector to send OTel data to Lightstep. To do so, you need to add a Lightstep Access Token, which is stored as a Kubernetes secret.
You can learn more about the specifics of this setup here.
To create the secret in Kubernetes before running the Helm Chart, we used the Kubernetes Provider. To use this Provider, Terraform needs information about your cluster, so that it knows which cluster to apply the manifest to. To provide it, I looked up the cluster information using data stanzas, like this:
data "google_client_config" "default" {}

data "google_container_cluster" "primary" {
  name     = var.cluster_name
  location = var.region
}

provider "kubernetes" {
  host  = "https://${data.google_container_cluster.primary.endpoint}"
  token = data.google_client_config.default.access_token
  cluster_ca_certificate = base64decode(
    data.google_container_cluster.primary.master_auth.0.cluster_ca_certificate
  )
}
Easy peasy, right? Unfortunately, when I ran terraform apply, I kept getting the following errors:
Error: Invalid template interpolation value
And
Error: Attempt to index null value
Basically, Terraform was trying to evaluate the contents of the data stanza (which were null) before it had any information about the Kubernetes cluster. Which of course it didn’t, because the cluster didn’t yet exist!! Hence the null contents.
I frantically Googled this one for a while, spinning my wheels. And then, the “aha” moment hit me, when I saw somewhere in one of my searches that I could use the depends_on attribute in the data stanza. So I added depends_on = [module.k8s_cluster_create] to both my data stanzas, which basically says, “Hey buddy, don’t try to evaluate this until AFTER the k8s_cluster_create module (i.e. the module in which the Kubernetes cluster is created) is run.” So now, after adding depends_on, my providers.tf (lines 32-49) code looked like this:
data "google_client_config" "default" {
  depends_on = [module.k8s_cluster_create]
}

data "google_container_cluster" "primary" {
  depends_on = [module.k8s_cluster_create]
  name       = var.cluster_name
  location   = var.region
}

provider "kubernetes" {
  host  = "https://${data.google_container_cluster.primary.endpoint}"
  token = data.google_client_config.default.access_token
  cluster_ca_certificate = base64decode(
    data.google_container_cluster.primary.master_auth.0.cluster_ca_certificate
  )
}
And after making that change, all was well with the world. Huzzah!
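With the Kubernetes Provider wired up, creating the secret itself is the easy part. Here’s a minimal sketch of what that could look like. It is not our exact code: the resource labels, the secret name (otel-collector-secret), and the key (LS_TOKEN) are hypothetical, while var.otel_demo_namespace and var.ls_access_token are the variables passed to the Demo App module later in this post.

```hcl
variable "otel_demo_namespace" {
  type = string
}

variable "ls_access_token" {
  type      = string
  sensitive = true # keeps the token out of plan/apply output
}

# The namespace has to exist before the secret can be created in it.
resource "kubernetes_namespace" "otel_demo" {
  metadata {
    name = var.otel_demo_namespace
  }
}

# Hypothetical secret holding the Lightstep Access Token.
resource "kubernetes_secret" "ls_access_token" {
  metadata {
    name      = "otel-collector-secret" # hypothetical name
    namespace = kubernetes_namespace.otel_demo.metadata[0].name
  }

  # The Kubernetes Provider base64-encodes these values for you.
  data = {
    LS_TOKEN = var.ls_access_token # hypothetical key
  }
}
```

Referencing the namespace through kubernetes_namespace.otel_demo (rather than repeating the variable) gives Terraform an implicit dependency, so the namespace is created before the secret without needing another depends_on.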
Gotcha #2: Using Modules with depends_on
While the above problem went away, I then found myself face-to-face with yet another conundrum. When I initially wrote my Terraform code, everything was in one big file, and it worked just fine. So OF COURSE I just assumed that when I prettified my code and moved things into modules, I could just get away with defining my Providers in the Modules themselves. Well, you can. That is…if you don’t use the depends_on attribute in your Module call.
So basically, when I tried to say that the Module lightstep_dashboards depended on k8s_cluster_create like this:
module "k8s_cluster_create" {
  source       = "./modules/k8s"
  cluster_name = var.cluster_name
  project_id   = var.project_id
  region       = var.region
  network      = var.network
  subnet       = var.subnet
}

module "deploy_otel_demo_app" {
  source              = "./modules/otel_demo_app"
  otel_demo_namespace = var.otel_demo_namespace
  ls_access_token     = var.ls_access_token
  cluster_name        = var.cluster_name
  project_id          = var.project_id
  region              = var.region
  network             = var.network
  subnet              = var.subnet
}

module "lightstep_dashboards" {
  source            = "./modules/lightstep"
  depends_on        = [module.k8s_cluster_create]
  lightstep_project = var.ls_project
}
I kept getting this error when I ran terraform apply:
Error: Module is incompatible with count, for_each and depends_on
This error happens when the Child Module contains a provider block and the Module call uses count, depends_on, and/or for_each. Why? Because Terraform doesn’t allow provider blocks inside a Child Module that is called with any of those arguments. You can read up more on this here.
Well, it turns out that the correct practice is to define your provider blocks in the Root Module, as Provider configurations are automagically passed down to the Child Modules. So to make the above error go away, I moved all of my Provider definitions to the Root Module, and was able to keep depends_on in my Module call. If I didn’t have any dependencies, I could’ve left out depends_on and kept the provider blocks in the Child Modules, but then I wouldn’t really be following the recommended practice.
NOTE: You can learn more about Providers and Modules here.
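To make that concrete, here’s a rough sketch of the shape the Root Module ends up with. It is a fragment rather than our full code: the google Provider arguments shown (project and region) are my assumption of a typical setup, and the variables and Module paths are the ones used earlier in this post.

```hcl
# providers.tf (Root Module): Providers are configured once, here.
# The Child Modules contain no provider blocks of their own; they
# inherit these default configurations automatically.
provider "google" {
  project = var.project_id # assumed Provider arguments
  region  = var.region
}

# main.tf (Root Module)
module "k8s_cluster_create" {
  source       = "./modules/k8s"
  cluster_name = var.cluster_name
  project_id   = var.project_id
  region       = var.region
  network      = var.network
  subnet       = var.subnet
}

module "lightstep_dashboards" {
  source            = "./modules/lightstep"
  lightstep_project = var.ls_project

  # Allowed now, because ./modules/lightstep no longer has a provider block.
  depends_on = [module.k8s_cluster_create]
}
```

The key point is where the provider block lives, not what is in it: as long as it sits in the Root Module, count, for_each, and depends_on are all fair game in the Module calls.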
Gotcha #3: Referencing a non-TF provider in a module
Two problems down. Awesome! Unfortunately, my problems were not over. I continued to anger the Module Gods, because I encountered yet another issue when I moved my non-modularized code into Modules. This time, it had to do with using the Lightstep Provider. You see, this Provider comes from a third party (i.e. not HashiCorp), which in this case is Lightstep. The Lightstep Provider is what is known as a Partner Provider. This means that in the Provider Registry, the Provider is named lightstep/lightstep, where the first lightstep means that the Provider is created and maintained by Lightstep, and the second lightstep is the actual Provider name. For comparison, the hashicorp/google provider is an Official Provider, because it is created and maintained by HashiCorp.
Now here’s the odd part. When I tried to run terraform init, I was graced with this error:
Error: Failed to query available provider packages
Could not retrieve the list of available versions for provider hashicorp/lightstep
Um...what? This did not compute, because in my providers.tf file, I CLEARLY said that the Provider name was lightstep/lightstep, so where oh where was it getting this hashicorp/lightstep business from?? LOOK ⬇️⬇️⬇️
terraform {
  required_providers {
    ...
    lightstep = {
      source  = "lightstep/lightstep"
      version = ">=1.70.0"
    }
    ...
  }
}
O Google gods, help meeeeee!!
Well, it turns out that each Module has its own Provider requirements. When a Module uses a Provider that it doesn’t declare in a required_providers stanza of its own, Terraform assumes that the Provider is an Official Provider, and automagically gives it a hashicorp/ prefix. So inside my Child Module, Terraform basically thought that the Provider was called hashicorp/lightstep, even though I had clearly defined it correctly in the required_providers stanza of the Root Module.
To fix this issue, I ended up having to define a required_providers stanza in the Root Module, as I had already done, AND I also had to add a required_providers stanza to my Child Module, as per the snippet below:
terraform {
  required_providers {
    lightstep = {
      source  = "lightstep/lightstep"
      version = ">=1.70.0"
    }
  }
}
After that, my terraform init stopped screaming at me!
Final Thoughts
Today we learned that Terraform can be a wee finicky. We learned that:
- Adding depends_on to the data stanza used to capture your Kubernetes cluster configuration data ensures that Terraform doesn’t try to evaluate the data stanza until AFTER the cluster is created, thereby avoiding some serious Terraform Anger™.
- If you want to use depends_on in a Module call, the Provider configuration must be done in the Root Module. Also, it’s the recommended practice even if you don’t want to use depends_on.
- If you have a Module that references a Partner Provider, you need to define a required_providers stanza in both the Root Module and the Child Module.
I hope that these tips prevent you from experiencing Terraform Anguish™ next time you find yourself Terraformin’. And now, I shall reward you with a picture of my rat Mookie, who is seen below peering out of an authentic Wisconsin Cheese Head hat.
Peace, love, and code. 🦄 🌈 💫
Got questions about Terraform or Observability-Landscape-as-Code? Talk to me! Feel free to connect through e-mail, or hit me up on Twitter, Mastodon, or LinkedIn. Hope to hear from y’all!