Adriana Villela for Lightstep

Posted on Nov 21, 2022 • Edited on Nov 22, 2022 • Originally published at lightstep.com

Three Terraform Mistakes, and How to Avoid Them

#tutorial #terraform #tech #hashicorp

In my last blog post, I talked about how Ana Margarita Medina and I used Terraform to show off Observability-Landscape-as-Code in practice, leveraging the OpenTelemetry Demo App to do so. The Demo App showcases instrumentation of Traces and Metrics of different services written in different languages using OpenTelemetry (OTel). Our Terraform code did the following:

Created a Kubernetes cluster
Deployed the Demo App to Kubernetes
Deployed OpenTelemetry Collector to Kubernetes, and configured it to send Traces and Metrics to Lightstep
Created dashboards in Lightstep

Now, I’m a fan of beautiful code, so we organized our code using Terraform Modules. We used a module for provisioning the Kubernetes cluster, one for deploying the OTel Demo App and the OTel Collector, and one for creating the Lightstep dashboards.

We also leveraged the following Terraform Providers:

Google Cloud Provider for spinning up a Kubernetes Cluster in Google Cloud Platform (GCP)
The Kubernetes Provider and the Helm Provider for deploying the app to Kubernetes
The Lightstep Provider for creating the dashboards in Lightstep

All good, right? Except for one teeeensy little problem...the last time I’d touched Terraform was in early 2021, and even then, I was just tweaking code. So I kinda had to teach myself Terraform all over again. And I hit a few snags along the way. Cue. The. Panic.

Fortunately, Google came through, and we were able to resolve the issues. In today’s blog post, I will cover THREE Terraform gotchas that Ana and I hit, and how we solved them, so that you will hopefully be spared our utter despair and panic. 😅

Let’s do this!

NOTE: If you want to follow along to see the full Terraform source code, you can check it out here. Even though the source code is specific to the Observability-Landscape-as-Code use case, the main Terraform concepts in this blog post can be ported over to other scenarios.

Gotcha #1: The Chicken-and-Egg Scenario

After creating a Kubernetes cluster, we needed to create a Kubernetes resource before we could apply the Helm chart to install the OpenTelemetry demo app. The Demo App’s Helm Chart deploys an OpenTelemetry Collector. We wanted to configure the Collector to send OTel data to Lightstep. To do so, you need to add a Lightstep Access Token, which is stored as a Kubernetes secret.

You can learn more about the specifics of this setup here.
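For a rough idea of what that secret looks like in Terraform, here is a minimal sketch using the Kubernetes Provider's kubernetes_secret resource. The resource label, secret name, and data key are hypothetical and not taken from our repo; var.otel_demo_namespace and var.ls_access_token are the same variables passed to the Demo App module later in this post. It also assumes the namespace already exists.

```hcl
# Hypothetical names: the resource label, secret name, and data key are
# illustrative only and are not taken from the original repository.
resource "kubernetes_secret" "lightstep_access_token" {
  metadata {
    name      = "otel-collector-secret"
    namespace = var.otel_demo_namespace # assumes this namespace already exists
  }

  # Values in `data` are given as plain strings; the Kubernetes Provider
  # base64-encodes them when it creates the Secret.
  data = {
    LS_TOKEN = var.ls_access_token
  }
}
```

The Collector configuration would then reference that secret by name and key, however your Helm values are set up.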

To create the secret in Kubernetes before running the Helm Chart, we used the Kubernetes Provider. In order to use this provider, Terraform needs to know information about your cluster, so that it knows what cluster to apply the manifest to. To do this, I needed to store the cluster information in the data stanza, like this:

data "google_client_config" "default" {}

data "google_container_cluster" "primary" {
  name     = var.cluster_name
  location = var.region
}

provider "kubernetes" {
  host  = "https://${data.google_container_cluster.primary.endpoint}"
  token = data.google_client_config.default.access_token
  cluster_ca_certificate = base64decode(
    data.google_container_cluster.primary.master_auth[0].cluster_ca_certificate
  )
}

Easy peasy, right? Unfortunately, when I ran terraform apply, I kept getting the following errors:

Error: Invalid template interpolation value

And

Error: Attempt to index null value

Basically, Terraform was trying to evaluate the contents of the data stanza (which were null) before it had any information about the Kubernetes cluster. Which of course it didn’t, because the cluster didn’t yet exist!! Hence the null contents.

I frantically Googled this one for a while, spinning my wheels. And then, the “aha” moment hit me, when I saw somewhere in one of my searches that I could use the depends_on attribute in the data stanza. So I added depends_on = [module.k8s_cluster_create] to both my data stanzas, which basically says, “Hey buddy, don’t try to evaluate this until AFTER the k8s_cluster_create module (i.e. the module in which the Kubernetes cluster is created) is run.” So now, after adding depends_on, the code in my providers.tf (lines 32-49) looked like this:

data "google_client_config" "default" {
  depends_on = [module.k8s_cluster_create]
}

data "google_container_cluster" "primary" {
  depends_on = [module.k8s_cluster_create]
  name       = var.cluster_name
  location   = var.region
}

provider "kubernetes" {
  host  = "https://${data.google_container_cluster.primary.endpoint}"
  token = data.google_client_config.default.access_token
  cluster_ca_certificate = base64decode(
    data.google_container_cluster.primary.master_auth[0].cluster_ca_certificate
  )
}

And after making that change, all was well with the world. Huzzah!
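The Helm Provider needs the same cluster details, so it can lean on the same two data stanzas. Here is a minimal sketch, assuming the nested kubernetes block syntax of the Helm Provider 2.x series (other major versions may expect a different form, so check the docs for the version you pin):

```hcl
# Sketch only: assumes Helm Provider 2.x, where the cluster connection is
# configured through a nested `kubernetes` block.
provider "helm" {
  kubernetes {
    # Same data sources as the kubernetes provider above. Because they carry
    # depends_on, they are not read until the cluster module has been applied.
    host  = "https://${data.google_container_cluster.primary.endpoint}"
    token = data.google_client_config.default.access_token
    cluster_ca_certificate = base64decode(
      data.google_container_cluster.primary.master_auth[0].cluster_ca_certificate
    )
  }
}
```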

Gotcha #2: Using Modules with depends_on

While the above problem went away, I then found myself face-to-face with yet another conundrum. When I initially wrote my Terraform code, everything was in one big file, and it worked just fine. So OF COURSE I just assumed that when I prettified my code and moved things into modules, I could just get away with defining my Providers in the Modules themselves. Well, you can. That is…if you don’t use the depends_on attribute in your Module call.

So basically, when I tried to say that the Module lightstep_dashboards depended on k8s_cluster_create like this:

module "k8s_cluster_create" {
  source = "./modules/k8s"

  cluster_name = var.cluster_name
  project_id   = var.project_id
  region       = var.region
  network      = var.network
  subnet       = var.subnet
}

module "deploy_otel_demo_app" {
  source = "./modules/otel_demo_app"

  otel_demo_namespace = var.otel_demo_namespace
  ls_access_token     = var.ls_access_token
  cluster_name        = var.cluster_name
  project_id          = var.project_id
  region              = var.region
  network             = var.network
  subnet              = var.subnet
}

module "lightstep_dashboards" {
  source     = "./modules/lightstep"
  depends_on = [module.k8s_cluster_create]

  lightstep_project = var.ls_project
}

I kept getting this error when I ran terraform apply:

Error: Module is incompatible with count, for_each and depends_on

This error happens when a Child Module contains its own provider block and the call to that Module uses count, for_each, and/or depends_on. Why? Because Terraform simply doesn’t allow that combination: a Module with its own provider configuration can’t be called with any of those three arguments. You can read up more on this here.

Well, it turns out that the correct practice is to define your provider blocks in the Root Module, as default Provider configurations are automagically passed down to the Child Modules. So to make the above error go away, I moved all of my Provider definitions to the Root Module, and was able to keep depends_on in my Module call. If I didn’t have any dependencies, I could’ve left out the depends_on argument and kept the provider blocks in the Child Modules, but I wouldn’t really be following the recommended practice.

NOTE: You can learn more about Providers and Modules here.
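To make that layout concrete, here is a minimal sketch of a Root Module that owns the provider configuration while the Child Modules contain no provider blocks at all. The google provider arguments shown (project and region) and the file split are assumptions for illustration; the module calls reuse the names from the snippet above.

```hcl
# Root Module, e.g. providers.tf: the only place provider blocks live.
# The arguments here are an assumption; use whatever your setup needs.
provider "google" {
  project = var.project_id
  region  = var.region
}

# Root Module, e.g. main.tf: ./modules/k8s uses google resources and inherits
# the default google configuration above, so it needs no provider block.
module "k8s_cluster_create" {
  source = "./modules/k8s"

  cluster_name = var.cluster_name
  project_id   = var.project_id
  region       = var.region
  network      = var.network
  subnet       = var.subnet
}

# depends_on is accepted here because ./modules/lightstep no longer
# contains a provider block of its own.
module "lightstep_dashboards" {
  source     = "./modules/lightstep"
  depends_on = [module.k8s_cluster_create]

  lightstep_project = var.ls_project
}
```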

Gotcha #3: Referencing a non-HashiCorp provider in a module

Two problems down. Awesome! Unfortunately, my problems were not over. I continued to anger the Module Gods, because I encountered yet another issue when I moved my non-modularized code into Modules. This time, it had to do with using the Lightstep Provider. You see, this Provider comes from a third party (i.e. not HashiCorp), which in this case is Lightstep. The Lightstep Provider is what is known as a Partner Provider. This means that in the Provider Registry, the Provider is named lightstep/lightstep, where the first lightstep means that the Provider is created and maintained by Lightstep, and the second lightstep is the actual Provider name. For comparison, the hashicorp/google provider is an Official Provider, because it is created and maintained by HashiCorp.

Now here’s the odd part. When I tried to run terraform init, I was graced with this error:

Error: Failed to query available provider packages

Could not retrieve the list of available versions for provider hashicorp/lightstep

Um...what? This did not compute, because in my providers.tf file, I CLEARLY said that the Provider name was lightstep/lightstep, so where oh where was it getting this hashicorp/lightstep business from?? LOOK ⬇️⬇️⬇️

terraform {
  required_providers {
...
    lightstep = {
      source  = "lightstep/lightstep"
      version = ">=1.70.0"
    }
...
  }
}

O Google gods, help meeeeee!!

Well, it turns out that each Module has to declare the source of any non-HashiCorp Provider it uses. When a Child Module uses a Partner Provider without declaring it, Terraform assumes the Provider is an Official Provider, and therefore automagically gives it a hashicorp/ prefix. So Terraform basically thought that the Provider was called hashicorp/lightstep, even though I clearly defined it correctly in the required_providers section of the Root Module.

To fix this issue, I ended up having to define a required_providers stanza in the Root Module, as I had already done, AND I also had to add a required_providers stanza to my Child Module, as per the snippet below:

terraform {
 required_providers {
   lightstep = {
     source = "lightstep/lightstep"
     version = ">=1.70.0"
   }
 }
}

After that, my terraform init stopped screaming at me!

Final Thoughts

Today we learned that Terraform can be a wee bit finicky. We learned that:

Adding depends_on to the data stanza used to capture your Kubernetes cluster configuration data ensures that Terraform doesn’t try to evaluate the data stanza until AFTER the cluster is created, thereby avoiding some serious Terraform Anger™.
If you want to use depends_on in a Module call, the Provider configuration must be done in the Root Module. Also, it’s the recommended practice even if you don’t want to use depends_on.
If you have a Module that references a Partner Provider, you need to define a required_providers stanza in both the Root Module and the Child Module.

I hope that these tips prevent you from experiencing Terraform Anguish™ next time you find yourself Terraformin’. And now, I shall reward you with a picture of my rat Mookie, who is seen below peering out of an authentic Wisconsin Cheese Head hat.

Peace, love, and code. 🦄 🌈 💫

Got questions about Terraform or Observability-Landscape-as-Code? Talk to me! Feel free to connect through e-mail, or hit me up on Twitter, Mastodon, or LinkedIn. Hope to hear from y’all!