
The Four Stages of Terraform Automation

As your organization grows, so does the complexity of your infrastructure, making automation not just beneficial but essential. 

In this blog post, we'll explore the four stages of Terraform automation, providing a roadmap as you look to scale your Terraform practices.

Video Overview


TLDR: You can find the main repo here.

Disclaimer

All use cases of Terraform automation discussed here work similarly in OpenTofu, the open-source Terraform alternative. However, to keep it simple and familiar for DevOps engineers, we will refer to these as "Terraform automation" throughout this blog post.

Terraform Workflow Basics

Terraform operates on a simple yet powerful workflow: write, plan, and apply. However, manually running these commands can be time-consuming and error-prone, especially as infrastructure scales. Automation brings several key benefits:

  • Increases efficiency: Reduces the need for manual intervention
  • Minimizes human errors: Ensures that scripts run consistently every time
  • Ensures consistency: Maintains uniformity across multiple deployments
  • Enables scalable operations: Supports larger teams and complex environments without bottlenecks

Based on my experience working with customers over the years as they mature in their adoption of Terraform, I want to introduce you to the four stages of Terraform automation.

The Four Stages of Terraform Automation

Stage 1: Basic Automation with VCS

We will skip over discussing the use of the command line with Terraform as we assume a certain level of expertise with the tool. However, if you want to learn more about the CLI, check out this blog post: Terraform CLI: Terraform Commands, Examples and Best Practices.

Let's start our discussion by looking at how to automate Terraform deployments with a version control system (VCS). This is the first step towards streamlined operations. Tools like Atlantis integrate with VCS to trigger automated Terraform plans and applies through pull requests (PRs).

Workflow with Atlantis

  1. Plan on Pull Request: When a PR is opened or updated, Atlantis automatically runs terraform plan and posts the output as a comment on the PR.
  2. Apply on Approval: After the plan is reviewed and the changes are approved, commenting atlantis apply on the PR applies the changes and posts the results (a minimal repo-level config is sketched below).
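
For context, here is a minimal sketch of a repo-level atlantis.yaml. The project name and directory are placeholders, and apply_requirements only takes effect if the Atlantis server allows that override:

# atlantis.yaml at the repository root (illustrative)
version: 3
projects:
  - name: networking
    dir: terraform/networking
    autoplan:
      enabled: true
      when_modified: ["*.tf", "*.tfvars"]   # re-plan when these files change in a PR
    apply_requirements: [approved]          # gate atlantis apply on PR approval

With this in place, Atlantis plans automatically on PRs that touch the project directory and applies only after approval.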

How env0 Can Help

env0 simplifies this process even further by enabling you to create templates tied to your VCS, automating the redeployment process and running Terraform plans on pull requests. This setup forms the foundation of your Terraform automation journey, ensuring that changes are tracked, reviewed, and applied seamlessly.

Stage 2: Infrastructure as Code (IaC) Specialized Pipelines

As organizations mature in their automation practices, they adopt specialized CI/CD pipelines tailored for IaC. Building infrastructure provisioning pipelines allows teams to enforce best practices such as integrating linting, security scans, and testing as they provision infrastructure. 

This makes the automation process more robust and reliable. The same pipelines can then hand off to a configuration management tool such as Ansible to continue configuring the provisioned resources as needed.

Custom Workflows

Creating workflows designed for IaC involves:

  • Linting Terraform code: Ensuring code quality and style consistency
  • Running security scans: Detecting vulnerabilities early
  • Validating configurations: Checking the correctness of configurations before they are applied
  • Configuration management tools: Integrating tools like Ansible for post-deployment configuration

How env0 Can Help

env0 supports IaC specialized pipelines through custom flows. These custom flows allow you to run bash commands at various stages of the Terraform process, such as init, plan, apply, and output. 

You can also install tools such as Ansible for configuration management. This flexibility ensures that all necessary checks and configurations are performed automatically, enhancing the reliability and security of your deployments.
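
As a minimal sketch of the checks listed earlier, assuming the standard custom flow step hooks and that the Terraform CLI is already on the runner, an env0.yaml could run formatting and validation just before the plan:

deploy:
  steps:
    terraformPlan:
      before:
        - terraform fmt -check -recursive   # fail the deployment on unformatted code
        - terraform validate                # catch syntax and type errors before planning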

Below is an example of using custom flows in the env0.yaml file from the blog post: Ansible vs. Terraform: Choose One or Use Both?

deploy:
  steps:
    terraformOutput:
      after:
        - terraform output -raw private_key > /tmp/myKey.pem
        - chmod 400 /tmp/myKey.pem
        - sed -i "s/[placeholder_app]/$(terraform output -raw public_ip)/g" Ansible/inventory
        - pip3 install --user ansible
        - ls -lah
        - cat Ansible/inventory
        - cd Ansible && ansible-playbook --private-key /tmp/myKey.pem -i inventory jenkinsPlaybook.yaml

Stage 3: Advanced Terraform Orchestration

As automation practices advance, managing infrastructure as micro-infrastructure becomes crucial. This approach involves breaking down large, monolithic Terraform configurations into smaller, loosely coupled ones based on environment, function, or team. The advantages of doing so are similar to breaking monolithic applications into microservices. Here are a few:

  • Improved Scalability: Each infrastructure component can be scaled independently based on its specific needs, ensuring optimal performance and resource usage
  • Enhanced Maintainability: Managing smaller, independent infrastructure components simplifies updates, testing, and troubleshooting, making the overall system easier to maintain
  • Greater Flexibility and Agility with RBAC: Teams have control over their own piece of the micro-infrastructure with Role-Based Access Control (RBAC). For example, a networking team can have read/write access to their networking environment while providing read access to others, allowing for faster adjustments and deployments in response to changing requirements.

Orchestrating Micro-Infrastructures

We need a mechanism to orchestrate these micro-infrastructures and determine their dependencies. 

For instance, if we want an application to run, we first need a database. So, we should have a workflow to create the database first and have the necessary credentials and configuration ready for apps that need access to it. 

Below are the key components for orchestration:

  • Determining dependencies: Identifying the order in which resources should be created
  • Managing environments: Creating environments scoped by environment, function, and team, with ownership assigned to specific teams via RBAC. For example, an application could have the following environments:
    • dev-compute-teamA
    • dev-database-teamA
    • dev-storage-teamA
    • dev-networking-teamA
    • qa-compute-teamA
    • qa-database-teamA ...and so on
  • Using workflows: Automating the creation and configuration of dependent resources using an orchestrator

How env0 Can Help

  • env0 uses the term environment for a micro-infrastructure
  • The way to orchestrate multiple environments with their dependencies is by using env0’s Workflows functionality (a minimal sketch follows this list)
  • Benefits of env0’s workflows:
    • Manage your entire infrastructure with complex dependencies between Environments
    • Visual presentation of the complex deployment
    • Each environment can use a different IaC tool – one environment can be managed by Terraform while another by Helm or Kubernetes manifests, for example
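
Here is a minimal sketch of an env0.workflow.yml in the spirit of env0's Workflows feature, picking up the database-before-application example from earlier. Treat the exact keys and template names as illustrative and check env0's documentation for the authoritative schema:

environments:
  database:
    name: 'dev-database-teamA'
    templateName: 'database-template'   # hypothetical template
  application:
    name: 'dev-compute-teamA'
    templateName: 'app-template'        # hypothetical template
    needs:
      - database                        # deploy the database environment first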

Stage 4: Self-Service with Governance

At the pinnacle of Terraform automation, governance becomes a critical focus. This includes ensuring security, compliance, cost management, and reliability.

Terraform Security and Compliance

As the organization matures with its Terraform automation efforts, it will typically move towards a self-service model. In a self-service model, you would have producers and consumers.

Producers would build standardized Terraform modules that consumers would use. Producers are typically well-versed in Terraform, whereas consumers don’t necessarily need to be deeply familiar with it.

For this self-service model to work, we must have governance and policies in place. We can ensure these guardrails are in place using Policy as Code. Open Policy Agent (OPA) has become an industry standard for Policy as Code.

How env0 Can Help

env0 supports Policy as Code, allowing you to automate policy enforcement and maintain compliance across your infrastructure. 

With env0, you can define and enforce policies at runtime and during deployments, ensuring that all infrastructure changes adhere to your organization’s standards.

Cost Management with Automated Terraform Cost Control

Cost management is a crucial aspect of governance. Tools like Infracost help estimate the cost impact of IaC changes. 

Proper tagging of resources is also crucial for optimizing costs. Some organizations implement bots that scan environments and shut down resources based on tags. Manually tagging resources is a painful process, one that an automated process can definitely help with.

How env0 Can Help

env0 offers Cost Estimation, Cost Monitoring, and Budget Notifications to keep an eye on cost. env0 also offers an automatic tagging feature using its open-source Terratag project.

Reliability with Auto-Drift Detection and Remediation

Monitoring infrastructure changes and detecting configuration drift is vital for maintaining reliability. Drift can occur for various reasons, and detecting and addressing it promptly is important.

How env0 Can Help

env0 provides real-time monitoring and smart Drift Detection features, ensuring that any configuration changes or drifts are identified and remediated quickly, maintaining the integrity and reliability of your infrastructure.

env0 Drift Detection

Demo Time!

This section walks through a detailed demo of how a mature organization can leverage env0 to automate Terraform workflows, highlighting the practical applications of the concepts discussed in the previous sections.

Diagram for our Demo

Setting Up the Initial Environment

The network team has already set up the basic networking requirements to kick off, such as a VPC and some subnets. This highlights the idea of micro-infrastructures and the relationships between different environments and teams. Here's a quick overview of what we have to work with:

  • Region: us-east-1
  • VPC ID: Unique identifier for the VPC
  • VPC Private Subnet IDs: List of subnet IDs within the VPC

These outputs from the initial networking setup will be crucial for configuring our subsequent environments. They can be found in a project called "Networking-AWC", with a screenshot of the output below:

Output from the Networking Project

Creating the Terraform Automation Project

Next, you will need to create a new project called "Terraform Automation for Mature Org." This project will include several key templates:

  1. Terraform Automation Template: This workflow template sets up an EKS (Elastic Kubernetes Service) cluster and deploys Grafana and Prometheus inside the cluster
  2. EKS Template: This Terraform template handles the creation of the EKS cluster
  3. Grafana and Prometheus Helm Templates: These Helm templates manage the deployment of Grafana and Prometheus via Helm charts

Here are our project templates below:

Project Templates

Each template is configured with specific variables and tied to our version control system.

Configuring Templates and Environments

The templates are tied to our project, and the environment variables are set up to ensure smooth deployment. Let's take a look at the Terraform Automation template first.

The Terraform Automation Template

These are the environment variables used:

  • INFRACOST_API_KEY: An API key for Infracost to run cost estimates on the infrastructure
  • ENV0_TERRATAG_CUSTOM_TAGS: A map of key-value pairs for Terratag to automatically tag all the resources you're building
  • SOFT_FAIL: A setting for soft fail during the Checkov scan of the Terraform code, which reports the Checkov policies that passed and failed, along with their severity, without stopping the run (example values for these variables are sketched below)
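
For illustration only, these variables might be set to values like the following. All values are placeholders, the API key should be stored as a sensitive variable, and the tag map is assumed to be supplied as JSON per the description above:

INFRACOST_API_KEY: "<your-infracost-api-key>"                         # sensitive; used by Infracost for cost estimates
ENV0_TERRATAG_CUSTOM_TAGS: '{"owner": "teamA", "environment": "dev"}' # tags Terratag applies to every resource
SOFT_FAIL: "true"                                                     # report Checkov findings without failing the run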

The EKS Environment

Below are the Terraform variables that we are assigning: VPC ID, Subnet IDs, and Region. These values are imported from the outputs of our initial networking project.
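
Spelled out as plain values, they look something like the sketch below. The variable names and IDs are placeholders; in the demo, the actual values are wired in through env0's variable settings rather than a file:

region: us-east-1
vpc_id: vpc-0123456789abcdef0        # output of the Networking project
private_subnet_ids:                  # outputs of the Networking project
  - subnet-0aaa0000000000001
  - subnet-0bbb0000000000002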

The Prometheus and Grafana Environments

The Prometheus and Grafana VCS templates are linked to their Helm repositories on GitHub as seen below.

Prometheus Template VCS Settings

Grafana Template VCS Settings

Also, look at the settings for the Prometheus and Grafana environments. Notice the Environment Name, Release Name, and Namespace for Prometheus below.

Prometheus Environment Settings

And again for Grafana:

Grafana Environment Settings

Project Settings

  • Credentials: AWS and Kubernetes credentials are configured to allow Terraform to interact with your cloud resources. You can also configure cost credentials to monitor your environment's cost.

Deployment Credentials for our Project

Cost Credentials

  • Approval Policies: We have two approval policies using OPA. The Main one checks costs, and the Remediation one ensures approval for any "create" or "destroy" operations in the Terraform workflow.

Approval Policies

Deploying the Environments

Using a combination of env0’s Workflows along with Custom Flows, the automation process will:

  1. Create the EKS Cluster: Deploying the EKS cluster based on the Terraform configuration
  2. Create Namespace: Setting up a Kubernetes namespace for monitoring purposes (a minimal sketch of this step follows the list)
  3. Deploy Prometheus and Grafana: Using Helm charts to install these monitoring tools within the EKS cluster
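
The namespace step is a good example of glue logic that fits naturally into a custom flow. Here is a minimal sketch, assuming the aws CLI and kubectl are available on the runner and that the EKS configuration exposes a cluster_name output (a hypothetical name):

deploy:
  steps:
    terraformOutput:
      after:
        - aws eks update-kubeconfig --name $(terraform output -raw cluster_name) --region us-east-1
        - kubectl create namespace monitoring --dry-run=client -o yaml | kubectl apply -f -   # idempotent namespace creation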

Deploying and redeploying the workflow environment look similar; you can deploy the entire workflow or just a subset.

Deploy a Workflow Environment

Here is what your deployment will look like once completed:

Deployment Complete

Deployment Steps

The deployment logs provide detailed insights into each step. Here they are listed for the Terraform EKS environment:

  1. Git Clone: Clones the repository containing the Terraform configuration files from the specified Git source
  2. Get Working Directory: Prepares the working directory for the deployment by setting up the necessary files and structure
  3. Loading env0 YAML file: Reads and processes the env0 YAML configuration file to apply specified settings, configurations, and custom flows
  4. Load Variables: Loads environment-specific variables needed for the Terraform deployment
  5. Setting Version: Ensures the correct versions of Terraform and other required tools are set up for the deployment
  6. Initialize: Prepares the environment for Terraform operations
  7. Terraform Init: Initializes the Terraform configuration, setting up the backend and provider configurations (Learn more about terraform init)
  8. Setting Terraform Workspace: Configures the appropriate Terraform workspace to isolate state and resources for different environments
  9. Tag Resources: Applies tags to the managed resources for better organization and cost tracking. This step runs Terratag
  10. Terraform Plan: Generates and displays an execution plan showing what actions Terraform will take to achieve the desired state
  11. Checkov Install: Installs Checkov, a static code analysis tool for Infrastructure as Code, to ensure compliance with security policies
  12. Checkov Security Scan: Runs a security scan using Checkov to identify potential misconfigurations and security issues in the Terraform code
  13. Cost Estimation: Provides an estimate of the costs associated with the resources defined in the Terraform plan. This step runs Infracost
  14. Approval Policies: Applies approval policies to ensure that the plan meets organizational requirements before execution
  15. Terraform Apply: Executes the actions proposed in the Terraform plan to create or update infrastructure resources
  16. Terraform Output: Retrieves and displays the output values defined in the Terraform configuration
  17. Create Monitoring Namespace: Creates the monitoring namespace in Kubernetes for Prometheus and Grafana
  18. Store Working Directory: Saves the working directory state and relevant files for future reference or reuse

Similarly, here are the steps for the Grafana and Prometheus environments:

  1. Get Working Directory: Prepares the working directory by setting up the necessary files and structure for the deployment
  2. Loading env0 YAML file: Reads and processes the env0 YAML configuration file to apply specified settings, configurations, and custom flows
  3. Load Variables: Loads environment-specific variables
  4. Setting Version: Ensures the correct versions of Helm and other required tools are set up for the deployment
  5. Initialize: Connects to the EKS cluster, updates kubeconfig, and adds the specified Helm repository
  6. Helm Diff: Compares the current Helm release with the proposed changes to show what will be updated
  7. Approval Policies: Applies approval policies to ensure the proposed Helm changes meet organizational requirements before execution
  8. Helm Upgrade: Applies the changes to the Kubernetes cluster by upgrading the Helm release with the specified configurations
  9. Store Working Directory: Saves the working directory state and relevant files for future reference or reuse

Drift Detection and Remediation Setup

To maintain the integrity of the environments, you need to enable automatic drift detection and remediation. This involves:

  • Enabling Drift detection: This was initially enabled when creating the environments
  • Scheduled Deployments: Setting up a schedule in the form of a cron job to run deployments every two hours, ensuring any drift is detected and addressed promptly (the cron expression is shown below)
  • OPA Policies: Enforcing policies that allow automatic updates but require approvals for resource creation or deletion, mitigating the risk of unintended changes causing outages
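
For reference, an every-two-hours schedule corresponds to the cron expression below. In the demo this is configured through the env0 UI, so the schedule key here is just hypothetical pseudo-config to show the expression in context:

# run the scheduled deployment at the top of every second hour
schedule: "0 */2 * * *"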

Below is a screenshot of the Workflow settings showing Scheduling for continuous deployment along with Drift Detection.

Simulating and Handling Drift

In the demo video, I manually introduced drift by changing the tags on the EKS cluster to test the drift detection and remediation setup. 

The automation detected the drift, and the redeployment reverted the changes, demonstrating the robustness of our setup. It was a simple update in this case, so the approval policy allowed the automation to continue unattended.

Additionally, I simulated a more complex drift scenario by adding an S3 bucket via Terraform. This triggered an approval workflow, ensuring that significant changes are reviewed before being applied.

Code Walk-through

Finally, let's explore the key configuration files in our repository. Below is the file structure in our repo.

├── LICENSE
├── Policies
│   ├── Main
│   │   └── cost-policy.rego
│   └── Remediation
│       └── update-only.rego
├── README.md
└── Terraform
    ├── EKS
    │   ├── LICENSE
    │   ├── README.md
    │   ├── env0.yml
    │   ├── input.json
    │   ├── main.tf
    │   ├── outputs.tf
    │   ├── plan.json
    │   ├── variables.tf
    │   └── versions.tf
    ├── VPC
    │   ├── LICENSE
    │   ├── README.md
    │   ├── env0.yml
    │   ├── main.tf
    │   ├── outputs.tf
    │   ├── variables.tf
    │   └── versions.tf
    └── env0.workflow.yaml
  • Policies: OPA policies are defined in the Policies folder and used to manage approvals and cost estimations.
  • Terraform Scripts: The Terraform directory contains detailed configurations for creating the VPC and the EKS cluster. Notice that we have our variables.tf file, which is our variable definitions file, but the variable assignment is directly added to templates and environments within env0. You could add *.auto.tfvars files here as well if you like.
  • Workflow Definitions: Two files are of importance here: env0.workflow.yaml for workflow definitions and env0.yaml for custom flows.

Conclusion

Automating Terraform with env0 offers numerous benefits, including increased efficiency, reduced errors, enhanced security, and compliance. 

Following the four stages outlined in this post, organizations can systematically scale their Terraform automation practices, ensuring robust and scalable infrastructure management.

To recap:

  • Stage 1: Basic automation with VCS sets the foundation for streamlined operations
  • Stage 2: Infrastructure as Code (IaC) Specialized Pipelines introduce additional checks and configurations
  • Stage 3: Advanced Terraform orchestration breaks down complex infrastructure into manageable components
  • Stage 4: Self-service with governance ensures security, compliance, and cost management at scale

Explore env0's features further to elevate your Terraform automation practices. Start your journey today by signing up for an env0 account and begin transforming your infrastructure automation process.

Happy automating!
