Stepan Vrany

Part I: EC2 with Prometheus

I'm a big fan of lean Kubernetes clusters. This means I'm quite hesitant to run resource-intensive workloads there that can't be easily scaled. So what to do with Prometheus, which fits exactly that description?

Well, there's a pretty new feature called remote write receiver that can help us run Prometheus somewhere else and keep the Kubernetes clusters small.
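
Just to show the shape of it: with the receiver enabled (you'll see the --enable-feature=remote-write-receiver flag in the systemd unit later in this post), any Prometheus server or agent can push samples to the /api/v1/write endpoint on port 9090. Here's a minimal sketch of the sending side, composed in HCL the same way we'll compose the rest of the configuration - the p01.prometheus.local hostname is created at the very end of this post, and the whole agent setup is the topic of the next part:

locals {
  # remote_write section for the in-cluster agent (covered properly in the next part)
  agent_remote_write = yamlencode({
    remote_write = [
      {
        url = "http://p01.prometheus.local:9090/api/v1/write"
      },
    ]
  })
}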

So let's just quickly go through the basic setup with a single Prometheus server running on a standalone EC2 instance.

This tutorial is very AWS and Terraform specific, but that does not necessarily mean you can't use the same approach anywhere else.

Prometheus installation: summary

Let's summarize what we need in order to get this running:

  • Security group with an ingress rule for port 9090
  • S3 bucket for the configuration files
  • IAM role
  • IAM permissions
  • Instance profile with the role attached
  • EBS volume
  • some cloud-init that puts all the pieces together and starts Prometheus
  • internal hostname
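
All the snippets below are written as a Terraform module and reference a bunch of input variables. Here's a sketch of how they could be declared - the names match what's used throughout the post, while the types and defaults are my assumptions:

variable "environment" { type = string }
variable "project_name" { type = string }
variable "vpc_id" { type = string }
variable "subnet_id" { type = string }
variable "source_security_group_id" { type = string }
variable "ami_id" { type = string }

variable "instance_type" {
  type    = string
  default = "t4g.small" # assumption: an ARM instance type, matching the arm64 binaries in the cloud-init
}

variable "volume_size" {
  type    = number
  default = 50 # GiB, assumption
}

# rules and Alertmanager configuration, expected as already-rendered YAML strings
variable "config_prometheus_rules" { type = string }
variable "config_alertmanager" { type = string }

The EC2 section also references data.aws_subnet.this, a data source that looks up var.subnet_id so we can place the EBS volume in the same availability zone.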

Security group

So the story here is that some agent running inside the cluster will be accessing the Prometheus server. That cluster has a security group attached, so we can reference it in our rules as var.source_security_group_id.

resource "aws_security_group" "this" {
  name   = "ec2_prometheus_${var.environment}"
  vpc_id = var.vpc_id
}

resource "aws_security_group_rule" "egress_all" {
  type     = "egress"
  to_port  = 0
  protocol = "-1"
  cidr_blocks = [
    "0.0.0.0/0",
  ]
  from_port         = 0
  security_group_id = aws_security_group.this.id
}

resource "aws_security_group_rule" "ingress_prometheus" {
  type                     = "ingress"
  to_port                  = 9090
  protocol                 = "tcp"
  source_security_group_id = var.source_security_group_id
  from_port                = 9090
  security_group_id        = aws_security_group.this.id
}

Note the 0.0.0.0/0 egress rule. This is the classic rule that allows all outgoing traffic. In our case we need it to pull the Prometheus artifacts from GitHub. This might differ based on your configuration and security requirements.
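
If that's too open for your taste, the egress can be narrowed down. Here's a sketch that allows only outbound HTTPS, which covers the GitHub, AWS CLI and S3 downloads in the cloud-init below (apt may still need port 80 depending on your mirrors):

resource "aws_security_group_rule" "egress_https" {
  type     = "egress"
  protocol = "tcp"
  cidr_blocks = [
    "0.0.0.0/0",
  ]
  from_port         = 443
  to_port           = 443
  security_group_id = aws_security_group.this.id
}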

S3 bucket for the configuration

The S3 bucket is used as the store for the Prometheus config, the rules config and the Alertmanager config. We could also ship them via user data, but that's not so convenient since every change to the configuration files would replace the EC2 instance.

Hence we render the configuration files, put them into the S3 bucket, and the Prometheus instance pulls them as needed.

So the following snippet creates one S3 bucket with three objects - the Prometheus config, the rules and the Alertmanager config.

locals {
  prometheus_config = yamlencode({
    global = {
      scrape_interval = "1m",
    },
    rule_files = [
      "/etc/prometheus/prometheus.rules.yaml",
    ],
    alerting = {
      alertmanagers = [
        {
          static_configs = [
            {
              targets = [
                "localhost:9093",
              ]
            }
          ],
        }
      ],
    },
  })
}

resource "aws_s3_bucket" "config" {
  bucket = "${var.project_name}-${var.environment}"
}

resource "aws_s3_object" "prometheus_config" {
  bucket = aws_s3_bucket.config.id
  key    = "prometheus.yaml"
  content = templatefile(
    "${path.module}/prometheus.config.yaml",
    {
      environment = var.environment,
      s3_bucket   = aws_s3_bucket.config.id,
      config      = local.prometheus_config,
    }
  )

  force_destroy = true
}

resource "aws_s3_object" "prometheus_rules" {
  bucket = aws_s3_bucket.config.id
  key    = "prometheus.rules.yaml"
  content = templatefile(
    "${path.module}/prometheus.rules.config.yaml",
    {
      environment = var.environment,
      s3_bucket   = aws_s3_bucket.config.id,
      config      = var.config_prometheus_rules,
    }
  )
  force_destroy = true
}

resource "aws_s3_object" "alertmanager_config" {
  bucket = aws_s3_bucket.config.id
  key    = "alertmanager.yaml"
  content = templatefile(
    "${path.module}/alertmanager.config.yaml",
    {
      environment = var.environment,
      s3_bucket   = aws_s3_bucket.config.id,
      config      = var.config_alertmanager,
    }
  )

  force_destroy = true
}
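
You may have noticed there are no scrape_configs in local.prometheus_config - this server mostly receives samples via remote write. If you want it to scrape at least itself and the local Alertmanager, a sketch of the extra jobs could look like this (and you'd add scrape_configs = local.prometheus_scrape_configs to the object passed to yamlencode above):

locals {
  # optional self-scrape jobs for the Prometheus server and the local Alertmanager
  prometheus_scrape_configs = [
    {
      job_name = "prometheus",
      static_configs = [
        { targets = ["localhost:9090"] },
      ],
    },
    {
      job_name = "alertmanager",
      static_configs = [
        { targets = ["localhost:9093"] },
      ],
    },
  ]
}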

Now let's check all the templates referenced there. The first one is the prometheus config.

# prometheus configuration
# environment: ${environment}
# bucket: ${s3_bucket}
---
${config}

It does not contain anything special, and the same applies to the other files. This is the template for the rules

# prometheus configuration
# environment: ${environment}
# bucket: ${s3_bucket}
---
${config}

and this is the alertmanager configuration:

# alertmanager configuration
# environment: ${environment}
# bucket: ${s3_bucket}
---
${config}

The reason why we use just a config placeholder is simple: we'd like to compose all the configuration with HCL, not YAML - see local.prometheus_config for reference. That's the magic.
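
The same applies to var.config_prometheus_rules and var.config_alertmanager - they are expected to arrive as already-rendered YAML strings. A hypothetical example of how the calling code could build them (the module path, the alert rule and the empty receiver are just placeholders):

module "prometheus" {
  source = "./modules/ec2-prometheus" # hypothetical path

  # ... the remaining inputs from the summary above ...

  config_prometheus_rules = yamlencode({
    groups = [
      {
        name = "basic",
        rules = [
          {
            alert = "PrometheusTargetDown",
            expr  = "up == 0",
            labels = {
              severity = "warning",
            },
          },
        ],
      },
    ],
  })

  config_alertmanager = yamlencode({
    route = {
      receiver = "default",
    },
    receivers = [
      {
        name = "default", # plug in your real receiver (Slack, e-mail, ...) here
      },
    ],
  })
}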

IAM role, policy and instance profile

Everyone's playing some role and this EC2 instance is no different. It needs to communicate with the S3 bucket and with SSM (for remote management), so we take this into account.

resource "aws_iam_role" "this" {
  name = "ec2_prometheus_${var.environment}"
  assume_role_policy = jsonencode({
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ec2.amazonaws.com"
      }
    }]
    Version = "2012-10-17"
  })
}

resource "aws_iam_policy" "this" {
  name = "ec2_prometheus_${var.environment}"

  policy = jsonencode({
    Statement = [
      # SSM Session manager stuff
      {
        Effect = "Allow"
        Action = [
          "ssm:DescribeAssociation",
          "ssm:GetDeployablePatchSnapshotForInstance",
          "ssm:GetDocument",
          "ssm:DescribeDocument",
          "ssm:GetManifest",
          "ssm:ListAssociations",
          "ssm:ListInstanceAssociations",
          "ssm:PutInventory",
          "ssm:PutComplianceItems",
          "ssm:PutConfigurePackageResult",
          "ssm:UpdateAssociationStatus",
          "ssm:UpdateInstanceAssociationStatus",
          "ssm:UpdateInstanceInformation"
        ]
        Resource = "*"
      },
      {
        Effect = "Allow"
        Action = [
          "ssmmessages:CreateControlChannel",
          "ssmmessages:CreateDataChannel",
          "ssmmessages:OpenControlChannel",
          "ssmmessages:OpenDataChannel"
        ]
        Resource = "*"
      },
      {
        Effect = "Allow"
        Action = [
          "ec2messages:AcknowledgeMessage",
          "ec2messages:DeleteMessage",
          "ec2messages:FailMessage",
          "ec2messages:GetEndpoint",
          "ec2messages:GetMessages",
          "ec2messages:SendReply"
        ]
        Resource = "*"
      },
      # S3 for the configuration
      {
        Effect = "Allow"
        Action = [
          "s3:GetBucketLocation",
          "s3:ListAllMyBuckets",
        ]
        Resource = "*"
      },
      {
        Effect = "Allow"
        Action = [
          "s3:List*",
        ]
        Resource = [
          aws_s3_bucket.config.arn,
        ]
      },
      {
        Effect = "Allow"
        Action = [
          "s3:*",
        ]
        Resource = [
          "${aws_s3_bucket.config.arn}/*",
        ]
      },
    ]
    Version = "2012-10-17"
  })
}

resource "aws_iam_role_policy_attachment" "this" {
  policy_arn = aws_iam_policy.this.arn
  role       = aws_iam_role.this.name
}

resource "aws_iam_instance_profile" "this" {
  name = "ec2_prometheus_${var.environment}"
  role = aws_iam_role.this.name
}

EC2 instance

Now we can put all the pieces together and create the EC2 instance itself. In the following snippet you can also see the EBS volume and its attachment to the instance.

resource "aws_ebs_volume" "prometheus_0" {
  availability_zone = data.aws_subnet.this.availability_zone
  size              = var.volume_size
  type              = "gp2"
}

resource "aws_instance" "prometheus_0" {
  ami = var.ami_id
  vpc_security_group_ids = [
    aws_security_group.this.id,
  ]
  subnet_id            = var.subnet_id
  iam_instance_profile = aws_iam_instance_profile.this.name
  instance_type        = var.instance_type
  tags = {
    Name = "${var.environment}_prometheus_0"
  }
  user_data_base64 = base64encode(
    templatefile(
      "${path.module}/cloud.init.yaml",
      {
        s3_bucket   = aws_s3_bucket.config.id,
        environment = var.environment,
      }
    )
  )
}

resource "aws_volume_attachment" "prometheus_0" {
  instance_id = aws_instance.prometheus_0.id
  volume_id   = aws_ebs_volume.prometheus_0.id
  device_name = "/dev/sdf"
}

Please note the user_data_base64 property in the EC2 instance definition. It refers to the YAML template with the following content:

#cloud-config

# environment: ${environment}
runcmd:
  # install AWS CLI, needed for downloading the configuration files
  - |
      apt-get update && apt-get install unzip -y
      curl -Lo awscli.zip https://awscli.amazonaws.com/awscli-exe-linux-aarch64.zip
      unzip awscli.zip
      ./aws/install
      rm awscli.zip

  # install prometheus binary
  - |
      curl -Lo prometheus.tar.gz https://github.com/prometheus/prometheus/releases/download/v2.33.1/prometheus-2.33.1.linux-arm64.tar.gz
      tar -xvf prometheus.tar.gz
      cp ./prometheus-2.33.1.linux-arm64/prometheus /usr/local/bin/prometheus
      rm -rf ./prometheus-2.33.1.linux-arm64
      rm -rf prometheus.tar.gz

  # install alertmanager binary
  - |
      curl -Lo alertmanager.tar.gz https://github.com/prometheus/alertmanager/releases/download/v0.23.0/alertmanager-0.23.0.linux-arm64.tar.gz
      tar -xvf alertmanager.tar.gz
      mv ./alertmanager-0.23.0.linux-arm64/alertmanager /usr/local/bin/alertmanager
      rm -rf alertmanager-0.23.0.linux-arm64
      rm alertmanager.tar.gz

  # wait for the EBS volume (attached as /dev/sdf; Nitro-based instance types expose it as /dev/nvme1n1)
  - |
      while [ ! -b $(readlink -f /dev/nvme1n1) ];
      do
        echo "waiting for device /dev/nvme1n1"
        sleep 5
      done

      # format volume
      blkid $(readlink -f /dev/nvme1n1) || mkfs -t ext4 $(readlink -f /dev/nvme1n1)

      # create a mount
      mkdir -p /data
      if ! grep "/dev/nvme1n1" /etc/fstab;
      then
        echo "/dev/nvme1n1 /data ext4 defaults,discard 0 0" >> /etc/fstab
      fi

      # mount volume
      mount /data

  # enable and start systemd services
  - |
      systemctl daemon-reload
      systemctl enable prepare-prometheus.service && systemctl start prepare-prometheus.service && sleep 10
      systemctl enable prometheus.service && systemctl start prometheus.service
      systemctl enable alertmanager.service && systemctl start alertmanager.service

write_files:

  - path: /usr/local/bin/prepare-prometheus
    permissions: '0744'
    content: |
      #!/bin/sh

      mkdir -p /etc/prometheus
      aws s3 cp s3://${s3_bucket}/prometheus.yaml /etc/prometheus/prometheus.yaml
      aws s3 cp s3://${s3_bucket}/alertmanager.yaml /etc/prometheus/alertmanager.yaml
      aws s3 cp s3://${s3_bucket}/prometheus.rules.yaml /etc/prometheus/prometheus.rules.yaml
      curl -X POST http://localhost:9090/-/reload || true

  - path: /etc/systemd/system/prepare-prometheus.service
    content: |
      [Unit]
      Description=Prepare prometheus / alertmanager configuration
      Wants=network-online.target
      After=network-online.target

      [Service]
      Type=oneshot
      ExecStart=/usr/local/bin/prepare-prometheus

  # please note data.mount in dependencies
  - path: /etc/systemd/system/prometheus.service
    content: |
      [Unit]
      Description=Prometheus
      Wants=network-online.target
      After=network-online.target data.mount prepare-prometheus.service

      [Service]
      Type=simple
      ExecStart=/usr/local/bin/prometheus \
          --config.file /etc/prometheus/prometheus.yaml \
          --storage.tsdb.path /data/ \
          --web.enable-lifecycle \
          --web.console.templates=/etc/prometheus/consoles \
          --web.console.libraries=/etc/prometheus/console_libraries \
          --enable-feature=remote-write-receiver

      [Install]
      WantedBy=multi-user.target

  - path: /etc/systemd/system/alertmanager.service
    content: |
      [Unit]
      Description=Alert Manager
      Wants=network-online.target
      After=network-online.target data.mount prepare-prometheus.service

      [Service]
      Type=simple
      ExecStart=/usr/local/bin/alertmanager \
          --config.file /etc/prometheus/alertmanager.yaml \
          --storage.path=/data/

      [Install]
      WantedBy=multi-user.target

When you start the instance with such user data, it will download all the required tools and create systemd units for Prometheus and Alertmanager.

Last but not least, there's also the prepare-prometheus service. This oneshot (run to completion) downloads the configuration files and puts them into the respective path - /etc/prometheus. And since the prometheus and alertmanager services have this helper as a dependency, it's gonna happen before Prometheus starts.

Internal hostname

We'll also need to assign some hostname to the EC2 instance. The reason is obvious - we don't want to edit all the upcoming configuration files every time we replace the instance with a new one.

resource "aws_route53_zone" "prometheus" {
  name = "prometheus.local"
  vpc {
    vpc_id = var.vpc_id
  }
}

resource "aws_route53_record" "prometheus_0" {
  name = "p01.prometheus.local"
  type = "A"
  records = [
    aws_instance.prometheus_0.private_ip,
  ]
  zone_id = aws_route53_zone.prometheus.zone_id
  ttl     = 60
}

Please note the vpc block in the aws_route53_zone.prometheus resource. It makes the zone private, so prometheus.local names will be resolvable from within the VPC. We're gonna use this functionality in the upcoming chapters.
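
If the whole thing is wrapped in a module, it might also be handy to expose the hostname and the remote write URL as outputs so the agent configuration in the next part can consume them. A small sketch:

output "prometheus_internal_hostname" {
  value = aws_route53_record.prometheus_0.fqdn
}

output "prometheus_remote_write_url" {
  # the endpoint exposed thanks to --enable-feature=remote-write-receiver
  value = "http://${aws_route53_record.prometheus_0.fqdn}:9090/api/v1/write"
}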

Connecting to the EC2 instance

Since the instance has the SSM permissions attached and we haven't provisioned any SSH access, we can use Session Manager. First, look up the instance ID (adjust the Name tag filter to your environment):

INSTANCE_ID=$(aws ec2 describe-instances \
    --filters 'Name=tag:Name,Values=dev-ap-south-1_prometheus_0' 'Name=instance-state-name,Values=running' \
    --output text --query 'Reservations[*].Instances[*].InstanceId')

Forwarding Prometheus to localhost

With the instance ID at hand, we can forward the Prometheus UI (port 9090) to the workstation:

aws ssm start-session --target "${INSTANCE_ID}" --document-name AWS-StartPortForwardingSession --parameters '{"portNumber":["9090"],"localPortNumber":["9090"]}'

Wrap

In this part we've gone through the basic configuration of a standalone Prometheus instance. Such a setup does not require any manual configuration and it will even survive AMI updates - all the persistent data is stored on the EBS volume.

In the next chapter I'm gonna show you the other side - agents running in the Kubernetes cluster.
