Arshad Zackeriya 🇳🇿 ☁️ for AWS Heroes

Posted on Oct 28

Use Amazon Q Developer and AWS Infrastructure Composer to automate the monitoring of available IP addresses in subnets

#aws #amazonq #awsvpc #eks

I want to begin by saying that Amazon Q Developer and AWS Infrastructure Composer helped me design this solution in a matter of minutes.

Amazon Q: https://aws.amazon.com/q/
AWS Infrastructure Composer: https://aws.amazon.com/infrastructure-composer/

Problem:

Let's discuss the problem I'm trying to solve. If you are using Amazon EKS and your workload is growing, you may run into IP exhaustion, which occurs when your subnets run out of available IP addresses.

At the time of writing, Amazon CloudWatch does not provide metrics for the available IP addresses in a subnet unless you use IPAM (Amazon VPC IP Address Manager). What I'm trying to accomplish here is monitoring the available IP addresses in subnets without IPAM.

Solution:

AWS services involved in this solution:

AWS Lambda
Amazon EventBridge Scheduler
Amazon CloudWatch metrics
Amazon CloudWatch alarms
Amazon SNS

Lambda Function

I was able to create this in a matter of minutes with the help of Amazon Q Developer, although I did need to make a few small adjustments. A tool like this is very beneficial if you understand the basics and know what you are doing. Rather than configuring AWS services blindly, I recommend that everyone take the time to understand them.

Full Python Script here:

import boto3
import os
from botocore.exceptions import ClientError

def lambda_handler(event, context):
    vpc_id = os.environ['VPC_ID']
    subnet_ids = os.environ['SUBNET_IDS'].split(',')
    namespace = os.environ['NAMESPACE']

    ec2 = boto3.client('ec2')
    cloudwatch = boto3.client('cloudwatch')

    try:
        response = ec2.describe_subnets(
            Filters=[
                {'Name': 'vpc-id', 'Values': [vpc_id]},
                {'Name': 'subnet-id', 'Values': subnet_ids}
            ]
        )

        for subnet in response['Subnets']:
            subnet_id = subnet['SubnetId']
            available_ip_count = subnet['AvailableIpAddressCount']
            cidr_block = subnet['CidrBlock']
            total_ip_count = 2 ** (32 - int(cidr_block.split('/')[1])) - 5  # Subtract 5 for reserved IPs

            subnet_name = subnet_id  # Default to subnet ID if no name tag
            for tag in subnet.get('Tags', []):
                if tag['Key'] == 'Name':
                    subnet_name = tag['Value']
                    break

            utilization_percentage = ((total_ip_count - available_ip_count) / total_ip_count) * 100

            # Send metrics to CloudWatch
            cloudwatch.put_metric_data(
                Namespace=namespace,
                MetricData=[
                    {
                        'MetricName': 'AvailableIPAddresses',
                        'Dimensions': [
                            {'Name': 'SubnetName', 'Value': subnet_name},
                            {'Name': 'SubnetId', 'Value': subnet_id}
                        ],
                        'Value': available_ip_count,
                        'Unit': 'Count'
                    },
                    {
                        'MetricName': 'IPUtilizationPercentage',
                        'Dimensions': [
                            {'Name': 'SubnetName', 'Value': subnet_name},
                            {'Name': 'SubnetId', 'Value': subnet_id}
                        ],
                        'Value': utilization_percentage,
                        'Unit': 'Percent'
                    }
                ]
            )

            print(f"Metrics sent for Subnet: {subnet_name} (ID: {subnet_id})")

    except ClientError as e:
        print(f"An error occurred: {e}")
        return {
            'statusCode': 500,
            'body': str(e)
        }

    return {
        'statusCode': 200,
        'body': 'Subnet monitoring completed'
    }

Get IP address utilization:
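
The utilization arithmetic from the handler can be pulled out into a small pure function, which makes it easy to check by hand. This is only a minimal sketch: the name `subnet_utilization` is hypothetical and not part of the repository. It follows the same rule as the script above, where 5 addresses per subnet are treated as reserved by AWS.

```python
import ipaddress

# AWS reserves the first four addresses and the last address of every subnet.
AWS_RESERVED_IPS = 5


def subnet_utilization(cidr_block: str, available_ip_count: int) -> float:
    """Return the percentage of usable IPv4 addresses that are in use."""
    network = ipaddress.ip_network(cidr_block, strict=False)
    total_ip_count = network.num_addresses - AWS_RESERVED_IPS
    if total_ip_count <= 0:
        raise ValueError(f"{cidr_block} has no usable addresses")
    used_ip_count = total_ip_count - available_ip_count
    return used_ip_count / total_ip_count * 100


# A /24 has 256 addresses, 251 of them usable.
# With 100 addresses still free, 151 of 251 are in use.
print(round(subnet_utilization("10.0.1.0/24", 100), 1))  # prints 60.2
```

Keeping the calculation separate from the boto3 calls means it can be unit tested without any AWS credentials.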

Send metrics to CloudWatch:
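
The payload passed to `put_metric_data` in the script above can also be built by a helper, so that the metric names and dimensions live in one place. The sketch below is an illustration under that assumption; `build_metric_data` is a hypothetical name, and the metric names and dimensions are copied from the script.

```python
def build_metric_data(subnet_name, subnet_id, available_ip_count, utilization_percentage):
    """Build the MetricData list for one subnet, matching the script above."""
    # Both metrics share the same two dimensions.
    dimensions = [
        {'Name': 'SubnetName', 'Value': subnet_name},
        {'Name': 'SubnetId', 'Value': subnet_id},
    ]
    return [
        {
            'MetricName': 'AvailableIPAddresses',
            'Dimensions': dimensions,
            'Value': available_ip_count,
            'Unit': 'Count',
        },
        {
            'MetricName': 'IPUtilizationPercentage',
            'Dimensions': dimensions,
            'Value': utilization_percentage,
            'Unit': 'Percent',
        },
    ]


# Example values are made up for illustration.
metric_data = build_metric_data('eks-private-a', 'subnet-0123456789abcdef0', 100, 60.2)
print([metric['MetricName'] for metric in metric_data])

# Inside the Lambda handler this would be sent with:
#   cloudwatch.put_metric_data(Namespace=namespace, MetricData=metric_data)
```

Because CloudWatch treats every unique combination of dimensions as a separate metric, any alarm on these metrics has to specify both `SubnetName` and `SubnetId`.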

Use AWS Infrastructure Composer to design the infrastructure.

AWS Infrastructure Composer lets you design your infrastructure visually, generate infrastructure as code, and deploy it using AWS SAM (AWS Serverless Application Model): https://aws.amazon.com/serverless/sam/

How to Deploy

Prerequisites

AWS CLI installed and configured with appropriate permissions
AWS Toolkit for Visual Studio Code installed and configured
AWS SAM CLI installed

Deployment Steps

Repository for entire code and instructions on how to deploy: https://github.com/awsfanboy/aws-subnet-ip-address-utilization-monitor

Modify the template.yaml file to adjust the default parameter values (e.g. VPC ID, subnet names, subnet IDs, CloudWatch metric namespace) or to add or remove resources as needed.
(Optional) Update the lambda_function.py file in the src directory.
Build the SAM application: sam build
Deploy the SAM application: sam deploy --guided
This will start an interactive deployment process. You'll be prompted to provide values for the parameters defined in the template. You can accept the default values or provide your own.
During the deployment, you'll be asked to confirm the creation of IAM roles and the changes to be applied. Review and confirm these.
SAM will output the ARNs of the created Lambda function and SNS topic once the deployment is complete.

Parameters

    VpcId: The ID of the VPC to monitor
    SubnetIds: Comma-separated list of subnet IDs to monitor
    SubnetName1: Name of the first subnet
    SubnetName2: Name of the second subnet
    CWMetericNamespace: The CloudWatch metric namespace
    AlertEmail: Email address to receive alerts

Resources Created

Lambda function for monitoring subnets
EventBridge rule to trigger the Lambda function every minute
SNS topic for sending alerts
CloudWatch alarms for each monitored subnet

Customization

To monitor more than two subnets, duplicate the SubnetUtilizationAlarm resource in the template and adjust the SubnetIds parameter.
Modify the Lambda function code in src/lambda_function.py to implement your specific monitoring logic.
Adjust the alarm thresholds and evaluation periods in the SubnetUtilizationAlarm resources as needed.
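
The repository defines its alarms in the SAM template. Purely as an illustration of what the threshold and evaluation-period settings mean, here is a sketch of the equivalent alarm parameters for boto3's `put_metric_alarm`. The helper name `build_alarm_params`, the alarm name pattern, the 80% threshold, the `Maximum` statistic, and the period values are all assumptions of mine, not values taken from the repository.

```python
def build_alarm_params(namespace, subnet_name, subnet_id, sns_topic_arn,
                       threshold_percent=80.0, period_seconds=60,
                       evaluation_periods=1):
    """Return keyword arguments for cloudwatch.put_metric_alarm (sketch)."""
    return {
        'AlarmName': f'subnet-ip-utilization-{subnet_id}',  # hypothetical naming scheme
        'Namespace': namespace,
        'MetricName': 'IPUtilizationPercentage',
        # Must match the dimensions the Lambda function publishes.
        'Dimensions': [
            {'Name': 'SubnetName', 'Value': subnet_name},
            {'Name': 'SubnetId', 'Value': subnet_id},
        ],
        'Statistic': 'Maximum',
        'Period': period_seconds,
        'EvaluationPeriods': evaluation_periods,
        'Threshold': threshold_percent,  # assumed value, adjust to your needs
        'ComparisonOperator': 'GreaterThanOrEqualToThreshold',
        'AlarmActions': [sns_topic_arn],
    }


# Example values are made up for illustration.
params = build_alarm_params(
    'Custom/SubnetMonitoring', 'eks-private-a', 'subnet-0123456789abcdef0',
    'arn:aws:sns:us-east-1:123456789012:subnet-alerts',
)
print(params['AlarmName'])

# To create the alarm:
#   boto3.client('cloudwatch').put_metric_alarm(**params)
```

Raising `evaluation_periods` makes the alarm less sensitive to short spikes, at the cost of a slower notification.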

Cleanup

To remove all resources created by this stack: sam delete
Follow the prompts to confirm the deletion of resources.

Demo

I have an Amazon EKS cluster running a deployment with 6 replicas. The worker nodes run in 2 subnets, and IP address utilization looks healthy.

The alarm state is OK.

Okay! Let's increase the number of replicas from 6 to 600.

Let's check the metrics in CloudWatch and, oops, now we can see that IP utilization is high.

Now let's check the alarms in CloudWatch. The state has changed from OK to ALARM.

Let's check my email.

I can see there are 2 emails in my inbox.

Cost

I estimated the cost using calculator.aws, and it looks reasonable.

What Next?

These notifications can be sent to Slack, PagerDuty, and other platforms.
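
As one possible way to do that, a second Lambda function could subscribe to the SNS topic and forward each alarm to a Slack incoming webhook. The sketch below is not part of the repository; the `SLACK_WEBHOOK_URL` environment variable and the function names are hypothetical. It relies on the fact that CloudWatch alarm notifications arrive as a JSON string in the SNS record's `Message` field.

```python
import json
import os
import urllib.request


def format_alarm_messages(event):
    """Turn the SNS records of a Lambda event into one text line per alarm."""
    messages = []
    for record in event.get('Records', []):
        alarm = json.loads(record['Sns']['Message'])
        messages.append(
            f"{alarm['NewStateValue']}: {alarm['AlarmName']} - {alarm['NewStateReason']}"
        )
    return messages


def lambda_handler(event, context):
    webhook_url = os.environ['SLACK_WEBHOOK_URL']  # hypothetical variable name
    for text in format_alarm_messages(event):
        # Supplying a request body makes urllib send a POST.
        request = urllib.request.Request(
            webhook_url,
            data=json.dumps({'text': text}).encode('utf-8'),
            headers={'Content-Type': 'application/json'},
        )
        with urllib.request.urlopen(request) as response:
            response.read()
    return {'statusCode': 200, 'body': 'Notifications forwarded'}
```

The formatting is kept in its own function so it can be tested with a sample event, without calling Slack.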

Conclusion

I hope my automation will help someone who doesn't want to use IPAM to monitor IP address utilization in subnets, and I truly wish we could access these metrics straight from CloudWatch.

If you have any suggestions for improvement, or if you are solving this problem in a different way, please feel free to share.

Top comments (4)

Pawel Zubkiewicz • Oct 30

This is a useful solution. A few years back I did a very similar thing to monitor available addresses in database subnets. My client had a problem with failing Glue jobs; after investigation it became clear that the Glue jobs were running in parallel and using up all the free IPs in the subnets (not my design). A metric like the one described in this article was very helpful in configuring the maximum Glue job concurrency.

Arshad Zackeriya 🇳🇿 ☁️ • Oct 31

Thanks @pzubkiewicz , yeah mate 100%. thanks for sharing another use case.

Jones Zachariah Noel • Oct 28

That's a brilliant round up about how Serverless can be used for Ops automations and metrics!!! 😜

Arshad Zackeriya 🇳🇿 ☁️ • Oct 28

Aye aye! Serverless FTW :P
