The Key Rotation project was one of my favorite projects to complete. The goal was to automate the distribution of AWS access keys to cloud users across the agency, at scale.
The agency depended on this project for several security reasons. One person was in charge of distributing all AWS access keys, which made that person both a bottleneck and a single point of failure: if anything happened to them, the whole process would have to be reworked. Keys were sent to users in clear-text email, which exposes credentials to anyone who can read the message. There was also no key age policy in place, so many keys were well over 90 days old. Completing this project was essential to resolving these security concerns and establishing new organizational standards.
The finished product would be a self-service site that any cloud user with programmatic access could use to rotate their existing keys or generate new ones.
Given the scope of this project, I knew from the beginning that I would need to deploy multiple Lambda functions. That meant choosing a service that would let one Lambda function pass data to another and handle that data reliably and predictably. That is why I chose AWS SQS; the decision was driven by how data needed to flow between the functions.
I used Terraform to create and configure all the resources, because it is more operationally efficient than a manual deployment. Once I knew that Terraform would speed up my development cycle, I turned my attention to the behavior of the code itself. For this I relied on my manager to define the scope of work. His requirements were as follows:
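To illustrate the handoff between two Lambda functions through SQS, here is a minimal sketch. The payload fields (`user_name`, `access_key_id`, `age_days`, `status`) and the function names are hypothetical and not taken from the project; `sqs_client` is assumed to be a boto3 SQS client, and the handler relies on the standard shape of an SQS-triggered Lambda event, where each message body arrives under `event["Records"]`.

```python
import json


def build_message(user_name, access_key_id, age_days, status):
    """Serialize one key finding as the JSON body of an SQS message.

    The field names are illustrative assumptions, not the project's schema.
    """
    return json.dumps({
        "user_name": user_name,
        "access_key_id": access_key_id,
        "age_days": age_days,
        "status": status,
    })


def send_finding(sqs_client, queue_url, finding):
    """Producer side: publish one finding to the queue.

    sqs_client is assumed to be boto3.client("sqs"); queue_url is
    whatever URL the queue was provisioned with.
    """
    return sqs_client.send_message(
        QueueUrl=queue_url,
        MessageBody=build_message(**finding),
    )


def handler(event, context):
    """Consumer side: decode every SQS record delivered to the Lambda."""
    findings = [json.loads(record["body"]) for record in event["Records"]]
    # Remediation and user notification would happen here.
    return findings
```

Passing the client in as an argument keeps the functions easy to exercise with a stub, without touching a real queue.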
○ Generate a list of users in all accounts
○ Evaluate the age of their keys
○ Generate and send a message containing the AWS access key self-service link to users with expiring or expired keys
○ Delete the key if expired
○ Once the user clicks the button in the self-service portal, delete any old keys and generate a new one
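The key age evaluation in these requirements can be sketched as a small classification function. This is only an illustration: the 90-day limit is inferred from the age mentioned earlier, and the 14-day warning window is an assumption, as are the names `MAX_KEY_AGE_DAYS`, `WARNING_WINDOW_DAYS`, and `classify_key`. The creation date is expected to be a timezone-aware `datetime`, which is what boto3 returns as `CreateDate` for an access key.

```python
from datetime import datetime, timezone

MAX_KEY_AGE_DAYS = 90      # assumed policy limit
WARNING_WINDOW_DAYS = 14   # assumed lead time for "expiring" notices


def classify_key(create_date, now=None):
    """Return "expired", "expiring", or "ok" for a key created at create_date."""
    now = now or datetime.now(timezone.utc)
    age_days = (now - create_date).days
    if age_days >= MAX_KEY_AGE_DAYS:
        return "expired"
    if age_days >= MAX_KEY_AGE_DAYS - WARNING_WINDOW_DAYS:
        return "expiring"
    return "ok"
```

Accepting `now` as an optional argument makes the function deterministic when checked against fixed dates.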
Once I understood the required behavior, I designed and developed the solution according to the business use case:
- Lambda 1 lists all user keys and evaluates their age. It builds a JSON payload and sends it to SQS for processing.
- Lambda 2 ingests the payload from SQS, remediates the key in question, and emails the users with expiring or expired keys.
- Lambda 3 runs when a user opens the self-service portal and clicks "Rotate Key". It deletes all of the user's existing access keys, then generates a new one and presents it on screen for them to copy.
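The rotation step in Lambda 3 could look roughly like the sketch below. It is not the project's actual code: `rotate_access_key` is a hypothetical name, and `iam_client` is assumed to be a boto3 IAM client, using its `list_access_keys`, `delete_access_key`, and `create_access_key` calls.

```python
def rotate_access_key(iam_client, user_name):
    """Delete all of a user's access keys, then create and return a new one.

    iam_client is assumed to be boto3.client("iam").
    """
    existing = iam_client.list_access_keys(UserName=user_name)["AccessKeyMetadata"]
    for key in existing:
        iam_client.delete_access_key(
            UserName=user_name,
            AccessKeyId=key["AccessKeyId"],
        )
    new_key = iam_client.create_access_key(UserName=user_name)["AccessKey"]
    # The secret is only returned at creation time, so it has to be
    # shown to the user immediately.
    return {
        "AccessKeyId": new_key["AccessKeyId"],
        "SecretAccessKey": new_key["SecretAccessKey"],
    }
```

Deleting before creating matters because IAM allows at most two access keys per user, so a user who already holds two keys could not be issued a third.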
During the build phase I ran into one issue in particular that needed a fix. Lambda Function URLs weren't available in the AWS GovCloud regions, so one Lambda function had to be placed in a VPC, with security groups, behind an Elastic Load Balancer. Once I found that out, I made the adjustment and continued developing the project with an ELB. Right after that, however, I ran into another problem: the target group for the Lambda function could not be attached to the load balancer automatically. This was a well-documented Terraform issue, and the only workaround was to add the target group manually every time I wanted to make a change. Given the amount of infrastructure I was provisioning, this was a small inconvenience, and I was almost done with the project anyway.
All in all, the project was successfully deployed and operates in production at scale today. The organization now benefits from an efficient event-driven process and has eliminated severe security flaws.