In a previous blog post I talked about AWS Cost Control and some simple methods I used to cut cost on a customer environment by more than 50%.
I thought I would show you how this was achieved in a how-to guide, so please join me below to see how I did it.
Pre-requisites
Whichever AMI you decide to run on your instances, make sure SSM Agent is installed. It is preinstalled on some AMIs but not all, so it is always worth checking.
In my view this should be a standard install on all instances; Systems Manager has such an array of features that it is silly not to have it.
How will we Identify the instances?
This is the first thing we need to decide. The Maintenance Windows feature of AWS Systems Manager gives us a couple of ways to do this. You can manually select all instances under Registered Targets, but this is not ideal: it requires manual intervention whenever new servers are added, and the chances of someone remembering to do that are slim.
The alternative is to use instance tags, and I have two thoughts on this. You can use a tag that already exists on all instances, for example “environment:non-prod”. On the face of it this seems like a great idea, and for some it will be. However, I personally think reusing existing tags makes things less flexible: if you want to exclude some servers, you have to either remove the tag from those instances or change it, and neither is a great solution because those tags are what you use to identify instances.
The other option is a dedicated tag for auto start/stop. It can be applied to all instances through IaC, and you only need to modify this one tag to exclude an instance. But didn't I just say changing tags isn't a great idea? Yes, I did, but because this tag exists only for this purpose, all my other identifying tags stay intact.
So in my solution I went for the tag as below:-
“auto_start_stop:yes”
So when “yes” is present in the key value pair the instance is included in the auto start/stop solution and when it is set to “no” it is not.
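To make the rule concrete, here is a minimal Python sketch of the opt-in check. The tag key and value come from this article; the helper names are hypothetical, and the tag list uses the `[{"Key": ..., "Value": ...}]` shape that EC2 returns for instance tags.

```python
# Tag key and value are the ones chosen in this article.
TAG_KEY = "auto_start_stop"
TAG_VALUE = "yes"


def is_auto_managed(tags):
    """Return True when an instance's tags opt it in to auto start/stop.

    `tags` is a list of {"Key": ..., "Value": ...} dicts (or None).
    """
    return any(
        tag.get("Key") == TAG_KEY and tag.get("Value") == TAG_VALUE
        for tag in (tags or [])
    )


def ec2_tag_filter():
    """Build an EC2 describe-instances style filter for the same tag."""
    return [{"Name": f"tag:{TAG_KEY}", "Values": [TAG_VALUE]}]
```

An instance tagged “auto_start_stop:no”, or with no such tag at all, is simply skipped.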
Two Instances
So in my Dev lab I have launched two instances, Test_Server_1 and Test_Server_2. Test_Server_1 has the tag “auto_start_stop” set to “yes” and Test_Server_2 has it set to “no”.
At this point it is important to point out that the EC2 instances should have an IAM instance profile attached with sufficient permissions to allow Systems Manager Automation. I have attached the AWS managed policy “AmazonSSMAutomationRole”; the other policies on the role allow Session Manager to connect to EC2 instances that don't have a public IP.
Always remember AWS IAM Permissions make the world go round :-)
Resource Groups
The next thing we need to do is create a Resource Group in “Resource Groups & Tag Editor” under “Management & Governance”. We will refer to this group later on in the setup.
So when you reach the correct page, create a new resource group.
Group Type = Tag based
Resource types = AWS::EC2::Instance
Tags = auto_start_stop:yes
At this point, if you click the “Preview group resources” button, the table below it will populate with the instances that match the tag combination.
Just a note: if you add the tag “auto_start_stop” with a blank value, the preview will pick up every instance that has the tag, whatever its value, including instances you may not want. So make sure you use the “yes” value as well.
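If you prefer to script this step, the same tag-based group can be described as a resource query. This is only a sketch: the group name `auto-start-stop` is my own placeholder (the console steps above never name the group), and `client` is assumed to be a boto3 Resource Groups client, i.e. `boto3.client("resource-groups")`.

```python
import json


def build_resource_query(tag_key="auto_start_stop", tag_value="yes"):
    """Build a tag-based query that matches only EC2 instances with tag_key=tag_value."""
    query = {
        "ResourceTypeFilters": ["AWS::EC2::Instance"],
        # Listing the value explicitly avoids matching every instance that
        # merely has the tag key, which is the pitfall noted above.
        "TagFilters": [{"Key": tag_key, "Values": [tag_value]}],
    }
    # The API expects the query itself as a JSON string.
    return {"Type": "TAG_FILTERS_1_0", "Query": json.dumps(query)}


def create_group(client, name="auto-start-stop"):
    """Create the group; `name` is a hypothetical placeholder."""
    return client.create_group(Name=name, ResourceQuery=build_resource_query())
```

Calling `build_resource_query()` on its own is a handy way to inspect the query before creating anything.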
Do you want to create a Maintenance Window?
Maintenance Windows are a feature of AWS Systems Manager if you hadn't gathered already. So browse to Systems Manager in the AWS Console, and on the left hand side find “Maintenance Windows” under the “Change Management” section.
Click “Create Maintenance Window”
Fill out the information needed, see my entries below
Name — auto_stop_2100 (for me it makes sense to have the time in to easily identify the window)
Description — will stop EC2 Instances at 2100
Unregistered Targets — Untick this
Schedule — Select CRON/Rate expression
CRON/Rate expression — cron(00 21 ? * MON-FRI *)
Duration — 3
Stop initiating tasks — 1
Schedule timezone — set this to your time zone (this is especially important if you are looking to turn off resources overnight when no one is using them; if you leave it blank the schedule is evaluated in UTC).
Leave the other options blank, as they are optional.
This is obviously set for my specific time zone and requirements, so adjust the CRON expression and time zone to suit yours.
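For reference, the console entries above map onto a single API request. Here is a hedged sketch of it in Python; the time zone `Europe/London` is purely an assumption for illustration (use your own IANA zone), and `ssm_client` is assumed to be `boto3.client("ssm")`.

```python
def build_window_request(timezone="Europe/London"):
    """Mirror the console entries above as create_maintenance_window arguments.

    The default timezone is an illustrative assumption, not a recommendation.
    """
    return {
        "Name": "auto_stop_2100",
        "Description": "will stop EC2 Instances at 2100",
        # Fields: minutes hours day-of-month month day-of-week year
        "Schedule": "cron(00 21 ? * MON-FRI *)",
        "ScheduleTimezone": timezone,
        "Duration": 3,  # hours the window stays open
        "Cutoff": 1,    # stop initiating tasks this many hours before the end
        "AllowUnassociatedTargets": False,  # the unticked "Unregistered Targets" box
    }


def create_window(ssm_client, timezone="Europe/London"):
    """Create the window and return its ID."""
    response = ssm_client.create_maintenance_window(**build_window_request(timezone))
    return response["WindowId"]
```

Keeping the request in its own function makes it easy to review the schedule and time zone before anything is created.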
Registering Targets
Now we have created our Maintenance Window we need to register some targets for it to run against.
So click on your maintenance window and click the “Targets” tab
Now click “Register Target”
Give your target a name, then select the radio button “Choose a resource group”.
In the drop down select the resource group we created earlier.
Under “Resource types” select “AWS::EC2::Instance”. This is optional, though I added it for completeness.
Then click “Register target”
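The target registration can be sketched the same way. The group name `auto-start-stop` and target name `auto_stop_targets` are hypothetical placeholders, `window_id` is whatever ID your Maintenance Window was given, and `ssm_client` is assumed to be `boto3.client("ssm")`.

```python
def build_target_request(window_id, group_name="auto-start-stop"):
    """Arguments for registering a resource group as a Maintenance Window target."""
    return {
        "WindowId": window_id,
        "Name": "auto_stop_targets",  # hypothetical target name
        "ResourceType": "RESOURCE_GROUP",
        "Targets": [
            {"Key": "resource-groups:Name", "Values": [group_name]},
            # Optional, added for completeness as in the console steps.
            {
                "Key": "resource-groups:ResourceTypeFilters",
                "Values": ["AWS::EC2::Instance"],
            },
        ],
    }


def register_target(ssm_client, window_id, group_name="auto-start-stop"):
    """Register the target and return its WindowTargetId."""
    response = ssm_client.register_target_with_maintenance_window(
        **build_target_request(window_id, group_name)
    )
    return response["WindowTargetId"]
```

The returned target ID is what a task later uses to say which targets it runs against.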
Creating an Automation Task
The final thing we need to do is create an Automation task that will stop the instances.
So click the “Tasks” tab
Then select the “Register tasks” drop down and choose “Register Automation task”.
Give your task a name
The Automation Document section is where you select the Automation you wish to run, and here is where you see just how powerful Systems Manager is. The document we want is AWS-StopEC2Instance, with Document version set to “Latest version at runtime”.
Under the targets section we select the targets we want this automation to run against.
Select the tick box next to the target group we created earlier
The next section is the Input Parameters for the Automation. You will notice that the instance ID is a required field. Now you might think you need to enter the instance IDs of every instance you want the automation to run against, and if you like huge admin overhead then this would be the way to go. I am not ashamed to admit that in my first few plays with this, that's exactly what I did. However, AWS has a great way round this called a pseudo parameter, so instead of entering all the instance IDs we put the parameter
{{RESOURCE_ID}}
in this box. This reads the instance ID of each target and passes it to the SSM Automation process.
The next section is rate control, which is how much you want done at once.
This is personal preference really; I have set mine to Percentage, with Concurrency at 80% and an Error Threshold of 40%.
The final thing we need to set is the IAM Service Role. This should be a role that has permission to run Systems Manager Automation and read the Resource Groups. For this demo I am using the same role I assigned to my EC2 instances; outside of a Dev environment that wouldn't be best practice and it would be a separate role (I am being lazy here).
And that's it, click “Register Automation task”.
We now have a Maintenance Window set to run Monday to Friday at 21:00 that will stop EC2 instances with the tag “auto_start_stop:yes”.
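As a rough scripted equivalent of the task registration, the sketch below builds the request with the `{{RESOURCE_ID}}` pseudo parameter and the 80% / 40% rate control. The task name `auto_stop_task` is a placeholder of mine, I am assuming `$LATEST` is the API counterpart of “Latest version at runtime”, and `ssm_client` is assumed to be `boto3.client("ssm")`.

```python
def build_task_request(window_id, window_target_id, service_role_arn):
    """Arguments for registering the AWS-StopEC2Instance Automation task."""
    return {
        "WindowId": window_id,
        "Name": "auto_stop_task",  # hypothetical task name
        "TaskType": "AUTOMATION",
        "TaskArn": "AWS-StopEC2Instance",
        "ServiceRoleArn": service_role_arn,
        # Run against the target registered on this window.
        "Targets": [{"Key": "WindowTargetIds", "Values": [window_target_id]}],
        "TaskInvocationParameters": {
            "Automation": {
                # Assumed equivalent of "Latest version at runtime".
                "DocumentVersion": "$LATEST",
                # The pseudo parameter is resolved per target at run time,
                # so no instance IDs are hard-coded here.
                "Parameters": {"InstanceId": ["{{RESOURCE_ID}}"]},
            }
        },
        "MaxConcurrency": "80%",
        "MaxErrors": "40%",
    }


def register_task(ssm_client, window_id, window_target_id, service_role_arn):
    """Register the task and return its WindowTaskId."""
    response = ssm_client.register_task_with_maintenance_window(
        **build_task_request(window_id, window_target_id, service_role_arn)
    )
    return response["WindowTaskId"]
```

The service role ARN is passed in rather than hard-coded, which makes it easy to swap the lazy demo role for a dedicated one.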
Over to you
So as Morpheus said
I’m trying to free your mind, Neo. But I can only show you the door. You’re the one that has to walk through it.
I have shown you how to create a Maintenance Window that auto stops EC2 instances; now it's your turn to create the opposite, one that auto starts the EC2 instances…
Good luck, and I hope you found this useful. This is just one way of doing this and there are many others, but remember your customer may not grant you full access to all services and features in AWS, so you need to be flexible.
*UPDATE: should you wish to skip playing yourself, the link here will take you to my repo, where there is Terraform code to deploy what you need to run this solution.