If you use EC2, you’re probably already aware of Spot instances. Spot has existed for years but, like many AWS services, has changed dramatically over time. It’s Amazon’s way of keeping its servers busy by selling spare computing capacity at a 50% to 90% discount - then reclaiming that capacity when On-Demand customers need it again.
While the computing environment for Spot has always been the exact same as On-Demand, Amazon has been rolling out more and more integrations of Spot within other AWS services. Here are five ways Amazon has made it easier for your organization to take advantage of the cost savings of Spot instances:
1. EC2 Auto Scaling and EC2 Fleet feature a new allocation strategy that can automatically mix in available Spot instances.
First, you configure the pools of Spot instances capable of running your workloads, then select the “lowest-price” allocation strategy. This launches Spot capacity only from the lowest-priced pools among the ones you chose. EC2 Fleet also offers a “diversified” strategy that spreads Spot instances evenly across your chosen pools.
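As a rough illustration, here is a minimal sketch of the request body for an Auto Scaling group that runs entirely on Spot using the lowest-price strategy, built for boto3’s `create_auto_scaling_group`. The launch template name, subnet IDs, group name, and instance types are hypothetical placeholders; the policy is built as a plain dictionary so it can be inspected without AWS credentials, and boto3 is only imported when you actually submit it.

```python
"""Sketch: an Auto Scaling group that fills capacity from the cheapest Spot pools.

Hypothetical values: launch template "web-template", group "web-asg",
subnets "subnet-aaaa"/"subnet-bbbb", and the instance types passed in.
"""


def build_mixed_instances_policy(instance_types, spot_pools=2):
    """Return a MixedInstancesPolicy dict for EC2 Auto Scaling.

    Each instance type in each Availability Zone is a separate Spot pool;
    "lowest-price" spreads Spot capacity across the `spot_pools` cheapest ones.
    """
    if not instance_types:
        raise ValueError("need at least one instance type")
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": "web-template",  # hypothetical template
                "Version": "$Latest",
            },
            # Every override is an instance type your workload can run on.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            "OnDemandBaseCapacity": 0,                 # no guaranteed On-Demand floor
            "OnDemandPercentageAboveBaseCapacity": 0,  # everything above the base is Spot
            "SpotAllocationStrategy": "lowest-price",
            "SpotInstancePools": spot_pools,           # only used with lowest-price
        },
    }


def create_group(policy):
    """Submit the group to AWS (requires boto3 and credentials)."""
    import boto3  # lazy import: building the policy needs no AWS SDK

    client = boto3.client("autoscaling")
    return client.create_auto_scaling_group(
        AutoScalingGroupName="web-asg",               # hypothetical name
        MinSize=0,
        MaxSize=10,
        DesiredCapacity=4,
        VPCZoneIdentifier="subnet-aaaa,subnet-bbbb",  # hypothetical subnet IDs
        MixedInstancesPolicy=policy,
    )


if __name__ == "__main__":
    policy = build_mixed_instances_policy(["m5.large", "m5a.large", "m4.large"])
    print(policy["InstancesDistribution"])
```

Listing several similar instance types gives the strategy more pools to choose from, which is what lets it keep finding cheap Spot capacity as individual pools get reclaimed.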
2. Batch workloads have always been a natural fit for Spot instances, and AWS Batch now makes the pairing even easier with built-in job retries.
If your job gets interrupted, Batch can automatically resubmit it onto a different Spot or On-Demand instance. With cheaper compute from Spot, organizations can dramatically increase their processing power and finish more jobs, more quickly. To learn more, listen to the AWS Podcast episode on using Spot instances with Batch.
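Here is a minimal sketch of a job definition for boto3’s `register_job_definition` whose retry strategy resubmits a job only when its Spot host disappears, while ordinary failures fail fast. The job name, container image, and command are hypothetical; the `"Host EC2*"` status-reason pattern follows the pattern AWS documents for instance-termination retries, but treat it as something to verify against your own job failures. Whether a retried job lands on Spot or On-Demand capacity depends on the compute environments attached to its job queue.

```python
"""Sketch: an AWS Batch job definition that retries only on lost Spot hosts."""


def build_job_definition(name, image, attempts=3):
    """Return kwargs for batch.register_job_definition (hypothetical values)."""
    if not 1 <= attempts <= 10:
        raise ValueError("AWS Batch allows between 1 and 10 attempts")
    return {
        "jobDefinitionName": name,
        "type": "container",
        "containerProperties": {
            "image": image,
            "command": ["python", "process.py"],  # hypothetical entry point
            "resourceRequirements": [
                {"type": "VCPU", "value": "2"},
                {"type": "MEMORY", "value": "4096"},  # MiB
            ],
        },
        "retryStrategy": {
            "attempts": attempts,
            # Rules are checked in order; the first match decides.
            "evaluateOnExit": [
                # A reclaimed host shows up as a status reason starting "Host EC2".
                {"onStatusReason": "Host EC2*", "action": "RETRY"},
                # Any other failure (e.g. a bug in the job) exits without retrying.
                {"onReason": "*", "action": "EXIT"},
            ],
        },
    }


def register(definition):
    """Register the definition with AWS (requires boto3 and credentials)."""
    import boto3  # lazy import: building the request needs no AWS SDK

    return boto3.client("batch").register_job_definition(**definition)
```

Retrying only on host interruptions keeps a genuinely broken job from burning through extra Spot capacity.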
3. Machine learning demands ever-growing amounts of compute, which can be cost-prohibitive for many businesses. To help, Amazon recently added Managed Spot Training to SageMaker.
Now SageMaker can save training checkpoints to Amazon S3 and, if the Spot instances are interrupted, resume the job from the latest checkpoint once capacity becomes available again. The AWS Blog recently summarized how one AI company saved 70% on ML training runs by taking advantage of Managed Spot Training.
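The sketch below shows roughly what enabling Managed Spot Training looks like in a request for boto3’s `create_training_job`. The job name, image URI, role ARN, bucket, and instance type are hypothetical. Note the assumption on your side: your training script has to write checkpoints to the local checkpoint path and reload them at startup; SageMaker syncs that directory with the S3 location, so a restarted job can pick up where it left off.

```python
"""Sketch: a SageMaker training job request that uses Managed Spot Training."""


def build_spot_training_job(job_name, image_uri, role_arn, bucket,
                            max_run_s=3600, max_wait_s=7200):
    """Return kwargs for sagemaker.create_training_job (hypothetical values)."""
    # The wait limit covers training time plus time spent waiting for Spot
    # capacity, so it can never be shorter than the training-time limit.
    if max_wait_s < max_run_s:
        raise ValueError("MaxWaitTimeInSeconds must be >= MaxRuntimeInSeconds")
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "OutputDataConfig": {"S3OutputPath": f"s3://{bucket}/output/"},
        "ResourceConfig": {
            "InstanceType": "ml.p3.2xlarge",  # hypothetical choice
            "InstanceCount": 1,
            "VolumeSizeInGB": 50,
        },
        "EnableManagedSpotTraining": True,
        "StoppingCondition": {
            "MaxRuntimeInSeconds": max_run_s,
            "MaxWaitTimeInSeconds": max_wait_s,
        },
        "CheckpointConfig": {
            "S3Uri": f"s3://{bucket}/checkpoints/{job_name}/",
            # Your script saves checkpoints here and reloads them on restart.
            "LocalPath": "/opt/ml/checkpoints",
        },
    }


def submit(request):
    """Start the training job (requires boto3 and credentials)."""
    import boto3  # lazy import: building the request needs no AWS SDK

    return boto3.client("sagemaker").create_training_job(**request)
```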
4. At re:Invent 2019, Amazon announced Fargate Spot for Amazon Elastic Container Service (ECS).
Fargate manages the provisioning and scaling of the resources needed to run your containers - and now it can do so on Spot capacity for up to 70% off. Some workloads just aren’t fault-tolerant, but if yours can restart after an interruption, Fargate Spot can significantly lower your AWS bill.
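One common way to use Fargate Spot is to keep a small base of tasks on regular Fargate and send most of the rest to Fargate Spot. The sketch below builds a request for boto3’s `ecs.create_service` with a capacity provider strategy; the cluster, service, task definition, and subnet names are hypothetical, and it assumes the cluster already has the `FARGATE` and `FARGATE_SPOT` capacity providers associated (for example via `put_cluster_capacity_providers`).

```python
"""Sketch: an ECS service that mixes regular Fargate with Fargate Spot."""


def build_service_request(cluster, service, task_def, subnets,
                          desired=9, on_demand_base=1, spot_weight=3):
    """Return kwargs for ecs.create_service (hypothetical names)."""
    return {
        "cluster": cluster,
        "serviceName": service,
        "taskDefinition": task_def,
        "desiredCount": desired,
        # No launchType here: it cannot be combined with a capacity provider strategy.
        "capacityProviderStrategy": [
            # `base` tasks always run on regular Fargate, so some capacity survives
            # even if every Spot task is reclaimed at once.
            {"capacityProvider": "FARGATE", "base": on_demand_base, "weight": 1},
            # Beyond the base, tasks are split by weight (1:spot_weight).
            {"capacityProvider": "FARGATE_SPOT", "weight": spot_weight},
        ],
        # Fargate tasks use awsvpc networking.
        "networkConfiguration": {
            "awsvpcConfiguration": {"subnets": subnets, "assignPublicIp": "DISABLED"}
        },
    }


def create(request):
    """Create the service (requires boto3 and credentials)."""
    import boto3  # lazy import: building the request needs no AWS SDK

    return boto3.client("ecs").create_service(**request)
```

With the defaults (9 tasks, a base of 1, weights 1:3), ECS should place roughly 1 + 2 = 3 tasks on regular Fargate and 6 on Fargate Spot.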
5. Amazon has made several changes over the last couple years to make it easier to estimate the costs and service interruptions of Spot:
• AWS recently announced that launch events for Spot instances are now available through Amazon CloudWatch Events. This lets customers automatically trigger specific workflows when a Spot instance becomes available.
• Amazon no longer relies on a complicated bidding process to determine Spot pricing. Now your organization either gets spare capacity at the current Spot price or doesn’t get it at all. Spot prices adjust gradually based on long-term supply and demand, which has made them significantly more stable. Instead of bidding against other AWS customers, you can optionally tell AWS the maximum price your organization is willing to pay (by default, the On-Demand price).
• Engineers can easily research the typical savings and interruption frequency of every instance type in each Region using the Spot Instance Advisor. For most instance types, the customer - not AWS - is the one terminating Spot instances more than 95% of the time. For other instance types, however, AWS interrupts more than 20% of the time, which can make them unsuitable for certain workloads.
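To make the simplified pricing model concrete, here is a minimal sketch of a Spot launch request for boto3’s `ec2.run_instances`. There is no bid: you ask for Spot capacity and, optionally, cap the hourly price with `MaxPrice` (a string, in USD per instance-hour). Leaving it out means you pay the current Spot price up to the On-Demand price. The AMI ID and price in the usage below are hypothetical.

```python
"""Sketch: launching a Spot instance with an optional maximum price."""


def build_spot_request(image_id, instance_type, max_price=None):
    """Return kwargs for ec2.run_instances (hypothetical AMI and price)."""
    spot_options = {
        "SpotInstanceType": "one-time",              # don't relaunch after interruption
        "InstanceInterruptionBehavior": "terminate",
    }
    if max_price is not None:
        # Optional cap, as a string; omitted means "up to the On-Demand price".
        spot_options["MaxPrice"] = max_price
    return {
        "ImageId": image_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "InstanceMarketOptions": {"MarketType": "spot", "SpotOptions": spot_options},
    }


def launch(request):
    """Launch the instance (requires boto3 and credentials)."""
    import boto3  # lazy import: building the request needs no AWS SDK

    return boto3.client("ec2").run_instances(**request)


if __name__ == "__main__":
    capped = build_spot_request("ami-0123456789abcdef0", "m5.large", max_price="0.05")
    print(capped["InstanceMarketOptions"])
```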
Spot instances may not be the best option for all workloads or business scenarios. But when you need to scale up your computing power at an affordable price, retooling your architecture to rely on Spot could be worth it. Looking to learn more about Spot instances? Here are a few great resources:
• Western Digital used AWS Batch with 40,000 Spot instances to complete 2.5 million HPC tasks in just 8 hours.
• Amiram Shachar’s Software Engineering Daily episode on SpotInst.