Serverless computing has revolutionized the way developers build and deploy applications, offering significant benefits such as reduced operational complexity, automatic scaling, and a pay-as-you-go pricing model.
However, while serverless architectures can help you save on costs, they are not free: managing their costs effectively requires careful planning and optimization.
This article explores three key techniques for serverless cost optimization, helping you improve your serverless applications and avoid unnecessary expenses.
Serverless computing: an introduction for developers
Before presenting the strategies for serverless cost optimization, let's briefly introduce what serverless computing is and why you may need it.
Introducing serverless computing
Serverless computing is a cloud-native development model that allows developers to build and run applications without managing the infrastructure. In a serverless setup, cloud service providers automatically allocate and manage servers to execute code in response to events, such as HTTP requests, database changes, or message queue activities. This allows developers to focus only on writing and deploying code, rather than worrying about server provisioning, scaling, and maintenance.
Serverless architecture is particularly appealing for several reasons, including the following:
- Payment model. Serverless offers a true pay-as-you-go model, where you only pay for the compute time you consume. This can lead to significant cost savings, especially for applications with variable or unpredictable workloads. A typical use case today involves large AI models, such as deep neural networks or Large Language Models, which need GPUs to be trained. To save on GPU costs, serverless lets you effectively pay as you train the models, rather than buying or reserving expensive hardware.
- Automatic scaling. Serverless provides automatic scaling, which means your application can handle variations in load seamlessly without manual intervention. When demand spikes, the serverless platform automatically scales out; when demand drops, it scales back, ensuring optimal resource usage. This also helps save on costs, since you pay as you use the service, without needing to buy expensive hardware or pay a fixed monthly fee to a cloud service.
Comparing serverless to other technologies
Serverless can be compared to other methodologies such as traditional server-based models (virtual machines or dedicated servers), Platform as a Service (PaaS), and containerization:
- Traditional server-based models. This approach requires developers to manage the entire stack, from the physical or virtual server to the application code. This includes tasks like OS updates, patching, and capacity planning, which can be time-consuming and error-prone.
- PaaS. These solutions simplify some of the tasks required by traditional server-based models by providing a managed environment for application deployment, but developers still need to handle aspects like scaling and environment configuration.
- Containerization. Containers, often implemented with technologies like Docker and Kubernetes, offer another layer of abstraction by packaging applications and their dependencies into containers. This approach provides greater flexibility and scalability than traditional servers and PaaS. However, managing container orchestration, scaling, and networking can still be complex and resource-intensive.
Serverless, on the other hand, abstracts all infrastructure management tasks, allowing developers to deploy individual functions that execute in response to specific triggers. This model reduces operational issues, speeds up development cycles, and improves application resilience by leveraging the cloud provider’s infrastructure. It also integrates seamlessly with other cloud services, enabling the creation of highly scalable, event-driven applications with minimal effort.
So, serverless solutions should be preferred over the other approaches mentioned above in cases of:
- Variable or unpredictable workloads. Serverless is ideal for applications with workloads that vary significantly or are difficult to predict, thanks to its automatic scaling feature.
- Event-driven applications. Applications that are inherently event-driven, such as those responding to HTTP requests, processing files in object storage, reacting to database changes, or performing stream processing, are well suited to serverless. The event-driven nature of serverless platforms allows functions to execute in response to specific triggers, making such applications efficient and straightforward to build.
- Rapid development and deployment situations. When speed to market is crucial, serverless can accelerate development cycles. By eliminating the need to manage infrastructure, developers can focus solely on writing and deploying code. This can be particularly beneficial for startups or projects requiring rapid iteration and deployment.
Even though serverless computing can save you time and money compared to the other methodologies described, it still comes with costs of its own. So, let's continue this article with three strategies for serverless cost optimization.
Serverless cost optimization strategy 1: optimizing function execution time
One of the most direct ways to reduce serverless costs is minimizing function execution time, which refers to the duration from when a serverless function starts executing until it finishes.
Serverless providers such as AWS Lambda, Azure Functions, and Google Cloud Functions charge based on the time it takes for a function to execute. Billing is typically calculated in milliseconds and, combined with the memory allocated to the function, determines the overall cost.
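To see how execution time translates into money, consider a rough back-of-the-envelope estimate. The rates below are illustrative, based on AWS Lambda's published x86 pricing at the time of writing; check your provider's current price list before relying on them:

```python
# Rough AWS Lambda cost estimate (illustrative rates; verify current pricing).
PRICE_PER_GB_SECOND = 0.0000166667    # USD per GB-second
PRICE_PER_REQUEST = 0.20 / 1_000_000  # USD per invocation

memory_gb = 512 / 1024      # 512 MB allocated
avg_duration_s = 0.200      # 200 ms average execution time
invocations = 10_000_000    # 10 million invocations per month

compute_cost = memory_gb * avg_duration_s * invocations * PRICE_PER_GB_SECOND
request_cost = invocations * PRICE_PER_REQUEST
print(f"Estimated monthly cost: ${compute_cost + request_cost:.2f}")  # ~ $18.67
```

Halving the average duration to 100 ms roughly halves the compute portion of the bill, which is why the practices below focus on execution time.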
Here are some best practices for optimizing function execution time:
- Write efficient code. Ensure that your code is optimized for performance by avoiding unnecessary computations and using efficient algorithms. For example, prefer in-memory operations over database queries where possible.
- Asynchronous processing. Use asynchronous processing to handle tasks that can be performed in parallel or do not require immediate completion. This can reduce the time your functions spend waiting, thus lowering execution time and costs. For instance, background tasks such as sending emails or processing logs can typically be handled asynchronously (a minimal sketch follows the code example below).
- Memory allocation. Choose the appropriate memory allocation for your functions. Allocating more memory can sometimes speed up execution due to higher CPU availability, but over-allocating memory leads to higher costs. Use monitoring tools to analyze your functions' performance and adjust memory settings accordingly.
As a simple example, here is how Python code can be made more efficient to reduce memory usage:
Inefficient:
def sum_of_squares_inefficient(numbers):
    # Use list comprehension inside a sum function
    return sum([x * x for x in numbers])

# Example usage
numbers = list(range(1, 10001))
result = sum_of_squares_inefficient(numbers)
print(result)
Efficient:
def sum_of_squares_efficient(numbers):
    # Use generator expression inside a sum function
    return sum(x * x for x in numbers)

# Example usage
numbers = list(range(1, 10001))
result = sum_of_squares_efficient(numbers)
print(result)
The inefficient version uses a list comprehension inside the `sum()` function. This creates an intermediate list in memory, which can be memory-intensive and slow, especially for large lists.
The efficient version uses a generator expression inside the `sum()` function. This avoids creating an intermediate list by yielding elements one at a time. This approach is more memory-efficient and faster for large datasets, leading to a reduction in execution time.
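The asynchronous-processing practice mentioned above can be sketched in a similar way. Below is a minimal illustration using Python's `asyncio`; the `send_email` and `write_log` coroutines are hypothetical placeholders for I/O-bound background work:

```python
import asyncio

async def send_email(recipient):
    await asyncio.sleep(0.1)   # stands in for a network call to an email service

async def write_log(entry):
    await asyncio.sleep(0.05)  # stands in for a call to a logging/storage backend

async def handle_request(recipient, entry):
    # Run both background tasks concurrently instead of sequentially,
    # so the function spends less billable time waiting on I/O.
    await asyncio.gather(send_email(recipient), write_log(entry))

asyncio.run(handle_request("user@example.com", "request processed"))
```

Because the two waits overlap, the function's billable duration approaches the longer of the two tasks instead of their sum.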
Serverless cost optimization strategy 2: implementing auto-scaling and scheduled scaling
Auto-scaling is a fundamental feature provided by serverless platforms that automatically adjusts the number of function instances based on demand. However, without proper configuration, auto-scaling can lead to cost overruns.
So, here are some best practices for implementing auto-scaling in serverless while saving on costs:
- Demand-based auto-scaling. Set up auto-scaling policies that align with your application's usage patterns. Configure thresholds for scaling up and down based on metrics such as CPU usage, memory usage, or custom application metrics. This ensures that you only use the resources you need, when you need them. Note that the best-known serverless providers all support auto-scaling: for example, AWS Lambda integrates with AWS Auto Scaling, Azure Functions relies on Azure Monitor, and Google Cloud Functions can be configured for auto-scaling with the help of Google Cloud Monitoring. Also, consider concurrency autoscaling, i.e., the automatic adjustment of the number of concurrent executions or instances of a serverless function based on current demand. This helps ensure that the function can handle incoming requests efficiently without being overwhelmed, while also controlling costs by scaling down when demand is low. Besides the providers' native support, you can also find dedicated packages for your serverless projects in registries such as npm.
- Scheduled scaling. For applications with predictable traffic patterns, scheduled scaling can be highly effective. By scheduling scaling events to match peak and off-peak times, you can ensure that your application has sufficient resources during high-demand periods while saving costs during low-demand periods. For example, if you know that your application experiences high traffic during business hours, you can schedule additional instances to be available during those times. A minimal sketch of scheduled scaling on AWS follows this list.
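Here is a sketch of how scheduled scaling could look on AWS with boto3. It reuses the hypothetical `my_lambda_function` from the implementation example below, assumes its provisioned concurrency has already been registered as a scalable target (also shown below), and uses illustrative cron expressions and capacity values:

```python
import boto3

appscaling = boto3.client('application-autoscaling')

# Scale provisioned concurrency up before business hours (times are in UTC).
appscaling.put_scheduled_action(
    ServiceNamespace='lambda',
    ScheduledActionName='scale-up-business-hours',
    ResourceId='function:my_lambda_function',  # AWS expects an alias/version qualifier in practice
    ScalableDimension='lambda:function:ProvisionedConcurrency',
    Schedule='cron(0 8 ? * MON-FRI *)',
    ScalableTargetAction={'MinCapacity': 5, 'MaxCapacity': 20}
)

# Scale back down in the evening to avoid paying for idle capacity overnight.
appscaling.put_scheduled_action(
    ServiceNamespace='lambda',
    ScheduledActionName='scale-down-evening',
    ResourceId='function:my_lambda_function',
    ScalableDimension='lambda:function:ProvisionedConcurrency',
    Schedule='cron(0 19 ? * MON-FRI *)',
    ScalableTargetAction={'MinCapacity': 1, 'MaxCapacity': 2}
)
```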
Implementation example
NOTE: All the code described in this section is available in this repository.
Let's take AWS as an example to illustrate a simple implementation: we want to deploy a Lambda function on AWS using Semaphore CI.
But before that, you need to install the Python package boto3 - if you haven't already - by typing:
pip install boto3
Now, let's create an example that sets up a demand-based auto-scaling solution by dynamically adjusting the resources allocated to your serverless functions, based on real-time usage metrics, in Python (see `/function/function.py` in the linked repository).
First of all, define the metrics that reflect your application's performance and load, such as CPU usage or request latency:
import boto3

cloudwatch = boto3.client('cloudwatch')

response = cloudwatch.put_metric_data(
    Namespace='MyApp',
    MetricData=[
        {
            'MetricName': 'CPUUsage',
            'Dimensions': [
                {
                    'Name': 'FunctionName',
                    'Value': 'my_lambda_function'
                },
            ],
            'Value': 70.0,
            'Unit': 'Percent'
        },
    ]
)
Then, create alarms based on these metrics to trigger scaling actions:
response = cloudwatch.put_metric_alarm(
    AlarmName='HighCPUUsageAlarm',
    MetricName='CPUUsage',
    Namespace='MyApp',
    Statistic='Average',
    Period=300,
    EvaluationPeriods=1,
    Threshold=75.0,
    ComparisonOperator='GreaterThanThreshold',
    AlarmActions=[
        'arn:aws:autoscaling:us-west-2:123456789012:scalingPolicy:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx:autoScalingGroupName/my-asg:policyName/MyScalingPolicy'
    ]
)
Finally, link your Lambda function with Application Auto Scaling to adjust concurrency based on the CloudWatch alarms.
appscaling = boto3.client('application-autoscaling')

response = appscaling.register_scalable_target(
    ServiceNamespace='lambda',
    # AWS expects an alias or version qualifier here in practice,
    # e.g. 'function:my_lambda_function:prod'
    ResourceId='function:my_lambda_function',
    ScalableDimension='lambda:function:ProvisionedConcurrency',
    MinCapacity=1,
    MaxCapacity=10
)

response = appscaling.put_scaling_policy(
    PolicyName='MyScalingPolicy',
    ServiceNamespace='lambda',
    ResourceId='function:my_lambda_function',
    ScalableDimension='lambda:function:ProvisionedConcurrency',
    PolicyType='TargetTrackingScaling',
    TargetTrackingScalingPolicyConfiguration={
        # Target ~70% utilization; this metric is expressed as a
        # fraction between 0 and 1, not a percentage.
        'TargetValue': 0.7,
        'PredefinedMetricSpecification': {
            'PredefinedMetricType': 'LambdaProvisionedConcurrencyUtilization'
        },
        'ScaleOutCooldown': 60,
        'ScaleInCooldown': 60
    }
)
NOTE: For simplicity, we show the code inline here. In the repository, the same Python code is wrapped in a handler function, for obvious reasons (we are deploying it as a Lambda function on AWS...).
We created the function in Python. Now we need to write the code that executes the deployment:
#!/bin/bash

# Define variables
FUNCTION_NAME="my_lambda_function"
ZIP_FILE="function.zip"
HANDLER="function.lambda_handler"
ROLE_ARN="arn:aws:iam::123456789012:role/my-lambda-role"
RUNTIME="python3.8"
TIMEOUT=30

# Go to /function folder
cd function

# Install requirements and pack the Python function
pip install -r requirements.txt -t .
zip -r ../$ZIP_FILE .

# Go to main directory
cd ..

# Verify if lambda function already exists
aws lambda get-function --function-name $FUNCTION_NAME
if [ $? -eq 0 ]; then
    echo "Updating existing function..."
    aws lambda update-function-code \
        --function-name $FUNCTION_NAME \
        --zip-file fileb://$ZIP_FILE
else
    echo "Creating new function..."
    aws lambda create-function \
        --function-name $FUNCTION_NAME \
        --zip-file fileb://$ZIP_FILE \
        --handler $HANDLER \
        --runtime $RUNTIME \
        --role $ROLE_ARN \
        --timeout $TIMEOUT
fi

# Remove zip file after upload
rm $ZIP_FILE
The `deploy.sh` bash script does the following:
- Goes into the `function` directory and installs the Python dependencies listed in the `requirements.txt` file into the current directory.
- Creates a `.zip` file that contains the Python function and its dependencies.
- Verifies whether the Lambda function already exists:
  - If it exists, it updates the code with the new one contained in the `.zip` file.
  - If it does not exist, it creates a new Lambda function using the `.zip` file.
- Removes the `.zip` file after the deployment has finished.
Finally, the file `semaphore.yaml` defines the CI/CD pipeline:
version: v1.0
name: Initial Pipeline
agent:
  machine:
    type: e1-standard-2
    os_image: ubuntu2004
blocks:
  - name: Install Dependencies
    task:
      jobs:
        - name: Install AWS CLI
          commands:
            - sudo apt-get update
            - sudo apt-get install -y python3-pip
            - pip3 install awscli
            - pip3 install boto3
      prologue:
        commands:
          - checkout
  - name: Deploy to AWS
    task:
      # The secret name below is an assumption: create a Semaphore secret
      # holding the AWS keys described after this listing.
      secrets:
        - name: aws-credentials
      jobs:
        - name: aws_credentials
          commands:
            - chmod +x deploy.sh
            - ./deploy.sh
      prologue:
        commands:
          - checkout
The `semaphore.yaml` file does the following:
- In the initial part, it specifies a `version`, a `name` for the pipeline, a machine type (`machine`), and an OS image.
- The `Install Dependencies` block:
  - Downloads the latest version of the code with `checkout`.
  - Updates the Ubuntu packages.
  - Installs `pip` to manage Python packages.
  - Installs `awscli` and `boto3`.
- The `Deploy to AWS` block:
  - Defines the credentials for the deployment through `secrets`.
  - Deploys the Lambda function from the latest code (`checkout`, then `deploy.sh`).
Note that, to make the code work, you need to configure the `secrets` in Semaphore CI, including:
- `AWS_ACCESS_KEY_ID`: the ID of the access key used to authenticate with AWS.
- `AWS_SECRET_ACCESS_KEY`: the secret access key used to authenticate with AWS.
NOTE: To learn more about how to use `yaml` in Semaphore, read the documentation.
Serverless cost optimization strategy 3: monitoring and right-sizing resource usage
Continuous monitoring of serverless functions is another useful way to maintain cost efficiency in serverless solutions. By regularly reviewing performance metrics and resource usage, you can make informed decisions about resource allocation and configuration, and adjust accordingly.
Here are some best practices to implement as a reference:
- Use monitoring tools. Use the monitoring tools provided by your serverless platform, or third-party solutions, to track function performance, execution times, and resource usage. Tools like AWS CloudWatch, Azure Monitor, or Google Cloud Monitoring offer insights into how your functions are performing and where inefficiencies may lie.
- Analyze metrics. Analyze metrics regularly to identify patterns and anomalies. For example, look for functions with consistently high execution times or memory usage and investigate potential causes. This can help you pinpoint areas where optimizations are needed. Also, if you work in a CI/CD environment, consider using Semaphore, as it streamlines issue detection and addresses error-prone tasks and unpredictable tests that could cause sporadic build failures.
- Right-size resources. Based on your analyses, right-size your functions to ensure they have the appropriate resources. This might involve reducing memory allocation for functions that do not require it, or splitting larger functions into smaller, more efficient ones. Right-sizing helps avoid over-provisioning and ensures that you are not paying for unused resources. A minimal sketch of right-sizing on AWS follows this list.
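Here is a sketch of what right-sizing could look like on AWS with boto3, reusing the hypothetical `my_lambda_function` from the earlier examples; the 256 MB value is illustrative and should come from your own monitoring data:

```python
import boto3

lambda_client = boto3.client('lambda')

# Inspect the function's current configuration.
config = lambda_client.get_function_configuration(FunctionName='my_lambda_function')
print(f"Current memory: {config['MemorySize']} MB, timeout: {config['Timeout']} s")

# If monitoring shows the function never uses more than ~200 MB,
# reduce the allocation to stop paying for unused headroom.
lambda_client.update_function_configuration(
    FunctionName='my_lambda_function',
    MemorySize=256  # MB; base this on observed peak usage plus a safety margin
)
```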
Conclusions
In this article, we've shown that serverless cost optimization involves a combination of optimizing function execution time, implementing intelligent scaling strategies, and continuously monitoring and right-sizing resource usage.
By adopting these strategies, you can ensure that your serverless applications run efficiently and cost-effectively.