DEV Community

jordan gonzález for AWS Community Builders

Reduce AWS Lambda Cold Starts in .NET

Performance is a key concern for engineers, as it directly impacts spending, user experience, scalability, and reliability. Initialization time falls within this spectrum when working with any Serverless environment.

When executed for the first time, or after a long break, a Serverless workload requires provisioning resources. This is what we call a cold start. Initialization duration, the time spent initializing code and runtime, is part of the cold start.

Here’s a great developer guide on how AWS defines cold starts for AWS Lambda.


A few months ago, I was tasked with reducing Datadog’s .NET tracing overhead. The first thing that came to my mind was what my colleague Rey Abolofia did for Python in:

So I thought: since C# has to be compiled before .NET can run it, there had to be a way to reduce the amount of work done at runtime. After reading more about how .NET compilation works, I set out to compile our tracer ahead of time.

As a result, I achieved a 25% performance improvement in cold starts. Let me explain how I accomplished this.

.NET Compilation

Understanding how the .NET framework works is crucial, because it allows you to improve how your code is delivered. Many engineers overlook this aspect, mainly because they prioritize shipping code, but investing time in understanding it pays dividends, as there are optimizations you can miss without this knowledge.

Default Compilation

.NET applications are compiled into a language-agnostic Common Intermediate Language (CIL). Compiled code is stored in assemblies: files with a .dll or .exe file extension.

During runtime, the Common Language Runtime (CLR) is in charge of taking the assemblies and using a Just-In-Time (JIT) compiler to turn the Intermediate Language code into native code for the local machine to run. [1]

.NET Compilation explained

So, even though .NET applications are required to be compiled, there’s another compilation step during runtime – which requires compute power, and which subsequently translates into execution time.

There are two main techniques for improving cold starts, and both rest on the same idea: Ahead-Of-Time (AOT) compilation. One is ReadyToRun, and the other is Native AOT.


ReadyToRun

ReadyToRun (R2R) is a form of ahead-of-time (AOT) compilation. The binaries produced improve the startup performance by reducing the amount of work that the JIT compiler needs to do as our application loads. [2]

The main disadvantage is that R2R binaries are much larger because they contain both IL code and the native version of the same code.

Native AOT

Native AOT compilation produces an app that has been ahead-of-time compiled into native code for a specific architecture. Therefore, these applications will not use the JIT compiler during runtime. Not only will they have a faster startup time, but also a smaller memory footprint.

Another great advantage is that these binaries do not require the local machine to have the .NET runtime installed at all. One limitation, however, is that you cannot cross-compile. [3]


Choosing a Compilation Strategy

The easy pick would be to compile with Native AOT all the time, right? After all, it requires neither the .NET runtime nor a JIT compiler. Unfortunately, there are scenarios in which you simply cannot use it. [4]

For example, if your app loads assemblies dynamically through Assembly.LoadFile, or generates code at runtime with System.Reflection.Emit, compiling to Native AOT will produce warnings during the build and the app will behave unexpectedly.

In my specific task, I couldn’t take advantage of Native AOT compilation because the Datadog .NET tracer uses dynamic loading and reflection. Given the number of changes needed to make this work, I had to settle for R2R until we update the tracer.

How to?

R2R

To enable ReadyToRun compilation, simply add the following property in your .csproj:

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <!-- ...other properties -->
    <PublishReadyToRun>true</PublishReadyToRun>
  </PropertyGroup>
</Project>
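ReadyToRun code is compiled for a specific platform, so publishing also needs a runtime identifier. A minimal sketch of a complete project file could look like the following, assuming a .NET 8 Lambda function on Arm64. The `net8.0` target framework and the `linux-arm64` identifier are assumptions for illustration, not values taken from the tracer's build:

```xml
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <!-- Assumed target framework; use the one your Lambda runtime supports -->
    <TargetFramework>net8.0</TargetFramework>
    <!-- Assumed runtime identifier; use linux-x64 for x86_64 functions -->
    <RuntimeIdentifier>linux-arm64</RuntimeIdentifier>
    <!-- Precompile IL to native code at publish time -->
    <PublishReadyToRun>true</PublishReadyToRun>
  </PropertyGroup>
</Project>
```

Note that the property takes effect when you run `dotnet publish`, not on a regular build.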

Native AOT

For Native AOT compilation, set the following property, also in your .csproj:

<PublishAot>true</PublishAot>

To check that your code is Native AOT compatible, you can also set this property in the same file. It enables the build-time analyzers that warn about trimming and AOT-incompatible code:

<IsAotCompatible>true</IsAotCompatible>
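Put together for AWS Lambda, a Native AOT project file might look like the sketch below. It assumes a .NET 8 function, and the extra properties (`OutputType`, `AssemblyName`, `StripSymbols`, `TrimMode`) are modeled on AWS's Native AOT project template rather than taken from this post, so verify them against the AWS developer guide before relying on them:

```xml
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <!-- Native AOT functions are published as an executable, not a library -->
    <OutputType>Exe</OutputType>
    <!-- Assumed target framework -->
    <TargetFramework>net8.0</TargetFramework>
    <!-- Assumption: Lambda expects the executable to be named "bootstrap" -->
    <AssemblyName>bootstrap</AssemblyName>
    <!-- Compile to native code at publish time -->
    <PublishAot>true</PublishAot>
    <!-- Assumed size optimizations: drop debug symbols, trim only opted-in assemblies -->
    <StripSymbols>true</StripSymbols>
    <TrimMode>partial</TrimMode>
  </PropertyGroup>
</Project>
```

Because of the cross-compilation limitation mentioned earlier, this has to be published from an environment that matches the function's target platform.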

Benchmarks

To get data on cold starts, the methodology I used is simple: force a new AWS Lambda sandbox at a regular interval, and emit telemetry using an observability tool.

If you want a quick project to get started and try it out for yourself, go to my example repository, which uses the AWS CDK to benchmark a Hello World app with these strategies.

Hello World

For a simple AWS Lambda function that serializes an API Gateway HTTP payload and returns it, R2R brings almost no benefit, saving only about 10 ms. Compiling to Native AOT, however, yields an improvement of 75%, saving around 400 ms.

Graphic comparing .NET compilation methods for Arm64 where Native AOT is considerably faster than R2R and the default.

Due to lack of cross-compilation, I couldn't show the data for x86_64 for this test.

Graphic showing .NET compilations with R2R and default only for x86_64 architectures

Datadog Tracer

For an immense codebase like the Datadog .NET tracer, publishing a release with ReadyToRun enabled noticeably improved performance: as mentioned earlier, a 25% cut in initialization time.

Graphic showing .NET compilations with R2R and default only for the Datadog Tracer

This code is publicly available. Feel free to check it out in DataDog/dd-trace-dotnet#5962.

[build] Build tracer with ReadyToRun #5962

Summary of changes

Allows tracer publishing to be compiled with ReadyToRun to improve Serverless workloads init duration.

Reason for change

It has showcased a 500ms init duration improvement for AWS Lambda. Potentially could be used for other workloads in the future.

Implementation details

Followed #4573 and ReadyToRun docs.

Test coverage

  • TBD
  • Tested manually in AWS Lambda.

Other details

Increases tracer size by 3x.

Summary

In general, understanding how the compiler works will open a lot of doors for you to become a better engineer, and give you the foundational knowledge to think of ways to improve your applications’ performance.

The clear benefit of applying this in Serverless workloads is that your applications will be able to serve faster, and save money at the same time.

For more improvements, like stripping and trimming, I'd recommend deep diving into the referenced content and the AWS developer guide to compile .NET into Native AOT.


🇲🇽 This post is also available in Spanish in my personal blog

Reduce Cold Starts en .NET para AWS Lambda | Jordan González – Blog

Learn how to drastically improve the performance of your .NET applications during a cold start in AWS Lambda, or any other serverless environment.

jordangonzalez.dev

References

Thanks to Lucas Pimentel, who explained to me that this was possible.

[1] Microsoft. (2024). What is .NET Framework: Architecture of .NET Framework. Microsoft Learn.

[2] davidwrighton, gewarren, & Miskelly. (2022, June). ReadyToRun Compilation. Microsoft Learn.

[3] LakshanF et al. (2024, October 15). Native AOT Deployment. Microsoft Learn.

[4] stevewhims & mattwojo. (2022, October). .NET Native and compilation. Microsoft Learn.
