
Clément MARTINEZ for UpSlide

How and why did we improve our API hosting?

Background

UpSlide is a Microsoft 365 add-in that runs with PowerPoint, Excel and Word. At startup, the UpSlide add-in sends requests to our various endpoints to:

  • check the license;

  • perform auto-update;

  • retrieve global settings managed from our “Portal” website;

  • send usage statistics.

These endpoints are critical to ensure the add-in works correctly, which is why we need a stable and secure release process to guarantee their continuous availability.

To host these endpoints, we’ve been using Azure App Services for a long time, organized into three distinct environments:

  • Develop: to deploy and test the code during the sprint;

  • Testing: to run automated tests in an environment close to the production one (similar data, same performance);

  • Production: to handle the UpSlide add-in’s requests.

Historically, each environment was backed by an App Service deployment slot, in which we configured slot settings. These filled in the application settings so that each environment targeted the right resources (i.e., the right database, service, or warehouse). In practice, slot settings acted as our environment variables.

On release day, the production roll-out process involved:

  1. Deploying a new version on the "testing" slot
  2. Updating the “testing” slot settings
  3. Ensuring everything was working correctly
  4. Updating the “production” slot settings
  5. Swapping the "testing" and "production" slots to complete the deployment

Illustration of our release process

If the new version had any problems, we would restore the previous one.

This solution was functional, but there was still room for improvement.

Challenges with Azure App Services

First, there were still manual operations in the deployment process, which, in my opinion, is a common way of introducing bugs into a release. Updating deployment slot settings (i.e., environment variables) whenever application settings changed was one of the steps that needed to be automated.

Imagine Alex, a super-happy developer at UpSlide. On release day, Alex manually updates the App Service production slot settings, as application settings were added or modified during the sprint.

He is bound to encounter situations where:

  • he forgets to add or edit slot settings;
  • a value entered in slot settings contains a typo;
  • thinking he's doing the right thing, he “fixes” a setting, but the fix is wrong.

In the best-case scenario, these errors have a minor impact on the service. In the worst case, the service goes down.

But it's not over yet. Let's assume these slot settings changes have been carried out correctly: the new version works, but an hour later, we realize that a regression has been introduced. We expect to quickly fix the problem by returning to the previous version, just long enough to understand what's wrong. So Alex figures he can simply swap the "testing" and "production" slots again. But a new problem arises: the slot settings have changed! As these settings stick to slots during a swap operation, Alex has to urgently determine what changes have been made to revert them.

What should be a simple situation becomes a headache, and the stress and urgency can lead Alex to make other mistakes.

So far, our developers have been vigilant enough to avoid any major problems. But we didn't want to rely solely on their vigilance, so we had to prioritize improvements to the release process. We agreed on the following resolutions:

  • environment variable updates must not be done manually;
  • redeploying an older version of our endpoints must be easy and risk-free.

This brought us to our new approach: Container Apps!

A new hosting solution: Container Apps

Azure Container Apps provide a serverless environment for running microservices and containerized applications. They feature the concept of 'Revisions', which encapsulate both application code and environment variables, ensuring consistent and reliable deployments.

Let’s go back to the previous situation, this time with container apps. Alex deploys the latest version of our endpoints and the set of environment variables linked to that revision. An hour later, a regression is detected. Alex just needs to reactivate the previous version and redirect traffic. In a matter of seconds, and without the risk of making a mistake, the situation returns to normal, and Alex can focus on fixing the regression.
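
As a rough illustration, a rollback like this can be scripted against the Azure CLI. The sketch below uses hypothetical app, resource group, and revision names (not our actual setup) and assumes the revision to go back to is known:

import subprocess

APP = "my-api-app"                          # hypothetical Container App name
GROUP = "my-resource-group"                 # hypothetical resource group
PREVIOUS_REVISION = "my-api-app--v1-2-3"    # hypothetical revision to roll back to

# Make the previous revision active again.
subprocess.run(
    ["az", "containerapp", "revision", "activate",
     "--name", APP, "--resource-group", GROUP,
     "--revision", PREVIOUS_REVISION],
    check=True,
)

# Send 100% of the incoming traffic back to that revision.
subprocess.run(
    ["az", "containerapp", "ingress", "traffic", "set",
     "--name", APP, "--resource-group", GROUP,
     "--revision-weight", f"{PREVIOUS_REVISION}=100"],
    check=True,
)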

Automating Environment Variable Updates

One problem was fixed, but we still needed a way to automatically update environment variables.

Our application relies on settings provided by its running environment. Since settings can be added or removed over time, environment variables must be modified accordingly. The catch is that application settings change during development, while environment variables are only set on release day. These steps are usually separated by a few days, so we had to remember to replicate every change. What we really needed was a way to declare environment variable changes during development, so that all changes are prepared in advance and applied automatically during the release.

We easily achieved this by adding a file called production.env, in which we declared all environment variable values, either as hard-coded values or as references to existing key vault secrets. Here is an example of this file:

APPLICATIONINSIGHTS_CONNECTION_STRING=InstrumentationKey=xxxx-xxx-xxx-xxx-xxxx;IngestionEndpoint=https://applicationinsights.azure.com/;LiveEndpoint=https://monitor.azure.com/
DB_CONNECTION_STRING=secretref:dbconnectionstring
ASPNETCORE_ENVIRONMENT=Production

We then updated the release pipeline to extract values from production.env and use them while deploying a new revision (as the deploy command has a parameter to provide a set of environment variables).
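
Here is a minimal sketch of that pipeline step, written as a small Python helper; the app, resource group, and image names are hypothetical placeholders, and the actual pipeline may of course wire this differently:

import subprocess

APP = "my-api-app"                         # hypothetical Container App name
GROUP = "my-resource-group"                # hypothetical resource group
IMAGE = "myregistry.azurecr.io/api:1.2.3"  # hypothetical image built by the pipeline

# Read production.env into a list of KEY=VALUE strings,
# skipping blank lines and comments.
env_vars = []
with open("production.env") as env_file:
    for line in env_file:
        line = line.strip()
        if line and not line.startswith("#"):
            env_vars.append(line)

# Deploy a new revision with the image and its environment variables.
# Values prefixed with "secretref:" keep pointing to secrets already
# defined on the Container App (e.g., backed by the key vault).
subprocess.run(
    ["az", "containerapp", "update",
     "--name", APP, "--resource-group", GROUP,
     "--image", IMAGE,
     "--set-env-vars", *env_vars],
    check=True,
)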

We ended up with a single source of truth for environment variables that:

  • can be updated by developers during development whenever values need to be added or removed;
  • is reviewed by others before merging;
  • is automatically deployed during the release.

First Deployment

Implementing the plan wasn't without its challenges.

We set up the necessary infrastructure on Azure, including the Registry and the Container App, and modified our pipelines accordingly. However, upon our first deployment, we encountered an error explaining that Container Apps only support Linux-based container images. This was not really an issue, since .NET Core is cross-platform, so we simply built our API for Linux. However, another error occurred:

System.Security.Cryptography.CryptographicException: ASN1 corrupted data.
---> System.Formats.Asn1.AsnContentException: The provided data is tagged with 'Application' class value '13', but it should have been 'Universal' class value '16'.

It was triggered by System.Security.Cryptography.X509Certificates, a set of classes we were using to manage our certificates. After some research, we found that Windows and Linux interpret the provided certificate value differently.

This issue highlighted the fact that, even when frameworks and packages are cross-platform, we should still be aware of behavioral differences between platforms. We therefore decided to do without this package so that the deployed container could start up correctly; the benefits of using Container Apps justified getting rid of it.

Scaling Challenges

Once everything was set up, we redirected the traffic to our new Container Apps.

However, a new issue emerged: our endpoints experience peak loads at certain times of the day, challenging the automatic scaling capabilities of Container Apps.

Why those peaks?

Most of our users begin their workday by opening Microsoft 365, which makes UpSlide trigger license checks, auto-updates, and settings retrieval. This behavior results in daily request peaks.
As UpSlide is used in different parts of the world, these peaks occur at different times, as shown in this graph:

Graph representing the request count on our API over a day

We expected Container Apps to handle this load by scaling automatically, based on rules Azure lets us define. These rules can use predefined criteria, or custom criteria for more advanced scenarios. We initially used only a predefined criterion: the number of concurrent HTTP requests. When this threshold is exceeded, a new replica of the revision is created.
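
For reference, a rule like this can be configured through the Azure CLI. The following sketch uses hypothetical names and thresholds, not our production values:

import subprocess

# Scale on concurrent HTTP requests: add replicas (up to 10) when a replica
# handles more than 50 concurrent requests. Names and numbers are placeholders.
subprocess.run(
    ["az", "containerapp", "update",
     "--name", "my-api-app", "--resource-group", "my-resource-group",
     "--min-replicas", "1", "--max-replicas", "10",
     "--scale-rule-name", "http-load",
     "--scale-rule-type", "http",
     "--scale-rule-http-concurrency", "50"],
    check=True,
)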

However, starting a new replica takes about one minute, and Azure only checks whether scaling is needed at a fixed 30-second interval. So, by the time the Container App has started the new replicas, a large part of the peak load has already been absorbed by the existing ones, and the benefit of scaling is lost.

Illustration of the load compared to the capacity enhanced by the automatic scaling

Currently, we mitigate this by scaling the Container App out ahead of expected peak times. This is done with a custom cron rule that lets us specify the time slots during which we want to scale out and the number of replicas we want to have ready.
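
As a sketch of the idea, such a rule maps onto the KEDA cron scaler; the names, time zone, schedule, and replica count below are made-up placeholders rather than our real configuration:

import subprocess

# Pre-scale before an expected morning peak using the KEDA "cron" scaler.
# Time zone, schedule, and replica count are made-up placeholders.
subprocess.run(
    ["az", "containerapp", "update",
     "--name", "my-api-app", "--resource-group", "my-resource-group",
     "--scale-rule-name", "morning-peak",
     "--scale-rule-type", "cron",
     "--scale-rule-metadata",
     "timezone=Europe/Paris",
     "start=0 8 * * 1-5",
     "end=0 10 * * 1-5",
     "desiredReplicas=5"],
    check=True,
)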

Next Steps

Here we are: our Container App works well, but there are still limitations on the packages that can be used in the solution and on automatic scaling. Despite this, our release process has improved and is now more flexible and comfortable for developers.

Looking ahead, we aim to improve the startup speed of our solution to enable the Container App to have better automatic scaling.

Conclusion

Azure offers other hosting solutions as well, to be chosen according to your needs. We opted for Container Apps for their simplicity, ease of management, and seamless integration with our existing infrastructure. Abstracting away Kubernetes complexity allowed us to focus on application deployment and scalability without the overhead of managing intricate configurations.

As we continue to innovate and adapt, we remain open to exploring new solutions that can further enhance our deployment processes and overall efficiency.

What about you? What do you use to host your APIs?

Special thanks to Fabien SINQUIN and Clément MICHEL for their major contribution to this article 🙏
