Tommaso Stocchi
Serverless or AKS in an OpenAI solution: let's go against the tide

Working as a Cloud Solution Architect, one of the most common scenarios I encounter is the migration of serverless solutions into the enterprise. Usually, a serverless architecture is built to support a PoC or pilot project, and is then migrated to enterprise-level environments such as Azure App Service or Azure Kubernetes Service.
Does it make sense to consider a reverse path? As always, when it comes to architectures, the answer depends on the scenario we are considering.

Retrieval Augmented Generation

Let's consider an architecture based on OpenAI as a scenario, specifically an implementation of Retrieval Augmented Generation. By Retrieval Augmented Generation (or RAG) we refer to an architecture that allows us to implement a "chat with your data" project. We use a Large Language Model (LLM) to generate responses based on our specific data.
The steps to create this architecture are simple: we first upload our data (simple PDF files, for example), divide it into smaller text portions, and for each portion compute a numerical vector (embedding) and save it in a vector database. This ensures that, starting from the question asked by the user, we can run a vector search, find the indexed text portion that contains the answer, and provide that portion to the LLM, in our case ChatGPT, together with the question itself, so that the model can give us the answer.
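The ingestion steps described above can be sketched in a few lines. This is a minimal, illustrative Python sketch, not the repository's actual code: `embed` stands in for a call to the Azure OpenAI embeddings API, the chunking stands in for Azure Form Recognizer's paragraph extraction, and a plain list stands in for Azure Search.

```python
import hashlib

def split_into_chunks(text: str, max_chars: int = 500) -> list[str]:
    # Naive chunking: split on blank lines, then cap each chunk's length.
    # In the real pipeline, Azure Form Recognizer extracts the paragraphs.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [p[:max_chars] for p in paragraphs]

def embed(chunk: str) -> list[float]:
    # Placeholder for an Azure OpenAI embeddings call:
    # here we derive a fake deterministic vector from a hash.
    digest = hashlib.sha256(chunk.encode()).digest()
    return [b / 255 for b in digest[:8]]

# Stands in for the vector index in Azure Search.
vector_store: list[tuple[list[float], str]] = []

def ingest(document: str) -> None:
    # Chunk the document, embed each chunk, and save vector + text.
    for chunk in split_into_chunks(document):
        vector_store.append((embed(chunk), chunk))

ingest("First paragraph about identity cards.\n\nSecond paragraph about passports.")
print(len(vector_store))  # 2
```

At question time, the same `embed` function is applied to the user's question, and the closest stored vector identifies the paragraph to hand to the LLM.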
Semantic search is important in this case because we need the user interaction to be as "human-like" as possible. We need to retrieve one or more paragraphs that may contain an answer to a question, so a simple word-by-word check of the paragraphs would not be sufficient. For example, my paragraph might say, "to obtain an identity card at the Como municipality, you need to apply at the police station, bringing with you [...]", and the question might be asked as, "How to renew an identity document." The difference is minimal, but a word-by-word search would not give me a positive result. Vector search, on the other hand, is based on the semantic distance between words: "identity card" and "identity document" are very close together.
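The "semantic distance" that makes this work is typically measured as cosine similarity between embedding vectors. The sketch below uses tiny hand-made 3-dimensional vectors for illustration; real embeddings from Azure OpenAI have far more dimensions (1536 for `text-embedding-ada-002`), and the toy numbers are assumptions chosen only to show the ranking effect.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Ranks paragraphs by angular closeness of embeddings:
    # 1.0 means identical direction, values near 0 mean unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings:
card = [0.9, 0.1, 0.2]        # "identity card"
document = [0.85, 0.15, 0.25] # "identity document"
passport = [0.1, 0.9, 0.3]    # "passport office hours"

# "identity card" is semantically closer to "identity document"
# than to "passport office hours", even with zero shared words.
print(cosine_similarity(card, document) > cosine_similarity(card, passport))  # True
```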

On Azure, we have different options to implement the various components of this architecture. In our case, we will use Azure Form Recognizer for dividing the PDF into paragraphs, Azure Search for saving the paragraphs and performing vector searches, Azure OpenAI for computing paragraph vectors and formulating responses, and finally Azure Blob Storage for storing our files. As for code hosting, if we want to build an architecture capable of scaling when multiple files are uploaded simultaneously, we will have four microservices on Azure Kubernetes Service:

  • Frontend: a simple Blazor application that allows you to upload PDFs and ask questions
  • Document Processing: API that uses Form Recognizer to divide the uploaded PDF into its paragraphs
  • Knowledge Processing: API that takes a paragraph and calculates its vector to be saved in memory
  • Search Knowledge: API that receives a user's question, calculates its embedding and searches the vector database, and then invokes Azure OpenAI to generate the answer
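The Search Knowledge flow can be sketched as follows. This is a hypothetical skeleton, not the repository's code: `search_top_paragraph` stands in for the embedding-plus-vector-search step against Azure Search, and `ask_llm` stands in for the Azure OpenAI chat completion call.

```python
def search_knowledge(question: str, search_top_paragraph, ask_llm) -> str:
    # 1. Embed the question and retrieve the closest indexed paragraph.
    context = search_top_paragraph(question)
    # 2. Pass paragraph + question to the LLM so it answers from our data.
    prompt = (f"Answer using only this context:\n{context}\n\n"
              f"Question: {question}")
    return ask_llm(prompt)

# Stubs standing in for Azure Search and Azure OpenAI:
answer = search_knowledge(
    "How do I renew an identity document?",
    search_top_paragraph=lambda q: "Apply at the police station with ...",
    ask_llm=lambda prompt: "You apply at the police station.",
)
print(answer)  # You apply at the police station.
```

Grounding the prompt in the retrieved paragraph is what keeps the model's answer tied to our documents rather than its general training data.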

Document Processing and Knowledge Processing are the two microservices that will need to scale up to cope with multiple files uploaded together and the resulting number of paragraphs to save. To achieve this, the two services are not directly invoked through API calls, but are subscribed to a queue service, in our case Azure Service Bus.
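The decoupling pattern looks like this in miniature. The sketch below simulates the queue with Python's standard library purely to illustrate the shape of the pattern; in the real solution, the messages travel through Azure Service Bus and are delivered to the services by their Dapr sidecars.

```python
import queue

# Local stand-in for an Azure Service Bus queue.
bus: queue.Queue = queue.Queue()

def enqueue_upload(blob_name: str) -> None:
    # The frontend publishes a message instead of calling the API directly,
    # so processing can scale independently of the upload rate.
    bus.put({"blob": blob_name})

def document_processing_worker() -> list[str]:
    # Drain pending messages; each one triggers paragraph extraction
    # (Form Recognizer in the real service).
    processed = []
    while not bus.empty():
        message = bus.get()
        processed.append(message["blob"])
        bus.task_done()
    return processed

enqueue_upload("contract.pdf")
enqueue_upload("handbook.pdf")
print(document_processing_worker())  # ['contract.pdf', 'handbook.pdf']
```

Because consumers pull from the queue at their own pace, a burst of uploads simply lengthens the queue instead of overwhelming the API, and more worker replicas can be added to drain it faster.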

Architectural design of a RAG infrastructure. The application runs on AKS leveraging Form Recognizer, Service Bus, Azure Blob Storage, Azure Search, and Azure OpenAI.

Why do we use AKS in this architecture? There could be several reasons behind this choice: corporate culture, the need for high scalability, the possibility of customizing the cluster, or simply the assurance of being able to migrate from one installation to another with ease.

Communication among microservices, subscription to Service Bus, and reading secrets from Azure Key Vault are entrusted to Dapr. Authorization for the various resources involved is instead managed, where possible, through Azure Identity, leveraging user-assigned managed identities.

The repository of reference for this architecture is available here.

How much does it cost me?

So now we have our active solution and we can chat with our data. But how much does it cost to keep it active?

Screenshot of the web application, where a question is asked and the answer is generated based on the documents saved in Blob Storage, shown on the right.

Assuming a cluster consisting of three B2als v2 virtual machines (2 vCPUs, 4 GB RAM), the monthly cost would be around $94.

After the initial upload of our so-called knowledge base (the set of files we want to query once we release the solution), as we move forward in time we will only need to upload a few files per month. Regardless of the reasons that prompted us to adopt AKS as a starting point, it might now make sense to migrate to a serverless model.

Azure Container Apps to the rescue!

This is where Azure Container Apps (ACA) comes into play. It is an environment that is fully managed by Azure: no node, infrastructure, or networking management is required. In addition to the significant advantage of having an infrastructure that is fully managed by the cloud provider, ACA also offers a consumption-based billing plan, i.e. serverless. To make a comparison, while the cost plan for AKS is similar to that of a Virtual Machine Scale Set, ACA's plan is similar to that of Azure Functions on the Consumption plan. Instances are not paid for the time they remain active, but for the number of requests and the execution time used to process each request.

Assuming 1 million monthly requests, with 20 concurrent requests handled by each Container App and a total execution time of 2 seconds per request, the monthly cost for a Container App (4 vCPUs, 8 GB RAM) would be around $6.60.
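The shape of that consumption calculation can be made concrete. The rates below are placeholders, not current Azure prices (always check the official pricing page); what matters is the structure of the formula: active vCPU-seconds, active memory GiB-seconds, and request count, with concurrency dividing the active time across requests.

```python
# Illustrative placeholder rates -- NOT real Azure prices.
PRICE_PER_VCPU_SECOND = 0.000024   # assumed, per active vCPU-second
PRICE_PER_GIB_SECOND = 0.000003    # assumed, per active GiB-second
PRICE_PER_MILLION_REQUESTS = 0.40  # assumed

def monthly_cost(requests: int, seconds_per_request: float,
                 concurrency: int, vcpus: float, gib: float) -> float:
    # With N concurrent requests per replica, active time is shared,
    # so total active seconds shrink by the concurrency factor.
    active_seconds = requests * seconds_per_request / concurrency
    return (active_seconds * vcpus * PRICE_PER_VCPU_SECOND
            + active_seconds * gib * PRICE_PER_GIB_SECOND
            + requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS)

# 1M requests/month, 2 s each, 20 concurrent, 4 vCPUs / 8 GiB per replica:
print(round(monthly_cost(1_000_000, 2.0, 20, 4, 8), 2))
```

The key contrast with the AKS estimate above: when requests drop to near zero, this cost drops to near zero too, while the cluster's VM cost stays fixed.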

For details on the billing plan of Azure Container Apps please refer to this link.

How do Azure Container Apps work from a resource perspective? The first thing to do is to create a Container Apps Environment. This resource has two main functions: managing resource allocation (and therefore billing) for our apps, and configuring DNS, networking, Dapr components, and certificates. You can think of the Container Apps Environment as the App Service Plan for Web Apps. Every Web App must have an App Service Plan that manages scaling based on available resources, and each App Service Plan can have one or more Web Apps to allocate those resources to. The more the base resources assigned to this plan grow, the higher the cost. However, the two resources differ in one way: while an App Service Plan has a fixed cost that depends on how long it remains on and the chosen tier, a Container Apps Environment has a cost that depends on the chosen tier multiplied by the number of requests processed per month.
To make a comparison with AKS, we can see the Container Apps Environment as a Kubernetes namespace and the individual Azure Container Apps as the deployments.

In a scenario like the one described, where after the initial intensive use we expect to have a low number of monthly requests for our product, the savings generated by a migration choice of this kind are significant.

What are the operational implications of this migration? If we think about moving a REST API from a Web App to Azure Functions, we know we will have to rewrite part of the code. When it comes to migrating a workload from AKS to ACA, on the other hand, none of this is required: we just need to create an Azure Container App and point it to the Docker image that needs to be hosted.

Dapr is natively supported in Azure Container Apps, and the same user-assigned identities that were already used by AKS can also be assigned to the Container Apps.

Screenshot of the Dapr components configuration page in Azure Container Apps Environment, showing two components: keyvault and servicebus, with the details of the service bus. The details show how servicebus uses the connection string retrieved via the keyvault component and is assigned to four Container Apps.

Screenshot of the Identity configuration page of an Azure Container App. The details show the assignment of a user-assigned identity.

To present one of our Apps externally, we just need to check the box in the Ingress section.

Screenshot of the Ingress configuration page of an Azure Container App. The Ingress option is marked as enabled.

Conclusions

We have seen how to move what was effectively a Kubernetes namespace onto a completely serverless resource, greatly reducing costs, and how this direction of migration can make sense. However, we can also go further and ask ourselves whether, in certain scenarios, it makes sense to start with one hosting environment rather than another. The motivations that lead us to choose one hosting mode over another are various and complex, but when we talk about microservices we are often driven by habit to immediately, and only, talk about Kubernetes. Yet Kubernetes often requires investing in training for both developers and operations teams. Considering the many serverless solutions on the market can bring great advantages: it is worth knowing them and taking them into consideration.
