Generative AI has been the hottest technology trend of the past year, from enterprises to startups. Almost every brand is incorporating GenAI and Large Language Models (LLMs) into their solutions.
However, an under-explored part of Generative AI is managing resiliency. It is easy to build on an API provided by an LLM vendor like OpenAI, but it is hard to keep your application running if that vendor suffers a service disruption.
In this blog, we will look at how you can create a resilient generative AI application that switches from Gemini Flash to GPT-4o using the open-source AI Gateway's fallback feature.
Before that...
What is a fallback?
In the context of APIs, a fallback strategy for high availability means configuring both active and standby endpoints behind a load balancer. When the active endpoint goes down, one of the configured standby endpoints takes over and continues to serve the incoming traffic.
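To make the idea concrete, here's a minimal sketch of that pattern in plain JavaScript (Node 18+ for the built-in fetch). The endpoint URLs are hypothetical placeholders; this only illustrates the concept, not how the gateway implements it.
// Conceptual sketch: try the active endpoint first, fall back to the standby one.
// Both URLs below are hypothetical placeholders.
async function callWithFallback(payload) {
  const endpoints = [
    "https://active.example.com/api",
    "https://standby.example.com/api",
  ];
  for (const url of endpoints) {
    try {
      const res = await fetch(url, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(payload),
      });
      if (res.ok) return await res.json();
    } catch (err) {
      console.warn(`${url} failed, trying the next endpoint...`);
    }
  }
  throw new Error("All endpoints failed");
}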
Why do we need fallbacks?
Simply put, fallbacks ensure application resiliency in disaster scenarios and aid in quick recovery.
Note: In many cases, some loss of incoming traffic (such as HTTP requests) during recovery is a common phenomenon.
Why fallbacks in LLMs?
In the context of Generative AI, having a fallback strategy is crucial for managing resiliency. The scenario is no different from traditional server resiliency: if the active LLM becomes unavailable, one of the configured secondary LLMs takes over and continues to serve incoming requests, maintaining an uninterrupted experience for users.
Challenges in creating fallbacks for LLMs
While the concept of fallbacks for LLMs looks very similar to managing server resiliency, in reality the growing ecosystem, multiple standards, and new levers that change model outputs make it harder to simply switch over and get similar output quality and experience.
Moreover, the custom logic and effort needed to add this functionality, in a constantly changing landscape of LLMs and LLM providers, is hard to justify for someone whose core business is not managing LLMs.
Using open-source AI Gateway to implement fallbacks
To demonstrate the fallback feature, we'll build a sample Node.js application that integrates Google's Gemini. We'll use the OpenAI SDK and Portkey's open-source AI Gateway to demonstrate the fallback to GPT.
If you are new to AI Gateway, you can refer to our previous post to learn about the features of the open-source AI Gateway.
Creating Node.js Project
To start our project, we need to set up a Node.js environment. So, let's create a Node project. The command below will initialize a new Node.js project.
npm init
Install Dependencies
Let's install the required dependencies of our project.
npm install express body-parser portkey-ai openai dotenv
This will install the following packages:
express: a popular web framework for Node.js
body-parser: middleware for parsing request bodies
portkey-ai: a package that lets us access multiple AI models through Portkey's gateway
openai: the OpenAI SDK, which we'll point at the Portkey gateway
dotenv: loads environment variables from a .env file
Setting Environment Variables
Next, we'll create a .env file to securely store sensitive information such as API credentials.
//.env
GEMINI_API_KEY=YOUR_API_KEY
PORT=3000
Get API Key
Before using Gemini, we need to set up API credentials from the Google Developers Console. For that, we need to sign in with our Google account and create an API key.
Once signed in, go to Google AI Studio. Click on the Create API key button. It will generate a unique API key that we'll use to authenticate requests to the Google Generative AI API.
After getting the API key, we'll update the .env file with it.
Create Express Server
Let's create an index.js file in the root directory and set up a basic Express server.
const express = require("express");
const dotenv = require("dotenv");
dotenv.config();
const app = express();
const port = process.env.PORT;
app.get("/", (req, res) => {
res.send("Hello World");
});
app.listen(port, () => {
console.log(`Server running on port ${port}`);
});
Here, we're using the dotenv package to access the PORT number from the .env file. At the top of the file, we load environment variables using dotenv.config() so they are accessible throughout the file.
Executing the project
In this step, we'll add a start script to the package.json
file to easily run our project.
So, add the following script to the package.json file.
"scripts": {
"start": "node index.js"
}
The package.json file should now look roughly like the example below (the package name and exact versions will vary on your machine):
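{
  "name": "gemini-fallback-demo",
  "version": "1.0.0",
  "main": "index.js",
  "scripts": {
    "start": "node index.js"
  },
  "dependencies": {
    "body-parser": "^1.20.2",
    "dotenv": "^16.0.0",
    "express": "^4.18.2",
    "openai": "^4.0.0",
    "portkey-ai": "^1.0.0"
  }
}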
Let's run the project using the following command:
npm run start
The above command will start the Express server. Now, if we go to http://localhost:3000, we'll see the "Hello World" response.
The project setup is now done. Next up, we'll add Gemini to our project.
Adding Google Gemini
Set up Route
To add Gemini to our project, we'll create a /generate route where we'll communicate with the Gemini AI. For that, add the following code to the index.js file.
const bodyParser = require("body-parser");
const { generateResponse } = require("./controllers/index.js");
//middleware to parse the body content to JSON
app.use(bodyParser.json());
app.post("/generate", generateResponse);
Here, we're using the body-parser middleware to parse the request body into JSON.
Configure OpenAI Client with Portkey Gateway
Let's create a controllers folder and create an index.js file within it. Here, we will write a new controller function to handle the /generate route declared in the code above.
First, we'll import the required packages and API keys that we'll be using.
Note: Portkey adheres to OpenAI API compatibility. Using Portkey AI further enables you to communicate with any LLM using our universal API feature.
// Using CommonJS requires to stay consistent with index.js
const OpenAI = require("openai");
const dotenv = require("dotenv");
const { createHeaders } = require("portkey-ai");
dotenv.config();
const GEMINIKEY = process.env.GEMINI_API_KEY;
Then, we'll instantiate our OpenAI client and pass the relevant provider details.
const gateway = new OpenAI({
apiKey: GEMINIKEY,
baseURL: "http://localhost:8787/v1",
defaultHeaders: createHeaders({
provider: "google",
})
})
Note: To integrate the Portkey gateway with OpenAI, we have:
Set the baseURL to the Portkey Gateway URL.
Included Portkey-specific headers such as provider.
Implement Controller Function
Now, we'll write a controller function generateResponse
to handle the generation route (/generate) and generate a response to User requests.
const generateResponse = async (req, res) => {
try {
const { prompt } = req.body;
const completion = await gateway.chat.completions.create({
messages: [{ role: "user", content: prompt}],
model: 'gemini-1.5-flash-latest',
});
const text = completion.choices[0].message.content;
res.send({ response: text });
} catch (err) {
console.error(err);
res.status(500).json({ message: "Internal server error" });
}
};
module.exports = { generateResponse };
Here, we take the prompt from the request body and generate a response based on it using the gateway.chat.completions.create method.
Run Gateway Locally
To run the gateway locally, run the following command in your terminal:
npx @portkey-ai/gateway
This will spin up the gateway locally; it runs at http://localhost:8787/.
Run the project
Now, let's check whether our app is working correctly. Run the project using:
npm run start
Validating Gemini's Response
Next, we'll make a POST request using Postman to validate our controller function. (A quick script-based alternative to Postman is shown at the end of this section.)
We'll send a POST request to http://localhost:3000/generate with the following JSON payload:
{
"prompt": "Are you an OpenAI model?"
}
And we get our response:
{
"response": "I am a large language model, trained by Google. \n"
}
Great! Our Gemini AI integration is working as expected!
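If you'd rather not use Postman, a small script with Node's built-in fetch (Node 18+) can send the same request. The file name send-request.js here is just a suggestion:
// send-request.js — assumes the Express server from this post is running on port 3000
fetch("http://localhost:3000/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ prompt: "Are you an OpenAI model?" }),
})
  .then((res) => res.json())
  .then((data) => console.log(data))
  .catch((err) => console.error(err));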
Adding Fallback using AI Gateway
So far, the project is working as expected. But what if Gemini's API doesn't respond?
As discussed earlier, a resilient app yields a better customer experience.
That's where Portkey's AI Gateway shines. It has a fallback feature that seamlessly switches between LLMs based on their performance or availability.
If the primary LLM fails to respond or encounters an error, the AI Gateway will automatically fall back to the next LLM in the list, ensuring our application's robustness and reliability.
Now, let's add the fallback feature to our project!
Create Portkey Configs
First, we'll create a Portkey configuration to define routing rules for all the requests coming to our gateway. We'll also need an OpenAI API key; here we assume it is stored in the .env file as OPENAI_API_KEY. Then add the following code:
const OpenAIKEY = process.env.OPENAI_API_KEY; // assumes OPENAI_API_KEY has been added to the .env file
const configObj = {
"strategy": {
"mode": "fallback"
},
"targets": [
{
"provider": "google",
"api_key": GEMINIKEY // Add your Gemini API Key
},
{
"provider": "openai",
"api_key": OpenAIKEY,
"override_params": {
"model": "gpt-4o"
}
}
]
}
This config will fall back to OpenAI's gpt-4o if Google's gemini-1.5-flash-latest fails.
Update OpenAI Client
To add the Portkey config to our OpenAI client, we'll simply pass the config object to the defaultHeaders object.
const gateway = new OpenAI({
apiKey: GEMINIKEY,
baseURL: "http://localhost:8787/v1",
defaultHeaders: createHeaders({
provider: "google",
config: configObj
})
})
Note: If we want to attach the configuration to only a few requests instead of modifying the client, we can send it in the request headers. For example:
let reqHeaders = createHeaders({ config: configObj });
openai.chat.completions.create(
  { messages: [{ role: "user", content: "Say this is a test" }], model: "gpt-3.5-turbo" },
  { headers: reqHeaders }
)
Also, if you have a default configuration set in the client but include a configuration in a specific request, the request-specific configuration will take precedence and replace the default config for that particular request.
That's it! Our setup is done.
Testing the Fallback
To see if our fallback feature is working, we'll remove the Gemini API key from the .env file. Then, we'll send a POST request to http://localhost:3000/generate with the following JSON payload:
{
"prompt": "Are you an OpenAI model?"
}
And we'll get this response:
{
"response": "Yes, I am powered by the OpenAI text generation model known as GPT-4o."
}
Awesome! This means our fallback feature is working perfectly!
Since we deleted the Gemini API key, the first request failed. Portkey automatically detected that and fell back to the next LLM in the list, OpenAI's gpt-4o.
Conclusion
In this article, we explored how to integrate Gemini into our Node.js application and how to leverage the AI Gateway's fallback feature when Gemini is not available.
If you want to know more about Portkey's AI Gateway, give us a star on GitHub and join our LLMs in Production Discord to hear more about what other AI engineers are building.
Happy Building!