Orquesta is a powerful LLM Ops suite designed to manage both public and private LLMs (Large Language Models) from a single source. It offers full transparency on performance and costs while reducing your release cycles from weeks to minutes. Integrating Orquesta into your current setup or a new workflow requires a couple of lines of code, ensuring seamless collaboration and transparency in prompt engineering and prompt management for your team.
With Orquesta, you gain access to several LLM Ops features, enabling your team to:
Collaborate directly across product, engineering, and domain expert teams.
Manage prompts for both public and private LLM models.
Customize and localize prompt variants based on your data model.
Push new versions directly to production and roll back instantly.
Obtain model-specific token and cost estimates.
Gain insights into model-specific costs, performance, and latency.
Gather both quantitative and qualitative end-user feedback.
Experiment in production and gather real-world feedback.
Make decisions grounded in real-world information.
This article guides you through integrating your SaaS with Orquesta and OpenAI using our Python SDK. By the end of the article, you'll know how to set up a prompt in Orquesta, perform prompt engineering, request a prompt variant using the SDK code generator, map the Orquesta response to OpenAI, send a payload to OpenAI, and report the response back to Orquesta for observability and monitoring.
Prerequisites
To follow along with this tutorial, you will need the following:
Jupyter Notebook (or any IDE of your choice).
An OpenAI account (you can sign up on the OpenAI website) and an OpenAI API key.
Orquesta Python SDK.
Integration
Follow these steps to integrate the Python SDK with OpenAI.
Step 1 - Install SDK and create a client instance
pip install orquesta-sdk
To create a client instance, you need your Orquesta API key, which you can find in your workspace at https://my.orquesta.dev/<workspace-name>/settings/developers.
Copy it and add the following code to your notebook to initialize the Orquesta client.
from orquesta_sdk import OrquestaClient, OrquestaClientOptions
api_key = "<ORQUESTA_API_KEY>"
options = OrquestaClientOptions(
    api_key=api_key,
    ttl=3600
)
client = OrquestaClient(options)
This snippet imports the OrquestaClient and OrquestaClientOptions classes from the orquesta_sdk module. The API key, which is used for authentication, is assigned to the variable api_key. You can either set the key directly as shown or read it from an environment variable: api_key = os.environ.get("ORQUESTA_API_KEY", "__API_KEY__") (this requires import os). An OrquestaClientOptions instance is then created and configured with the api_key and the ttl (time to live, in seconds) for the local cache; by default, it is 3600 seconds (1 hour).
Finally, an instance of the OrquestaClient class is created with the configured options object. This client instance can now interact with the Orquesta service, using the provided API key for authentication.
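If you prefer not to keep the key in your notebook, a small helper can read it from the environment and fail early when it is missing. The following is a minimal sketch: the helper name load_api_key is hypothetical and not part of the Orquesta SDK, and only the ORQUESTA_API_KEY variable name comes from the text above.

```python
import os


def load_api_key(env_var: str = "ORQUESTA_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Hypothetical helper (not part of orquesta_sdk): it raises early instead
    of silently falling back to a placeholder value.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return api_key
```

The returned string can then be passed as api_key to OrquestaClientOptions in place of the literal shown earlier.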
Step 2 - Set up a prompt and its variants
After successfully connecting to Orquesta, continue in the Orquesta Admin Panel to set up your prompt and its variants. A prompt is the specific task you give to an LLM; the response is the model's output for that task. To create a prompt, click Add Prompt and enter the prompt key.
The image above represents the Prompt Studio in Orquesta, where:
The name of the prompt variant.
Notes: this is where you leave notes for other collaborators.
Since we are working on a chat prompt, this is where you manage the System-User-Assistant messages.
Prompt variables provide flexibility in your prompts.
Prompt tokens and cost are estimated based on the model selected.
Model Selector.
Click Save once you are done.
Step 3 - Request a variant from Orquesta using the SDK
Our flexible configuration matrix allows you to define multiple prompt variants based on custom context. This lets you serve different prompts and hyperparameters depending on, for example, environment, country, locale, or user segment. The Code Snippet Generator makes it easy to request a prompt variant.
Once you open the Code Snippet Generator, you can use the generated snippet to consume your first prompt from your application.
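To make the idea of a configuration matrix more concrete, the sketch below shows one way context-based variant selection could work in principle. It is purely illustrative and is not Orquesta's actual resolution logic; the select_variant function, the rule format, and the variant names are all assumptions made up for this example.

```python
from typing import Dict, List, Optional

Context = Dict[str, List[str]]


def select_variant(variants: List[dict], context: Context) -> Optional[str]:
    """Return the name of the first variant whose rules all match the context.

    A rule matches when the context supplies at least one of the values the
    rule allows for that field. A variant with no rules matches everything.
    """
    for variant in variants:
        rules: Context = variant.get("rules", {})
        if all(set(allowed) & set(context.get(field, []))
               for field, allowed in rules.items()):
            return variant["name"]
    return None


# Hypothetical variants, ordered from most to least specific.
variants = [
    {"name": "nl-b2c", "rules": {"country": ["NLD"], "user-segment": ["b2c"]}},
    {"name": "english-default", "rules": {"locale": ["en"]}},
    {"name": "fallback", "rules": {}},
]

context = {
    "environments": ["test"],
    "country": ["BEL", "NLD"],
    "locale": ["en"],
    "user-segment": ["b2c"],
}
print(select_variant(variants, context))  # nl-b2c
```

Ordering the variants from most to least specific keeps the matching rule simple: the first match wins, and a rule-free variant at the end acts as the default.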
Step 4 - Map the Orquesta response to OpenAI using a Helper
Map the Orquesta response to OpenAI's API using the Helper functions. Each LLM provider has its own Helper function in Orquesta.
For OpenAI, use the helper function orquesta_openai_parameters_mapper or the class OrquestaOpenAIPromptParameters.
import os
import time
import openai
from orquesta_sdk import OrquestaClient, OrquestaClientOptions
from orquesta_sdk.helpers import orquesta_openai_parameters_mapper
from orquesta_sdk.prompts import OrquestaPromptMetrics
openai.api_key = "<OPENAI_API_KEY>"
Paste the code copied from the Code Snippet Generator here.
# Query the prompt from Orquesta
prompt = client.prompts.query(
    key="customer-support-chat",
    context={
        "environments": ["test"],
        "country": ["BEL", "NLD"],
        "locale": ["en"],
        "user-segment": ["b2c"]
    },
    variables={"customer_name": "John"},
    metadata={"user_id": 45515}
)
if prompt.has_error:
    print("There was an error while fetching the prompt")
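The check above only prints a message when the query fails, so later code would still try to use prompt.value. One possible way to stop early is sketched below. The helper name require_prompt_value is hypothetical, and a SimpleNamespace stands in for the SDK's result object; the sketch assumes only the has_error and value attributes used in this article.

```python
from types import SimpleNamespace


def require_prompt_value(prompt):
    """Return prompt.value, or raise if the query reported an error.

    Hypothetical helper: relies only on the has_error and value attributes.
    """
    if prompt.has_error:
        raise RuntimeError("There was an error while fetching the prompt")
    return prompt.value


# Stand-in for a successful query result (not a real SDK object).
ok_prompt = SimpleNamespace(
    has_error=False,
    value={"model": "example-model", "messages": []},
)
print(require_prompt_value(ok_prompt)["model"])  # example-model
```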
You can now send the payload to OpenAI and receive the response.
# Start time of the completion request
start_time = time.time()
print(f'Start time: {start_time}')
completion = openai.ChatCompletion.create(
    **orquesta_openai_parameters_mapper(prompt.value),
    model=prompt.value.get("model"),
    messages=prompt.value.get("messages"),
)
# End time of the completion request
end_time = time.time()
print(f'End time: {end_time}')
# Calculate the difference (latency) in milliseconds
latency = (end_time - start_time) * 1000
print(f'Latency is: {latency}')
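time.time() follows the system clock, which can be adjusted while a request is in flight. As an alternative sketch, latency can be measured with the monotonic time.perf_counter(). The timed_call helper below is a hypothetical name, and the lambda is only a stand-in for the OpenAI request.

```python
import time
from typing import Any, Callable, Tuple


def timed_call(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call func and return (result, latency in milliseconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    latency_ms = (time.perf_counter() - start) * 1000
    return result, latency_ms


# Stand-in for the completion request: sleep briefly and return a value.
result, latency = timed_call(lambda: time.sleep(0.05) or "done")
print(f"Latency is: {latency:.1f} ms")
```

Because the helper returns the result together with the latency, the same wrapper can be reused for any provider call whose duration you want to report.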
Step 5 - Report analytics back to Orquesta
After each query, Orquesta generates a log with a Trace ID. Using the add_metrics() method, you can attach additional information, such as the llm_response, metadata, latency, and economics.
# Report the metrics back to Orquesta
metrics = OrquestaPromptMetrics(
    economics={
        "total_tokens": completion.usage.get("total_tokens"),
        "completion_tokens": completion.usage.get("completion_tokens"),
        "prompt_tokens": completion.usage.get("prompt_tokens"),
    },
    llm_response=completion.choices[0].message.content,
    latency=latency,
    metadata={
        "finish_reason": completion.choices[0].finish_reason,
    },
)
prompt.add_metrics(metrics=metrics)
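The economics dictionary can also be built by a small helper, which keeps the reporting code short when several completions are logged. This is a sketch under the assumption that the usage data is available as a mapping with the three token fields shown above; build_economics is a hypothetical name, not an SDK function.

```python
from typing import Dict, Mapping

TOKEN_FIELDS = ("total_tokens", "completion_tokens", "prompt_tokens")


def build_economics(usage: Mapping[str, int]) -> Dict[str, int]:
    """Copy the token counts from a usage mapping, defaulting missing ones to 0."""
    return {field: int(usage.get(field) or 0) for field in TOKEN_FIELDS}


# Example usage with made-up token counts.
example_usage = {"prompt_tokens": 42, "completion_tokens": 18, "total_tokens": 60}
print(build_economics(example_usage))
```

The result can be passed as the economics argument of OrquestaPromptMetrics. Defaulting missing counts to 0 is a design choice of this sketch; raising an error instead may be preferable if incomplete usage data should never be logged.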
Conclusion
That's it: you have integrated Orquesta with OpenAI using the Python SDK. With Orquesta, you can design, test, and manage prompts for all your LLM providers using real-time logs, versioning, code snippets, and a playground for your prompts.
Orquesta also offers SDKs for Angular, Node.js, React, and TypeScript. Refer to our documentation for more information.
Full Code Example
import os
import time
import openai
from orquesta_sdk import OrquestaClient, OrquestaClientOptions
from orquesta_sdk.helpers import orquesta_openai_parameters_mapper
from orquesta_sdk.prompts import OrquestaPromptMetrics
openai.api_key = "<OPENAI_API_KEY>"
# Initialize Orquesta client
api_key = "<ORQUESTA_API_KEY>"
options = OrquestaClientOptions(
    api_key=api_key,
    ttl=3600
)
client = OrquestaClient(options)
# Query the prompt from Orquesta
prompt = client.prompts.query(
    key="customer-support-chat",
    context={
        "environments": ["test"],
        "country": ["BEL", "NLD"],
        "locale": ["en"],
        "user-segment": ["b2c"]
    },
    variables={
        "customer_name": "John"
    },
    metadata={"user_id": 45515}
)
if prompt.has_error:
    print("There was an error while fetching the prompt")
# Start time of the completion request
start_time = time.time()
print(f'Start time: {start_time}')
completion = openai.ChatCompletion.create(
    **orquesta_openai_parameters_mapper(prompt.value),
    model=prompt.value.get("model"),
    messages=prompt.value.get("messages"),
)
# End time of the completion request
end_time = time.time()
print(f'End time: {end_time}')
# Calculate the difference (latency) in milliseconds
latency = (end_time - start_time) * 1000
print(f'Latency is: {latency}')
# Report the metrics back to Orquesta
metrics = OrquestaPromptMetrics(
    economics={
        "total_tokens": completion.usage.get("total_tokens"),
        "completion_tokens": completion.usage.get("completion_tokens"),
        "prompt_tokens": completion.usage.get("prompt_tokens"),
    },
    llm_response=completion.choices[0].message.content,
    latency=latency,
    metadata={
        "finish_reason": completion.choices[0].finish_reason,
    },
)
prompt.add_metrics(metrics=metrics)