Kerem Nalbant for epilot

Posted on Jul 18, 2024 • Edited on Jul 19, 2024

How We Integrate AI in epilot - Chapter 1: AWS Bedrock & Prompt Engineering

#ai #bedrock #anthropic #promptengineering

Introduction

When we decided to bring AI to epilot, we had so many potential use cases that we first needed to do user research to identify the most repetitive and time-consuming tasks. After that, we had to find out which of these our customers needed help with the most.

Our research showed that users relied heavily on the messaging feature. When long email threads came into play, some customers were spending an average of 16 minutes replying to a single email, and we knew we could improve on that by giving them a shorter, clearer summary of the thread.

Enterprise AI Playbook

Work is a bundle of tasks, which are performed towards specific goals.

Tasks are the ‘atomic unit’ of any work done in the enterprise. Tasks may be performed as a human service, or may be performed by software, towards achieving a goal.

Our goal was to bring that time down to less than a minute. Users were performing two different tasks that we needed to perform with AI to reach that goal. This article addresses the task "Send emails" and the steps to complete it:

Read and understand the email thread: Help users understand long email threads faster by providing an AI-generated summary, next steps, and topics.
Write an answer: Provide AI-generated suggested answers.

Problem

Our customers often deal with long email threads, requiring a long time to read and answer. We needed a solution that could summarize email threads and provide recommendations for next actions.

Solution

Some problems can be solved with prompt engineering alone, while others call for Retrieval-Augmented Generation (RAG).

For generating summaries, we didn't need any contextual data other than the email thread itself to feed the prompt, so we decided to go ahead with prompt engineering alone.

The next task is suggesting AI-generated replies to email threads, where RAG would be really useful. For that, we will use a Vector DB and an embedding model, which I'll write about in the next chapter.

AWS Bedrock

We decided to use AWS Bedrock because at epilot we already use AWS in almost every area of our platform. It provides state-of-the-art LLMs from multiple providers, plus out-of-the-box solutions such as Knowledge Base, which makes RAG easy to set up, and Model Evaluation, which lets you compare models and prompts in a fancy UI.

AWS Bedrock also ensures the processed data is protected. One of our concerns was where the data would be stored, how it would be processed and whether 3rd parties would be involved.

By default, AWS Bedrock offers a zero-retention policy, ensuring that logs, prompts, LLM output and any personal data are not shared with any third parties or model providers. Bedrock also ensures that all processed data remains within the EU region.

GenAI Foundation

GenAI Foundation has a central SQS queue and a handler function. Together they provide exactly-once, concurrent, batch processing while staying within Bedrock's rate limits.

Integrating GenAI-related code and logic into our existing APIs was not a good idea, so I decided to create a new monorepo. Here are the reasons:
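As a rough illustration of the queue-and-handler pattern, here is a minimal sketch of a Lambda-style handler for one SQS batch. It assumes the event source mapping has partial batch responses (ReportBatchItemFailures) enabled. The names `handle_batch` and `summarize`, the in-memory `_processed` set, and the message shape are hypothetical and are not taken from the actual GenAI Foundation code.

```python
import json
from typing import Any, Callable

# Hypothetical in-memory record of message IDs that were already handled.
# A real service would need a durable store, because an SQS standard queue
# delivers at least once and the same message can arrive twice.
_processed: set[str] = set()


def handle_batch(event: dict[str, Any], summarize: Callable[[dict], None]) -> dict[str, Any]:
    """Process one SQS batch and report only the messages that failed."""
    failures = []
    for record in event.get("Records", []):
        message_id = record["messageId"]
        if message_id in _processed:
            continue  # duplicate delivery: skip so the work happens only once
        try:
            summarize(json.loads(record["body"]))
            _processed.add(message_id)
        except Exception:
            # For example a throttling error from the model API: report the
            # message as failed so SQS redelivers it later.
            failures.append({"itemIdentifier": message_id})
    return {"batchItemFailures": failures}
```

Returning only the failed IDs means one throttled request does not force the whole batch to be retried. In this sketch, concurrency would be capped on the event source mapping rather than in the handler itself.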

Separation of Concerns

By creating a separate repository, we maintain a clear separation of concerns. This ensures that the core functionalities of existing APIs remain clean and focused.

Language Flexibility

Since GenAI-related code requires Python, having a separate repository allows us to leverage Python's capabilities without interfering with the TypeScript-based APIs. This separation ensures that each project can use the best-suited language for its specific tasks.

Encapsulation

Encapsulating the GenAI logic within a dedicated repository makes it easier to manage, update, and scale. This also allows engineers with specific expertise in GenAI to work independently of the rest of the system.

Modularity

A modular approach allows for easier testing, maintenance, and deployment of the GenAI features. Updates and bug fixes in the GenAI module can be rolled out independently of the core APIs.

Model Choice & Prompt Engineering

We decided to go with Claude 3 Sonnet because it was the best option for us at the time, given the costs and the complexity of the task.

We are planning to switch to Claude 3.5 Sonnet once it's available in AWS Bedrock, which we're really excited about!

To choose the best model for the task, AWS Bedrock offers Model Evaluation, which lets you easily compare models with each other.
You can also do Prompt Evaluation by creating a dataset and executing it against a single model to compare the results of different prompts.

There are lots of great resources online for prompt engineering, but I want to mention that Anthropic provides really helpful tips in their documentation. I strongly suggest you take a look if you haven't already. For me, the most important point was using XML tags. Anthropic mentions that their models produce much better results when prompts are structured with XML tags.

Prompt engineering is all about experimenting, so we spent some time on optimizing our prompt. I'd love to share an example of prompts with you!

Below you can see some example usages of common prompt engineering techniques such as "Giving a Role to LLM", "Using XML Tags", "Using Examples (Multishot Prompting)", "Being clear and direct".

System Prompt

The system prompt is where we give Claude a role and some clear instructions about the task.

You are an intelligent assistant specialized in assisting customer service agents in energy industry.

Your task is summarizing email conversations between customer service agents and customers, providing a clear and concise overview of the key points by following the instructions provided.

Your summaries will help your colleagues quickly understand the main aspects of each conversation without having to read through the entire email thread.

Your goal is to ensure that the agent can grasp the key points and next steps from your summary alone, making their workflow more efficient and effective.

You are a third person observer and must not provide any personal opinions or make any assumptions about the conversation.

You must use the third-person objective narration. You must report the events that take place without knowing the motivations or thoughts of any of the characters.

User Prompt

Here is the conversation between the customer and the agent:

<Email Thread>
<Subject>
{subject}
</Subject>
<Messages>
{conversation}
</Messages>
</Email Thread>

<General Instructions>
- You must use <Email Thread> tags to identify the email conversation.
...
- You must use only the knowledge provided in the <Email Thread> tags and do not access any other external information or knowledge you already possess.
</General Instructions>

<Language Instructions>
- You must optimize the language for making the summary easy and fast for humans to read.
...
- You must use a professional, respectful and informative tone.
</Language Instructions>

<Reference Instructions>
- You must always refer to the Customer and Agent by their names.
...
</Reference Instructions>

<Summary Instructions>
- You must get relevant quotes to complete the task from the conversation.
...
- You must write the summary in reverse chronological order.
</Summary Instructions>

<Output Instructions>
- You must give your output in JSON format. The JSON object should be valid.
...
- If you will provide quotes or emphasize any part of the conversation, you must use single quotes. e.g. 'quote'.
</Output Instructions>

# Few Shot Prompting
<Example Outputs>
{
  "summary": [
    "John Doe has processed the reimbursement request and informed Jane Doe.",
    "Jane Doe has provided the requested information and is awaiting further instructions from the team member.",
    "John Doe has acknowledged the Jane Doe refund request and has requested additional information to process the refund.",
    "Jane Doe is requesting a refund for a defective PV inverter."
  ],
  "topics": [
    "Refund request for defective product"
  ],
  "next_steps": [
    "If the information is sufficient, John Doe should process the refund and inform Jane Doe of the completion.",
    "John Doe should document the refund process for record-keeping."
  ]
}
...
</Example Outputs>

You must follow the instructions listed in the following tags:
- <General Instructions> for general guide and rules,
- <Language Instructions> for lingual instructions that declares your tone, grammar, and output language,
...
- <Example Outputs> for examples of the output as a reference.
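One practical detail when filling a template like this: it mixes `{subject}`-style placeholders with the literal JSON braces of the example output, so Python's `str.format` would trip over the JSON. Below is a minimal sketch of one way around that. `render_prompt` and the shortened `TEMPLATE` are illustrative assumptions, not the production code.

```python
import re

# Shortened, illustrative version of the user prompt template.
TEMPLATE = """<Email Thread>
<Subject>
{subject}
</Subject>
<Messages>
{conversation}
</Messages>
</Email Thread>

<Example Outputs>
{
  "summary": ["..."]
}
</Example Outputs>"""

# Only these two names are treated as placeholders; all other braces are kept.
_PLACEHOLDER = re.compile(r"\{(subject|conversation)\}")


def render_prompt(template: str, subject: str, conversation: str) -> str:
    """Fill the placeholders in a single pass, leaving JSON braces untouched."""
    values = {"subject": subject, "conversation": conversation}
    return _PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


prompt = render_prompt(
    TEMPLATE,
    subject="Defective PV inverter",
    conversation="Jane Doe: I would like a refund.",
)
```

Because the substitution happens in one pass, an email that happens to contain the text `{conversation}` is inserted as-is and is not expanded a second time.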

Human-in-the-Loop (HITL)

It's challenging to ship AI features, especially when it's the first one in a company that has thousands of users. We need to build trust and keep it. To do that, we need to collect as much feedback as we can, then evaluate it and act on it.

HITL can be applied in different ways in different solutions. In our use case, we use it to evaluate feedback and detect hallucinations. The team evaluating the feedback can also correct the result generated by the AI.

Migration Strategy

Our migration strategy combines runtime and one-time migration to ensure a smooth transition and boost adoption among our users.
This way, we avoid calling LLMs unnecessarily and save costs while keeping the integration seamless.

Rate Limits

Since AWS Bedrock imposes rate limits, we had to pass them on to our users. To ensure fair usage, we set limits according to each user's pricing tier.
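To make the idea of tier-based limits concrete, here is a minimal sketch of a fixed-window counter keyed by tenant. The tier names, the quotas, the one-hour window, and the `TierRateLimiter` class are all hypothetical; the article does not describe the actual limits or how they are enforced.

```python
import time
from collections import defaultdict
from typing import Callable

# Hypothetical tiers and hourly quotas, for illustration only.
TIER_LIMITS = {"starter": 20, "professional": 100, "enterprise": 500}
WINDOW_SECONDS = 3600


class TierRateLimiter:
    """Fixed-window counter: each tenant gets a quota of requests per window."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        # (tenant_id, window index) -> requests used in that window
        self._used: dict[tuple[str, int], int] = defaultdict(int)

    def allow(self, tenant_id: str, tier: str) -> bool:
        """Return True and count the request if the tenant is under its quota."""
        window = int(self._clock() // WINDOW_SECONDS)
        key = (tenant_id, window)
        if self._used[key] >= TIER_LIMITS[tier]:
            return False
        self._used[key] += 1
        return True
```

The clock is injectable so the behaviour can be tested without waiting for a real window to pass. This sketch keeps counters in process memory and never evicts old windows, so a shared store with expiry would be needed across multiple handler instances.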

Code Examples

Pay attention to the way I prefill Claude's response to force it to answer with only a JSON object.

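import json

# This excerpt runs inside an async function. `client` is assumed to be an
# async bedrock-runtime client (for example one created with aioboto3), which
# is why invoke_model and the body read are awaited.
settings = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1000,
    "system": system_prompt,
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
            ],
        },
        # Prefill Claude's response so it continues a JSON object
        {"role": "assistant", "content": "{"},
    ],
    "temperature": temperature,
    "top_p": top_p,
    "top_k": top_k,
}
res = await client.invoke_model(
    modelId=MODEL_ID,
    contentType="application/json",
    accept="application/json",
    body=json.dumps(settings),
)

model_response = json.loads(await res["body"].read())
response_text = model_response["content"][0]["text"]

# The returned text continues after the prefill, so put the "{" back before validating
res_model = LLMResponseModel.model_validate_json("{" + response_text)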
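`LLMResponseModel` is not shown above; the `model_validate_json` call suggests a Pydantic v2 model. As a dependency-free stand-in, the sketch below assumes the three list fields from the example output (`summary`, `topics`, `next_steps`) and shows why the prefilled `{` has to be put back before parsing. `ThreadSummary` and `parse_prefilled` are hypothetical names.

```python
import json
from dataclasses import dataclass

PREFILL = "{"  # the assistant turn that was sent as a prefill


@dataclass
class ThreadSummary:
    summary: list[str]
    topics: list[str]
    next_steps: list[str]


def parse_prefilled(response_text: str) -> ThreadSummary:
    """Re-attach the prefill, parse the JSON, and check the expected shape."""
    data = json.loads(PREFILL + response_text)
    fields = {}
    for name in ("summary", "topics", "next_steps"):
        value = data.get(name)
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ValueError(f"'{name}' must be a list of strings")
        fields[name] = value
    return ThreadSummary(**fields)


# The model continues after the prefill, so its text starts without the brace.
continuation = (
    '"summary": ["Jane Doe is requesting a refund."], '
    '"topics": ["Refund"], "next_steps": []}'
)
result = parse_prefilled(continuation)
```

Validating the shape right after parsing keeps a truncated or malformed model answer from reaching the UI; a response cut off by `max_tokens` fails in `json.loads` instead of being displayed half-finished.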

Conclusion

By integrating AI into epilot, we have significantly enhanced the capabilities of our platform. This integration not only improves the efficiency of daily tasks, but also accelerates customer support. Furthermore, it's the first step in positioning epilot as a leader in the adoption of advanced AI technologies in the energy sector.

Top comments (2)

Hamid Tanhaei • Dec 21 '24

It was a useful and helpful case study and encouraged me to have a look at Bedrock.
As for rate limits, which are one of my challenges right now, switching the model and the provider with a load-balancing approach might be helpful.
I'm not sure how Bedrock handles rate limiting, but in OpenAI there are a couple of models with their own rate limits, which means you can switch models if that works in the given context.

Well done, you have made a good effort on it.

Kerem Nalbant • Jan 6

Thanks Hamid,

Yes, in some cases changing the model would help, but at the same time it would cause inconsistent LLM responses across calls. One thing we could do here is use different models per tenant to balance the load.

AWS launched cross-region inference endpoints for Amazon Bedrock, which double your rate limits. We are currently leveraging them, and you can have a look as well!
