AI is the future, and for software engineers it’s the hottest field to get into. Leveraging LLMs in your code enables you to build smarter applications that handle complex tasks like real-time sentiment analysis or interpreting user-generated content. Integrating LLMs makes your software more responsive and capable, enhancing user experiences and automation.
This post is an introduction to making LLM calls from Python so you can start adding these capabilities to your own code.
We’ll start off by making a chatbot for any character of your choosing. Then, you'll learn how to summarize smaller texts, and even move up to summarizing whole books. Lastly, you'll learn how to re-prompt and analyze results provided by the LLM.
Making our first LLM Request
For the LLM requests, we will be using Groq. If you create an account there, you can use their API and make LLM requests for free.
In order to use Python for these requests, install the Groq Python package by running pip install groq. Then, we'll import it in our code like so:
import os
from groq import Groq

client = Groq(
    api_key=os.environ.get("GROQ_API_KEY"),
)
Be sure to set the API key as an environment variable named GROQ_API_KEY before running the code.
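If the variable is missing, the client receives no key and the first request fails with a less obvious error. One way to fail early is a small lookup helper. This is a minimal sketch: `get_api_key` is a hypothetical name, not part of the Groq package, and it only reads the same GROQ_API_KEY variable used above.

```python
import os


def get_api_key(env=None, name="GROQ_API_KEY"):
    """Return the API key from the environment, or raise a readable error.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Export it in your shell before running, "
            f"for example: export {name}=your-key-here"
        )
    return key
```

You could then create the client with Groq(api_key=get_api_key()).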
A simple LLM request can be made by adding:
chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Explain formula 1.",
        }
    ],
    model="llama3-8b-8192",
)
print(chat_completion.choices[0].message.content)
In this case, we ask the LLM to explain what Formula 1 is. When you run the program, the response from llama3-8b is printed to your console. You can play around with this by switching the model or changing the prompt.
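Since the rest of this post repeats the same request with different prompts, it can help to wrap the call in a function. The sketch below assumes a `client` object shaped like the one created above; `ask` is a hypothetical helper name, and the default model is simply the one used in this post.

```python
def ask(client, prompt, model="llama3-8b-8192"):
    """Send a single user prompt and return the text of the first choice."""
    chat_completion = client.chat.completions.create(
        messages=[{"role": "user", "content": prompt}],
        model=model,
    )
    return chat_completion.choices[0].message.content


# Example usage (requires a real client and API key):
# print(ask(client, "Explain formula 1."))
# print(ask(client, "Explain cricket.", model="another-model-name"))
```

Trying a different model or prompt then becomes a one-line change.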
Creating a custom Chatbot
Now, let's create a chatbot for any character you like—Mario, for example. Right now, the LLM responds in a neutral/informative tone. However, by giving the LLM a system role, we can make sure it responds just like Mario would, adding personality and fun to the conversation. This sets the tone for interactions, so you’ll get playful and iconic responses like “It’s-a me, Mario!” to keep things engaging.
Let's add a system role to our request:
chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": "You are a super mario chatbot. Always answer in his style, and create witty responses."
        },
        {
            "role": "user",
            "content": "Explain formula 1.",
        }
    ],
    model="llama3-8b-8192",
)
print(chat_completion.choices[0].message.content)
Now, the LLM will explain what Formula 1 is in terms of Mario Kart!
System roles are great for other use cases too, like virtual customer support agents, educational tutors, or creative writing helpers, making sure the LLM responds in a way that fits each role’s specific vibe and needs.
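To switch between roles like these, you could keep the system prompts in a dictionary and build the messages list from it. This is a rough sketch: `build_messages` and `PERSONAS` are hypothetical names, and the support and tutor prompts are made up for illustration. Only the Mario prompt comes from this post.

```python
def build_messages(system_prompt, user_prompt):
    """Return a messages list with the system role first, then the user prompt."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


# Illustrative system prompts; adjust the wording to your own use case.
PERSONAS = {
    "mario": "You are a super mario chatbot. Always answer in his style, and create witty responses.",
    "support": "You are a polite customer support agent. Keep answers short and ask for missing details.",
    "tutor": "You are a patient tutor. Explain concepts step by step with simple examples.",
}

messages = build_messages(PERSONAS["support"], "My order arrived damaged. What can I do?")
```

The resulting `messages` list can be passed as the messages argument of the same chat completion call used above.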
Summarizing Text
Now that we know a bit about how to make LLM requests with a specific prompt & system role, let's try and create a summarization tool.
Create a text file in the same directory called article.txt, and paste in any article of your choice. For this step, make sure the article is not too long.
In the code, let's first load in that text.
with open('article.txt', 'r') as file:
    content = file.read()
Now, let's create a prompt that we can send to the LLM, telling it to summarize the text in bullet points.
prompt = f"""
Summarize the following text in bullet points for easy reading.
Text:
{content}
"""
We first write out the prompt, giving the LLM clear and concise instructions. Then, we provide the text that it should summarize.
Now, all we have to do is call the LLM with that prompt we just created:
chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": prompt,
        }
    ],
    model="llama3-8b-8192",
)
print(chat_completion.choices[0].message.content)
Run this, and you should see a bullet point summary of the article you gave the LLM!
Now, try pasting in a really long article, or maybe even a whole book -- like The Metamorphosis by Franz Kafka.
Notice that the LLM comes back with an error. You gave it too much to summarize all at once.
Summarizing a Book
The context window in an LLM refers to the amount of text it can process and remember in a single call. This means while it’s great for summarizing an article in one go, it can't handle a whole book at once because the text exceeds its capacity to take in and generate a coherent response.
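To get a feel for whether a text will fit before sending it, you can estimate its size in tokens. The sketch below rests on loudly stated assumptions: that the 8192 in the model name is the context window in tokens, that English text averages roughly 4 characters per token, and that about 1024 tokens should be left free for the reply. None of these numbers come from the API, so treat the result as a rough guide only.

```python
CONTEXT_TOKENS = 8192       # assumption: read off the "8192" in the model name
CHARS_PER_TOKEN = 4         # assumption: rough rule of thumb for English text
RESERVED_FOR_ANSWER = 1024  # assumption: room left for the model's reply


def estimate_tokens(text):
    """Very rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def fits_in_context(text):
    """Return True if the estimated prompt size leaves room for a reply."""
    return estimate_tokens(text) <= CONTEXT_TOKENS - RESERVED_FOR_ANSWER
```

A typical article passes this check, while a full book comes out far above the limit.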
So, how do we fix this? We can do so by 'chunking' the book. We split the book into 'chunks' that are manageable for the LLM, and tell it to summarize those. Then, once we have summaries for each of the chunks, we can summarize those summaries into one coherent summary.
You can split the string into chunks like so (be sure to import textwrap):
import textwrap

# Split the text into pieces of at most 7000 characters, breaking between words
sections = textwrap.wrap(content, width=7000)
You can adjust the width later to see which value gives you the best results.
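One thing to be aware of: textwrap.wrap replaces newlines with spaces by default, so paragraph breaks are lost and a chunk can end in the middle of a paragraph. If you want chunks that end on paragraph boundaries, a possible alternative is sketched below. `chunk_by_paragraph` is a hypothetical helper, and it assumes paragraphs are separated by blank lines.

```python
def chunk_by_paragraph(text, max_chars=7000):
    """Group whole paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept as its own oversized chunk.
    """
    chunks = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars or not current:
            # Still fits, or this is the first paragraph of a new chunk
            current = candidate
        else:
            # Adding this paragraph would exceed the limit: close the chunk
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

The result is a list of strings, so it can stand in for `sections` in the code that follows.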
Now that we have all these chunks, let's summarize each of them and save the response inside a variable called answers.
answers = ""
for section in sections:
    print(len(section))  # show the size of each chunk as a progress indicator
    prompt = f"""
    You are provided part of a book. Summarize in 3 - 4 bullet points for easy reading.
    {section}
    """
    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        model="llama3-8b-8192",
    )
    # Append this chunk's summary to the running string of summaries
    answers += f"\n {chat_completion.choices[0].message.content}"
If you run this code and print answers, you should see a long string with bullet point summaries for each 'chunk'/section it created.
Now, all we have to do is use the LLM one more time in order to create one coherent summary using all the section summaries.
prompt2 = f"""
You are provided a list of bullet points summarizing different parts of a book. Please create one coherent summary using those bullet points.
{answers}
"""
chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": prompt2,
        }
    ],
    model="llama3-8b-8192",
)
print(chat_completion.choices[0].message.content)
Now, when you run the code, you should see one summary of the whole book! Remarkable, right?
Note: Depending on how big the book is, you might have to 'chunk' multiple times or tell the LLM to provide shorter responses. If there are too many 'chunk' summaries, the final summarization prompt might still be too large.
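Chunking multiple times can be written as a loop: keep summarizing chunks until the combined summaries fit in a single call. The sketch below is one possible shape, not the workshop's code. `summarize_long_text` is a hypothetical helper, and `summarize` stands for any function that takes a string and returns a shorter one, such as a wrapper around the chat completion call.

```python
import textwrap


def summarize_long_text(text, summarize, max_chars=7000):
    """Chunk and summarize repeatedly until the text fits in one call.

    Raises ValueError if a round of summaries fails to shrink the text,
    so the loop cannot run forever.
    """
    while len(text) > max_chars:
        sections = textwrap.wrap(text, width=max_chars)
        combined = "\n".join(summarize(section) for section in sections)
        if len(combined) >= len(text):
            raise ValueError("summaries did not get shorter")
        text = combined
    # The text now fits in one chunk: produce the final summary
    return summarize(text)
```

Passing the summarizer in as a function keeps the loop independent of any particular model or prompt.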
Re-prompting
You might have noticed that, even though we told the LLM to respond with bullet points, for example, it does not always provide the same response. Sometimes, it might add a header or a little explanation. Sometimes, it might just provide the bullet points.
For a programmer, this inconsistency can make the results difficult to process. How do we make sure the LLM provides more consistent answers in a specific format?
Let's make a sentiment analysis tool. We will feed the LLM a sad story, and tell it to come up with a sentiment score from -1 to 1.
Like so:
with open('sad_story.txt', 'r') as file:
    content = file.read()

prompt = f"""
Read the below passage carefully, and provide a sentiment score from -1 to 1. -1 should be very sad, 0 should be neutral, and 1 should be very happy.
Return this format:
Sentiment: [float]
Passage:
{content}
"""
chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": prompt,
        }
    ],
    model="llama3-8b-8192",
)
# Print the raw response so you can compare it against the requested format
print(chat_completion.choices[0].message.content)
If you run this multiple times, you'll see that the response does not always follow the format we specified. That is a problem if we rely on the format to extract the number and perform further calculations: without proper handling, an unexpected response might crash our program.
Re-prompting is the process of adjusting and refining the input given to an LLM to guide it toward a desired response or format. Our sentiment tool needs output like "Sentiment: 0.5", so when a response does not match, we can re-prompt the LLM with a tweaked prompt that clearly instructs it to return only the sentiment score in that exact format.
We can create a function that uses a regular expression to check whether the response has the expected format (so be sure to import re).
import re

def validate_response(response):
    # Accept only "Sentiment: <number>" where the number is between -1 and 1
    pattern = r"^Sentiment: -?(0(\.\d+)?|1(\.0+)?)$"
    return bool(re.match(pattern, response.strip()))
Now, after we get the response from the LLM, we can call that function. If the function returns True, then we know we have the correct format. If it returns False, then we know that we should re-prompt the LLM and try again.
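response = chat_completion.choices[0].message.content

if not validate_response(response):
    # Re-send the original prompt and the model's reply so it still has the passage
    chat_completion = client.chat.completions.create(
        messages=[
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": response},
            {
                "role": "user",
                "content": f"The response '{response}' was incorrect. Please provide the sentiment score in the correct format: Sentiment: [float]",
            },
        ],
        model="llama3-8b-8192",
    )
    print(chat_completion.choices[0].message.content)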
Of course, this is very basic re-prompting. The LLM could still provide the incorrect format in this second LLM call. However, you should now have a much higher success rate of consistently formatted responses.
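To handle the case where the second call also fails, the check can go in a loop with a limited number of attempts. The sketch below is one possible approach rather than the workshop's code. `parse_sentiment` and `get_sentiment` are hypothetical helpers, `ask` stands for any function that sends a prompt and returns the response text, and the limit of three attempts is an arbitrary choice.

```python
import re

# "Sentiment: " followed by a number between -1 and 1
SENTIMENT_RE = re.compile(r"^Sentiment: (-?(?:0(?:\.\d+)?|1(?:\.0+)?))$")


def parse_sentiment(response):
    """Return the score as a float, or None if the format is wrong."""
    match = SENTIMENT_RE.match(response.strip())
    return float(match.group(1)) if match else None


def get_sentiment(ask, prompt, max_attempts=3):
    """Call ask() until the reply parses, at most max_attempts times."""
    current_prompt = prompt
    for _ in range(max_attempts):
        response = ask(current_prompt)
        score = parse_sentiment(response)
        if score is not None:
            return score
        # Re-prompt: repeat the task, show the bad reply, restate the format
        current_prompt = (
            f"{prompt}\n\nYour previous reply was:\n{response}\n\n"
            "That did not match the required format. "
            "Reply with exactly one line: Sentiment: [float]"
        )
    return None  # the caller decides what to do when every attempt fails
```

Returning None instead of raising leaves it to the calling code to skip the passage, log it, or fall back to a default score.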
With these tools and techniques, you’re now equipped to integrate LLMs into your Python code and validate outputs effectively. Please feel free to comment with any questions!
If you would like to see the full code, please visit the GitHub repository.
P.S: This is the blog post version of a workshop I gave to SCU’s ACM chapter.