It took me a while to figure out how to get a Python Flask server and web client to support streaming OpenAI completions, so I figured I'd share.
```python
from flask import Flask, stream_template, request, Response
import openai
from dotenv import load_dotenv
import os

load_dotenv()

# Put these values in a .env file next to this file
openai.organization = os.environ.get("OPENAI_ORG")
openai.api_key = os.environ.get("OPENAI_API_KEY")


def send_messages(messages):
    return openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,
        stream=True,
    )


app = Flask(__name__)


@app.route('/chat', methods=['GET', 'POST'])
def chat():
    if request.method == 'POST':
        messages = request.json['messages']

        def event_stream():
            # Each streamed chunk carries a delta with the next bit of text
            for line in send_messages(messages=messages):
                text = line.choices[0].delta.get('content', '')
                if len(text):
                    yield text

        return Response(event_stream(), mimetype='text/event-stream')
    else:
        return stream_template('chat.html')


if __name__ == '__main__':
    app.run()
```
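To see the streaming `Response` pattern in isolation, here's a minimal sketch with no OpenAI call: the list of words stands in for the deltas the API yields, and Flask's built-in test client consumes the generator. Everything here is a made-up stand-in except the Flask API itself.

```python
# Minimal sketch: Flask streams whatever a generator yields, chunk by chunk.
from flask import Flask, Response

app = Flask(__name__)

@app.route('/stream')
def stream():
    def event_stream():
        # Stand-in for the text deltas yielded by the OpenAI stream
        for word in ["Hello", " ", "world"]:
            yield word
    return Response(event_stream(), mimetype='text/event-stream')

with app.test_client() as client:
    resp = client.get('/stream')
    body = resp.get_data(as_text=True)  # test client drains the generator
    print(body)  # Hello world
```

In production the chunks reach the browser as they are yielded; the test client just collects them all, which is enough to verify the wiring.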
chat.html

```html
<!DOCTYPE html>
<html>
  <head>
    <title>Chat</title>
  </head>
  <body>
    <h1>Chat</h1>
    <form id="chat-form">
      <label for="message">Message:</label>
      <input type="text" id="message" name="message">
      <button type="submit">Send</button>
    </form>
    <div id="chat-log"></div>
    <script src="{{ url_for('static', filename='chat.js') }}"></script>
  </body>
</html>
```
You can't use EventSource for this if you want to use the POST method, so this uses the fetch API instead.
chat.js

```javascript
const form = document.querySelector("#chat-form");
const chatlog = document.querySelector("#chat-log");

form.addEventListener("submit", async (event) => {
  event.preventDefault();

  // Get the user's message from the form
  const message = form.elements.message.value;

  // Send a request to the Flask server with the user's message
  const response = await fetch("/chat", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ messages: [{ role: "user", content: message }] }),
  });

  // Create a new TextDecoder to decode the streamed response bytes
  const decoder = new TextDecoder();

  // Get a reader on the response body stream
  const reader = response.body.getReader();
  let chunks = "";

  // Read the response stream chunk by chunk and append to the chat log
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks += decoder.decode(value);
    chatlog.innerHTML = chunks;
  }
});
```
Obviously this is not an optimal chat user experience, but it'll get you started.
Top comments (3)
I tried this and it worked great running on localhost, but when I tried deploying it to my makeshift webserver (rpi / nginx) it stopped streaming and waited for the response stream to finish before the message appeared. Any idea why?
edit: I needed to add an 'X-Accel-Buffering: no' header to the response, changing the code to

```python
response = Response(event_stream(), mimetype='text/event-stream')
response.headers['X-Accel-Buffering'] = 'no'
return response
```
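Equivalently, you can disable buffering on the nginx side instead of per-response. A sketch of the relevant location block, assuming the Flask app is proxied on port 5000 (adjust the path and upstream to your setup):

```nginx
location /chat {
    proxy_pass http://127.0.0.1:5000;
    # Don't buffer the upstream response; flush chunks to the client as they arrive
    proxy_buffering off;
}
```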
Exactly what I was looking for. Also dig the way you write code. E.g. one app route with an if/else rather than one for POSTs and another just to display the template. Nice.
Just going for the most concise article :)