DEV Community

AIRabbit
AIRabbit

Posted on

OpenAI’s New Predicted Outputs is a Game Changer

OpenAI recently introduced a powerful feature called Predicted Outputs that can significantly reduce latency in API responses when much of the output content is predictable. Let’s explore this feature through practical examples.

Understanding Predicted Outputs

When modifying text or code files where only small changes are expected, we can provide a prediction of what we think the output will be. The model can then use this prediction to generate responses faster by reusing parts of our prediction that match its intended output.

For more details, see the official OpenAI documentation.

Real-World Examples

Let’s look at two examples that demonstrate when Predicted Outputs are most and least effective.

Example 1: Minor Style Change

In this example, we simply want to change the background color to green. This is an ideal case for Predicted Outputs since most of the code remains unchanged.

Image description

Results:

  • Normal Completion Time: 14,115 ms
  • Predicted Outputs Time: 4,756 ms
  • Time Savings: 66%
  • Total Completion Tokens: 784
  • Accepted Tokens: 686 (reused, not billed)
  • Completion Tokens Billed: 98 (only rejected tokens)
  • Cost Savings: 88%
  • Tokens per Second (Normal): Approximately 55 tokens/sec
  • Tokens per Second (Predicted Outputs): Approximately 165 tokens/sec

This demonstrates the power of Predicted Outputs when changes are minimal. Most of the original content was reused, resulting in significant time and cost savings, and a substantial increase in tokens processed per second.

Here is the result in Langfuse from another session recorded shortly after the one above. The difference is still significant.

Please note that due to the nature of randomness in token prediction, there is always a difference, so the small difference above may change slightly or even become negative in further predictions.

Example 2: Complete Style Overhaul

In this second example, we completely change the styling and content of the page. This represents a case where Predicted Outputs offers minimal benefit since most content needs to change.

Read more in my Blog Post..

Top comments (0)