...and available as both a UI chatbot and API.
Yesterday the Claude 2 model received an update doubling its context window to 200K tokens. Here's the official intro by Anthropic. This happened 15 days after OpenAI introduced GPT-4 Turbo, which bumped the context window from GPT-4's 32K up to 128K.
Is it a lot?
Following up on my recent post translating context window sizes from tokens into different kinds of artefacts, here's an updated table (a quick sketch of how such counts can be reproduced follows it):
| Artefact | Tokens | # in 200K |
| --- | --- | --- |
| Tweet | 76 | 2,632 |
| Book page (Robinson Crusoe) | 282 | 709 |
| Google Results Page (copy/paste, txt) | 975 | 205 |
| StackOverflow Question (copy/paste, txt) | 947 | 211 |
| apple.com (copy/paste, txt) | 997 | 201 |
| StackOverflow Question (Markdown) | 1,037 | 193 |
| Blog post (Markdown) | 4,572 | 44 |
| Linux Kernel average source file | 5,205 | 38 |
| SCRUM Guide (2020) | 5,440 | 37 |
| Wikipedia page (Albania, copy/paste, txt) | 42,492 | 4.7 |
| Wikipedia page (Albania, source, wikitext) | 76,462 | 2.6 |
| apple.com (source, HTML) | 92,091 | 2.2 |
| “The Lean Startup” book | 113,264 | 1.8 |
| 128K Context | 128,000 | 1.6 |
| PM BoK (4th edition, 2008) | 228,880 | 0.87 |
| Google Results Page (source, HTML) | 246,781 | 0.81 |
| Linux Kernel largest source file | 69,039,016 | 0.0029 |
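If you want to reproduce counts like these yourself, here is a minimal sketch. It assumes OpenAI's `cl100k_base` encoding via the `tiktoken` library as a stand-in tokenizer (Anthropic's tokenizer differs, so exact numbers will vary), and the sample tweet text is made up for illustration:

```python
# Minimal sketch: count tokens in an artefact and see how many copies fit in a
# 200K context window. Assumption: tiktoken's cl100k_base encoding as a stand-in;
# the actual tokenizers used by Claude and other models will give different counts.
import tiktoken

CONTEXT_WINDOW = 200_000  # Claude 2.1

def artefacts_per_context(text: str, window: int = CONTEXT_WINDOW) -> tuple[int, float]:
    """Return (token count of the artefact, how many such artefacts fit in the window)."""
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = len(enc.encode(text))
    return tokens, window / tokens

# Example with a hypothetical tweet-sized snippet
tweet = "Claude 2.1 just doubled its context window to 200K tokens."
tokens, count = artefacts_per_context(tweet)
print(f"{tokens} tokens -> {count:,.0f} copies fit in a 200K window")
```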
Context size timeline
Here's a quick rundown of model release dates and context windows:
- November 21, 2023: Claude 2.1 - 200K
- November 6, 2023: GPT-4 Turbo - 128K
- June 12, 2023: gpt-3.5-turbo-0613 - 16K
- May 11, 2023: Claude 100K - 100K
- March 14, 2023: GPT-4 - 8K and 32K
- March 14, 2023: Claude - 9K
- November 30, 2022: ChatGPT/GPT-3.5 - 4K
- June 11, 2020: GPT-3 - 2K
P.S.:
On a side note: I'm fascinated by how little progress has been made in internet search. Both Bing AI and Google Bard produce complete nonsense as soon as a request needs more than a few top results to come up with a meaningful answer :)
Top comments (3)
What a competition between OpenAI and Anthropic. Looks like they are battling each other over who can push past the context window limits first :)
Context window size is an easy metric to understand and compete on, like CPU frequency in the old days)
Sorry, there is much more hidden behind the max token limit. There is research going on into sliding-window-based token generation, but at the moment it's impossible to build an LLM with an infinite context window: the model would go wild, lose context, and no longer be able to generate the next word via statistical next-word prediction.
More research is required in this space, and it can be done by the dedicated LLM vendors such as OpenAI, Anthropic, Cohere, etc.
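For readers unfamiliar with the sliding-window idea mentioned in the comment above, here is a minimal illustration (my own sketch, not from the commenter): each token attends only to the most recent `window` tokens, so the attention cost stays bounded even as the sequence grows.

```python
# Minimal sketch of a sliding-window attention mask (illustration only, not a full
# model): position i may attend only to itself and the previous `window - 1` tokens.
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask where entry (i, j) is True if token i may attend to token j."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

print(sliding_window_mask(6, 3).astype(int))
# Each row has at most 3 ones: the token itself and its two predecessors.
```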