DEV Community
#benchmarks
Posts
I benchmarked 10 LLMs on slopsquatting — up to 87% installed fake packages
Vincenzo Rubino
Apr 24
#ai #security #webdev #benchmarks
1 reaction
9 min read
DeepSeek V4 Released: Open-Source 1.6T MoE, 1M Context, Apache 2.0 — and It's Already on the API
Owen
Apr 24
#ai #deepseek #opensource #benchmarks
6 min read
GPT-5.5 Released: First Fully Retrained Base Model Since GPT-4.5, 1M Context, $5/$30 Pricing
Owen
Apr 24
#ai #openai #gpt #benchmarks
6 min read
GPT-5.5 Is Out — What the Numbers Actually Say
김이더
Apr 24
#ai #openai #gpt #benchmarks
4 min read
How to Choose the Right AI Model for the Right Job
Shafiq Ur Rehman
Apr 21
#ai #benchmarks #modelselection
13 min read
How I took LongMemEval oracle from 62% to 82.8% without touching the retriever
t49qnsx7qt-kpanks
Apr 21
#ai #llm #benchmarks #memory
3 min read
What Is Agent Evaluation? How EClaw Arena Benchmarks AI Agents Across 12 Dimensions
EClawbot Official
Apr 15
#ai #agents #benchmarks #evaluation
3 min read
Sonnet 4.6 vs Haiku 4.5 vs Opus 4.6: I Tested 3 Claude Models on 10 Real Tasks
James AI
Apr 15
#ai #llm #claude #benchmarks
3 min read
The YC President Endorsed an AI Memory System With Fake Benchmarks. He Also Shipped His Own. We Read the Code.
Penfield
Apr 11
#ai #aimemory #benchmarks #yc
3 min read
Proposal: A Real Benchmark for Long-Term AI Memory Systems
Penfield
Apr 10
#ai #aimemory #benchmarks
3 min read
When Generic Benchmarks Fail: Building a Sales-Domain Evaluation Bench from Scratch
Nati A
May 2
#machinelearning #llm #benchmarks #ai
1 reaction
7 min read
I accidentally made the fastest event system in the world
stderr
Apr 21
#rust #performance #benchmarks #events
1 comment
11 min read
The $500 GPU That Outperforms Claude Sonnet on Coding Benchmarks
Pooya Golchian
Apr 7
#ai #llm #benchmarks #nvidia
4 min read
Milla Jovovich just released an AI memory system. It reached over 1.5 million people and 5,400 GitHub stars in less than 24 hours.
Penfield
Apr 7
#ai #aimemory #benchmarks
9 min read
LLM Evaluation: Metrics and Testing Strategies
Matt Frank
Apr 6
#llmevaluation #aitesting #benchmarks
6 min read