DEV Community
#llm
Posts
We benchmarked 10 LLMs on 10 real agent coding tasks — here are the results
Vilius
May 9
#ai #llm #benchmark #agents
2 min read
How we almost wrote off 3 models as broken — the thinking-mode tax
Vilius
May 9
#ai #llm #benchmark #postmortem
1 reaction
2 min read
Why Your AI Character Keeps Breaking Under Pressure (And What I Built Instead of Yet Another System Prompt)
Kiro
May 9
#ai #llm #mcp #opensource
8 min read
What 16 Parallel Claude Agents Built Around Themselves: Deconstructing Anthropic's C Compiler Experiment
Vitalii Cherepanov
May 9
#agents #claude #llm #rust
12 min read
The right of an AI agent to stay silent
Vitalii Cherepanov
May 9
#llm #ai #software
8 min read
1-bit, 545 megabytes, zero API keys — local AI that beats GPT-5.4
Vilius
May 9
#ai #llm #local #quantization
2 min read
I Built a Local AI Coding Agent on M5 Max 128GB — It Failed 164 Times Before Passing 35 Tests
Joseph Yeo
May 9
#llm #agents #tdd #ollama
7 min read
How Stripe, Shopify, and Airbnb Build AI Harnesses
eleonorarocchi
May 9
#ai #llm #agents
3 min read
Anthropic plugs into SpaceX's 220,000-GPU Colossus — and doubles Claude's rate limits
Andrew Kew
May 9
#ai #anthropic #llm #api
1 reaction
3 min read
I Trained an LLM on 75K of My Own Messages So It Would Stop Writing Like a Chatbot
Nic Lydon
May 9
#ai #machinelearning #llm #python
8 min read
What 11 big tech companies actually do with AI in 2026
kt
May 9
#ai #productivity #engineering #llm
23 min read
tierKV: A Distributed KV Cache That Makes Evicted Blocks Faster to Restore Than GPU Cache Hits
prasanna kanagasabai
May 9
#llm #rust #machinelearning #opensource
1 reaction
3 min read
Why AI Coding Agents Waste 30% of Their Tokens — And How to Fix It
Hoyin kyoma
May 9
#agents #ai #llm #softwareengineering
6 min read
How a $0.02/Call Model Scored 78.2% on SWE-bench Verified — Beating Every Model on the Leaderboard
Hoyin kyoma
May 9
#ai #llm #claude #minimax
7 min read
Stop Guessing Your RAG Quality: Automating Faithfulness Metrics with Spring AI and LLM-as-a-Judge
Machine coding Master
May 9
#java #ai #llm #systemdesign
2 min read