DEV Community

# benchmark

Posts

We benchmarked 10 LLMs on 10 real agent coding tasks — here are the results

2 min read
How we almost wrote off 3 models as broken — the thinking-mode tax

2 min read
Model Showdown Round 2: Adding Gemma, Kimi, and 579 GB of Stubborn Optimism

11 min read
The Agentic Gap: Claude Oneshots, Gemma Fails

9 min read
Slaying the Gemma Beast: How We Fixed Local AI and Shipped Search

13 min read
Model Showdown: Benchmarking Local vs Cloud LLMs on a Real Coding Task

14 min read
I Ran 5 LLMs Through 10 Real Agent Coding Tasks. The Free One Won.

2 min read
Optimize benchmark in Next.js 15 vs Astro 4: What You Need to Know

3 min read
CPU Inference on AMD EPYC 9334: Real Numbers for LLM and TTS Workloads

4 min read
Benchmark: Claude 3.5 vs. GPT-4o for Cloud Cost Anomaly Detection in AWS and GCP

19 min read
Benchmark: Discord 20 Loads 30% Faster Than Microsoft Teams 5 on Chrome 130

2 min read
Benchmark: JetBrains DataGrip 2026 vs. DBeaver 24.0: Query Execution Speed for PostgreSQL 17

3 min read
Vector Search Benchmark: FAISS 1.9 vs. Chroma 0.6 vs. Pinecone 1.6 for 100M Embedding Datasets

15 min read
Benchmark: Gitea 1.24 vs. GitLab 17.0 for Git Repository Performance

14 min read
Benchmark CI/CD in Docker 25 vs Cilium: What You Need to Know

4 min read