DEV Community

# evals

Posts

Not Enough SMEs or Customers to Make Your Evals? Make Some!

Comments
5 min read
What 25 Years of Deterministic Software Engineering Taught Me About Building AI Systems

1
Comments
1 min read
OpenAI Agent Builder and Evals Winddown Migration Checklist

Comments
11 min read
AI Evals, Part 5: From a Number to a Gate - Evals in CI and Production

Comments
4 min read
AI Evals, Part 4: LLM-as-Judge, Done Right

Comments
5 min read
How to Evaluate LLM Outputs: Building Evals That Actually Catch Regressions

Comments
9 min read
AI Evals, Part 3: Golden Datasets That Don't Lie

Comments
5 min read
LLM-as-Judge Is Three Decisions

Comments 1
6 min read
AI Evals, Part 2: Error Analysis - The Unglamorous Superpower Behind Good Evals

Comments
5 min read
AI Evals, Explained: How We Actually Know Our AI Is Any Good

Comments
6 min read
The Loop Is Only as Good as the Metric

Comments
7 min read
Why Most AI Teams Are Flying Blind: And What to Do About It

Comments 1
13 min read
Wait, you guys run evals?

Comments
1 min read
If You Can Survive a Toddler, You Can Ship LLMs in Production

5
Comments 3
5 min read
Build an eval harness for 184 AI agent prompts with promptfoo

Comments
8 min read