DEV Community

Frank Brsrk

Chilling with my dogs and keyboard fighting the AIs

How to diagnose where your RAG agent fabricates: an open-source A/B eval workflow with cross-lab blind judges

Comments
6 min read

Why LLM Agents Fail: Four Mechanisms of Cognitive Decay and the Reasoning Harness Layer

Comments
13 min read
Why Your AI Agent Loses the Plot: Reasoning Decay and Attention Loss in Long-Running Tasks

Comments 1
10 min read
Trippy Balls

Comments
1 min read
I built a multi-turn agent-vs-agent blind eval in n8n

Comments
6 min read
I built a Python module to A/B test prompts inside Claude Code, and you can run it on yours

1
Comments 1
6 min read
the model alone is not the agent. The harness plus the model is the agent.

1
Comments
2 min read
Eval workflow for agentic builders: fork any prompt through baseline vs scaffolded agents, blind third-party judge.

Comments
2 min read
Wait, you guys run evals?

Comments
1 min read
Under Pressure. Better Harness.

Comments 1
2 min read