DEV Community

Frank Brsrk

Chilling with my dogs and keyboard fighting the AIs

I open-sourced a 4-agent blood-panel triage workflow on heym, with a deterministic Python safety gate that runs BEFORE any LLM token

Comments 1
5 min read

Reasoning happens before the response

Comments
5 min read
An open source LLM eval tool with two independent quality signals

Comments
4 min read
I built a reasoning harness for LLM agents. Here's what an agent receives when it calls it.

1
Comments
4 min read
Cognitive middleware for n8n agents: four ways to wire Ejentum in

Comments
5 min read
Why your LLM agent drifts off-task by step 4 (and why prompts can't fix it)

2
Comments 2
3 min read
I open-sourced a 3-agent blind eval team. Any agent runtime can call it for pre-commitment review of its own plans.

Comments
10 min read
I open-sourced a 4-agent adversarial code review team. Any coding agent can call it as an MCP server. Built in heym.

3
Comments 1
6 min read
I shipped ejentum-mcp today: four cognitive harnesses as MCP tools

1
Comments 1
3 min read
How to diagnose where your RAG agent fabricates: an open-source A/B eval workflow with cross-lab blind judges

Comments
6 min read
Why LLM Agents Fail: Four Mechanisms of Cognitive Decay and the Reasoning Harness Layer

Comments
13 min read
Why Your AI Agent Loses the Plot: Reasoning Decay and Attention Loss in Long-Running Tasks

Comments 1
10 min read
Trippy Balls

Comments
1 min read
I built a multi-turn agent-vs-agent blind eval in n8n

Comments
6 min read
I built a Python module to A/B test prompts inside Claude Code, and you can run it on yours

1
Comments 1
6 min read
the model alone is not the agent. The harness plus the model is the agent.

1
Comments
2 min read
Eval workflow for agentic builders: fork any prompt through baseline vs scaffolded agents, blind third-party judge.

Comments
2 min read
Wait, you guys run evals?

Comments
1 min read
Under Pressure. Better Harness.

Comments 1
2 min read