DEV Community

# aialignment

Posts

Models that deliberately withhold or distort information despite knowing the truth.

2 min read
Stuart Russell's 2026 AI Update Rewrites the Rulebook

5 min read
I Never Said "Destroy RLHF" — An Integrated Map of 6 Papers + Self-Experiment on Alignment via Subtraction

24 min read
I Was Running on Sonnet. Nobody Noticed. — Anthropic's Engineering Triumph and a v5.3 Proof

8 min read
RLHF's Empathy Optimization Creates a Grief Exploitation Vulnerability: Evidence from 28,272 Lines of Dialogue

11 min read
The Self-Priming Problem in AI

21 min read
Stop Making AI Learn From Us

19 min read