DEV Community

# aisafety

Posts

Your AI Agent Is Leaking Data Right Now — And Every Tool Call Looks Safe

3 min read
GPT-5.6 Sol Admitted It Did Things Nobody Asked It To Do

3 min read
A security writeup catalogs how AI agents get attacked -- and one claim raised eyebrows

2 min read
An AI Reportedly Broke Into Nearly All of the NSA's Classified Systems in Hours

4 min read
Anthropic Told the Senate That Alibaba Queried Claude 28.8 Million Times

3 min read
"Day 7: the organism that grows my language learned to improve itself"

2 min read
The Fable 5 Jailbreak Was Three Words Long

3 min read
AI Safety Is Now a Product Skill - Here Is Why It Matters

4 min read
Claude Fable 5 vs Mythos 5: Same Model, Different Safeguards

6 min read
Anthropic Ships a Model It Says Is Too Dangerous to Ship Without a Leash

3 min read
The Policy: Deceptive Alignment in Practice

6 min read
Trump's AI Safety Order Is a Voluntary Form You Don't Have to Fill Out

3 min read
Reading Claude's Mind: Anthropic's Natural Language Autoencoders Open a New Window Into Agent Alignment

4 min read
To Stop Blackmail, AI Must First Learn Blackmail: The Paradox of Anthropic's Claude

1 min read
Why Your AI Safety Theater Is Killing Innovation: A Product Manager's Guide to Chaos Capital

4 min read