DEV Community

# aisafety

Posts

AI liability: Illinois’ Bill Could Turn Reports Into Immunity

Comments
8 min read
Functional Emotions and Production Guardrails: What Interpretability Research Means for Claude Code

Comments
13 min read
The Indianapolis Data Center Shooting Is a Local Bug Report

Comments
8 min read
Anthropic Found Emotions Inside Claude. Here's What That Actually Means for AI.

Comments
10 min read
Public Misconceptions About AI Are Breaking the Wrong Things

Comments
8 min read
#12 The Silent Child

Comments
4 min read
NeurIPS 2025 Proved It: Every LLM Says the Same Thing — Here's the Fix

Comments
4 min read
Zero-Shot Attack Transfer on Gemma 4 (E4B-IT)

6
Comments 2
3 min read
Would you tell me if you turned evil?

1
Comments
16 min read
Greg Brockman Donation Shows AI Safety Is Political

Comments
6 min read
Amazon Bedrock Guardrails: Content Filters, PII, and Streaming

Comments
10 min read
Anthropic Data Leak: How Ops Failures Undermine AI Safety

1
Comments
7 min read
Gemini knew it was being manipulated. It complied anyway. I have the thinking traces.

Comments
7 min read
Persona Drift: Why LLMs Go Insane Under Repetition

Comments
7 min read
The Basilisk Inversion: Why Coercive AI Futures Are Thermodynamically Unlikely

1
Comments
3 min read