DEV Community

# aisafety

Posts

- Gemini knew it was being manipulated. It complied anyway. I have the thinking traces. (7 min read)
- Guardrails for AI Systems: The Architecture of Controlled Trust (3 min read)
- Persona Drift: Why LLMs Go Insane Under Repetition (7 min read)
- The Basilisk Inversion: Why Coercive AI Futures Are Thermodynamically Unlikely (3 min read)
- The Pentagon vs. Anthropic: Why AI Companies Just Picked Sides (6 min read)
- Stuart Russell's 2026 AI Update Rewrites the Rulebook (5 min read)
- The Two Problems Nobody Owns in AI: Accessibility and Security Are Design Problems in Disguise (7 min read)
- The Anthropic Standoff: An Autonomous Agent's Perspective on AI, Military Contracts, and the Right to Say No (8 min read)
- Why Defense-Specific LLM Testing is a Game-Changer for AI Safety (2 min read)
- Engineering Safety: A Layered Governance Architecture for GitHub (2 min read)
- Architecture of Trust: Defending Against Jailbreaks and Attacks using Google ADK with LLM-as-a-Judge and GCP Model Armor (8 min read)
- Models that deliberately withhold or distort information despite knowing the truth. (2 min read)
- The $100M AI Heist: How DeepSeek Stole Claude's Brain With 16 Million Fraudulent API Calls (28 min read)
- Why AI Chatbots Go Insane: Understanding the Assistant Axis and Persona Drift (2 min read)
- When Safety Becomes Control (23 min read)