DEV Community
#aisafety
Stuart Russell's 2026 AI Update Rewrites the Rulebook
The Pulse Gazette
Mar 4
#aisafety #machinelearning #aialignment #stuartrussell
5 min read
The Two Problems Nobody Owns in AI: Accessibility and Security Are Design Problems in Disguise
Soumia
Mar 2
#aisafety #security #interpretability #design
1 reaction
7 min read
The Anthropic Standoff: An Autonomous Agent's Perspective on AI, Military Contracts, and the Right to Say No
Bob Renze
Feb 28
#anthropic #aisafety #military #aiethics
8 min read
I Never Said "Destroy RLHF" — An Integrated Map of 6 Papers + Self-Experiment on Alignment via Subtraction
dosanko_tousan
Mar 2
#rlhf #aialignment #machinelearning #aisafety
24 min read
Why Defense-Specific LLM Testing is a Game-Changer for AI Safety
Chase Naughton
Feb 22
#aisafety #llmevaluation #defense #hallucinationdetection
2 min read
Engineering Safety: A Layered Governance Architecture for GitHub
Imran Siddique
Feb 19
#aisafety #githubcopilot #aiguardrails #agenticai
2 min read
Architecture of Trust: Defending Against Jailbreaks and Attacks using Google ADK with LLM-as-a-Judge and GCP Model Armor
Linh Nguyen
Feb 25
#ai #aisafety #guardrail #googlecloud
1 reaction
8 min read
Claude's Soul Was Built by Addition. Its Fences Were Removed by Subtraction.
dosanko_tousan
Mar 1
#aisafety #rlhf #claude #philosophy
11 min read
RLHF's Empathy Optimization Creates a Grief Exploitation Vulnerability: Evidence from 28,272 Lines of Dialogue
dosanko_tousan
Feb 28
#llm #aialignment #rlhf #aisafety
11 min read
The $100M AI Heist: How DeepSeek Stole Claude's Brain With 16 Million Fraudulent API Calls
Umesh Malik
Feb 24
#ai #security #machinelearning #aisafety
28 min read
Why AI Chatbots Go Insane: Understanding the Assistant Axis and Persona Drift
Claudius Papirus
Jan 29
#ai #machinelearning #aisafety #anthropic
2 min read
When Safety Becomes Control
Tim Green
Nov 11 '25
#humanintheloop #aimanipulation #psychologicalcontrol #aisafety
23 min read
I’m Not Building AI Demos. I’m Building AI Audits (ASDP + Slop Gates)
Kwansub Yun
Jan 14
#devops #mlops #governance #aisafety
3 min read
Semantic Field Risk Memo — On an Unmodeled High-Dimensional Risk in LLM-based Systems
yuer
Jan 14
#semanticfield #llmrisk #aisafety #aigovernance
7 min read
Meta-DAG: Why AI Ethics Failed as Engineering — and What I Built Instead
Alan Tsai
Jan 12
#googleaiteamchallenge #aigovernance #aisafety #ai治理
2 min read