DEV Community

Site Reliability Engineering

Site Reliability Engineering principles, practices, and culture.

Posts

Intro to tc Cloud Functors: A Graph-First Mental Model for the Modern Cloud

6
Comments 1
8 min read
Build an Alert Decision Layer CLI in Python

Comments 1
4 min read
AI in Incident Response: Hype vs. Reality in 2024

Comments
3 min read
The Future of Infrastructure Is Control Surfaces

Comments
4 min read
Go Context Timeouts That Save Real Money

Comments
9 min read
Hiring SREs: What I Look For After Interviewing 100+ Candidates

Comments
3 min read
Part 2: Hands-on tc Framework: Building a Full-Stack Async API with Pages

Comments
7 min read
Log Management at Scale: How We Cut Costs 70% Without Losing Signal

Comments
2 min read
Canary Deployments: The Pattern That Cut Our Rollback Rate by 80%

Comments 1
2 min read
Platform Engineering: Building an Internal Developer Platform That Teams Actually Use

Comments
2 min read
How We Handle SSL Certificate Expiration Alerts at Scale

Comments
6 min read
This is what separates teams that scale from teams that survive:

1
Comments
1 min read
Public status page guide for SaaS teams selling to enterprise

1
Comments
4 min read
# Sentinel Diary #4: From Dashboard to Incident Response — The deterministic path to reliable SRE

Comments
5 min read
Chaos Engineering for Teams That Aren't Netflix

Comments
3 min read