#mlops

Handling Failure: The Most Important Part of AI Systems
Siddhartha Reddy
May 29
#ai #machinelearning #systemdesign #mlops
2 min read

QAT vs PTQ on our edge vision model: 6 months of A/B data
Marco Rinaldi
May 28
#machinelearning #computervision #mlops #pytorch
4 min read

Serving 40 LoRA adapters on one base model: the throughput we got
Marcus Chen
May 29
#machinelearning #llm #pytorch #mlops
4 min read

torch.compile recompiled our SDXL UNet 38 times in production
Elise Moreau
May 29
#pytorch #machinelearning #computervision #mlops
4 min read

AI Observability: Stop Flying Blind in Production
qodors
May 27
#ai #monitoring #mlops #observability
4 min read

Semantic caching the VLM step in our product-photo pipeline
Elise Moreau
May 27
#machinelearning #mlops #computervision #llm
1 reaction
4 min read

LLM-as-judge variance broke our DPO training signal for 3 weeks
Marcus Chen
May 27
#machinelearning #mlops #llm #pytorch
4 min read

The bf16 grad accumulator that killed our SDXL LoRA training
Elise Moreau
May 27
#machinelearning #pytorch #mlops #computervision
4 min read

Token-level eval harness for tool-calling agents: what we wired up
Marcus Chen
May 26
#machinelearning #llm #mlops #devops
4 min read

Capping VLM spend per CV researcher: hierarchical budgets in practice
Marco Rinaldi
May 26
#machinelearning #computervision #llm #mlops
1 reaction, 2 comments
4 min read

Part 2: Enterprise Decision Intelligence Architecture: AI Governance, Threshold Policy Engines, and Operational AI Systems
Shallabh Dixitt
May 26
#ai #architecture #governance #mlops
11 min read

Auto-labelling 1.2M robotics frames with VLMs: a failover story
Marco Rinaldi
May 25
#computervision #mlops #llm
4 min read

We Audited Our Agent Tool-Call Traces. Half Our Eval Data Was Garbage.
Marcus Chen
May 25
#mlops #llm #machinelearning #infrastructure
4 min read

How to Detect GPU Waste in a Kubernetes Cluster
Sam Hosseini
May 25
#kubernetes #gpu #mlops #devops
5 min read

Cost accounting for diffusion image generation at $0.0008 per render
Elise Moreau
May 25
#machinelearning #mlops #infrastructure #llm
4 min read