DEV Community

Ingero Team

An open-source research project focused on kernel-level GPU observability and tracing CPU-GPU interactions using eBPF

What GitHub Uses eBPF For (and the Layer They Have Not Ported Yet)

5 min read
GPU Observability for Workloads That Cannot Phone Home

3 min read
One Kernel, Zero Sidecars: Tracing AI Workloads Without an Agent on Every Host

7 min read
Same eBPF, Different Vendor: Tracing libhip Calls on AMD ROCm

3 min read
From TCP Retransmits to MCP-Driven Cluster Investigations: An eBPF GPU Agent Retrospective

8 min read
What Inference-Platform Benchmark Posts Leave Out

8 min read
MCP Shows What the Agent Did. eBPF Shows Why the GPU Stalled.

7 min read
MCP Tools Are New API Surfaces. eBPF Sees What They Actually Touch.

4 min read
A Cluster Stall Looks Healthy on Every Host. The Cause Is in the Pattern Across Hosts.

9 min read
GPU Utilization Is a Counter, Not a Cause

8 min read
CUDA Out of Memory at 60% Utilization: Tracing PyTorch GPU Memory Fragmentation

4 min read
26 Seconds to Find a Straggler: Fleet v0.10 End-to-End on A100 and GH200

6 min read
Production GPU Training is 34% Slower. Show Me Why

6 min read
Agent + MCP + eBPF: 10,869 CUDA Kernel Events, Now Queryable

5 min read
11-Second Time to First Token on a Healthy vLLM Server

5 min read
What Happens When an AI Agent Gets Kernel-Level GPU Traces

5 min read
MCP as Observability Interface: Connecting AI Agents to Kernel Tracepoints

5 min read
One Query, Four GPUs: Tracing a Distributed Training Stall Across Nodes

7 min read
CUDA Graphs: The 8-Year Overnight Success and the Observability Gap

9 min read
124x Slower: What PyTorch DataLoader Actually Does at the Kernel Level

5 min read
Tracing a 13x PyTorch Slowdown to a Hidden NumPy Synchronization

4 min read