Ingero Team

Ingero Team

Jun 12

nvidia-smi Reports 97% Utilization While the GPU Sits Idle

#gpu #ebpf #observability #mlops

6 min read

Ingero Team

Jun 10

Generation-Side Tooling Outpaces Validation-Side Tooling

#ai #machinelearning #gpu #programming

3 min read

Ingero Team

Jun 8

Wave-Level GPU Introspection Was Already in Production (Server Side)

#programming #gpu #performance #monitoring

5 min read

Ingero Team

Jun 5

GPU Incident at 3am: eBPF Tracing from Page to Root Cause in 60 Seconds

#gpu #ebpf #observability #sre

6 min read

Ingero Team

Jun 4

GPU Tracing With cgroup Awareness: Per-Tenant Investigation on Shared Hosts

#kubernetes #devops #observability #gpu

3 min read

Ingero Team

Jun 1

Auto-Generated CUDA Kernels Need Kernel-Level Validation

#ai #machinelearning #gpu #performance

5 min read

Ingero Team

May 29

From Kernel Scheduler to Python Source Line: Tracing a GPU Stall End to End

#ebpf #gpu #python #observability

6 min read

Ingero Team

May 28

Tracing torch.cuda.empty_cache() on an RTX 4090 - Where Do the 53 MB Go?

#gpu #cuda #pytorch #debugging

5 min read

Ingero Team

May 27

AllReduce Stalls Are Network Stalls. Most Tools See Neither.

#machinelearning #devops #performance #networking

4 min read

Ingero Team

May 26

TCP Retransmits Are Not a Fabric Signal on InfiniBand

#ebpf #gpu #rdma #infiniband

4 min read

Ingero Team

May 25

What GitHub Uses eBPF For (and the Layer They Have Not Ported Yet)

#devops #linux #opensource #observability

5 min read

Ingero Team

May 20

GPU Observability for Workloads That Cannot Phone Home

#security #devops #linux #opensource

3 min read

Ingero Team

May 18

One Kernel, Zero Sidecars: Tracing AI Workloads Without an Agent on Every Host

#linux #devops #observability #monitoring

7 min read

Ingero Team

May 15

Same eBPF, Different Vendor: Tracing libhip Calls on AMD ROCm

#linux #programming #gpu #performance

3 min read

Ingero Team

May 14

From TCP Retransmits to MCP-Driven Cluster Investigations: An eBPF GPU Agent Retrospective

#ebpf #gpu #observability #mcp

8 min read

Ingero Team

May 13

What Inference-Platform Benchmark Posts Leave Out

#machinelearning #ai #gpu #performance

8 min read

Ingero Team

May 11

MCP Shows What the Agent Did. eBPF Shows Why the GPU Stalled.

#ai #machinelearning #monitoring #gpu

7 min read

Ingero Team

May 7

MCP Tools Are New API Surfaces. eBPF Sees What They Actually Touch.

#ai #machinelearning #monitoring #devops

4 min read

Ingero Team

May 6

A Cluster Stall Looks Healthy on Every Host. The Cause Is in the Pattern Across Hosts.

#machinelearning #devops #gpu #observability

9 min read

Ingero Team

May 4

GPU Utilization Is a Counter, Not a Cause

#gpuobservability #ebpf #gpu #observability

8 min read

Ingero Team

May 4

CUDA Out of Memory at 60% Utilization: Tracing PyTorch GPU Memory Fragmentation

#gpu #cuda #pytorch #debugging

4 min read

Ingero Team

Apr 23

Production GPU Training is 34% Slower. Show Me Why

#gpu #ebpf #observability #mlops

6 min read

Ingero Team

Apr 21

11-Second Time to First Token on a Healthy vLLM Server

#vllm #observability #ebpf #mcp

5 min read

Ingero Team

Apr 21

Agent + MCP + eBPF: 10,869 CUDA Kernel Events, Now Queryable

#gpu #ebpf #mcp #observability

5 min read

Ingero Team

Apr 16

What Happens When an AI Agent Gets Kernel-Level GPU Traces

#gpu #ebpf #observability #gpuobservability

5 min read

Ingero Team

Apr 16

MCP as Observability Interface: Connecting AI Agents to Kernel Tracepoints

#ebpf #mcp #gpuobservability

5 min read

Ingero Team

Apr 13

One Query, Four GPUs: Tracing a Distributed Training Stall Across Nodes

#gpu #ebpf #distributedcomputing

7 min read

Ingero Team

Apr 8

CUDA Graphs: The 8-Year Overnight Success and the Observability Gap

#cuda #gpu #ebpf #ai

9 min read

Ingero Team

Apr 1

124x Slower: What PyTorch DataLoader Actually Does at the Kernel Level

#pytorch #gpu #python #cuda

5 min read

Ingero Team

Mar 31

Tracing a 13x PyTorch Slowdown to a Hidden NumPy Synchronization

#pytorch #cuda #python #gpu

4 min read

DEV Community
