DEV Community

# gpu

Posts

From Kernel Scheduler to Python Source Line: Tracing a GPU Stall End to End

6 min read
Tracing torch.cuda.empty_cache() on an RTX 4090 - Where Do the 53 MB Go?

5 min read
5090 vs 4090 for AI Workloads: Buy, Rent, or Validate in the Cloud?

15 min read
SemiAnalysis Interviews Makora Co-Founder on Automated GPU Optimization and the AI Inference Frontier

1 min read
CUDA 13.3 Lands, AI Writes Blackwell Kernels, & FP4 VRAM Optimization for LLMs

3 min read
FlashAttention CUDA Kernel, Strix Halo MOE Boost, & NVIDIA DLSS 4.5 Driver Update

3 min read
PatentLLM: CUDA TileLang/Triton B200 5x Speedup, RTX 5090 Power, PTX Grammar

3 min read
How to Detect GPU Waste in a Kubernetes Cluster

5 min read
Why Your PyTorch Training Crawls on a Beefy GPU (And How to Fix It)

5 min read
RTX 5080 Undervolt Benchmarks, CGO-Free CUDA API Binding, & AMD GPU Compatibility Fix

3 min read
Intel Arc & Arm Mali: New GPUs, Drivers & Benchmarks for Linux

3 min read
AMD GPU/AI Launches, Legacy Driver Update & CUDA Optimization Platform

3 min read
Running LTX-2.3 Alongside TTS on a Single 96GB GPU with a Cold-Start Architecture

5 min read
RTX 5090 Cooling, BeeLlama VRAM Opts, Resizable BAR Performance Gains

4 min read
Five Years Later, I Finally Have 96GB VRAM — What It Actually Unlocks for Agent Loops

8 min read