DEV Community

# cuda

Posts

512MiB ≠ 512MB — the silent trtexec bug

Comments
2 min read
Getting ONNX Runtime CUDA Working on NVIDIA Blackwell (GX10/DGX Spark)

Comments 1
4 min read
Memory Coalescing: Same computation, 6x Performance Difference

Comments
6 min read
Setting Up NVIDIA Drivers and CUDA for ML/DL on Ubuntu 22.04

1
Comments
3 min read
CUDA Graphs: The 8-Year Overnight Success and the Observability Gap

Comments
9 min read
Achieving Neuro‑Sama‑Tier Speech‑to‑Text for Your Local AI Companion (Whisper + CUDA + LivinGrimoire)

Comments
5 min read
124x Slower: What PyTorch DataLoader Actually Does at the Kernel Level

1
Comments
5 min read
Tracing a 13x PyTorch Slowdown to a Hidden NumPy Synchronization

2
Comments
4 min read
Installing NVIDIA Drivers Without CUDA

2
Comments
7 min read
AMD ROCm on Consumer GPUs: The Open-Source CUDA Alternative That Actually Works Now [2026 Guide]

2
Comments
7 min read
I built the first open-source FP8 linear solver in Python — 2-3x faster than cuBLAS

2
Comments
3 min read
Implementing Pollard's Kangaroo Algorithm on CUDA

1
Comments
5 min read
Nvidia Open-Weight Models: Why the $26B Bet Matters

2
Comments
7 min read
AI Builds AI: How Anthropic’s Claude Codes Its Future

2
Comments
10 min read
From 2-Adic Geometry to Cunningham Chains: Visualization-Driven GPU Search

3
Comments
4 min read