DEV Community
#inference
Posts
The KV cache, why LLM inference is memory-bound, not compute-bound
I Want To Learn Programming
Jul 4
#gpu #llm #inference #performance
4 min read
Etched hits $5B and $1B in orders: why inference chips matter
Induwara Ashinsana
Jul 1
#aihardware #inference #cost
4 min read
Two labs race to make AI write whole paragraphs at once instead of word by word
Breach Protocol
Jul 1
#diffusion #openweight #google #inference
3 min read
KV Cache Is Eating Your VRAM — Here's How to Estimate It Before You Run Out
zxpmail
Jun 28
#llm #inference #engineering #ai
6 min read
I Benchmarked Speculative Decoding — a = 3.5 Wasn't Enough
zxpmail
Jun 28
#llm #inference #engineering #ai
7 min read
96% of cuBLAS, no `unsafe`: what cuTile Rust proves
Creeta
Jun 26
#cutile #rust #gpu #inference
8 min read
Extract Structured JSON from Messy Text with Telnyx AI Inference
Sonam
Jun 26
#ai #inference #telnyx #json
2 min read
Running LLMs on an iGPU: VRAM Limits of Intel Arc and Radeon 780M
Review Laptop
Jun 21
#llama3 #llm #ollama #inference
3 min read
Lossless, But Not Free: The Lossless, But Not Free — When Speculative Decoding Actually Pays Off (and When It Doesn't)
zxpmail
Jun 28
#ai #llm #inference #engineering
2 reactions, 4 comments
6 min read
How to Build a Secure Homelab for LLM Inference
Jay Grider
Jun 12
#homelab #llmsecurity #inference #supplychain
4 min read
Google's DiffusionGemma Generates Text Sideways
Peremptory
Jun 11
#modelrelease #architecture #opensource #inference
3 min read
Sipp: a local-first runtime for Hybrid AI Applications
Constant, Yuan Chen
Jun 24
#inference #ai #localai #llm
11 reactions, 2 comments
11 min read
Speculative decoding: when and why it actually speeds up inference
Tech_Nuggets
Jun 5
#llm #ai #inference #performance
1 reaction
9 min read
Can You Tell When an LLM API Swaps in a Cheaper Model?
Rob
Jun 16
#localai #llm #inference #verification
1 reaction, 3 comments
3 min read
ReFlect: Training-Free Error Recovery for Long-Horizon LLM Reasoning
Jangwook Kim
May 11
#llmreasoning #agents #inference #arxiv2026
4 min read