DEV Community

# inference

Posts

BeeLlama v0.2.0: 164 tok/s on a 27B model, one RTX 3090

Comments
3 min read
RAM Coffers: NUMA-Aware LLM Inference — Why Hardware Topology Still Matters

Comments
2 min read
Your AI speed benchmark is measuring the one workload you don't run

Comments
3 min read
Async Batching Is the Real Latency Win Nobody's Talking About

1
Comments
3 min read
ReFlect: Training-Free Error Recovery for Long-Horizon LLM Reasoning

Comments
4 min read
Why Most Browser AI Demos Fail on Real Hardware

Comments
4 min read
The Inference Inversion

Comments
7 min read
First Confirmed Directional Move on the AI Inference Frontier Index in 2026

Comments
4 min read
Tutorial: This AI Now Tells You if a Meeting Could Be an Email

3
Comments
8 min read
Tutorial: Build a Cost-Aware AI Support Triage API

3
Comments 1
13 min read
Muse Spark beats Llama 4 with 10x less compute. Here's how.

Comments
7 min read
First Words: LLM Inference on RISC-V

Comments
9 min read
Gaussian Process Regression: The Bayesian Approach to Curve Fitting

Comments
13 min read
Google Dropped TurboQuant Two Weeks Ago. The Community Already Made It Usable.

1
Comments
6 min read
Hierarchical Bayesian Regression with PyMC: When Groups Share Strength

1
Comments
13 min read