# vllm

Posts

Running LLMs on Windows: Native vLLM vs WSL vs llama.cpp Compared · 4 min read
The 70B Threshold: How the RTX 5090 Rewrites the Home Lab Equation · 8 min read
11-Second Time to First Token on a Healthy vLLM Server · 1 reaction · 5 min read
Self-hosted Gemma 4 on TPU with vLLM, MCP, ADK, and Gemini CLI · 26 reactions · 16 min read
Performance Test: Ollama 0.5.0 vs. vLLM 0.4.0 Local LLM Inference Latency on NVIDIA RTX 5090 and AMD Radeon RX 8900 in 2026 · 14 min read
Why We Stopped Using vLLM 0.6 for Local LLMs in Favor of Ollama 0.5 for Code Tasks · 14 min read
Comparison: vLLM 0.6 vs. Text Generation Inference 1.4 for Serving Code LLMs · 16 min read
From one model to seven — what it took to make TurboQuant model-portable · 3 min read
Compressed VLM inference from a single Containerfile — turboquant-vllm v1.1 · 1 reaction · 2 min read
vLLM Request Lifecycle (Where TTFT is measured) · 2 min read
How to Run Gemma 4 Locally With Ollama, llama.cpp, and vLLM · 2 reactions · 1 comment · 9 min read
Gemma-SRE: Self-Hosted vLLM Infrastructure Agent · 1 reaction · 18 min read
vLLM On-Demand Gateway: Zero-VRAM Standby for Local LLMs on Consumer GPUs · 2 reactions · 1 comment · 4 min read
I Pushed Local LLMs Harder. Here's What Two Models Actually Did. · 1 reaction · 8 min read
Stop Playing Around with Local LLMs: Take Open Source Generative AI to Production on Magalu Cloud · 2 reactions · 3 comments · 22 min read