# vllm

Posts

Running LLMs on Windows: Native vLLM vs WSL vs llama.cpp Compared · 4 min read
The 70B Threshold: How the RTX 5090 Rewrites the Home Lab Equation · 8 min read
11-Second Time to First Token on a Healthy vLLM Server · 1 reaction · 5 min read
Self-hosted Gemma 4 on TPU with vLLM, MCP, ADK, and Gemini CLI · 26 reactions · 16 min read
Performance Test: Ollama 0.5.0 vs. vLLM 0.4.0 Local LLM Inference Latency on NVIDIA RTX 5090 and AMD Radeon RX 8900 in 2026 · 14 min read
Why We Stopped Using vLLM 0.6 for Local LLMs in Favor of Ollama 0.5 for Code Tasks · 14 min read
Comparison: vLLM 0.6 vs. Text Generation Inference 1.4 for Serving Code LLMs · 16 min read
From one model to seven — what it took to make TurboQuant model-portable · 3 min read
Compressed VLM inference from a single Containerfile — turboquant-vllm v1.1 · 1 reaction · 2 min read
vLLM Request Lifecycle (Where TTFT is measured) · 2 min read
How to Run Gemma 4 Locally With Ollama, llama.cpp, and vLLM · 2 reactions · 1 comment · 9 min read
Gemma-SRE: Self-Hosted vLLM Infrastructure Agent · 1 reaction · 18 min read
vLLM On-Demand Gateway: Zero-VRAM Standby for Local LLMs on Consumer GPUs · 2 reactions · 1 comment · 4 min read
I Pushed Local LLMs Harder. Here's What Two Models Actually Did. · 1 reaction · 8 min read
Stop Playing Around with Local LLMs: Take Open Source Generative AI to Production on Magalu Cloud · 2 reactions · 3 comments · 22 min read