#vllm
Running LLMs on Windows: Native vLLM vs WSL vs llama.cpp Compared
Alan West · May 3 · 4 min read
#llm #vllm #windows #machinelearning
The 70B Threshold: How the RTX 5090 Rewrites the Home Lab Equation
Matthew Gladding · Apr 24 · 8 min read
#model #memory #models #vllm
11-Second Time to First Token on a Healthy vLLM Server
Ingero Team · Apr 21 · 5 min read · 1 reaction
#vllm #observability #ebpf #mcp
Self-hosted Gemma 4 on TPU with vLLM, MCP, ADK, and Gemini CLI
xbill (for Google Developer Experts) · Apr 28 · 16 min read · 26 reactions
#vllm #googleadk #tpu #gemini
Performance Test: Ollama 0.5.0 vs. vLLM 0.4.0 Local LLM Inference Latency on NVIDIA RTX 5090 and AMD Radeon RX 8900 in 2026
ANKUSH CHOUDHARY JOHAL · Apr 29 · 14 min read
#performance #test #ollama #vllm
Why We Stopped Using vLLM 0.6 for Local LLMs in Favor of Ollama 0.5 for Code Tasks
ANKUSH CHOUDHARY JOHAL · Apr 29 · 14 min read
#stopped #using #vllm #local
Comparison: vLLM 0.6 vs. Text Generation Inference 1.4 for Serving Code LLMs
ANKUSH CHOUDHARY JOHAL · Apr 29 · 16 min read
#comparison #vllm #text #generation
From one model to seven — what it took to make TurboQuant model-portable
Alberto Nieto · Apr 1 · 3 min read
#python #vllm #gpu #triton
Compressed VLM inference from a single Containerfile — turboquant-vllm v1.1
Alberto Nieto · Mar 28 · 2 min read · 1 reaction
#python #vllm #gpu #containers
vLLM Request Lifecycle (Where TTFT is measured)
iapilgrim · Mar 11 · 2 min read
#vllm #monitoring
How to Run Gemma 4 Locally With Ollama, llama.cpp, and vLLM
Maksim Danilchenko · Apr 11 · 9 min read · 2 reactions · 1 comment
#gemma4 #ollama #llamacpp #vllm
Gemma-SRE: Self-Hosted vLLM Infrastructure Agent
xbill (for Google Developer Experts) · Mar 27 · 18 min read · 1 reaction
#gemma #mcpserver #tpusprint #vllm
vLLM On-Demand Gateway: Zero-VRAM Standby for Local LLMs on Consumer GPUs
soy · Mar 26 · 4 min read · 2 reactions · 1 comment
#vllm #llm #gpu #python
I Pushed Local LLMs Harder. Here's What Two Models Actually Did.
Donald Cruver · Mar 2 · 8 min read · 1 reaction
#claudecode #vllm #selfhosted #amd
Stop Playing Around with Local LLMs: Take Open Source GenAI to Production on Magalu Cloud
Gláucio (for Magalu Cloud) · Feb 5 · 22 min read · 2 reactions · 3 comments
#ai #llm #vllm #docker