DEV Community

# llamacpp

Posts

Qwen 3.6 27B and 35B MTP vs Standard on 16GB GPU

Comments
8 min read
Ollama vs llama.cpp vs vLLM: Which Should You Use in 2026?

Comments
5 min read
GGUF Quantization Explained: Q4_K_M vs Q5_K_M vs Q8 — Which to Pick (2026)

Comments
4 min read
Self-Hosted AI Agent Systems: Why Local Inference Matters More Than You Think

Comments
4 min read
Fixing Qwen 3.6 4090 llama.cpp Bug: 18 tok/s on My RTX 4090

Comments
8 min read
First Words: LLM Inference on RISC-V

Comments
9 min read
Running a 70B LLM on Pure RISC-V: The MilkV Pioneer Deployment Journey

Comments
17 min read
llama.cpp Settings Change 8GB Performance by 5x: Finding the Optimal Values for the Main Options

Comments
4 min read
Parameter Count Is the Worst Way to Pick a Model on 8GB VRAM

Comments
5 min read
How to Run Gemma 4 Locally With Ollama, llama.cpp, and vLLM

Comments 1
9 min read