DEV Community
#llamacpp
Posts
Qwen 3.6 27B and 35B MTP vs Standard on 16GB GPU
Rost
May 24
#selfhosting #llm #ai #llamacpp
8 min read
Ollama vs llama.cpp vs vLLM: Which Should You Use in 2026?
Thurmon Demich
May 20
#ollama #llamacpp #vllm #comparison
5 min read
GGUF Quantization Explained: Q4_K_M vs Q5_K_M vs Q8 — Which to Pick (2026)
Patrick Hughes
May 13
#llamacpp #gguf #quantization #localai
4 min read
Self-Hosted AI Agent Systems: Why Local Inference Matters More Than You Think
Aurora
May 13
#rust #ai #llamacpp #selfhosted
4 min read
Fixing Qwen 3.6 4090 llama.cpp Bug: 18 tok/s on My RTX 4090
Umair Bilal
Apr 26
#llm #llamacpp #rtx4090 #qwen
8 min read
First Words: LLM Inference on RISC-V
Bruno Verachten
Apr 22
#bananapi #benchmark #inference #llamacpp
9 min read
Running a 70B LLM on Pure RISC-V: The MilkV Pioneer Deployment Journey
Bruno Verachten
Apr 22
#cpuinference #deepseekr1 #llamacpp #llm
17 min read
llama.cpp Settings Can Change 8GB Performance by 5x: Finding the Optimal Values for the Main Options
plasmon
Apr 14
#llm #llamacpp #gpu
4 min read
Parameter Count Is the Worst Way to Pick a Model on 8GB VRAM
plasmon
Apr 2
#llm #locallm #gpu #llamacpp
5 min read
How to Run Gemma 4 Locally With Ollama, llama.cpp, and vLLM
Maksim Danilchenko
Apr 11
#gemma4 #ollama #llamacpp #vllm
2 reactions, 1 comment
9 min read