Build your own local-first GenAI stack with Docker, LangChain, and no GPU.
Why Local LLMs Matter
The rise of large language models (LLMs) has revolutionized how we build applications. But deploying them locally? That’s still a pain for most developers. Between model formats, dependency hell, hardware constraints, and weird CLI tools, running even a small LLM on your laptop can feel like navigating a minefield.
Docker Model Runner changes that. It brings the power of container-native development to local AI workflows so you can focus on building, not battling toolchains.
The Developer Pain Points:
- Too many formats: GGUF, PyTorch, ONNX, TensorFlow...
- Dependency issues and messy build scripts
- Need for GPUs or arcane CUDA configs
- No consistent local APIs for experimentation
Docker Model Runner solves these by:
- Standardizing model access via Docker images
- Running fast with llama.cpp under the hood
- Providing OpenAI-compatible APIs out of the box
- Integrating directly with Docker Desktop
🐳 What Is Docker Model Runner?
It’s a lightweight local model runtime integrated with Docker Desktop. It lets you run quantized models (GGUF format) locally through a familiar CLI and an OpenAI-compatible API. It’s powered by llama.cpp and designed to be:
- Developer-friendly: Pull and run models in seconds
- Offline-first: Perfect for privacy and edge use cases
- Composable: Works with LangChain, LlamaIndex, etc.
Key Features:
- OpenAI-style API on localhost:11434
- GPU-free: works even on MacBooks with Apple Silicon
- Easily swap between models with CLI
- Integrated with Docker Desktop
Getting Started in 5 Minutes
1. Enable Model Runner (Docker Desktop)
docker desktop enable model-runner
2. Pull Your First Model
docker model pull ai/smollm2:360M-Q4_K_M
3. Run a Model with a Prompt
docker model run ai/smollm2:360M-Q4_K_M "Explain the Doppler effect like I’m five."
4. Use the API (OpenAI-compatible)
curl http://localhost:11434/v1/completions \
-H "Content-Type: application/json" \
-d '{"model": "smollm2", "prompt": "Hello, who are you?", "max_tokens": 100}'
⚙️ Building Your Local GenAI Stack
Here's a simple architecture using Docker Model Runner as your inference backend:
- LangChain: For prompt templating and chaining (see the sketch right after this list)
- Docker Model Runner: Runs the actual LLMs locally
- LlamaIndex: For document indexing and retrieval (RAG)
- React Frontend: Clean chat UI to interface with the model
- Docker Compose: One command to run them all
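Here’s roughly what the LangChain layer might look like when pointed at the Model Runner endpoint. A minimal sketch, assuming the langchain-openai package and the same localhost:11434 endpoint and smollm2 model used earlier:

# Sketch: LangChain prompt templating + chaining against the local runner.
# Assumes the langchain-openai package and the endpoint/model names from this article.
from langchain_openai import OpenAI
from langchain_core.prompts import PromptTemplate

llm = OpenAI(
    base_url="http://localhost:11434/v1",  # Docker Model Runner's OpenAI-compatible API
    api_key="not-needed",
    model="smollm2",
    max_tokens=150,
)

prompt = PromptTemplate.from_template(
    "Explain {topic} to a developer in two sentences."
)

chain = prompt | llm  # LCEL: the template feeds straight into the local LLM
print(chain.invoke({"topic": "the Doppler effect"}))

Swapping in a different local model is then just a docker model pull plus a one-line change to the model name.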
Example docker-compose.yml (coming in the GitHub repo):
services:
  model-runner:
    image: ai/smollm2:360M-Q4_K_M
    ports:
      - "11434:11434"
  frontend:
    build: ./frontend
    ports:
      - "3000:3000"
    environment:
      - API_URL=http://localhost:11434
Features:
- Works offline
- Model hot-swapping via env vars (see the sketch below)
- Fully containerized
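To make the hot-swapping bullet concrete, a backend service can read the model name and API URL from environment variables instead of hard-coding them. A minimal sketch, assuming Python's requests library and the API_URL variable from the Compose file above; MODEL_NAME is a hypothetical extra variable:

# Sketch: a backend reads the model and API URL from env vars, so Compose can swap them.
# Assumes the API_URL variable from the Compose example; MODEL_NAME is a hypothetical addition.
import os
import requests

API_URL = os.environ.get("API_URL", "http://localhost:11434")
MODEL_NAME = os.environ.get("MODEL_NAME", "smollm2")

resp = requests.post(
    f"{API_URL}/v1/completions",
    json={"model": MODEL_NAME, "prompt": "What is Docker?", "max_tokens": 100},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])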
💡 Bonus: Add a Frontend Chat UI
Use any frontend framework (React/Next.js/Vue) to build a chat interface that talks to your local model via REST API.
A simple fetch example:
fetch("http://localhost:11434/v1/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ model: "smollm2", prompt: "What is Docker?", max_tokens: 100 })
})
  .then((res) => res.json())
  // The OpenAI-style response returns the generated text in choices[0].text
  .then((data) => console.log(data.choices[0].text));
This gives you a complete, local-first LLM experience without GPUs or cloud APIs.
🚀 Advanced Use Cases
- RAG pipelines: Combine PDFs + local vector search + Model Runner (sketched after this list)
- Multiple models: Run phi2, mistral, and more in separate services
- Model comparison: Build A/B testing interfaces using Compose
- Whisper.cpp integration: Speech-to-text container add-ons (coming soon)
- Edge AI setups: Deploy on airgapped systems or dev boards
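To ground the RAG bullet above, here's one way the pieces could fit together: LangChain document loaders plus a local FAISS index for retrieval, with Docker Model Runner as the generator. A hedged sketch; the package choices (pypdf, faiss-cpu, sentence-transformers, langchain-huggingface) and the file path are illustrative assumptions, not part of this article's repo:

# Hedged RAG sketch: local PDF -> chunks -> local embeddings -> FAISS -> local LLM.
# Package choices (pypdf, faiss-cpu, sentence-transformers) are assumptions for illustration.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAI

# 1. Load and chunk a local document (path is a placeholder).
docs = PyPDFLoader("docs/handbook.pdf").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_documents(docs)

# 2. Build a local vector index with a small CPU-friendly embedding model.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
index = FAISS.from_documents(chunks, embeddings)

# 3. Answer questions with retrieved context, generated by the local runner.
llm = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed", model="smollm2")
question = "What does the handbook say about onboarding?"
context = "\n\n".join(d.page_content for d in index.similarity_search(question, k=3))
print(llm.invoke(f"Answer using only this context:\n{context}\n\nQuestion: {question}"))

Everything here stays on the local machine, so the pipeline keeps the offline-first, no-GPU character of the rest of the stack.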
The Vision: Where This Is Headed
Docker Model Runner could evolve into a full ecosystem:
- ModelHub: Searchable, taggable model registry
- Compose-native GenAI templates
- Whisper + LLM hybrid runners
- Dashboard for monitoring model performance
- VS Code extensions for prompt engineering and testing
As a developer, I see this as a huge opportunity to lower the barrier for AI experimentation and help bring container-native AI to everyone.
Top comments (5)
Helpful
Thank you, super helpful 🙂
Thanks! Really happy it helped.
So, Docker Model Runner is just an Ollama alternative, with the disadvantage that you cannot "dockerize" Docker Model Runner. So, NO isolation!
Still, Docker Desktop (for isolation) with Ollama (for the API, models, etc.) and WebUI (for UI) is a better option atm.
Thanks for your comment!
While Docker Model Runner (DMR) isn’t a direct replacement for Ollama, it serves a different purpose. DMR is designed to be used within Docker-native workflows, where it allows each model to run in its own container, giving you model-level isolation. So, even though the runner itself isn’t Dockerized (yet), you still get the isolation benefits of Docker at the model level, which can be useful when you’re integrating with other services like APIs, databases, or custom UIs.
Ollama is great for quick, simple setups with built-in APIs and UIs, but DMR is meant for developers who need more control over their infrastructure and want to integrate LLMs into larger, composable systems.
That said, if your focus is on simplicity and quick isolation, combining Docker Desktop with Ollama and WebUI is definitely a solid option right now. Both tools have their place depending on what you're building!