Kristiyan Velkov

Run AI Models Locally: Docker Desktop’s New AI Model Runner

I Got Early Access to Docker Desktop’s New AI Model Runner — Here’s What You Need to Know


Docker has just taken a major step forward with its newest addition to Docker Desktop: the AI Model Runner. As a Docker Captain, I had early access to this feature and spent the last few days testing it in real-world scenarios.

And let me tell you — this is a game-changer. 🚀


What Is the AI Model Runner?

The Docker Model Runner is a new experimental feature in Docker Desktop 4.40+ that gives you a Docker-native experience for running large language models (LLMs) locally.
It’s available now on macOS with Apple Silicon (M1/M2/M3), with Windows support on NVIDIA GPUs expected at the end of April 2025.

This is not another containerized runtime. Docker runs the inference engine (like llama.cpp) directly on your host with GPU access, so you get:

  • Direct GPU acceleration
  • No network latency
  • Full control and privacy

Models are pulled as OCI artifacts from Docker Hub and dynamically loaded into memory, not packed into container images.

This means:

  • Lower disk usage
  • Faster load times
  • Cleaner dev experience

It’s fast, simple, and will be integrated directly into the Docker Desktop UI.
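To see the artifact-based distribution in practice, you can pull a model and inspect it. Here is a minimal sketch, assuming Docker Desktop 4.40+ with Model Runner enabled; the tag below is just one example from the ai/ namespace:

```shell
# Hedged sketch: verify a model is stored as an OCI artifact, not an image.
# Assumes Docker Desktop 4.40+ with the Model Runner feature enabled.
TAG="ai/llama3.2:1B-Q8_0"

if command -v docker >/dev/null 2>&1; then
  docker model pull "$TAG"      # downloads the OCI artifact from Docker Hub
  docker model inspect "$TAG"   # shows model metadata
  docker model list             # the model is listed here...
  docker image ls               # ...but does not appear among your images
else
  echo "requires Docker Desktop 4.40+ with Model Runner"
fi
```

Because the model never becomes an image layer, removing it with `docker model rm` reclaims the disk space immediately.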

My First Impressions

As someone who frequently works on front-end applications and prototypes AI-powered features, I was thrilled to try this.

Within minutes, I was able to:

  • Launch an AI model locally with a single command
  • Run inference right on my machine
  • Avoid latency, quota limits, and cloud API complexity
  • Stay 100% private and offline when needed

It feels like the local-first development experience we’ve all been waiting for.

Exploring and Managing Models

For CLI lovers like me, here are the essential commands available:

🧠 Docker Model Runner CLI Commands:

- docker model list        # List available models
- docker model inspect     # View detailed info about a model
- docker model pull        # Download a model to your machine
- docker model run         # Run the model locally
- docker model rm          # Remove a downloaded model
- docker model status      # Check if Model Runner is active
- docker model version     # Show version of the Model Runner

This gives you full control from the terminal — just like any Docker-native tool. You can browse, download, run, and manage models with ease.
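These commands compose into a simple end-to-end workflow. Here is a minimal sketch, assuming Docker Desktop 4.40+ with Model Runner enabled; the model tag and prompt are my own example choices:

```shell
# Hedged sketch of a full local-model lifecycle with the Model Runner CLI.
# Assumes Docker Desktop 4.40+ with the feature enabled.
MODEL="ai/llama3.2:1B-Q8_0"

if command -v docker >/dev/null 2>&1; then
  docker model status             # confirm the Model Runner is active
  docker model pull "$MODEL"      # fetch the model to your machine
  docker model run "$MODEL" "Summarize what Docker Model Runner does."
  docker model rm "$MODEL"        # reclaim disk space when done
else
  echo "docker CLI not found; install Docker Desktop 4.40+ first"
fi
```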


📦 Available Models

All current models are hosted under the ai namespace on Docker Hub: https://hub.docker.com/u/ai

- ai/gemma3
- ai/llama3.2
- ai/qwq
- ai/mistral-nemo
- ai/mistral
- ai/phi4
- ai/qwen2.5
- ai/deepseek-r1-distill-llama (a Llama model distilled from DeepSeek-R1 outputs, not the original RL-trained DeepSeek-R1)

Example usage:

docker model pull ai/llama3.2:1B-Q8_0
docker model run ai/llama3.2:1B-Q8_0 "What is Docker?"

Expected output:

Docker is an open-source platform that allows you to automate the deployment, scaling, and management of applications using containerization. It helps developers package applications with all the parts they need.
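Because the runner is a plain CLI, its output composes with ordinary shell scripting. Here is a minimal sketch that captures the answer into a variable; the prompt and tag are my own example choices, and I assume the model has already been pulled:

```shell
# Hedged sketch: capture a one-shot completion for use in a script.
# Assumes the ai/llama3.2:1B-Q8_0 tag has already been pulled.
MODEL="ai/llama3.2:1B-Q8_0"
PROMPT="Explain containerization in one sentence."

if command -v docker >/dev/null 2>&1; then
  ANSWER=$(docker model run "$MODEL" "$PROMPT")
else
  ANSWER="(docker CLI not available on this machine)"
fi

printf '%s\n' "$ANSWER"
```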

Why This Matters

  • Privacy & Security — Sensitive data stays local. That’s critical for enterprise apps and internal tools.
  • Speed — Real-time feedback without internet dependency means faster dev loops.
  • Developer Experience — It fits naturally into the Docker ecosystem: no extra tools, no surprises.

This addition will lower the entry barrier for developers looking to build AI features. Whether you’re experimenting with open-source LLMs or integrating existing ones into your product, Docker Desktop just made your life a lot easier.


Final Thoughts

Docker continues to evolve beyond containers into something broader — a full developer platform. The AI Model Runner is proof that they’re paying attention to how we actually build today.

If you’ve ever wished you could test and run AI models with the same simplicity as docker run, this is it.

I’ll be sharing more insights and use cases as I keep working with the feature.

For now, if you’re curious — try it out, experiment, and let me know what you build.



© 2025 Kristiyan Velkov. All rights reserved.
