Hi, it's me again! Over the past few days, I've been testing multiple ways to work with LLMs locally, and so far, Ollama has been the best tool (ignoring UI and other QoL aspects) for setting up a fast environment to test code and features.
I've tried GPT4ALL and other tools before, but they seem overly bloated when the goal is simply to set up a running model to connect with a LangChain API (on Windows with WSL).
Ollama provides an extremely straightforward experience. Because of this, today I decided to install and use it via Docker containers, and it turned out to be surprisingly easy and powerful.
With just five commands, we can set up the environment. Let's take a look.
Step 1 - Pull the latest Ollama Docker image
docker pull ollama/ollama
If you want to download an older version, you can specify the corresponding tag after the image name. By default, the :latest tag is downloaded. You can check the list of available Ollama tags on Docker Hub.
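For example, a minimal sketch of pinning a specific version instead of :latest (the 0.5.7 tag below is just an illustration; check Docker Hub for the tags that actually exist):

# Pull a pinned Ollama release instead of :latest (illustrative tag; verify it on Docker Hub)
docker pull ollama/ollama:0.5.7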
Step 2 - Create a Docker network
Since we'll typically use and connect multiple containers, we need to give them a shared communication channel. To achieve this, it's good practice to create a Docker network.
docker network create <network-name>
You can check a list of created Docker networks by running the following command:
docker network list
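As a concrete example (the name ollama-net below is just an illustration; use whatever name you prefer):

# Create a user-defined bridge network for the Ollama-related containers (example name)
docker network create ollama-net

# Inspect the network to see its settings and which containers are attached
docker network inspect ollama-net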
Step 3 - Run the Ollama container
In this tutorial, we're going to run Ollama on the CPU only. If you need to use a GPU, the official documentation provides a step-by-step guide.
The command to run the container is also listed in the documentation, but we need to specify which network it should connect to, so we must add the --network parameter.
docker run -d --network <network-name> -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
Here, -d runs the container in the background, -v persists downloaded models in the ollama volume, and -p 11434:11434 exposes the Ollama API on the host.
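Before moving on, a quick sanity check can confirm everything is up (assuming you run these from the same host where port 11434 was published):

# Confirm the Ollama container is running
docker ps --filter name=ollama

# The Ollama server listens on port 11434 and should reply with "Ollama is running"
curl http://localhost:11434/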
Step 4 - Run commands inside the Ollama container
To download Ollama models, we need to run the ollama pull command inside the container.
To do this, we simply execute the command below, which runs a command inside the container in interactive mode (the -it parameter).
Here, ollama pull downloads the llama3.2:latest (3B, quantized) model:
docker exec -it ollama ollama pull llama3.2
Visit the Ollama website to check the list of available models. Now, wait for the download to finish.
The pull command shows download progress for each layer and ends with a success message.
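If you prefer a smaller download first, or want to chat with the model right away, the same docker exec pattern works (llama3.2:1b refers to the smaller 1B variant listed on the Ollama website):

# Optionally pull the smaller 1B variant instead
docker exec -it ollama ollama pull llama3.2:1b

# Start an interactive chat session with the downloaded model (type /bye to exit)
docker exec -it ollama ollama run llama3.2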
Step 5 - Check the downloaded models
To list the locally available models, just run:
docker exec -it ollama ollama list
You should see the llama3.2:latest model in the output, along with its ID, size, and modification date.
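If you ever need to free disk space, a model can be removed the same way; a minimal sketch:

# Remove a model you no longer need (frees space inside the ollama volume)
docker exec -it ollama ollama rm llama3.2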
So you're done! Now you have Ollama running (using only the CPU), with the llama3.2:latest model available locally. To run it with a GPU, check the documentation link in Step 3.
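Since the end goal is to connect this to a LangChain API, a quick way to confirm the HTTP side works is to call the Ollama REST endpoint directly; this is the same local URL (http://localhost:11434) a LangChain integration would point at. A minimal sketch, using the llama3.2 model pulled above:

# Send a one-off, non-streaming generation request to the Ollama REST API
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Say hello in one short sentence.",
  "stream": false
}'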
I'll share more short notes on working with Ollama and LangChain in the next few days. Stay tuned!