Chandler for TimeSurge Labs

Posted on Apr 21 • Edited on May 16

How to Run Llama 3 Locally with Ollama and Open WebUI

#tutorial #ai #productivity #api

I’m a big fan of Llama. Meta releasing their LLM open source is a net benefit for the tech community at large, and their permissive license allows most medium and small businesses to use their LLMs with little to no restrictions (within the bounds of the law, of course). Their latest release is Llama 3, which has been highly anticipated.

Llama 3 comes in two sizes: 8 billion and 70 billion parameters. This kind of model is trained on a massive amount of text data and can be used for a variety of tasks, including generating text, translating languages, writing different kinds of creative content, and answering your questions in an informative way. Meta touts Llama 3 as one of the best open models available, but it is still under development. Here are the 8B model's benchmarks compared to Mistral and Gemma (according to Meta).

This raises the question: how can I, a regular individual, run these models locally on my computer?

Getting Started with Ollama

That’s where Ollama comes in! Ollama is a free and open-source application that allows you to run various large language models, including Llama 3, on your own computer, even with limited resources. Ollama takes advantage of the performance gains of llama.cpp, an open source library designed to allow you to run LLMs locally with relatively low hardware requirements. It also includes a sort of package manager, allowing you to download and use LLMs quickly and effectively with just a single command.

The first step is installing Ollama. It supports all three major OSes, with Windows support in “preview” (a nicer word for beta).

Once this is installed, open up your terminal. On all platforms, the command is the same.



ollama run llama3

Wait a few minutes while it downloads and loads the model, and then start chatting! It should bring you to a chat prompt similar to this one.



ollama run llama3
>>> Who was the second president of the united states?
The second President of the United States was John Adams. He served from 1797 to 1801, succeeding
George Washington and being succeeded by Thomas Jefferson.

>>> Who was the 30th?
The 30th President of the United States was Calvin Coolidge! He served from August 2, 1923, to March 4,
1929.

>>> /bye

You can chat all day within this terminal chat, but what if you want something more ChatGPT-like?

Open WebUI

Open WebUI is an extensible, self-hosted UI that runs entirely inside Docker. It can be used either with Ollama or with other OpenAI-compatible LLMs, like LiteLLM or my own OpenAI API for Cloudflare Workers.

Assuming you already have Docker and Ollama running on your computer, installation is super simple.



docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main

Then simply go to http://localhost:3000, make an account, and start chatting away!

If you didn’t run Llama 3 earlier, you’ll have to pull some models down before you can start chatting. The easiest way to do this is to click your name in the bottom left, then click the settings icon.

Then click “models” on the left side of the modal and paste in the name of a model from the Ollama registry. Here are some models that I’ve used and recommend for general purposes.

llama3
mistral
llama2

Ollama API

If you want to integrate Ollama into your own projects, Ollama offers both its own API as well as an OpenAI Compatible API. The APIs automatically load a locally held LLM into memory, run the inference, then unload after a certain timeout. You do have to pull whatever models you want to use before you can run the model via the API, which can easily be done via the command line.



ollama pull mistral
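
If you'd rather check from code which models are already pulled, here is a minimal Python sketch. It assumes Ollama is listening on its default address (http://localhost:11434) and uses the /api/tags endpoint, which lists local models. The helper names (model_names, is_pulled, list_local_models) are made up for illustration and are not part of any SDK.

```python
import json
import urllib.request


def model_names(tags_body: dict) -> list[str]:
    """Extract model names (e.g. "llama3:latest") from an /api/tags response body."""
    return [m["name"] for m in tags_body.get("models", [])]


def is_pulled(wanted: str, names: list[str]) -> bool:
    """Check whether `wanted` is in the local list; a bare name is treated as ":latest"."""
    target = wanted if ":" in wanted else f"{wanted}:latest"
    return target in names


def list_local_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Ask a running Ollama instance which models it has locally.

    Assumption: Ollama is running on its default port; adjust base_url if not.
    """
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return model_names(json.load(resp))
```

With Ollama running, `is_pulled("mistral", list_local_models())` tells you whether you still need to run the pull command first.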

Ollama API

Ollama has its own API, along with a couple of SDKs for JavaScript and Python.

Here is how you can do a simple text generation inference with the API.



curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt":"Why is the sky blue?"
}'
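
By default, the curl call above streams its answer back as a series of JSON objects. If you want the whole answer in one piece from Python, something like the following sketch should work with only the standard library. It assumes Ollama is on its default port and sets "stream" to false so the server replies with a single JSON object whose "response" field holds the text. The function names are illustrative, not part of the Ollama SDK.

```python
import json
import urllib.request

# Default Ollama address, same as in the curl example.
OLLAMA_URL = "http://localhost:11434"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for /api/generate with streaming turned off."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send the prompt to a running Ollama instance and return the generated text."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        body = json.load(resp)
    return body["response"]
```

With the model pulled and Ollama running, `print(generate("mistral", "Why is the sky blue?"))` should print the full answer once generation finishes.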

And here’s how you can do a Chat generation inference with the API.



curl http://localhost:11434/api/chat -d '{
  "model": "mistral",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ]
}'
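
The chat endpoint also streams by default: the reply arrives as one JSON object per line, each carrying a fragment of the assistant message and a "done" flag. Here is a rough Python sketch of stitching those fragments back together. The function names are my own for illustration, and it assumes Ollama is running on its default port.

```python
import json
import urllib.request
from typing import Iterable


def collect_chat_stream(lines: Iterable[bytes]) -> str:
    """Join the content fragments of a streamed /api/chat reply into one string."""
    parts = []
    for raw in lines:
        if not raw.strip():
            continue  # ignore empty lines
        chunk = json.loads(raw)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break  # the final object is marked with "done": true
    return "".join(parts)


def chat(model: str, messages: list[dict], base_url: str = "http://localhost:11434") -> str:
    """POST to /api/chat and return the full assistant reply (needs Ollama running)."""
    req = urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps({"model": model, "messages": messages}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # Iterating over the response yields one line (one JSON object) at a time.
        return collect_chat_stream(resp)
```

For example, `chat("mistral", [{"role": "user", "content": "why is the sky blue?"}])` mirrors the curl request above. To show text as it arrives, you could print each fragment inside the loop instead of collecting it.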

Replace the model parameter with whatever model you want to use. See the official API docs for more information.
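
The load-then-unload behavior mentioned earlier can also be nudged per request. To the best of my knowledge, /api/generate and /api/chat accept a "keep_alive" field (a duration string such as "30m", or 0 to unload right away), and a generate request with no prompt simply loads the model. Treat the sketch below as an assumption to check against the official API docs. It only builds the request bodies, and the helper names are hypothetical.

```python
import json


def preload_body(model: str, keep_alive: str = "30m") -> bytes:
    """Body for POST /api/generate that loads `model` and asks to keep it in memory.

    Assumption: omitting "prompt" loads the model without generating any text.
    """
    return json.dumps({"model": model, "keep_alive": keep_alive}).encode("utf-8")


def unload_body(model: str) -> bytes:
    """Body for POST /api/generate that asks Ollama to unload `model` immediately."""
    return json.dumps({"model": model, "keep_alive": 0}).encode("utf-8")
```

Either body can be sent the same way as the curl examples above, for instance with `curl http://localhost:11434/api/generate -d` followed by the JSON.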

OpenAI Compatible API

You can also use Ollama as a drop-in replacement (depending on use case) with the OpenAI libraries. Here’s an example from the Ollama documentation.



# Python
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',

    # required but ignored
    api_key='ollama',
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            'role': 'user',
            'content': 'Say this is a test',
        }
    ],
    model='mistral',
)

This also works for JavaScript.



// Javascript
import OpenAI from 'openai'

const openai = new OpenAI({
  baseURL: 'http://localhost:11434/v1/',
  // required but ignored
  apiKey: 'ollama',
})

const chatCompletion = await openai.chat.completions.create({
  messages: [{ role: 'user', content: 'Say this is a test' }],
  model: 'llama2',
})
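
Under the hood, the OpenAI libraries are POSTing JSON to /v1/chat/completions on the base URL you give them. As a rough illustration, here is the same request in Python without the SDK. It assumes Ollama's reply follows the usual OpenAI chat completion shape (the text lives at choices[0].message.content), and the function names are mine, not from either library.

```python
import json
import urllib.request

# Same base URL as in the SDK examples above.
BASE_URL = "http://localhost:11434/v1"


def first_reply(completion: dict) -> str:
    """Pull the assistant's text out of an OpenAI-style chat completion."""
    return completion["choices"][0]["message"]["content"]


def chat_completion(model: str, content: str, api_key: str = "ollama") -> dict:
    """POST a single user message to the OpenAI-compatible endpoint (needs Ollama running)."""
    payload = {"model": model, "messages": [{"role": "user", "content": content}]}
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # The key is required by OpenAI clients but ignored by Ollama.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With a model pulled, `print(first_reply(chat_completion("mistral", "Say this is a test")))` should print the model's answer. The same `first_reply` logic applies to the SDK result via `chat_completion.choices[0].message.content`.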

Conclusion

The release of Meta's Llama 3 and the open-sourcing of its Large Language Model (LLM) technology mark a major milestone for the tech community. With these advanced models now accessible through local tools like Ollama and Open WebUI, ordinary individuals can tap into their immense potential to generate text, translate languages, craft creative writing, and more. Furthermore, the availability of APIs enables developers to seamlessly integrate LLMs into new projects or enhance existing ones. Ultimately, the democratization of LLM technology through open-source initiatives like Llama 3 unlocks a vast realm of innovative possibilities and fuels creativity in the tech industry.

Top comments (4)

Michael Zietlow • May 18 • Edited

Nice baseline to get Llama 3 working with a GUI! I decided to give it a shot on my home Proxmox VE. Fired up an Ubuntu 22.04 VM with an Nvidia RTX 3090 passthrough.

The GUI under Ubuntu had issues populating Manage Ollama Models though, so I needed to modify the docker run command to make the base URL explicit, plus add GPU support of course. Here are my line adds for that for the Ubuntu crowd...

# sudo apt-get install -y nvidia-docker2

#  sudo systemctl restart docker

#  nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-3cc92aa5-19c8-aabe-3e76-fc23af770969)

# docker run -d --network=host --runtime=nvidia --gpus device=GPU-3cc92aa5-19c8-aabe-3e76-fc23af770969 -v ollama:/root/.ollama -v open-webui:/app/backend/data -e OLLAMA_BASE_URL=http://127.0.0.1:11434 --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama

Zinelabidine Teyar • Oct 2

Very important: the installation documentation from Open WebUI has a section for this.

To run Open WebUI with Nvidia GPU support

docker run -d -p 3000:8080 --gpus all --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:cuda