LocalAI has emerged as a crucial tool for running Large Language Models (LLMs) locally. What began as a weekend project by Ettore "mudler" Di Giacinto quickly evolved into a dynamic, community-driven initiative. Continuously expanding, LocalAI now boasts an array of features, supported backends, and an upcoming version 2.
LocalAI's primary function is to run models inside a Docker container and make them accessible via APIs. Remarkably, it does not require GPUs (though they are partially supported). This accessibility means anyone with at least 10 GB of RAM and enough disk space for model storage can use LocalAI, whether on a laptop or within a Kubernetes deployment.
An Open Source and Community-Driven Project
Hosted on GitHub and distributed under the MIT open source license, LocalAI supports various backends like llama.cpp, GPT4All, and others. This compatibility extends to multiple model formats, including ggml, gguf, GPTQ, onnx, and HuggingFace. LocalAI is adept at handling not just text, but also image and voice generative models.
The project offers a curated gallery of pre-configured models with clean licenses, along with a larger, community-sourced collection. It also makes it easy to import your own models.
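For reference, importing a model by hand usually amounts to dropping the weights file into the models/ directory and adding a small YAML definition next to it. The sketch below follows the project's documented configuration format but uses placeholder file and model names, so check the docs for the LocalAI version you run:
# Sketch: create a model definition in the models/ directory (names are placeholders)
cat <<'EOF' > models/my-model.yaml
name: my-model                                # id used in API calls
parameters:
  model: luna-ai-llama2-uncensored.Q4_0.gguf  # weights file placed in models/
context_size: 2048
EOF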
Simple Configuration Process
Installing LocalAI is straightforward, though it takes some time and disk space to download the Docker image and models. The first steps are cloning the Git repository, then downloading and setting up an LLM:
# Clone LocalAI Git repo
git clone https://github.com/go-skynet/LocalAI
cd LocalAI
# Download an LLM into the 'models/' directory
wget https://huggingface.co/TheBloke/Luna-AI-Llama2-Uncensored-GGUF/resolve/main/luna-ai-llama2-uncensored.Q4_0.gguf \
-O models/luna-ai-llama2
# Copy a generic prompt template
cp -rf prompt-templates/getting_started.tmpl models/luna-ai-llama2.tmpl
For the Docker part, pull the LocalAI image and start the container with Docker Compose:
docker compose up -d --pull always
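The first start can take a while as the image is pulled and the backends are prepared; following the container logs is an easy way to see when the API starts listening on port 8080 (assuming the default Compose setup):
docker compose logs -f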
Once set up, LocalAI and the model are ready for use:
# List available models
curl http://localhost:8080/v1/models | jq .
# Example JSON response
{
"object": "list",
"data": [
{
"id": "luna-ai-llama2",
"object": "model"
}
]
}
# Call the ChatCompletion API
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json"
-d '{ \
"model": "luna-ai-llama2",
"messages": [{
"role": "user",
"content": "Why is the Earth round?"}],
"temperature": 0.9 }'
# Response:
{
"created":1699913704,
"object":"chat.completion",
"id":"54334699-2195-489a-b144-56690e0c19e4",
"model":"luna-ai-llama2",
"choices":[
{ "index":0,"finish_reason":"stop",
"message":{
"role":"assistant",
"content":"The Earth is round because of its own gravity.
Gravity pulls all objects towards its center, and the Earth is no
exception. Over time, the Earth's own gravity has pulled it into a
roughly spherical shape. This is known as hydrostatic
equilibrium."}}
],
"usage": {"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}
}
Utilizing Galleries
The galleries mentioned earlier can be enabled by editing the .env file:
# Edit .env file for galleries
GALLERIES=[{"name":"model-gallery", "url":"github:go-skynet/model-gallery/index.yaml"}, {"name":"huggingface", "url":"github:go-skynet/model-gallery/huggingface.yaml"}]
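Docker Compose reads the .env file from the project directory, so recreating the container is enough to pick up the new gallery configuration (assuming the stack started earlier):
docker compose up -d --force-recreate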
Once the container is running with the new configuration, a vast array of models becomes available:
curl http://localhost:8080/models/available | jq .
# Example of a truncated JSON response
...
{
"url": "github:go-skynet/model-gallery/base.yaml",
"name": "xzuyn__pythia-deduped-160m-ggml__ggjtv1-model-q4_2.bin",
"urls": [
"https://huggingface.co/xzuyn/Pythia-Deduped-160M-GGML"
],
"tags": [
"gpt_neox",
"region:us"
],
"overrides": {
"parameters": {
"model": "ggjtv1-model-q4_2.bin"
}
},
"files": [
{
"filename": "ggjtv1-model-q4_2.bin",
"sha256": "",
"uri": "https://huggingface.co/xzuyn/Pythia-Deduped-160M-GGML/resolve/main/ggjtv1-model-q4_2.bin"
}
],
"gallery": {
"url": "github:go-skynet/model-gallery/huggingface.yaml",
"name": "huggingface"
}
},
...
Installing new models from the gallery via the API is also streamlined:
curl http://localhost:8080/models/apply \
-H "Content-Type: application/json"
-d '{ "id": "model-gallery@mistral" }'
LocalAI returns a job UUID together with a status URL:
{ "uuid":"9c66ffdb-82f4-11ee-95cd-0242ac180002",
"status":"http://localhost:8080/models/jobs/9c66ffdb-82f4-11ee-95cd-0242ac180002"}
Querying the returned status URL shows the download and installation progress.
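For example, using the UUID from the apply call and piping through jq as before:
curl http://localhost:8080/models/jobs/9c66ffdb-82f4-11ee-95cd-0242ac180002 | jq .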
# Response
{ "file_name":"mistral-7b-openorca.Q6_K.gguf",
"error":null,
"processed":false,
"message":"processing",
"progress":2.0081163447858397,
"file_size":"5.5 GiB",
"downloaded_size":"113.8 MiB"}
Integration and Deployment
LocalAI follows the OpenAI API specification, making it a drop-in substitute for the OpenAI API. This compatibility enables the use of various frameworks, UIs, and tools originally designed for OpenAI. Numerous usage examples include bots for Discord or Telegram, web UIs, and integration with projects like Flowise.
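In practice, many OpenAI client libraries and tools only need to be pointed at the local endpoint. A minimal sketch, assuming the client honors the usual base-URL and API-key environment variables (exact names vary between libraries and versions):
# Point an OpenAI-compatible client at LocalAI instead of api.openai.com
export OPENAI_API_BASE=http://localhost:8080/v1   # some clients expect OPENAI_BASE_URL
export OPENAI_API_KEY=sk-local                    # placeholder; LocalAI does not require a real key by default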
Moreover, LocalAI offers Helm charts for easy Kubernetes deployment; a minimal installation sketch follows below. It's also a featured component in BionicGPT, an open-source project that incorporates LocalAI into its architecture.
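A typical Kubernetes installation with the Helm chart looks roughly like the following; treat the repository URL, chart name, and release name as placeholders and check the chart's README for current values:
# Sketch: install LocalAI via its Helm chart (repository URL and chart name may differ)
helm repo add go-skynet https://go-skynet.github.io/helm-charts/
helm repo update
helm install local-ai go-skynet/local-ai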
LocalAI stands out as a versatile and user-friendly tool for running Large Language Models locally. Its compatibility with various model formats and ease of installation make it an attractive option for both individual enthusiasts and professional developers. The active community support and open-source nature further enhance its appeal, fostering continuous improvement and innovation. Whether for experimenting on a laptop or deploying in a Kubernetes environment, LocalAI offers a powerful, accessible gateway to the world of advanced AI models.