Alpaca-LoRA is an open-source project that reproduces the results of Stanford Alpaca using Low-Rank Adaptation (LoRA). It provides an instruct model of similar quality to text-davinci-003, it can run on a Raspberry Pi (for research), and the code is easily extended to the 13B, 30B, and 65B models.
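For background, the LoRA idea is to freeze the pretrained weight matrices and train only a pair of small low-rank matrices added on top, which is what makes fine-tuning so cheap. Here is a minimal sketch of the concept in plain NumPy with toy sizes; this is my own illustration, not the project's actual code:

```python
import numpy as np

# Frozen pretrained weight (d_out x d_in). In a real model this would be a
# transformer projection matrix; here it is random data for illustration.
d_in, d_out, r = 512, 512, 8          # r is the LoRA rank, r << d_in
W = np.random.randn(d_out, d_in) * 0.02

# Trainable low-rank factors: only A and B are updated during fine-tuning.
A = np.random.randn(r, d_in) * 0.01   # down-projection
B = np.zeros((d_out, r))              # up-projection, zero-initialized, so
                                      # the adapter starts as a no-op
alpha = 16                            # scaling hyperparameter

def lora_forward(x: np.ndarray) -> np.ndarray:
    """Forward pass: frozen weight plus the scaled low-rank update."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = np.random.randn(d_in)
print(lora_forward(x).shape)  # (512,)
```

Because only A and B are trained, the adapter adds just `r * (d_in + d_out)` parameters per layer instead of `d_in * d_out`.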
Alpaca-LoRA can scale the model down to run on a personal computer without relying on a GPU, which I found very intriguing, so I tried the model myself and took some screenshots. The whole process ran on an i7-10750H CPU with 32 GB of RAM, under Ubuntu 22.04.
Alpaca-LoRA
Source: https://github.com/tloen/alpaca-lora
Run the program
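Roughly, running it on CPU amounts to loading the base LLaMA weights and applying the LoRA adapter on top, as in the sketch below. The model and adapter names are the ones the alpaca-lora README mentioned at the time of writing; treat this as an approximation, not the project's exact `generate.py`:

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

# Assumed weight locations (from the alpaca-lora README); substitute your
# own paths if these have moved.
BASE = "decapoda-research/llama-7b-hf"
LORA = "tloen/alpaca-lora-7b"

tokenizer = LlamaTokenizer.from_pretrained(BASE)
model = LlamaForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.float32, low_cpu_mem_usage=True
)  # float32 on CPU; no GPU required
model = PeftModel.from_pretrained(model, LORA)  # apply the LoRA adapter
model.eval()

# Roughly the Alpaca prompt format.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nTell me about alpacas.\n\n### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```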
Resource usage
Screenshots of inputs and responses follow; each response takes 330 to 420 seconds. (Maybe memory is exhausted and swap is being used, which would explain why it is so slow?)
Ask it a sample question:
Ask it to write me a BMI calculator:
The BMI program is correct, and it even provides parameter annotations.
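For reference, the kind of annotated function it produced looks roughly like this (my own reconstruction, not a verbatim copy of the model's output):

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2

print(f"BMI: {bmi(70.0, 1.75):.1f}")  # BMI: 22.9
```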
Ask it to write me Tower of Hanoi:
The program seems wrong.
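For comparison, a correct recursive Tower of Hanoi solution is only a few lines:

```python
def hanoi(n: int, source: str, target: str, spare: str) -> None:
    """Move n disks from source to target, using spare as the auxiliary peg."""
    if n == 0:
        return
    hanoi(n - 1, source, spare, target)   # clear the n-1 smaller disks out of the way
    print(f"move disk {n}: {source} -> {target}")
    hanoi(n - 1, spare, target, source)   # stack the smaller disks back on top

hanoi(3, "A", "C", "B")  # prints the 7 moves for 3 disks
```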
Ask it "what is machine learning?" in Chinese and have it answer in Chinese:
It can understand Chinese but can only answer in English.
llama.cpp
Source: https://github.com/ggerganov/llama.cpp
The GitHub user ggerganov wrote a tensor library for machine learning in C called ggml. He then ported the LLaMA model to ggml, changed the 16-bit floats into a smaller value type (quantization), and used AVX2 or NEON instructions to accelerate inference on the CPU. See his comment for more information.
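To give a feel for the quantization idea, here is a sketch of blockwise 4-bit quantization. It mimics the spirit of ggml's Q4-style formats (small blocks of weights, each sharing one scale factor), not the exact on-disk layout:

```python
import numpy as np

BLOCK = 32  # weights are quantized in small blocks, each with its own scale

def quantize(w: np.ndarray):
    """Blockwise 4-bit quantization sketch (illustrative, not ggml's format)."""
    w = w.reshape(-1, BLOCK)
    scales = np.abs(w).max(axis=1, keepdims=True) / 7.0  # map block into [-7, 7]
    scales[scales == 0] = 1.0                            # avoid division by zero
    q = np.clip(np.round(w / scales), -8, 7).astype(np.int8)  # 4-bit range
    return q, scales

def dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scales).reshape(-1)

w = np.random.randn(1024).astype(np.float32)
q, s = quantize(w)
err = np.abs(dequantize(q, s) - w).mean()
print(f"mean abs error: {err:.4f}")  # small accuracy loss for ~4x less memory than fp16
```

Storing 4 bits per weight instead of 16 is what shrinks a 7B model down to the roughly 4 GB file mentioned below.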
The smallest model, alpaca-lora-7B-ggml, weighs only 4 GB and can run on a Raspberry Pi. See the video. I wanted to try this as well, but there was a breaking change recently and I am not sure whether this model still works, so I tried alpaca-lora-30B-ggml instead. It still runs very slowly, and there seems to be a small bug in the interactive command line. I hope it will be improved enough to be usable in the future.
See: https://www.youtube.com/watch?v=RgSAe8tDfew