To run Llama 2 comfortably on a laptop, it is highly recommended to use a quantized model: the 7B model in f16 is about 13 GB, while a 4-bit quantized version is only about 3.6 GB (see the file sizes below). We will use llama.cpp, a C/C++ port of LLaMA.
Download llama.cpp
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
make
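If the build succeeds, the main and quantize binaries used in the following steps appear in the repository root. A quick optional sanity check:
# the two binaries we will use later
ls -l main quantize
# print usage to confirm the build works
./main --help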
Convert model to GGML format
cd llama.cpp
python3 -m venv llama2
source llama2/bin/activate
python3 -m pip install -r requirements.txt
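Optionally, confirm that the virtual environment is active and that the dependencies landed in it (the package names are an assumption about the current requirements.txt):
# should point inside llama.cpp/llama2/bin
which python3
# assumed requirements: numpy, sentencepiece
python3 -m pip show numpy sentencepiece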
The conversion process consists of two steps:
- convert model to f16 format
- convert f16 model to ggml
Convert to f16 format
mkdir -p models/7B
python3 convert.py --outfile models/7B/ggml-model-f16.bin \
--outtype f16 \
../llama2/llama/llama-2-7b-chat \
--vocab-dir ../llama2/llama
Before running the conversion, create the output directory (e.g. models/7B).
--outfile specifies the output file name
--outtype specifies the output type, here f16
--vocab-dir specifies the directory containing the tokenizer.model file
If you have trouble locating the tokenizer.model file, see tokenizer.model (the expected directory layout is sketched below).
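For reference, the paths in the command above assume the weights were downloaded with Meta's download script as in the earlier posts of this series; the layout would look roughly like this (adjust the paths to your own setup):
# assumed layout: tokenizer.model next to the model directory
ls ../llama2/llama
# llama-2-7b-chat/  tokenizer.model  ...
ls ../llama2/llama/llama-2-7b-chat
# checklist.chk  consolidated.00.pth  params.json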
Convert f16 model to ggml
This step is called quantizing the model.
./quantize ./models/7B/ggml-model-f16.bin \
./models/7B/ggml-model-q4_0.bin q4_0
After quantizing, the model file is much smaller (about 3.6 GB versus 13 GB for the f16 version):
mzc01-choonhoson@MZC01-CHOONHOSON 7B % ls -alh
total 33831448
drwxr-xr-x@ 4 mzc01-choonhoson staff 128B 9 12 17:23 .
drwxr-xr-x@ 5 mzc01-choonhoson staff 160B 9 12 16:50 ..
-rw-r--r--@ 1 mzc01-choonhoson staff 13G 9 12 17:23 ggml-model-f16.bin
-rw-r--r--@ 1 mzc01-choonhoson staff 3.6G 9 12 17:23 ggml-model-q4_0.bin
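These sizes are roughly what the formats predict: f16 stores 16 bits per weight, q4_0 stores about 4.5 bits per weight (blocks of 32 4-bit values sharing one f16 scale, an assumption about the current q4_0 layout), and the 7B model has roughly 6.7 billion parameters. A back-of-the-envelope check:
# ~12.5 GiB at 16 bits per weight
python3 -c "print(6.7e9 * 2 / 2**30)"
# ~3.5 GiB at ~4.5 bits per weight
python3 -c "print(6.7e9 * 4.5 / 8 / 2**30)"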
Example
All done. Run the example binary:
./main -m ./models/7B/ggml-model-q4_0.bin -n 1024 --repeat_penalty 1.0 --color -i -r "User:" -f ./prompts/chat-with-bob.txt
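The flags above start an interactive chat: -i enables interactive mode, -r "User:" hands control back to you whenever that string is generated, -f loads the initial prompt from a file, and -n caps the number of generated tokens. For a quick non-interactive test you can pass a prompt directly (the prompt text is just an example):
./main -m ./models/7B/ggml-model-q4_0.bin \
    -p "Building a website can be done in 10 simple steps:" \
    -n 128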
References
GGML - Large Language Models for Everyone
https://github.com/rustformers/llm/blob/main/crates/ggml/README.md
Series
Llama 2 in Apple Silicon MacBook (1/3)
https://dev.to/choonho/llama-2-in-apple-silicon-macbook-13-54h
Llama 2 in Apple Silicon MacBook (2/3)
https://dev.to/choonho/llama-2-in-apple-silicon-macbook-23-2j51
Llama 2 in Apple Silicon MacBook (3/3)
https://dev.to/choonho/llama-2-in-apple-silicon-macbook-33-3hb7