To run Llama 2 comfortably on a laptop, it is highly recommended to use a quantized model: the 7B model in f16 is about 13 GB, while a 4-bit quantized version is only about 3.6 GB (see the file sizes below). We will use llama.cpp, a C/C++ port of LLaMA.
Download llama.cpp
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
make
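If the build succeeds, the main and quantize binaries used in the following steps appear in the repository root. A quick optional sanity check:
# the two binaries we will use later
ls -l main quantize
# print usage to confirm the build works
./main --help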
Convert model to GGML format
cd llama.cpp
python3 -m venv llama2
source llama2/bin/activate
python3 -m pip install -r requirements.txt
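Optionally, confirm that the virtual environment is active and that the dependencies landed in it (the package names are an assumption about the current requirements.txt):
# should point inside llama.cpp/llama2/bin
which python3
# assumed requirements: numpy, sentencepiece
python3 -m pip show numpy sentencepiece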
The conversion process consists of two steps:
- convert model to f16 format
- convert f16 model to ggml
Convert to f16 format
mkdir -p models/7B
python3 convert.py --outfile models/7B/ggml-model-f16.bin \
--outtype f16 \
../llama2/llama/llama-2-7b-chat \
--vocab-dir ../llama2/llama
Before running the conversion, create the output directory (e.g. models/7B).
--outfile specifies the output file name
--outtype specifies the output type, here f16
--vocab-dir specifies the directory containing the tokenizer.model file
If you have trouble locating the tokenizer.model file, see tokenizer.model (the expected directory layout is sketched below).
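For reference, the paths in the command above assume the weights were downloaded with Meta's download script as in the earlier posts of this series; the layout would look roughly like this (adjust the paths to your own setup):
# assumed layout: tokenizer.model next to the model directory
ls ../llama2/llama
# llama-2-7b-chat/  tokenizer.model  ...
ls ../llama2/llama/llama-2-7b-chat
# checklist.chk  consolidated.00.pth  params.json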
Convert f16 model to ggml
This step is called quantizing the model.
./quantize ./models/7B/ggml-model-f16.bin \
./models/7B/ggml-model-q4_0.bin q4_0
After quantizing, the model file is much smaller (about 3.6 GB versus 13 GB for the f16 version):
mzc01-choonhoson@MZC01-CHOONHOSON 7B % ls -alh
total 33831448
drwxr-xr-x@ 4 mzc01-choonhoson staff 128B 9 12 17:23 .
drwxr-xr-x@ 5 mzc01-choonhoson staff 160B 9 12 16:50 ..
-rw-r--r--@ 1 mzc01-choonhoson staff 13G 9 12 17:23 ggml-model-f16.bin
-rw-r--r--@ 1 mzc01-choonhoson staff 3.6G 9 12 17:23 ggml-model-q4_0.bin
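These sizes are roughly what the formats predict: f16 stores 16 bits per weight, q4_0 stores about 4.5 bits per weight (blocks of 32 4-bit values sharing one f16 scale, an assumption about the current q4_0 layout), and the 7B model has roughly 6.7 billion parameters. A back-of-the-envelope check:
# ~12.5 GiB at 16 bits per weight
python3 -c "print(6.7e9 * 2 / 2**30)"
# ~3.5 GiB at ~4.5 bits per weight
python3 -c "print(6.7e9 * 4.5 / 8 / 2**30)"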
Example
All done. Run the example binary:
./main -m ./models/7B/ggml-model-q4_0.bin -n 1024 --repeat_penalty 1.0 --color -i -r "User:" -f ./prompts/chat-with-bob.txt
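The flags above start an interactive chat: -i enables interactive mode, -r "User:" hands control back to you whenever that string is generated, -f loads the initial prompt from a file, and -n caps the number of generated tokens. For a quick non-interactive test you can pass a prompt directly (the prompt text is just an example):
./main -m ./models/7B/ggml-model-q4_0.bin \
    -p "Building a website can be done in 10 simple steps:" \
    -n 128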
References
GGML - Large Language Models for Everyone
https://github.com/rustformers/llm/blob/main/crates/ggml/README.md
Series
Llama 2 in Apple Silicon MacBook (1/3)
https://dev.to/choonho/llama-2-in-apple-silicon-macbook-13-54h
Llama 2 in Apple Silicon MacBook (2/3)
https://dev.to/choonho/llama-2-in-apple-silicon-macbook-23-2j51
Llama 2 in Apple Silicon MacBook (3/3)
https://dev.to/choonho/llama-2-in-apple-silicon-macbook-33-3hb7