Running advanced AI models like Meta's LLaMA on a MacBook might seem ambitious, especially on an M1 with 8 GB of RAM. But with the right steps, you can start building AI apps locally on your Mac. Thanks to Apple's processor architecture and efficient libraries like llama.cpp, you can unlock the power of large language models right from your lightweight laptop.
Let's get your MacBook Air M1 set up to run these models efficiently.
Step 1:
Request access on Meta's official site by providing your details and intended usage, then download the model weights.
Install the necessary packages as described in the README.
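For reference, the setup usually amounts to cloning the repo and installing it. Here is a minimal sketch; the README is the source of truth, and the install command is an assumption:
git clone https://github.com/meta-llama/llama-models.git
cd llama-models
pip install -e .  # assumption: editable install; follow the README if it differs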
Run the model with the following command:
torchrun \
--nproc_per_node=$NGPUS \
llama_models/scripts/example_chat_completion.py $CHECKPOINT_DIR \
--model_parallel_size $NGPUS
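NGPUS and CHECKPOINT_DIR are shell variables you set yourself. For example (the checkpoint path is illustrative; yours depends on where the download landed):
NGPUS=1
CHECKPOINT_DIR=~/.llama/checkpoints/Llama3.1-8B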
As you might have guessed, this is not going to work on 8 GB of RAM ☹️. To solve this, we will combine two approaches:
First, we will use llama.cpp, which provides a lightweight C/C++ implementation for running models on a wide range of hardware.
Second, we will use quantization to shrink the model so that it fits comfortably in 8 GB of RAM.
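To see why both pieces matter, here is some rough memory arithmetic for an 8B-parameter model (approximate numbers):
# FP16 weights:  8e9 params * 2 bytes         ~= 16 GB -> far beyond 8 GB of RAM
# Q4_K_M quant:  ~4.5 bits per weight on avg  ~= 5 GB  -> fits, with room for macOS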
Step 1 (This one will work):
Install llama.cpp using brew.
brew install llama.cpp
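If the install succeeded, this should print the build info:
llama-cli --version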
Step 2:
Now let's quantize the Llama model. For this we have a very handy Space on Hugging Face called GGUF-My-Repo: https://huggingface.co/spaces/ggml-org/gguf-my-repo
In the Space, log in with your Hugging Face credentials and select the model repository you want to quantize. For Llama, you must already have been granted access to the repo. Check 'Create a private repo under your username'. If you are a beginner, leave the other options at their defaults and proceed.
Once the process finishes, the quantized model will be in your private repository.
Step 3:
Clone the Hugging Face repo you just created onto your Mac, or fetch just the GGUF file.
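One way to fetch only the model file is with the Hugging Face CLI; a sketch with placeholder repo and file names (substitute your own private repo):
pip install -U huggingface_hub  # provides the huggingface-cli tool
huggingface-cli download YOUR_USERNAME/YOUR_MODEL-Q4_K_M-GGUF your-model-q4_k_m.gguf --local-dir .
Then run the model with the command below: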
llama-cli -m GGUF_MODEL_FILE_NAME -n 1024 -ngl 1 -c 512 --prompt PROMPT -cnv
Example:
llama-cli -m meta-llama-3.1-8b-q4_k_m.gguf -n 1024 -ngl 1 -c 512 --prompt "Hello" -cnv
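A quick note on the flags: -m points at the GGUF file, -n 1024 caps the number of generated tokens, -ngl offloads layers to the GPU (Metal on Apple silicon), -c 512 sets the context window, and -cnv starts an interactive conversation. If answers get truncated, raise -n; a larger -c costs more memory.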
You can also run the model directly from a repository name:
llama-cli --hf-repo Deepcodr/llama_sample_chat-Q4_K_M-GGUF --hf-file llama_sample_chat-q4_k_m.gguf -p "The meaning to life and the universe is"
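On the first run this downloads the GGUF file from the Hugging Face Hub and caches it locally, so later runs start immediately. With -p and no -cnv, llama-cli prints a single completion and exits.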
Tip: If you are a beginner, avoid base models unless you want gibberish or random completions; use instruction-tuned (instruct/chat) models instead. You can find some already quantized models here