Running advanced AI models like Meta's LLaMA on a MacBook might seem ambitious, especially on an M1 with 8 GB of RAM. But with the right steps, you can start building AI apps locally on your Mac. Thanks to Apple's processor architecture and efficient libraries like llama.cpp, you can unlock the power of large language models right from your lightweight laptop.
Let's get your MacBook Air M1 set up to run these models efficiently.
Step 1:
Request access on Meta's official site by providing your details and intended usage, then download the model weights.
Install the necessary packages as described in the README.
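For reference, the setup usually amounts to cloning the repo and installing it. Here is a minimal sketch; the README is the source of truth, and the install command is an assumption:
git clone https://github.com/meta-llama/llama-models.git
cd llama-models
pip install -e .  # assumption: editable install; follow the README if it differs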
Run the model with the following command:
torchrun \
--nproc_per_node=$NGPUS \
llama_models/scripts/example_chat_completion.py $CHECKPOINT_DIR \
--model_parallel_size $NGPUS
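NGPUS and CHECKPOINT_DIR are shell variables you set yourself. For example (the checkpoint path is illustrative; yours depends on where the download landed):
NGPUS=1
CHECKPOINT_DIR=~/.llama/checkpoints/Llama3.1-8B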
As you might have guessed, this is not going to work on 8 GB of RAM ☹️. To solve this, we will combine two approaches:
First, we will use llama.cpp, which provides a lightweight C/C++ implementation for running models on a wide range of hardware.
Second, we will use quantization to shrink the model so that it fits comfortably in 8 GB of RAM.
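To see why both pieces matter, here is some rough memory arithmetic for an 8B-parameter model (approximate numbers):
# FP16 weights:  8e9 params * 2 bytes         ~= 16 GB -> far beyond 8 GB of RAM
# Q4_K_M quant:  ~4.5 bits per weight on avg  ~= 5 GB  -> fits, with room for macOS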
Step 1 (This one will work):
Install llama.cpp using brew.
brew install llama.cpp
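If the install succeeded, this should print the build info:
llama-cli --version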
Step 2:
Now let's quantize the Llama model. For this we have a very handy Space on Hugging Face called GGUF-My-Repo: https://huggingface.co/spaces/ggml-org/gguf-my-repo
In the Space, log in with your Hugging Face credentials and select the model repository you want to quantize. For Llama, you must already have been granted access to the repo. Check 'Create a private repo under your username'. If you are a beginner, leave the other options at their defaults and proceed.
Once the process finishes, the quantized model will be in your private repository.
Step 3:
Clone the Hugging Face repo you just created onto your Mac, or fetch just the GGUF file.
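One way to fetch only the model file is with the Hugging Face CLI; a sketch with placeholder repo and file names (substitute your own private repo):
pip install -U huggingface_hub  # provides the huggingface-cli tool
huggingface-cli download YOUR_USERNAME/YOUR_MODEL-Q4_K_M-GGUF your-model-q4_k_m.gguf --local-dir .
Then run the model with the command below: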
llama-cli -m GGUF_MODEL_FILE_NAME -n 1024 -ngl 1 -c 512 --prompt PROMPT -cnv
Example:
llama-cli -m meta-llama-3.1-8b-q4_k_m.gguf -n 1024 -ngl 1 -c 512 --prompt "Hello" -cnv
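A quick note on the flags: -m points at the GGUF file, -n 1024 caps the number of generated tokens, -ngl offloads layers to the GPU (Metal on Apple silicon), -c 512 sets the context window, and -cnv starts an interactive conversation. If answers get truncated, raise -n; a larger -c costs more memory.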
You can also run the model directly from a repository name:
llama-cli --hf-repo Deepcodr/llama_sample_chat-Q4_K_M-GGUF --hf-file llama_sample_chat-q4_k_m.gguf -p "The meaning to life and the universe is"
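On the first run this downloads the GGUF file from the Hugging Face Hub and caches it locally, so later runs start immediately. With -p and no -cnv, llama-cli prints a single completion and exits.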
Tip: If you are a beginner, avoid base models unless you want gibberish or random completions; use instruction-tuned (instruct/chat) models instead. You can find some already quantized models here