Meta AI's LLaMA 2 has taken the NLP community by storm with its range of pretrained and fine-tuned Large Language Models (LLMs). With sizes ranging from 7B to 70B parameters, LLaMA 2 builds on the success of its predecessor, LLaMA 1, and adds a host of enhancements.
The Evolution of LLaMA
LLaMA 2 marks a significant step forward in the landscape of language models. It was trained on 40% more tokens than LLaMA 1 and supports a context length of up to 4,096 tokens. This extended context window enables LLaMA 2 to excel at tasks that require a nuanced understanding of longer passages.
What makes LLaMA 2 even more extraordinary is its accessibility. Meta AI has generously made these advanced model weights available for both research and commercial applications. This democratization of cutting-edge language models ensures that a broader audience, from researchers to businesses, can harness the power of LLaMA 2 for their unique needs.
To get access to Llama 2, you can follow these steps:
- Go to the Hugging Face Model Hub at huggingface.co/meta-llama and select the model that you want to use.
- Click on the "Request Access" button.
- Fill out the form and submit it.
- Once your request has been approved, you will be able to download the model weights.
Here are some additional details about each size of the Llama 2 model:
- 7B parameters: The smallest Llama 2 model. It is still capable, and it is the cheapest to fine-tune and serve.
- 13B parameters: The medium-sized Llama 2 model, and a good balance of quality and cost for most applications.
- 70B parameters: The largest and most capable Llama 2 model, but also the most expensive to train and use.
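To make the cost trade-off between these sizes concrete, here is a back-of-the-envelope sketch of the memory the raw weights alone occupy in fp16 (2 bytes per parameter). The helper function is mine, and the figures ignore activations, the KV cache, and optimizer state, so real requirements are higher:

```python
# Rough fp16 memory footprint of the weights for each Llama 2 size.
# Back-of-the-envelope estimate only: 2 bytes per parameter, nothing else.

def fp16_weight_gb(num_params: float) -> float:
    """Approximate weight memory in GB at 2 bytes per parameter."""
    return num_params * 2 / 1e9

for name, params in [("7B", 7e9), ("13B", 13e9), ("70B", 70e9)]:
    print(f"Llama 2 {name}: ~{fp16_weight_gb(params):.0f} GB of weights in fp16")
```

This is why the 7B model is the natural choice for a tutorial: it is the only size that fits comfortably on a single consumer-grade GPU, especially once LoRA keeps the trainable parameter count small.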
In this blog post, I will show you how to fine-tune the LLaMA 2 7B model on a subset of the CodeAlpaca-20k dataset, using as little code as possible. The dataset contains over 20,000 coding instructions and their corresponding answers, so fine-tuning on it teaches the model to generate code for a variety of tasks. We will use the Alpaca LoRA training script, which automates the fine-tuning process, and Beam for the GPU.
You can create a free account on Beam to get started.
Prerequisites
- Install the Beam CLI:
curl https://raw.githubusercontent.com/slai-labs/get-beam/main/get-beam.sh -sSfL | sh
- Configure Beam by entering:
beam configure
- Install the Beam SDK:
pip install beam-sdk
Now you’re ready to start using Beam to deploy your ML models.
To make it simple, I have made a GitHub repo, which you can clone to get started. In the app.py file, I use the CodeAlpaca-20k dataset.
@app.run()
def train_model():
    # Trained models will be saved to this path
    beam_volume_path = "./checkpoints"

    # Load dataset -- for this example, we'll use the sahil2801/CodeAlpaca-20k dataset hosted on Huggingface:
    # https://huggingface.co/datasets/sahil2801/CodeAlpaca-20k
    dataset = DatasetDict()
    dataset["train"] = load_dataset("sahil2801/CodeAlpaca-20k", split="train[:20%]")

    # Adjust the training loop based on the size of the dataset
    samples = len(dataset["train"])
    val_set_size = ceil(0.1 * samples)

    train(
        base_model=base_model,
        val_set_size=val_set_size,
        data=dataset,
        output_dir=beam_volume_path,
    )
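Under the hood, the training script's prompter renders each CodeAlpaca record into an Alpaca-style prompt before tokenization. The sketch below shows the idea; the field names match the dataset, but the template wording follows the standard Alpaca format and may differ slightly from the script's exact strings:

```python
# Sketch of Alpaca-style prompt construction (illustrative, not the
# script's exact template). Each record has "instruction" and "output"
# fields; the prompt and target are concatenated for causal LM training.

TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def build_prompt(record: dict) -> str:
    """Render one dataset record into a training prompt plus target."""
    return TEMPLATE.format(instruction=record["instruction"]) + record["output"]

sample = {
    "instruction": "Write a Python function that returns the square of a number.",
    "output": "def square(x):\n    return x * x",
}
print(build_prompt(sample))
```

This template is also why the model's responses come back with `### Instruction:` / `### Response:`-style markdown headings, as we will see in the inference output below.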
To run the training/fine-tuning, we use the command:
beam run app.py:train_model
When we run this command, the training function will run on Beam's cloud, and we'll see the progress of the training process streamed to our terminal.
The training may take hours to complete, depending on the size of the dataset you use for fine-tuning. In my case, since I used only 20% of the dataset, training completed in around an hour.
When the model is successfully trained, we can deploy an API to run inference on our fine-tuned model.
Let's create a new function for inference. If you look closely, you'll notice that we're using a different decorator this time: rest_api instead of run. This will allow us to deploy the function as a REST API.
@app.rest_api()
def run_inference(**inputs):
    # Inputs passed to the API
    input = inputs["input"]

    # Grab the latest checkpoint
    checkpoint = get_newest_checkpoint()

    # Initialize models
    models = load_models(checkpoint=checkpoint)
    model = models["model"]
    tokenizer = models["tokenizer"]
    prompter = models["prompter"]

    # Generate text
    response = call_model(
        input=input, model=model, tokenizer=tokenizer, prompter=prompter
    )
    return response
We can deploy this as a REST API by running this command:
beam deploy app.py:run_inference
If we navigate to the URL printed in the shell, we'll be able to copy the full cURL request to call the REST API.
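If you prefer calling the endpoint from Python instead of cURL, the sketch below builds the same JSON request. The URL and auth token are placeholders, not real values; copy the actual ones from the cURL command Beam prints after deployment:

```python
# Hypothetical Python client for the deployed endpoint. API_URL and
# AUTH_TOKEN are placeholders -- substitute the real values shown by
# Beam after `beam deploy`.
import json

API_URL = "https://apps.beam.cloud/your-app-id"  # placeholder
AUTH_TOKEN = "YOUR_BEAM_AUTH_TOKEN"              # placeholder

payload = {"input": "How to download an image from its URL in Python?"}
headers = {
    "Authorization": f"Basic {AUTH_TOKEN}",
    "Content-Type": "application/json",
}
body = json.dumps(payload)

# Uncomment once the placeholders are filled in:
# import requests
# response = requests.post(API_URL, headers=headers, data=body)
# print(response.json())
print(body)
```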
Now, when I try asking "How to download an image from its URL in Python?":
{
"input": "How to download an image from its URL in Python?"
}
I get an entire markdown string as a response, which I have prettified below:
import urllib.request

url = 'https://upload.wikimedia.org/wikipedia/commons/thumb/6/63/Square_logo_2008.png/1200px-Square_logo_2008.png'

with urllib.request.urlopen(url) as response:
    image = response.read()

print(image)

### Solution:
import urllib.request

url = 'https://upload.wikimedia.org/wikipedia/commons/thumb/6/63/Square_logo_2008.png/1200px-Square_logo_2008.png'

with urllib.request.urlopen(url) as response:
    image = response.read()

print(image)
### Explanation:
The first step is to import the `urllib.request` module.
The second step is to create a `with` statement. The `with` statement is used to open a file or a connection.
The third step is to create a `urlopen` function. The `urlopen` function is used to open a URL.
The fourth step is to create a `response` variable. The `response` variable is used to store the response of the URL.
The fifth step is to create a `read` function. The `read` function is used to read the response of the URL.
The sixth step is to print the image. The `print` statement is used to print the image.
### Reflection:
- What is the difference between a `with` statement and a `try` statement?
- What is the difference between a `urlopen` function and a `request` function?
- What is the difference between a `response` variable and a `request` variable?
- What is the difference between a `read` function and a `request` function?
- What is the difference between a `print` statement and a `request` statement?
</s>
Although training the model on 20% of the dataset is not ideal, it is still a good way to get started. You can see that we are already starting to see good results with this small amount of data. If you want to get even better results, you can try fine-tuning the model on the entire dataset. This will take several hours, but it will be worth it in the end. Once the model is trained, you can use it whenever you need it.
Conclusion
In conclusion, we have seen how to fine-tune LLaMA 2 7B on a subset of the CodeAlpaca-20k dataset using the Alpaca LoRA training script, which makes it easy to fine-tune the model with very little code.
We have also seen that even by training the model on 20% of the dataset, we can get good results. If you want to get even better results, you can try fine-tuning the model on the entire dataset.
The future of open source AI is bright. The availability of large language models like LLaMA 2 makes it possible for anyone to develop powerful AI applications. With the help of open source tools and resources, developers can fine-tune these models to meet their specific needs.
If you still have any questions about this post or want to discuss something with me, feel free to connect on LinkedIn or Twitter.
If you run an organization and want me to write for you, please connect with me on my Socials 🙃