Meta AI's LLaMA 2 has taken the NLP community by storm with its range of pretrained and fine-tuned Large Language Models (LLMs). With sizes ranging from 7B to 70B parameters, LLaMA 2 builds on the success of its predecessor, LLaMA 1, and adds a host of enhancements.
The Evolution of LLaMA
LLaMA 2 marks a significant step forward in the landscape of language models. It was trained on 40% more tokens than LLaMA 1 and supports a context length of up to 4,096 tokens. This extended context window enables LLaMA 2 to excel at tasks that require a nuanced understanding of longer passages.
What makes LLaMA 2 even more extraordinary is its accessibility. Meta AI has generously made these advanced model weights available for both research and commercial applications. This democratization of cutting-edge language models ensures that a broader audience, from researchers to businesses, can harness the power of LLaMA 2 for their unique needs.
To get access to Llama 2, you can follow these steps:
- Go to the Hugging Face Model Hub at huggingface.co/meta-llama and select the model that you want to use.
- Click on the "Request Access" button.
- Fill out the form and submit it.
- Once your request has been approved, you will be able to download the model weights.
Here are some additional details about each size of the Llama 2 model:
- 7B parameters: The smallest Llama 2 model. It is still capable, and it is the cheapest to fine-tune and serve.
- 13B parameters: The medium-sized Llama 2 model, and a good balance of quality and cost for most applications.
- 70B parameters: The largest and most capable Llama 2 model, but also the most expensive to train and use.
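To make the cost trade-off between these sizes concrete, here is a back-of-the-envelope sketch of the memory the raw weights alone occupy in fp16 (2 bytes per parameter). The helper function is mine, and the figures ignore activations, the KV cache, and optimizer state, so real requirements are higher:

```python
# Rough fp16 memory footprint of the weights for each Llama 2 size.
# Back-of-the-envelope estimate only: 2 bytes per parameter, nothing else.

def fp16_weight_gb(num_params: float) -> float:
    """Approximate weight memory in GB at 2 bytes per parameter."""
    return num_params * 2 / 1e9

for name, params in [("7B", 7e9), ("13B", 13e9), ("70B", 70e9)]:
    print(f"Llama 2 {name}: ~{fp16_weight_gb(params):.0f} GB of weights in fp16")
```

This is why the 7B model is the natural choice for a tutorial: it is the only size that fits comfortably on a single consumer-grade GPU, especially once LoRA keeps the trainable parameter count small.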
In this blog post, I will show you how to fine-tune the LLaMA 2 7B model on a subset of the CodeAlpaca-20k dataset, using as little code as possible. The dataset contains over 20,000 coding instructions and their corresponding answers, so fine-tuning on it teaches the model to generate code for a variety of tasks. We will use the Alpaca LoRA training script, which automates the fine-tuning process, and Beam for the GPU.
You can create a free account on Beam to get started.
Prerequisites
- Install the Beam CLI:
curl https://raw.githubusercontent.com/slai-labs/get-beam/main/get-beam.sh -sSfL | sh
- Configure Beam by entering:
beam configure
- Install the Beam SDK:
pip install beam-sdk
Now you’re ready to start using Beam to deploy your ML models.
To make it simple, I have made a GitHub repo, which you can clone to get started. In the app.py file, I use the CodeAlpaca-20k dataset.
@app.run()
def train_model():
    # Trained models will be saved to this path
    beam_volume_path = "./checkpoints"

    # Load dataset -- for this example, we'll use the sahil2801/CodeAlpaca-20k dataset hosted on Huggingface:
    # https://huggingface.co/datasets/sahil2801/CodeAlpaca-20k
    dataset = DatasetDict()
    dataset["train"] = load_dataset("sahil2801/CodeAlpaca-20k", split="train[:20%]")

    # Adjust the training loop based on the size of the dataset
    samples = len(dataset["train"])
    val_set_size = ceil(0.1 * samples)

    train(
        base_model=base_model,
        val_set_size=val_set_size,
        data=dataset,
        output_dir=beam_volume_path,
    )
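Under the hood, the training script's prompter renders each CodeAlpaca record into an Alpaca-style prompt before tokenization. The sketch below shows the idea; the field names match the dataset, but the template wording follows the standard Alpaca format and may differ slightly from the script's exact strings:

```python
# Sketch of Alpaca-style prompt construction (illustrative, not the
# script's exact template). Each record has "instruction" and "output"
# fields; the prompt and target are concatenated for causal LM training.

TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def build_prompt(record: dict) -> str:
    """Render one dataset record into a training prompt plus target."""
    return TEMPLATE.format(instruction=record["instruction"]) + record["output"]

sample = {
    "instruction": "Write a Python function that returns the square of a number.",
    "output": "def square(x):\n    return x * x",
}
print(build_prompt(sample))
```

This template is also why the model's responses come back with `### Instruction:` / `### Response:`-style markdown headings, as we will see in the inference output below.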
To run the training/fine-tuning, we use the command:
beam run app.py:train_model
When we run this command, the training function will run on Beam's cloud, and we'll see the progress of the training process streamed to our terminal.
The training may take hours to complete, depending on the size of the dataset you use for fine-tuning. In my case, since I used only 20% of the dataset, training completed in around an hour.
When the model is successfully trained, we can deploy an API to run inference on our fine-tuned model.
Let's create a new function for inference. If you look closely, you'll notice that we're using a different decorator this time: rest_api instead of run. This will allow us to deploy the function as a REST API.
@app.rest_api()
def run_inference(**inputs):
    # Inputs passed to the API
    input = inputs["input"]

    # Grab the latest checkpoint
    checkpoint = get_newest_checkpoint()

    # Initialize models
    models = load_models(checkpoint=checkpoint)
    model = models["model"]
    tokenizer = models["tokenizer"]
    prompter = models["prompter"]

    # Generate text
    response = call_model(
        input=input, model=model, tokenizer=tokenizer, prompter=prompter
    )
    return response
We can deploy this as a REST API by running this command:
beam deploy app.py:run_inference
If we navigate to the URL printed in the shell, we'll be able to copy the full cURL request to call the REST API.
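If you prefer calling the endpoint from Python instead of cURL, the sketch below builds the same JSON request. The URL and auth token are placeholders, not real values; copy the actual ones from the cURL command Beam prints after deployment:

```python
# Hypothetical Python client for the deployed endpoint. API_URL and
# AUTH_TOKEN are placeholders -- substitute the real values shown by
# Beam after `beam deploy`.
import json

API_URL = "https://apps.beam.cloud/your-app-id"  # placeholder
AUTH_TOKEN = "YOUR_BEAM_AUTH_TOKEN"              # placeholder

payload = {"input": "How to download an image from its URL in Python?"}
headers = {
    "Authorization": f"Basic {AUTH_TOKEN}",
    "Content-Type": "application/json",
}
body = json.dumps(payload)

# Uncomment once the placeholders are filled in:
# import requests
# response = requests.post(API_URL, headers=headers, data=body)
# print(response.json())
print(body)
```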
Now, when I try asking "How to download an image from its URL in Python?":
{
"input": "How to download an image from its URL in Python?"
}
I get an entire markdown string as a response, which I have prettified below:
import urllib.request

url = 'https://upload.wikimedia.org/wikipedia/commons/thumb/6/63/Square_logo_2008.png/1200px-Square_logo_2008.png'

with urllib.request.urlopen(url) as response:
    image = response.read()

print(image)

### Solution:
import urllib.request

url = 'https://upload.wikimedia.org/wikipedia/commons/thumb/6/63/Square_logo_2008.png/1200px-Square_logo_2008.png'

with urllib.request.urlopen(url) as response:
    image = response.read()

print(image)
### Explanation:
The first step is to import the `urllib.request` module.
The second step is to create a `with` statement. The `with` statement is used to open a file or a connection.
The third step is to create a `urlopen` function. The `urlopen` function is used to open a URL.
The fourth step is to create a `response` variable. The `response` variable is used to store the response of the URL.
The fifth step is to create a `read` function. The `read` function is used to read the response of the URL.
The sixth step is to print the image. The `print` statement is used to print the image.
### Reflection:
- What is the difference between a `with` statement and a `try` statement?
- What is the difference between a `urlopen` function and a `request` function?
- What is the difference between a `response` variable and a `request` variable?
- What is the difference between a `read` function and a `request` function?
- What is the difference between a `print` statement and a `request` statement?
</s>
Although training the model on 20% of the dataset is not ideal, it is still a good way to get started. You can see that we are already starting to see good results with this small amount of data. If you want to get even better results, you can try fine-tuning the model on the entire dataset. This will take several hours, but it will be worth it in the end. Once the model is trained, you can use it whenever you need it.
Conclusion
In conclusion, we have seen how to fine-tune LLaMA 2 7B on a subset of the CodeAlpaca-20k dataset using the Alpaca LoRA training script, which makes it easy to fine-tune the model with very little code.
We have also seen that even by training the model on 20% of the dataset, we can get good results. If you want to get even better results, you can try fine-tuning the model on the entire dataset.
The future of open source AI is bright. The availability of large language models like LLaMA 2 makes it possible for anyone to develop powerful AI applications. With the help of open source tools and resources, developers can fine-tune these models to meet their specific needs.
If you still have any questions about this post or want to discuss something with me, feel free to connect on LinkedIn or Twitter.
If you run an organization and want me to write for you, please connect with me on my Socials 🙃