(10/29/2024 Update) This article has been updated to reflect the release of the RTX 4000 Ada GPU instance. Some screenshots and descriptions still reference the older RTX 6000 GPU, so interpret them accordingly.
PyTorch from Meta (formerly Facebook) and TensorFlow from Google are two of the most popular deep learning frameworks. While a GPU is essential for developing and training deep learning models, it is time-consuming to build an environment that makes the GPU available to these frameworks, and to make both PyTorch and TensorFlow usable in a single environment. This article shows how to set up an environment with GPU-enabled PyTorch and TensorFlow on Akamai Connected Cloud (formerly Linode) in about 10 minutes. The procedure makes it easy to set up a dedicated deep learning environment in the cloud, even for those unfamiliar with configuring a Linux server.
What is Akamai Connected Cloud (formerly Linode)?
Akamai Connected Cloud (ACC, formerly Linode) is an IaaS acquired by Akamai in February 2022. ACC offers simple, predictable pricing that bundles SSD storage and a fixed network transfer allowance into the base price; both are often expensive extras at other cloud providers. There are no price differences between regions, which makes cloud spending easier to forecast. For example, a virtual machine with 16GB of memory and an NVIDIA RTX 4000 Ada GPU costs $350 per month (as of October 2024), including 500GB of SSD storage and 1TB of network transfer. If you use a machine for only part of the month, hourly billing means you are charged only for the time the machine exists on your account.
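To see how hourly billing with a monthly cap works out, here is a small illustrative calculation. The $0.52/hr rate is an assumed figure for illustration only, not a quoted price; check the current ACC pricing page for real rates.

```shell
# Illustration of hourly billing capped at the monthly price.
# rate ($/hr) and cap ($/month) are assumed example figures.
hours=100
awk -v h="$hours" 'BEGIN {
  rate = 0.52; cap = 350.00
  cost = h * rate
  if (cost > cap) cost = cap     # you never pay more than the monthly price
  printf "%.2f\n", cost
}'
# → 52.00
```

Run a machine for 100 hours and you pay roughly $52; keep it for the whole month and the charge tops out at the flat monthly price.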
You can use the Cloud Estimator tool provided by ACC to compare prices with other cloud providers.
The goal of this article
In this article, we will set up the Docker Engine Utility for NVIDIA GPUs (nvidia-docker), a container virtualization platform that supports NVIDIA's GPUs, on a GPU instance on ACC, and deploy NGC Containers, GPU-optimized containers for deep learning officially provided by NVIDIA. Using StackScripts, ACC's deployment automation feature, you can set up the environment in about 10 minutes with almost no prior knowledge of ACC, Docker, or NGC Containers.
The environment built with this procedure includes a sample Jupyter Notebook for OpenAI Whisper, a speech recognition model widely praised for its extremely high recognition accuracy, so even those who do not develop deep learning models themselves can experience the benefits of a GPU instance.
If you provide Object Storage credentials to the StackScript, the PyTorch and TensorFlow containers will automatically mount the external Object Storage, which you can use to retrieve training data or to store your trained models. Using Object Storage is optional; you can skip it.
Set up a GPU instance with PyTorch and TensorFlow
First, open the StackScript I have prepared from the following link. It automatically installs nvidia-docker, PyTorch, and TensorFlow. (You must be logged into your ACC account to access the link.) If you cannot open the StackScript for some reason, its contents are also available on GitHub.
deeplearning-gpu
https://cloud.linode.com/stackscripts/1102035
StackScripts have a feature called UDFs (User Defined Fields) that automatically generates an input form for the parameters required at deployment. This StackScript asks for the login credentials of a non-root user who can SSH into the virtual machine and, optionally, an Access Key for mounting Object Storage as external storage. If you want to mount Object Storage, create a bucket and obtain an Access Key in advance.
As of October 2024, the regions where both Object Storage and RTX 4000 Ada GPU instances are available are as follows.
- Seattle, WA, US
- Chicago, IL, US
- Paris, FR
- Osaka, JP
Since GPU instances are available only in limited regions, select the virtual machine type first, then the region. As an example, the screenshot here shows Dedicated 32 GB + RTX6000 GPU x1 selected in the Singapore region; it predates the RTX 4000 Ada update, so choose one of the regions listed above instead.
Name the virtual machine, enter the root password, and click "Create Linode".
The screen will transition to the virtual machine management dashboard. Wait a few minutes until the virtual machine status changes from PROVISIONING to RUNNING. The IP address of the virtual machine you just created is displayed on the same screen, so take note of it.
The virtual machine is now booted. The installation process of nvidia-docker and NGC Containers will proceed automatically in the background. Wait 10 minutes for the installation to complete before proceeding to the next step.
Starting a container
Now let's log in to the virtual machine via SSH. If the setup process performed by StackScript is complete, the following message will appear when you log in. If you do not see this message, log out and wait a few minutes before logging in again. If you have inadvertently started a virtual machine that does not have a GPU, you will get the message "GPU is not available. This StackScript should be used for GPU instances." In that case, please start a GPU instance and redo the procedure from the beginning.
% ssh root@45.118.XX.XX
root@45.118.XX.XX's password:
(snip)
##############################################################################
You can launch a Docker container with each of the following commands:
pytorch: Log into an interactive shell of a container with Python and PyTorch.
tensorflow: Log into an interactive shell of a container with Python and TensorFlow.
pytorch-notebook: Start Jupyter Notebook with PyTorch as a daemon. You can access it at http://[Instance IP address]/
tensorflow-notebook: Start Jupyter Notebook with TensorFlow as a daemon. You can access it at http://[Instance IP address]/
Other commands:
stop-all-containers: Stop all running containers.
##############################################################################
The following five commands are available on the machine created by this StackScript.
| Command | Usage |
|---|---|
| pytorch | Start a container with PyTorch installed and enter its interactive shell |
| tensorflow | Start a container with TensorFlow installed and enter its interactive shell |
| pytorch-notebook | Start Jupyter Notebook with PyTorch installed as a daemon |
| tensorflow-notebook | Start Jupyter Notebook with TensorFlow installed as a daemon |
| stop-all-containers | Stop all running containers |
Each container has the directories `/workspace/HOST-VOLUME/` and `/workspace/OBJECT-STORAGE/`, which mount a host machine directory and external Object Storage, respectively. Containers created by the above commands are removed when stopped (the `--rm` option of `docker run` is set), so place any files you want to keep in `/workspace/HOST-VOLUME/` or `/workspace/OBJECT-STORAGE/`.
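As a sketch of what is happening behind these commands: each one wraps a `docker run` invocation. The alias below is a hypothetical reconstruction, not the actual definition; the image tag, flags, and Jupyter options are assumptions, and the real definitions live in `/root/.bash_profile`.

```shell
# Hypothetical sketch of the pytorch-notebook alias (assumed image tag/flags);
# the actual definition is in /root/.bash_profile on the instance.
alias pytorch-notebook='docker run --rm -d --gpus all \
    -p 80:8888 \
    -v /root/shared:/workspace/HOST-VOLUME \
    nvcr.io/nvidia/pytorch:24.01-py3 \
    jupyter notebook --allow-root --ip=0.0.0.0 --port=8888'
```

The `--rm` flag is what causes the container, and any file outside the mounted directories, to be discarded when it stops.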
Let's spin up Jupyter Notebook with PyTorch as a daemon and run the speech recognition model OpenAI Whisper. Run the `pytorch-notebook` command from the console.
root@45-118-XX-XXX:~# pytorch-notebook
[I 04:36:22.823 NotebookApp] http://hostname:8888/?token=0ee3290287b3bd90f2e8e3ab447965d3e074267f0d60420b
http://hostname:8888/?token=0ee3290287b3bd90f2e8e3ab447965d3e074267f0d60420b
Jupyter Notebook should now be running. If you get the error "Bind for 0.0.0.0:80 failed: port is already allocated.", stop the existing container first with the `stop-all-containers` command. If the command succeeds, replace `hostname` in the URL with the IP address you noted when creating the virtual machine, delete `:8888`, and open the result in a web browser. The token changes each time the container is started.
| URL displayed in console | URL to enter into a web browser |
|---|---|
| http://hostname:8888/?token=0ee3290287b3bd90f2e8e3ab447965d3e074267f0d60420b | http://45.118.XX.XX/?token=0ee3290287b3bd90f2e8e3ab447965d3e074267f0d60420b |
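The URL rewrite above can be expressed as a one-liner, which is handy if you script this step. The `45.118.XX.XX` address is the placeholder IP from this article; substitute the IP of your own instance.

```shell
# Swap "hostname" for the instance IP and drop :8888, since the container
# maps the notebook's port 8888 to port 80 on the host.
echo 'http://hostname:8888/?token=0ee3290287b3bd90f2e8e3ab447965d3e074267f0d60420b' \
  | sed -e 's/hostname/45.118.XX.XX/' -e 's/:8888//'
# → http://45.118.XX.XX/?token=0ee3290287b3bd90f2e8e3ab447965d3e074267f0d60420b
```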
Click on `Voice Recognition with OpenAI Whisper.ipynb` in `HOST-VOLUME` to open it.
Click `Cell` -> `Run All` in the menu to run OpenAI Whisper. The first run takes a few minutes to download dependent software and the deep learning model.
If the execution completes without problems, the last cell will show the result of the speech recognition: "I'm getting them for $12 a night."
Congratulations! You now have GPU-enabled PyTorch and TensorFlow.
Deleting the instance
You can delete a virtual machine you are finished with by clicking "Delete" in the ACC Management Console. The contents of `/workspace/HOST-VOLUME/` (`/root/shared/` on the host OS) are deleted along with it, so move any files you want to keep to another location first.
Note that you are charged even for powered-off virtual machines, so delete any machine you no longer need.
Access control for the instance
SSH access to the virtual machines created above requires password or public key authentication, and access to Jupyter Notebook requires token authentication. If you want to add access control based on the client's IP address, refer to the following articles to apply a firewall to port 22 (SSH) and port 80 (HTTP).
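As a rough sketch of what such IP-based rules might look like with `ufw` (the same firewall tool used later in this article), the commands below allow SSH and HTTP only from a single client address. `203.0.113.10` is a documentation placeholder; substitute your own IP, and run these on the instance as root.

```shell
# Hypothetical ufw rules restricting SSH (22) and Jupyter over HTTP (80)
# to one trusted client IP; everything else inbound is denied.
ufw default deny incoming
ufw allow from 203.0.113.10 to any port 22 proto tcp
ufw allow from 203.0.113.10 to any port 80 proto tcp
ufw enable
```

Be sure the SSH rule matches your current client IP before running `ufw enable`, or you can lock yourself out of the machine.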
For more advanced access control, Akamai's zero-trust solution, Enterprise Application Access, can be used for integration with external Identity Providers and SSO support.
Enabling HTTPS
Follow the steps below to enable HTTPS in Jupyter Notebook for production use.
Running a public notebook server
The five commands listed above are defined as aliases for `docker` commands in `/root/.bash_profile`. When HTTPS is enabled, the argument of the `-p` option of the `docker` command used by the `pytorch-notebook` and `tensorflow-notebook` commands should also be changed to an appropriate port such as `443`. Finally, run `ufw allow 443/tcp` so that the firewall allows port 443.
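The port change amounts to a one-word edit of the alias definition. The alias text below is an assumed example for demonstration; check `/root/.bash_profile` on your machine for the real definitions before editing them.

```shell
# Demonstration: rewrite the host side of the -p mapping from 80 to 443
# in a sample (assumed) alias line. The container port stays 8888.
echo "docker run --rm -d -p 80:8888 nvcr.io/nvidia/pytorch" \
  | sed 's/-p 80:/-p 443:/'
# → docker run --rm -d -p 443:8888 nvcr.io/nvidia/pytorch
```

Applied to `/root/.bash_profile` (with `sed -i` or a text editor), this makes the notebook commands publish on port 443, which the final `ufw allow 443/tcp` then opens.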