I don't know about you, but there are times when you need something quick: to run a test, play with a library, or simply avoid the trouble of installing everything all over again.
Imagine this: you're at home developing a survey, or maybe a piece of software to present at college. You want to show what you've built to your teacher and classmates, but then you remember, with a sinking feeling, all the work ahead: installing the libraries on the university's machine, testing that everything runs, and so on. To make matters worse, your presentation is Thursday night, so you go in on Wednesday to prepare everything. But someone who works at the college scheduled that machine to be reformatted on Thursday morning. Did anyone tell you? Of course not; you don't work at the college, so you don't hear about these things. You arrive Thursday night, nothing works, and you stand there embarrassed, unable to present. All that effort was in vain.
In this article I'm focusing on building a container for data science development, though the same idea applies to anything you want. Taking the college scenario above: once you've followed this article, when you go to present your work at college (or anywhere else), you just use a container, or pull an image you created beforehand, as shown below. All you need on the target machine is Docker; then you run the container with your libraries and your application. I won't focus on machine learning concepts; my aim is to show how to prepare a container and use it.
Well, that's why I bring a possible solution. It won't solve 100% of cases, but 99.8% of them, I think. Haha. Let's go.
For the record: I develop on a machine running Linux, and this entire article was developed in that environment. I don't really know how things work on Windows, so I strongly recommend checking the documentation of the tools used in this article to adapt them to your operating system. One more note: for most of the articles I write, I make the project available on a versioning platform; you can find the link at the end.
Guys, please don't forget to leave a like. It helps me see whether you're enjoying the content, and it helps the article reach more people.
One warning I always leave: check the documentation of the tools you use.
Requirements:
- Computer
- Visual Studio Code (or any IDE you prefer)
- Docker installed
- Git installed on your machine
- GitLab account - I used GitLab because I like working with its CI/CD files; it helped a lot with this development.
Without a doubt, you must have Docker installed on your personal machine and on the machine where you'll present (the university's, in our scenario). I won't go into installation details; documentation changes every day and new versions appear faster than you can keep up, so I'll just leave the link:
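Once Docker is installed, a quick way to confirm it's working (standard Docker, nothing specific to this project):

docker --version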
Step 1 - Prepare the development environment
I'll develop this project with GitLab, so you'll need a GitLab account. If you don't know GitLab, don't worry; the steps will be well explained.
Once your account is created, create a project.
Here you'll name your project; the project path is defined under your user. Private or public is up to you. Take the opportunity to create a README.md as well; it saves us a step later.
I called my project Data Science Environment, but the name is your choice. Now let's clone the project to our machine. Where you save it is up to you; I like ~/Documents/projects_gitlab. It keeps things organized and practical.
Now you will clone your project.
git clone git@gitlab.com:public-dev-projects-1/data-science-environment.git
That public-dev-projects-1 part should be your username. Mine is different because I like to create groups for the projects I'm working on; GitLab gives me better organization that way. Again, this is at your discretion.
Run the command in the terminal:
cd <name_your_project>
This takes you into your project; the path should look like ~/Documents/projects_gitlab/your_project. Well, now let's get down to business.
Choose an IDE you're used to and open the repository you just cloned.
Step 2 - Create a Docker image
Now you're going to create a Docker image. In your IDE, create a new file named Dockerfile.jupyterlab. The suffix after the dot isn't required; a plain Dockerfile works too.
If you're not familiar with it, a Dockerfile is basically a grandma's recipe. But instead of a chocolate cake, it holds the configuration for a container.
Inside your Dockerfile.jupyterlab file you will add:
- FROM python:3.10-slim : This is the image we start our container from. Since we're going to use notebooks, we need a Python image; I chose the 3.10-slim version because it has been around for a while, it's the one I've felt most comfortable with so far, and it's also lighter.
- WORKDIR /notebooks : This will be our working directory. I called it notebooks because only the notebooks will live there, but you can change it to whatever you prefer; in addition, there will be a Docker volume, which will be explained later.
- COPY requirements_lab.txt /notebooks : Now we need to copy our requirements_lab.txt. Rest assured, the file hasn't been created yet; we'll get there soon. The requirements are the ingredient list for our cake, oops, our container, hahaha, I'm getting hungry over here. Instead of flour, eggs, and chocolate, we have pandas, numpy, and jupyterlab: the libraries we need to develop our ideas inside the Jupyter notebook.
- RUN --mount=type=cache,target=/root/.cache/pip \ pip3 install -r requirements_lab.txt : This step installs all the libraries listed in the requirements file; the cache mount keeps pip's cache between builds so rebuilds don't re-download everything.
- RUN rm requirements_lab.txt : After installation the requirements file is no longer needed, so we remove it. That's because once the container is created you'll keep working with the container itself, and you won't change the Dockerfile unless you need to.
- ENTRYPOINT ["jupyter", "lab", "--NotebookApp.token=''", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"] : Here is the final command, which starts Jupyter as if it were running locally. I opted for JupyterLab, but that's up to you too; the requirements file includes both, and looking at the documentation you can see the commands and pick whichever suits you better.
Your Dockerfile should look like this:
FROM python:3.10-slim
WORKDIR /notebooks
COPY requirements_lab.txt /notebooks
RUN --mount=type=cache,target=/root/.cache/pip \
pip3 install -r requirements_lab.txt
RUN rm requirements_lab.txt
ENTRYPOINT ["jupyter", "lab", "--NotebookApp.token=''","--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]
If something is missing from yours, you can consult the repository that I will make available.
Step 3 - Create the requirements file
Now create a new file called requirements_lab.txt. You'll usually see it named requirements.txt, but that's up to you too; just reference the correct name in the Dockerfile. Once the file is created, add the libraries you want to use in your data science project. Two of them are mandatory, notebook and jupyterlab; without them we can't start Jupyter inside the container.
scikit-learn==1.2.2
pandas==2.0.3
numpy==1.25.0
jupyterlab==4.0.2
notebook==6.5.4
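One tip, assuming you already have these libraries installed in a local environment: pip can generate the file with your exact versions for you:

# write the versions from your current local environment into the file
pip freeze > requirements_lab.txt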
Now we've finished all the definitions for our data science container. Let's proceed to the commands.
In your terminal (the IDE's terminal works too, if you have one), inside the directory you've been working in so far, execute the following command.
docker build -f Dockerfile.jupyterlab -t env_ds .
This command builds our image from the settings we defined. Note that I pass the -f flag to indicate the name of my Dockerfile, then the -t flag to give my image a name, and the dot to indicate where my Dockerfile is.
Now the build will start. If there's an error, you probably missed something in your Dockerfile, or the path could be wrong.
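If the build finishes without errors, a quick way to confirm the image exists before running it (again, standard Docker, nothing project-specific):

docker image ls env_ds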
With the build completed without problems, let's run our container.
docker container run --rm -p 8888:8888 -v "${PWD}/notebooks:/notebooks" -d --name containerlab env_ds
With this command we publish the container's port with -p; pay attention that it has to be the same port as in the ENTRYPOINT defined in the Dockerfile. Do you remember that I mentioned our WORKDIR was called notebooks? In our local directory we'll create a folder called notebooks. With the folder created, note that we create a -v volume from our local folder to the container's WORKDIR. Does it have to be the same name on both sides? No, you can choose, but don't forget to pass the volume correctly in this command. We also set detach mode with -d, which means we don't see the execution in the terminal; if you want to see it, just remove -d. I usually create a small file with a command so I can follow the execution in another terminal (a sketch of it appears after the logs command below). Finally, we give our container a name and pass the name of our image, so the container picks up all those settings.
Your container is now running. If you left the detach flag -d on, run the following command; it shows the output of the container we started.
docker logs containerlab -f
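As mentioned above, I like wrapping these commands in a small file so I can watch the logs in another terminal. A minimal sketch (the file name run_lab.sh is my own choice, not part of the project):

#!/usr/bin/env bash
# run_lab.sh - start the container detached, then follow its logs
docker container run --rm -p 8888:8888 \
  -v "${PWD}/notebooks:/notebooks" \
  -d --name containerlab env_ds
docker logs containerlab -f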
Then click on the link:
It will open a tab in your browser.
BOOMMMMM!!!! We have our environment. Now let's create a notebook to test it and check whether the volume we created worked. Also note that, because I set WORKDIR, Jupyter starts in that workspace.
Note that we can already use it. And the volume?
There it is. Finished creating, developing, and innovating? Remember that we created a repository on GitLab: now just save your changes and push them to your repository (the usual Git commands are sketched below). If you're working with a team, they just follow the same steps to run the container and they'll be able to use the notebooks. Likewise, if you send the work to a teacher, you send the repository; the teacher performs the same steps for the container, the same settings are kept, and no environment problem will harm the work you've done. But calm down, hahahahahah, there's more. It takes me a while to release something here on Dev.to, but when it comes out, a lot of content comes out.
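For completeness, pushing the notebooks is the usual Git flow; a quick sketch (the commit message is just an example):

git add notebooks/
git commit -m "add data science notebooks"
git push origin main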
Bonus: GitLab Container Registry
Now, instead of developing all your code and submitting your repository for the presentation, why not just pull your container and present? Let's add our image to the container registry of your GitLab repository.
Because we created our project on GitLab, we can create a CI configuration file and, with it, publish versions of our images to be used. You can then create images with library x or y; feel free.
So let's go. In the same project, create a file called .gitlab-ci.yml. I won't go into details, but this file holds the definitions to build and publish the image of the project we developed to the GitLab container registry.
- stages: The stages in which our jobs will be executed.
- image: We need a Docker image in which to run the Docker registry commands that come right after.
- script: The same commands we have on our machine, except that instead of us executing them, they run inside this stage's job.
- docker login: To connect to the Docker registry you need environment variables, which authenticate you on the platform. Having a GitLab account already helps a lot here, since GitLab provides these variables in CI.
- docker build: We need to build our image again; if your Dockerfile has any kind of error, the pipeline will warn you that it cannot continue successfully.
- docker push: Finally, the image is sent to our GitLab container registry.
- only: Chooses the branch this job runs on. For example, if you push to another branch, this step will not be executed.
stages:
  - build

build image docker:
  stage: build
  image: docker:23.0.6
  services:
    - docker:23.0.6-dind
  script:
    - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
    - docker build -f Dockerfile.jupyterlab -t registry.gitlab.com/public-dev-projects-1/data-science-environment/jupyterlab:v1.0 .
    - docker push registry.gitlab.com/public-dev-projects-1/data-science-environment/jupyterlab:v1.0
  only:
    - main
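One variation worth knowing: $CI_REGISTRY_USER, $CI_REGISTRY_PASSWORD, and $CI_REGISTRY are variables GitLab predefines for you, and there is also a predefined $CI_REGISTRY_IMAGE that already points at your project's registry path. So the hard-coded address above could be replaced; a sketch of just the script section:

script:
  - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
  - docker build -f Dockerfile.jupyterlab -t $CI_REGISTRY_IMAGE/jupyterlab:v1.0 .
  - docker push $CI_REGISTRY_IMAGE/jupyterlab:v1.0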
After you've added these settings, just push your changes to GitLab.
Now go to your GitLab repository, click Build, then Pipelines. With the configuration file we created, the pipeline will start and execute those commands.
Afterwards, look for the Container Registry tab and note that we succeeded.
Now let's pull the image to see if we can use our container.
One note: I left my project public, but I still needed to create a token to pull my image. To create one, go to your profile settings, create an access token, and enable the read options (read_registry is the scope that allows pulling images). Copy the token provided and use it in the command below.
docker login registry.gitlab.com -u <YOUR_USER_NAME> -p <YOUR_PASSWORD_OR_TOKEN>
If you have two-factor authentication enabled, password login will fail and you'll need a token anyway. I recommend creating the token regardless; it's more practical, and you can simply delete it when you no longer want to use it.
It will return this message:
Login Succeeded
Now let's download our image with the following command:
docker image pull "registry.gitlab.com/public-dev-projects-1/data-science-environment/jupyterlab:v1.0"
Now run the following command to verify that we have our docker image:
docker images
We're ready with the image; let's run our notebook again. But before that, create a notebook folder for the volume (this time I created it inside Documents). My terminal is open in the same location as this folder, but not inside it.
Let's run the following command.
docker run -it --rm -p 8888:8888 -v "${PWD}/notebook:/notebooks" registry.gitlab.com/public-dev-projects-1/data-science-environment/jupyterlab:v1.0
Click the link again to open it in the browser.
There's our notebook, now started from the image we registered. Very cool.
Final comments:
Thanks for reading this far; I hope it helped your understanding. If you find any errors in the code or text, please don't hesitate to reach out. And don't forget to leave a like so the article can reach more people.
Resources
About the author:
A little more about me...
I graduated with a Bachelor's degree in Information Systems, and in college I had contact with different technologies. Along the way I took an Artificial Intelligence course, where I had my first contact with machine learning and Python; learning about this area became a passion of mine. Today I work with machine learning and deep learning, developing communication software. I also created a blog where I write posts about the subjects I'm studying and share them to help other users.
I'm currently learning TensorFlow and Computer Vision
Curiosity: I love coffee