In the previous article, how to containerize a Python application, we created a Dockerfile and containerized our application from scratch.
Now we want our Dockerfile to stand out and be more production-ready. That's the goal of this article.
We will cover 6 different ways to improve our Dockerfile:
- setting env variables and a working directory
- avoiding invalidating the Docker cache
- changing the default user to non-root
- taking care of zombie processes
- correctly forwarding signals to our application
- updating pip, setuptools and wheel
Intro
This is the Dockerfile we created last time:
# 1. Base image
FROM python:3.8.3-slim-buster
# 2. Copy files
COPY . /src
# 3. Install our deps
RUN pip install -r /src/requirements.txt
While fully functional, there are a few things we can improve regarding usability, security and performance.
You can clone this repository if you want to follow along.
Passing the git commit hash
We want to mark each Docker image and container with a tag: the git commit hash.
At runtime we should be able to determine which version of our software we are running.
The idea is that every artifact we generate is traceable: we can go back and check which commit generated it.
The ARG and ENV instructions can help us achieve this.
ARG defines build-time variables that we can pass to the docker build command. ENV sets environment variables inside the Dockerfile that are also accessible at runtime, from within the container.
This is the new Dockerfile, with ARG and ENV:
FROM python:3.8.3-slim-buster
# 👇
ARG GIT_HASH
ENV GIT_HASH=${GIT_HASH:-dev}
# 👆
COPY . /src
RUN pip install -r /src/requirements.txt
The :-dev part specifies a default value. If the GIT_HASH argument is omitted, GIT_HASH will be set to dev.
Let's build our Docker image and check the GIT_HASH env variable:
> docker build -t movie-app .
> docker run --rm movie-app env | grep GIT_HASH
GIT_HASH=dev
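Inside the application, the same variable can be read like any other environment variable. The following is a minimal sketch, assuming a hypothetical helper called get_version that is not part of the original app:

```python
import os


def get_version(environ=None):
    """Return the git hash baked into the image, or "dev" when it is missing."""
    # Assumption: the variable name matches the ENV instruction in the Dockerfile.
    env = os.environ if environ is None else environ
    # An empty value is treated like a missing one, mirroring ${GIT_HASH:-dev}.
    return env.get("GIT_HASH") or "dev"
```

The application could log this value at startup or expose it on a status endpoint, so a running container can always be traced back to a commit.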
How do we pass the git commit hash to our Docker image?
We use the --build-arg flag from the Docker CLI:
# this works in bash but not in fish 🐟
> export GIT_HASH=$(git rev-parse HEAD)
> docker build --build-arg GIT_HASH=${GIT_HASH::7} -t movie-app .
> docker run --rm movie-app env | grep GIT_HASH
GIT_HASH=6a78e6b
We don't need the whole commit hash; the first 7 characters are enough.
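Since the ${GIT_HASH::7} substring syntax is specific to bash, a small script can do the same job in any shell. This is only a sketch: the function names current_git_hash and docker_build_command are made up for illustration, and the image tag defaults to the movie-app name used above.

```python
import subprocess


def current_git_hash():
    """Return the full hash of the current commit (needs git and a repository)."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def docker_build_command(git_hash, tag="movie-app", length=7):
    """Build the argument list for docker build, passing a shortened hash."""
    short_hash = git_hash[:length]
    return [
        "docker", "build",
        "--build-arg", f"GIT_HASH={short_hash}",
        "-t", tag,
        ".",
    ]
```

Running subprocess.run(docker_build_command(current_git_hash()), check=True) from the project folder would then start the build.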
Why are we not passing the base image using ARG?
Because we don't want to change the base Docker image from the Docker CLI, but only through a new commit.
Adding a working directory
Right now we copy our files into a /src folder and then specify all the other paths relative to /src.
Wouldn't it be nicer if we could specify a working directory and run commands from that folder?
That would be neat, and WORKDIR is exactly what we need.
FROM python:3.8.3-slim-buster
ARG GIT_HASH
ENV GIT_HASH=${GIT_HASH:-dev}
# 👇
WORKDIR /project
# 👇
COPY . .
# 👇
RUN pip install -r requirements.txt
After we specify a WORKDIR, any RUN, CMD, ENTRYPOINT, COPY and ADD instructions that follow will use that working directory.
Note how the paths of COPY and pip install changed.
Let's test our application:
> docker build -t movie-app .
# 💥 it's not python /src/app.py anymore 💥
> docker run --rm -p 8888:8888 movie-app python app.py
> curl localhost:8888
Caching dependencies
Our application has only a few external dependencies in requirements.txt, so the pip install command is fast, just a couple of seconds.
What if it took minutes instead of seconds?
Wouldn't it be better to cache our dependencies until something changes?
If you modify any file inside our application's folder and run the docker build command again, you will see that Docker reruns every step from COPY onward, including pip install.
If you check the console output you should see something like this:
Step 6/7 : RUN pip install -r requirements.txt
---> Running in 2233484e3f72
Basically, any change to our codebase, even one unrelated to requirements.txt, invalidates the Docker cache for the COPY step and every step after it.
We can be smarter and save some time: we just need to install our dependencies first.
FROM python:3.8.3-slim-buster
ARG GIT_HASH
ENV GIT_HASH=${GIT_HASH:-dev}
WORKDIR /project
# 👇
COPY requirements.txt ./
RUN pip install -r requirements.txt
# 👆
COPY . .
We added a new COPY, just for requirements.txt, and moved the pip install right after it.
If you now build the Docker image, then change app.py and rerun the docker build command, the cache for the pip install step should not be invalidated.
This is the output you should see, with Using cache:
> docker build -t movie-app .
Step 6/7 : RUN pip install -r requirements.txt
---> Using cache
---> cbe7b2865e10
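To see why the order matters, here is a deliberately simplified model of the idea. It is not Docker's actual implementation: it only mimics the fact that a COPY step is cached based on a checksum of the contents of the files it copies. The file contents and the copy_checksum helper are invented for illustration.

```python
import hashlib


def copy_checksum(files):
    """Checksum over the names and contents of the files a COPY step would add."""
    digest = hashlib.sha256()
    for name in sorted(files):
        digest.update(name.encode())
        digest.update(files[name])
    return digest.hexdigest()


# Made-up project contents before and after editing only app.py.
before = {"requirements.txt": b"somepackage==1.0\n", "app.py": b"print('v1')\n"}
after = {"requirements.txt": b"somepackage==1.0\n", "app.py": b"print('v2')\n"}

# COPY . . sees a different checksum, so it and every later step run again.
whole_project_changed = copy_checksum(before) != copy_checksum(after)

# COPY requirements.txt ./ sees the same checksum, so pip install stays cached.
deps_before = {"requirements.txt": before["requirements.txt"]}
deps_after = {"requirements.txt": after["requirements.txt"]}
requirements_changed = copy_checksum(deps_before) != copy_checksum(deps_after)
```

In this model, whole_project_changed is true while requirements_changed is false, which is why copying requirements.txt on its own keeps the expensive pip install step cached.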
Running your container as non-root user
By default, the user running your command inside a Docker container is root.
> docker run --rm movie-app whoami
root
Long story short, Docker containers should not run as root, and it is highly recommended to change the default user to a non-root user.
How do we change the user?
We create a new one and set it with the USER instruction.
FROM python:3.8.3-slim-buster
ARG GIT_HASH
ENV GIT_HASH=${GIT_HASH:-dev}
WORKDIR /project
# here we create a new user
# note how the commands are chained with &&
# this keeps them in a single layer
RUN useradd -m -r user && \
    chown user /project
# 👆
COPY requirements.txt ./
RUN pip install -r requirements.txt
COPY . .
# 👇 here we set the user
USER user
> docker build -t movie-app .
> docker run --rm movie-app whoami
user
Our user can't create new files outside of the /project folder (user is the owner of that folder):
> docker run --rm movie-app touch /hello
touch: cannot touch '/hello': Permission denied
# 👇 but this command would work
> docker run --rm movie-app touch hello
Let's test our application to be sure it has all the necessary permissions:
> docker run --rm -p 8888:8888 movie-app python app.py
> curl localhost:8888
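As an extra safety net, the application itself could refuse to start as root. This is an optional sketch, not something the original app does; is_root and ensure_not_root are hypothetical helpers, and os.geteuid is only available on Unix-like systems such as the one inside the container.

```python
import os


def is_root(uid=None):
    """Return True when the given (or current effective) user id is root."""
    if uid is None:
        uid = os.geteuid()
    return uid == 0


def ensure_not_root(uid=None):
    """Raise at startup if the process would run as root."""
    if is_root(uid):
        raise RuntimeError(
            "refusing to run as root; set a non-root USER in the Dockerfile"
        )
```

Calling ensure_not_root() at the top of app.py would make a missing USER instruction fail loudly instead of going unnoticed.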
Taking care of zombie processes and signals
Each Docker container runs in its own PID namespace. A PID namespace is a tree that starts at PID 1, commonly called init.
Init is responsible for the entire process of starting the system and shutting it down. When you run a Docker container, PID 1 is whatever you set in your ENTRYPOINT.
If you don't set one and use the shell form of CMD, Docker runs your command through /bin/sh -c, which does not pass signals, making it almost impossible to gracefully stop your application.
This is why we need a better init: Tini.
Tini doesn't only take care of reaping zombie processes, it also forwards any signals we send to the Docker container to our application process.
Forwarding signals correctly is really important. Kubernetes relies on signals during the lifecycle of a pod.
More about Kubernetes and signals here.
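Forwarded signals only help if the application reacts to them. Below is a minimal sketch of a SIGTERM handler on the Python side, assuming a hypothetical GracefulShutdown class that the original app does not include:

```python
import signal


class GracefulShutdown:
    """Remember that a stop signal arrived so the main loop can exit cleanly."""

    def __init__(self):
        self.requested = False

    def handle(self, signum, frame):
        # Keep the handler tiny: just record the request.
        self.requested = True

    def install(self):
        # SIGTERM is what docker stop sends first; SIGINT is Ctrl+C.
        signal.signal(signal.SIGTERM, self.handle)
        signal.signal(signal.SIGINT, self.handle)
```

The application would call install() once at startup and check requested in its main loop, finishing in-flight work before exiting.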
FROM python:3.8.3-slim-buster
ARG GIT_HASH
ENV GIT_HASH=${GIT_HASH:-dev}
# 👇 you can use env variables to pin library versions
ENV TINI_VERSION="v0.19.0"
# 👇
ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini /tini
RUN chmod +x /tini
# 👆
WORKDIR /project
RUN useradd -m -r user && \
    chown user /project
COPY requirements.txt ./
RUN pip install -r requirements.txt
COPY . .
USER user
# 👇
ENTRYPOINT ["/tini", "--"]
We have two new instructions here, ADD and ENTRYPOINT.
ADD is a really useful instruction: it can add remote files to your Docker image.
The ENTRYPOINT specifies the entry point for any command. In our case, python app.py is passed to it as arguments, pretty much like running /tini -- python app.py.
Updating pip, setuptools and wheel
One last thing: it's important to keep pip, setuptools and wheel updated, so it's wise to bump them directly inside our Docker image.
FROM python:3.8.3-slim-buster
ARG GIT_HASH
ENV GIT_HASH=${GIT_HASH:-dev}
ENV TINI_VERSION="v0.19.0"
ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini /tini
RUN chmod +x /tini
# 👇 STYLE YOUR DOCKERFILE LIKE A PRO
RUN pip install -U \
    pip \
    setuptools \
    wheel
WORKDIR /project
RUN useradd -m -r user && \
    chown user /project
COPY requirements.txt ./
RUN pip install -r requirements.txt
COPY . .
USER user
ENTRYPOINT ["/tini", "--"]
Let's test our application once again:
> docker build -t movie-app .
> docker run --rm -p 8888:8888 movie-app python app.py
> curl localhost:8888
And with this last step we are done!
Quick recap
- ARG and ENV are neat, use them
- Copy and install your dependencies before copying your application
- Don't run containers as root, set a new user with USER
- Try to prettify your Dockerfiles
- Always use Tini
- Defining a WORKDIR helps