Disclaimer: This post is heavily inspired by this post on the Docker Blog. It also seeks to answer questions raised by the community in previous posts in this series.
Docker is undeniably a powerful tool for containerization, but its versatility and extensive feature set can be overwhelming, especially for newcomers. The platform offers multiple ways to achieve similar goals, making it crucial for users to carefully weigh the advantages and disadvantages of each option to determine the best approach for their specific projects. This decision-making process requires a solid understanding of Docker's underlying mechanisms and the trade-offs involved in different strategies.
Docker can be a bit of a maze, especially when it comes to the nitty-gritty of instructions like RUN, CMD, and ENTRYPOINT. These three instructions often cause confusion, as they seem to have overlapping functions. But fear not! In this article, we'll unravel the mystery, clearly explaining the differences between them and highlighting when to use each one effectively. By the end, you'll be a Dockerfile ninja, wielding these instructions with confidence and precision!
The RUN Directive
In Docker, the RUN directive is a key instruction used in Dockerfiles to run commands while building your Docker image. It's your go-to tool for setting up the environment inside the image, allowing you to install packages, update dependencies, configure settings, and basically do anything you need to make your image ready to use. Think of it as the stage where you get everything in place before the curtain goes up on your running container.
Here is an example of the RUN directive in practice:
FROM ubuntu:20.04
# Update package lists and install necessary packages
RUN apt-get update && apt-get install -y \
curl \
wget \
&& rm -rf /var/lib/apt/lists/*
In this example, we use the RUN instruction to update the package lists inside the image and install curl and wget. These are handy tools for downloading files and interacting with websites from the command line.
The && rm -rf /var/lib/apt/lists/* part is added for cleanup. It removes the cached package lists after the installation, making your final Docker image smaller and leaner. Since these lists aren't needed to run your application, it's good practice to get rid of them.
The CMD Directive
The CMD instruction in your Dockerfile sets the default command that will run when you start a container from your image. It's like a pre-set option for what your container should do when it "boots up." But here's the key: you can easily change this default behavior by providing different command-line arguments when you run the container using docker run.
CMD is perfect for situations where you want to provide a sensible default behavior for your container, but also give users the flexibility to customize it. Think of it as setting up a suggested starting point, but allowing users to take the wheel if they want to go in a different direction.
It's a common practice to use CMD in Docker images to define default parameters or configurations that can be easily overridden by the user when running the container.
For example, you might want to start a web server by default, but also allow users to override this and run a shell instead:
FROM node:alpine
[...dockerfile truncated...]
CMD ["node", "server.js"]
The ENTRYPOINT Directive
The ENTRYPOINT instruction sets the main executable for your container. It's similar to CMD but with a key difference: when you run a container with docker run, the command you provide doesn't replace the ENTRYPOINT command. Instead, your command gets added to the ENTRYPOINT command as arguments.
Think of it like this: ENTRYPOINT sets the core command your container is designed to run, while any arguments you provide with docker run become extra instructions for that command.
For example, the following Dockerfile will run the web server no matter what the users provide as arguments:
FROM node:alpine
[...dockerfile truncated...]
ENTRYPOINT ["node", "server.js"]
So CMD or ENTRYPOINT?
CMD and ENTRYPOINT are special instructions in a Dockerfile. While other instructions execute when you build the image, these two come into play when you actually run a container from that image.
Essentially, when a container starts, Docker needs to know what it should do: what program to run, and how to run it. That's where CMD and ENTRYPOINT step in. They tell Docker what the main process inside the container is and how to start it.
Now, the difference between them is a bit tricky, and many people don't fully grasp it. Luckily, most of the time, your container will work fine even if you don't use them perfectly. However, understanding the nuances can make things a lot smoother and less confusing.
To get a better handle on this, let's break down a typical Linux command:
ping -c 10 127.0.0.1
Here, ping is the command itself, and the rest (-c 10 127.0.0.1) are the parameters or arguments we're giving to that command.
Now, back to Docker:
- ENTRYPOINT: This is where you define the command part of the expression. It's the core thing you want your container to do when it starts.
- CMD: This is where you define the parameters for that command. They're the additional instructions you want to give it.
So, a Dockerfile that uses Alpine Linux as the base image and wants to run the ping command could look like this:
FROM alpine:latest
ENTRYPOINT ["ping"]
CMD ["-c", "10", "127.0.0.1"]
We can now build an image called pinger from the preceding Dockerfile, as follows:
docker image build -t pinger .
Now we can run a container from the pinger image we just created like this:
docker container run --rm -it pinger
PING 127.0.0.1 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=64 time=0.047 ms
64 bytes from 127.0.0.1: seq=1 ttl=64 time=0.056 ms
64 bytes from 127.0.0.1: seq=2 ttl=64 time=0.038 ms
64 bytes from 127.0.0.1: seq=3 ttl=64 time=0.055 ms
64 bytes from 127.0.0.1: seq=4 ttl=64 time=0.037 ms
64 bytes from 127.0.0.1: seq=5 ttl=64 time=0.036 ms
64 bytes from 127.0.0.1: seq=6 ttl=64 time=0.053 ms
64 bytes from 127.0.0.1: seq=7 ttl=64 time=0.058 ms
64 bytes from 127.0.0.1: seq=8 ttl=64 time=0.048 ms
64 bytes from 127.0.0.1: seq=9 ttl=64 time=0.053 ms
--- 127.0.0.1 ping statistics ---
10 packets transmitted, 10 packets received, 0% packet loss
round-trip min/avg/max = 0.036/0.048/0.058 ms
The great thing about this setup is that you can easily override the default CMD parameters you've set in the Dockerfile. If you remember, we originally defined CMD ["-c", "10", "127.0.0.1"] to ping the loopback address ten times.
But now, when you create a new container, you can simply add different values at the end of your docker run command to change the ping target or the number of pings. This gives you a lot of flexibility while still keeping a consistent base command.
docker container run --rm -it pinger -w 5 127.0.0.1
PING 127.0.0.1 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=64 time=0.034 ms
64 bytes from 127.0.0.1: seq=1 ttl=64 time=0.053 ms
64 bytes from 127.0.0.1: seq=2 ttl=64 time=0.040 ms
64 bytes from 127.0.0.1: seq=3 ttl=64 time=0.038 ms
64 bytes from 127.0.0.1: seq=4 ttl=64 time=0.056 ms
This will cause the container to ping the loopback IP address (127.0.0.1) for 5 seconds.
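To make the rule concrete, here is a small shell sketch that models how the final command line is put together from ENTRYPOINT, CMD, and any arguments given after the image name. It is only a simplified illustration and not Docker's actual implementation: effective_command is a made-up helper, and both instructions are modelled as plain space-separated strings, so arguments containing spaces are not handled.

```shell
#!/bin/sh
# Simplified model of how a container's command line is assembled.
# effective_command is a hypothetical helper, not part of Docker.
# Usage: effective_command ENTRYPOINT CMD [ARGS_AFTER_IMAGE_NAME...]
effective_command() {
    entrypoint=$1
    default_cmd=$2
    shift 2
    if [ "$#" -gt 0 ]; then
        # Arguments given after the image name replace CMD entirely.
        tail_args=$*
    else
        # No extra arguments: fall back to the CMD defaults.
        tail_args=$default_cmd
    fi
    # Join the two parts, dropping whichever one is empty.
    if [ -n "$entrypoint" ] && [ -n "$tail_args" ]; then
        printf '%s %s\n' "$entrypoint" "$tail_args"
    else
        printf '%s%s\n' "$entrypoint" "$tail_args"
    fi
}

effective_command "ping" "-c 10 127.0.0.1"                  # prints: ping -c 10 127.0.0.1
effective_command "ping" "-c 10 127.0.0.1" -w 5 127.0.0.1   # prints: ping -w 5 127.0.0.1
effective_command "" "node server.js" sh                    # prints: sh
```

The second call mirrors the pinger run above: the extra arguments replace the CMD defaults while ping stays in place. The third call models an image with only a CMD, where the arguments replace the whole command.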
Directive Description and Use Cases
The following table provides an overview of these commands and use cases.
| Directive | Description | Syntax Example | Use Cases |
|---|---|---|---|
| RUN | Executes commands during the image build process. It is used to install software packages, configure settings, or perform other setup tasks. The result is part of the image layers. | RUN apt-get update && apt-get install -y nginx | Installing software packages; configuring files or repositories; setting up environment variables or system settings |
| CMD | Specifies the default command to run when a container starts from the image. It can be overridden by providing a command in docker run. It is generally used to provide default arguments to ENTRYPOINT or to set a default command. | CMD ["nginx", "-g", "daemon off;"] | Setting a default command for the container; providing default arguments to ENTRYPOINT; running a service or application by default |
| ENTRYPOINT | Defines the main command that is always executed when a container starts. It is not overridden by arguments provided in docker run unless you use the --entrypoint flag. Useful for ensuring a specific command is always run. | ENTRYPOINT ["nginx", "-g", "daemon off;"] | Ensuring a specific command is always executed; running a primary process or service; used in combination with CMD to provide default arguments |
Unix Signaling and the PID 1
In the world of Unix-like systems, including Docker containers, PID 1 is a special process: the very first one that starts up. Every other process inside the system is its descendant, creating a family tree of processes with PID 1 at the top.
In Docker, this PID 1 process is really important because it's responsible for managing everything else inside the container. One of its critical roles is handling signals (like SIGTERM) from the host system, which are essentially messages that tell the container to do something (like gracefully shut down).
Now, when you use the shell form for an instruction, a shell process (usually /bin/sh -c) takes over as PID 1. The problem is that this shell process isn't very good at handling signals. It might not pass them along to your actual application, which can lead to issues like containers not shutting down cleanly.
On the other hand, the exec form lets you run your command directly as PID 1, without any shell in between. This way, the command itself receives and handles signals directly, making the whole thing much more reliable.
So, if your container needs to react to signals promptly and gracefully, the exec form is the way to go. It's particularly crucial for applications that need to respond to events or interruptions, ensuring that everything shuts down properly and your data stays safe.
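The difference can be observed on any Unix-like host without Docker. The sketch below is only an analogy, assuming a POSIX sh is available: without_exec and with_exec are made-up helper names, and a plain sh stands in for both the wrapping shell and the application. It compares the PID of an outer shell with the PID of the program it starts, with and without exec.

```shell
#!/bin/sh
# Analogy for shell form vs exec form, runnable without Docker.
# Both helper names are hypothetical and exist only for this illustration.

# Without exec: the outer shell stays alive as the parent and starts the
# inner program as a child, so the two printed PIDs differ.
# The trailing ":" keeps the inner program from being the last command,
# which some shells would otherwise optimise into an implicit exec.
without_exec() {
    sh -c 'echo "$$"; sh -c "echo \$\$"; :'
}

# With exec: the inner program replaces the outer shell and keeps its PID,
# so both printed PIDs are identical.
with_exec() {
    sh -c 'echo "$$"; exec sh -c "echo \$\$"'
}

echo "without exec:"; without_exec
echo "with exec:";    with_exec
```

In a container, the process that holds PID 1 is the one that receives signals such as SIGTERM, which is why it matters whether a wrapping shell or the application itself ends up in that slot.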
Shell and exec forms
When you're working with Dockerfiles and setting commands for RUN, CMD, and ENTRYPOINT, you have two ways to do it: the shell form and the exec form. Each has its own strengths and weaknesses, and understanding the difference is key to effectively managing your Docker containers.
Shell Form
The shell form is like writing commands the old-school way, similar to what you'd type in a terminal. When Docker sees a command in shell form, it essentially runs it through a shell (usually /bin/sh -c for Unix-based images). This means the command gets interpreted by the shell, which unlocks some handy features like using environment variables and chaining commands together.
CMD echo "Hello, World!"
This way of writing commands, using the shell form, means that Docker executes the command echo "Hello, World!" through the default shell of your container. This can be handy for simple commands or when you need to use special shell features. However, it has a drawback: it doesn't handle signals (like those used for managing processes) the same way as the exec form.
In other words, if your command needs to respond properly to Unix signals for things like stopping or restarting processes gracefully, the shell form might not be the best choice. It's a bit like having a middleman who might not deliver your messages as reliably as you'd like.
Exec Form
Now, let's talk about the exec form. It's a more specific and reliable way to define commands in your Dockerfile. Instead of writing the whole command as a single line, you break it up into a JSON array, where each part of the command (the command itself and its arguments) is a separate element in the array.
CMD ["echo", "Hello, World!"]
Here, instead of using a shell as a middleman, Docker directly executes the command echo with the argument "Hello, World!". This direct execution method is more reliable because it doesn't depend on the shell's interpretation.
This has a few key advantages:
- Better Signal Handling: The command itself can receive and respond to Unix signals directly. This is crucial for ensuring your application can gracefully shut down or restart when needed.
- Precise Argument Passing: You can be confident that the arguments you provide are passed directly to your command without any potential modifications or interference from the shell.
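The argument-passing difference can be imitated outside Docker. In this rough analogy, shell_form hands a whole string to sh -c, much like the shell form does, while exec_form passes each argument straight to the program with no shell in between. The helper names and the GREETING variable are made up for the illustration, and env is used only to launch the external echo program directly.

```shell
#!/bin/sh
# Analogy for how arguments are treated in each form, runnable without Docker.
# shell_form, exec_form, and GREETING are hypothetical names.

# Shell form analogue: the string is interpreted by a shell, which expands
# $GREETING and treats && as an operator before running anything.
shell_form() {
    GREETING=hello sh -c 'echo "$GREETING" && echo done'
}

# Exec form analogue: the program receives every argument verbatim.
# Nothing expands $GREETING or interprets &&.
exec_form() {
    env GREETING=hello echo '$GREETING' '&&' echo done
}

shell_form   # prints "hello", then "done"
exec_form    # prints the literal text: $GREETING && echo done
```

If an exec-form instruction really needs shell features, one common approach is to name the shell explicitly as the first array element, for example ["sh", "-c", "..."].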
Key Differences between Shell and Exec
| | Shell Form | Exec Form |
|---|---|---|
| Form | Commands without square brackets ([]). Run by the container's shell | Commands with square brackets ([]). Run directly, not through a shell |
| Variable Substitution | Inherits environment variables from the shell, such as $HOME and $PATH | Does not inherit shell environment variables but behaves the same for ENV instruction variables |
| Shell Features | Supports sub-commands, piping output, chaining commands, I/O redirection, etc. | Does not support shell features |
| Signal Trapping & Forwarding | Most shells do not forward process signals to child processes. | Directly traps and forwards signals like SIGINT |
| Usage with ENTRYPOINT | Can cause issues with signal forwarding | Recommended due to better signal handling |
| Usage as ENTRYPOINT Params | Not possible with the shell form | If the first item in the array is not a command, all items are used as parameters for the ENTRYPOINT |
The diagram below will help you decide if you need RUN, CMD, or ENTRYPOINT in your Dockerfile.
The diagram below will help you decide if you need shell or exec form for your commands.
You can find high resolution diagrams here.
Running CMD and ENTRYPOINT
The following example walks you through the high-level differences between CMD and ENTRYPOINT.
First, let's create our Dockerfile:
# Use the Ubuntu 20.04 image as the base image
FROM ubuntu:20.04
# Update the image and install traceroute
RUN apt-get update && apt-get upgrade -y
RUN apt-get install -y traceroute
# Set the default command
CMD traceroute
Then build your image using the docker build -t tracert . command (the trailing dot is the build context, here the current directory).
Run the container image with CMD traceroute
Without passing any arguments, we get the following output.
docker run tracert
Usage:
traceroute [ -46dFITnreAUDV ] [ -f first_ttl ] [ -g gate,... ] [ -i device ] [ -m max_ttl ] [ -N squeries ] [ -p port ] [ -t tos ] [ -l flow_label ] [ -w MAX,HERE,NEAR ] [ -q nqueries ] [ -s src_addr ] [ -z sendwait ] [ --fwmark=num ] host [ packetlen ]
Options:
-4 Use IPv4
-6 Use IPv6
-d --debug Enable socket level debugging
-F --dont-fragment Do not fragment packets
-f first_ttl --first=first_ttl
Start from the first_ttl hop (instead from 1)
-g gate,... --gateway=gate,...
Route packets through the specified gateway
(maximum 8 for IPv4 and 127 for IPv6)
-I --icmp Use ICMP ECHO for tracerouting
-T --tcp Use TCP SYN for tracerouting (default port is 80)
-i device --interface=device
Specify a network interface to operate with
-m max_ttl --max-hops=max_ttl
Set the max number of hops (max TTL to be
reached). Default is 30
[...output truncated...]
However, if we provide a hostname to trace, we get an error:
docker run tracert www.dev.to
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "www.dev.to": executable file not found in $PATH: unknown.
The problem is that the string you're providing on the command line, www.dev.to, completely replaces the CMD instruction in your Dockerfile. Since that hostname isn't a valid command, it causes an error.
To fix this, you need to specify the actual command you want to run alongside the hostname.
docker run tracert traceroute www.dev.to
traceroute to www.dev.to (104.18.26.242), 30 hops max, 60 byte packets
[...output truncated...]
Run the container image with ENTRYPOINT traceroute
In this updated version, we'll make a change to the Dockerfile. We will remove the original CMD instruction and replace it with ENTRYPOINT ["traceroute"]. This means the traceroute command is now the main command that this container will run when it starts.
The ENTRYPOINT instruction works a bit differently than CMD. With ENTRYPOINT, you can't simply override the command by typing something different after docker run. Instead, any extra arguments you add after docker run are treated as arguments for the traceroute command itself.
# Use the Ubuntu 20.04 image as the base image
FROM ubuntu:20.04
# Update the image and install traceroute
RUN apt-get update && apt-get upgrade -y
RUN apt-get install -y traceroute
# Set the entrypoint so traceroute always runs
ENTRYPOINT ["traceroute"]
Let's rebuild the image and see what happens when we try that:
docker run tracert www.dev.to
traceroute to www.dev.to (104.18.26.242), 30 hops max, 60 byte packets
[...output truncated...]
The key point here is to use ENTRYPOINT when you want to ensure that a specific executable is always run when your container starts, regardless of any additional commands or arguments the user might provide. It gives you a way to create containers that behave like self-contained tools, with a clearly defined main purpose.
Summary
Deciding when to use RUN, CMD, or ENTRYPOINT, and whether to choose the shell or exec form, demonstrates the level of detail and flexibility Docker offers. Each of these commands plays a distinct role in the Docker ecosystem, influencing how containers are constructed, behave, and interact with their environment.
By carefully selecting the right command and form for each specific situation, developers can create Docker images that are more dependable, secure, and optimized for efficiency. Mastering these Docker commands and their formats is essential for unlocking the full potential of Docker. When these best practices are followed, applications deployed within Docker containers can achieve peak performance across diverse environments, enhancing both development workflows and production deployments.
Top comments (2)
Great work diving into the PID and the core differences between the different modes instead of just feature comparison. Thanks!
I think you missed the disclaimer that it was heavily inspired on the Docker Blog post. Instead of heavily inspired should have been copy/pasted and change some words here and there to make it look different 🤣