You might know that a container is a standardized unit of software, that is, an application with everything that it requires to work: runtime, libraries, config, etc.
If this is not enough for you and you'd like to actually see with your own eyes what a container is, read on.
A container can be in two states: running or stopped.
In the stopped state a container looks remarkably simple: it is just one file called config.json that contains configuration and a directory called rootfs that is used as the container's root (/) directory.
Can it be that simple?
You might know that a container image is basically just an archived directory that becomes the root directory when a container runs. We can unpack it with tar to a directory called rootfs:
mkdir rootfs && docker export $(docker create alpine) | tar -xf - -C rootfs
You can run a container with the runc command after creating a default config file (config.json) with runc spec:
runc spec && sudo runc run mycontainerid
To see that you're indeed in a container, run:
cat /etc/*release*
and you should see NAME="Alpine Linux"
Moreover, if you open another terminal and create a file in the rootfs directory:
touch /path/to/rootfs/hello-from-host
you'll see this file in the container:
ls /
# bin hello-from-host media
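As a quick sanity check of the "one file plus one directory" picture, here is a minimal sketch that tests whether a directory has that layout. The helper name is_container_dir is made up for this example and is not part of runc; it only looks for config.json and rootfs and does not validate their contents.

```shell
#!/bin/sh
# is_container_dir DIR: succeed if DIR holds a config.json file and a
# rootfs directory, i.e. it looks like a stopped container.
# (Hypothetical helper for illustration; it does not check the contents.)
is_container_dir() {
    dir=$1
    [ -f "$dir/config.json" ] && [ -d "$dir/rootfs" ]
}

# Usage: check the current directory.
if is_container_dir .; then
    echo "config.json and rootfs found"
else
    echo "missing config.json or rootfs"
fi
```

Run from the directory where you executed runc spec, it should report that both pieces are present.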
But what is runc and what does it do?
runc runs a container (that's basically what docker uses under the hood) in an isolated environment. Kind of.
You might think that runc puts an app process in a software equivalent of a solid metal box.
But that's far from the truth. A container process is not enclosed in some kind of jail from which it cannot escape. The process has no idea that it's restricted. It has no idea what the real world looks like. runc effectively lies to it:
- that the rootfs directory is the root of the filesystem
- that there are no other processes in the system and that it is the init process with PID 1
- about the available network, computation and memory resources
It's more like putting a VR headset on a process without telling it that it's in virtual reality. It can't escape the rootfs directory because in its world view rootfs is / and there's no way to go higher than /.
This is possible mostly thanks to two kernel features: namespaces and cgroups. Namespaces virtualize system resources, and cgroups provide a way to limit resources like CPU and memory.
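Both features are visible from a plain shell on Linux, because the kernel exposes them per process under /proc. The sketch below is only an illustration: ns_of and same_ns are made-up helper names, and it assumes a Linux system with /proc mounted.

```shell
#!/bin/sh
# ns_of PID TYPE: print the namespace identifier of a process, for example
# "pid:[4026531836]". TYPE is a namespace type such as pid, mnt or net.
# (ns_of and same_ns are hypothetical helpers, not standard commands.)
ns_of() {
    readlink "/proc/$1/ns/$2"
}

# same_ns PID1 PID2 TYPE: succeed if both processes share that namespace.
same_ns() {
    a=$(ns_of "$1" "$3") || return 1
    b=$(ns_of "$2" "$3") || return 1
    [ "$a" = "$b" ]
}

# Show a few namespaces of the current shell.
for t in pid mnt net; do
    ns_of "$$" "$t" || echo "$t: namespace information not available"
done

# Show which cgroup(s) the current shell belongs to.
if [ -r "/proc/$$/cgroup" ]; then
    cat "/proc/$$/cgroup"
fi
```

Two ordinary host processes should report the same identifiers. Comparing a host shell with the PID of a process started by runc (reading another user's /proc entries may need sudo) should show different identifiers for the namespaces the container was given.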
As you can see, containers do not contain. The isolation is only one-way. A contained process can't see the world outside of its container, but the host has all the information about the process, can access its filesystem and interact with it as if it were a normal process (it practically is a normal process). Isolation is based on providing fake information about the state of the system to a contained process.
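The one-way nature of the isolation can be seen from the host through /proc, where each process's view of / is reachable as /proc/PID/root. This is a minimal sketch assuming Linux with /proc mounted; list_root_of is a made-up helper name.

```shell
#!/bin/sh
# list_root_of PID: list the root directory as the given process sees it.
# /proc/PID/root is a kernel-provided view of that process's "/".
# (list_root_of is a hypothetical helper for illustration.)
list_root_of() {
    ls "/proc/$1/root/"
}

# For an ordinary host process such as this shell, the result is simply
# the host's own "/".
list_root_of "$$" || echo "/proc is not available here"
```

Given the host-side PID of the process running inside the container (sudo runc list should show it), the same call run with root privileges should list the contents of rootfs, including the hello-from-host file created earlier.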
That makes containers really lightweight but a little less secure than VMs. Moving a normal app to a container is a breeze because, from the app's perspective, there's no difference between a normal environment and the virtual environment presented to it when it runs in a container, so there's no need for special modifications.
Summing up, a container is app code and all its dependencies kept in a single directory tree, plus a config file. When run, a container is restricted to that directory by making it think that the directory is the filesystem root and by providing it with manufactured information about system resources. To make the container easy to transport, it's packed into an image format that's basically an archive of the container directory.