TL;DR
Containers are supposed to be lightweight; packing unnecessary data into them makes them heavier to create and run. Docker provides several ways to mount storage from the host machine into containers, and volumes are the most commonly used. A volume can persist application data and can also be shared between multiple containers. (Local volumes cannot be shared between Docker services, though; you will need shared storage instead.)
Background
I heard about Docker and containers a while ago, but I'm new to using them. I only recently started exploring them, because they make it easy to build web services and deploy them on multiple operating systems. (They are fantastic tools!)
One of my web services has to create, update, or activate a separate virtual environment and then run a task in it. Different requests sometimes need different virtual environments. The requirements.txt file for each virtual environment is synced from time to time, and then pip install is called to update the environment. pip install can take a while, so it should be called as few times as possible. That means the web service needs to persist the virtual environments, so that it doesn't have to repeat the create/update jobs when it restarts.
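One way to keep pip install calls to a minimum is to remember which requirements.txt was last installed and skip the install when nothing changed. The following is only a sketch of that idea, not the code my service uses: the marker file name .requirements.sha256 and the function names are made up for illustration, and it assumes a POSIX virtualenv layout with pip under bin/.

```python
import hashlib
import subprocess
from pathlib import Path


def requirements_changed(requirements: Path, stamp: Path) -> bool:
    """Return True if requirements.txt differs from the last recorded install."""
    digest = hashlib.sha256(requirements.read_bytes()).hexdigest()
    return not stamp.exists() or stamp.read_text().strip() != digest


def record_install(requirements: Path, stamp: Path) -> None:
    """Store the hash of the requirements file that was just installed."""
    digest = hashlib.sha256(requirements.read_bytes()).hexdigest()
    stamp.write_text(digest)


def sync_env(env_dir: Path, requirements: Path) -> bool:
    """Run pip install only when needed; return True if it ran."""
    stamp = env_dir / ".requirements.sha256"  # hypothetical marker file
    if not requirements_changed(requirements, stamp):
        return False
    pip = env_dir / "bin" / "pip"  # assumes a POSIX virtualenv layout
    subprocess.run([str(pip), "install", "-r", str(requirements)], check=True)
    record_install(requirements, stamp)
    return True
```

Because the marker file lives inside the environment directory, it is persisted together with the environment, so the check keeps working after a restart as long as that directory survives.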
This raises an issue: every time a new image is built for the web service, it obviously doesn't contain the virtual environments stored in the old container, which makes for a very cold start. My first idea was to commit the changes from the old container into the new image. However, this greatly increases the size of the image and the container.
After a few hours of digging through the Docker documentation, I realized that I had been thinking of containers as "fully self-contained", while they are more powerful when they work together with their host machine.
Solution
Docker provides three ways to mount data into a container: volumes, bind mounts, and tmpfs mounts [1].
- Volumes are stored in a part of the host filesystem that is managed by Docker (/var/lib/docker/volumes/ on Linux), and should not be modified by other applications.
- Bind mounts can be anywhere on the host, and can be modified by other applications.
- tmpfs mounts live in the host's memory and are never written to the filesystem.
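To make the difference concrete, here is a small sketch that builds the value passed to docker run --mount for each of the three types. The helper mount_spec is my own illustration, not part of any Docker tooling; the type, source, and target keys are the ones the --mount flag accepts.

```python
from typing import Optional


def mount_spec(kind: str, target: str, source: Optional[str] = None) -> str:
    """Build the value passed to `docker run --mount` for one mount."""
    if kind not in ("volume", "bind", "tmpfs"):
        raise ValueError(f"unknown mount type: {kind}")
    if kind == "tmpfs" and source is not None:
        raise ValueError("tmpfs mounts live in memory and take no source")
    if kind == "bind" and source is None:
        raise ValueError("bind mounts need a host path as source")
    parts = [f"type={kind}"]
    if source is not None:
        # For a volume this is the volume name; for a bind mount, a host path.
        parts.append(f"source={source}")
    parts.append(f"target={target}")
    return ",".join(parts)
```

For example, mount_spec("volume", "/root/.virtualenv", "virtualenv") gives type=volume,source=virtualenv,target=/root/.virtualenv, while a bind mount would pass a host path such as /srv/envs (a made-up path) as the source.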
Generally speaking, volumes are the go-to solution for most data persistence issues in a container. A volume can be created either with the docker volume create command or when starting a container.
Example: my solution
The docker documentation is here [2].
1. Create volume
First, let's create a volume named virtualenv to store the virtual environments.
➤ docker volume create virtualenv
We can check the volume with the following command:
➤ docker volume inspect virtualenv
[
    {
        "CreatedAt": "2018-09-15T05:29:36Z",
        "Driver": "local",
        "Labels": {},
        "Mountpoint": "/var/lib/docker/volumes/virtualenv/_data",
        "Name": "virtualenv",
        "Options": {},
        "Scope": "local"
    }
]
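If a script needs the mountpoint, the JSON printed by docker volume inspect can be parsed directly. This is a minimal sketch; the function name volume_mountpoint is mine, and the sample is simply the output shown above.

```python
import json


def volume_mountpoint(inspect_output: str) -> str:
    """Return the Mountpoint of the first volume in `docker volume inspect` output."""
    volumes = json.loads(inspect_output)
    if not volumes:
        raise ValueError("no volume in inspect output")
    return volumes[0]["Mountpoint"]


# Sample copied from the inspect output above.
SAMPLE = """
[
    {
        "CreatedAt": "2018-09-15T05:29:36Z",
        "Driver": "local",
        "Labels": {},
        "Mountpoint": "/var/lib/docker/volumes/virtualenv/_data",
        "Name": "virtualenv",
        "Options": {},
        "Scope": "local"
    }
]
"""

print(volume_mountpoint(SAMPLE))
```

In a real script the text would come from running docker volume inspect virtualenv, for example through subprocess.run with capture_output=True and text=True.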
2. Create container
The structure of the example app looks like this:
- Dockerfile
- main.py: the entrypoint
- create_env.sh: used to create another virtual environment
main.py checks whether the virtual environment "my_env" exists and creates it if it doesn't. We're going to mount the volume created above as the ~/.virtualenv folder in the container.
I use the following Dockerfile to create the simplest python image:
FROM python:3.7
WORKDIR /app
ADD . /app
RUN pip install virtualenv
CMD ["python", "./main.py"]
main.py looks like this:
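import os
import subprocess

# The volume is mounted at /root/.virtualenv, so this path outlives the container.
ENV_PATH = '/root/.virtualenv/my_env'


def main():
    if os.path.exists(ENV_PATH):
        print('my_env already exists')
    else:
        # check=True raises if the script fails, so success is never reported falsely.
        subprocess.run(['bash', 'create_env.sh'], check=True)
        print('my_env created')


if __name__ == '__main__':
    main()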
And the one-line create_env.sh:
cd ~/.virtualenv/ && virtualenv my_env
3. Start container with volume mounted
We first build the python image:
➤ docker build -t docker-data-persistence .
Then, to mount the volume, we use the --mount argument:
➤ docker run \
--mount source=virtualenv,target=/root/.virtualenv \
docker-data-persistence
Using base prefix '/usr/local'
New python executable in /root/.virtualenv/my_env/bin/python
Installing setuptools, pip, wheel...done.
my_env created
As shown above, the first run of the container creates the virtual environment "my_env", because it doesn't exist in the volume yet. On the second run, it reports that "my_env" already exists:
➤ docker run \
--mount source=virtualenv,target=/root/.virtualenv \
docker-data-persistence
my_env already exists
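The create-once behaviour seen in these two runs can also be tried without Docker, with any directory standing in for the mounted volume. Note that this sketch swaps in the standard library's venv module for the virtualenv command that create_env.sh uses, and the function name ensure_env is made up for illustration.

```python
import venv
from pathlib import Path


def ensure_env(base_dir: Path, name: str = "my_env") -> bool:
    """Create the environment if it is missing; return True only when it was created."""
    env_dir = base_dir / name
    if env_dir.exists():
        return False
    base_dir.mkdir(parents=True, exist_ok=True)
    # with_pip=False keeps the sketch fast; the article's script uses virtualenv instead.
    venv.create(env_dir, with_pip=False)
    return True
```

Calling ensure_env twice with the same base directory creates the environment the first time and does nothing the second time, which is exactly what the volume makes possible across container runs.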
4. Inspect the volume
We can take a look into the files in the volume (in a hacky way [3]) to verify the contents:
➤ docker run -it \
--mount source=virtualenv,target=/root/.virtualenv \
docker-data-persistence \
find /root/.virtualenv/my_env/bin
/root/.virtualenv/my_env/bin
/root/.virtualenv/my_env/bin/python3
/root/.virtualenv/my_env/bin/activate.csh
/root/.virtualenv/my_env/bin/easy_install-3.7
/root/.virtualenv/my_env/bin/python
/root/.virtualenv/my_env/bin/python-config
/root/.virtualenv/my_env/bin/easy_install
/root/.virtualenv/my_env/bin/python3.7
/root/.virtualenv/my_env/bin/activate
/root/.virtualenv/my_env/bin/pip
/root/.virtualenv/my_env/bin/activate.fish
/root/.virtualenv/my_env/bin/pip3
/root/.virtualenv/my_env/bin/wheel
/root/.virtualenv/my_env/bin/activate_this.py
/root/.virtualenv/my_env/bin/pip3.7
5. Delete the volume
To delete the volume, we can use docker volume rm <volume-name>. However, a volume can't be deleted while any container uses it, even if that container has exited:
➤ docker volume rm virtualenv
Error response from daemon: remove virtualenv: volume is in use - [dc4425b806a67a9002d68703cdd9854feba44e43d591278b4eb2869f43c0da6d]
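The IDs in that error are the containers that still reference the volume; once they are removed with docker rm, the volume can be deleted. Here is a rough sketch that pulls the IDs out of the message and lists the commands to run. It assumes the error format shown above, which may differ between Docker versions, and both function names are my own.

```python
import re
from typing import List


def containers_blocking_volume(error_message: str) -> List[str]:
    """Extract container IDs from a 'volume is in use' daemon error."""
    match = re.search(r"volume is in use - \[([^\]]*)\]", error_message)
    if match is None:
        return []
    return [cid.strip() for cid in match.group(1).split(",") if cid.strip()]


def removal_commands(volume: str, container_ids: List[str]) -> List[List[str]]:
    """List the docker commands that would free and then delete the volume."""
    commands = [["docker", "rm", cid] for cid in container_ids]
    commands.append(["docker", "volume", "rm", volume])
    return commands
```

Alternatively, docker ps -a --filter volume=virtualenv lists the containers that use the volume, including exited ones.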
Top comments (5)
Hi, thanks for your article
Can we store multiple containers' data in one volume?
This is what we currently have: essentially, files get written in each container by the end users of the app, and one container with our backend has to scan all the files and show them in a file manager as a nested structure. So we want to make these containers independent of user-generated data, and scanning a shared volume would be much easier for us.
Can we do it like this
Hi Liu. Thanks for your explanation, it helped me a lot as I'm pretty new to Docker. Just wondering if the volume is persistent when the computer is switched off and on?
Hello,
Yes, the data persists until you explicitly delete the volume you created.
I get no response when I run any of these docker run ... commands. What am I missing, please?
jsyk this guy plagiarized your article:
ericplayground.com/2019/10/01/data-persistence-in-docker-container/