Difference between revisions of "Docker"


Revision as of 22:51, 15 July 2015

This HOWTO page describes how to checkpoint and restore a Docker container.

Note: This page was originally written a few months ago. Since then, interfacing with CRIU has been added to Docker's native exec driver (libcontainer), and pull requests adding checkpoint/restore functionality to Docker have been submitted. If you just want to experiment with C/R, you can use one of the following Docker versions:

Docker 1.5 [https://github.com/SaiedKazemi/docker/wiki]

Docker 1.7 [https://github.com/boucher/docker/tree/cr-combined]

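OverlayFS: The OverlayFS filesystem was merged into the upstream Linux kernel 3.18 and is now Docker's preferred filesystem (instead of AUFS). However, OverlayFS has bugs that cause the kernel to report the wrong mnt_id in /proc/<pid>/fdinfo/<fd> and the wrong symlink target path for /proc/<pid>/fd/<fd>. These bugs have been fixed in kernel v4.2-rc2. See the OverlayFS section below for instructions on how to apply the relevant patches.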

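Async IO: If your process uses async IO and your kernel is older than 3.19, you need to apply two patches. See the Async IO (AIO) section below for instructions.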

Introduction

There are two ways to checkpoint and restore a Docker container:

1. External C/R using CRIU directly on the command line as it's typically done for any process tree.

This approach is called external because it's happening external to the Docker daemon. After checkpoint, the Docker daemon thinks that the container has exited. After restore, the Docker daemon doesn't know that the container is running again. Therefore, commands such as docker ps, stop, kill and logs will not work correctly.

2. Native C/R using new docker checkpoint and docker restore commands.

This approach is called native because the Docker daemon is involved in both checkpoint and restore. Therefore, its notion of the container state will be correct, and commands such as docker ps, stop, kill and logs will work. This is the preferred method of checkpointing and restoring Docker containers.

Native C/R is a work in progress, roughly pre-alpha quality. You can watch this short demo video [https://www.youtube.com/watch?v=HFt9v6yqsXo] to see how it works. Source files for Docker 1.5 C/R are at https://github.com/SaiedKazemi/docker/tree/cr and for Docker 1.7 C/R at https://github.com/boucher/docker/tree/cr-combined. The wiki page [https://github.com/SaiedKazemi/docker/wiki] provides an overview of the project history.

OverlayFS

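The following small kernel patches (torvalds.git commits) fix the mount ID and symlink target path issues noted above:

* 155e35d4da by David Howells
* df1a085af1 by David Howells
* f25801ee46 by David Howells
* 4bacc9c923 by David Howells
* 9391dd00d1 by Al Viro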

Assuming that you are running Ubuntu Vivid (Linux kernel 3.19), here is how you can patch your kernel:

git clone  git://kernel.ubuntu.com/ubuntu/ubuntu-vivid.git
cd ubuntu-vivid
git remote add torvalds  git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git remote update

git cherry-pick 155e35d4da
git cherry-pick df1a085af1
git cherry-pick f25801ee46
git cherry-pick 4bacc9c923
git cherry-pick 9391dd00d1

cp /boot/config-$(uname -r) .config
make olddefconfig
make -j 8 bzImage modules
sudo make install modules_install
sudo reboot
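
To see whether a running kernel is affected, one plausible check is to compare the mnt_id reported for an open file descriptor against the mount IDs listed in mountinfo. The sketch below assumes that a mismatch shows up as a mnt_id that is absent from mountinfo; the function names and the PROC_ROOT override are illustrative and not part of CRIU.

```shell
#!/bin/sh
# Sketch: compare the mnt_id reported in fdinfo against the mount IDs
# listed in mountinfo. PROC_ROOT is a hypothetical override for testing.
PROC_ROOT=${PROC_ROOT:-/proc}

# Print the mnt_id the kernel reports for file descriptor $2 of process $1.
fd_mnt_id() {
    awk '$1 == "mnt_id:" { print $2 }' "$PROC_ROOT/$1/fdinfo/$2"
}

# Succeed if mount ID $2 appears in the mountinfo of process $1
# (the mount ID is the first field of every mountinfo line).
mnt_id_known() {
    awk -v id="$2" '$1 == id { found = 1 } END { exit !found }' \
        "$PROC_ROOT/$1/mountinfo"
}

# Report whether descriptor $2 of process $1 has a consistent mnt_id.
check_fd() {
    id=$(fd_mnt_id "$1" "$2")
    if [ -n "$id" ] && mnt_id_known "$1" "$id"; then
        echo "fd $2: mnt_id $id found in mountinfo"
    else
        echo "fd $2: mnt_id '$id' NOT found in mountinfo"
        return 1
    fi
}
```

For example, running `exec 3< /foo` inside the container and then `check_fd $$ 3` inspects a descriptor opened by the current shell.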

Async IO (AIO)

If you are using a kernel older than 3.19 and your container uses AIO, you need the following AIO kernel patches (torvalds.git commits) from 3.19:

* bd9b51e79c by Al Viro
* e4a0d3e720 by Pavel Emelyanov
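
A quick way to tell whether the running kernel falls under the "older than 3.19" condition is to compare version strings. This is a minimal sketch that assumes `sort -V` (version sort) is available, as it is in GNU coreutils and BusyBox; the function name is illustrative.

```shell
#!/bin/sh
# Succeed if version $1 is strictly older than version $2.
# Relies on sort -V (version sort), available in GNU coreutils and BusyBox.
version_older_than() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Strip any distribution suffix such as "-generic" from uname -r output.
running_kernel=$(uname -r | sed 's/[^0-9.].*$//')

if version_older_than "$running_kernel" 3.19; then
    echo "kernel $running_kernel: older than 3.19, AIO patches needed if the container uses AIO"
else
    echo "kernel $running_kernel: 3.19 or newer, AIO patches not needed"
fi
```

Distribution kernels sometimes backport fixes, so the version number alone is only a first approximation.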

External C/R

Note: External C/R was done as proof-of-concept. Its use is discouraged and the helper script mentioned below will be deprecated in the near future.

Starting with CRIU 1.3, it is possible to checkpoint and restore a process tree running inside a Docker container. However, it's important to note that Docker needs native support for checkpoint and restore in order to maintain its parent-child relationship and to correctly keep track of container states. In other words, while CRIU can C/R a process tree, the restored tree will not become a child of Docker and, from Docker's point of view, the container's state will remain "Exited" (even after successful restore).

It's important to re-emphasize that by checkpointing and restoring a Docker container, we mean C/R of a process tree running inside a container, excluding the Docker daemon itself. As CRIU currently does not support nested PID namespaces, the C/R process tree cannot include the Docker daemon which runs in the global PID namespace.

Command Line Options

In addition to the usual CRIU command line options used when checkpointing and restoring a process tree, the following command line options are needed for Docker containers.

--root

This option has been used in the past only for restore operations that wanted to change the root of the mount namespace. It was not used for checkpoint operations.

However, because Docker by default uses the AUFS graph driver and the AUFS module in the kernel reveals branch pathnames in /proc/pid/map_files, option --root is used to specify the root of the mount namespace. Once the kernel AUFS module is fixed, it won't be necessary to specify this option anymore.

--ext-mount-map

This option is used to specify the path of the external bind mounts. Docker sets up /etc/{hostname,hosts,resolv.conf} as targets with source files outside the container's mount namespace. Older versions of Docker also bind mount /.dockerinit.

For example, assuming the default Docker configuration, /etc/hostname in the container's mount namespace is bind mounted from the source at /var/lib/docker/containers/container_id/hostname.
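
The mapping described above can be sketched as a small shell function that prints the restore-time options for the three files Docker bind mounts from outside the container. The function name is illustrative, the paths assume the default Docker configuration, and the /.dockerinit mapping used by older Docker versions is left out.

```shell
#!/bin/sh
# Print the --ext-mount-map options for restoring container $1,
# one option per line. DOCKER_HOME defaults to Docker's usual home.
DOCKER_HOME=${DOCKER_HOME:-/var/lib/docker}

restore_mount_maps() {
    cid=$1
    for name in resolv.conf hosts hostname; do
        printf '%s\n' "--ext-mount-map /etc/$name:$DOCKER_HOME/containers/$cid/$name"
    done
}
```

At dump time the same three targets are mapped to themselves (for example /etc/hosts:/etc/hosts), so only the restore side needs the container-specific source paths.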

--manage-cgroups

When a process tree exits after a checkpoint operation, the cgroups that Docker had created for the container are removed. This option is needed during restore to move the process tree into its cgroups, re-creating them if necessary.

--evasive-devices

Docker bind mounts /dev/null on /dev/stdin for detached containers (i.e., docker run -d ...). Since earlier versions of Docker used /dev/null in the global namespace, this option tells CRIU to treat the global /dev/null and the container /dev/null as the same device.

--inherit-fd

For native C/R support, this option tells CRIU to let the restored process "inherit" the specified file descriptor (instead of restoring it from the checkpoint).

Restore Prework for External C/R

As mentioned earlier, by default Docker uses AUFS to set up the container's filesystem. When Docker notices that the process has exited (due to CRIU dump), it dismantles the filesystem. We need to set up the filesystem again before attempting to restore.
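
Setting up the filesystem again amounts to remounting the container's AUFS branches. The sketch below only prints the mount command so that it can be reviewed before being run with sudo. The function name is illustrative, and the list of parent layer IDs is an input you have to supply (in the example on this page they are taken from dump.log).

```shell
#!/bin/sh
# Print an AUFS mount command for container $1.
# Remaining arguments are the parent layer IDs, topmost first
# (hypothetical input; this page reads them from dump.log).
DOCKER_HOME=${DOCKER_HOME:-/var/lib/docker}

aufs_mount_cmd() {
    cid=$1
    shift
    diff_dir=$DOCKER_HOME/aufs/diff
    # The container's own layer comes first, followed by its -init layer.
    branches="$diff_dir/$cid:$diff_dir/$cid-init"
    for layer in "$@"; do
        branches="$branches:$diff_dir/$layer"
    done
    echo "mount -t aufs -o br=$branches none $DOCKER_HOME/aufs/mnt/$cid"
}
```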

An External C/R Example

Below is an example to show C/R operations for a shell script that continuously appends a number to a file. You can use tail -f to see the process in action.

As you will see below, after restore, the process's parent is PID 1 (init), not Docker. Also, although the process has been successfully restored, Docker still thinks that the container has exited.
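
The parent of a restored process can also be checked without scanning ps output. This is a minimal sketch that reads the PPid field from procfs on Linux; the function names are illustrative.

```shell
#!/bin/sh
# Print the parent PID of process $1, read from its status file in procfs.
parent_pid() {
    awk '$1 == "PPid:" { print $2 }' "/proc/$1/status"
}

# Succeed if process $1 has been reparented to init (PID 1).
reparented_to_init() {
    [ "$(parent_pid "$1")" = "1" ]
}
```

Calling `reparented_to_init <pid>` on the restored shell process is expected to succeed after an external restore, since Docker is no longer its parent.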

To set up the container's AUFS filesystem before restore, its branch information should be saved before checkpointing the container. For convenience, however, AUFS branch information is saved in the dump.log file. So we can examine dump.log to set up the filesystem again.

For brevity, the 64-character long container ID is replaced by the string <container_id> in the following lines.

$ docker run -d busybox:latest /bin/sh -c 'i=0; while true; do echo $i >> /foo; i=$(expr $i + 1); sleep 3; done'
<container_id>
$ 
$ docker ps
CONTAINER ID  IMAGE           COMMAND           CREATED        STATUS
168aefb8881b  busybox:latest  "/bin/sh -c 'i=0; 6 seconds ago  Up 4 seconds
$ 
$ sudo criu dump -o dump.log -v4 -t 17810 \
	-D /tmp/img/<container_id> \
	--root /var/lib/docker/aufs/mnt/<container_id> \
	--ext-mount-map /etc/resolv.conf:/etc/resolv.conf \
	--ext-mount-map /etc/hosts:/etc/hosts \
	--ext-mount-map /etc/hostname:/etc/hostname \
	--ext-mount-map /.dockerinit:/.dockerinit \
	--manage-cgroups \
	--evasive-devices
$
$ sudo grep successful /tmp/img/<container_id>/dump.log
(00.020103) Dumping finished successfully
$
$ docker ps -a
CONTAINER ID  IMAGE           COMMAND           CREATED        STATUS
168aefb8881b  busybox:latest  "/bin/sh -c 'i=0; 6 minutes ago  Exited (-1) 4 minutes ago
$
$ sudo mount -t aufs -o br=\
/var/lib/docker/aufs/diff/<container_id>:\
/var/lib/docker/aufs/diff/<container_id>-init:\
/var/lib/docker/aufs/diff/a9eb172552348a9a49180694790b33a1097f546456d041b6e82e4d7716ddb721:\
/var/lib/docker/aufs/diff/120e218dd395ec314e7b6249f39d2853911b3d6def6ea164ae05722649f34b16:\
/var/lib/docker/aufs/diff/42eed7f1bf2ac3f1610c5e616d2ab1ee9c7290234240388d6297bc0f32c34229:\
/var/lib/docker/aufs/diff/511136ea3c5a64f264b78b5433614aec563103b4d4702f3ba7d4d2698e22c158 \
none /var/lib/docker/aufs/mnt/<container_id>
$
$ sudo criu restore -o restore.log -v4 -d \
	-D /tmp/img/<container_id> \
	--root /var/lib/docker/aufs/mnt/<container_id> \
	--ext-mount-map /etc/resolv.conf:/var/lib/docker/containers/<container_id>/resolv.conf \
	--ext-mount-map /etc/hosts:/var/lib/docker/containers/<container_id>/hosts \
	--ext-mount-map /etc/hostname:/var/lib/docker/containers/<container_id>/hostname \
	--ext-mount-map /.dockerinit:/var/lib/docker/init/dockerinit-1.0.0 \
	--manage-cgroups \
	--evasive-devices
$
$ sudo grep successful /tmp/img/<container_id>/restore.log
(00.424428) Restore finished successfully. Resuming tasks.
$
$ ps -ef | grep /bin/sh
root     18580     1  0 12:38 ?        00:00:00 /bin/sh -c i=0; while true; do echo $i >> /foo; i=$(expr $i + 1); sleep 3; done
$
$ docker ps -a
CONTAINER ID  IMAGE           COMMAND           CREATED        STATUS
168aefb8881b  busybox:latest  "/bin/sh -c 'i=0; 7 minutes ago  Exited (-1) 5 minutes ago
$
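
The transcript above passes -t 17810 to criu dump without showing where that PID comes from. One way to look it up is docker inspect with a format template, sketched below. The function name is illustrative, and DOCKER_BINARY is borrowed from the helper script's environment variable of the same name.

```shell
#!/bin/sh
# Print the host PID of the init process of container $1.
# DOCKER_BINARY selects the docker executable (default: docker).
container_pid() {
    "${DOCKER_BINARY:-docker}" inspect --format '{{.State.Pid}}' "$1"
}
```

The result can then be passed to CRIU, for example as -t "$(container_pid <container_id>)".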

External C/R Helper Script

As seen in the above examples, the CRIU command line for checkpointing and restoring a Docker container is pretty long. For restore, there is also an additional step to set up the root filesystem before invoking CRIU.

To automate the C/R process, there is a helper script in the contrib subdirectory of CRIU sources, called docker_cr.sh. In addition to invoking CRIU, this helper script sets up the root filesystem for AUFS, UnionFS, and VFS for restore.

With docker_cr.sh, all you have to provide is the container ID. If you don't specify a container ID, docker_cr.sh will list all running containers and prompt you to choose one. Also, as shown in the help output below, by setting the appropriate environment variable, it's possible to tell docker_cr.sh which Docker and CRIU binaries to use, where Docker's home directory is, and where CRIU should save and look for its image files.

# docker_cr.sh --help
Usage:
	docker_cr.sh -c|-r [-hv] [<container_id>]
	-c, --checkpoint	checkpoint container
	-h, --help		print help message
	-r, --restore		restore container
	-v, --verbose		enable verbose mode

Environment:
	DOCKER_HOME		(default /var/lib/docker)
	CRIU_IMG_DIR		(default /var/lib/docker/criu_img)
	DOCKER_BINARY		(default docker)
	CRIU_BINARY		(default criu)
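
The defaults listed above follow the usual shell pattern of falling back to a built-in value when a variable is unset. The sketch below illustrates that pattern with the documented defaults; it is not taken from the docker_cr.sh source, and the function name is illustrative.

```shell
#!/bin/sh
# Print the effective settings, falling back to the documented defaults.
# This is an illustration, not docker_cr.sh's actual code.
resolve_settings() {
    echo "DOCKER_HOME=${DOCKER_HOME:-/var/lib/docker}"
    echo "CRIU_IMG_DIR=${CRIU_IMG_DIR:-/var/lib/docker/criu_img}"
    echo "DOCKER_BINARY=${DOCKER_BINARY:-docker}"
    echo "CRIU_BINARY=${CRIU_BINARY:-criu}"
}
```

With this convention, a variable can be overridden for a single invocation by prefixing the command, for example CRIU_IMG_DIR=/tmp/img docker_cr.sh -c 4397.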

Below is an example to checkpoint and restore Docker container 4397:

# docker_cr.sh -c 4397
dump successful
# docker_cr.sh -r 4397
restore successful

Optionally, you can specify -v to see the commands that docker_cr.sh executes. For example:

# docker_cr.sh -c -v 40d3
docker binary: docker
criu binary: criu
image directory: /var/lib/docker/criu_img/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf
container root directory: /var/lib/docker/aufs/mnt/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf

criu dump -v4 -D /var/lib/docker/criu_img/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf -o dump.log \
     --manage-cgroups --evasive-devices \
     --ext-mount-map /etc/resolv.conf:/etc/resolv.conf \
     --ext-mount-map /etc/hosts:/etc/hosts \
     --ext-mount-map /etc/hostname:/etc/hostname \
     --ext-mount-map /.dockerinit:/.dockerinit \
     -t 5991 --root /var/lib/docker/aufs/mnt/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf

dump successful
(00.020827) Dumping finished successfully

# docker_cr.sh -r -v 40d3
docker binary: docker
criu binary: criu
image directory: /var/lib/docker/criu_img/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf
container root directory: /var/lib/docker/aufs/mnt/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf

mount -t aufs -o
/var/lib/docker/aufs/diff/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf
/var/lib/docker/aufs/diff/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf-init
/var/lib/docker/aufs/diff/a9eb172552348a9a49180694790b33a1097f546456d041b6e82e4d7716ddb721
/var/lib/docker/aufs/diff/120e218dd395ec314e7b6249f39d2853911b3d6def6ea164ae05722649f34b16
/var/lib/docker/aufs/diff/42eed7f1bf2ac3f1610c5e616d2ab1ee9c7290234240388d6297bc0f32c34229
/var/lib/docker/aufs/diff/511136ea3c5a64f264b78b5433614aec563103b4d4702f3ba7d4d2698e22c158
none
/var/lib/docker/aufs/mnt/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf

criu restore -v4 -D /var/lib/docker/criu_img/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf \
     -o restore.log --manage-cgroups --evasive-devices \
     --ext-mount-map /etc/resolv.conf:/var/lib/docker/containers/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf/resolv.conf \
     --ext-mount-map /etc/hosts:/var/lib/docker/containers/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf/hosts \
     --ext-mount-map /etc/hostname:/var/lib/docker/containers/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf/hostname \
     --ext-mount-map /.dockerinit:/var/lib/docker/init/dockerinit-1.0.0 \
     -d --root /var/lib/docker/aufs/mnt/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf \
     --pidfile /var/lib/docker/criu_img/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf/restore.pid

restore successful
(00.408807) Restore finished successfully. Resuming tasks.

root      6206     1  1 10:49 ?        00:00:00 /bin/sh -c i=0; while true; do echo $i >> /foo; i=$(expr $i + 1); sleep 3; done