Difference between revisions of "Docker"

From CRIU
(25 intermediate revisions by 3 users not shown)
 
This HOWTO page describes how to checkpoint and restore a Docker container.
 
 
{{Note| This page was originally written a few months ago.  Since then, interfacing with CRIU has been added to Docker's native exec driver (libcontainer) and pull requests to add checkpoint/restore functionality to Docker have been submitted.  If you just want to experiment with C/R, you can use one of the following Docker versions for your C/R experiments:
 
 
Docker 1.5 [https://github.com/SaiedKazemi/docker/wiki]
 
Docker 1.7 [https://github.com/boucher/docker/tree/cr-combined]}}
 
 
{{Note| The OverlayFS filesystem was merged into the upstream Linux kernel 3.18 and is now Docker's preferred filesystem (instead of AUFS).  However, there is a bug in OverlayFS that reports the wrong mnt_id in /proc/<pid>/fdinfo/<fd> and the wrong symlink target path for /proc/<pid>/fd/<fd>.  Fortunately, these bugs have been fixed in the kernel v4.2-rc2.  See below for instructions on how to apply the relevant patches.}}
 
 
{{Note| If your process uses async IO and your kernel is older than 3.19, you need to apply two patches.  See below for instructions.}}
 
  
 
== Introduction ==
 
  
There are two ways to checkpoint and restore a Docker container:
+
Docker wants to manage the full lifecycle of processes running inside one of its containers, which makes it important for CRIU and Docker to work closely together when checkpointing and restoring a container. This is being achieved by adding the ability to checkpoint and restore directly into Docker itself, powered under the hood by CRIU. This integration is a work in progress, and its status is outlined below.
  
'''1. External C/R''' using CRIU directly on the command line as it's typically
+
== Docker Experimental ==
done for any process tree.
 
  
This approach is called external because it's happening external to the
+
Checkpoint & Restore is now available in the ''experimental'' runtime mode for Docker. Simply start your Docker daemon with '''--experimental''' to enable the feature.
Docker daemon. After checkpoint, the Docker daemon thinks that the
 
container has exited.  After restore, the Docker daemon doesn't know that
 
the container is running again.  Therefore, commands such as
 
<code>docker ps, stop, kill</code> and <code>logs</code>
 
will not work correctly.
 
  
'''2. Native C/R''' using new <code>docker checkpoint</code> and
+
=== Dependencies ===
<code>docker restore</code> commands.
 
  
This approach is called native because the Docker daemon is involved in both checkpoint and restore.
+
In addition to installing version 1.13 of Docker, you need '''CRIU''' installed on your system, with at least version 2.0. You also need some shared libraries on your system. The most likely things you'll need to install are '''libprotobuf-c''' and '''libnl-3'''. Here's an output of <code>ldd</code> on my system:
Therefore, its notion of the container state will be correct.  All commands such as
 
<code>docker ps, stop, kill </code> and <code>logs</code> will work.
 
This is obviously the preferred method of checkpointing and restoring Docker containers.
 
  
Native C/R is work in progress, say pre-alpha quality.
+
$ ldd `which criu`
You can watch this short demo
+
    linux-vdso.so.1 =>  (0x00007ffc09fda000)
'''[https://www.youtube.com/watch?v=HFt9v6yqsXo video]'''
+
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fd28b2c7000)
to see how it works.
+
    libprotobuf-c.so.0 => /usr/lib/x86_64-linux-gnu/libprotobuf-c.so.0 (0x00007fd28b0b7000)
Source files for Docker 1.5 C/R are
+
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fd28aeb2000)
'''[https://github.com/SaiedKazemi/docker/tree/cr here]'''
+
    libnl-3.so.200 => /lib/x86_64-linux-gnu/libnl-3.so.200 (0x00007fd28ac98000)
and for Docker 1.7 C/R are
+
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd28a8d3000)
'''[https://github.com/boucher/docker/tree/cr-combined here]'''.
+
    /lib64/ld-linux-x86-64.so.2 (0x000056386bb38000)
The '''[https://github.com/SaiedKazemi/docker/wiki wiki]'''
+
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fd28a5cc000)
page provides an overview of the project history.
 
  
== OverlayFS ==
+
=== checkpoint ===  
  
The following small kernel patches fix the mount id and symlink target path issues noted above:
+
There's a top level <code>checkpoint</code> sub-command in Docker, which lets you create a new checkpoint, and list or delete an existing checkpoint. These checkpoints are stored and managed by Docker, unless you specify a custom storage path.
  
* {{torvalds.git|155e35d4da}} by David Howells
+
Here's an example of creating a checkpoint, from a container that simply logs an integer in a loop.
* {{torvalds.git|df1a085af1}} by David Howells
 
* {{torvalds.git|f25801ee46}} by David Howells
 
* {{torvalds.git|4bacc9c923}} by David Howells
 
* {{torvalds.git|9391dd00d1}} by Al Viro
 
  
Assuming that you are running Ubuntu Vivid (Linux kernel 3.19), here is how you can patch your kernel:
+
First, we create a container:
  
<pre>
+
  $ docker run -d --name looper --security-opt seccomp:unconfined busybox  \
git clone git://kernel.ubuntu.com/ubuntu/ubuntu-vivid.git
+
          /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'
cd ubuntu-vivid
 
git remote add torvalds  git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
 
git remote update
 
  
git cherry-pick 155e35d4da
+
You can verify that the container is running by printing its logs:
git cherry-pick df1a085af1
 
git cherry-pick f25801ee46
 
git cherry-pick 4bacc9c923
 
git cherry-pick 9391dd00d1
 
  
cp /boot/config-$(uname -r) .config
+
$ docker logs looper
make olddefconfig
 
make -j 8 bzImage modules
 
sudo make install modules_install
 
sudo reboot
 
</pre>
 
  
== Async IO (AIO) ==
+
If you do this a few times you'll notice the integer increasing. Now, we checkpoint the container:
  
If you are using a kernel older than 3.19 and your container uses AIO, you need the following AIO kernel patches from 3.19:
+
$ docker checkpoint create looper checkpoint1
  
* {{torvalds.git|bd9b51e79c}} by Al Viro
+
You should see that the process is no longer running, and if you print the logs a few times no new logs will be printed.
* {{torvalds.git|e4a0d3e720}} by Pavel Emelyanov
 
  
== External C/R ==
+
=== restore ===
  
{{Note| External C/R was done as proof-of-concept.  Its use is discouraged and the helper script mentioned below will be deprecated in the near future.}}
+
Unlike creating a checkpoint, restoring from a checkpoint is just a flag provided to the normal container '''start''' call. Here's an example:
  
Starting with CRIU 1.3, it is possible to checkpoint and restore a
+
  $ docker start --checkpoint checkpoint1 looper
process tree running inside a Docker container. However, it's
 
important to note that Docker needs native support for checkpoint
 
and restore in order to maintain its parent-child relationship and
 
to correctly keep track of container states.  In other words, while
 
CRIU can C/R a process tree, the restored tree will not become a
 
child of Docker and, from Docker's point of view, the container's
 
state will remain "Exited" (even after successful restore).
 
  
It's important to re-emphasize that by checkpointing and restoring
+
If we then print the logs, you should see they start from where we left off and continue to increase.  
a Docker container, we mean C/R of a process tree running inside a
 
container, excluding the Docker daemon itself.  As CRIU currently
 
does not support nested PID namespaces, the C/R process tree cannot
 
include the Docker daemon which runs in the global PID namespace.
 
  
== Command Line Options ==
+
==== Restoring into a '''new''' container ====
  
In addition to the usual CRIU command line options used when
+
Beyond the straightforward case of checkpointing and restoring the same container, it's also possible to checkpoint one container, and then restore the checkpoint into a completely different container. This is done by providing a custom storage path with the <code>--checkpoint-dir</code> option. Here's a slightly revised example from before:
checkpointing and restoring a process tree, the following command
 
line options are needed for Docker containers.
 
  
=== <code>--root</code> ===
+
$ docker run -d --name looper2 --security-opt seccomp:unconfined busybox \
 +
          /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'
 +
 +
# wait a few seconds to give the container an opportunity to print a few lines, then
 +
$ docker checkpoint create --checkpoint-dir=/tmp looper2 checkpoint2
 +
 +
$ docker create --name looper-clone --security-opt seccomp:unconfined busybox \
 +
          /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'
 +
 +
$ docker start --checkpoint-dir=/tmp --checkpoint=checkpoint2 looper-clone
  
This option has been used in the past only for restore operations
 
that wanted to change the root of the mount namespace.  It was not
 
used for checkpoint operations.
 
  
However, because Docker by default uses the AUFS graph driver and
+
You should be able to print the logs from <code>looper-clone</code> and see that they start from wherever the logs of <code>looper2</code> end.
the AUFS module in the kernel reveals branch pathnames in
 
<code>/proc/''pid''/map_files</code>, option <code>--root</code>
 
is used to specify the root of the
 
mount namespace.  Once the kernel AUFS module is fixed, it won't
 
be necessary to specify this option anymore.
 
  
=== <code>--ext-mount-map</code> ===
+
=== usage ===
  
This option is used to specify the path of the external bind mounts.
+
Checkpoint
Docker sets up <code>/etc/{hostname,hosts,resolv.conf}</code> as targets with
 
source files outside the container's mount namespace.  Older versions
 
of Docker also bind mount <code>/.dockerinit</code>.
 
  
For example, assuming the default Docker configuration, <code>/etc/hostname</code>
+
  # docker checkpoint create --help
in the container's mount namespace is bind mounted from the source
+
  Usage: docker checkpoint create [OPTIONS] CONTAINER CHECKPOINT
at <code>/var/lib/docker/containers/''container_id''/hostname</code>.
 
  
=== <code>--manage-cgroups</code> ===
+
  Create a checkpoint from a running container
  
When a process tree exits after a checkpoint operation, the cgroups
+
  Options:
that Docker had created for the container are removed.  This option
+
      --checkpoint-dir string  Use a custom checkpoint storage directory
is needed during restore to move the process tree into its cgroups,
+
      --help                    Print usage
re-creating them if necessary.
+
      --leave-running          Leave the container running after checkpoint
  
=== <code>--evasive-devices</code> ===
+
Restore
  
Docker bind mounts <code>/dev/null</code> on <code>/dev/stdin</code> for detached containers
+
  # docker start --help
(i.e., <code>docker run -d ...</code>).  Since earlier versions of Docker used
+
  Usage: docker start [OPTIONS] CONTAINER [CONTAINER...]
<code>/dev/null</code> in the global namespace, this option tells CRIU to treat
 
the global <code>/dev/null</code> and the container <code>/dev/null</code> as the same device.
 
  
=== <code>--inherit-fd</code> ===
+
  Start one or more stopped containers
  
For native C/R support, this option tells CRIU to let the restored process "inherit"
+
  Options:
its specified file descriptor (instead of restoring from checkpoint).
+
  -a, --attach                  Attach STDOUT/STDERR and forward signals
 +
      --checkpoint string      Restore from this checkpoint
 +
      --checkpoint-dir string  Use a custom checkpoint storage directory
 +
      --detach-keys string      Override the key sequence for detaching a container
 +
      --help                    Print usage
 +
  -i, --interactive            Attach container's STDIN
  
== Restore Prework for External C/R ==
+
== Integration Status ==  
  
As mentioned earlier, by default Docker uses AUFS to set up the
+
CRIU has already been integrated into the lower level components that power Docker, namely '''runc''' and '''containerd'''. The final step in the process is to integrate with Docker itself. You can track the status of that process in [https://github.com/docker/docker/pull/22049 this pull request].
container's filesystem. When Docker notices that the process has
 
exited (due to CRIU dump), it dismantles the filesystem. We need
 
to set up the filesystem again before attempting to restore.
 
  
== An External C/R Example ==
+
== Compatibility Notes ==
  
Below is an example to show C/R operations for a shell script that
+
The latest versions of the Docker integration require CRIU version 2.0 or later in order to work correctly. Additionally, depending on the storage driver used by Docker and other factors, there may be further compatibility issues, which we will attempt to list here.
continuously appends a number to a file.  You can use tail -f to
 
see the process in action.
 
  
As you will see below, after restore, the process's parent is PID
+
=== TTY ===
1 (init), not Docker.  Also, although the process has been successfully
 
restored, Docker still thinks that the container has exited.
 
  
To set up the container's AUFS filesystem before restore, its branch
+
Checkpointing an interactive container is currently not supported.  
information should be saved before checkpointing the container.
 
For convenience, however, AUFS branch information is saved in the
 
dump.log file.  So we can examine dump.log to set up the filesystem
 
again.
 
  
For brevity, the 64-character long container ID is replaced by the
+
=== Seccomp ===
string <container_id> in the following lines.
 
  
<pre>
+
You'll notice that all of the above examples disable Docker's default seccomp support. In order to use seccomp, you'll need a newer version of the kernel. '''Update needed with exact version.'''
$ docker run -d busybox:latest /bin/sh -c 'i=0; while true; do echo $i >> /foo; i=$(expr $i + 1); sleep 3; done'
 
<container_id>
 
$
 
$ docker ps
 
CONTAINER ID  IMAGE          COMMAND          CREATED        STATUS
 
168aefb8881b  busybox:latest  "/bin/sh -c 'i=0; 6 seconds ago  Up 4 seconds
 
$
 
$ sudo criu dump -o dump.log -v4 -t 17810 \
 
-D /tmp/img/<container_id> \
 
--root /var/lib/docker/aufs/mnt/<container_id> \
 
--ext-mount-map /etc/resolv.conf:/etc/resolv.conf \
 
--ext-mount-map /etc/hosts:/etc/hosts \
 
--ext-mount-map /etc/hostname:/etc/hostname \
 
--ext-mount-map /.dockerinit:/.dockerinit \
 
--manage-cgroups \
 
--evasive-devices
 
$
 
$ sudo grep successful /tmp/img/<container_id>/dump.log
 
(00.020103) Dumping finished successfully
 
$
 
$ docker ps -a
 
CONTAINER ID  IMAGE          COMMAND          CREATED        STATUS
 
168aefb8881b  busybox:latest  "/bin/sh -c 'i=0; 6 minutes ago  Exited (-1) 4 minutes ago
 
$
 
$ sudo mount -t aufs -o br=\
 
/var/lib/docker/aufs/diff/<container_id>:\
 
/var/lib/docker/aufs/diff/<container_id>-init:\
 
/var/lib/docker/aufs/diff/a9eb172552348a9a49180694790b33a1097f546456d041b6e82e4d7716ddb721:\
 
/var/lib/docker/aufs/diff/120e218dd395ec314e7b6249f39d2853911b3d6def6ea164ae05722649f34b16:\
 
/var/lib/docker/aufs/diff/42eed7f1bf2ac3f1610c5e616d2ab1ee9c7290234240388d6297bc0f32c34229:\
 
/var/lib/docker/aufs/diff/511136ea3c5a64f264b78b5433614aec563103b4d4702f3ba7d4d2698e22c158:\
 
none /var/lib/docker/aufs/mnt/<container_id>
 
$
 
$ sudo criu restore -o restore.log -v4 -d \
 
-D /tmp/img/<container_id> \
 
--root /var/lib/docker/aufs/mnt/<container_id> \
 
--ext-mount-map /etc/resolv.conf:/var/lib/docker/containers/<container_id>/resolv.conf \
 
--ext-mount-map /etc/hosts:/var/lib/docker/containers/<container_id>/hosts \
 
--ext-mount-map /etc/hostname:/var/lib/docker/containers/<container_id>/hostname \
 
--ext-mount-map /.dockerinit:/var/lib/docker/init/dockerinit-1.0.0 \
 
--manage-cgroups \
 
--evasive-devices
 
$
 
$ sudo grep successful /tmp/img/<container_id>/restore.log
 
(00.424428) Restore finished successfully. Resuming tasks.
 
$
 
$ ps -ef | grep /bin/sh
 
root    18580    1  0 12:38 ?        00:00:00 /bin/sh -c i=0; while true; do echo $i >> /foo; i=$(expr $i + 1); sleep 3; done
 
$
 
$ docker ps -a
 
CONTAINER ID  IMAGE          COMMAND          CREATED        STATUS
 
168aefb8881b  busybox:latest  "/bin/sh -c 'i=0; 7 minutes ago  Exited (-1) 5 minutes ago
 
$
 
</pre>
 
  
== External C/R Helper Script ==
+
=== OverlayFS ===
  
As seen in the above examples, the CRIU command line for checkpointing and
+
There is a bug in OverlayFS that reports the wrong mnt_id in /proc/<pid>/fdinfo/<fd> and the wrong symlink target path for /proc/<pid>/fd/<fd>. Fortunately, these bugs have been fixed in the kernel v4.2-rc2. The following small kernel patches fix the mount id and symlink target path issues:
restoring a Docker container is pretty long. For restore, there is also
 
an additional step to set up the root filesystem before invoking CRIU.
 
  
To automate the C/R process, there is a helper script in the contrib
+
* {{torvalds.git|155e35d4da}} by David Howells
subdirectory of CRIU sources, called docker_cr.sh. In addition to
+
* {{torvalds.git|df1a085af1}} by David Howells
invoking CRIU, this helper script sets up the root filesystem for AUFS,
+
* {{torvalds.git|f25801ee46}} by David Howells
UnionFS, and VFS for restore.
+
* {{torvalds.git|4bacc9c923}} by David Howells
 +
* {{torvalds.git|9391dd00d1}} by Al Viro
  
With docker_cr.sh, all you have to provide is the container ID.
+
Assuming that you are running Ubuntu Vivid (Linux kernel 3.19), here is how you can patch your kernel:
If you don't specify a container ID, docker_cr.sh will list all running
 
containers and prompt you to choose one. Also, as shown in the help
 
output below, by setting the appropriate environment variable, it's
 
possible to tell docker_cr.sh which Docker and CRIU binaries to use,
 
where Docker's home directory is, and where CRIU should save and look
 
for its image files.
 
  
 
<pre>
 
<pre>
# docker_cr.sh --help
+
git clone  git://kernel.ubuntu.com/ubuntu/ubuntu-vivid.git
Usage:
+
cd ubuntu-vivid
docker_cr.sh -c|-r [-hv] [<container_id>]
+
git remote add torvalds  git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
-c, --checkpoint checkpoint container
+
git remote update
-h, --help print help message
 
-r, --restore restore container
 
-v, --verbose enable verbose mode
 
  
Environment:
+
git cherry-pick 155e35d4da
DOCKER_HOME (default /var/lib/docker)
+
git cherry-pick df1a085af1
CRIU_IMG_DIR (default /var/lib/docker/criu_img)
+
git cherry-pick f25801ee46
DOCKER_BINARY (default docker)
+
git cherry-pick 4bacc9c923
CRIU_BINARY (default criu)
+
git cherry-pick 9391dd00d1
</pre>
 
  
Below is an example to checkpoint and restore Docker container 4397:
+
cp /boot/config-$(uname -r) .config
 
+
make olddefconfig
<pre>
+
make -j 8 bzImage modules
# docker_cr.sh -c 4397
+
sudo make install modules_install
dump successful
+
sudo reboot
# docker_cr.sh -r 4397
 
restore successful
 
 
</pre>
 
</pre>
  
Optionally, you can specify <code>-v</code> to see the commands that <code>docker_cr.sh</code>
+
=== Async IO ===
executes.  For example:
 
  
<pre>
+
If you are using a kernel older than 3.19 and your container uses AIO, you need the following AIO kernel patches from 3.19:
# docker_cr.sh -c -v 40d3
 
docker binary: docker
 
criu binary: criu
 
image directory: /var/lib/docker/criu_img/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf
 
container root directory: /var/lib/docker/aufs/mnt/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf
 
  
criu dump -v4 -D /var/lib/docker/criu_img/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf -o dump.log \
+
* {{torvalds.git|bd9b51e79c}} by Al Viro
    --manage-cgroups --evasive-devices \
+
* {{torvalds.git|e4a0d3e720}} by Pavel Emelyanov
    --ext-mount-map /etc/resolv.conf:/etc/resolv.conf \
 
    --ext-mount-map /etc/hosts:/etc/hosts \
 
    --ext-mount-map /etc/hostname:/etc/hostname \
 
    --ext-mount-map /.dockerinit:/.dockerinit \
 
    -t 5991 --root /var/lib/docker/aufs/mnt/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf
 
 
 
dump successful
 
(00.020827) Dumping finished successfully
 
  
# docker_cr.sh -r -v 40d3
+
== External Checkpoint Restore ==
docker binary: docker
 
criu binary: criu
 
image directory: /var/lib/docker/criu_img/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf
 
container root directory: /var/lib/docker/aufs/mnt/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf
 
  
mount -t aufs -o
+
{{Note| External C/R was done as proof-of-concept. Its use is highly discouraged.}}
/var/lib/docker/aufs/diff/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf
 
/var/lib/docker/aufs/diff/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf-init
 
/var/lib/docker/aufs/diff/a9eb172552348a9a49180694790b33a1097f546456d041b6e82e4d7716ddb721
 
/var/lib/docker/aufs/diff/120e218dd395ec314e7b6249f39d2853911b3d6def6ea164ae05722649f34b16
 
/var/lib/docker/aufs/diff/42eed7f1bf2ac3f1610c5e616d2ab1ee9c7290234240388d6297bc0f32c34229
 
/var/lib/docker/aufs/diff/511136ea3c5a64f264b78b5433614aec563103b4d4702f3ba7d4d2698e22c158
 
none
 
/var/lib/docker/aufs/mnt/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf
 
 
 
criu restore -v4 -D /var/lib/docker/criu_img/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf \
 
    -o restore.log --manage-cgroups --evasive-devices \
 
    --ext-mount-map /etc/resolv.conf:/var/lib/docker/containers/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf/resolv.conf \
 
    --ext-mount-map /etc/hosts:/var/lib/docker/containers/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf/hosts \
 
    --ext-mount-map /etc/hostname:/var/lib/docker/containers/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf/hostname \
 
    --ext-mount-map /.dockerinit:/var/lib/docker/init/dockerinit-1.0.0 \
 
    -d --root /var/lib/docker/aufs/mnt/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf \
 
    --pidfile /var/lib/docker/criu_img/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf/restore.pid
 
 
 
restore successful
 
(00.408807) Restore finished successfully. Resuming tasks.
 
 
 
root      6206    1  1 10:49 ?        00:00:00 /bin/sh -c i=0; while true; do echo $i >> /foo; i=$(expr $i + 1); sleep 3; done
 
</pre>
 
  
[[Category:HOWTO]]
+
Although it's not recommended, you can also learn more about using CRIU without integrating with docker: [[Docker_External]].

Revision as of 21:07, 24 January 2017

This HOWTO page describes how to checkpoint and restore a Docker container.

Introduction

Docker wants to manage the full lifecycle of processes running inside one of its containers, which makes it important for CRIU and Docker to work closely together when checkpointing and restoring a container. This is being achieved by adding the ability to checkpoint and restore directly into Docker itself, powered under the hood by CRIU. This integration is a work in progress, and its status is outlined below.

Docker Experimental

Checkpoint & Restore is now available in the experimental runtime mode for Docker. Simply start your Docker daemon with --experimental to enable the feature.
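
Before trying the commands on this page, it can help to confirm that the daemon really is in experimental mode. The following is a minimal sketch, not an official procedure: the function name <code>experimental_enabled</code> is made up for illustration, and it assumes the Docker client supports Go-template output through <code>docker version -f</code> and exposes a <code>.Server.Experimental</code> field.

```shell
# experimental_enabled: succeed when the Docker daemon reports experimental mode.
# Assumption: `docker version -f '{{.Server.Experimental}}'` prints "true" or
# "false"; the function name is hypothetical and used only for this sketch.
experimental_enabled() {
    [ "$(docker version -f '{{.Server.Experimental}}' 2>/dev/null)" = "true" ]
}
```

After starting the daemon with --experimental, running <code>experimental_enabled && echo "checkpoint/restore available"</code> should print the message if the assumption above holds.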

Dependencies

In addition to installing version 1.13 of Docker, you need CRIU version 2.0 or later installed on your system. CRIU also depends on some shared libraries; the ones you are most likely to have to install are libprotobuf-c and libnl-3. Here's the output of ldd on my system:

$ ldd `which criu`
   	linux-vdso.so.1 =>  (0x00007ffc09fda000)
   	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fd28b2c7000)
   	libprotobuf-c.so.0 => /usr/lib/x86_64-linux-gnu/libprotobuf-c.so.0 (0x00007fd28b0b7000)
   	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fd28aeb2000)
   	libnl-3.so.200 => /lib/x86_64-linux-gnu/libnl-3.so.200 (0x00007fd28ac98000)
   	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd28a8d3000)
   	/lib64/ld-linux-x86-64.so.2 (0x000056386bb38000)
   	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fd28a5cc000)
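
The dependency check above can be scripted. This is a rough sketch under stated assumptions: the helper names <code>version_ge</code> and <code>check_criu</code> are invented for illustration, <code>sort -V</code> comes from GNU coreutils, and <code>criu --version</code> is assumed to print a line of the form "Version: X.Y".

```shell
# version_ge A B: succeed when version A is greater than or equal to B.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_criu MIN: verify that criu is on PATH, is at least version MIN, and
# that ldd reports none of its shared libraries as "not found".
# Both function names are hypothetical helpers for this sketch.
check_criu() {
    min="$1"
    criu_bin=$(command -v criu) || { echo "criu not found in PATH"; return 1; }
    # Assumption: `criu --version` prints a line such as "Version: 2.0".
    ver=$("$criu_bin" --version | awk '/^Version:/ {print $2; exit}')
    version_ge "$ver" "$min" || { echo "criu $ver is older than $min"; return 1; }
    if ldd "$criu_bin" | grep -q 'not found'; then
        echo "criu has unresolved shared libraries"
        return 1
    fi
    echo "criu $ver looks usable"
}
```

Calling <code>check_criu 2.0</code> applies the minimum version stated above; <code>criu check</code> can then be used to test kernel support.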

checkpoint

There's a top level checkpoint sub-command in Docker, which lets you create a new checkpoint, and list or delete an existing checkpoint. These checkpoints are stored and managed by Docker, unless you specify a custom storage path.

Here's an example of creating a checkpoint, from a container that simply logs an integer in a loop.

First, we create a container:

$ docker run -d --name looper --security-opt seccomp:unconfined busybox  \
         /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'

You can verify that the container is running by printing its logs:

$ docker logs looper

If you do this a few times you'll notice the integer increasing. Now, we checkpoint the container:

$ docker checkpoint create looper checkpoint1

You should see that the process is no longer running, and if you print the logs a few times no new logs will be printed.
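
Rather than reading the logs by eye, the stopped state can be checked with <code>docker inspect</code>. The sketch below is illustrative only: <code>container_running</code> and <code>checkpoint_and_verify</code> are hypothetical helper names, and it assumes the default behaviour in which the container stops after a checkpoint (that is, --leave-running is not given).

```shell
# container_running NAME: succeed when Docker reports the container as running.
container_running() {
    [ "$(docker inspect -f '{{.State.Running}}' "$1" 2>/dev/null)" = "true" ]
}

# checkpoint_and_verify NAME CHECKPOINT: create a checkpoint, then confirm
# that the container has stopped, which is expected without --leave-running.
# Both function names are hypothetical helpers for this sketch.
checkpoint_and_verify() {
    docker checkpoint create "$1" "$2" || return 1
    if container_running "$1"; then
        echo "$1 is still running after checkpoint"
        return 1
    fi
    echo "$1 stopped; checkpoint $2 created"
}
```

For the example above, the call would be <code>checkpoint_and_verify looper checkpoint1</code>.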

restore

Unlike creating a checkpoint, restoring from a checkpoint is just a flag provided to the normal container start call. Here's an example:

$ docker start --checkpoint checkpoint1 looper

If you then print the logs, you should see that they pick up where they left off and continue to increase.
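
One way to confirm that the counter resumed instead of restarting from 0 is to compare the last logged value before the checkpoint with the last value some seconds after the restore. This is a small sketch with hypothetical helper names (<code>last_value</code>, <code>counter_resumed</code>); it assumes the container prints one integer per line, as the looper example does.

```shell
# last_value NAME: print the most recent line logged by the container.
last_value() {
    docker logs --tail 1 "$1" 2>/dev/null
}

# counter_resumed BEFORE AFTER: succeed when the counter moved past the value
# seen at checkpoint time, i.e. it did not start again from 0.
# Both function names are hypothetical helpers for this sketch.
counter_resumed() {
    [ "$2" -gt "$1" ]
}
```

Record <code>before=$(last_value looper)</code> just before checkpointing, restore, wait a few seconds, and then run <code>counter_resumed "$before" "$(last_value looper)"</code>.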

Restoring into a new container

Beyond the straightforward case of checkpointing and restoring the same container, it's also possible to checkpoint one container, and then restore the checkpoint into a completely different container. This is done by providing a custom storage path with the --checkpoint-dir option. Here's a slightly revised example from before:

$ docker run -d --name looper2 --security-opt seccomp:unconfined busybox \
         /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'

# wait a few seconds to give the container an opportunity to print a few lines, then
$ docker checkpoint create --checkpoint-dir=/tmp looper2 checkpoint2

$ docker create --name looper-clone --security-opt seccomp:unconfined busybox \
         /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'

$ docker start --checkpoint-dir=/tmp --checkpoint=checkpoint2 looper-clone


You should be able to print the logs from looper-clone and see that they start from wherever the logs of looper2 end.
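
The three steps of this workflow can be wrapped in a single function. What follows is a sketch, not part of Docker: <code>run</code>, <code>restore_into_clone</code> and the <code>DRY_RUN</code> variable are invented for illustration, and the clone is assumed to need the same image, command and seccomp setting as the source container.

```shell
# run CMD...: execute the command, or only print it when DRY_RUN is non-empty.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# restore_into_clone SRC CKPT DIR CLONE IMAGE [CMD...]
# Checkpoint SRC into DIR, create CLONE from IMAGE with the given command,
# then start CLONE from that checkpoint.
# All names here are hypothetical helpers for this sketch.
restore_into_clone() {
    src="$1"; ckpt="$2"; dir="$3"; clone="$4"; image="$5"
    shift 5
    run docker checkpoint create --checkpoint-dir="$dir" "$src" "$ckpt" || return 1
    run docker create --name "$clone" --security-opt seccomp:unconfined "$image" "$@" || return 1
    run docker start --checkpoint-dir="$dir" --checkpoint="$ckpt" "$clone"
}
```

Setting <code>DRY_RUN=1</code> prints the three docker commands without executing them, which is a cheap way to review them first.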

usage

Checkpoint

 # docker checkpoint create --help
 Usage:	docker checkpoint create [OPTIONS] CONTAINER CHECKPOINT
 Create a checkpoint from a running container
 Options:
     --checkpoint-dir string   Use a custom checkpoint storage directory
     --help                    Print usage
     --leave-running           Leave the container running after checkpoint

Restore

 # docker start --help
 Usage:	docker start [OPTIONS] CONTAINER [CONTAINER...]
 Start one or more stopped containers
 Options:
 -a, --attach                  Attach STDOUT/STDERR and forward signals
     --checkpoint string       Restore from this checkpoint
     --checkpoint-dir string   Use a custom checkpoint storage directory
     --detach-keys string      Override the key sequence for detaching a container
     --help                    Print usage
 -i, --interactive             Attach container's STDIN
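
The --leave-running option listed above makes it possible to snapshot a container without stopping it. The sketch below pairs it with the list sub-command mentioned earlier on this page; the helper names <code>snapshot</code> and <code>has_checkpoint</code> are hypothetical, and the parsing assumes that <code>docker checkpoint ls</code> prints a header line followed by one checkpoint name per line.

```shell
# snapshot NAME CHECKPOINT: checkpoint a container and keep it running.
snapshot() {
    docker checkpoint create --leave-running "$1" "$2"
}

# has_checkpoint NAME CHECKPOINT: succeed when the checkpoint is listed.
# Assumption: `docker checkpoint ls NAME` prints a header line, then one
# checkpoint name per line. Both function names are hypothetical helpers.
has_checkpoint() {
    docker checkpoint ls "$1" | awk 'NR > 1 {print $1}' | grep -qxF "$2"
}
```

A checkpoint that is no longer needed can be deleted with <code>docker checkpoint rm CONTAINER CHECKPOINT</code>.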

Integration Status

CRIU has already been integrated into the lower-level components that power Docker, namely runc and containerd. The final step is to integrate with Docker itself. You can track the status of that work in this pull request: https://github.com/docker/docker/pull/22049

Compatibility Notes

The latest versions of the Docker integration require CRIU version 2.0 or later in order to work correctly. Additionally, depending on the storage driver used by Docker and other factors, there may be further compatibility issues, which we will attempt to list here.

TTY

Checkpointing an interactive container is currently not supported.

Seccomp

You'll notice that all of the above examples disable Docker's default seccomp support. In order to use seccomp, you'll need a newer version of the kernel. (Update needed with exact version.)

OverlayFS

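There is a bug in OverlayFS that reports the wrong mnt_id in /proc/<pid>/fdinfo/<fd> and the wrong symlink target path for /proc/<pid>/fd/<fd>. Fortunately, these bugs have been fixed in the kernel v4.2-rc2. On older kernels, the following small patches (commits in Linus Torvalds' tree) fix the mount ID and symlink target path issues:

* 155e35d4da by David Howells
* df1a085af1 by David Howells
* f25801ee46 by David Howells
* 4bacc9c923 by David Howells
* 9391dd00d1 by Al Viro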

Assuming that you are running Ubuntu Vivid (Linux kernel 3.19), here is how you can patch your kernel:

git clone  git://kernel.ubuntu.com/ubuntu/ubuntu-vivid.git
cd ubuntu-vivid
git remote add torvalds  git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git remote update

git cherry-pick 155e35d4da
git cherry-pick df1a085af1
git cherry-pick f25801ee46
git cherry-pick 4bacc9c923
git cherry-pick 9391dd00d1

cp /boot/config-$(uname -r) .config
make olddefconfig
make -j 8 bzImage modules
sudo make install modules_install
sudo reboot
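
Rebuilding a kernel is only worthwhile if the host is actually affected. The following sketch decides that from the storage driver and kernel version; <code>version_ge</code> and <code>needs_overlay_patches</code> are hypothetical helper names, <code>sort -V</code> is assumed to be the GNU coreutils version sort, and only the major.minor part of the kernel version is compared, so release candidates such as 4.2-rc1 are treated as 4.2.

```shell
# version_ge A B: succeed when version A >= B (uses GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# needs_overlay_patches DRIVER KERNEL: succeed when the storage driver is an
# OverlayFS variant and the kernel predates 4.2, where the fixes landed.
# Both function names are hypothetical helpers for this sketch.
needs_overlay_patches() {
    # Keep only "major.minor", e.g. "3.19.0-15-generic" becomes "3.19".
    kver=$(printf '%s' "$2" | sed 's/^\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/')
    case "$1" in
        overlay*) ! version_ge "$kver" 4.2 ;;
        *) return 1 ;;
    esac
}
```

On a live host the inputs can be taken from <code>docker info -f '{{.Driver}}'</code> and <code>uname -r</code>.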

Async IO

If you are using a kernel older than 3.19 and your container uses AIO, you need the following AIO kernel patches from 3.19:

* bd9b51e79c by Al Viro
* e4a0d3e720 by Pavel Emelyanov

External Checkpoint Restore

Note: External C/R was done as proof-of-concept. Its use is highly discouraged.

Although it's not recommended, you can learn more about using CRIU directly, without the Docker integration, on the Docker_External page.