Line 168: |
Line 168: |
| == External Checkpoint Restore == | | == External Checkpoint Restore == |
| | | |
− | {{Note| External C/R was done as proof-of-concept. Its use is discouraged and the helper script mentioned below will be deprecated in the near future.}} | + | {{Note| External C/R was done as proof-of-concept. Its use is highly discouraged.}} |
| | | |
− | This approach is called external because it's happening external to the
| + | Although it's not recommended, you can also learn more about using CRIU without integrating with Docker: [[Docker_External]]. |
− | Docker daemon. After checkpoint, the Docker daemon thinks that the
| |
− | container has exited. After restore, the Docker daemon doesn't know that
| |
− | the container is running again. Therefore, commands such as
| |
− | <code>docker ps, stop, kill</code> and <code>logs</code>
| |
− | will not work correctly.
| |
− | | |
− | Starting with CRIU 1.3, it is possible to checkpoint and restore a
| |
− | process tree running inside a Docker container. However, it's
| |
− | important to note that Docker needs native support for checkpoint
| |
− | and restore in order to maintain its parent-child relationship and
| |
− | to correctly keep track of container states. In other words, while
| |
− | CRIU can C/R a process tree, the restored tree will not become a
| |
− | child of Docker and, from Docker's point of view, the container's
| |
− | state will remain "Exited" (even after successful restore).
| |
− | | |
− | It's important to re-emphasize that by checkpointing and restoring
| |
− | a Docker container, we mean C/R of a process tree running inside a
| |
− | container, excluding the Docker daemon itself. As CRIU currently
| |
− | does not support nested PID namespaces, the C/R process tree cannot
| |
− | include the Docker daemon which runs in the global PID namespace.
| |
− | | |
− | === Command Line Options ===
| |
− | | |
− | In addition to the usual CRIU command line options used when
| |
− | checkpointing and restoring a process tree, the following command
| |
− | line options are needed for Docker containers.
| |
− | | |
− | ==== <code>--root</code> ====
| |
− | | |
− | This option has been used in the past only for restore operations
| |
− | that wanted to change the root of the mount namespace. It was not
| |
− | used for checkpoint operations.
| |
− | | |
− | However, because Docker by default uses the AUFS graph driver and
| |
− | the AUFS module in the kernel reveals branch pathnames in
| |
− | <code>/proc/''pid''/map_files</code>, option <code>--root</code>
| |
− | is used to specify the root of the
| |
− | mount namespace. Once the kernel AUFS module is fixed, it won't
| |
− | be necessary to specify this option anymore.
| |
− | | |
− | ==== <code>--ext-mount-map</code> ====
| |
− | | |
− | This option is used to specify the path of the external bind mounts.
| |
− | Docker sets up <code>/etc/{hostname,hosts,resolv.conf}</code> as targets with
| |
− | source files outside the container's mount namespace. Older versions
| |
− | of Docker also bind mount <code>/.dockerinit</code>.
| |
− | | |
− | For example, assuming the default Docker configuration, <code>/etc/hostname</code>
| |
− | in the container's mount namespace is bind mounted from the source
| |
− | at <code>/var/lib/docker/containers/''container_id''/hostname</code>.
| |
− | | |
− | ==== <code>--manage-cgroups</code> ====
| |
− | | |
− | When a process tree exits after a checkpoint operation, the cgroups
| |
− | that Docker had created for the container are removed. This option
| |
− | is needed during restore to move the process tree into its cgroups,
| |
− | re-creating them if necessary.
| |
− | | |
− | ==== <code>--evasive-devices</code> ====
| |
− | | |
− | Docker bind mounts <code>/dev/null</code> on <code>/dev/stdin</code> for detached containers
| |
− | (i.e., <code>docker run -d ...</code>). Since earlier versions of Docker used
| |
− | <code>/dev/null</code> in the global namespace, this option tells CRIU to treat
| |
− | the global <code>/dev/null</code> and the container <code>/dev/null</code> as the same device.
| |
− | | |
− | ==== <code>--inherit-fd</code> ====
| |
− | | |
− | For native C/R support, this option tells CRIU to let the restored process "inherit"
| |
− | its specified file descriptor (instead of restoring from checkpoint).
| |
− | | |
− | === Restore Prework for External C/R ===
| |
− | | |
− | Docker supports many storage drivers (AKA graph drivers) including
| |
− | AUFS, Btrfs, ZFS, DeviceMapper, OverlayFS, and VFS. The user can
| |
− | specify his/her desired storage driver via the <code>DOCKER_DRIVER</code>
| |
− | environment variable or the <code>-s (--storage-driver)</code> command
| |
− | line option.
| |
− | | |
− | Currently C/R can only be done on containers using either AUFS, OverlayFS, or VFS.
| |
− | In the following example, we assume AUFS.
| |
− | | |
− | When Docker notices that the container has exited (due to CRIU dump),
| |
− | it dismantles the container's filesystem. We need to set up the container's
| |
− | filesystem again before attempting to restore.
| |
− | | |
− | === An External C/R Example ===
| |
− | | |
− | Below is an example to show C/R operations for a shell script that
| |
− | continuously appends a number to a file. You can use tail -f to
| |
− | see the process in action.
| |
− | | |
− | As you will see below, after restore, the process's parent is PID
| |
− | 1 (init), not Docker. Also, although the process has been successfully
| |
− | restored, Docker still thinks that the container has exited.
| |
− | | |
− | To set up the container's AUFS filesystem before restore, its branch
| |
− | information should be saved before checkpointing the container.
| |
− | For convenience, however, AUFS branch information is saved in the
| |
− | dump.log file. So we can examine dump.log to set up the filesystem
| |
− | again.
| |
− | | |
− | For brevity, the 64-character long container ID is replaced by the
| |
− | string <container_id> in the following lines.
| |
− | | |
− | <pre>
| |
− | $ docker run -d busybox:latest /bin/sh -c 'i=0; while true; do echo $i >> /foo; i=$(expr $i + 1); sleep 3; done'
| |
− | <container_id>
| |
− | $
| |
− | $ docker ps
| |
− | CONTAINER ID IMAGE COMMAND CREATED STATUS
| |
− | 168aefb8881b busybox:latest "/bin/sh -c 'i=0; 6 seconds ago Up 4 seconds
| |
− | $
| |
− | $ sudo criu dump -o dump.log -v4 -t 17810 \
| |
− | -D /tmp/img/<container_id> \
| |
− | --root /var/lib/docker/aufs/mnt/<container_id> \
| |
− | --ext-mount-map /etc/resolv.conf:/etc/resolv.conf \
| |
− | --ext-mount-map /etc/hosts:/etc/hosts \
| |
− | --ext-mount-map /etc/hostname:/etc/hostname \
| |
− | --ext-mount-map /.dockerinit:/.dockerinit \
| |
− | --manage-cgroups \
| |
− | --evasive-devices
| |
− | $
| |
− | $ sudo grep successful /tmp/img/<container_id>/dump.log
| |
− | (00.020103) Dumping finished successfully
| |
− | $
| |
− | $ docker ps -a
| |
− | CONTAINER ID IMAGE COMMAND CREATED STATUS
| |
− | 168aefb8881b busybox:latest "/bin/sh -c 'i=0; 6 minutes ago Exited (-1) 4 minutes ago
| |
− | $
| |
− | $ sudo mount -t aufs -o br=\
| |
− | /var/lib/docker/aufs/diff/<container_id>:\
| |
− | /var/lib/docker/aufs/diff/<container_id>-init:\
| |
− | /var/lib/docker/aufs/diff/a9eb172552348a9a49180694790b33a1097f546456d041b6e82e4d7716ddb721:\
| |
− | /var/lib/docker/aufs/diff/120e218dd395ec314e7b6249f39d2853911b3d6def6ea164ae05722649f34b16:\
| |
− | /var/lib/docker/aufs/diff/42eed7f1bf2ac3f1610c5e616d2ab1ee9c7290234240388d6297bc0f32c34229:\
| |
− | /var/lib/docker/aufs/diff/511136ea3c5a64f264b78b5433614aec563103b4d4702f3ba7d4d2698e22c158:\
| |
− | none /var/lib/docker/aufs/mnt/<container_id>
| |
− | $
| |
− | $ sudo criu restore -o restore.log -v4 -d \
| |
− | -D /tmp/img/<container_id> \
| |
− | --root /var/lib/docker/aufs/mnt/<container_id> \
| |
− | --ext-mount-map /etc/resolv.conf:/var/lib/docker/containers/<container_id>/resolv.conf \
| |
− | --ext-mount-map /etc/hosts:/var/lib/docker/containers/<container_id>/hosts \
| |
− | --ext-mount-map /etc/hostname:/var/lib/docker/containers/<container_id>/hostname \
| |
− | --ext-mount-map /.dockerinit:/var/lib/docker/init/dockerinit-1.0.0 \
| |
− | --manage-cgroups \
| |
− | --evasive-devices
| |
− | $
| |
− | $ sudo grep successful /tmp/img/<container_id>/restore.log
| |
− | (00.424428) Restore finished successfully. Resuming tasks.
| |
− | $
| |
− | $ ps -ef | grep /bin/sh
| |
− | root 18580 1 0 12:38 ? 00:00:00 /bin/sh -c i=0; while true; do echo $i >> /foo; i=$(expr $i + 1); sleep 3; done
| |
− | $
| |
− | $ docker ps -a
| |
− | CONTAINER ID IMAGE COMMAND CREATED STATUS
| |
− | 168aefb8881b busybox:latest "/bin/sh -c 'i=0; 7 minutes ago Exited (-1) 5 minutes ago
| |
− | $
| |
− | </pre>
| |
− | | |
− | === External C/R Helper Script ===
| |
− | | |
− | As seen in the above examples, the CRIU command line for checkpointing and
| |
− | restoring a Docker container is pretty long. For restore, there is also
| |
− | an additional step to set up the root filesystem before invoking CRIU.
| |
− | | |
− | To automate the C/R process, there is a helper script in the contrib
| |
− | subdirectory of CRIU sources, called docker_cr.sh. In addition to
| |
− | invoking CRIU, this helper script sets up the root filesystem for AUFS,
| |
− | UnionFS, and VFS for restore.
| |
− | | |
− | With docker_cr.sh, all you have to provide is the container ID.
| |
− | If you don't specify a container ID, docker_cr.sh will list all running
| |
− | containers and prompt you to choose one. Also, as shown in the help
| |
− | output below, by setting the appropriate environment variable, it's
| |
− | possible to tell docker_cr.sh which Docker and CRIU binaries to use,
| |
− | where Docker's home directory is, and where CRIU should save and look
| |
− | for its image files.
| |
− | | |
− | <pre>
| |
− | # docker_cr.sh --help
| |
− | Usage:
| |
− | docker_cr.sh -c|-r [-hv] [<container_id>]
| |
− | -c, --checkpoint checkpoint container
| |
− | -h, --help print help message
| |
− | -r, --restore restore container
| |
− | -v, --verbose enable verbose mode
| |
− | | |
− | Environment:
| |
− | DOCKER_HOME (default /var/lib/docker)
| |
− | CRIU_IMG_DIR (default /var/lib/docker/criu_img)
| |
− | DOCKER_BINARY (default docker)
| |
− | CRIU_BINARY (default criu)
| |
− | </pre>
| |
− | | |
− | Below is an example to checkpoint and restore Docker container 4397:
| |
− | | |
− | <pre>
| |
− | # docker_cr.sh -c 4397
| |
− | dump successful
| |
− | # docker_cr.sh -r 4397
| |
− | restore successful
| |
− | </pre>
| |
− | | |
− | Optionally, you can specify <code>-v</code> to see the commands that <code>docker_cr.sh</code>
| |
− | executes. For example:
| |
− | | |
− | <pre>
| |
− | # docker_cr.sh -c -v 40d3
| |
− | docker binary: docker
| |
− | criu binary: criu
| |
− | image directory: /var/lib/docker/criu_img/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf
| |
− | container root directory: /var/lib/docker/aufs/mnt/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf
| |
− | | |
− | criu dump -v4 -D /var/lib/docker/criu_img/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf -o dump.log \
| |
− | --manage-cgroups --evasive-devices \
| |
− | --ext-mount-map /etc/resolv.conf:/etc/resolv.conf \
| |
− | --ext-mount-map /etc/hosts:/etc/hosts \
| |
− | --ext-mount-map /etc/hostname:/etc/hostname \
| |
− | --ext-mount-map /.dockerinit:/.dockerinit \
| |
− | -t 5991 --root /var/lib/docker/aufs/mnt/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf
| |
− | | |
− | dump successful
| |
− | (00.020827) Dumping finished successfully
| |
− | | |
− | # docker_cr.sh -r -v 40d3
| |
− | docker binary: docker
| |
− | criu binary: criu
| |
− | image directory: /var/lib/docker/criu_img/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf
| |
− | container root directory: /var/lib/docker/aufs/mnt/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf
| |
− | | |
− | mount -t aufs -o
| |
− | /var/lib/docker/aufs/diff/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf
| |
− | /var/lib/docker/aufs/diff/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf-init
| |
− | /var/lib/docker/aufs/diff/a9eb172552348a9a49180694790b33a1097f546456d041b6e82e4d7716ddb721
| |
− | /var/lib/docker/aufs/diff/120e218dd395ec314e7b6249f39d2853911b3d6def6ea164ae05722649f34b16
| |
− | /var/lib/docker/aufs/diff/42eed7f1bf2ac3f1610c5e616d2ab1ee9c7290234240388d6297bc0f32c34229
| |
− | /var/lib/docker/aufs/diff/511136ea3c5a64f264b78b5433614aec563103b4d4702f3ba7d4d2698e22c158
| |
− | none
| |
− | /var/lib/docker/aufs/mnt/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf
| |
− | | |
− | criu restore -v4 -D /var/lib/docker/criu_img/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf \
| |
− | -o restore.log --manage-cgroups --evasive-devices \
| |
− | --ext-mount-map /etc/resolv.conf:/var/lib/docker/containers/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf/resolv.conf \
| |
− | --ext-mount-map /etc/hosts:/var/lib/docker/containers/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf/hosts \
| |
− | --ext-mount-map /etc/hostname:/var/lib/docker/containers/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf/hostname \
| |
− | --ext-mount-map /.dockerinit:/var/lib/docker/init/dockerinit-1.0.0 \
| |
− | -d --root /var/lib/docker/aufs/mnt/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf \
| |
− | --pidfile /var/lib/docker/criu_img/40d363f564e00a2f893579fa012a200e475dcf8df47f2a22b7dd0860ffc3d7bf/restore.pid
| |
− | | |
− | restore successful
| |
− | (00.408807) Restore finished successfully. Resuming tasks.
| |
− | | |
− | root 6206 1 1 10:49 ? 00:00:00 /bin/sh -c i=0; while true; do echo $i >> /foo; i=$(expr $i + 1); sleep 3; done
| |
− | </pre>
| |
− | | |
− | | |
− | [[Category:HOWTO]]
| |
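In the removed example, the restore-side <code>--ext-mount-map</code> arguments follow a fixed pattern: each bind-mount target under <code>/etc</code> maps to a file of the same name in the container's directory under Docker's home. The following is a minimal sketch of that pattern, not part of CRIU or <code>docker_cr.sh</code>. The function name <code>ext_mount_args</code> and the container ID <code>abc123</code> are hypothetical, and the default Docker layout (<code>/var/lib/docker</code>) is assumed. The <code>/.dockerinit</code> mapping is left out because it applies only to older Docker versions and its source path is version-specific.

```shell
#!/bin/sh
# Hypothetical helper (not part of CRIU or docker_cr.sh): print the
# --ext-mount-map arguments used on restore for one container.
# Assumes the default Docker layout described in the text above.
ext_mount_args() {
    container_id=$1
    docker_home=${2:-/var/lib/docker}   # same default as DOCKER_HOME in docker_cr.sh
    # Docker bind mounts these three files from outside the container's
    # mount namespace, so each needs an explicit mapping on restore.
    for f in resolv.conf hosts hostname; do
        printf '%s %s\n' --ext-mount-map \
            "/etc/$f:$docker_home/containers/$container_id/$f"
    done
}

# Example call with a placeholder container ID.
ext_mount_args abc123
```

The output can be appended to a <code>criu restore</code> command line. Pass a second argument if Docker's home directory differs from the default.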