Changes

1,392 bytes added ,  22:51, 15 July 2015
no edit summary
Line 1: Line 1:  
This HOWTO page describes how to checkpoint and restore a Docker container.
 
   −
== Introduction ==
+
{{Note| This page was originally written a few months ago.  Since then, interfacing with CRIU has been added to Docker's native exec driver (libcontainer) and pull requests to add checkpoint/restore functionality to Docker have been submitted.  If you just want to experiment with C/R, you can use one of the following Docker versions for your C/R experiments:
 +
 
 +
Docker 1.5 [https://github.com/SaiedKazemi/docker/wiki]
 +
Docker 1.7 [https://github.com/boucher/docker/tree/cr-combined]}}
 +
 
 +
{{OverlayFS| The OverlayFS filesystem was merged into the upstream Linux kernel 3.18 and is now Docker's preferred filesystem (instead of AUFS).  However, there are bugs in OverlayFS that cause the wrong mnt_id to be reported in /proc/<pid>/fdinfo/<fd> and the wrong symlink target path for /proc/<pid>/fd/<fd>.  Fortunately, these bugs have been fixed in kernel v4.2-rc2.  See below for instructions on how to apply the relevant patches.}}
   −
{{Note| This page was originally written a few months ago. Since then, interfacing with CRIU has been added to Docker's native exec driver (libcontainer) and pull requests to add checkpoint/restore functionality to Docker have been submitted.  You can use one of the following Docker versions for your C/R experiments:
+
{{Async IO| If your process uses async IO and your kernel is older than 3.19, you need to apply two patches.  See below for instructions.}}
   −
Docker 1.5 [https://github.com/SaiedKazemi/docker/wiki]
+
== Introduction ==
Docker 1.7 [https://github.com/boucher/docker/tree/cr-combined]
  −
}}
      
There are two ways to checkpoint and restore a Docker container:
 
   −
'''External C/R''' using CRIU directly on the command line as it's typically
+
'''1. External C/R''' using CRIU directly on the command line as it's typically
done.
+
done for any process tree.
   −
This is called external because it's happening external to the
+
This approach is called external because it's happening external to the
 
Docker daemon.  After checkpoint, the Docker daemon thinks that the
 
 
container has exited.  After restore, the Docker daemon doesn't know that
 
Line 21: Line 24:  
will not work correctly.
 
   −
{{Note| External C/R was done as proof-of-concept.  Its use is discouraged and the helper script mentioned below will be deprecated in the near future.}}
+
'''2. Native C/R''' using new <code>docker checkpoint</code> and
 
  −
'''Native C/R''' using the newly added <code>docker checkpoint</code> and
   
<code>docker restore</code> commands.
 
   −
Because the Docker daemon is involved in both checkpoint and restore,
+
This approach is called native because the Docker daemon is involved in both checkpoint and restore.
its notion of the container state will be consistent and all commands such as
+
Therefore, its notion of the container state will be correct.  All commands such as
 
<code>docker ps, stop, kill </code> and <code>logs</code> will work.
 
 
This is obviously the preferred method of checkpointing and restoring Docker containers.
 
   −
Native C/R is work in progress, say pre-alpha quality. You can
+
Native C/R is a work in progress, roughly pre-alpha quality.
watch this short demo
+
You can watch this short demo
 
'''[https://www.youtube.com/watch?v=HFt9v6yqsXo video]'''
 
to see how it works. Source files for Docker 1.5 C/R are at this
+
to see how it works.
'''[https://github.com/SaiedKazemi/docker/tree/cr repo]'''.
+
Source files for Docker 1.5 C/R are
 +
'''[https://github.com/SaiedKazemi/docker/tree/cr here]'''
 +
and for Docker 1.7 C/R are
 +
'''[https://github.com/boucher/docker/tree/cr-combined here]'''.
 
The '''[https://github.com/SaiedKazemi/docker/wiki wiki]'''
 
 
page provides an overview of the project history.
 
Work is underway to integrate C/R into the new <code>libcontainer</code>.
     −
For native C/R support, additional functionality was added to CRIU.
+
== OverlayFS ==
The most notable addition is the <code>--inherit-fd</code> command line option.
+
 
 +
The following small kernel patches fix the mount id and symlink target path issues noted above:
 +
 
 +
* {{torvalds.git|155e35d4da}} by David Howells
 +
* {{torvalds.git|df1a085af1}} by David Howells
 +
* {{torvalds.git|f25801ee46}} by David Howells
 +
* {{torvalds.git|4bacc9c923}} by David Howells
 +
* {{torvalds.git|9391dd00d1}} by Al Viro
 +
 
 +
Assuming that you are running Ubuntu Vivid (Linux kernel 3.19), here is how you can patch your kernel:
 +
 
 +
<pre>
 +
git clone  git://kernel.ubuntu.com/ubuntu/ubuntu-vivid.git
 +
cd ubuntu-vivid
 +
git remote add torvalds  git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
 +
git remote update
 +
 
 +
git cherry-pick 155e35d4da
 +
git cherry-pick df1a085af1
 +
git cherry-pick f25801ee46
 +
git cherry-pick 4bacc9c923
 +
git cherry-pick 9391dd00d1
 +
 
 +
cp /boot/config-$(uname -r) .config
 +
make olddefconfig
 +
make -j 8 bzImage modules
 +
sudo make modules_install install
 +
sudo reboot
 +
</pre>
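
After rebooting into the patched kernel, one way to sanity-check the result is to compare what <code>/proc</code> reports for an open file descriptor against the mount table. The sketch below is an illustration and not part of the original instructions: the helper names <code>fd_mnt_id</code> and <code>mnt_id_known</code> and the choice of descriptor 3 are assumptions. It opens a temporary file, reads <code>mnt_id</code> from <code>/proc/<pid>/fdinfo/<fd></code>, reads the symlink target from <code>/proc/<pid>/fd/<fd></code>, and checks whether the mount ID appears in <code>/proc/<pid>/mountinfo</code>. To exercise OverlayFS specifically, it would presumably have to run inside a container whose root filesystem is OverlayFS.

```shell
#!/bin/sh
# Print the mnt_id the kernel reports for descriptor $2 of process $1.
fd_mnt_id() {
    awk '/^mnt_id:/ {print $2}' "/proc/$1/fdinfo/$2"
}

# Succeed if mount ID $2 is listed in the mount table of process $1.
# The first field of each mountinfo line is the mount ID.
mnt_id_known() {
    awk -v id="$2" '$1 == id {found = 1} END {exit !found}' "/proc/$1/mountinfo"
}

# Open a temporary file on descriptor 3 so there is something to inspect.
tmpfile=$(mktemp)
exec 3<"$tmpfile"

mnt_id=$(fd_mnt_id $$ 3)
target=$(readlink "/proc/$$/fd/3")
echo "fd 3: mnt_id=$mnt_id target=$target"

if mnt_id_known $$ "$mnt_id"; then
    echo "mnt_id $mnt_id is present in mountinfo"
else
    echo "mnt_id $mnt_id is NOT present in mountinfo"
fi

# Close the descriptor and clean up.
exec 3<&-
rm -f "$tmpfile"
```

If the reported mount ID is missing from <code>mountinfo</code>, or the symlink target does not match the path that was opened, the kernel is likely still affected by the issues described above.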
 +
 
 +
== Async IO (AIO) ==
 +
 
 +
If you are using a kernel older than 3.19 and your container uses AIO, you need the following AIO kernel patches from 3.19:
 +
 
 +
* {{torvalds.git|bd9b51e79c}} by Al Viro
 +
* {{torvalds.git|e4a0d3e720}} by Pavel Emelyanov
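
Whether the AIO patches are needed depends on the running kernel version. A minimal sketch of that check follows; the helper name <code>version_lt</code> is hypothetical, and the comparison assumes a <code>sort</code> that supports <code>-V</code> (version sort), as GNU coreutils does.

```shell
#!/bin/sh
# Return 0 (true) if version $1 is strictly older than version $2.
version_lt() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Strip any suffix such as "-22-generic" from the running kernel release.
kernel=$(uname -r | sed 's/[^0-9.].*//')

if version_lt "$kernel" 3.19; then
    echo "Kernel $kernel is older than 3.19: the AIO patches are needed if the container uses AIO."
else
    echo "Kernel $kernel is 3.19 or newer: the AIO patches should already be included."
fi
```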
    
== External C/R ==
 
 +
 +
{{Note| External C/R was done as proof-of-concept.  Its use is discouraged and the helper script mentioned below will be deprecated in the near future.}}
    
Starting with CRIU 1.3, it is possible to checkpoint and restore a
 
Line 104: Line 144:  
the global <code>/dev/null</code> and the container <code>/dev/null</code> as the same device.
 
   −
== Restore Prework ==
+
=== <code>--inherit-fd</code> ===
 +
 
 +
For native C/R support, this option tells CRIU to let the restored process "inherit"
 +
the specified file descriptor (instead of restoring it from the checkpoint).
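
The general idea of inheriting a descriptor, rather than recreating it, can be illustrated with plain shell. This is not a CRIU invocation; the temporary file and the descriptor number 3 are arbitrary choices for the illustration. The parent opens the file, and the child writes through the already-open descriptor without opening the file itself.

```shell
#!/bin/sh
# The parent opens a temporary file for writing on descriptor 3.
out=$(mktemp)
exec 3>"$out"

# The child never opens the file: it writes to descriptor 3,
# which it inherited from its parent in an already-open state.
sh -c 'echo "written through inherited fd" >&3'

# Close the descriptor in the parent and show what the child wrote.
exec 3>&-
cat "$out"
```

With <code>--inherit-fd</code>, the restored process is in a position similar to the child here: the descriptor is supplied by whoever starts the restore rather than rebuilt from the saved state.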
 +
 
 +
== Restore Prework for External C/R ==
    
As mentioned earlier, by default Docker uses AUFS to set up the
 
Line 111: Line 156:  
to set up the filesystem again before attempting to restore.
 
   −
== An Example ==
+
== An External C/R Example ==
    
Below is an example to show C/R operations for a shell script that
 
Line 186: Line 231:  
</pre>
 
</pre>
   −
== Helper Script ==
+
== External C/R Helper Script ==
    
As seen in the above examples, the CRIU command line for checkpointing and
 
Line 281: Line 326:  
root      6206    1  1 10:49 ?        00:00:00 /bin/sh -c i=0; while true; do echo $i >> /foo; i=$(expr $i + 1); sleep 3; done
 
 
</pre>
 
</pre>
  −
== OverlayFS ==
  −
  −
The OverlayFS filesystem was merged into the upstream Linux kernel 3.18 and is now Docker's preferred filesystem (instead of AUFS).  For successful C/R, however, you need to apply the following kernel patch:
  −
  −
OverlayFS patch [https://lkml.org/lkml/2015/3/20/372]
  −
  −
== Async IO (AIO) ==
  −
  −
If you are using a kernel older than 3.19 and your container uses AIO, you need the following AIO kernel patches from 3.19:
  −
  −
* {{torvalds.git|bd9b51e7}} by Al Viro
  −
* {{torvalds.git|e4a0d3e72}} by Pavel Emelyanov
  −
      
[[Category:HOWTO]]
 