|
|
(13 intermediate revisions by 5 users not shown) |
Line 1: |
Line 1: |
− | This article describes how to perform checkpoint-restore for an LXC container.
| + | == Requirements == |
| | | |
− | == Preparing a Linux Container ==
| + | You should have built and installed a recent (>= 1.3.1) version of CRIU. |
| | | |
− | === Requirements === | + | == Checkpointing and restoring a container == |
| | | |
− | * A console should be disabled (<code>lxc.console = none</code>)
| + | LXC upstream has begun to integrate checkpoint/restore support through the lxc-checkpoint tool. This functionality is included in the recently released LXC 1.1.0. You can either install LXC 1.1.0 or, on Ubuntu, install the development version by doing: |
− | * udev should not run inside containers (<code>mv /sbin/udevd{,.bcp}</code>)
| + | <source lang="bash"> |
| + | sudo add-apt-repository ppa:ubuntu-lxc/daily |
| + | sudo apt-get update |
| + | sudo apt-get install lxc |
| + | </source> |
| | | |
− | === Preparing a host environment ===
| + | Next, create a container: |
| | | |
− | * Mount cgroupfs
| + | <source lang="bash"> |
− | $ mount -t cgroup c /cgroup | + | sudo lxc-create -t ubuntu -n u1 -- -r trusty -a amd64 |
| + | </source> |
| | | |
− | * Create a network bridge
| + | Then append the following lines to its config: |
− | # cat /etc/sysconfig/network-scripts/ifcfg-br0
| |
− | DEVICE=br0
| |
− | TYPE=Bridge
| |
− | BOOTPROTO=dhcp
| |
− | ONBOOT=yes
| |
− | DELAY=5
| |
− | NM_CONTROLLED=n
| |
− | $ cat /etc/sysconfig/network-scripts/ifcfg-eth0
| |
− | DEVICE="eth0"
| |
− | NM_CONTROLLED="no"
| |
− | ONBOOT="yes"
| |
− | BRIDGE=br0
| |
| | | |
− | === Create and start a container === | + | <source lang="bash"> |
− | * Download an OpenVZ template and extract it.
| + | sudo tee -a /var/lib/lxc/u1/config << EOF |
− | <pre><nowiki>curl http://download.openvz.org/template/precreated/centos-6-x86_64.tar.gz | tar -xz -C test-lxc
| + | # hax for criu |
− | </nowiki></pre> | + | lxc.console.path = none |
− | * Create config files
| + | lxc.tty.max = 0 |
− | $ cat ~/test-lxc.conf
| + | lxc.cgroup.devices.deny = c 5:1 rwm |
− | lxc.console=none
| + | # on older lxc comment the above and uncomment the below |
− | lxc.utsname = test-lxc
| + | # lxc.console = none |
− | lxc.network.type = veth
| + | # lxc.tty = 0 |
− | lxc.network.flags = up
| + | # lxc.cgroup.devices.deny = c 5:1 rwm |
− | lxc.network.link = br0
| + | EOF |
− | lxc.network.name = eth0
| + | </source> |
− | lxc.mount = /root/test-lxc/etc/fstab
| |
− | lxc.rootfs = /root/test-lxc-root/
| |
− | lxc.console = none
| |
− | lxc.tty = 0
| |
| | | |
− | $ cat /root/test-lxc/etc/fstab
| + | Finally, start and checkpoint the container: |
− | none /root/test-lxc-root/dev/pts devpts defaults 0 0
| |
− | none /root/test-lxc-root/proc proc defaults 0 0
| |
− | none /root/test-lxc-root/sys sysfs defaults 0 0
| |
− | none /root/test-lxc-root/dev/shm tmpfs defaults 0 0
| |
| | | |
− | * Register the container
| + | <source lang="bash"> |
− | $ lxc-create -n test-lxc -f test-lxc.conf
| + | sudo lxc-start -n u1 |
| + | sleep 5s # let the container get to a more interesting state |
| + | sudo lxc-checkpoint -s -D /tmp/checkpoint -n u1 |
| + | </source> |
| | | |
− | * Start the container
| + | At this point, the container's state is stored in /tmp/checkpoint, and the filesystem is in /var/lib/lxc/u1/rootfs. You can restore the container by doing: |
− | $ mount --bind test-lxc test-lxc-root/
| |
− | $ lxc-start -n test-lxc
| |
| | | |
− | == Checkpoint and restore an LXC Container == | + | <source lang="bash"> |
| + | sudo lxc-checkpoint -r -D /tmp/checkpoint -n u1 |
| + | </source> |
| | | |
− | === Preparations ===
| + | And then, get your container's IP and ssh in: |
| | | |
− | You not only need to [[Installation | install]] the criu, but also check that the iproute2 utility (<code>ip</code>) is v3.6.0 or higher.
| + | <source lang="bash"> |
| + | ssh ubuntu@$(sudo lxc-info -i -H -n u1) |
| + | </source> |
| | | |
− | You can clone the [http://git.kernel.org/?p=linux/kernel/git/shemminger/iproute2.git;a=summary git repo] or download the [http://kernel.org/pub/linux/utils/net/iproute2/ tarball] to compile it manually. In order to tell to criu where the proper ip tool is set the <code>CR_IP_TOOL</code> environment variable.
| + | == Troubleshooting == |
| + | |
| + | === Error (mount.c:805): fusectl isn't empty: 8388625 === |
| | | |
− | === Dump and restore ===
| + | Dumping of fuse filesystems is currently not supported. Empty the container's <code>/sys/fs/fuse/connections</code> and try again. |
− | Dumping and restoring an LXC contianer means -- dumping a subtree of processes starting from container init plus all kinds of namespaces. | |
− | Restoring is symmetrical. The way LXC container works imposes some more requirements on criu usage.
| |
| | | |
− | * In order to properly isolate container from unwanted networking communication during checkpoint/restore you should provide a script for locking/unlocking the container network (see below)
| + | === Error (mount.c:517): Mount 58 (master_id: 12 shared_id: 0) has unreachable sharing === |
− | * When restoring a container with veth device you may specify a name for the host-side veth device
| |
− | * In order to checkpoint and restore alive TCP connections you should use the <code>--tcp-established</code> option
| |
| | | |
− | Typically a container dump command will look like
| + | CRIU doesn't yet support shared mountpoints as LXC does; make sure your rootfs is on a non-shared mount. |
− | <pre>
| |
− | criu dump
| |
− | --tcp-established # allow for TCP connections dump
| |
− | -n net -n mnt -n ipc -n pid # dump all the namespaces container uses
| |
− | --action-script "net-script.sh" # use net-script.sh to lock/unlock networking
| |
− | -D dump/ -o dump.log # set images dir to dump/ and put logs into dump.log file
| |
− | -t ${init-pid} # start dumping from task ${init-pid}. It should be container's init
| |
− | </pre>
| |
− | and restore command like
| |
− | <pre>
| |
− | criu restore
| |
− | --tcp-established
| |
− | -n net -n mnt -n ipc -n pid
| |
− | --action-script "net-script.sh"
| |
− | --veth-pair eth0=${veth-name} # when restoring a veth link use ${veth-name} for host-side device end
| |
− | --root ${path} # path to container root. It should be a root of a (bind)mount
| |
− | -D data/ -o restore.log
| |
− | -t ${init-pid}
| |
− | </pre>
| |
| | | |
− | We also find it useful to use the <code>--restore-detached</code> option for restore to make contianer reparent to init rather than hanging on a criu process launched from shell. Another useful option is the <code>--pidfile</code> one -- you will be able to find out the host-side pid of a container init after restore.
| + | == External links == |
| | | |
− | Also note, that there's a BUG in how LXC prepares the /dev filesystem for a container which sometimes makes it impossible to dump and container. The <code>--evasive-devices</code> option can help.
| + | * [https://www.youtube.com/watch?v=a9T2gcnQg2k&feature=youtu.be&t=18m8s The New New Thing: Turning Docker Tech into a Full Speed Hypervisor] - Talk by Tycho Andersen with a demo of migrating an LXC container with Doom inside |
− | | + | * [https://github.com/tych0/presentations/blob/master/ods2014.md Demo script] |
− | More details on the option mentioned can be found in [[Usage]] and [[Advanced usage]] pages.
| |
− | | |
− | === Example ===
| |
− | We have [http://git.criu.org/?p=crtools.git;a=tree;f=test/app-emu/lxc;hb=HEAD an application test] for dumping/restoring an LXC Container. You may look at it for better understanding how to dump and restore your container with criu.
| |
− | | |
− | This test contains two scripts:
| |
− | ;[http://git.criu.org/?p=crtools.git;a=blob;f=test/app-emu/lxc/run.sh;hb=HEAD run.sh]
| |
− | :This is the main script, which executes ''criu'' two times for dumping and restoring CT. It contains a working commands for dumping and restoring a container.
| |
− | | |
− | ;[http://git.criu.org/?p=crtools.git;a=blob;f=test/app-emu/lxc/network-script.sh;hb=HEAD network-script.sh]
| |
− | : This one is used to lock and unlock CT's network as described above.
| |
− | | |
− | ===FAQ===
| |
− | * CRIU supports restricted number of file systems: proc, sysfs, devtmpfs, tmpfs, binfmt_misc. All unsupported file systems must be umounted or handled by plugins.
| |
− | Error (mount.c:737): FS mnt /sys/fs/pstore dev 0x18 root / unsupported
| |
− | | |
− | * /dev/console isn't supported yet. You can try to remove the "lxc.cgroup.devices.allow = c 5:1 rwm" from a CT config.
| |
− | Error (tty.c:203): tty: Can't obtain ptmx index: Inappropriate ioctl for device
| |
− | | |
− | * Netlink sockets with pending messages are not supported. You can try again later.
| |
− | Error (sk-netlink.c:77): The socket has data to read
| |
| | | |
| [[Category: HOWTO]] | | [[Category: HOWTO]] |
| + | [[Category: Live migration]] |
Requirements
You should have built and installed a recent (>= 1.3.1) version of CRIU.
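The version requirement above can be checked with a short script. This is a minimal sketch, not part of CRIU or LXC: the helper name `version_ge` is hypothetical, it relies on `sort -V` from GNU coreutils, and it assumes that `criu --version` prints a line of the form `Version: X.Y.Z`.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 is at least version $2.
# Uses the version sort of GNU coreutils (`sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

MIN_CRIU="1.3.1"   # minimum version stated in this article

if command -v criu >/dev/null 2>&1; then
    # Assumption: the output contains a line such as "Version: 1.3.1".
    installed=$(criu --version 2>/dev/null | awk '/^Version:/ {print $2; exit}')
    if [ -n "$installed" ] && version_ge "$installed" "$MIN_CRIU"; then
        echo "criu $installed is recent enough"
    else
        echo "criu ${installed:-unknown} may be too old (need >= $MIN_CRIU)"
    fi
else
    echo "criu not found in PATH"
fi
```

If the version looks fine, running `sudo criu check` is a reasonable next step, since it reports whether the running kernel provides what CRIU needs.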
Checkpointing and restoring a container
LXC upstream has begun to integrate checkpoint/restore support through the lxc-checkpoint tool. This functionality is included in the recently released LXC 1.1.0. You can either install LXC 1.1.0 or, on Ubuntu, install the development version by doing:
sudo add-apt-repository ppa:ubuntu-lxc/daily
sudo apt-get update
sudo apt-get install lxc
Next, create a container:
sudo lxc-create -t ubuntu -n u1 -- -r trusty -a amd64
Then append the following lines to its config:
sudo tee -a /var/lib/lxc/u1/config << EOF
# hax for criu
lxc.console.path = none
lxc.tty.max = 0
lxc.cgroup.devices.deny = c 5:1 rwm
# on older lxc comment the above and uncomment the below
# lxc.console = none
# lxc.tty = 0
# lxc.cgroup.devices.deny = c 5:1 rwm
EOF
Finally, start and checkpoint the container:
sudo lxc-start -n u1
sleep 5s # let the container get to a more interesting state
sudo lxc-checkpoint -s -D /tmp/checkpoint -n u1
At this point, the container's state is stored in /tmp/checkpoint, and the filesystem is in /var/lib/lxc/u1/rootfs. You can restore the container by doing:
sudo lxc-checkpoint -r -D /tmp/checkpoint -n u1
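Before restoring, it can be worth confirming that the checkpoint actually produced image files. The sketch below is only an illustration: the helper name `checkpoint_has_images` and the `CKPT_DIR` variable are hypothetical, and it assumes that the image files are written directly into the directory passed with `-D` and carry an `.img` suffix. Reading that directory may require root.

```shell
#!/bin/sh
# Hypothetical helper: succeed if directory $1 contains at least one *.img file.
checkpoint_has_images() {
    [ -d "$1" ] || return 1
    # If the glob matches nothing, ls receives the literal pattern and fails.
    ls "$1"/*.img >/dev/null 2>&1
}

# Directory used with -D in the commands above; override via the environment.
CKPT_DIR=${CKPT_DIR:-/tmp/checkpoint}

if checkpoint_has_images "$CKPT_DIR"; then
    echo "checkpoint images found in $CKPT_DIR"
else
    echo "no checkpoint images in $CKPT_DIR"
fi
```

After the restore command returns, `sudo lxc-info -s -n u1` shows the container state, which should read RUNNING if the restore worked.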
And then, get your container's IP and ssh in:
ssh ubuntu@$(sudo lxc-info -i -H -n u1)
Troubleshooting
Error (mount.c:805): fusectl isn't empty: 8388625
Dumping of fuse filesystems is currently not supported. Empty the container's /sys/fs/fuse/connections and try again.
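To see whether that directory is empty, its entries can simply be counted. The following is a minimal sketch with a hypothetical helper name (`count_entries`); it is meant to be run inside the container, or against a path that shows the container's view of the directory.

```shell
#!/bin/sh
# Hypothetical helper: print the number of entries in directory $1
# (hidden entries included, "." and ".." excluded).
count_entries() {
    ls -A "$1" 2>/dev/null | wc -l | tr -d ' '
}

FUSE_DIR=/sys/fs/fuse/connections
if [ -d "$FUSE_DIR" ]; then
    echo "$FUSE_DIR has $(count_entries "$FUSE_DIR") entries"
else
    echo "$FUSE_DIR is not present here"
fi
```

From the host, `sudo lxc-attach -n u1 -- ls /sys/fs/fuse/connections` lists the same directory as seen by the running container.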
Error (mount.c:517): Mount 58 (master_id: 12 shared_id: 0) has unreachable sharing
CRIU doesn't yet support shared mountpoints as LXC does; make sure your rootfs is on a non-shared mount.
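Whether a mount is shared can be read from `/proc/self/mountinfo`, where shared mounts carry a `shared:N` tag and slave mounts a `master:N` tag in the optional fields. The sketch below uses a hypothetical helper, `mount_propagation`, that prints those tags for a given mount point, or `private` when there are none. The optional second argument is only there so that the function can be pointed at a saved copy of the file.

```shell
#!/bin/sh
# Hypothetical helper: print the propagation tags ("shared:N", "master:N", ...)
# of the mountinfo entry whose mount point is exactly $1, or "private" if the
# entry has no tags. Fails if no entry matches.
# $2 optionally names the mountinfo file (default: /proc/self/mountinfo).
mount_propagation() {
    awk -v mp="$1" '
        $5 == mp {
            tags = ""
            # Optional fields start at column 7 and end at a lone "-".
            for (i = 7; i <= NF && $i != "-"; i++)
                tags = tags (tags == "" ? "" : " ") $i
            # Later entries are mounted on top, so the last match wins.
            result = (tags == "" ? "private" : tags)
            found = 1
        }
        END { if (found) print result; else exit 1 }
    ' "${2:-/proc/self/mountinfo}"
}

# Example: inspect the root mount of the current mount namespace.
if [ -r /proc/self/mountinfo ]; then
    mount_propagation / || echo "no mountinfo entry for /"
fi
```

To find the mount point that holds the container's rootfs, `findmnt -n -o TARGET --target /var/lib/lxc/u1/rootfs` can be used, and its output passed to the helper. If the result contains `shared:`, one possible remedy is `mount --make-private` on that mount point, though whether that is appropriate depends on what else relies on the propagation.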
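External links
* The New New Thing: Turning Docker Tech into a Full Speed Hypervisor (https://www.youtube.com/watch?v=a9T2gcnQg2k&feature=youtu.be&t=18m8s) - talk by Tycho Andersen with a demo of migrating an LXC container with Doom inside
* Demo script (https://github.com/tych0/presentations/blob/master/ods2014.md)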