Live migration

Revision as of 09:42, 21 September 2016 by Xemul (talk | contribs) (→‎See also)
Note: The main article about live migration is here. The article below only describes how it can be done.

The criu utility can be used to perform live migration of applications or containers. This page is a short HOWTO describing the procedure.

Migration sequence

To live-migrate an application or a container, make sure that the files that are or can be accessed by the processes you're migrating are available on both nodes, source and destination. This can be achieved either by using a shared file system such as NFS, GlusterFS or Ceph, or by using rsync to copy the files from one box to the other. The rest of this article assumes that the file system is the same on both sides.
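For the rsync variant, a minimal sketch could look like the following. The helper name sync_files and its two arguments are hypothetical, and the choice of flags is an assumption about what "the same file system" needs in practice (ownership, hard links, ACLs and extended attributes preserved).

```shell
# Hypothetical helper: mirror one directory to the destination node
# under the same absolute path.
#   $1 = directory used by the processes being migrated
#   $2 = destination host
sync_files() {
    # -a keeps permissions, owners and timestamps; -H, -A and -X also keep
    # hard links, ACLs and extended attributes; --numeric-ids copies
    # UIDs/GIDs as numbers instead of mapping them by name.
    rsync -aHAX --numeric-ids "$1"/ "$2":"$1"/
}
```

It would be called once per directory the tasks use, for example sync_files <path-to-app-files> <dst>, before the dump is taken.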

To live-migrate tasks, follow these steps:

Dump

Take the tasks you're about to migrate and dump them into a directory, asking criu to leave them in the stopped state after the dump:

[src]# criu dump --tree <pid> --images-dir <path-to-existing-directory> --leave-stopped

The directory you put the images into can reside on the shared file system if you're using one. In this case you can skip the Copy step and proceed to Restore.

Copy

Copy images to destination node:

[src]# scp -r <path-to-images-dir> <dst>:/<path-to-images>

Restore

Go to the destination node and restore the apps from images on it:

[dst]# criu restore --tree <pid> --images-dir <path-to-images>
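The Dump, Copy and Restore steps can be chained into one script. The sketch below is only an illustration: the function name migrate, the use of ssh to start the restore remotely, and the --restore-detached option (added so that criu returns once the tasks are running) are assumptions that are not part of the steps above.

```shell
# migrate <pid> <images-dir> <dst>
# Dumps the tree rooted at <pid>, copies the images to <dst> under the
# same path and restores the tree there. <images-dir> must be given
# without a trailing slash; its parent must exist on <dst>.
migrate() {
    pid=$1
    images=$2
    dst=$3

    # criu expects the images directory to exist already.
    mkdir -p "$images" || return 1

    # Dump: the tasks are left stopped on the source node.
    criu dump --tree "$pid" --images-dir "$images" --leave-stopped || return 1

    # Copy: skip this when the images directory is on a shared file system.
    # Copying into the parent directory recreates <images-dir> on <dst>.
    scp -r "$images" "$dst":"$(dirname "$images")"/ || return 1

    # Restore on the destination node.
    ssh "$dst" criu restore --tree "$pid" --images-dir "$images" --restore-detached
}
```

The function stops at the first failing step, so the stopped tasks on the source node are left untouched if the copy or the restore does not succeed.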

Kill

If everything went OK, you can return to the source node and kill the stopped tasks there.

[src]# FIXME put command here
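The page does not give a command for this step. One possible approach, shown here only as a sketch, is to collect the PIDs of the whole stopped tree and send them SIGKILL, which takes effect even while a task is stopped. The helper names tree_pids and kill_tree are hypothetical, and the sketch assumes a Linux kernel that exposes /proc/<pid>/task/<tid>/children.

```shell
# Print <pid> followed by all of its descendants, one PID per line.
# The children files are read per thread, so children forked by any
# thread of a task are found.
tree_pids() {
    echo "$1"
    for child in $(cat /proc/"$1"/task/*/children 2>/dev/null); do
        tree_pids "$child"
    done
}

# Kill the whole tree rooted at <pid>. The PID list is collected before
# any signal is sent, so no child is missed when its parent dies.
kill_tree() {
    kill -KILL $(tree_pids "$1")
}
```

On the source node this would be run as kill_tree <pid>, with the same <pid> that was passed to criu dump. Because the tasks are stopped, the tree cannot change while it is being walked.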

Notes

  • The directories with images end up holding two copies of the applications' memory, which may consume a lot of space. CRIU can perform disk-less migration to address this.
  • Another issue with this way of doing live migration is that the tasks remain frozen while their memory is being copied to the remote host. If there is a lot of memory, this freeze time can be long. CRIU can shorten it by doing iterative migration.
  • If you're live-migrating a shell job, remember that the --shell-job option must be used at both stages, dump and restore. See more details about shell jobs here.
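For the shell job case, the two commands from the steps above simply gain the same option. The wrapper names below are hypothetical and exist only to show the flag on both sides.

```shell
# Dump a shell job: same command as in the Dump step, plus --shell-job.
#   $1 = pid of the tree root, $2 = images directory
dump_shell_job() {
    criu dump --tree "$1" --images-dir "$2" --shell-job --leave-stopped
}

# Restore a shell job: same command as in the Restore step, plus --shell-job.
restore_shell_job() {
    criu restore --tree "$1" --images-dir "$2" --shell-job
}
```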

See also