Disk-less migration

Revision as of 12:36, 13 January 2019 by Radostin (criu restore: Using -t with criu restore is obsoleted)

When performing live migration, CRIU puts image files with the applications' memory on storage provided by the user. If these images are large, copying them several times causes long delays. In some situations it is also desirable to avoid the storage altogether, so as not to increase the load on it. This article describes, step by step, how to do live migration without putting images on disk.

The process

Preparation

Prepare a tmpfs mount on both sides to hold the images other than those with the applications' memory. These images are typically very small and will not create significant memory pressure on the nodes.

dst# mount -t tmpfs none <dir>
src# mount -t tmpfs none <dir>
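
Before going further it can help to confirm that <dir> really is a tmpfs mount on each node, since otherwise the images would land on disk after all. The following is a minimal sketch, assuming a Linux mounts table in /proc; the is_tmpfs helper and its optional second argument are hypothetical and not part of CRIU:

```shell
# Hypothetical helper: succeed if the directory $1 is listed as a tmpfs mount.
# The mounts table defaults to /proc/self/mounts; the optional second
# argument exists only so the check can be tried against a sample file.
is_tmpfs() {
    dir=$1
    table=${2:-/proc/self/mounts}
    # Fields per line: device, mount point, filesystem type, options, ...
    awk -v d="$dir" '$2 == d && $3 == "tmpfs" { found = 1 } END { exit !found }' "$table"
}
```

For example, `is_tmpfs /mnt/img || echo "not a tmpfs"`, where /mnt/img stands in for <dir>. Mount points are compared literally, so the path should be given without a trailing slash.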

Run page server

Launch a page server on the destination node. The page server accepts pages from criu and puts them into the tmpfs mount. Since the applications are about to run on the destination node, it has to accommodate this memory consumption anyway. The source node does not have to store these images.

dst# criu page-server --images-dir <dir> --port <port>

The page server now waits for an incoming connection and will write the applications' memory to <dir>. When doing iterative migration, you can make the page server drop duplicated pages automatically with the --auto-dedup option. See the incremental dumps article for details.
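
Because the restore step depends on how the page server exited, it may be convenient to wrap the launch so that its status is reported explicitly. This is only a sketch: the run_page_server function and the CRIU override variable are hypothetical names introduced here, and only the criu options already shown in this article are used.

```shell
# The binary can be overridden for a dry run; it defaults to the real criu.
CRIU=${CRIU:-criu}

# Hypothetical wrapper: run the page server in the foreground and report
# how it exited. Arguments: images directory, port, then optional extra
# options such as --auto-dedup.
run_page_server() {
    dir=$1; port=$2; shift 2
    "$CRIU" page-server --images-dir "$dir" --port "$port" "$@"
    rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "page server finished, pages are in $dir"
    else
        echo "page server failed with status $rc" >&2
    fi
    return "$rc"
}
```

On the destination node this could be called as `run_page_server <dir> <port> --auto-dedup` for an iterative migration, or without the last argument otherwise.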

criu dump

Dump the applications as you would for an ordinary live migration, but add the options that tell criu where the page server is:

src# criu dump --tree <pid> --images-dir <dir> --leave-stopped --page-server --address <dst> --port <port>

Copy images

Copy the remaining images to the destination node:

src# scp <dir>/* <dst-node>:<dir>/

As mentioned before, the CRIU images copied here (everything but the processes' memory) are relatively small, so this does not take long.
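
The dump and copy steps on the source node can be chained so that nothing is copied if the dump fails. A minimal sketch under stated assumptions: dump_and_copy, CRIU and SCP are hypothetical names introduced for illustration, and <dir> is assumed to be the same path on both nodes, as in the commands above.

```shell
# Binaries can be overridden for a dry run; defaults are the real tools.
CRIU=${CRIU:-criu}
SCP=${SCP:-scp}

# Hypothetical helper for the source node.
# Arguments: pid of the process tree root, images directory,
# destination host, page-server port.
dump_and_copy() {
    pid=$1; dir=$2; dst=$3; port=$4
    # Memory pages go straight to the page server on the destination.
    "$CRIU" dump --tree "$pid" --images-dir "$dir" --leave-stopped \
        --page-server --address "$dst" --port "$port" || return 1
    # Copy the files inside the directory, not the directory itself,
    # so they land directly in the tmpfs mount on the destination.
    "$SCP" "$dir"/* "$dst:$dir/"
}
```

Setting `CRIU=echo SCP=echo` before calling the function prints the two commands instead of running them, which is a cheap way to check the arguments.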

criu restore

Restore the applications. By now the page server should have exited (check its return code to confirm it succeeded), and the images with pages are already in <dir>.

dst# criu restore --images-dir <dir>
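
As an extra guard, the restore can be refused when no page images are present in <dir>. The sketch below assumes that the page server stores the memory in files named pages-*.img; restore_from and the CRIU override variable are hypothetical names, not part of CRIU.

```shell
# The binary can be overridden for a dry run; it defaults to the real criu.
CRIU=${CRIU:-criu}

# Hypothetical helper for the destination node.
# Argument: images directory.
restore_from() {
    dir=$1
    # Assumption: page images are named pages-*.img. If the glob matches
    # nothing, $1 stays the literal pattern and the -e test fails.
    set -- "$dir"/pages-*.img
    if [ ! -e "$1" ]; then
        echo "no page images found in $dir" >&2
        return 1
    fi
    "$CRIU" restore --images-dir "$dir"
}
```

This check does not replace looking at the page server's return code; it only catches the case where the dump never reached this node.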

Cleanup tmpfs

Unmount the tmpfs with the old images. It is no longer required.

dst# umount <dir>

Kill processes on source

Kill the applications on the source node, as they are now running on the destination.

src# FIXME
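
The exact command is left open above. One possible approach, offered only as a sketch and not as a command from the CRIU documentation, is to send SIGKILL to the dumped process and all of its descendants. It assumes the tree was left stopped by --leave-stopped and that /proc is available; children_of, list_tree and kill_tree are hypothetical helper names.

```shell
# Print the direct children of pid $1 by scanning /proc (Linux only).
children_of() {
    for f in /proc/[0-9]*/status; do
        # A process may exit while we scan, so ignore read errors.
        ppid=$(awk '/^PPid:/ { print $2 }' "$f" 2>/dev/null)
        if [ "$ppid" = "$1" ]; then
            pid_dir=${f%/status}
            echo "${pid_dir#/proc/}"
        fi
    done
}

# Print pid $1 followed by all of its descendants.
list_tree() {
    echo "$1"
    for child in $(children_of "$1"); do
        list_tree "$child"
    done
}

# Send SIGKILL to the whole tree rooted at pid $1.
kill_tree() {
    # Collect the pids first; the dumped tasks are stopped, so the tree
    # should not change between listing and killing.
    pids=$(list_tree "$1")
    kill -KILL $pids
}
```

It would be called as `kill_tree <pid>` with the same <pid> that was given to criu dump --tree, and only after the restore on the destination has succeeded.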

See also