Difference between revisions of "Live migration"
Revision as of 20:45, 19 February 2015
Note: The main article about live migration is here. The article below is just a description of how it can be done.
The criu utility can be used to perform live migration of apps or containers. This page is a sort of HOWTO describing this.
Migration sequence
To live-migrate an application or a container, make sure that the files that are (or can be) accessed by the processes you're migrating are available on both nodes, source and destination. This can be achieved either by using a shared file system such as NFS, GlusterFS or Ceph, or by using rsync to copy files from one box to another. The rest of this article assumes that the file system is the same on both sides.
To live-migrate tasks, perform these steps:
Dump
Take the tasks you're about to migrate and dump them into some place, asking criu to leave them in the stopped state after the dump:
[src]# criu dump --tree <pid> --images-dir <path-to-existing-directory> --leave-stopped
The directory you put the images into can reside on the shared file system if you're using one. In this case you can skip the Copy step and proceed to Restore.
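Since the dump command above takes a path to an existing directory, it can help to prepare that directory before dumping. The following is a minimal sketch, not part of criu itself; the IMAGES_DIR variable and its default location are assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical image location (an assumption for this sketch); any writable
# path works, including one on the shared file system.
IMAGES_DIR="${IMAGES_DIR:-${TMPDIR:-/tmp}/criu-images}"

# The dump step is given an existing directory, so create it up front.
mkdir -p "$IMAGES_DIR"

# Report whether the directory is usable before starting the dump.
if [ -d "$IMAGES_DIR" ] && [ -w "$IMAGES_DIR" ]; then
    echo "images dir ready: $IMAGES_DIR"
else
    echo "cannot use $IMAGES_DIR" >&2
fi
```

The resulting path can then be passed to --images-dir in the dump command.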
Copy
Copy the images to the destination node:
[src]# scp -r <path-to-images-dir> <dst>:/<path-to-images>
Restore
Go to the destination node and restore the apps from images on it:
[dst]# criu restore --tree <pid> --images-dir <path-to-images>
Kill
If everything went OK, you can return to the source node and kill the stopped tasks on it.
[src]# FIXME put command here
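The Dump, Copy and Restore steps can be collected into one dry-run helper that only prints the commands, so the sequence can be reviewed before anything is executed. This is a sketch under stated assumptions: the function name migration_plan and its four arguments are hypothetical, and the commands it prints are the ones given in the steps above. The Kill step is left out because this page does not yet specify its command.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for each step instead of running them.
# migration_plan and its arguments are hypothetical names for illustration.
#   $1 = PID of the process tree root on the source node
#   $2 = images directory on the source node
#   $3 = destination host
#   $4 = images directory on the destination node
migration_plan() {
    pid=$1
    src_images=$2
    dst=$3
    dst_images=$4

    # Dump: leave the tasks stopped so they can be killed later on success.
    echo "[src]# criu dump --tree $pid --images-dir $src_images --leave-stopped"
    # Copy: not needed when the images directory is on a shared file system.
    echo "[src]# scp -r $src_images $dst:$dst_images"
    # Restore: run on the destination node.
    echo "[dst]# criu restore --tree $pid --images-dir $dst_images"
}

# Example values below are placeholders, not real hosts or PIDs.
migration_plan 1234 /tmp/criu-images node2 /tmp/criu-images
```

Printing the plan first keeps the source tasks untouched until each command has been checked by hand.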
Notes
- The directories with images will contain two copies of the application's memory, which may be space-consuming. CRIU can perform disk-less migration to address this.
- Another issue with this way of doing live migration is that tasks remain frozen while their memory is being copied to the remote host. If there is a lot of memory, this freeze time can be long. CRIU can reduce it by doing iterative migration.
- If you're live-migrating a shell job, remember that the --shell-job option must be used in both stages, dump and restore. See more details about shell jobs here.
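To illustrate the shell job note, the sketch below prints a matching dump and restore pair and adds --shell-job to both when asked. The helper name criu_pair and its arguments are hypothetical; the commands are the ones from the Dump and Restore steps, and nothing is executed.

```shell
#!/bin/sh
# Dry-run sketch: prints dump and restore commands, adding --shell-job to
# both stages when the third argument is "shell-job".
# criu_pair is a hypothetical helper, not part of criu.
criu_pair() {
    pid=$1
    images=$2
    extra=""
    if [ "$3" = "shell-job" ]; then
        extra=" --shell-job"
    fi
    echo "[src]# criu dump --tree $pid --images-dir $images --leave-stopped$extra"
    echo "[dst]# criu restore --tree $pid --images-dir $images$extra"
}

# Placeholder PID and path, for illustration only.
criu_pair 1234 /tmp/criu-images shell-job
```

Building both commands from one place makes it harder to add the option to the dump and forget it on the restore.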