Difference between revisions of "Advanced usage"


Revision as of 07:04, 21 September 2016

This page describes some less-than-obvious options that can be used with criu.

Evasive devices

Sometimes an application opens a device file and the path by which it was opened later becomes inaccessible (e.g. overmounted or unlinked). In that case criu cannot easily dump and restore the process. If the path does not matter for your app's state, you can tell criu that a device file may be opened by any name, even if the original one is no longer accessible. The option for that is --evasive-devices.

See also CLI/opt/--evasive-devices

External UNIX sockets

Consider an application that opens a datagram UNIX socket and connects it to some address. If you try to dump such an app and the server socket is for some reason not included in the dumped state (e.g. the task holding it is not dumped), the dump will fail. You can override this behavior with the --ext-unix-sk option, which allows criu to disconnect the client after dump and reconnect it on restore using the server socket path.

See also CLI/opt/--ext-unix-sk

Link unlinked files back

When an app opens a file and then unlinks it, criu takes the file along in the image. However, it does so only if the file is really unlinked, i.e. its link count (n_link) is zero. Otherwise the inode still has other names, so criu cannot take the file along and must find some way to open the inode on restore. To do so it creates a temporary hard link to the inode during dump, opens that link on restore, and unlinks it again. The --link-remap option allows criu to create these temporary hard links on the file system.
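
The difference between a really unlinked file and one whose inode still has other names can be observed from a shell. The following is a minimal sketch, not something criu itself runs: the helper name nlink_after_unlink is hypothetical, and it assumes Linux with /proc mounted and a stat that supports -L and -c (GNU coreutils or BusyBox).

```shell
#!/bin/sh
# Hypothetical helper: open a file, remove its name, and print the link
# count of the inode that is still held open.
# $1 = number of extra hard links to create before unlinking.
nlink_after_unlink() {
    extra_links=$1
    dir=$(mktemp -d) || return 1
    : > "$dir/data"                      # create an empty file
    i=0
    while [ "$i" -lt "$extra_links" ]; do
        ln "$dir/data" "$dir/alias$i"    # give the inode another name
        i=$((i + 1))
    done
    exec 3< "$dir/data"                  # keep the file open on fd 3
    rm "$dir/data"                       # unlink the name it was opened by
    stat -L -c %h /proc/self/fd/3        # link count of the open inode
    exec 3<&-                            # close fd 3
    rm -rf "$dir"
}
```

Calling nlink_after_unlink 0 reports a link count of zero, the case where the file can travel inside the image. Calling it with 1 reports a non-zero count, which is the case the --link-remap option is meant for.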

See also CLI/opt/--link-remap

TCP connections

When dumping and restoring an application that has open TCP connections, use the --tcp-established option. With this option criu leaves the connection(s) locked after dump and requires them to still be locked before restore. The article Simple TCP pair describes how to experiment with this option.
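
As a rough sketch of how the option is passed at both stages, the two wrappers below call criu with -t (the pid of the tree root) and -D (the image directory). The function names and their arguments are illustrative assumptions, not part of criu.

```shell
#!/bin/sh
# Illustrative wrappers; only the criu options themselves come from criu.

dump_tcp() {
    # $1 = pid of the process tree root, $2 = directory for the image files
    criu dump -t "$1" -D "$2" --tcp-established
}

restore_tcp() {
    # $1 = directory holding the image files
    criu restore -D "$1" --tcp-established
}
```

For example, dump_tcp 1234 /tmp/img followed later by restore_tcp /tmp/img, where the pid and the directory are placeholders.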

See also CLI/opt/--tcp-established

Per-task logging on restore

By default criu puts all logs into the one log file specified by the -o/--log-file option. To split the restore logs on a per-pid basis, use the --log-pid option. The logs of task ${pid} will then appear in a file named ${log-file}.${pid}.
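
A small sketch of the naming scheme, assuming the log file name is joined with the pid by a dot as described above. Both function names are hypothetical helpers.

```shell
#!/bin/sh
# Illustrative helpers; the names are not part of criu.

restore_with_per_task_logs() {
    # $1 = image directory, $2 = base log file name
    criu restore -D "$1" -o "$2" --log-pid
}

per_task_log() {
    # $1 = base log file name, $2 = pid of the task
    # Prints the name of the log file expected for that task.
    echo "$1.$2"
}
```

With a base name of restore.log, per_task_log restore.log 42 prints restore.log.42.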

See also CLI/opt/--log-pid

Knowing a new pid of restored task

When you restore a process tree that lives inside a PID namespace, the pid of the new task is generated by the system, since the pids stored in the image files are treated as virtual, i.e. as seen inside the namespace. To find out the new pid, use the --pidfile <file> option, which makes criu write it into a pidfile.
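
One way to use the pidfile from a script is sketched below. The wrapper name is hypothetical, and it assumes the restore is run detached with -d so that criu returns once the tree is restored and the pidfile can be read straight away.

```shell
#!/bin/sh
# Hypothetical wrapper: restore detached, then print the pid criu recorded.
restore_and_get_pid() {
    # $1 = image directory, $2 = path of the pidfile to create
    criu restore -d -D "$1" --pidfile "$2" || return 1
    cat "$2"
}
```

For example, new_pid=$(restore_and_get_pid /tmp/img /tmp/img/root.pid), where both paths are placeholders.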

See also CLI/opt/--pidfile

Restoring a veth pair of devices

When you restore a net namespace that contains one end of a veth device, criu creates the other end of the pair in the net namespace you launch criu from. By default its name is the one generated by the veth kernel driver. To avoid scanning through all the net devices to find it, you can use the --veth-pair ns-dev:host-dev option. With it, the device named ns-dev in the newly restored namespace is linked with the device named host-dev in the criu namespace.

See also CLI/opt/--veth-pair

Action scripts

criu can call your hooks at various stages of dumping and restoring. Hooks are added with the --action-script shell-code-to-execute option. When a hook is called, the CRTOOLS_SCRIPT_ACTION environment variable is set to a value that identifies the action being performed. See Action scripts for more details.
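
A minimal hook might dispatch on the variable as sketched here. The handled stages (post-dump and post-restore) are just two examples, and the messages and the function name are made up for illustration.

```shell
#!/bin/sh
# Minimal hook sketch. criu sets CRTOOLS_SCRIPT_ACTION before calling the
# hook; the function name and the messages are illustrative only.
handle_action() {
    case "$1" in
        post-dump)    echo "dump finished" ;;
        post-restore) echo "restore finished" ;;
        *)            : ;;   # ignore stages this hook does not care about
    esac
}

# Dispatch on the stage criu reports (empty when run outside criu).
handle_action "${CRTOOLS_SCRIPT_ACTION:-}"
```

Saved as an executable file, say /path/to/hook.sh (a placeholder), it could be passed to criu as --action-script /path/to/hook.sh.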

See also CLI/opt/--action-script

Shell jobs C/R

When you run an app directly from a shell (e.g. just launch top), it inherits the session ID from bash and uses a pty whose other end sits somewhere outside this shell session (e.g. in sshd or xterm). Strictly speaking such an app cannot be checkpointed, since the session leader, which is bash, is left outside the image, and so is the pty master. Nor can it be restored easily. Sometimes, though, it makes sense to migrate such an app by moving it to another session and attaching it to a different pty slave "on the fly". For such cases the --shell-job option should be used at both stages, dump and restore. On dump it allows criu to ignore the "leaked" session leader and pty master; on restore it tells criu to change the session and group and to attach to an existing pty slave.
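
The pairing of the option across both stages can be sketched as two small wrappers. The function names are hypothetical; -t and -D select the tree root pid and the image directory.

```shell
#!/bin/sh
# Illustrative wrappers showing --shell-job on both dump and restore.

dump_shell_job() {
    # $1 = pid of the app started from the shell, $2 = image directory
    criu dump -t "$1" -D "$2" --shell-job
}

restore_shell_job() {
    # $1 = image directory; run this from the terminal the app should attach to
    criu restore -D "$1" --shell-job
}
```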

See also CLI/opt/--shell-job

File locks C/R

Some apps use file locks for synchronization. Generally these are flock or POSIX file locks, acquired with the flock or fcntl system calls. They are hard to handle perfectly on dump/restore, because criu cannot guarantee that all potential users of a specific file lock are dumped. For now criu assumes that all users of a file lock are included in the dump, and the --file-locks option should be used at both the dump and restore stages if the app may use any file locks. Remember that, in theory, dumping and restoring file locks is only safe when dumping a container.

See also CLI/opt/--file-locks

Leave task running after checkpoint

In some scenarios a user might want to leave the program running once the checkpoint is complete. For this purpose criu has the --leave-running command line option. We strongly advise NOT using it unless you understand what you are doing. Leaving a task running may lead to inconsistency between the task images and external resources such as open files and TCP connections.

Imagine a task that has been dumped and then continues execution. If the task closes and deletes a file that was checkpointed, any attempt to restore the task will simply fail, because criu won't be able to open the deleted file. The situation becomes more serious if TCP connections are checkpointed: during execution the internal state of the connections changes inside the kernel, and the restore then fails as well.

Still, leaving a task running might be helpful, but it is up to the user to make sure that:

  • No established TCP connections are present
  • The file system is snapshotted with a post-dump script (and then manually restored before criu restore takes place)

See also CLI/opt/--leave-running

External bind mounts

When dumping an LXC or Docker container you will likely encounter external bind mounts. The --ext-mount-map option is there to handle them.

See also CLI/opt/--ext-mount-map