What cannot be checkpointed

Latest revision as of 10:02, 25 February 2022

This article describes what an application can do to make CRIU refuse to dump it, summarizing our experience in the area. Note that there is no "What cannot be restored" article, and there never will be: if something was dumped, it should be restored.

Some things cannot be dumped at all; others require a special option to be used.

Dumped with special option

External resources

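By default, CRIU dumps a set of processes and their resources only if this set has no connections to the outside. However, in some situations it makes sense to ignore such an external connection on dump and recreate it on restore. What counts as a "connection outside"? Examples include a UNIX socket connected to an application that is not being dumped (an external UNIX socket); the TTY, process group and session that a program started from a shell can share with the shell itself; and a TCP connection, which is literally an external connection, so CRIU must be explicitly allowed to dump it.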

Main article: External resources
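
As a rough illustration of how these cases are opted into, the sketch below assembles a `criu dump` command line. The options `--ext-unix-sk`, `--shell-job` and `--tcp-established` are existing CRIU options for the three cases above; the helper `build_dump_cmd` and its parameter names are hypothetical and exist only for this example.

```python
from typing import List


def build_dump_cmd(pid: int, images_dir: str, *,
                   ext_unix_sk: bool = False,
                   shell_job: bool = False,
                   tcp_established: bool = False) -> List[str]:
    """Return an argv list for `criu dump` (hypothetical helper).

    Each keyword switches on one opt-in flag for an external resource.
    """
    cmd = ["criu", "dump", "--tree", str(pid), "--images-dir", images_dir]
    if ext_unix_sk:
        # UNIX socket connected to a peer outside the dumped set
        cmd.append("--ext-unix-sk")
    if shell_job:
        # TTY, process group and session shared with the launching shell
        cmd.append("--shell-job")
    if tcp_established:
        # established TCP connections
        cmd.append("--tcp-established")
    return cmd
```

The resulting list can be passed to `subprocess.run` on a machine where CRIU is installed; the matching restore normally needs the same flags.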

File locks

A file lock is an object that belongs to a filesystem. On dump it is impossible to find out whether a lock held by one task can also be used by some other task. Thus, by default, CRIU doesn't dump tasks that hold locks. The --file-locks CLI option tells CRIU to dump the locks.

Main article: File locks
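
To see in advance whether a task holds any locks, one option is to look at `/proc/locks`. The sketch below parses text in that format and returns the entries held by a given PID. The function name `locks_held_by` and the sample data are made up for illustration, and this is not necessarily how CRIU itself detects locks.

```python
from typing import List

# Sample text in /proc/locks format (made-up values).
SAMPLE_LOCKS = """\
1: POSIX  ADVISORY  WRITE 1234 08:01:131 0 EOF
2: FLOCK  ADVISORY  WRITE 4321 08:01:999 0 EOF
3: -> POSIX  ADVISORY  READ 1234 08:01:131 0 10
"""


def locks_held_by(pid: int, locks_text: str) -> List[str]:
    """Return the /proc/locks lines describing locks held by `pid`."""
    held = []
    for line in locks_text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # not a lock entry
        if fields[1] == "->":
            continue  # a blocked waiter, not a held lock
        if fields[4] == str(pid):
            held.append(line.strip())
    return held
```

On a live system the text would come from reading `/proc/locks`; note that the PIDs there are reported relative to the reader's PID namespace.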

Invisible files

Sometimes the name of an open file can no longer be found in the filesystem (for example, because the file was unlinked while still open). In this case CRIU can leave a temporary name for it.

Main article: Invisible files

Cannot be dumped (yet)

Devices

If a task has opened or mapped a character or block device, this typically means it wants some connection to the hardware. In this case dump (and restore) is impossible. The exceptions are virtual devices such as null and zero, and the TUN network device (used by OpenVPN).

There are two reasons why a device file cannot be dumped and restored in a generic way. First, the application may be dumped for live migration, and the device may not exist on the restore side. That is the easier problem. Second, we don't know how every device works: the application might have loaded some state into the device, and to dump it properly we would need to fetch that state, which cannot be done in a generic manner.
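
A quick way to spot such descriptors before attempting a dump is to scan `/proc/<pid>/fd`. The following sketch lists descriptors that refer to character or block devices; `is_device_mode` and `device_fds` are hypothetical helper names. It does not tell virtual devices (null, zero, TUN) apart from real hardware, and it does not cover devices that are only mapped.

```python
import os
import stat
from typing import List, Tuple


def is_device_mode(mode: int) -> bool:
    """True if a st_mode value denotes a character or block device."""
    return stat.S_ISCHR(mode) or stat.S_ISBLK(mode)


def device_fds(pid: int) -> List[Tuple[int, str]]:
    """Return (fd, target) pairs for device files opened by `pid`."""
    fd_dir = f"/proc/{pid}/fd"
    found = []
    for name in os.listdir(fd_dir):
        path = os.path.join(fd_dir, name)
        try:
            # os.stat follows the /proc symlink to the opened file
            if is_device_mode(os.stat(path).st_mode):
                found.append((int(name), os.readlink(path)))
        except OSError:
            continue  # descriptor was closed while we were scanning
    return found
```

Reading another user's `/proc/<pid>/fd` requires sufficient privileges, so this is best run as the owner of the task or as root.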

Open files from unmounted filesystem

When a busy filesystem is "lazily" unmounted, references to it are cleaned up only once the filesystem is no longer busy. If a process has an open file from a lazily unmounted filesystem, CRIU cannot checkpoint or restore this process until the file is closed.

See also: Dumping files

Tasks with debugger attached

CRIU uses the same API as debuggers do (ptrace) to get some of a task's state, and this API doesn't allow more than one tracer to be attached to a task. Thus tasks running under gdb or strace cannot be dumped.
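
Whether a task is currently being traced can be read from the `TracerPid` field of `/proc/<pid>/status`, which is 0 when no tracer is attached. A minimal sketch (the function names are hypothetical):

```python
def tracer_pid(status_text: str) -> int:
    """Extract TracerPid from the contents of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("TracerPid:"):
            return int(line.split()[1])
    raise ValueError("no TracerPid field found")


def read_status(pid: int) -> str:
    """Read /proc/<pid>/status for a live task."""
    with open(f"/proc/{pid}/status") as f:
        return f.read()
```

A non-zero result from `tracer_pid(read_status(pid))` suggests that the tracer (gdb, strace, etc.) has to detach before the task can be dumped.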

Task from a different user (for non-root)

For security reasons, if a non-root user asks CRIU to dump a task, CRIU refuses unless the task belongs to the same user.

See also: User-mode
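
The ownership of a task can be checked up front from the `Uid:` line of `/proc/<pid>/status`, which lists the real, effective, saved and filesystem UIDs. The sketch below is a simplified approximation with hypothetical function names; the check CRIU actually performs may be stricter (for example, it may also consider group IDs).

```python
from typing import Tuple


def task_uids(status_text: str) -> Tuple[int, int, int, int]:
    """Return (real, effective, saved, fs) UIDs from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("Uid:"):
            real, eff, saved, fs = (int(x) for x in line.split()[1:5])
            return real, eff, saved, fs
    raise ValueError("no Uid field found")


def same_user(status_text: str, my_uid: int) -> bool:
    """True if every UID of the task equals `my_uid`."""
    return all(uid == my_uid for uid in task_uids(status_text))
```

In practice `my_uid` would be `os.getuid()` of the user invoking CRIU.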

Sockets other than TCP, UDP, UNIX, packet and netlink

Packetized pipes

These are pipes created with the O_DIRECT flag.
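
For reference, a pipe created with `pipe2(O_DIRECT)` carries the `O_DIRECT` bit in its file status flags, so it can be recognized with `fcntl(F_GETFL)`. The sketch below shows this check on Linux; the helper names are hypothetical.

```python
import fcntl
import os


def flags_packetized(flags: int) -> bool:
    """True if file status flags include O_DIRECT (packet mode for pipes)."""
    return bool(flags & os.O_DIRECT)


def fd_packetized(fd: int) -> bool:
    """True if the pipe behind `fd` was created in packet mode."""
    return flags_packetized(fcntl.fcntl(fd, fcntl.F_GETFL))
```

On Linux, `os.pipe2(os.O_DIRECT)` creates such a pipe: each write becomes a separate packet, and each read returns at most one packet.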

Cork-ed UDP sockets

Files sent over unix sockets

Bug #251

SysVIPC memory segment w/o IPC namespace

IPC objects are not tied to any task. Thus, once CRIU meets a SysV IPC memory segment attached to a task, it requires the whole IPC namespace to be dumped as well.
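
On typical Linux kernels, an attached SysV shared memory segment appears in `/proc/<pid>/maps` with a path starting with `/SYSV`. Assuming that naming, the sketch below lists the address ranges of such mappings; the function name and the sample text are invented for illustration.

```python
from typing import List

# Made-up sample in /proc/<pid>/maps format.
SAMPLE_MAPS = """\
00400000-00452000 r-xp 00000000 08:02 173521 /usr/bin/dbus-daemon
7f0000000000-7f0000001000 rw-s 00000000 00:01 32769 /SYSV00000000 (deleted)
7ffd5c000000-7ffd5c021000 rw-p 00000000 00:00 0 [stack]
7f0000002000-7f0000003000 rw-p 00000000 00:00 0
"""


def sysv_shm_mappings(maps_text: str) -> List[str]:
    """Return address ranges of mappings that look like SysV shm segments."""
    found = []
    for line in maps_text.splitlines():
        # range, perms, offset, dev, inode, then an optional path
        fields = line.split(None, 5)
        if len(fields) == 6 and fields[5].startswith("/SYSV"):
            found.append(fields[0])
    return found
```

A non-empty result hints that the task should be dumped together with its IPC namespace.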

Dump/restore of graphical applications

Dumping and restoring an application connected to a "real" X server (e.g. the one on your laptop) is currently impossible, because part of the application's state lives in the X server and CRIU does not dump it.