Difficulties in dumping and restoring fsnotify
Fsnotify support is implemented in a fairly straightforward way: we can fetch watchees by their handles from the procfs output:
    pos: 0
    flags: 02000000
    inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:7e9e0000640d1b6d
So on dump we can remember the watchee's file handle and open it back on restore, retrieving the path from the file descriptor link provided by procfs.
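As an illustration of what the procfs line carries, here is a minimal Python sketch that splits the sample `inotify` line above into its fields. The function name `parse_inotify_line` is hypothetical and this is not CRIU's own parser (CRIU is written in C). It assumes `wd` is printed in decimal and the remaining numeric fields in hexadecimal. For the sample line, whose handle type is 1, the first four bytes of `f_handle` read as a little-endian integer match the `ino` field.

```python
import struct

# The inotify line from the fdinfo sample above.
SAMPLE = ("inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0 "
          "fhandle-bytes:8 fhandle-type:1 f_handle:7e9e0000640d1b6d")


def parse_inotify_line(line):
    """Split one 'inotify ...' fdinfo line into typed fields (illustrative helper)."""
    tokens = line.split()
    if not tokens or tokens[0] != "inotify":
        raise ValueError("not an inotify fdinfo line")
    # Every remaining token has the form key:value.
    raw = dict(tok.split(":", 1) for tok in tokens[1:])
    return {
        "wd": int(raw["wd"]),                        # assumed decimal
        "ino": int(raw["ino"], 16),                  # assumed hexadecimal
        "sdev": int(raw["sdev"], 16),
        "mask": int(raw["mask"], 16),
        "ignored_mask": int(raw["ignored_mask"], 16),
        "fhandle_bytes": int(raw["fhandle-bytes"], 16),
        "fhandle_type": int(raw["fhandle-type"], 16),
        "f_handle": bytes.fromhex(raw["f_handle"]),  # opaque handle payload
    }


if __name__ == "__main__":
    entry = parse_inotify_line(SAMPLE)
    low_word = struct.unpack("<I", entry["f_handle"][:4])[0]
    print(entry["wd"], hex(entry["ino"]), hex(low_word))
```

The handle bytes are otherwise opaque to user space; a dumper only needs to store them together with the handle type and size.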
This all works fine until the watchees are themselves children of another watched directory. Consider a directory dir with two files under it, a and b:
    dir
     `- a
     `- b
and a program that sets up an fsnotify mark on every file entry, i.e. on dir itself and on both files. Now imagine the program opens both a and b and then unlinks them. This generates notify events which the program may not have read yet (so the event queue is not empty) when the user starts the dump procedure. Because the kernel does not yet have an API to peek at events in the queue (peek here means reading events without removing them from the queue), we must either ignore the events or refuse to dump.
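The scenario can be reproduced with a small Python sketch, given here only as an illustration under stated assumptions: it runs on Linux, calls the libc `inotify_init1` and `inotify_add_watch` functions through `ctypes`, and uses the `FIONREAD` ioctl (described in inotify(7)) to ask how many bytes of events are queued. The helper names `pending_bytes` and `queued_bytes_after_unlink` are hypothetical. `FIONREAD` reports only the size of the queued data, not the events themselves, so it can detect a non-empty queue but cannot save its contents.

```python
import ctypes
import fcntl
import os
import struct
import tempfile
import termios

IN_ALL_EVENTS = 0x00000FFF  # value from <sys/inotify.h>

# Assumption: Linux, where CDLL(None) exposes the libc inotify wrappers.
libc = ctypes.CDLL(None, use_errno=True)
libc.inotify_init1.argtypes = [ctypes.c_int]
libc.inotify_add_watch.argtypes = [ctypes.c_int, ctypes.c_char_p, ctypes.c_uint32]


def pending_bytes(fd):
    """Return the number of bytes queued on fd without consuming them."""
    res = fcntl.ioctl(fd, termios.FIONREAD, struct.pack("i", 0))
    return struct.unpack("i", res)[0]


def queued_bytes_after_unlink():
    """Watch dir, a and b; open and unlink a and b; report the queued bytes."""
    with tempfile.TemporaryDirectory() as d:
        files = [os.path.join(d, name) for name in ("a", "b")]
        for path in files:
            open(path, "w").close()

        ifd = libc.inotify_init1(0)
        if ifd < 0:
            raise OSError(ctypes.get_errno(), "inotify_init1 failed")
        try:
            # One mark on the directory itself and one on each file.
            for path in [d] + files:
                wd = libc.inotify_add_watch(ifd, os.fsencode(path), IN_ALL_EVENTS)
                if wd < 0:
                    raise OSError(ctypes.get_errno(), "inotify_add_watch failed")

            held = [open(path) for path in files]  # keep descriptors alive
            for path in files:
                os.unlink(path)                    # files are now deleted but open

            queued = pending_bytes(ifd)            # events are still unread
            for f in held:
                f.close()
            return queued
        finally:
            os.close(ifd)


if __name__ == "__main__":
    print("bytes of unread events:", queued_bytes_after_unlink())
```

A non-zero result corresponds to the situation described above: the queue holds events the program has not consumed, and reading them to save them would remove them from the queue.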
Refusing to dump might be an option, but with the current CRIU design we could get stuck in a situation where any attempt to dump forces CRIU to generate events itself, leading to an endless cycle. This is mostly because of the so-called ghost files. Ghost files are files that were removed by an application while a file descriptor for them is still open. For such a scenario we create a hard link to the deleted file at the moment of dumping, which of course generates notify events.
Almost the same situation arises during restore: ghost files get unlinked, which causes the kernel to generate events.
So until the dump/restore procedure for the fsnotify subsystem is redesigned, we have to ignore non-empty notify queues on dump and live with the fact that we generate our own events on restore.