One filesystem, however, is friendly to this problem. The tmpfs filesystem pins its dentries in memory, since it has no other media on which to store information about its files. So for tmpfs the name is always at hand.
 
== Opening a file without open() ==

Linux provides a way to do this: the <code>open_by_handle_at</code> system call. Introduced to make user-space NFS servers work, this call opens an inode using a blob called a ''handle''. The handle is an (almost) meaningless sequence of bytes by which the filesystem promises to find the inode and open it. The handle itself can be generated by the kernel from the inode object. Since an fsnotify object references an inode, we can ask the kernel to generate the respective inode's handle. We did exactly that and patched the kernel to show this handle in the /proc/$pid/fdinfo/$fd file for the fsnotify.

So when dumping the fsnotify we read the handle out of proc and save it in the images, and at restore time we call <code>open_by_handle_at</code> with the handle value and get the inode back. Then we need to ask the kernel to put the fsnotify on this inode. To do this CRIU makes the fsnotify init call on the /proc/self/fd/$fd path. While resolving the path, the kernel finds the previously opened inode and puts the fsnotify in the proper place. Thus we fool the kernel and put an fsnotify on an inode without even knowing its path.
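The "read the handle out of proc" step can be sketched as below. This is only an illustration in Python, not CRIU's own code. It assumes the inotify line layout described in the kernel's procfs documentation (<code>inotify wd:... ino:... sdev:... mask:... ignored_mask:... fhandle-bytes:... fhandle-type:... f_handle:...</code>) and assumes all numeric fields are hexadecimal. The <code>SAMPLE</code> text and the function name <code>parse_inotify_fdinfo</code> are hypothetical and exist only for this example.

```python
# Illustrative sample of /proc/$pid/fdinfo/$fd contents for an inotify fd.
# The values are examples only, not taken from a real process.
SAMPLE = (
    "pos:\t0\n"
    "flags:\t02000000\n"
    "inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0 "
    "fhandle-bytes:8 fhandle-type:1 f_handle:7e9e0000640d1b6d\n"
)


def parse_inotify_fdinfo(text):
    """Return one dict per 'inotify ...' line found in an fdinfo dump."""
    watches = []
    for line in text.splitlines():
        if not line.startswith("inotify "):
            continue
        # Every token after the leading keyword looks like "key:value".
        fields = dict(
            tok.split(":", 1) for tok in line.split()[1:] if ":" in tok
        )
        watches.append({
            # Assumption: all numeric fields are printed in hexadecimal.
            "wd": int(fields["wd"], 16),
            "ino": int(fields["ino"], 16),
            "sdev": int(fields["sdev"], 16),
            "mask": int(fields["mask"], 16),
            "handle_bytes": int(fields["fhandle-bytes"], 16),
            "handle_type": int(fields["fhandle-type"], 16),
            # The opaque handle blob, as raw bytes.
            "handle": bytes.fromhex(fields["f_handle"]),
        })
    return watches


if __name__ == "__main__":
    for watch in parse_inotify_fdinfo(SAMPLE):
        print(watch)
```

The <code>handle</code> bytes together with <code>handle_type</code> are what would be saved in the images and later passed to <code>open_by_handle_at</code>.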

== Irmap ==

But the problems are not over yet. Not all filesystems provide handles. Hopefully this will change, but for now we cannot always get a handle out of an inode and an inode out of a handle. This is a very nasty situation, since the Linux kernel provides no other APIs for getting an inode, only open by path and open by handle. With both ways closed we have to make a detour.

CRIU uses empirical knowledge of where programs typically put fsnotify-s (config files and the like) and scans the filesystem tree to find the name by the inode number. The engine is called ''irmap'', which stands for Inode Reverse MAP. The irmap cache recursively scans the tree starting from "known" locations and remembers all the name-inode pairs it meets. If we later try to irmap an inode that was met during the first scan, no additional FS access occurs and irmap just reports the name back.
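The reverse-map idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not CRIU's implementation: the class name <code>Irmap</code>, its methods, and the choice of keying entries by the (device, inode) pair are all hypothetical, and the list of "known" locations is simply whatever the caller passes in.

```python
import os


class Irmap:
    """Toy inode reverse map: (st_dev, st_ino) -> first path seen."""

    def __init__(self, roots):
        self.roots = list(roots)   # "known" locations to scan from
        self.cache = {}            # (dev, ino) -> path
        self.scanned = False

    def _remember(self, path):
        try:
            st = os.lstat(path)
        except OSError:
            return                 # vanished or unreadable, skip it
        # Keep the first name met for an inode (hard links share one).
        self.cache.setdefault((st.st_dev, st.st_ino), path)

    def _scan(self):
        for root in self.roots:
            for dirpath, _dirnames, filenames in os.walk(root):
                self._remember(dirpath)
                for name in filenames:
                    self._remember(os.path.join(dirpath, name))
        self.scanned = True

    def lookup(self, dev, ino):
        """Return a path for the inode, or None if the scan never met it."""
        if not self.scanned:
            self._scan()           # only the first lookup touches the FS
        return self.cache.get((dev, ino))
```

After the first <code>lookup</code> has triggered the scan, later lookups are plain dictionary reads, which mirrors the "no additional FS access" behaviour described above.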

=== Caching the irmap cache ===

Since this FS scan can take quite long, it is best done while tasks are not frozen. So the irmap cache fill is also started during the pre-dump operation, when tasks are not frozen. After the scan the cache is stored in the working dir under the irmap-cache.img name. When CRIU performs the next pre-dump or the final dump, the irmap cache is read back and, when required, the cached entries are re-validated individually, without a full FS re-scan.
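Storing the cache and re-validating single entries could look roughly like the sketch below. It is an illustration only: the JSON layout is an assumption made for the example and is not the actual irmap-cache.img format, and the helper names <code>save_cache</code>, <code>load_cache</code> and <code>revalidate</code> are hypothetical.

```python
import json
import os


def save_cache(cache, path):
    """Write a {(dev, ino): name} map to disk (JSON chosen for the example)."""
    entries = [{"dev": d, "ino": i, "path": p} for (d, i), p in cache.items()]
    with open(path, "w") as f:
        json.dump(entries, f)


def load_cache(path):
    """Read the map back into the {(dev, ino): name} form."""
    with open(path) as f:
        return {(e["dev"], e["ino"]): e["path"] for e in json.load(f)}


def revalidate(cache, dev, ino):
    """Check one cached entry with a single lstat(), no tree re-scan.

    Returns the path if it still names the same inode, otherwise drops
    the stale entry and returns None.
    """
    path = cache.get((dev, ino))
    if path is None:
        return None
    try:
        st = os.lstat(path)
    except OSError:
        st = None                  # file is gone or unreachable
    if st is not None and (st.st_dev, st.st_ino) == (dev, ino):
        return path
    del cache[(dev, ino)]
    return None
```

Re-validation costs one <code>lstat()</code> per requested entry, which is why it is cheap enough to do while tasks are frozen.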
    
[[Category: Under the hood]]
 