Chances to get the name back do exist. To understand when, let's dive a little deeper into how Linux manages dentries and inodes.
 
=== Inodes and dentries ===
    
Every file on disk is represented by an inode object. An inode has an ID (the inode number), access rights, an owner, a link count and some more data. Names are stored only in special files called ''directories'': a directory holds a set of name-to-inode mappings. When a file is accessed by its name, the Linux kernel reads these mapping tables from disk one path component after another, and for every name it resolves it creates a ''dentry'' object in memory. Importantly, dentries are created not only for existing files but also for non-existing ones, so that a second lookup of a missing name can report ENOENT quickly. In other words, dentries form a cache that holds records for both present and absent objects on disk.
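
The "a directory is just a name-to-inode table" point can be observed from user space with readdir(3): every directory entry carries a name and an inode number (<code>d_ino</code>), and nothing else identifies the file. Below is a minimal sketch, assuming Linux with glibc; <code>dir_lookup_ino</code> and <code>path_ino</code> are hypothetical helper names. It only illustrates the on-disk mapping; it does not touch the kernel's dentry cache directly.

```c
#include <dirent.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: scan a directory's name-to-inode table (via readdir)
 * and return the inode number stored for 'name', or 0 if there is no such entry. */
ino_t dir_lookup_ino(const char *dir_path, const char *name)
{
    DIR *d = opendir(dir_path);
    if (!d)
        return 0;

    ino_t ino = 0;
    struct dirent *de;
    while ((de = readdir(d)) != NULL) {
        if (strcmp(de->d_name, name) == 0) {
            ino = de->d_ino;   /* the entry only maps a name to an inode number */
            break;
        }
    }
    closedir(d);
    return ino;
}

/* Hypothetical helper: inode number as reported by stat() on the path. */
ino_t path_ino(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? st.st_ino : 0;
}
```

For ordinary entries the two numbers agree. For an entry that is a mount point they can differ, because stat() reports the root inode of the filesystem mounted there.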
 
Having said that: at the time the fsnotify object is created, the full dentry chain and the inode are sitting in memory. The event generator is then attached to the inode, and that's it. Neither the inode nor the fsnotify object references the dentry, so eventually the whole dentry chain can be evicted from memory.
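
That the watch sits on the inode rather than on the name is easy to observe with inotify: rename a watched file, modify it through the new name, and the event still arrives on the original watch descriptor, carrying no name at all. A minimal sketch, assuming Linux and a writable directory; <code>watch_survives_rename</code> and the file names it creates are hypothetical.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if a watch placed on a file still reports
 * IN_MODIFY after the file has been renamed, 0 if no such event arrived,
 * -1 on error. 'dir' must be a writable directory. */
int watch_survives_rename(const char *dir)
{
    char old_path[512], new_path[512];
    snprintf(old_path, sizeof(old_path), "%s/fsnotify-demo-before", dir);
    snprintf(new_path, sizeof(new_path), "%s/fsnotify-demo-after", dir);

    int fd = open(old_path, O_CREAT | O_WRONLY | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    close(fd);

    int ifd = inotify_init1(IN_NONBLOCK | IN_CLOEXEC);
    if (ifd < 0)
        return -1;

    /* The path is resolved once; the watch is attached to the inode. */
    int wd = inotify_add_watch(ifd, old_path, IN_MODIFY);
    /* Renaming changes the name (dentry), not the inode. */
    if (wd < 0 || rename(old_path, new_path) < 0) {
        close(ifd);
        return -1;
    }

    fd = open(new_path, O_WRONLY);
    if (fd < 0) {
        close(ifd);
        return -1;
    }
    ssize_t w = write(fd, "x", 1);
    close(fd);
    if (w != 1) {
        close(ifd);
        return -1;
    }

    /* Buffer aligned for struct inotify_event, as in inotify(7). */
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    ssize_t n = read(ifd, buf, sizeof(buf));   /* non-blocking: the event is already queued */
    close(ifd);
    if (n < (ssize_t)sizeof(struct inotify_event))
        return 0;

    const struct inotify_event *ev = (const struct inotify_event *)buf;
    /* Events for a watch on a non-directory carry no name (len == 0). */
    return ev->wd == wd && (ev->mask & IN_MODIFY) && ev->len == 0;
}
```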
 
So, back to the "can I get the name back" question. The answer is: if the dentries are still in the cache, yes, you can. But CRIU cannot rely on this, since it must also handle the case when they have already been evicted.
=== Tmpfs ===
One filesystem, however, is friendly here. Tmpfs pins its dentries in memory, since it has no other medium to store information about its files on. So for tmpfs the name is always at hand.
== Opening a file without open() ==
Linux provides a way to do this: the <code>open_by_handle_at</code> system call. Introduced to make user-space NFS servers work, this call opens an inode using a blob called a ''handle''. The handle is an (almost) meaningless sequence of bytes from which the filesystem promises to find the inode and open it. The handle itself can be generated by the kernel from the inode object. Since the fsnotify object references the inode, we can ask the kernel to generate the handle for that inode. And that's what we did: we patched the kernel to show this handle in the /proc/$pid/fdinfo/$fd file for fsnotify descriptors.
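
On the dump side this boils down to reading and parsing the fdinfo line of each watch. Below is a minimal sketch for inotify, assuming the layout that mainline kernels print: <code>inotify wd:… ino:… sdev:… mask:… ignored_mask:…</code>, optionally followed by <code>fhandle-bytes:… fhandle-type:… f_handle:…</code> when the filesystem could encode a handle. All numbers are hexadecimal. The struct and function names are hypothetical, not CRIU's own code.

```c
#include <stdio.h>
#include <string.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Hypothetical container for one "inotify" line of /proc/<pid>/fdinfo/<fd>. */
struct inotify_fdinfo {
    unsigned int wd, sdev, mask;
    unsigned long ino;
    int has_fhandle;
    unsigned int fhandle_bytes, fhandle_type;
    unsigned char f_handle[128];   /* MAX_HANDLE_SZ */
};

/* Parse one fdinfo line. Returns 0 for an inotify record, -1 otherwise. */
int parse_inotify_line(const char *line, struct inotify_fdinfo *out)
{
    memset(out, 0, sizeof(*out));
    if (sscanf(line, "inotify wd:%x ino:%lx sdev:%x mask:%x",
               &out->wd, &out->ino, &out->sdev, &out->mask) != 4)
        return -1;

    /* The handle part is printed only when the kernel could encode one. */
    const char *fh = strstr(line, "fhandle-bytes:");
    if (!fh || sscanf(fh, "fhandle-bytes:%x fhandle-type:%x",
                      &out->fhandle_bytes, &out->fhandle_type) != 2)
        return 0;
    const char *hex = strstr(fh, "f_handle:");
    if (!hex || out->fhandle_bytes > sizeof(out->f_handle))
        return 0;
    hex += strlen("f_handle:");
    for (unsigned int i = 0; i < out->fhandle_bytes; i++) {
        unsigned int byte;
        if (sscanf(hex + 2 * i, "%2x", &byte) != 1)
            return 0;              /* truncated handle: report it as absent */
        out->f_handle[i] = (unsigned char)byte;
    }
    out->has_fhandle = 1;
    return 0;
}

/* Demo: put a watch on 'path' and read back what fdinfo exposes for it. */
int watch_and_read_fdinfo(const char *path, struct inotify_fdinfo *out)
{
    int ifd = inotify_init1(IN_CLOEXEC);
    if (ifd < 0)
        return -1;
    int ret = -1;
    if (inotify_add_watch(ifd, path, IN_ACCESS) >= 0) {
        char proc[64], line[1024];
        snprintf(proc, sizeof(proc), "/proc/self/fdinfo/%d", ifd);
        FILE *f = fopen(proc, "r");
        if (f) {
            /* Skip "pos:", "flags:" etc. until the first inotify record. */
            while (ret != 0 && fgets(line, sizeof(line), f))
                ret = parse_inotify_line(line, out);
            fclose(f);
        }
    }
    close(ifd);
    return ret;
}
```

The <code>ino</code> and <code>sdev</code> fields identify the inode even when no handle is printed. That is the information left to work with in the irmap case.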
So when dumping an fsnotify object, we read the handle out of /proc and save it in the images. At restore time we call <code>open_by_handle_at</code> with the saved handle and get the inode back. Then we need to ask the kernel to put the fsnotify on this inode. To do this, CRIU calls the fsnotify setup call (e.g. <code>inotify_add_watch</code>) on the /proc/self/fd/$fd path. While resolving this path, the kernel finds the inode opened previously and attaches the watch to it. Thus we fool the kernel into putting an fsnotify on an inode without even knowing its path.
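
The round trip can be sketched with the glibc wrappers for name_to_handle_at(2) and open_by_handle_at(2). This is a minimal sketch, assuming Linux with glibc. <code>encode_handle</code> stands in for the dump side (CRIU reads the handle from fdinfo rather than calling <code>name_to_handle_at</code> on a path it does not know), and <code>add_watch_by_handle</code> for the restore side. Both helper names are hypothetical, and opening with <code>O_PATH</code> is a choice made by this sketch. <code>open_by_handle_at</code> requires CAP_DAC_READ_SEARCH, and <code>mount_fd</code> may be any descriptor on the filesystem the handle came from.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Hypothetical dump-side stand-in: encode a handle for 'path'.
 * Returns a malloc'ed handle, or NULL with errno set (e.g. EOPNOTSUPP
 * when the filesystem cannot produce handles). */
struct file_handle *encode_handle(const char *path, int *mount_id)
{
    struct file_handle *fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
    if (!fh)
        return NULL;
    fh->handle_bytes = MAX_HANDLE_SZ;
    if (name_to_handle_at(AT_FDCWD, path, fh, mount_id, 0) < 0) {
        int saved = errno;
        free(fh);
        errno = saved;
        return NULL;
    }
    return fh;
}

/* Hypothetical restore side: get the inode back from the handle, then put
 * a watch on it through the /proc/self/fd magic link. Returns the watch
 * descriptor, or -1 with errno set. */
int add_watch_by_handle(int ifd, int mount_fd, struct file_handle *fh, uint32_t mask)
{
    /* O_PATH is enough: we only need a reference to the inode. */
    int fd = open_by_handle_at(mount_fd, fh, O_PATH);
    if (fd < 0)
        return -1;

    char proc_path[64];
    snprintf(proc_path, sizeof(proc_path), "/proc/self/fd/%d", fd);
    /* Resolving the link lands on the inode itself, whatever its name is. */
    int wd = inotify_add_watch(ifd, proc_path, mask);
    int saved = errno;
    close(fd);   /* the watch keeps its own reference to the inode */
    errno = saved;
    return wd;
}
```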
== [[Irmap]] ==
But the problems are not over yet. Not all filesystems provide handles. Hopefully such filesystems are rare, but still, we cannot always get a handle out of an inode, or an inode out of a handle.
This is a very nasty situation, since the Linux kernel provides no other APIs for getting at an inode: only open by path and open by handle. With both ways closed, we have to make a detour.
CRIU uses empirical knowledge of where programs typically put fsnotify watches (config files and the like) and scans the filesystem tree to find a name by the inode number. The engine is called ''irmap'', which stands for Inode Reverse MAP. Irmap recursively scans the tree starting from "known" locations and remembers all the name-inode pairs it meets. If we later try to irmap an inode that was met during the first scan, no additional FS access occurs; irmap just reports the name back.
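
The scan-and-remember idea can be sketched with nftw(3). This is a minimal sketch, not CRIU's irmap code: it keeps (device, inode) to path pairs in a plain array and searches it linearly. <code>irmap_scan</code>, <code>irmap_lookup</code> and the entry struct are hypothetical names. <code>FTW_PHYS</code> keeps the walk from following symlinks, so every recorded name belongs to the inode it actually is.

```c
#define _GNU_SOURCE
#include <ftw.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical in-memory irmap: (dev, ino) -> path pairs met during scans. */
struct irmap_entry { dev_t dev; ino_t ino; char *path; };

static struct irmap_entry *irmap;
static size_t irmap_len, irmap_cap;

/* nftw callback: remember every name-inode pair met on the way. */
static int irmap_visit(const char *path, const struct stat *st, int type, struct FTW *ftw)
{
    (void)ftw;
    if (type == FTW_NS)          /* stat failed: no inode number to record */
        return 0;
    if (irmap_len == irmap_cap) {
        size_t cap = irmap_cap ? 2 * irmap_cap : 64;
        struct irmap_entry *n = realloc(irmap, cap * sizeof(*n));
        if (!n)
            return -1;           /* non-zero stops the walk */
        irmap = n;
        irmap_cap = cap;
    }
    char *copy = strdup(path);
    if (!copy)
        return -1;
    irmap[irmap_len++] = (struct irmap_entry){ st->st_dev, st->st_ino, copy };
    return 0;
}

/* Scan one "known" location recursively. Returns 0 on success. */
int irmap_scan(const char *root)
{
    return nftw(root, irmap_visit, 16, FTW_PHYS);
}

/* Reverse lookup without touching the filesystem: inode -> a name met
 * during the scan, or NULL. With hard links, the first name met wins. */
const char *irmap_lookup(dev_t dev, ino_t ino)
{
    for (size_t i = 0; i < irmap_len; i++)
        if (irmap[i].dev == dev && irmap[i].ino == ino)
            return irmap[i].path;
    return NULL;
}
```

In the terms used above, <code>irmap_scan</code> would be run once per known location, and the <code>ino</code>/<code>sdev</code> pair taken from fdinfo would then be resolved with <code>irmap_lookup</code>. Note that fdinfo prints the kernel's internal device encoding, so a real implementation has to convert it before comparing it with <code>st_dev</code>.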
=== Caching the irmap cache ===
Since this FS scan can take quite long, it is recommended to run it while tasks are not frozen. So the irmap cache is also filled during the pre-dump operation, when the tasks are still running. After the scan, the cache is stored in the working directory as irmap-cache.img. When CRIU's next pre-dump or the final dump is performed, the irmap cache is read back, and when required the cached entries are re-validated individually, without a full FS re-scan.
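
One plausible way to re-validate a single cached entry without a rescan, assuming the cache stores (device, inode, path) triples, is a single lstat() that checks the cached name still points at the same inode. The struct and function below are hypothetical, and the actual format of irmap-cache.img is not described on this page.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical cached record, as it might be read back from the cache image. */
struct irmap_cached { dev_t dev; ino_t ino; const char *path; };

/* Re-validate one entry with one lstat() instead of a full FS re-scan:
 * it is usable only if the name still refers to the same inode. */
int irmap_entry_valid(const struct irmap_cached *e)
{
    struct stat st;
    if (lstat(e->path, &st) < 0)
        return 0;   /* the name is gone: the entry is stale */
    return st.st_dev == e->dev && st.st_ino == e->ino;
}
```

A stale entry would then simply be dropped, and the inode resolved by a fresh scan of the known locations.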
    
[[Category: Under the hood]]
 
[[Category:Files]]
