This is what CRIU does to dump information about opened files.
Files, descriptors and inodes in Linux
When a task opens a file, the Linux kernel constructs a chain of 3 objects to serve it, plus a number that refers to the chain.
- Inode: an object that describes a file as a pair of metadata (owner, type, size) and data (the bytes themselves).
- Dentry (directory entry): a helper object that the kernel uses to resolve a file path into an Inode object. If a file has hard links, then one Inode will have several Dentries.
- File: an object that describes how a task works with an opened Dentry-Inode pair.
- File descriptor: a number in the task's table that is used to reference the needed File object.
So after the open() (or some other, see below) system call this chain exists in memory: FD -> File -> Dentry -> Inode.
The File object can be referenced by more than one FDT (file descriptor table). For example, when a task calls fork() the child gets a new FDT, but it references the same Files as the parent's does, i.e. Files become shared objects.
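The sharing of Files across fork() can be observed from user space, because the file position is kept in the File object and not in the FDT. The sketch below is only an illustration, not CRIU code; the helper name `parent_offset_after_child_read` is made up for this example, and it assumes a Linux (or other POSIX) system where `os.fork()` is available.

```python
import os
import tempfile


def parent_offset_after_child_read(nbytes: int) -> int:
    """Return the parent's file offset after a forked child reads nbytes."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"x" * 64)
        os.lseek(fd, 0, os.SEEK_SET)

        pid = os.fork()
        if pid == 0:
            # Child: it has its own FDT, but the entry points to the same File.
            try:
                os.read(fd, nbytes)
            finally:
                os._exit(0)

        os.waitpid(pid, 0)
        # The position lives in the shared File, so the parent sees it moved.
        return os.lseek(fd, 0, os.SEEK_CUR)
    finally:
        os.close(fd)
        os.unlink(path)
```

If the two tasks had separate Files (for example, if each had called open() on its own), the parent's position would stay at zero.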
The Inode object is also interesting. First of all, remember that in Linux file descriptors can be obtained not only by the open() system call, but also by socket() and a bunch of Linux-specific calls, signalfd and others. When serving any of these, Linux still creates the File-Dentry-Inode chain mentioned above, but the Inode will be different for different calls. CRIU knows all this and acts accordingly :)
How info about opened files is stored in CRIU
Having said that, CRIU introduces several images to keep info about which files are opened by a task.
The first image is fdinfo-$id.img. This image contains info about the FDT of a process. The entries have two important fields -- fd and id. The fd is the descriptor number under which the File created on restore should be put. The id is the identifier of the File-Inode object pair in other images.
For the sake of simplicity CRIU doesn't introduce separate states for Files, Dentries and Inodes, leaving that to the kernel. Instead each such triplet is treated as one object, and a separate image is introduced for every Inode type (file, pipe, socket, etc.). Thus CRIU has
- reg-files.img for regular files, that are created by open() call
- unixsk.img for unix sockets
- pipes.img for pipes
- inetsk.img for IP sockets (both TCP and UDP)
- signalfd.img for signal fd
Each of these files preserves info about the File and the Inode of the respective file. Dentry information is effectively stored there for regular files only -- the file's path.
How CRIU gets the information to dump
So on dump CRIU needs to find out several things.
- The FD numbers owned by tasks
- How File-s are shared between tasks' FDTs
- What Inode types are there
- State of File and Inode objects
Finding the FD numbers is simple. Reading the /proc/$pid/fd or /proc/$pid/fdinfo directory just shows the required numbers.
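As a minimal sketch of this step (not CRIU's own code, which is written in C), the descriptor numbers of a task can be listed like this. The function name `list_fds` is an assumption made for the example, and reading another task's directory requires sufficient permissions.

```python
import os


def list_fds(pid: int) -> list[int]:
    """Return the descriptor numbers currently open in task `pid`."""
    # Every entry in /proc/$pid/fd is named after a descriptor number.
    return sorted(int(name) for name in os.listdir(f"/proc/{pid}/fd"))
```

When a task lists its own descriptors, the result also contains the descriptor that was temporarily opened to read the directory itself.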
To find out whether two Files sitting behind two FDs of two tasks are the same, CRIU uses the kcmp system call.
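The comparison can be sketched from Python by invoking the raw system call through ctypes. This is an illustration under several assumptions: the helper name `same_file` is made up, the syscall numbers are hard-coded for x86_64 and aarch64 only, and the call may be refused by the kernel configuration, by a seccomp filter, or by missing ptrace-level permissions over the compared tasks.

```python
import ctypes
import os
import platform

KCMP_FILE = 0  # comparison type from <linux/kcmp.h>
# Assumption: only these two architectures are handled in this sketch.
_KCMP_SYSCALL_NR = {"x86_64": 312, "aarch64": 272}


def same_file(pid1: int, fd1: int, pid2: int, fd2: int) -> bool:
    """Return True if fd1 in pid1 and fd2 in pid2 refer to the same File."""
    nr = _KCMP_SYSCALL_NR.get(platform.machine())
    if nr is None:
        raise NotImplementedError("kcmp syscall number unknown for this arch")

    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    ret = libc.syscall(
        ctypes.c_long(nr),
        ctypes.c_int(pid1),
        ctypes.c_int(pid2),
        ctypes.c_int(KCMP_FILE),
        ctypes.c_ulong(fd1),
        ctypes.c_ulong(fd2),
    )
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    # 0 means "equal"; any other non-negative value means the Files differ.
    return ret == 0
```

A descriptor and its dup() are expected to compare equal, while two separate open() calls on the same path are expected to compare different, since each open() creates its own File.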
Determining Inode type
In most cases the Inode type can be found out by stat()-ing the descriptor. To do this, CRIU asks the parasite code to send the files back via a unix socket's SCM_RIGHTS message. After this, having the same Files at hand, CRIU fstats each one and checks the st_mode field.
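The descriptor passing described above can be sketched with Python's `socket.send_fds` and `socket.recv_fds` (available since Python 3.9), which wrap SCM_RIGHTS messages. This is only an illustration: both ends of the socket live in one process here, whereas in CRIU the sender is the parasite code inside the dumped task. The helper names `pass_fd` and `fstat_via_socket` are assumptions made for the example.

```python
import os
import socket


def pass_fd(fd: int) -> int:
    """Send fd over a unix socket pair and return the received descriptor."""
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    with sender, receiver:
        # The descriptor travels in an SCM_RIGHTS control message.
        socket.send_fds(sender, [b"x"], [fd])
        _data, fds, _flags, _addr = socket.recv_fds(receiver, 1, 1)
    return fds[0]


def fstat_via_socket(fd: int) -> os.stat_result:
    """fstat() the File behind fd through a descriptor received via a socket."""
    received = pass_fd(fd)
    try:
        return os.fstat(received)
    finally:
        os.close(received)
```

The received descriptor has a different number but references the same File, so fstat() on it reports the same Inode as the original.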
For some special files (signalfd, inotify, etc.) the file-type bits of the mode field are zero and CRIU reads the /proc/$pid/fd/$fd/ link. For those the link target uniquely defines the Inode type.
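A minimal sketch of this two-step detection follows, assuming the descriptor is already available in the current process. The function name `inode_type` and the returned labels are made up for the example and are not CRIU identifiers.

```python
import os
import stat

_TYPE_NAMES = {
    stat.S_IFREG: "regular",
    stat.S_IFDIR: "directory",
    stat.S_IFIFO: "fifo",
    stat.S_IFSOCK: "socket",
    stat.S_IFCHR: "chardev",
    stat.S_IFBLK: "blockdev",
}


def inode_type(fd: int) -> str:
    """Classify the Inode behind fd by st_mode, else by the /proc link."""
    fmt = stat.S_IFMT(os.fstat(fd).st_mode)
    if fmt in _TYPE_NAMES:
        return _TYPE_NAMES[fmt]
    # No recognizable type bits: fall back to the link target, which for
    # such files looks like "anon_inode:[...]".
    return os.readlink(f"/proc/self/fd/{fd}")
```

For a pipe end this returns "fifo", while for an epoll descriptor the answer comes from the link target.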
State of File and Inode
From the File CRIU needs only two things -- the mode and the position. Both can be read from /proc/$pid/fdinfo/$fd/ and the latter can also be requested via lseek().
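As an illustration of reading this state, the sketch below parses the `pos` and `flags` lines of the fdinfo file for a descriptor of the current process. The function names `file_state` and `position_via_lseek` are assumptions made for the example, and the flags value is printed by the kernel in octal.

```python
import os


def file_state(fd: int) -> dict:
    """Return the position and open flags of fd as reported by fdinfo."""
    state = {}
    with open(f"/proc/self/fdinfo/{fd}") as info:
        for line in info:
            key, _, value = line.partition(":")
            if key == "pos":
                state["pos"] = int(value)
            elif key == "flags":
                state["flags"] = int(value, 8)  # octal in fdinfo
    return state


def position_via_lseek(fd: int) -> int:
    """Ask for the current position without moving it."""
    return os.lseek(fd, 0, os.SEEK_CUR)
```

The access mode can be extracted with `state["flags"] & os.O_ACCMODE`. The fdinfo flags may contain bits that fcntl(F_GETFL) does not report, so the two values are not guaranteed to be identical.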
Getting the Inode state is specific to the Inode type. CRIU uses the following sources of information:
- Data from /proc/$pid/fdinfo/$fd/
- Link target of /proc/$pid/fd/$fd/ link
- Inode-specific ioctl()-s
- Fetch data directly from FD with recv + MSG_PEEK for socket queues and tee for pipes/fifos
- Info from sock_diag modules to get info about sockets
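The recv + MSG_PEEK technique from the list above can be sketched as follows: peeking copies the queued bytes out without removing them, so the task being dumped still finds its data in place. The helper name `peek_queue` and the 64 KiB default limit are assumptions made for this example. The tee-based variant for pipes is not shown, since the Python standard library has no wrapper for it.

```python
import socket


def peek_queue(sock: socket.socket, limit: int = 65536) -> bytes:
    """Return up to `limit` queued bytes without consuming them."""
    try:
        # MSG_PEEK leaves the data in the queue; MSG_DONTWAIT avoids
        # blocking when the queue is empty.
        return sock.recv(limit, socket.MSG_PEEK | socket.MSG_DONTWAIT)
    except BlockingIOError:
        return b""
```

Calling it twice in a row returns the same bytes, and a normal recv() afterwards still gets them.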