How hard is it to open a file
Latest revision as of 02:57, 24 September 2016
This article outlines what CRIU restore needs to take care of when re-creating an open file descriptor.
Let's imagine we have some information about a file we want to open. What should it contain? Apparently, the access mode and the path:
struct file {
    char *path;
    unsigned mode;
} *f;
and we'd like a process to open that path. We would do it like this:
int fd;
fd = open(f->path, f->mode);
Right? Right, but that's not all of it. We all know that not only regular files can be opened via paths, but also such things as FIFOs. And a plain open() with the flags we want may just hang. So we need to change that code to look like this:
int fd, tfd = -1;

if (S_ISFIFO(f->mode))
    tfd = open(f->path, O_RDWR);

fd = open(f->path, f->mode);

if (tfd >= 0)
    close(tfd);
The tfd descriptor keeps the FIFO open read-write while we open it with whatever flags we want. Then we close it.
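The trick can be tried in isolation. Below is a minimal sketch, not CRIU's actual code: the helper name open_maybe_fifo is made up for illustration, and it detects a FIFO with stat() instead of relying on a saved mode. It also assumes Linux behaviour, where opening a FIFO with O_RDWR succeeds without blocking (POSIX leaves that case undefined).

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: open `path` with `flags`. If the path is a FIFO,
 * hold a temporary O_RDWR descriptor first, so that the real open()
 * finds both a reader and a writer and does not block.
 */
static int open_maybe_fifo(const char *path, int flags)
{
    struct stat st;
    int fd, tfd = -1;

    if (stat(path, &st) < 0)
        return -1;

    if (S_ISFIFO(st.st_mode)) {
        /* Assumption: Linux, where O_RDWR on a FIFO does not block. */
        tfd = open(path, O_RDWR);
        if (tfd < 0)
            return -1;
    }

    fd = open(path, flags);

    /* The helper descriptor is only needed while open() runs. */
    if (tfd >= 0)
        close(tfd);

    return fd;
}
```

Without the temporary descriptor, opening a FIFO with O_RDONLY would wait until some writer shows up, and O_WRONLY would wait for a reader.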
Now this seems to be OK, but it's actually not. In Linux, a file can be unlinked while it is still open (these invisible files are treated carefully on dump). In that case, what path formerly pointed to may be kept in some temporary location. We have to create a temporary name for it and unlink it afterwards. So we need to extend the info about a file:
struct file {
    char *path;
    unsigned mode;
    char *temp_path;
} *f;
and the opening code to take care of that temporary location:
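int fd, tfd = -1;

/* Give the unlinked file its original name back for a moment */
if (f->temp_path)
    link(f->temp_path, f->path);

if (S_ISFIFO(f->mode))
    tfd = open(f->path, O_RDWR);

fd = open(f->path, f->mode);

if (tfd >= 0)
    close(tfd);

/* ... and make it invisible again */
if (f->temp_path)
    unlink(f->path);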
And we haven't seen all the code we need to manage what is pointed to by temp_path, but let's proceed.
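To see why such files need special care, here is a small sketch of the situation being restored: a file that is still open but no longer has a name. The helper names open_and_unlink and fd_nlink are invented for this example and are not part of CRIU.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `path`, open it, and unlink it right away.
 * Returns the descriptor of the now nameless file, or -1 on error.
 */
static int open_and_unlink(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);

    if (fd < 0)
        return -1;

    /* The name goes away, the open file stays alive. */
    if (unlink(path) < 0) {
        close(fd);
        return -1;
    }

    return fd;
}

/* Link count of an open descriptor, or -1 on error. */
static long fd_nlink(int fd)
{
    struct stat st;

    if (fstat(fd, &st) < 0)
        return -1;

    return (long)st.st_nlink;
}
```

The descriptor returned by open_and_unlink() can still be read and written, yet no path leads to the file any more, which is why restore has to bring a name back temporarily before it can call open().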
We have forgotten that a directory can also be opened and then removed. For directories, link and unlink do not work, so we have to extend the code to at least try to make things work:
int fd, tfd = -1;

if (f->temp_path) {
    if (S_ISDIR(f->mode))
        mkdir(f->path, 0700);
    else
        link(f->temp_path, f->path);
}

if (S_ISFIFO(f->mode))
    tfd = open(f->path, O_RDWR);

fd = open(f->path, f->mode);

if (tfd >= 0)
    close(tfd);

if (f->temp_path) {
    if (S_ISDIR(f->mode))
        rmdir(f->path);
    else
        unlink(f->path);
}
Done. Oh wait, we should also take care of hard links! If a file has any, and both were opened and removed, we cannot just go ahead and kill the temp_path after opening, as some other struct file may still be waiting to open it. A little more information should be added to struct file:
struct temp_file {
    char *path;
    unsigned users;
};

struct file {
    char *path;
    unsigned mode;
    struct temp_file *temp;
} *f;
and the code that opens one now looks like this:
int fd, tfd = -1;

if (f->temp) {
    if (S_ISDIR(f->mode))
        mkdir(f->path, 0700);
    else
        link(f->temp->path, f->path);
}

if (S_ISFIFO(f->mode))
    tfd = open(f->path, O_RDWR);

fd = open(f->path, f->mode);

if (tfd >= 0)
    close(tfd);

if (f->temp) {
    /* Only the last user cleans up */
    if (--f->temp->users == 0) {
        if (S_ISDIR(f->mode))
            rmdir(f->path);
        else
            unlink(f->temp->path);
    }
}
By the way, we've left behind the scenes all the code required to share the temp_file data between the processes that need it and to make the decrementing of f->temp->users SMP-safe.
Also note that we don't handle the case when the file/directory is removed and some other file/directory is created under the same name. It's a rare case.
Now, is that all? No, sorry. A couple of things are left. First, Linux has a thing called mount namespaces, and two files with the same path may have been opened in different mount namespaces. So we also need the info about which mount point the file belongs to, like this:
struct file {
    char *path;
    unsigned mode;
    struct temp_file *temp;
    unsigned mnt_id;
} *f;
and the code to open the file would now look like this:
int fd, tfd = -1, ns_fd;
char *rel_path = f->path + 1; /* skip the leading '/' */

ns_fd = open_ns_root(f->mnt_id);

if (f->temp) {
    if (S_ISDIR(f->mode))
        mkdirat(ns_fd, rel_path, 0700);
    else
        linkat(ns_fd, f->temp->path, ns_fd, rel_path, 0);
}

if (S_ISFIFO(f->mode))
    tfd = openat(ns_fd, rel_path, O_RDWR);

fd = openat(ns_fd, rel_path, f->mode);

if (tfd >= 0)
    close(tfd);

if (f->temp) {
    if (--f->temp->users == 0) {
        if (S_ISDIR(f->mode))
            unlinkat(ns_fd, rel_path, AT_REMOVEDIR);
        else
            unlinkat(ns_fd, f->temp->path, 0);
    }
}

close(ns_fd);
Let's not dive into the details of what open_ns_root looks like. Just know that it opens a file descriptor that refers to the root of the mount namespace containing the mount point with the id mnt_id (they cannot be shared, and that's great).
Pretty complex already, isn't it? Just a couple of final touches left. First, opened files typically have a position. The flags we get need to be sanitized so they do not contain those that only make sense during open, like O_TRUNC or O_CREAT. And a file may have a thing called fown, managed by the F_SETSIG and F_SETOWN fcntls. All this results in
struct file {
    char *path;
    unsigned mode;
    struct temp_file *temp;
    unsigned mnt_id;
    unsigned long pos;
    struct fown fown;
} *f;
and
int fd, tfd = -1, ns_fd, open_flags;
char *rel_path = f->path + 1; /* skip the leading '/' */

ns_fd = open_ns_root(f->mnt_id);

if (f->temp) {
    if (S_ISDIR(f->mode))
        mkdirat(ns_fd, rel_path, 0700);
    else
        linkat(ns_fd, f->temp->path, ns_fd, rel_path, 0);
}

if (S_ISFIFO(f->mode))
    tfd = openat(ns_fd, rel_path, O_RDWR);

open_flags = sanitize_open_mode(f->mode);
fd = openat(ns_fd, rel_path, open_flags);

if (tfd >= 0)
    close(tfd);

if (f->temp) {
    if (--f->temp->users == 0) {
        if (S_ISDIR(f->mode))
            unlinkat(ns_fd, rel_path, AT_REMOVEDIR);
        else
            unlinkat(ns_fd, f->temp->path, 0);
    }
}

close(ns_fd);

/* fown is embedded in struct file; owner is assumed to be a plain PID here */
fcntl(fd, F_SETSIG, f->fown.sig);
fcntl(fd, F_SETOWN, f->fown.owner);
lseek(fd, f->pos, SEEK_SET);
And don't ask for details of the f->fown thing. It's tricky, but just follows the ABI and therefore boring.
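The flag sanitizing and position restore can be sketched separately from the rest. The code above does not show the body of sanitize_open_mode, so the version below is only a guess at its intent: the names sanitize_open_flags and reopen_at_pos are hypothetical, and the set of stripped flags (O_CREAT, O_EXCL, O_TRUNC) is an assumption.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: drop flags that only make sense at creation time. */
static int sanitize_open_flags(int flags)
{
    return flags & ~(O_CREAT | O_EXCL | O_TRUNC);
}

/* Reopen `path` with the saved flags and move to the saved position. */
static int reopen_at_pos(const char *path, int saved_flags, off_t pos)
{
    int fd = open(path, sanitize_open_flags(saved_flags));

    if (fd < 0)
        return -1;

    /* lseek() takes the offset second and the whence constant third. */
    if (lseek(fd, pos, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }

    return fd;
}
```

Leaving O_TRUNC in place would wipe the file's contents on restore, which is exactly the kind of surprise the sanitizing step is meant to prevent.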
OK, we've finished with the tip of the iceberg: opening a file. Why just the tip? Because the opened file must then be planted into the process's file descriptor table under the desired number. You might think that it should be as simple as:
dup2(fd, desired_fd);
but it's not. Here's how to assign needed file descriptor to a file.