Changes

m
cat files
Line 67: Line 67:     
<pre>
 
<pre>
int fd, i, pid = getpid();
+
int fd, i, pid = getpid(), sk;
 +
 
 +
sk = create_socket();
    
if (pid == file_opener(fd)) {
 
if (pid == file_opener(fd)) {
Line 76: Line 78:  
dup2(fd, fd->tgt_fds[i].fd);
 
dup2(fd, fd->tgt_fds[i].fd);
 
else
 
else
send_fd(fd, fd->tgt_fds[i]);
+
send_fd(fd, fd->tgt_fds[i], sk);
 
}
 
}
   Line 85: Line 87:  
continue;
 
continue;
   −
fd = recv_fd();
+
fd = recv_fd(sk);
 
dup2(fd, fd->tgt_fds[i].fd);
 
dup2(fd, fd->tgt_fds[i].fd);
 
close(fd);
 
close(fd);
 
}
 
}
 
}
 
}
 +
 +
close(sk);
 
</pre>
 
</pre>
    
Please note that all <code>tgt_fds</code> belonging to some task are opened by a different one and are then sent to the real owner in the order they appear in the array. The receiver does the same -- it receives the fds in the same order, so this algorithm puts the files into the proper descriptors.
 
Please note that all <code>tgt_fds</code> belonging to some task are opened by a different one and are then sent to the real owner in the order they appear in the array. The receiver does the same -- it receives the fds in the same order, so this algorithm puts the files into the proper descriptors.
   −
There are several interesting things about the above code snippet.
+
There are several bugs in the above code snippet, however :(
 +
 
 +
First, the <code>send_fd</code> and <code>recv_fd</code> routines cannot work using one socket for all tasks -- a descriptor sent to task <code>pid</code> should reach ''this'' task, not some arbitrary one that the kernel woke up earlier on data arrival. That said, we have to create at least one socket per task to receive descriptors. But files can be shared in a tricky manner, so that task A may have one file shared with task B and some other file shared with task C. If the "who opens a file" voting selects B and C for the respective files, they will have to send descriptors to A with proper coordination with each other. This coordination can be simplified if we create sockets not just per-pid, but per-(pid, fd). And where do we keep this whole bunch of sockets? The easiest answer -- in the places where the files they will receive should sit :) We then use the <code>sendto()</code> syscall to send the descriptor via an unconnected socket by address. These transport sockets may have some unique name like <code>"criu-fd-transport-%pid-%fd"</code>.
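A minimal sketch of such a per-(pid, fd) transport, assuming Linux abstract-namespace Unix datagram sockets. The helper names (<code>transport_addr</code>, <code>transport_open</code>, <code>send_fd_to</code>, <code>recv_fd_from</code>) are made up for illustration and are not CRIU's real functions; only the name format comes from the text above. A descriptor travels as <code>SCM_RIGHTS</code> ancillary data, so the sketch uses <code>sendmsg()</code> with a destination address, which is the ancillary-data-capable sibling of <code>sendto()</code>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <sys/un.h>

/* Hypothetical helper: fill in the abstract address
 * "\0criu-fd-transport-<pid>-<fd>" and return its length. */
static socklen_t transport_addr(struct sockaddr_un *addr, int pid, int fd)
{
    int len;

    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    /* sun_path[0] stays '\0': abstract namespace, no file on disk (Linux only) */
    len = snprintf(addr->sun_path + 1, sizeof(addr->sun_path) - 1,
                   "criu-fd-transport-%d-%d", pid, fd);
    assert(len > 0 && (size_t)len < sizeof(addr->sun_path) - 1);
    return (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 + len);
}

/* Receiver side: create and bind the transport socket for (pid, fd). */
static int transport_open(int pid, int fd)
{
    struct sockaddr_un addr;
    socklen_t len = transport_addr(&addr, pid, fd);
    int sk = socket(AF_UNIX, SOCK_DGRAM, 0);

    if (sk < 0)
        return -1;
    if (bind(sk, (struct sockaddr *)&addr, len) < 0) {
        close(sk);
        return -1;
    }
    return sk;
}

/* Opener side: send `fd` through the unconnected socket `sk`
 * to the transport socket that belongs to (pid, tgt_fd). */
static int send_fd_to(int sk, int pid, int tgt_fd, int fd)
{
    struct sockaddr_un addr;
    char data = 0;  /* at least one byte of real data must accompany the fd */
    struct iovec iov = { .iov_base = &data, .iov_len = 1 };
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&u, 0, sizeof(u));
    msg.msg_name = &addr;
    msg.msg_namelen = transport_addr(&addr, pid, tgt_fd);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sk, &msg, 0) == 1 ? 0 : -1;
}

/* Receiver side: take one descriptor out of the transport socket. */
static int recv_fd_from(int sk)
{
    char data;
    struct iovec iov = { .iov_base = &data, .iov_len = 1 };
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sk, &msg, 0) < 0)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
    return fd;
}
```

Because every receiver binds its own address, a descriptor addressed to (pid, fd) cannot be picked up by any other task, which is exactly the property the single shared socket lacked.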
 +
 
 +
Second, when the file opener calls <code>dup2()</code> it may overwrite the <code>sk</code> descriptor. This is sad, but OK, since we can move <code>sk</code> to any free slot using the plain <code>dup()</code> system call.
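The move can be sketched as follows. The helper names <code>move_out_of_the_way</code> and <code>install_fd</code> are hypothetical and exist only for this illustration. The trick relies on <code>dup()</code> returning the lowest ''free'' descriptor, which can never be the target slot, since that slot is still occupied by <code>sk</code> at that moment.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* If *sk sits in the slot `target`, duplicate it to a free descriptor
 * so that a later dup2(fd, target) does not destroy the socket.
 * The old slot is deliberately left open: dup2() will replace it. */
static int move_out_of_the_way(int *sk, int target)
{
    int new_sk;

    assert(sk != NULL);
    if (*sk != target)
        return 0;

    new_sk = dup(*sk);  /* lowest free slot, so never equal to target */
    if (new_sk < 0)
        return -1;
    *sk = new_sk;
    return 0;
}

/* Put `fd` into the `target` slot without losing the socket in *sk. */
static int install_fd(int fd, int target, int *sk)
{
    if (move_out_of_the_way(sk, target) < 0)
        return -1;
    return dup2(fd, target) == target ? 0 : -1;
}
```

The check has to be repeated for every target slot, as the loop in the snippet below does, because the new position of <code>sk</code> may itself be a slot that some later file needs.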
 +
 
 +
<pre>
 +
int fd, i, pid = getpid(), sk;
 +
 
 +
if (pid == file_opener(fd)) {
 +
    sk = create_socket();
 +
    fd = open_a_file(fd->file);
 +
 
 +
    for (i = 0; i < fd->n_fds; i++) {
 +
        if (fd->tgt_fds[i].pid == pid) {
 +
            if (sk == fd->tgt_fds[i].fd)
 +
                sk = dup(sk);
 +
            dup2(fd, fd->tgt_fds[i].fd);
 +
        } else
 +
            send_fd(fd, fd->tgt_fds[i], sk);
 +
    }
 +
 
 +
    close(fd);
 +
    close(sk);
 +
} else {
 +
    for (i = 0; i < fd->n_fds; i++) {
 +
        if (fd->tgt_fds[i].pid != pid)
 +
            continue;
 +
 
 +
        sk = create_socket();
 +
        dup2(sk, fd->tgt_fds[i].fd);
 +
    }
 +
 
 +
    for (i = 0; i < fd->n_fds; i++) {
 +
        if (fd->tgt_fds[i].pid != pid)
 +
            continue;
 +
 
 +
        fd = recv_fd(fd->tgt_fds[i].fd);
 +
        dup2(fd, fd->tgt_fds[i].fd);
 +
        close(fd);
 +
    }
 +
}
 +
</pre>
 +
 
 +
That is almost all. Only a few notes are left.
 +
 
 +
First, the above code is still buggy -- the file opener should make sure that everybody has their transport sockets ready. This requires some synchronization around <code>create_socket()</code> and <code>send_fd</code>.
 +
 
 +
Second, the "who will open a file" voting. We should make sure that the synchronization mentioned above doesn't AB-BA deadlock, so when deciding which task opens a file we always choose the one with the smallest pid. And the file sending wave goes up the process tree :)
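The voting rule itself is tiny. Here is a sketch, assuming a simplified <code>struct tgt_fd</code> that holds just a (pid, fd) pair; both the struct layout and the signature of <code>file_opener</code> are assumptions for this example and differ from the pseudocode above, which takes the whole file object.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Assumed layout: one slot in one task where the file must appear. */
struct tgt_fd {
    pid_t pid;
    int fd;
};

/* Pick the task that opens the file: the smallest pid among all
 * tasks sharing it. Returns -1 when the array is empty. */
static pid_t file_opener(const struct tgt_fd *tgt_fds, size_t n_fds)
{
    pid_t opener = -1;
    size_t i;

    assert(tgt_fds != NULL || n_fds == 0);
    for (i = 0; i < n_fds; i++)
        if (opener < 0 || tgt_fds[i].pid < opener)
            opener = tgt_fds[i].pid;
    return opener;
}
```

Every task evaluates the same rule over the same array, so all of them agree on the opener without exchanging any messages.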
   −
First, the <code>send_fd</code> and <code>recv_fd</code> routines cannot work using one socket for all tasks -- a descriptor sent to task <code>pid</code> should reach ''this'' task, not some arbitrary one that the kernel woke up earlier on data arrival. That said, we have to create at least one socket per task to receive descriptors.
+
Third, <code>open_a_file()</code> is not just [[How hard is it to open a file|this]]. The file being opened can be a pipe, a socket, a signalfd, an inotify or any of many other fancy things, none of which use the open-by-path engine.
   −
Second, files can be shared in a tricky manner, so that task A may have one file shared with task B and some other file shared with task C. If the "who opens a file" voting selects B and C for respective files, they will have to send descriptors to A with proper coordination with each other. This coordination can be simplified if we create sockets not just per-pid, but per-(pid, fd). This is what CRIU does, and this is how it does that.
+
And last but not least, files can depend on each other. E.g. an eventpoll file may have some other file descriptor monitored, and if we call <code>open_a_file()</code> on the eventpoll fd before we open the fd being monitored, we fail. This also affects the code that forms the array of <code>struct fd</code>-s.
    
[[Category:Under the hood]]
 
[[Category:Under the hood]]
 +
[[Category:Files]]
