Line 20:
Line 20:
== Restore ==
== Restore ==
−
Upon restore, CRIU already knows which mappings are shared, and the trick is to restore them as such.
+
During the restore, CRIU already knows which mappings are shared, so they need to be restored as shared.
−
For that, two different approaches are used, depending on the availability.
+
To restore file mappings, no tricks are needed: the files are opened and mapped with mmap() with the MAP_SHARED flag set.
−
The common part is, between the processes sharing a mapping, the one with the lowest PID
+
Anonymous memory mappings, though, need some work to be restored as such. Here is how it is done.
−
among the group performs the actual <code>mmap()</code>, while all the others wait
−
for the mapping to appear and, once it's available, use it.
−
=== memfd ===
+
Among all the processes sharing a mapping, the one with the lowest PID
+
(see [[postulates]]) is assigned to be the mapping creator. The creator's task is to obtain a mapping
+
file descriptor, restore the mapping data, and signal all the other processes that it's ready.
+
During this process, all the other processes are waiting.
−
Linux kernel v3.17 adds a [http://man7.org/linux/man-pages/man2/memfd_create.2.html memfd_create()]
+
First, the creator needs to obtain a file descriptor for the mapping. To achieve this, two different
−
syscall. CRIU restore checks if it is available from the running kernel; if yes, it is used.
+
approaches are used, depending on what the running kernel supports.
−
FIXME how
+
If the [http://man7.org/linux/man-pages/man2/memfd_create.2.html memfd_create()]
+
syscall is available (Linux kernel v3.17+), it is used to obtain a file descriptor.
+
Next, <code>ftruncate()</code> is called to set the proper size of the mapping.
−
HOW: The memfd in question is created in the task with lowest PID (see [[postulates]]) among those having this shmem segment
+
If <code>memfd_create()</code> is not available, an alternative approach is used.
−
mapped, then criu waits for the others to get this file by opening the creator's /proc/pid/fd/ link.
+
First, mmap() is called to create a mapping. Next, a file in <code>/proc/self/map_files/</code>
−
Afterwards all the files just mmap() this descriptor into their address space.
+
is opened to get a file descriptor for the mapping. The limitation of this method is that,
+
due to security concerns, /proc/$PID/map_files/ is not available for processes that
+
live inside a user namespace, so it is impossible to use it if there
+
are any user namespaces in the dump.
−
=== /proc/$PID/map_files/ ===
+
Once the creator has the file descriptor, it mmap()s it and restores its content from
−
+
the dump (using memcpy()). The creator then unmaps the mapping (note that the file
−
This method is used if memfd is not available. The limitation is, /proc/$PID/map_files/ is not available
+
descriptor is still available). Next, it calls futex(FUTEX_WAKE) to signal all the
−
for users inside user namespaces (due to security concerns), so it's not possible to use it if there
+
waiting processes that the mapping file descriptor is ready.
−
are any user namespaces in the dump.
−
FIXME how
+
All the other processes that need this mapping wait on futex(FUTEX_WAIT). Once the
+
wait is over, they open the creator's /proc/$CREATOR_PID/fd/$FD file to get the
+
mapping file descriptor.
−
HOW: The same technique as with memfd is used, with two exceptions. First is that creator calls mmap()
+
Finally, all the processes (including the creator itself) call mmap() to create a
−
not memfd_create() and creates the shared memory at once. Then it waits for the others to open its
+
needed mapping (note that mmap() arguments such as length, offset and flags may
−
/proc/pid/map_files/ link. After opening "the others" mmap() one to their address space just as if
+
differ for different processes), and close() the mapping file descriptor as it is
−
they would have done it with memfd descriptor.
+
no longer needed.
== Changes tracking ==
== Changes tracking ==