Line 1: |
Line 1: |
− | This articles describes some intricacies of handling shared memory mappings, i.e. mappings that are shared between a few processes.
| + | Every process has one or more memory mappings, i.e. regions of virtual memory it is allowed to use. |
| + | Some such mappings can be shared between a few processes, and they are called shared mappings. |
| + | In other words, these are shared '''anonymous (not file-based) memory mappings'''. |
| + | The article describes some intricacies of handling such mappings. |
| | | |
| == Checkpoint == | | == Checkpoint == |
| | | |
− | Every process has one or more mmaped files. Some mappings (for example, ones of shared libraries)
| + | During the checkpointing, CRIU needs to figure out all the shared mappings in order to dump them as such. |
− | are shared between a few processes. During the checkpointing, CRIU need to figure out
| |
− | all the mappings that are shared in order to dump them as such. | |
| | | |
− | It does so by performing <code>fstatat()</code> for each entry in <code>/proc/$PID/map_files/</code>, | + | It does so by calling <code>fstatat()</code> on each entry found in <code>/proc/$PID/map_files/</code>, |
− | noting the ''device'' and ''inode'' fields of the structure returned by fstatat(). This information | + | noting the ''device:inode'' pair of the structure returned by <code>fstatat()</code>. Now, if some processes |
− | is collected and sorted. Now, if any few processes have a mapping with same ''device'' and ''inode'',
| + | have a mapping with the same ''device:inode'' pair, this mapping is marked as shared between these processes |
− | this mapping is a shared one and should be dumped as such. | + | and dumped as such. |
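The grouping step described above can be illustrated with a minimal Python sketch. It is not CRIU's actual code (CRIU is written in C); the names <code>stat_map_files</code> and <code>find_shared</code> are hypothetical. It assumes that <code>os.stat()</code>, which follows symlinks, stands in for <code>fstatat()</code>, and that the caller has enough privileges to follow the <code>map_files</code> links. As a simplification, it does not distinguish shared mappings from private ones.

```python
import os
from collections import defaultdict


def stat_map_files(pid):
    """Yield (pid, st_dev, st_ino) for each entry in /proc/<pid>/map_files.

    Hypothetical helper: entries that cannot be stat'ed (vanished mapping,
    insufficient privileges) are silently skipped.
    """
    base = f"/proc/{pid}/map_files"
    for name in os.listdir(base):
        try:
            st = os.stat(os.path.join(base, name))  # follows the symlink
        except OSError:
            continue
        yield pid, st.st_dev, st.st_ino


def find_shared(records):
    """Group (pid, st_dev, st_ino) records by the device:inode pair.

    Returns {(dev, ino): sorted list of pids} for every pair that is
    seen in more than one process.
    """
    owners = defaultdict(set)
    for pid, dev, ino in records:
        owners[(dev, ino)].add(pid)
    return {key: sorted(pids) for key, pids in owners.items() if len(pids) > 1}
```

For a set of PIDs, the records from <code>stat_map_files()</code> for each PID can be concatenated and passed to <code>find_shared()</code>; every key in the result is a ''device:inode'' pair mapped by at least two of the processes.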
| | | |
− | It's important to note that the above mechanism works not just for the file-based mappings,
| + | Note that <code>fstatat()</code> works because the kernel actually creates a hidden |
− | but also for the anonymous ones. For an anonymous mapping, kernel actually creates a hidden
| + | tmpfs file, not visible from any tmpfs mounts, but accessible via its |
− | tmpfs file, and so <code>fstatat()</code> on the <code>/proc/$PID/map_files/</code> entry | + | <code>/proc/$PID/map_files/</code> entry. |
− | works the same way as for other files. The tmpfs file itself is not visible from any tmpfs
| + | |
− | mounts, but can be opened via its <code>map_files</code> entry.
| + | Dumping a mapping means two things: |
| + | * writing an entry into process' mm.img file; |
| + | * storing the actual mapping data (contents). |
| + | For shared mappings, the contents are stored in a pair of image files: pagemap-shmem.img and pages.img. |
| + | For details, see [[Memory dumps]]. |
| + | |
| + | Note that different processes can map different parts of a shared memory segment. |
| + | In this case, CRIU first collects mapping offsets and lengths from all the processes |
| + | to determine the total segment size, then reads the contents of all the parts |
| + | from the respective processes. |
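As an illustration of the size computation, here is a minimal Python sketch. It assumes each process reports its mapping as an (offset, length) pair in bytes, with the offset counted from the start of the segment. The function name <code>segment_size</code> is hypothetical and the code is not taken from CRIU.

```python
def segment_size(parts):
    """Return the total size of a shared segment in bytes.

    parts: iterable of (offset, length) pairs, one per mapping of the
    segment in some process. The segment must be large enough to cover
    the furthest mapped byte, i.e. the maximum of offset + length.
    """
    return max((offset + length for offset, length in parts), default=0)
```

For example, if one process maps the first page of a segment and another maps its third page (4096-byte pages), the segment has to span three pages, even though nobody maps the second one.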
| | | |
| == Restore == | | == Restore == |
| | | |
− | During the restore, CRIU already knows which mappings are shared, so they need to be restored as shared. | + | During the restore, CRIU already knows which mappings are shared, so they need to be |
− | To restore file mappings, no tricks are needed, they are opened and mmaped with with a MAP_SHARED flag set.
| + | restored as such. Here is how it is done. |
− | | |
− | Anonymous memory mappings, though, need some work to be restored as such. Here is how it is done.
| |
| | | |
| Among all the processes sharing a mapping, the one with the lowest PID among the group | | Among all the processes sharing a mapping, the one with the lowest PID among the group |
Line 66: |
Line 74: |
| == Dumping present pages == | | == Dumping present pages == |
| | | |
− | When dumping the contents of shared memory CRIU doesn't dump all the data. Instead, it determines which pages contain | + | When dumping the contents of shared memory, CRIU does not dump all of the data. Instead, it determines which pages contain |
− | it and dumps only them. This is done similarly to how regular [[memory dumping and restoring]] works, i.e. by analyzing | + | it, and only dumps those pages. This is done similarly to how regular [[memory dumping and restoring]] works, i.e. by looking |
− | the owners' pagemap entries for PRESENT or SWAPPED bits. But there's one feature of shmem dumps -- sometimes shmem
| + | for PRESENT or SWAPPED bits in the owners' pagemap entries. |
− | page can exist in the kernel, but not mapped to any process. In this case criu detects one by calling mincore() on
| + | |
− | the shmem segment, which reports back the page in-memory status. And the mincore bitmap is AND-ed with the per-process | + | There is one particular feature of shared memory dumps worth mentioning. Sometimes, a shared memory page |
− | ones. | + | can exist in the kernel, but it is not mapped to any process. CRIU detects such pages by calling mincore() |
| + | on the shmem segment, which reports back the page in-memory status. The mincore bitmap is then ANDed with |
| + | the per-process ones. |
| | | |
| == See also == | | == See also == |
| | | |
− | [[Memory dumping and restoring]] | + | * [[Memory dumping and restoring]] |
− | | + | * [[Memory images deduplication]] |
− | [[Memory images deduplication]] | |
| | | |
| [[Category:Memory]] | | [[Category:Memory]] |
| [[Category:Under the hood]] | | [[Category:Under the hood]] |
| + | [[Category:Editor help needed]] |