Changes

misc fixes
Line 1: Line 1:  +
This article describes how CRIU dumps and restores processes' memory.
 +
 
== Basic C/R ==
   Line 13: Line 15:  
** ''anonymous'' for the MAP_FILE | MAP_PRIVATE mapping indicates that the page in question is already COW-ed from the file's one. Non-anonymous pages are not dumped, as they are still in sync with the file
 
** ''soft-dirty'' bit is used by [[memory changes tracking]]
* Ptrace SEIZE that is used to grab pages from task's VM into pipe (with vmsplice)
+
* Ptrace SEIZE, used to grab pages from task's VM into a pipe (with vmsplice)
   −
The latter step deserves some better explanation. So in order to drain memory from task we first generate the bitmap of pages needed to be dumped (using the smaps, map_files and pagemap from proc). Then we create a set of pipe-s to put pages into. Then we infect the process with [[parasite code]] which, in turn, gets the pipes and <code>vmsplice</code>-s the required pages into it. Then we <code>splice</code> the pages from pipes into [[memory dumps|image files]].
+
The last step deserves a more detailed explanation. In order to drain memory from a task, we first generate the bitmap of pages that need to be dumped (using smaps, map_files and pagemap from /proc). Next, we create a set of pipes to put pages into. Then we infect the process with [[parasite code]], which, in turn, gets the pipes and <code>vmsplice</code>s the required pages into them. Finally, we <code>splice</code> the pages from the pipes into [[memory dumps|image files]].
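The bitmap generation described above can be illustrated with a small sketch. It is not CRIU's actual code: it only decodes 64-bit pagemap entries using the bit positions documented by the kernel (present, swapped, file-page/shared-anon, soft-dirty) and applies a simplified "dump or skip" rule. The names <code>should_dump</code>, <code>build_bitmap</code> and <code>read_pagemap</code> are hypothetical, and a 4 KiB page size is assumed.

```python
import struct

PAGE_SIZE = 4096  # assumption: 4 KiB pages

# Bit positions of a /proc/<pid>/pagemap entry, per the kernel documentation.
PM_SOFT_DIRTY = 1 << 55      # used by memory changes tracking, not by this sketch
PM_FILE_OR_SHARED = 1 << 61  # page is a file page or shared-anon
PM_SWAPPED = 1 << 62
PM_PRESENT = 1 << 63


def should_dump(entry, file_private):
    """Simplified rule: dump populated pages, skip file-backed ones in a
    MAP_FILE | MAP_PRIVATE mapping because they are still in sync with the file."""
    if not entry & (PM_PRESENT | PM_SWAPPED):
        return False  # nothing was ever put into this page
    if file_private and entry & PM_FILE_OR_SHARED:
        return False  # not COW-ed yet, the file still holds the data
    return True


def build_bitmap(raw, file_private):
    """Turn raw pagemap bytes (8 bytes per page) into a list of dump/skip flags."""
    count = len(raw) // 8
    entries = struct.unpack("=%dQ" % count, raw[:count * 8])
    return [should_dump(entry, file_private) for entry in entries]


def read_pagemap(pid, start, npages):
    """Hypothetical helper: read the pagemap entries covering one VMA."""
    with open("/proc/%d/pagemap" % pid, "rb") as pagemap:
        pagemap.seek((start // PAGE_SIZE) * 8)
        return pagemap.read(npages * 8)
```

For a VMA starting at <code>start</code> and spanning <code>npages</code> pages, <code>build_bitmap(read_pagemap(pid, start, npages), file_private)</code> would give one flag per page. Reading another task's pagemap may require extra privileges.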
    
=== Restoring ===
   −
Restoring is pretty straightforward as during restore CRIU morphs itself into the target task. Two things worth mentioning before diving into explanation of steps.
+
Restoring is pretty straightforward. During restore, CRIU morphs itself into the target task. Two things are worth mentioning before diving into the explanation of the steps.
    
;[[COW]]
Line 27: Line 29:  
:Those areas are implemented in the kernel by supporting a pseudo file on a hidden tmpfs mount. So on restore we just determine who will create the shared area and who will attach to it (see the [[postulates]]). Then the creator <code>mmap</code>s the region and the others open the /proc/pid/map_files/ link. However, on recent kernels, we use the <code>memfd_create</code> system call, which does a similar thing but also works for user namespaces. Briefly: the creator creates the memfd, and all the others get one via the /proc/pid/fd link, which is not as strict as the map_files one.
   −
having said that, the restore of memory is done in the following steps
+
Having said that, the restore of memory is done in the following steps:
   −
; Opening images and reading VMA-s in
+
; Open images and read in VMAs
: Here we open all the mm.img, read mappings in, resolve shared memory segments and check whether we need to special-care mapped files.
+
: Open all the mm.img files, read the mappings in, resolve shared memory segments, and check whether mapped files need special care.
   −
; Forking and pre-mapping
+
; Fork and pre-mmap
: At this stage each task pre-mmaps private anonymous areas and populates them with pages (from pagemap/pages images). Then task fork()-s the child which does the same. This is done like this to make COW-ed areas actually share the pages they should. On fork() the shared pages become such and currently that's the only way to make Linux kernel do this.
+
: Each task pre-mmaps private anonymous areas and populates them with pages (from pagemap/pages images). Then the task forks a child, which does the same. It is done this way so that COWed areas actually share the pages they should: the pages populated before fork() become shared by parent and child, and currently this is the only way to make the Linux kernel do this.
   −
; Opening file mappings
+
; Open file mappings
 
: Soon after the fork, we check which VMAs are MAP_FILE ones and request the [[files]] engine to open them.
   −
; Opening shared mappings
+
; Open shared mappings
 
: At almost the same place, we create an FD for shared anonymous VMAs.
   −
; Diving into [[restorer context]]
+
; Dive into [[restorer context]]
 
: At this stage we strip off all the old CRIU mappings, thus making the VM ready for the restored mappings.
   −
; Restoring mappings in their places
+
; Restore mappings in their places
 
: Anonymous private mappings are <code>mremap</code>ed from the pre-mapped areas one by one, and file mappings are created with the <code>mmap</code> system call. Anonymous shared mappings are also just <code>mmap</code>ed.
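The reason for populating private anonymous areas before forking can be seen in a small sketch. It only demonstrates the visible COW semantics (the child inherits the populated page and its later write stays private); it cannot show that the physical page is shared until that write. The function name <code>cow_after_fork</code> is made up for the illustration.

```python
import mmap
import os


def cow_after_fork():
    """Return (what the child inherited, what the child wrote, parent's view)."""
    size = mmap.PAGESIZE
    # Pre-mmap a private anonymous area and populate it before forking.
    area = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                     prot=mmap.PROT_READ | mmap.PROT_WRITE)
    area[:6] = b"parent"

    rfd, wfd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: sees the pre-populated page, then writes to it, which
        # makes the kernel give the child its own copy (COW).
        status = 1
        try:
            os.close(rfd)
            inherited = bytes(area[:6])
            area[:6] = b"child!"
            os.write(wfd, inherited + bytes(area[:6]))
            status = 0
        finally:
            os._exit(status)

    os.close(wfd)
    report = os.read(rfd, 12)
    os.close(rfd)
    os.waitpid(pid, 0)
    parent_view = bytes(area[:6])  # unaffected by the child's write
    area.close()
    return report[:6], report[6:], parent_view
```

The parent still reads its original data after the child has modified its copy, which is the behaviour the restored COW areas need to preserve.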
    
=== Non linear mappings ===
   −
Currently we don't support non-linear mappings (fail dump if present)
+
Currently we don't support non-linear mappings (so dump fails if such mappings are found).
    
== Advanced C/R ==
   −
For such things as remote dump, stackable images and incremental dumps CRIU supports a more sophisticated memory C/R policies rather than "dump all -- restore all" one. There are several CLI knobs that affect this question.
+
For things such as remote dump, stackable images, and incremental dumps, CRIU supports more sophisticated memory C/R policies than the "dump all -- restore all" one. There are several CLI knobs that can be used.
    
* dump action
Line 67: Line 69:     
;dump
:Without any options till will dump everything and kill the dumped tasks.
+
:Without any options, dump everything and kill the dumped tasks.
    
;dump --track-mem
:Will dump everything, will turn on memory changes tracking and will kill tasks after this. As you might have noticed this is pretty useless combination of options.
+
:Dump everything, turn on memory changes tracking, and kill tasks after this. As you might have noticed, this is a pretty useless combination of options!
    
;dump --leave-running
:Will dump everything and leave the tasks running after dump.
+
:Dump everything, and leave the tasks running after dump.
    
;dump --track-mem --leave-running
:Same as above, but will turn on memory changes tracking.
+
:Same as above, but turn on memory changes tracking.
    
;dump --track-mem --leave-running --prev-images-dir <path>
:Same as above, but during dump will also check whether the page in question is present in parent and would skip dumping it this time.
+
:Same as above, but during dump also check whether the page in question is present in parent, and skip dumping it this time.
    
;pre-dump
:Will dump only the memory, turn on memory changes tracking and leave tasks running
+
:Only dump memory, turn on memory changes tracking and leave the tasks running.
    
;pre-dump --prev-images-dir <path>
:Same as above, but will check for pages present in parent and would skip them.
+
:Same as above, but check for pages present in parent and skip them.
    
;<pre->dump <options> --page-server
:Given to any combination above would make CRIU send the pages to the page server (e.g. for [[disk-less migration]]).
+
:Send the pages to the page server (e.g. for [[disk-less migration]]). See [[page server]] for more details.
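As an illustration of how the options above combine, here is a sketch that only builds command lines for a series of pre-dumps followed by a final dump. Several things are assumptions not stated in this article: the <code>--tree</code> and <code>--images-dir</code> options used to select the task and the image directory, passing <code>--prev-images-dir</code> as a path relative to the current image directory, and keeping <code>--track-mem</code> on the final dump. The helper names <code>criu_argv</code> and <code>iterative_plan</code> are hypothetical, and nothing is executed.

```python
import os.path


def criu_argv(action, pid, images_dir, track_mem=False,
              leave_running=False, prev_images_dir=None):
    """Build one criu command line as an argv list (nothing is run)."""
    if action not in ("dump", "pre-dump"):
        raise ValueError("unsupported action: %r" % (action,))
    # --tree and --images-dir are assumptions, they are not named in the text.
    argv = ["criu", action, "--tree", str(pid), "--images-dir", images_dir]
    if track_mem:
        argv.append("--track-mem")
    if leave_running:
        argv.append("--leave-running")
    if prev_images_dir is not None:
        # Assumption: the parent images are given relative to images_dir.
        argv += ["--prev-images-dir",
                 os.path.relpath(prev_images_dir, images_dir)]
    return argv


def iterative_plan(pid, dirs):
    """Pre-dump into every directory but the last, then do the final dump."""
    plan = []
    prev = None
    for images_dir in dirs[:-1]:
        # pre-dump turns tracking on and leaves tasks running by itself.
        plan.append(criu_argv("pre-dump", pid, images_dir,
                              prev_images_dir=prev))
        prev = images_dir
    plan.append(criu_argv("dump", pid, dirs[-1], track_mem=True,
                          prev_images_dir=prev))
    return plan
```

Each resulting list could be handed to a process launcher, or printed to check what a given policy would run before trying it on a real task.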
    
== Messing with image files ==
Line 96: Line 98:  
[[Category:Under the hood]]
 
[[Category:Memory]]
 +
[[Category:Live migration]]