Difference between revisions of "Memory dumping and restoring"

From CRIU

Revision as of 15:19, 27 February 2015

Basic C/R

Dumping

Currently memory dumping depends on 3 big technologies:

  • /proc/pid/smaps file and /proc/pid/map_files/ directory with links are used to determine
    • memory areas in use by task
    • which file is mapped (if any)
    • shared memory "identifier" to resolve the MAP_SHARED areas
  • /proc/pid/pagemap file that reveals important flags
    • present indicates that the physical page is there. Non-present pages are not dumped.
    • anonymous, for a MAP_FILE | MAP_PRIVATE mapping, indicates that the page in question has already been COW-ed from the file. Non-anonymous pages are not dumped, as they are still in sync with the file.
    • soft-dirty bit is used by memory changes tracking
  • Ptrace SEIZE that is used to grab pages from task's VM into pipe (with vmsplice)
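As a rough illustration of the pagemap flags listed above, the following sketch decodes one 8-byte pagemap entry and applies the dump rules described in the bullets. The bit positions follow the kernel's pagemap documentation; the names `decode_entry`, `should_dump` and `read_entries` are hypothetical helpers made up for this example and are not CRIU functions, and the soft-dirty rule is a simplification of memory changes tracking.

```python
import struct

PAGE_SIZE = 4096  # assumption: 4 KiB pages; real code should query the system

# Bit positions as given in the kernel's pagemap documentation.
PM_PRESENT = 1 << 63     # page is present in RAM
PM_FILE = 1 << 61        # page is file-mapped or shared-anonymous
PM_SOFT_DIRTY = 1 << 55  # page was written since soft-dirty bits were cleared


def decode_entry(raw: bytes) -> dict:
    """Decode one native-endian 64-bit pagemap entry into named flags."""
    (value,) = struct.unpack("=Q", raw)
    return {
        "present": bool(value & PM_PRESENT),
        "anonymous": not (value & PM_FILE),
        "soft_dirty": bool(value & PM_SOFT_DIRTY),
    }


def should_dump(entry: dict, file_private: bool, tracking: bool = False) -> bool:
    """Hypothetical decision helper mirroring the rules in the list above."""
    if not entry["present"]:
        return False  # non-present pages are not dumped
    if file_private and not entry["anonymous"]:
        return False  # still in sync with the file, nothing to save
    if tracking and not entry["soft_dirty"]:
        return False  # unchanged since the previous tracked dump
    return True


def read_entries(pid: int, start: int, npages: int) -> list:
    """Read npages entries for the virtual address start (needs /proc)."""
    with open(f"/proc/{pid}/pagemap", "rb") as pagemap:
        pagemap.seek((start // PAGE_SIZE) * 8)
        data = pagemap.read(npages * 8)
    return [decode_entry(data[i:i + 8]) for i in range(0, len(data), 8)]
```

For example, `should_dump(decode_entry(raw), file_private=True)` is false for a present page of a private file mapping that has not been COW-ed yet.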

The last item deserves a fuller explanation. To drain memory from a task, we first generate a bitmap of the pages that need to be dumped (using smaps, map_files and pagemap from proc). Then we create a set of pipes to put the pages into. Next we infect the process with parasite code, which, in turn, receives the pipes and vmsplice-s the required pages into them. Finally, we splice the pages from the pipes into image files.
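The bitmap step can be pictured with a small sketch. It assumes the bitmap is a sequence of booleans, one per page of a mapping, and collapses runs of wanted pages into (address, length) ranges of the kind that an iovec-based call such as vmsplice consumes. The function name `bitmap_to_iovecs` and the 4 KiB page size are assumptions for illustration; CRIU's actual parasite code is written in C and is not reproduced here.

```python
from typing import Iterable, List, Tuple

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def bitmap_to_iovecs(base: int, bitmap: Iterable[bool],
                     page_size: int = PAGE_SIZE) -> List[Tuple[int, int]]:
    """Collapse consecutive wanted pages into (address, length) ranges."""
    iovecs = []
    run_start = None  # index of the first page of the current run, if any
    npages = 0
    for index, wanted in enumerate(bitmap):
        npages = index + 1
        if wanted and run_start is None:
            run_start = index  # a new run of pages to dump begins here
        elif not wanted and run_start is not None:
            # The run ended on the previous page; record it.
            iovecs.append((base + run_start * page_size,
                           (index - run_start) * page_size))
            run_start = None
    if run_start is not None:
        # The bitmap ended while a run was still open.
        iovecs.append((base + run_start * page_size,
                       (npages - run_start) * page_size))
    return iovecs
```

Merging adjacent pages keeps the number of ranges small, which matters because each range becomes one entry handed to the kernel.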

Restoring

Restoring is pretty straightforward, since during restore CRIU morphs itself into the target task. Two things are worth mentioning before diving into the steps.

COW
Anonymous private mappings might have pages shared between tasks until they get COW-ed. To restore this, CRIU pre-restores those pages before forking the child processes and mremap-s them into place in the final stage.
Shared memory
Those areas are implemented in the kernel by a pseudo file on a hidden tmpfs mount. So on restore we just determine who will create the shared area and who will attach to it (see the postulates). Then the creator mmap-s the region and the others open the /proc/pid/map_files/ link. However, on recent kernels we use the new memfd_create system call, which does a similar thing but also works with user namespaces. Briefly: the creator creates the memfd, and all the others get one via the /proc/pid/fd link, which is not as strict as map_files.

Having said that, memory is restored in the following steps.

Opening images and reading VMA-s in
Here we open all the mm.img files, read the mappings in, resolve shared memory segments and check whether mapped files need special care.
Forking and pre-mapping
At this stage each task pre-mmaps its private anonymous areas and populates them with pages (from the pagemap/pages images). Then the task fork()-s a child, which does the same. This is done so that COW-ed areas actually share the pages they should: on fork() the pages become shared, and currently that is the only way to make the Linux kernel do this.
Opening file mappings
Soon after fork we check which VMA-s are MAP_FILE ones and request the files engine to open them.
Opening shared mappings
At almost the same place we create an FD for shared anonymous VMA-s.
Diving into restorer context
At this stage we strip off all the old CRIU mappings, making the VM ready for the restored mappings.
Restoring mappings in their places
Anonymous private mappings are mremap-ed from the pre-mapped areas one by one; file mappings are created with the mmap system call. Anonymous shared mappings are also just mmap-ed.
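The "forking and pre-mapping" step relies on ordinary fork() copy-on-write semantics, which the sketch below demonstrates on a single page. It is only an illustration: `premap_and_fork` is a hypothetical name, the page content is made up, and the final mremap step is omitted because Python's standard library has no way to remap a region to a chosen address.

```python
import mmap
import os

PAGE = mmap.PAGESIZE


def premap_and_fork() -> tuple:
    """Populate a private anonymous page, fork, and show COW behaviour."""
    # Pre-map and populate the area before forking, as the restore does.
    area = mmap.mmap(-1, PAGE, mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    area[:6] = b"parent"

    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the page is inherited and shared until someone writes to it.
        os.close(read_end)
        inherited = bytes(area[:6])
        area[:5] = b"child"  # this write triggers COW in the child only
        os.write(write_end, inherited)
        os._exit(0)

    # Parent: collect what the child saw, then check our own copy.
    os.close(write_end)
    seen_by_child = os.read(read_end, 6)
    os.close(read_end)
    os.waitpid(pid, 0)
    return seen_by_child, bytes(area[:6])
```

The child reads the content written before the fork, and its later write does not show up in the parent, which is the behaviour the pre-mapping stage depends on.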

Non linear mappings

Currently we don't support non-linear mappings; the dump fails if any are present.

Advanced C/R

For such things as remote dump, stackable images and incremental dumps, CRIU supports more sophisticated memory C/R policies than the plain "dump all, restore all" one. Several CLI knobs affect this.

  • dump action
  • pre-dump action
  • --track-mem option
    • --prev-images-dir option
  • --leave-running option
  • --page-server option

Let's see what all of this means.

First of all, the pre-dump action always turns on the --track-mem and the --leave-running options even if they are not specified in the command line. Next, the pre-dump action dumps only the memory, while the dump one dumps all the state including open files, sockets and other stuff. Having said that, let's see all the possible combinations and what they result in.

dump
Without any options it will dump everything and kill the dumped tasks.
dump --track-mem
Will dump everything, turn on memory changes tracking and kill the tasks afterwards. As you might have noticed, this is a pretty useless combination of options.
dump --leave-running
Will dump everything and leave the tasks running after dump.
dump --track-mem --leave-running
Same as above, but will turn on memory changes tracking.
dump --track-mem --leave-running --prev-images-dir <path>
Same as above, but during the dump will also check whether the page in question is present in the parent images and, if so, skip dumping it this time.
pre-dump
Will dump only the memory, turn on memory changes tracking and leave the tasks running.
pre-dump --prev-images-dir <path>
Same as above, but will check for pages present in the parent images and skip them.
<pre->dump <options> --page-server
When added to any combination above, this makes CRIU send the pages to the page server (e.g. for disk-less migration).
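The combinations above can be summarised as a small decision function. This is a sketch of the rules as stated on this page, not of CRIU's option parser: `Plan` and `resolve` are hypothetical names, and only the action and the four options discussed here are modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    memory_only: bool        # dump only memory, not files, sockets, etc.
    track_mem: bool          # turn on memory changes tracking
    leave_running: bool      # do not kill the tasks after the dump
    skip_parent_pages: bool  # skip pages already present in parent images
    to_page_server: bool     # send pages to the page server


def resolve(action: str, track_mem: bool = False, leave_running: bool = False,
            prev_images_dir: Optional[str] = None,
            page_server: bool = False) -> Plan:
    """Work out the effective behaviour of a dump or pre-dump invocation."""
    if action not in ("dump", "pre-dump"):
        raise ValueError(f"unknown action: {action}")
    pre = action == "pre-dump"
    return Plan(
        memory_only=pre,
        # pre-dump always implies --track-mem and --leave-running
        track_mem=track_mem or pre,
        leave_running=leave_running or pre,
        skip_parent_pages=prev_images_dir is not None,
        to_page_server=page_server,
    )
```

For instance, `resolve("pre-dump")` reports tracking and leave-running as enabled even though neither option was passed, matching the rule that pre-dump always turns them on.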

Messing with image files

Criu-memory-wflow.png