Difference between revisions of "Checkpoint/Restore"

(Created page with "== Basic design == === Checkpoint === The checkpoint procedure relies heavily on '''/proc''' file system (it's a general place where crtools takes all the information it needs)...")
 
Line 12: Line 12:
 
The process dumper (lets call it a dumper further) does the following steps during checkpoint stage
 
  
# A '''$pid''' of a process group leader is obtained from the command line.
+
# '''$pid''' of a process group leader is obtained from the command line.
# By using this '''$pid''' the dumper walks though '''/proc/$pid/status''' and gathers children '''$pids''' recursively. At the end we will have a process tree.
+
# By using this '''$pid''' the dumper walks though '''/proc/$pid/task/$tid/children''' and gathers children '''$pids''' recursively. At the end we will have a process tree.
# Then it takes every '''$pid''' from a process tree, sends ''SIGSTOP'' to every process found, and performs the following steps on each '''$pid'''.
+
# Then we take every '''$pid''' from a process tree, seize and them with ptrace ''PTRACE_SEIZE'' call (which put tasks into seized state, where tasks do not know that they are actually stopped and someone does nasty things with them :), and performs the following steps on each '''$pid'''.
#* Collects VMA areas by parsing '''/proc/$pid/maps'''.
+
# Collect VMA areas by parsing '''/proc/$pid/maps'''.
#* Seizes a task via relatively new ptrace interface. Seizing a task means to put it into a special state when the task have no idea if it's being operated by ptrace.
+
# Collect file descriptor numbers the task has via '''/proc/$pid/fd'''.
#* Core parameters of a task (such as registers and friends) are being dumped via ptrace interface and parsing '''/proc/$pid/stat''' entry.
+
# Core parameters of a task (such as registers and friends) are being dumped via ptrace interface and parsing '''/proc/$pid/stat''' entry.
#* The dumper injects a parasite code into a task via ptrace interface. This allows us to dump pages of a task right from within the task's address space.
+
# The dumper injects a parasite code into a task via ptrace interface. This is done in two steps - at first we inject only a few bytes for ''mmap'' syscall at CS:IP the task has at moment of seizing. Then ptrace allow us to run an injected syscall and we allocate enough memory for a parasite code chunk we need for dumping. After that the parasite code is copied into new place inside dumpee address space and CS:IP set respectively to point to our parasite code.
#** An injection procedure is pretty simple - the dumper scans executable VMA areas of a task (which were collected previously) and tests if there a place for <code>syscall</code> call, then (by ptrace as well) it substitutes an original code with <code>syscall</code> instructions and creates a new VMA area inside process address space.
+
# After everything dumped (such as memory pages, which can be written out only from inside dumpee address space) we use ptrace facility again and cure dumpee by dropping out all our parasite code and restoring original code.
#** Finally parasite code get copied into the new VMA and the former code which was modified during parasite bootstrap procedure get restored.
 
#* Then (by using a parasite code) the dumper flushes contents of a task's pages to the file. And pulls out parasite code block completely, since we don't need it anymore.
 
#* Once parasite removed a task get unseized via ptrace call but it remains stopped still.
 
#* The dumper writes out files and pipes parameter and data.
 
 
# The procedure continues for every '''$pid'''.
 
  

Revision as of 15:07, 29 August 2012

Basic design

Checkpoint

The checkpoint procedure relies heavily on the /proc file system (it is the general place where crtools takes all the information it needs). This includes:

  • File descriptor information (via /proc/$pid/fd and /proc/$pid/fdinfo).
  • Pipe parameters.
  • Memory maps (via /proc/$pid/maps).
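As an illustration of the memory-map item above, here is a minimal sketch of reading /proc/$pid/maps. It is not crtools code: the names Vma, parse_maps_line, and read_vmas are hypothetical, and the proc_root parameter is an assumption added only so the reader can be pointed at a test directory.

```python
from typing import List, NamedTuple


class Vma(NamedTuple):
    """One line of /proc/$pid/maps (hypothetical container, not a crtools type)."""
    start: int
    end: int
    perms: str
    offset: int
    dev: str
    inode: int
    path: str


def parse_maps_line(line: str) -> Vma:
    # Line layout: start-end perms offset dev inode [pathname]
    fields = line.split(None, 5)
    addr, perms, offset, dev, inode = fields[:5]
    start, end = (int(part, 16) for part in addr.split("-"))
    # Anonymous mappings have no pathname field at all.
    path = fields[5].strip() if len(fields) > 5 else ""
    return Vma(start, end, perms, int(offset, 16), dev, int(inode), path)


def read_vmas(pid: int, proc_root: str = "/proc") -> List[Vma]:
    # proc_root is an assumption for testability; the real file is /proc/$pid/maps.
    with open(f"{proc_root}/{pid}/maps") as maps:
        return [parse_maps_line(line) for line in maps if line.strip()]
```

For example, read_vmas(os.getpid()) (after importing os) would list the mappings of the calling process on a Linux system.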

The process dumper (called the dumper from here on) performs the following steps during the checkpoint stage:

  1. The $pid of a process group leader is obtained from the command line.
  2. Using this $pid, the dumper walks through /proc/$pid/task/$tid/children and gathers the children's $pids recursively. At the end we have a process tree.
  3. Then the dumper takes every $pid from the process tree, seizes the task with the ptrace PTRACE_SEIZE call (which puts the task into a seized state, where it does not know that it is actually stopped and that someone is doing nasty things to it :), and performs the following steps on each $pid.
  4. Collect VMAs by parsing /proc/$pid/maps.
  5. Collect the file descriptor numbers the task has via /proc/$pid/fd.
  6. Core parameters of the task (such as registers and friends) are dumped via the ptrace interface and by parsing the /proc/$pid/stat entry.
  7. The dumper injects parasite code into the task via the ptrace interface. This is done in two steps. First, only a few bytes for an mmap syscall are injected at the CS:IP the task had at the moment of seizing. ptrace then lets us run the injected syscall, which allocates enough memory for the parasite code chunk needed for dumping. After that, the parasite code is copied into its new place inside the dumpee's address space, and CS:IP is set to point to it.
  8. After everything is dumped (including memory pages, which can be written out only from inside the dumpee's address space), we use the ptrace facility again to cure the dumpee by removing all parasite code and restoring the original code.
  9. The procedure continues for every $pid.
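The tree-gathering step (step 2) can be sketched as follows. This is an illustrative sketch, not the crtools implementation: read_children and build_tree are hypothetical names, the nested-dict layout of the tree is an assumption, and proc_root exists only so the walk can be run against a fake directory. The children file may be missing on kernels built without the corresponding option, so the sketch skips threads that lack it.

```python
import os
from typing import Dict, List


def read_children(pid: int, proc_root: str = "/proc") -> List[int]:
    """Collect the child pids listed under every thread of `pid`."""
    children: List[int] = []
    task_dir = os.path.join(proc_root, str(pid), "task")
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "children")) as f:
                # The file holds space-separated pids and may be empty.
                children.extend(int(p) for p in f.read().split())
        except FileNotFoundError:
            # Thread exited meanwhile, or the kernel does not provide the file.
            continue
    return children


def build_tree(pid: int, proc_root: str = "/proc") -> Dict:
    """Recursively gather children into a nested dict (assumed layout)."""
    kids = sorted(read_children(pid, proc_root))
    return {"pid": pid, "children": [build_tree(k, proc_root) for k in kids]}
```

The walk is only a snapshot: until the tasks are seized, processes can still fork or exit between reads, which is one reason the seizing step follows right after.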

Restore

The restore procedure (aka the restorer) proceeds in the following steps:

  1. The process tree is read from a file.
  2. Every process is started with its saved (i.e. original) $pid via a clone() call.
  3. Files and pipes are restored (restored here means that they are opened and positioned).
  4. A new memory map is created and filled with the data the program had at checkpoint time.
  5. Finally, the program is kicked to start with the rt_sigreturn system call.
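To make "opened and positioned" in step 3 concrete, here is a minimal sketch for a regular file. The record layout (path, flags, pos) is a hypothetical stand-in for whatever the dump image actually stores, and restore_file is not a crtools function.

```python
import os
from typing import Dict


def restore_file(record: Dict) -> int:
    """Reopen a regular file and seek to the offset saved at checkpoint time.

    `record` is an assumed layout: {"path": str, "flags": int, "pos": int}.
    """
    fd = os.open(record["path"], record["flags"])
    # Put the file position back where the original task left it.
    os.lseek(fd, record["pos"], os.SEEK_SET)
    return fd
```

A real restorer also has to place the descriptor at its original number (for instance with dup2), which this sketch leaves out.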