Line 10: |
Line 10: |
| | | |
| == Basic design == | | == Basic design == |
| + | |
| + | === Checkpoint === |
| + | |
| + | The checkpoint procedure relies heavily on the '''/proc''' file system (it is the general place from which crtools takes all the information it needs).
| + | This includes:
| + | |
| + | * File descriptor information (via '''/proc/$pid/fd''' and '''/proc/$pid/fdinfo''').
| + | * Pipe parameters.
| + | * Memory maps (via '''/proc/$pid/maps'''). |
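
As an illustration of the kind of parsing involved, here is a minimal sketch that turns lines of '''/proc/$pid/maps''' into records. It is not taken from crtools; the names <code>Vma</code>, <code>parse_maps_line</code> and <code>read_maps</code> are hypothetical, and only the standard field layout of the maps file (address range, permissions, offset, device, inode, optional path) is assumed.

```python
from typing import List, NamedTuple


class Vma(NamedTuple):
    """One memory area as reported by /proc/<pid>/maps (hypothetical record type)."""
    start: int
    end: int
    perms: str
    offset: int
    dev: str
    inode: int
    path: str


def parse_maps_line(line: str) -> Vma:
    """Parse one line of /proc/<pid>/maps into a Vma record."""
    # Fields: address-range perms offset dev inode [pathname]
    fields = line.split(None, 5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    # Anonymous mappings have no pathname field.
    path = fields[5].strip() if len(fields) > 5 else ""
    return Vma(start, end, fields[1], int(fields[2], 16),
               fields[3], int(fields[4]), path)


def read_maps(pid: int) -> List[Vma]:
    """Read and parse every mapping of the given process."""
    with open(f"/proc/{pid}/maps") as f:
        return [parse_maps_line(line) for line in f if line.strip()]
```

Executable areas, which matter for the parasite injection step, can then be selected with a check such as <code>"x" in vma.perms</code>.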
| + | |
| + | The process dumper (simply called the dumper below) performs the following steps during the checkpoint stage:
| + | |
| + | # The '''$pid''' of a process group leader is obtained from the command line.
| + | # Using this '''$pid''', the dumper walks through '''/proc/$pid/status''' and gathers the children's '''$pids''' recursively. At the end we have a process tree.
| + | # Then it takes every '''$pid''' from the process tree, sends ''SIGSTOP'' to every process found, and performs the following steps on each '''$pid'''.
| + | #* Collects VMA areas by parsing '''/proc/$pid/maps'''. |
| + | #* Seizes the task via the relatively new ptrace interface. Seizing a task means putting it into a special state in which the task has no idea that it is being operated on by ptrace.
| + | #* Dumps the core parameters of the task (such as registers and friends) via the ptrace interface and by parsing the '''/proc/$pid/stat''' entry.
| + | #* The dumper injects parasite code into the task via the ptrace interface. This allows us to dump the pages of the task right from within the task's address space.
| + | #** The injection procedure is pretty simple: the dumper scans the executable VMA areas of the task (which were collected previously) and tests whether there is a place for a <code>syscall</code> call; then (by ptrace as well) it substitutes the original code with <code>syscall</code> instructions and creates a new VMA area inside the process address space.
| + | #** Finally, the parasite code gets copied into the new VMA, and the original code that was modified during the parasite bootstrap procedure gets restored.
| + | #* Then (by using the parasite code) the dumper flushes the contents of the task's pages to a file and pulls out the parasite code block completely, since it is no longer needed.
| + | #* Once the parasite is removed, the task gets unseized via a ptrace call, but it still remains stopped.
| + | #* The dumper writes out the parameters and data of files and pipes.
| + | # The procedure continues for every '''$pid'''. |
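
The recursive gathering of children in step 2 can be sketched as follows. This is an illustration rather than crtools code: the function names are hypothetical, and the exact format of the "Children" line is an assumption (a line such as <code>Children: 101 102</code> with whitespace-separated pids), since that line only exists on a kernel carrying the patch listed under kernel requirements.

```python
from typing import Callable, Dict, List


def parse_children(status_text: str) -> List[int]:
    """Extract child pids from the text of /proc/<pid>/status.

    Assumption: the patched kernel emits a line like "Children:\t101 102"
    with whitespace-separated pids; the real format may differ.
    """
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "Children":
            return [int(p) for p in value.split()]
    return []


def read_status(pid: int) -> str:
    """Read the raw status text of a process."""
    with open(f"/proc/{pid}/status") as f:
        return f.read()


def collect_tree(leader: int,
                 read: Callable[[int], str] = read_status) -> Dict[int, List[int]]:
    """Return {pid: [child pids]} for the leader and all its descendants."""
    tree: Dict[int, List[int]] = {}
    pending = [leader]
    while pending:
        pid = pending.pop()
        children = parse_children(read(pid))
        tree[pid] = children
        pending.extend(children)
    return tree
```

The <code>read</code> parameter exists only so the walk can be exercised with canned status text on a kernel that lacks the "Children" line.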
| + | |
| + | === Restore === |
| + | |
| + | The restore procedure (aka the restorer) proceeds in the following steps:
| + | |
| + | # The process tree is read from a file.
| + | # Every process is started with its saved (i.e. original) '''$pid''' via a <code>clone()</code> call with the new <code>CLONE_CHILD_USEPID</code> flag.
| + | # Files and pipes are restored (restored here means that they are opened and positioned).
| + | # A new file is generated. The file has the ELF format but with modified executable and program header types (telling the kernel that this particular file is not a regular ELF file and that the kernel is to handle it in a slightly different way).
| + | # Finally, execve is called with the new ELF file as an argument, which initiates the kernel's stage of the restore procedure.
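
To make "opened and positioned" in step 3 concrete, here is a minimal sketch for a regular file. It is not the restorer's actual code: <code>restore_fd</code> and its parameters (<code>path</code>, <code>flags</code>, <code>pos</code>) are hypothetical stand-ins for whatever was saved at checkpoint time, and pipes are not covered.

```python
import os


def restore_fd(path: str, flags: int, pos: int) -> int:
    """Reopen a file with its saved flags and seek to its saved offset.

    Returns the new descriptor. The arguments are assumed to come from
    the checkpoint image; how they are stored there is not shown here.
    """
    fd = os.open(path, flags)
    os.lseek(fd, pos, os.SEEK_SET)
    return fd
```

A real restorer would also have to place the descriptor at its original number (for example with <code>os.dup2</code>), which this sketch leaves out.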
| + | |
| + | === Kernel requirements === |
| + | |
| + | Since the checkpoint and restore procedures require some help from the Linux kernel, the following kernel patches are needed:
| + | |
| + | * procfs-report-eisdir-when-reading-sysctl-dirs-in-proc.patch |
| + | * proc-fix-races-against-execve-of-proc-pid-fd.patch |
| + | * proc-fix-races-against-execve-of-proc-pid-fd-fix.patch |
| + | * proc-force-dcache-drop-on-unauthorized-access.patch |
| + | * cr-statfs-callback-for-pipefs |
| + | |
| + | These patches are already in the -mm tree and are rather preparatory patches for the next series.
| + | |
| + | * fs-proc-switch-to-dentry |
| + | * cr-proc-map-files-21 |
| + | |
| + | These patches introduce '''/proc/$pid/map_files'''.
| + | |
| + | * cr-clone-with-pid-support |
| + | |
| + | This one introduces the ability to clone a process with a specified pid.
| + | |
| + | * cr-proc-add-children |
| + | |
| + | This one introduces the "Children" line in '''/proc/$pid/status'''.
| + | |
| + | * fs-add-do-close |
| + | * fs-proc-add-tls |
| + | * fs-proc-add-mm-task-stat |
| + | |
| + | These provide the missing pieces of process information needed for checkpoint/restore.
| + | |
| + | * binfmt-elf-for-cr-5 |
| + | |
| + | This one provides a new ELF file format.
| + | |
| + | === Where to get '''crtools''' itself === |
| + | |
| + | The '''crtools''' utility itself is hosted on [https://github.com/cyrillos/crtools GitHub]. Clone this repo to try out the new functionality. Note that the kernel patches are placed in the kernel/ directory inside the source tree, together with a [http://savannah.nongnu.org/projects/quilt quilt] series file.
| + | |
| + | '''crtools''' has been tested on Linux 3.1-rc3. |