Difference between revisions of "Comparison to other CR projects"

From CRIU
(added absent architectures for criu)
 
(21 intermediate revisions by 5 users not shown)
Line 5: Line 5:
 
{{:DMTCP}}
 
{{:DMTCP}}
  
== [http://criu.org CRIU], [http://dmtcp.sourceforge.net DMTCP], [https://ftg.lbl.gov/projects/CheckpointRestart BLCR] ==
+
== BLCR ==
 
“looks\seems like yes/no” - i found only unproved message(s) saying “yes”/“no”
 
  
“not yet” - it is officially planned or i found no reasons, why it can’t be done.
+
Berkeley Lab Checkpoint/Restart (BLCR) is a part of the Scalable Systems Software Suite,
 +
developed by the Future Technologies Group at Lawrence Berkeley National Lab under SciDAC
 +
funding from the United States Department of Energy. It is an Open Source, system-level
 +
checkpointer designed with High Performance Computing (HPC) applications in mind: in particular
 +
CPU and memory intensive batch-scheduled MPI jobs. BLCR is implemented as a GPL-licensed
 +
loadable kernel module for Linux 2.4.x and 2.6.x kernels on the x86, x86_64, PPC/PPC64, and ARM architectures, and a
 +
small LGPL-licensed library.
  
 +
== PinLIT / PinPlay ==
  
{| style="border-spacing:0;"
+
PinLIT (Pin-Long Instruction Trace) is a checkpointing tool built on top of Intel's proprietary [https://software.intel.com/en-us/articles/pin-a-dynamic-binary-instrumentation-tool PIN binary instrumentation tool]; it is described on page 48 of [https://cseweb.ucsd.edu/~calder/papers/thesis-cristiano.pdf Cristiano Pereira's PhD thesis]. It records the processor's (big) architectural register state and all pages of memory that contain application and shared library code, reducing size by storing only the memory used during a desired interval.
| style="border:1pt solid #000000;padding:0.176cm;"|
 
| style="border:1pt solid #000000;padding:0.176cm;"| CRIU
 
| style="border:1pt solid #000000;padding:0.176cm;"| DMTCP
 
| style="border:1pt solid #000000;padding:0.176cm;"| BLCR
 
  
|-
+
[https://software.intel.com/en-us/articles/program-recordreplay-toolkit PinPlay], or the Program Record/Replay Toolkit, appears to be the successor of, or a new name for, PinLIT.
| style="background-color:#dc2300;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"|
 
| style="background-color:#dc2300;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"|
 
| style="background-color:#dc2300;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"|
 
| style="background-color:#dc2300;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"|
 
  
|-
+
Both tools appear primarily focused on reducing benchmark runtime on slow computer architecture simulators, leveraging sampling algorithms such as SimPoint.
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| arch
 
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| x86_64, ARM
 
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| x86, x86_64, ARM
 
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| x86,x86_64,PPC/PPC64,ARM
 
  
|-
+
== OpenVZ (in-kernel) ==
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| OS
 
| colspan="3"  style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| <center>Linux</center>
 
  
|-
+
Legacy OpenVZ (RHEL4, RHEL5, and RHEL6 based kernels) has in-kernel checkpoint/restore; the sources can be found in kernel/cpt/.
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| modified kernel
 
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| yes, but only for some extra features.
 
  
All unnecessary features are already in new kernel versions
+
== CKPT (in-kernel) ==
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| no
 
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| no, module can be simply modprobed
 
  
 +
(In-kernel) [https://ckpt.wiki.kernel.org/index.php/Main_Page Linux Checkpoint/Restart] was a project from around 2008 to around 2010 to implement checkpoint/restart of Linux processes.
  
problems with installation on new kernels
+
== CRIU, DMTCP, BLCR, OpenVZ comparison table ==
 +
 +
“looks/seems like yes/no” - I found only unverified message(s) saying “yes”/“no”
  
|-
+
“not yet” - it is officially planned, or I found no reason why it can’t be done.
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| special libs
 
  
  
 
+
{| class="wikitable sortable"
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| no
 
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| yes
 
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| yes
 
 
 
 
|-
 
|-
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| root privileges
+
!
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| yes, otherwise it would be unsafe,because,for example, of parasite code
+
! CRIU
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| no
+
! DMTCP
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| no
+
! BLCR
 +
! OpenVZ
  
 
|-
 
|-
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| need to modify programs
+
| Arch
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| no
+
| x86_64, ARM, AArch64, PPC64le
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| no
+
| x86, x86_64, ARM
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| yes
+
| x86, x86_64, PPC/PPC64, ARM
 
+
| x86, x86_64
there are some difficulties with statically linked applications, and with LinuxThreads (cuz it does not support them at all)
 
 
 
 
 
 
 
  
 
|-
 
|-
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| need to prepare tasks
+
| OS
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| no
+
| Linux
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| yes
+
| Linux
 
+
| Linux
It preloadsthe DMTCP library. That library runs before the routinemain(). It creates a second thread. Thecheckpoint thread then creates a socket to the DMTCP coordinator andregisters itself. The checkpoint thread also creates a signal handler.
+
| Linux
 
 
 
 
 
 
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| yes
 
 
 
CR shall notify processes when a checkpoint is to occur (before the kernel takes a checkpoint) to
 
 
 
allow the processes to prepare itself accordingly.
 
 
 
 
 
 
 
  
 
|-
 
|-
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| Does it change behavior of the c/r-ed programs?
+
| Uses standard kernel?
 
+
| {{Yes}}, provided it's 3.11 or later
 
+
| {{Yes}}
 
+
| {{Yes}}, just needs to load module
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| no
+
| {{No}}. OpenVZ kernel is required
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| yes
 
 
 
because of wrappers on system calls
 
 
 
 
 
 
 
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| yes
 
 
 
because of wrappers on system calls
 
  
 
|-
 
|-
| style="background-color:#008000;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"|  
+
| Can be used without preloading special libraries before app start?
| style="background-color:#008000;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"|
+
| {{Yes}}
| style="background-color:#008000;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"|
+
| {{No}}
| style="background-color:#008000;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"|
+
| {{No}}
 +
| {{Yes}}
  
 
|-
 
|-
| style="border:1pt solid #000000;padding:0.176cm;"| migration
+
| Can be used as non-root user?
| style="border:1pt solid #000000;padding:0.176cm;"| yes
+
| {{Yes}}, but a user can only manipulate their own tasks
 
+
| {{Yes}}
even if kernel ,libs, etc are newer
+
| {{Yes}}
 
+
| {{No}}
 
 
Can use Memory Changes Tracking to decrease time for dumping
 
| style="border:1pt solid #000000;padding:0.176cm;"| yes
 
 
 
if both kernels are recent
 
| style="border:1pt solid #000000;padding:0.176cm;"| yes
 
 
 
but if all is the same!
 
 
 
 
 
if even prelinked addresses are different,it will not restore
 
 
 
 
 
But it can save the whole used libs and localization files to restore program on the different machine
 
  
 
|-
 
|-
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| Containers
+
| Can run unmodified programs?
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| yes
+
| {{Yes}}
 
+
| {{Yes}}
LXC and OpenVZ containers
+
| {{No}}. Statically linked and/or threaded apps are unsupported.
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| looks like no
+
| {{Yes}}
 
 
It doesn't support namespaces, so it probably can’t dump containers
 
 
 
 
 
 
 
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| looks like no
 
  
 
|-
 
|-
| style="border:1pt solid #000000;padding:0.176cm;"| parallel/distributed computations
+
| Can run unprepared tasks?
| style="border:1pt solid #000000;padding:0.176cm;"| no
+
| {{Yes}}
| style="border:1pt solid #000000;padding:0.176cm;"| yes
+
| {{No}}. It preloads the DMTCP library. That library runs before the routine main(). It creates a second thread. The checkpoint thread then creates a socket to the DMTCP coordinator and registers itself. The checkpoint thread also creates a signal handler.
 
+
| {{No}}. CR shall notify processes when a checkpoint is to occur (before the kernel takes a checkpoint) to allow the processes to prepare themselves accordingly.
OpenMPI, MPICH2, OpenMP, Cilk are alredy supported and Infiniband is in progress.
+
| {{Yes}}
| style="border:1pt solid #000000;padding:0.176cm;"| yes
 
 
 
Cray MPI, Intel MPI, LAM/MPI, MPICH-V, MPICH2, MVAPICH, Open MPI, SGI MPT
 
  
 
|-
 
|-
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| c\r gdb with debugging app
+
| Retains behavior of the c/r-ed programs?
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| no, because they are using the same interface
+
| {{Yes}} (but see [[What can change after C/R]])
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| yes
+
| {{No}}, because of wrappers on system calls
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| no
+
| {{No}}, because of wrappers on system calls
 +
| {{Yes}}
  
 
|-
 
|-
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| X-Windows graphics programs (KDE, GNOME, etc)
+
| Live migration
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| yes, by using vnc
+
| {{Yes}}, even if kernel, libs, etc are newer. Can use [[memory changes tracking]] to decrease freeze time
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| yes, by using vnc
+
| {{Yes}}, if both kernels are recent
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| seems like no
+
| {{Yes}}, but only if all components are the same. If even the prelinked addresses differ, it will not restore. However, it can save all used libs and localization files to restore the program on a different machine
 
+
| {{Yes}}
 
 
 
 
  
 
|-
 
|-
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| Solutions for invocation in the custom software
+
| Containers
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| not yet
+
| {{Yes}}, LXC and OpenVZ containers
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| yes
+
| {{No}}. It doesn't support namespaces, so it probably can’t dump containers
 
+
| {{No|Looks like no}}
Plugins and API
+
| {{Yes}}
 
 
 
 
 
 
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| not yet
 
  
 
|-
 
|-
| colspan="4"  style="background-color:#800080;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"|  
+
| Parallel/distributed computations libraries
 +
| {{No}} (planned)
 +
| {{Yes}}. OpenMPI, MPICH2, OpenMP, Cilk are already supported and Infiniband is in progress
 +
| {{Yes}}. Cray MPI, Intel MPI, LAM/MPI, MPICH-V, MPICH2, MVAPICH, Open MPI, SGI MPT
 +
| {{Yes}}
  
 
|-
 
|-
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| unix sockets
+
| Possible to C/R gdb together with the debugged app?
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| yes,all kinds
+
| {{No}}, because they are using the same interface
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| yes
+
| {{Yes}}
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| no
+
| {{No}}
 +
| {{Yes}}
  
 
|-
 
|-
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| udp sockets
+
| X Window apps (KDE, GNOME, etc)
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| yes, both ipv4 and ipv6
+
| {{Yes}}, via VNC
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| not yet
+
| {{Yes}}, via VNC
 +
| {{No|Looks like no}}
 +
| {{Yes}}, via VNC
  
developers of dmtcp had no request for this
 
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| not yet
 
  
 
|-
 
|-
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| tcp sockets
+
| Solutions for invocation in the custom software
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| yes
+
| {{Yes}}, [[RPC]] and [[C API]]
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| yes
+
| {{Yes}}, plugins and API
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| not yet
+
| {{No|Not yet}}
 +
| {{Yes}}, via ioctl calls
  
 
|-
 
|-
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| remote tcp connection
+
| colspan="4" |
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| yes
 
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| not yet
 
  
but you can write a simple DMTCP plugin that tells DMTCP how you want to reconnect on restart
+
|-
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| no
+
| Unix sockets
 +
| {{Yes}}
 +
| {{Yes}}
 +
| {{No}}
 +
| {{Yes}}
  
 
|-
 
|-
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| Infiniband
+
| UDP sockets
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| no
+
| {{Yes}}, both ipv4 and ipv6
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| not yet
+
| {{No|Not yet}}. DMTCP developers have had no requests for this
 
+
| {{No|Not yet}}
developing is on the half-way
+
| {{Yes}}
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| no
 
  
 
|-
 
|-
| style="background-color:#008080;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"|  
+
| TCP sockets
| style="background-color:#008080;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"|
+
| {{Yes}}
| style="background-color:#008080;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"|  
+
| {{Yes}}
| style="background-color:#008080;border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"|
+
| {{No|Not yet}}
 +
| {{Yes}}
  
 
|-
 
|-
| style="border:1pt solid #000000;padding:0.176cm;"| multithread support
+
| Established TCP connection
| style="border:1pt solid #000000;padding:0.176cm;"| yes
+
| {{Yes}}
| style="border:1pt solid #000000;padding:0.176cm;"| yes
+
| {{No}}, but you can write a simple DMTCP plugin that tells DMTCP how you want to reconnect on restart
| style="border:1pt solid #000000;padding:0.176cm;"| yes
+
| {{No}}
 +
| {{Yes}}
  
 
|-
 
|-
| style="border:1pt solid #000000;padding:0.176cm;"| multiprocess
+
| Infiniband
| style="border:1pt solid #000000;padding:0.176cm;"| yes
+
| {{No}}
| style="border:1pt solid #000000;padding:0.176cm;"| yes
+
| {{No|Not yet, development is about halfway done}}
| style="border:1pt solid #000000;padding:0.176cm;"| yes
+
| {{No}}
 +
| {{No}}
  
 
|-
 
|-
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| process groups
+
| Multithread support
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| yes
+
| {{Yes}}
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| yes
+
| {{Yes}}
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| not yet
+
| {{Yes}}
 +
| {{Yes}}
  
 
|-
 
|-
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| zombies
+
| Multiprocess
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| yes
+
| {{Yes}}
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| no
+
| {{Yes}}
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| no
+
| {{Yes}}
 +
| {{Yes}}
  
 
|-
 
|-
| style="border:1pt solid #000000;padding:0.176cm;"| namespaces
+
| Process groups and sessions
| style="border:1pt solid #000000;padding:0.176cm;"| yes
+
| {{Yes}}
| style="border:1pt solid #000000;padding:0.176cm;"| no
+
| {{Yes}}
| style="border:1pt solid #000000;padding:0.176cm;"| no
+
| {{No|Not yet}}
 +
| {{Yes}}
  
 
|-
 
|-
| style="border:1pt solid #000000;padding:0.176cm;"| sessions
+
| Zombies
| style="border:1pt solid #000000;padding:0.176cm;"| yes
+
| {{Yes}}
| style="border:1pt solid #000000;padding:0.176cm;"| yes
+
| {{No}}
| style="border:1pt solid #000000;padding:0.176cm;"| not yet
+
| {{No}}
 +
| {{Yes}}
  
 
|-
 
|-
| style="border:1pt solid #000000;padding:0.176cm;"| Ptraced programs
+
| Namespaces
| style="border:1pt solid #000000;padding:0.176cm;"| no
+
| {{Yes}}
| style="border:1pt solid #000000;padding:0.176cm;"| yes
+
| {{No}}
| style="border:1pt solid #000000;padding:0.176cm;"| no
+
| {{No}}
 +
| {{Yes}}
  
 
|-
 
|-
| style="border:1pt solid #000000;padding:0.176cm;"| System V IPC
+
| Ptraced programs
| style="border:1pt solid #000000;padding:0.176cm;"| yes
+
| {{No}}
| style="border:1pt solid #000000;padding:0.176cm;"| yes
+
| {{Yes}}
| style="border:1pt solid #000000;padding:0.176cm;"| no
+
| {{No}}
 +
| {{Yes}}
  
 
|-
 
|-
| style="border:1pt solid #000000;padding:0.176cm;"| memory mappings
+
| System V IPC
| style="border:1pt solid #000000;padding:0.176cm;"| yes, all kinds
+
| {{Yes}}
| style="border:1pt solid #000000;padding:0.176cm;"| yes
+
| {{Yes}}
| style="border:1pt solid #000000;padding:0.176cm;"| yes, partially
+
| {{No}}
 +
| {{Yes}}
  
 
|-
 
|-
| style="border:1pt solid #000000;padding:0.176cm;"| protected memory
+
| Memory mappings
| style="border:1pt solid #000000;padding:0.176cm;"| yes
+
| {{Yes}}, all kinds
| style="border:1pt solid #000000;padding:0.176cm;"| yes
+
| {{Yes}}
| style="border:1pt solid #000000;padding:0.176cm;"| yes
+
| {{Partial}}
 +
| {{Yes}}
  
 
|-
 
|-
| style="border:1pt solid #000000;padding:0.176cm;"| pipes
+
| Pipes
| style="border:1pt solid #000000;padding:0.176cm;"| yes
+
| {{Yes}}
| style="border:1pt solid #000000;padding:0.176cm;"| yes
+
| {{Yes}}
| style="border:1pt solid #000000;padding:0.176cm;"| not yet
+
| {{No|Not yet}}
 +
| {{Yes}}
  
 
|-
 
|-
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| terminals
+
| Terminals
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| yes
+
| {{Yes}}, but only Unix98 PTYs
 
+
| {{Yes}}
only Unix98 PTYs
+
| {{Yes}}
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| yes
+
| {{Yes}}
 
 
 
 
 
 
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| yes
 
  
 
|-
 
|-
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| non-posix files (inotify, signalfd, eventfd, etc)
+
| Non-POSIX files (inotify, signalfd, eventfd, etc)
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| yes
+
| {{Yes}}, inotify, fanotify, epoll, signalfd, eventfd
 
+
| {{Yes}}, epoll, eventfd, signalfd are already supported and inotify will be supported in the future
inotify, epoll, etc.
+
| {{No|Looks like no}}
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| Yes
+
| {{Yes}}
 
 
epoll, eventfd, signalfd are already supported and
 
 
 
inotify will be supported in future
 
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| looks like no
 
  
 
|-
 
|-
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| timers
+
| Timers
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| yes
+
| {{Yes}}
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| no
+
| {{No}}. Any counter or timer active since the beginning of a process will consider the restarted process to be a new process.
 
+
| {{Yes}}
Any counter or timer active since the beginning of a process will consider the restarted process to be a new process.  
+
| {{Yes}}
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| yes
 
  
 
|-
 
|-
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| Shared resources (files, mm, etc.)
+
| Shared resources (files, mm, etc.)
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| yes
+
| {{Yes}}. SysVIPC, files, fd table and memory
 
+
| {{Yes}}. System V shared memory (shmget, etc.), mmap-based shared memory, shared sockets, pipes, file descriptors
files, memory, etc.
+
| {{No}}, but it is planned to support shared mmap regions
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| yes
+
| {{Yes}}
 
 
System V shared memory(shmget, etc.), mmap-based shared memory, shared sockets, pipes, file descriptors.
 
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| no
 
 
 
but it is planned to suppord shared mmap regions
 
  
 
|-
 
|-
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| block devices
+
| Block devices
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| looks like yes
+
| {{No}}
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| looks like yes
+
| {{Yes|Looks like yes}}
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| no
+
| {{No}}
 
+
| {{No}}
 
 
  
  
 
|-
 
|-
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| character devices
+
| Character devices
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| mostly no
+
| {{Yes}}, only /dev/null, /dev/zero, etc. are supported
 
+
| {{Yes}}, looks like /dev/null and /dev/zero are supported
but /dev/null, /dev/zero, etc. are supported
+
| {{Yes}}, /dev/null and /dev/zero
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| mostly no
+
| {{Yes}}
 
 
looks like null and zero are supported
 
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| mostly no
 
 
 
but /dev/null and
 
 
 
/dev/zero are supported
 
  
 
|-
 
|-
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| capture the contents of all open files
+
| Capture the contents of open files
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| yes
+
| {{Yes}}, if file is unlinked
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| looks like no
+
| {{No|Looks like no}}
| style="border-top:none;border-bottom:1pt solid #000000;border-left:1pt solid #000000;border-right:1pt solid #000000;padding:0.176cm;"| not yet
+
| {{No|Not yet}}
 +
| {{Yes}}
  
 
|}
 
|}
  
 +
== Sources ==
 +
DMTCP:
 +
*http://dmtcp.sourceforge.net/
 +
*http://dmtcp.sourceforge.net/papers/dmtcp.pdf
 +
*http://www.ccs.neu.edu/home/gene/papers/ccgrid06.pdf
 +
*http://research.cs.wisc.edu/htcondor/CondorWeek2010/condor-presentations/cooperman-dmtcp.pdf
 +
*http://dmtcp.sourceforge.net/papers/mtcp.pdf
  
 +
BLCR:
 +
*https://upc-bugs.lbl.gov/blcr/doc/html/
 +
*https://ftg.lbl.gov/assets/projects/CheckpointRestart/Pubs/LBNL-49659.pdf
 +
*https://ftg.lbl.gov/assets/projects/CheckpointRestart/Pubs/blcr.pdf
 +
*https://ftg.lbl.gov/assets/projects/CheckpointRestart/Pubs/checkpointSurvey-020724b.pdf
 +
*https://ftg.lbl.gov/assets/projects/CheckpointRestart/Pubs/lacsi-2003.pdf
 +
*https://ftg.lbl.gov/assets/projects/CheckpointRestart/Pubs/LBNL-60520.pdf
  
 
== External links ==
 
== External links ==

Latest revision as of 19:20, 9 December 2015

This page tries to explain the differences between CRIU and other C/R solutions.

DMTCP

DMTCP implements checkpoint/restore of a process at the library level. This means that if you want to C/R an application, you must launch it with the DMTCP library (dynamically) linked from the very beginning. When launched like this, the DMTCP library intercepts a number of library calls from the application, builds a shadow database of information about the process's internals, and then forwards each request down to glibc/kernel. The gathered information is later used to create an image of the application. With this approach, one can only dump applications known to run successfully with the DMTCP libraries, and these libraries do not provide proxies for all kernel APIs (for example, inotify is known to be unsupported). Another implication of this approach is the potential performance cost of proxying requests.
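The interposition scheme described above can be modelled in a few lines. The following is a toy Python sketch, not DMTCP code: the class `InterposedFiles` and its methods are hypothetical names chosen for illustration, and only `open` and `close` are wrapped. Each wrapper forwards the request to the real call and records in a shadow table what would be needed to re-create the object later.

```python
import os


class InterposedFiles:
    """Toy model of library-level interposition (hypothetical, not DMTCP's API)."""

    def __init__(self):
        # Shadow database: fd -> the arguments needed to reopen the file later.
        self.shadow = {}

    def open(self, path, flags, mode=0o600):
        fd = os.open(path, flags, mode)  # forward the request to the real call
        self.shadow[fd] = {"path": path, "flags": flags}
        return fd

    def close(self, fd):
        os.close(fd)  # forward first, then forget the shadow entry
        self.shadow.pop(fd, None)

    def image(self):
        """Return a checkpoint 'image': recorded arguments plus current offsets."""
        return {
            fd: dict(info, offset=os.lseek(fd, 0, os.SEEK_CUR))
            for fd, info in self.shadow.items()
        }
```

A descriptor obtained through any call that is not wrapped never reaches the shadow table, so it would be missing from the image. This is the sense in which an API without a proxy is unsupported.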

Restoring a set of processes is also tricky, as it frequently requires re-creating an object with a predefined ID, and the kernel provides no API for doing so for several kinds of objects. For example, the kernel cannot fork a process with a desired PID. To work around that, DMTCP fools the process by intercepting the getpid() library call and returning a fake PID value to the application. Such behavior is very dangerous, as the application might see the wrong files in the /proc filesystem if it tries to access them via its PID.
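
The /proc hazard can be made concrete with a small model. The sketch below is an assumption-laden illustration, not DMTCP's implementation: `PidVirtualizer`, `proc_path`, and the PID value 4242 in the usage note are hypothetical. It shows that a path built from the fake PID names a different /proc entry than the one belonging to the restored process.

```python
import os


class PidVirtualizer:
    """Toy model: the application keeps seeing the PID it had at checkpoint time."""

    def __init__(self, virtual_pid, real_pid):
        self.virtual_pid = virtual_pid
        self.virtual_to_real = {virtual_pid: real_pid}

    def getpid(self):
        # What an intercepted getpid() would hand back to the application.
        return self.virtual_pid

    def real(self, virtual_pid):
        # Translation the wrapper would have to apply everywhere a PID is used.
        return self.virtual_to_real[virtual_pid]


def proc_path(pid):
    """Build the /proc path an application might open for a given PID."""
    return f"/proc/{pid}/status"
```

For example, with `v = PidVirtualizer(virtual_pid=4242, real_pid=os.getpid())`, the application would compute `proc_path(v.getpid())`, which refers to whatever process (if any) really has PID 4242. Only `proc_path(v.real(v.getpid()))` points at the restored process itself, and nothing forces the application to perform that translation.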

CRIU, on the other hand, doesn't require any libraries to be pre-loaded. It can checkpoint and restore an arbitrary application, as long as the kernel provides all the needed facilities. Kernel support for some CRIU features was added only recently, which means that a recent kernel version might be required.
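
As a rough illustration of the kernel requirement, the sketch below compares a kernel release string against 3.11, the minimum quoted in the comparison table on this page. The function names are hypothetical, and a version comparison is only a coarse check, since distributions may backport or disable features; CRIU's own `criu check` command is the more reliable test.

```python
import platform
import re

MIN_KERNEL = (3, 11)  # figure taken from the comparison table on this page


def parse_release(release):
    """Extract (major, minor) from a release string such as '3.11.0-12-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_new_enough(release=None, minimum=MIN_KERNEL):
    """Return True if the given (or the running) kernel is at least `minimum`."""
    if release is None:
        release = platform.release()  # release string of the running kernel
    return parse_release(release) >= minimum
```

Calling `kernel_new_enough()` with no arguments inspects the running system, while passing a release string makes it possible to evaluate a remote or target host.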

BLCR

Berkeley Lab Checkpoint/Restart (BLCR) is a part of the Scalable Systems Software Suite, developed by the Future Technologies Group at Lawrence Berkeley National Lab under SciDAC funding from the United States Department of Energy. It is an open-source, system-level checkpointer designed with High Performance Computing (HPC) applications in mind, in particular CPU- and memory-intensive batch-scheduled MPI jobs. BLCR is implemented as a GPL-licensed loadable kernel module for Linux 2.4.x and 2.6.x kernels on the x86, x86_64, PPC/PPC64, and ARM architectures, plus a small LGPL-licensed library.

PinLIT / PinPlay

PinLIT (Pin-Long Instruction Trace) is a checkpointing tool built on top of Intel's proprietary PIN binary instrumentation tool described on page 48 of Cristiano Pereira's PhD thesis. It records the processor's (big) architectural register state and all pages of memory that contain application and shared library code, optimizing size by only storing memory used during a desired interval.

PinPlay or the Program Record/Replay Toolkit appears to be the successor of or new name for PinLIT.

Both tools appear primarily focused on reducing benchmark runtime on slow computer architecture simulators, leveraging sampling algorithms such as SimPoint.

OpenVZ (in-kernel)

Legacy OpenVZ (RHEL4-, RHEL5-, and RHEL6-based kernels) has in-kernel checkpoint/restore; its sources can be found in kernel/cpt/.

CKPT (in-kernel)

(In-kernel) Linux Checkpoint/Restart was a project from around 2008 to around 2010 to implement checkpoint/restart of Linux processes.

CRIU, DMTCP, BLCR, OpenVZ comparison table

“Looks like yes/no” (or “seems like”): I found only unverified reports saying “yes” or “no”.

“Not yet”: support is officially planned, or I found no reason why it could not be done.


{| class="wikitable"
! Feature !! CRIU !! DMTCP !! BLCR !! OpenVZ
|-
| Arch || x86_64, ARM, AArch64, PPC64le || x86, x86_64, ARM || x86, x86_64, PPC/PPC64, ARM || x86, x86_64
|-
| OS || Linux || Linux || Linux || Linux
|-
| Uses standard kernel? || Yes, provided it's 3.11 or later || Yes || Yes, just needs to load a module || No. An OpenVZ kernel is required
|-
| Can be used without preloading special libraries before app start? || Yes || No || No || Yes
|-
| Can be used as non-root user? || Yes, but a user can only manipulate tasks belonging to them || Yes || Yes || No
|-
| Can run unmodified programs? || Yes || Yes || No. Statically linked and/or threaded apps are unsupported. || Yes
|-
| Can run unprepared tasks? || Yes || No. It preloads the DMTCP library, which runs before main() and creates a second (checkpoint) thread. The checkpoint thread then opens a socket to the DMTCP coordinator, registers itself, and installs a signal handler. || No. Processes are notified when a checkpoint is about to occur (before the kernel takes it) so that they can prepare themselves accordingly. || Yes
|-
| Retains behavior of the C/R-ed programs? || Yes (but see What can change after C/R) || No, because of wrappers on system calls || No, because of wrappers on system calls || Yes
|-
| Live migration || Yes, even if kernel, libs, etc. are newer. Can use memory change tracking to decrease freeze time || Yes, if both kernels are recent || Yes, but only if all components are the same. It will not restore even if only the prelinked addresses differ; however, it can save all libraries and localization files in use so that the program can be restored on a different machine || Yes
|-
| Containers || Yes, LXC and OpenVZ containers || No. It doesn't support namespaces, so it probably can't dump containers || Looks like no || Yes
|-
| Parallel/distributed computation libraries || No (planned) || Yes. OpenMPI, MPICH2, OpenMP, Cilk are already supported and Infiniband is in progress || Yes. Cray MPI, Intel MPI, LAM/MPI, MPICH-V, MPICH2, MVAPICH, Open MPI, SGI MPT || Yes
|-
| Possible to C/R gdb with a debugged app? || No, because they use the same interface || Yes || No || Yes
|-
| X Window apps (KDE, GNOME, etc.) || Yes, via VNC || Yes, via VNC || Looks like no || Yes, via VNC
|-
| Solutions for invocation from custom software || Yes, RPC and C API || Yes, plugins and API || Not yet || Yes, via ioctl calls
|-
| Unix sockets || Yes || Yes || No || Yes
|-
| UDP sockets || Yes, both IPv4 and IPv6 || Not yet. The DMTCP developers have had no request for this || Not yet || Yes
|-
| TCP sockets || Yes || Yes || Not yet || Yes
|-
| Established TCP connection || Yes || No, but you can write a simple DMTCP plugin that tells DMTCP how you want to reconnect on restart || No || Yes
|-
| Infiniband || No || Not yet, development is about halfway done || No || No
|-
| Multithread support || Yes || Yes || Yes || Yes
|-
| Multiprocess || Yes || Yes || Yes || Yes
|-
| Process groups and sessions || Yes || Yes || Not yet || Yes
|-
| Zombies || Yes || No || No || Yes
|-
| Namespaces || Yes || No || No || Yes
|-
| Ptraced programs || No || Yes || No || Yes
|-
| System V IPC || Yes || Yes || No || Yes
|-
| Memory mappings || Yes, all kinds || Yes || Partial || Yes
|-
| Pipes || Yes || Yes || Not yet || Yes
|-
| Terminals || Yes, but only Unix98 PTYs || Yes || Yes || Yes
|-
| Non-POSIX files (inotify, signalfd, eventfd, etc.) || Yes, inotify, fanotify, epoll, signalfd, eventfd || Yes, epoll, eventfd, signalfd are already supported and inotify will be supported in the future || Looks like no || Yes
|-
| Timers || Yes || No. Any counter or timer active since the start of a process will treat the restarted process as a new process. || Yes || Yes
|-
| Shared resources (files, mm, etc.) || Yes. SysVIPC, files, fd table and memory || Yes. System V shared memory (shmget, etc.), mmap-based shared memory, shared sockets, pipes, file descriptors || No, but support for shared mmap regions is planned || Yes
|-
| Block devices || No || Looks like yes || No || No
|-
| Character devices || Yes, only /dev/null, /dev/zero, etc. are supported || Yes, looks like null and zero are supported || Yes, /dev/null and /dev/zero || Yes
|-
| Capture the contents of open files || Yes, if the file is unlinked || Looks like no || Not yet || Yes
|}

Sources

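DMTCP:

* http://dmtcp.sourceforge.net/
* http://dmtcp.sourceforge.net/papers/dmtcp.pdf
* http://www.ccs.neu.edu/home/gene/papers/ccgrid06.pdf
* http://research.cs.wisc.edu/htcondor/CondorWeek2010/condor-presentations/cooperman-dmtcp.pdf
* http://dmtcp.sourceforge.net/papers/mtcp.pdf

BLCR:

* https://upc-bugs.lbl.gov/blcr/doc/html/
* https://ftg.lbl.gov/assets/projects/CheckpointRestart/Pubs/LBNL-49659.pdf
* https://ftg.lbl.gov/assets/projects/CheckpointRestart/Pubs/blcr.pdf
* https://ftg.lbl.gov/assets/projects/CheckpointRestart/Pubs/checkpointSurvey-020724b.pdf
* https://ftg.lbl.gov/assets/projects/CheckpointRestart/Pubs/lacsi-2003.pdf
* https://ftg.lbl.gov/assets/projects/CheckpointRestart/Pubs/LBNL-60520.pdf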

External links