Revision as of 21:18, 21 September 2012 by Kir (talk | contribs) (15 Jul 2011: Pavel sent POC to LKML: link, text)
Jump to: navigation, search

Here we list major project milestones.

20 Sep 2012: crtools 0.2

Checkpoint-restore tool v0.2

23 Jul 2012: crtools 0.1

Checkpoint-restore tool v0.1

14 Feb 2012: Andrew Morton starts to doubt in CRIU

From https://lkml.org/lkml/2012/2/14/384:

Thus far our (my) approach has been to trickle the c/r support code
into mainline as it is developed.  Under the assumption that the end
result will be acceptable and useful kernel code.

I'm afraid that I'm losing confidence in that approach.  We have this
patchset, we have Stanislav's "IPC: checkpoint/restore in userspace
enhancements" (which apparently needs to get more complex to support
LSM context c/r).  I simply *don't know* what additional patchsets are
expected.  And from what you told me it sounds like networking support
is at a very early stage and I fear for what the end result of that
will look like.

So I don't feel that I can continue feeding these things into mainline
until someone can convince me that we won't have a nasty mess (and/or
an unsufficiently useful feature) at the end of the project.

12 Jan 2012: Linus merged a first wave of CRIU patches

From commit 0994695:

    - checkpoint/restart feature work.                                                                                                    
      A note on this: this is a project by various mad Russians to perform                                                                
      c/r mainly from userspace, with various oddball helper code added                                                                   
      into the kernel where the need is demonstrated.                                                                                     
      So rather than some large central lump of code, what we have is                                                                     
      little bits and pieces popping up in various places which either                                                                    
      expose something new or which permit something which is normally                                                                    
      kernel-private to be modified.                                                                                                      
      The overall project is an ongoing thing.  I've judged that the size                                                                 
      and scope of the thing means that we're more likely to be successful                                                                
      with it if we integrate the support into mainline piecemeal rather                                                                  
      than allowing it all to develop out-of-tree.                                                                                        
      However I'm less confident than the developers that it will all                                                                     
      eventually work! So what I'm asking them to do is to wrap each piece                                                                
      of new code inside CONFIG_CHECKPOINT_RESTORE.  So if it all                                                                         
      eventually comes to tears and the project as a whole fails, it should                                                               
      be a simple matter to go through and delete all trace of it.

30 Nov 2011: CRIU name coined, dot org domain registered

Domain Name:CRIU.ORG
Created On:30-Nov-2011 12:49:39 UTC

19 Jul 2011: Jonathan Corbet wrote the article at lwn.net

See "Checkpoint/restart (mostly) in user space"

15 Jul 2011: Pavel sent initial RFC and code

From http://lwn.net/Articles/451916/:

From: Pavel Emelyanov
Subject: [RFC][PATCH 0/7 + tools] Checkpoint/restore mostly in the userspace
Date: Fri, 15 Jul 2011 17:45:10 +0400

Hi guys!

There have already been made many attempts to have the checkpoint/restore functionality
in Linux, but as far as I can see there's still no final solutions that suits most of
the interested people. The main concern about the previous approaches as I see it was
about - all that stuff was supposed to sit in the kernel thus creating various problems.

I'd like to bring this subject back again proposing the way of how to implement c/r
mostly in the userspace with the reasonable help of a kernel.

That said, I propose to start with very basic set of objects to c/r that can work with

* x86_64 tasks (subtree) which includes
   - registers
   - TLS
   - memory of all kinds (file and anon both shared and private)
* open regular files
* pipes (with data in it)

Core idea:

The core idea of the restore process is to implement the binary handler that can execve-ute
image files recreating the register and the memory state of a task. Restoring the process 
tree and opening files is done completely in the user space, i.e. when restoring the subtree
of processes I first fork all the tasks in respective order, then open required files and 
then call execve() to restore registers and memory.

The checkpointing process is quite simple - all we need about processes can be read from /proc
except for several things - registers and private memory. In current implementation to get 
them I introduce the /proc/<pid>/dump file which produces the file that can be executed by the
described above binfmt. Additionally I introduce the /proc/<pid>/mfd/ dir with info about
mappings. It is populated with symbolc links with names equal to vma->vm_start and pointing to
mapped files (including anon shared which are tmpfs ones). Thus we can open some task's
/proc/<pid>/mfd/<address> link and find out the mapped file inode (to check for sharing) and
if required map one and read the contents of anon shared memory.

Other minor stuff is in patches and mostly tools. The set is for linux-2.6.39. The current
implementation is not yet well tested and has many other defects, but demonstrates the idea. 

What do you think? Does the support from kernel of the proposed type suit us?