Changes

2,255 bytes added, 21:18, 21 September 2012
Line 62:
See [http://lwn.net/Articles/452184/ "Checkpoint/restart (mostly) in user space"]
 
   −
== 15 Jul 2011: Pavel sent POC to LKML ==
+
== 15 Jul 2011: Pavel sent initial RFC and code ==
 +
From http://lwn.net/Articles/451916/:
 +
<pre>
 +
From: Pavel Emelyanov
 +
Subject: [RFC][PATCH 0/7 + tools] Checkpoint/restore mostly in the userspace
 +
Date: Fri, 15 Jul 2011 17:45:10 +0400
   −
From: Pavel Emelyanov
   −
Subject: [RFC][PATCH 0/7 + tools] Checkpoint/restore mostly in the userspace
   −
Date: Fri, 15 Jul 2011 17:45:10 +0400
+
Hi guys!
+
 
+
There have already been made many attempts to have the checkpoint/restore functionality
 +
in Linux, but as far as I can see there's still no final solution that suits most of
 +
the interested people. The main concern about the previous approaches as I see it was
 +
about - all that stuff was supposed to sit in the kernel thus creating various problems.
 +
 
 +
I'd like to bring this subject back again proposing the way of how to implement c/r
 +
mostly in the userspace with the reasonable help of a kernel.
 +
 
 +
 
 +
That said, I propose to start with very basic set of objects to c/r that can work with
 +
 
 +
* x86_64 tasks (subtree) which includes
 +
  - registers
 +
  - TLS
 +
  - memory of all kinds (file and anon both shared and private)
 +
* open regular files
 +
* pipes (with data in it)
 +
 
 +
Core idea:
 +
 
 +
The core idea of the restore process is to implement the binary handler that can execve-ute
 +
image files recreating the register and the memory state of a task. Restoring the process
 +
tree and opening files is done completely in the user space, i.e. when restoring the subtree
 +
of processes I first fork all the tasks in respective order, then open required files and
 +
then call execve() to restore registers and memory.
 +
 
 +
The checkpointing process is quite simple - all we need about processes can be read from /proc
 +
except for several things - registers and private memory. In current implementation to get
 +
them I introduce the /proc/<pid>/dump file which produces the file that can be executed by the
 +
described above binfmt. Additionally I introduce the /proc/<pid>/mfd/ dir with info about
 +
mappings. It is populated with symbolic links with names equal to vma->vm_start and pointing to
 +
mapped files (including anon shared which are tmpfs ones). Thus we can open some task's
 +
/proc/<pid>/mfd/<address> link and find out the mapped file inode (to check for sharing) and
 +
if required map one and read the contents of anon shared memory.
 +
 
 +
Other minor stuff is in patches and mostly tools. The set is for linux-2.6.39. The current
 +
implementation is not yet well tested and has many other defects, but demonstrates the idea.
 +
 
 +
What do you think? Does the support from kernel of the proposed type suit us?
 +
 
 +
Thanks,
 +
Pavel
 +
</pre>
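
The checkpointing paragraph above notes that most process state can be read from /proc, and that mappings must be matched by inode to detect sharing. The following is only a rough sketch of that idea and not code from the patch set: it does not use the proposed /proc/<pid>/mfd/ directory (which should not be assumed to exist on any given kernel) and instead parses the long-standing /proc/<pid>/maps file. The names <code>Mapping</code>, <code>parse_maps_line</code>, <code>read_mappings</code> and <code>shared_file_mappings</code> are hypothetical helpers introduced for illustration.

```python
from typing import List, NamedTuple


class Mapping(NamedTuple):
    """One line of /proc/<pid>/maps (hypothetical helper type)."""
    start: int   # first address of the mapping (vma->vm_start)
    end: int     # first address past the mapping
    perms: str   # e.g. "r-xp"; the 4th char is "s" (shared) or "p" (private)
    offset: int  # offset into the mapped file
    dev: str     # device as "major:minor"
    inode: int   # inode of the mapped file, 0 if none is reported
    path: str    # path of the mapped file, "" if none is reported


def parse_maps_line(line: str) -> Mapping:
    # Format: start-end perms offset dev inode [pathname]
    # The pathname may contain spaces, so split at most five times.
    fields = line.rstrip("\n").split(None, 5)
    start_s, end_s = fields[0].split("-")
    path = fields[5] if len(fields) > 5 else ""
    return Mapping(
        start=int(start_s, 16),
        end=int(end_s, 16),
        perms=fields[1],
        offset=int(fields[2], 16),
        dev=fields[3],
        inode=int(fields[4]),
        path=path,
    )


def read_mappings(pid: str = "self") -> List[Mapping]:
    # Reading another task's maps file may require suitable privileges.
    with open(f"/proc/{pid}/maps") as f:
        return [parse_maps_line(line) for line in f if line.strip()]


def shared_file_mappings(mappings: List[Mapping]) -> List[Mapping]:
    # Shared mappings that report an inode are the candidates for the
    # "check for sharing" step: equal (dev, inode) pairs name the same file.
    return [m for m in mappings if m.perms[3] == "s" and m.inode != 0]


if __name__ == "__main__":
    for m in read_mappings():
        print(f"{m.start:x}-{m.end:x} {m.perms} inode={m.inode} {m.path}")
```

Comparing the <code>(dev, inode)</code> pairs of mappings taken from different tasks gives one way to tell whether two tasks map the same file. Reading the contents of private memory is outside the scope of this sketch.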