| Line 62: |
| | See [http://lwn.net/Articles/452184/ "Checkpoint/restart (mostly) in user space"] | | See [http://lwn.net/Articles/452184/ "Checkpoint/restart (mostly) in user space"] |
| | | | |
| − | == 15 Jul 2011: Pavel sent POC to LKML == | + | == 15 Jul 2011: Pavel sent initial RFC and code == |
| | + | From http://lwn.net/Articles/451916/: |
| | + | <pre> |
| | + | From: Pavel Emelyanov |
| | + | Subject: [RFC][PATCH 0/7 + tools] Checkpoint/restore mostly in the userspace |
| | + | Date: Fri, 15 Jul 2011 17:45:10 +0400 |
| | | | |
| − | From: Pavel Emelyanov
| + | Hi guys! |
| − | Subject: [RFC][PATCH 0/7 + tools] Checkpoint/restore mostly in the userspace
| + | |
| − | Date: Fri, 15 Jul 2011 17:45:10 +0400
| + | There have already been made many attempts to have the checkpoint/restore functionality |
| | + | in Linux, but as far as I can see there's still no final solutions that suits most of |
| | + | the interested people. The main concern about the previous approaches as I see it was |
| | + | about - all that stuff was supposed to sit in the kernel thus creating various problems. |
| | + | |
| | + | I'd like to bring this subject back again proposing the way of how to implement c/r |
| | + | mostly in the userspace with the reasonable help of a kernel. |
| | + | |
| | + | |
| | + | That said, I propose to start with very basic set of objects to c/r that can work with |
| | + | |
| | + | * x86_64 tasks (subtree) which includes |
| | + | - registers |
| | + | - TLS |
| | + | - memory of all kinds (file and anon both shared and private) |
| | + | * open regular files |
| | + | * pipes (with data in it) |
| | + | |
| | + | Core idea: |
| | + | |
| | + | The core idea of the restore process is to implement the binary handler that can execve-ute |
| | + | image files recreating the register and the memory state of a task. Restoring the process |
| | + | tree and opening files is done completely in the user space, i.e. when restoring the subtree |
| | + | of processes I first fork all the tasks in respective order, then open required files and |
| | + | then call execve() to restore registers and memory. |
| | + | |
| | + | The checkpointing process is quite simple - all we need about processes can be read from /proc |
| | + | except for several things - registers and private memory. In current implementation to get |
| | + | them I introduce the /proc/<pid>/dump file which produces the file that can be executed by the |
| | + | described above binfmt. Additionally I introduce the /proc/<pid>/mfd/ dir with info about |
| | + | mappings. It is populated with symbolic links with names equal to vma->vm_start and pointing to |
| | + | mapped files (including anon shared which are tmpfs ones). Thus we can open some task's |
| | + | /proc/<pid>/mfd/<address> link and find out the mapped file inode (to check for sharing) and |
| | + | if required map one and read the contents of anon shared memory. |
| | + | |
| | + | Other minor stuff is in patches and mostly tools. The set is for linux-2.6.39. The current |
| | + | implementation is not yet well tested and has many other defects, but demonstrates the idea. |
| | + | |
| | + | What do you think? Does the support from kernel of the proposed type suit us? |
| | + | |
| | + | Thanks, |
| | + | Pavel |
| | + | </pre> |
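The checkpoint side described in the RFC reads most task state from /proc. The sketch below illustrates that idea only; it is not code from the patch set. It assumes the long-standing /proc/<pid>/maps text format instead of the proposed /proc/<pid>/mfd/ directory, and the names `Mapping`, `parse_maps` and `shared_file_mappings` are hypothetical. It extracts each mapping's start address (the value the RFC uses as the link name) and the backing inode (the value the RFC uses to check for sharing).

```python
from typing import List, NamedTuple


class Mapping(NamedTuple):
    """One line of /proc/<pid>/maps (hypothetical helper type)."""
    start: int   # corresponds to vma->vm_start
    end: int
    perms: str   # e.g. "rw-p" (private) or "rw-s" (shared)
    inode: int   # 0 when the mapping has no backing file
    path: str    # empty for anonymous private mappings


def parse_maps(text: str) -> List[Mapping]:
    """Parse the text of a maps file into Mapping records."""
    mappings = []
    for line in text.splitlines():
        # Fields: address range, perms, offset, dev, inode, optional path.
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start_s, end_s = fields[0].split("-")
        path = fields[5].strip() if len(fields) > 5 else ""
        mappings.append(
            Mapping(int(start_s, 16), int(end_s, 16), fields[1],
                    int(fields[4]), path)
        )
    return mappings


def shared_file_mappings(mappings: List[Mapping]) -> List[Mapping]:
    """Return the shared mappings that are backed by an inode."""
    return [m for m in mappings if m.perms.endswith("s") and m.inode != 0]
```

On a Linux system, `parse_maps(open("/proc/self/maps").read())` would list the mappings of the calling process. Comparing the `inode` values across tasks is one way to notice that two tasks map the same file, which is the sharing check the RFC mentions.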