See [http://lwn.net/Articles/452184/ "Checkpoint/restart (mostly) in user space"]

== 15 Jul 2011: Pavel sent initial RFC and code ==
From http://lwn.net/Articles/451916/:
<pre>
From: Pavel Emelyanov
Subject: [RFC][PATCH 0/7 + tools] Checkpoint/restore mostly in the userspace
Date: Fri, 15 Jul 2011 17:45:10 +0400

Hi guys!

There have already been made many attempts to have the checkpoint/restore functionality
in Linux, but as far as I can see there's still no final solutions that suits most of
the interested people. The main concern about the previous approaches as I see it was
about - all that stuff was supposed to sit in the kernel thus creating various problems.

I'd like to bring this subject back again proposing the way of how to implement c/r
mostly in the userspace with the reasonable help of a kernel.

That said, I propose to start with very basic set of objects to c/r that can work with

 * x86_64 tasks (subtree) which includes
   - registers
   - TLS
   - memory of all kinds (file and anon both shared and private)
 * open regular files
 * pipes (with data in it)

Core idea:

The core idea of the restore process is to implement the binary handler that can execve-ute
image files recreating the register and the memory state of a task. Restoring the process
tree and opening files is done completely in the user space, i.e. when restoring the subtree
of processes I first fork all the tasks in respective order, then open required files and
then call execve() to restore registers and memory.

The checkpointing process is quite simple - all we need about processes can be read from /proc
except for several things - registers and private memory. In current implementation to get
them I introduce the /proc/<pid>/dump file which produces the file that can be executed by the
described above binfmt. Additionally I introduce the /proc/<pid>/mfd/ dir with info about
mappings. It is populated with symbolc links with names equal to vma->vm_start and pointing to
mapped files (including anon shared which are tmpfs ones). Thus we can open some task's
/proc/<pid>/mfd/<address> link and find out the mapped file inode (to check for sharing) and
if required map one and read the contents of anon shared memory.

Other minor stuff is in patches and mostly tools. The set is for linux-2.6.39. The current
implementation is not yet well tested and has many other defects, but demonstrates the idea.

What do you think? Does the support from kernel of the proposed type suit us?

Thanks,
Pavel
</pre>
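The checkpoint side of the RFC relies on most per-task state being readable from /proc, with private memory (and registers) as the exception. As a rough illustration only, and not code from the patch set, the Python sketch below parses the standard /proc/<pid>/maps file (address range, permissions, offset, device, inode, path) and separates private mappings, whose contents would still need something like the proposed /proc/<pid>/dump, from shared ones, which can be matched across tasks by (device, inode) in the spirit of the inode check the proposed /proc/<pid>/mfd/ links were meant to enable. The helper names (`parse_maps_line`, `needs_content_dump`, `shared_keys`) are hypothetical, and the proposed /proc/<pid>/dump and /proc/<pid>/mfd/ files are not assumed to exist.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    start: int
    end: int
    perms: str
    offset: int
    dev: str
    inode: int
    path: str

    @property
    def shared(self) -> bool:
        # The 4th permission character is 's' for a shared mapping, 'p' for private.
        return self.perms[3] == "s"


def parse_maps_line(line: str) -> Mapping:
    # Hypothetical helper. Line format: start-end perms offset dev inode [pathname]
    # maxsplit=5 keeps paths containing spaces, e.g. "/dev/zero (deleted)", intact.
    fields = line.strip().split(None, 5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    path = fields[5] if len(fields) == 6 else ""
    return Mapping(start, end, fields[1], int(fields[2], 16),
                   fields[3], int(fields[4]), path)


def read_maps(pid="self"):
    # Linux-only: reads the real /proc/<pid>/maps file of a task.
    with open(f"/proc/{pid}/maps") as f:
        return [parse_maps_line(line) for line in f if line.strip()]


def needs_content_dump(m: Mapping) -> bool:
    # Private mappings can hold pages the task itself wrote (anonymous memory or
    # copy-on-write copies of file pages), so the maps entry alone cannot restore them.
    return not m.shared


def shared_keys(maps):
    # Shared mappings, including anonymous shared memory backed by tmpfs inodes,
    # are keyed by (device, inode) so two tasks mapping the same object match.
    return {(m.dev, m.inode) for m in maps if m.shared and m.inode != 0}
```

Comparing `shared_keys(read_maps(pid_a)) & shared_keys(read_maps(pid_b))` for two tasks would then list the objects they map in common.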
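The restore order described in the RFC (fork the whole task subtree first, then open the required files, then execve() each image) can be sketched as a plan generator. This is an illustration under assumptions, not the patch set's tool: the dictionary mapping each pid to its children and the action names `fork`, `open_files` and `execve_image` are hypothetical stand-ins for whatever the image files record. The one constraint it encodes is that a task can only be recreated by fork() in its already-restored parent, so the tree is walked top-down.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical plan step: (action, pid, parent pid or None for the root task).
Step = Tuple[str, int, Optional[int]]


def restore_plan(children: Dict[int, List[int]], root: int) -> List[Step]:
    # Phase 1: pre-order walk, so every parent exists before it forks its children.
    order: List[Tuple[int, Optional[int]]] = []
    stack: List[Tuple[int, Optional[int]]] = [(root, None)]
    while stack:
        pid, parent = stack.pop()
        order.append((pid, parent))
        # Push in reverse so siblings are created in the order they are listed.
        for child in reversed(children.get(pid, [])):
            stack.append((child, pid))

    # The root task runs the restore itself, so it is not forked.
    steps: List[Step] = [("fork", pid, parent) for pid, parent in order
                         if parent is not None]
    # Phase 2: once the whole tree exists, each task opens its files.
    steps += [("open_files", pid, parent) for pid, parent in order]
    # Phase 3: each task execve()s its image to restore registers and memory.
    steps += [("execve_image", pid, parent) for pid, parent in order]
    return steps
```

A real restorer would carry out these steps inside the forked processes themselves rather than from one central list; the plan only shows one valid ordering of them.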