Difference between revisions of "Todo"

From CRIU
m (remove @efiop's assignees)
 
(170 intermediate revisions by 12 users not shown)
Line 1: Line 1:
 +
{{note|This list is being transformed into the [https://github.com/checkpoint-restore/criu/issues github issues]}}
 +
 
{| class="wikitable sortable"
 
{| class="wikitable sortable"
 
|-
 
|-
 
! component
 
! component
 
! task
 
! task
! assignee
+
! complexity
 +
! potential/willing assignee
 
! comments
 
! comments
 
|-
 
|-
| crtools/kernel || memory snapshot || xemul@ || need to take mem snapshot in kernel for iterative migration, (probably) rebootless upgrade and HA
+
| tests || Automate code coverage measurement || easy || - || Collecting code coverage needs to be automated. We have code coverage results [http://criu.org/cov/ measured in 2012]. It would be nice to get up-to-date results periodically and without manual steps.
 
|-
 
|-
| crtools || TCP socket migration with changed IP || xemul@ || it might make sense to migrate a tcp connection on a box with changed IP address _if_ both boxes are NAT-ed to the destination. We will then have to go to NAT box and fix the conntracks, but this might make sense.
+
| crtools || Inherit resources, not restore || medium || - || Sigactions are restored for every task before it fork()-s. Each child then checks whether the sa_action from its image matches the one it got from its parent. We need to do the same for rlimits, and maybe for other resources too.
 
|-
 
|-
| crtools || Apply-images mode || xemul@ || Think about ability to take images and apply them to a living task(s). E.g. -- repopulate fdtable according to data from image. Another use-case -- when doing partial migration we'll need to modify one part to switch from pipes to sockets
+
| crtools || Implement [[restorer v2]] || hard (v2) || - ||
 
|-
 
|-
| crtools || Modify restored resources run-time || xemul@ || Need (probably) some way to alter what is being restored. Usage example -- change the IP address of sockets from task above.
+
| crtools || New images format || medium (v2) || - || See [[what's bad with V1 images]]
 
|-
 
|-
| crtools/kernel || madvise bits || gorcunov@ || currently we just drop these bits, since we have no API to find them out on dump
+
| kernel/crtools || Tune the start-time of tasks || medium || - || When we restore tasks their start-time moves forward (since we effectively create new tasks). This needs to be addressed somehow, most likely with the [[time namespace]].
 
|-
 
|-
| crtools || partial migration || - || migrate some tasks while proxying IPC to existing others (pipes->sockets, etc.)
+
| crtools || Support chroot-ed mount namespace || medium || - || If the root task lives in another mount namespace ''and'' has its root moved (with chroot()) CRIU dump fails with errors about inability to resolve files' paths. This is because CRIU treats the mount namespace's root as the init task's root which should be "/".
 
|-
 
|-
| crtools || Shared objects (mm/fs/fdtable) support || avagin@ || Now that we have the kcmp syscall we can do it. The first candidate is mm sharing, since we know that MySQL sometimes does this.
+
| crtools || Non-stop memory (first?) pre-dump || medium || - || When reading only the memory we can avoid freezing tasks and draining memory with the parasite. The process_vm_readv() system call can help us access another task's memory. The disadvantage of this approach is the need for additional memory. We can keep this under control by reading memory in chunks and not allocating too many additional buffers.
 
|-
 
|-
| kernel || Provide own defconfig || gorcunov@ || suggested by avagin@
+
| kernel/crtools || Speed up fetching info about tasks || medium || Andrey Vagin || Using proc to get info about tasks is nice but too slow. We have measured that a socket-based engine that fetches info about tasks from the kernel speeds things up significantly. Andrey is working on the [[Task-diag]] patchset that would implement this.
 
|-
 
|-
| crtools || Paranoid dumping and restoring || - || Make paranoid checks for what we dump, e.g. that the pgid is valid (within the session) and that fds drained from the parasite are valid.
+
| kernel || Make pipes swappable || hard || - || When [[Memory dumping and restoring|pre-dumping]] memory we pull all of the task's memory into a pipe with vmsplice and then send it over the network by splicing the pages into a socket. During this period all the memory is effectively pinned, as pages in a pipe are not swappable.
 
|-
 
|-
| kernel || Proc fdinfo extension || gorcunov@ || Need to rework and resubmit the patches
+
| kernel/crtools || Adjust per-task/-container timers offsets || medium || - || Absolute timers differ on different nodes. When live migrating a task/container this difference may (and will) screw the timers up.
 
|-
 
|-
| kernel/crtools || posix timers || skinsbursky@ || Need new kernel API for a) listing existing timers and b) fetching timer notify configuration.
+
| crtools || [[time namespace|Shift timers' timeouts]] according to the actual C-to-R delay || medium || - || If we pause tasks between C and R we, probably, need to adjust timers respectively. "Medium" complexity is because it's unclear ''what'' to do, not ''how''.
 
|-
 
|-
| crtools || Smart paths resolution || - || Need a way to resolve paths to overmounted files. There are two ways: 1. Temporarily move the mounts that overlap the desired path. 2. When creating a new mount, pre-open an fd that keeps the mountpoint. Later, do an accurate path resolution and call openat() on the proper mountpoint fd.
+
| kernel/crtools || Put call to mmap into VDSO || easy || Cyrill || To put the [[parasite code]] into the target process we modify its code to call the <code>mmap()</code> system call (and then restore the original code) and put the parasite into the new area. Oleg Nesterov suggests not patching the victim, but always having such a call in the VDSO.
 
|-
 
|-
| kernel/crtools || TCP connection fixup || - || When we turn repair off, a window probe is sent by the kernel. It can be lost, leaving us with a stuck connection. Also, the keepalive timer isn't rearmed on repaired socket connect; rearming it is probably the way to solve this.
+
| crtools || [[Integration]] with other projects || hard || - || CRIU does not work well in isolation. There is always something specific about what a user wants to dump. Integrating CRIU with other projects will make CRIU work at its best.
 
|-
 
|-
| crtools || Iptables || - || This is easy. Need to run iptables-save and iptables-restore.
+
| crtools || Restore tasks into fresh new pid namespace || easy || - || Processes that did not live in a pid namespace when they were dumped can be hard to restore because of PID conflicts. It would be nice to be able to ask CRIU to create a pid namespace for them and restore them there. One thing to worry about is the new namespace's init task.
 
|-
 
|-
| kernel/crtools || Auto namespaces detection || - || Now we "detect" them by looking at cmdline options
+
| crtools || Rollback tree state || medium || - || When we checkpoint a process tree with the -R option (let the tasks run after checkpoint) we might want to return the tasks to the checkpointed state on the same machine. Currently this can only be done by killing the processes and restoring them from scratch. If we could ask CRIU to restore the images ''into'' the already running processes, that could speed things up, especially when combined with careful [[memory changes tracking]].
 
|-
 
|-
| crtools || Migration w/o intermediate disk || Adrian Reber || -
+
| crtools || Restore arbitrary mountpoints tree || hard || - || The Linux kernel can construct tricky knots with [[mount points]]. We don't support arbitrary configurations of such things, only those that are in active use by software. This needs to be fixed.
 
|-
 
|-
| kernel/crtools || TUN/TAP || - || -
+
| crtools || Lazy restore using [[userfaultfd]] || medium || xemul || It might make sense to restore tasks w/o putting all the memory into the respective places. Instead, the VMAs in question can be marked as "lazy" and pages will be filled into them in the background and, upon demand, out of order. The functionality is related to the lazy migration and seamless kernel upgrade tasks.
 
|-
 
|-
| kernel/crtools || tcpdump || xemul@ || The diag modules extensions are on their way to net-next. Next is -- socket filters
+
| crtools || [[Lazy migration]] using [[userfaultfd]] || medium || xemul || Lazy migration is when we move all the tasks to another node, but leave their memory on the source one. To keep tasks from reading garbage from an empty address space we protect all of it as inaccessible. When tasks start reading or writing the memory they take page faults. With the userfaultfd technology it should be possible to intercept the #PF, pull the page from the source node and map it at the expected address.
 
|-
 
|-
| vzkernel/crtools || OpenVZ kernel support || - || Within 3.5 and RHEL7 port
+
| crtools || Speed up [[logging]] || medium || Cyrill || Synchronous formatting and writes into log files slow things down. On the other hand, turning logs off makes it impossible to troubleshoot.
 
|-
 
|-
| crtools || More sockoptions || xemul@ || SOL_ are mostly done
+
| crtools || Sanitize [[logging]] messages || hard || - || Currently log messages are printed w/o any logic, so it's hard to analyze what has happened when CRIU fails. Need to improve that by, e.g., categorizing images and [[When C/R fails|explaining them]] in more detail.
 
|-
 
|-
| crtools || Bridges in container || - ||
+
| crtools || Page transfer filters || medium || - || The page-xfer engine just splices the pages from the stealing pipes into a socket. Packing or encrypting the data would be nice. Maybe it's purely for [[P.Haul]]?
 
|-
 
|-
| crtools || cgroups in container || - ||
+
| crtools || TCP socket migration with changed IP || medium || - || It might make sense to migrate a TCP connection on a box with a changed IP address ''if'' both boxes are NAT-ed to the destination. In that case we will have to go to the NAT box and fix the conntracks, and use the CRIT image modification facilities.
 
|-
 
|-
| crtools/kernel || NFS || - ||
+
| crtools || [[Applying images]] || hard (v2) || xemul@ w/ students || Think about the ability to take images and apply them to living task(s), as described in the "rollback" feature above. Another example: repopulate the fdtable according to data from an image. Yet another use case: when doing partial migration (see below) we'll need to modify one part to switch from pipes to sockets. What else? With constant replication of tree state we can do incremental dumps on the source node and apply those increments to pre-created replicas on the destination node.
 
|-
 
|-
| crtools/kernel || VDSO || - || issues: VDSO may change between kernels; vsyscall may change between kernels; VDSO mapping should be VDSO mapping, not regular one
+
| crtools || Partial migration || hard || - || If a task subtree has connections to the rest of the tree (e.g. via pipes or unix sockets) we try to detect this and refuse the dump. It should be possible to take part of the tree, migrate it somewhere and recreate the mentioned links with some other appropriate IPC channel, e.g. replace pipes with sockets, shared memory with distributed shared memory and so on.
 
|-
 
|-
| crtools || Unmap a restorer VMA || - || [https://docs.google.com/spreadsheet/ccc?key=0Au56mM6UWU8mdGhmYURYZGZhbmxBLWVmVnhBT3VXVHc#gid=0 Lots] of ideas were generated so far.
+
| crtools || Shared objects (mm/fs) support || medium || - || Things created with CLONE_FOO flags are not supported now (the exception is full threads). Now that we have the kcmp syscall we can do it. The shared fdtable (CLONE_FILES) is supported; the next candidate is mm sharing, since we know that MySQL sometimes does this.
 
|-
 
|-
| crtools || file locks || - || It's hard to do it carefully. We need to make sure that all lock users are taken into dump. Only support it inside container?
+
| crtools || Smart paths resolution || hard || - || Files can be overmounted. In this case CRIU will refuse the dump saying that the file is [[invisible files|not alive]] but inaccessible by its name. We need a way to resolve paths to such files. There are two ways: 1. Temporarily move the mounts that overlap the desired path, open the file, then move the mountpoint back. 2. When creating a new mount, pre-open an fd that keeps the mountpoint. Later, do an accurate path resolution and call openat() on the proper mountpoint fd.
 
|-
 
|-
| kernel || unhashed sockets || xemul@ || When we create a TCP socket it doesn't get into any hashes and is not reported by the diag infra. However, there is state that we can configure on such a socket, and there are no existing APIs for getting this info (e.g. bind to device). What to do? Report unhashed sockets with diag, or extend the API for this?
+
| kernel/crtools || [[TCP repair TODO|TCP repair fixes]] || hard || - || We can dump and restore a live [[TCP connection]]. There are some issues with it that should be fixed.
 +
|-
 +
| kernel || [[Seamless kernel upgrade]] || hard || xemul || Briefly: dump tasks (into memory), change the kernel w/ kexec, then restore the tasks. From the perspective of the tasks and remote clients, the tasks have just stopped and then resumed on the newer kernel. This can be a good complement to the classic live-patching technology.
 +
|-
 +
| crtools || Restore arbitrary process tree || hard ||  - || We need to be able to restore any process tree that can be created with the help of PR_SET_CHILD_SUBREAPER and CLONE_PARENT. Processes can also share other resources (see [http://man7.org/linux/man-pages/man2/clone2.2.html clone(2)]). Look at [https://github.com/checkpoint-restore/criu/blob/master/test/zdtm/static/session02.c session02]. The task of resolving the given images into the operations we might need to perform seems to be NP (not proven though).
 +
|-
 +
| crtools || C/R [[X applications]] || hard || - || Dump/restore of graphical applications (see [[integration]]). For an X app, part of its state is stored in the X server. We need a way to fetch this state during dump and put it back into the server on restore. Requires fixing the X server software too.
 +
|-
 +
| crtools || More detailed RPC fail codes || easy || - || Currently only 3 typical errors are reported (see [https://github.com/checkpoint-restore/criu/blob/master/criu/include/cr-errno.h#L8 include/cr-errno.h]). This set needs to be extended, as currently it's hard to understand what has happened w/o analysing CRIU log files.
 +
|-
 +
| crtools || Make CRIU work on AArch32 with CONFIG_KUSER_HELPERS=n || medium || cov || CRIU currently fails on AArch32 kernels built with CONFIG_KUSER_HELPERS=n.
 +
|-
 +
| tests || Run many/all tests in "container" || medium || - || Currently we run zdtm tests one by one. It would be nice to run them all in one pseudo-container and C/R them as one big subtree.
 +
|-
 +
| tests || [[Fuzz testing|Trinity-like (fuzz) testing]] || hard || - || The existing suite is 99% functionality testing. We need more sophisticated testing: take a process that has done a random set of actions, C/R it, and check that all is OK. The latter is the most complicated part.
 +
|-
 +
| crtools || Set checkpoint tokens without recompiling || medium || - || Sometimes you need to take a checkpoint at some particular point in the code. The way to do it now is to recompile the app with a criu_dump() call where needed. But it is quite a bummer to recompile, repackage and redistribute an app you want to C/R. It would be great if one could set a token in the app source and then let CRIU find that point in the running task and take a snapshot. The best place to do it might be libcriu.
 +
|-
 +
| crtools || Large ghost files support || medium || - || If we have a large ghost (open but unlinked) file, it's inefficient to copy it to another node via CRIU dump. Such files need to be migrated independently, iteratively, using memory tracking.
 
|}
 
|}
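
The "Inherit resources, not restore" row suggests treating rlimits the way sigactions are treated: compare the value from the image with what the task already inherited from its parent. A minimal sketch of that check follows, presumably useful because a matching value needs no system call at all. It is written in Python for brevity (CRIU itself is C), and <code>restore_rlimit</code> is a hypothetical helper name, not part of CRIU.

```python
import resource


def restore_rlimit(res, wanted):
    """Apply `wanted` (soft, hard) only if it differs from the inherited value.

    Returns True when setrlimit() had to be called and False when the value
    inherited from the parent already matched the one from the image.
    """
    inherited = resource.getrlimit(res)
    if inherited == tuple(wanted):
        return False  # already inherited from the parent, nothing to restore
    resource.setrlimit(res, wanted)
    return True
```

A restorer built this way would set the limits once in the parent before forking, and each child would call the helper with the values from its own image.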
 +
 +
[[Category:Development]]
 +
[[Category:Plans]]
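
The second approach in the "Smart paths resolution" row (pre-open an fd for the mountpoint, later call openat() on it) can be sketched as below. This is an illustration only: <code>pin_mountpoint</code> and <code>open_under_mount</code> are hypothetical names, Python's <code>os.open(..., dir_fd=...)</code> stands in for openat(), and the example does not perform any mount itself, since that requires privileges.

```python
import os


def pin_mountpoint(path):
    """Open a directory fd for `path`.

    The fd keeps referring to this directory even if another
    filesystem is later mounted over the same path.
    """
    return os.open(path, os.O_RDONLY | os.O_DIRECTORY)


def open_under_mount(mnt_fd, relpath, flags=os.O_RDONLY):
    """openat()-style open: resolve `relpath` relative to the pinned fd
    instead of walking the (possibly overmounted) absolute path."""
    return os.open(relpath, flags, dir_fd=mnt_fd)
```

The design point is that the lookup starts from the pinned directory rather than from "/", so mounts stacked on top of the original path do not take part in the resolution.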

Latest revision as of 08:33, 6 December 2021

Note: This list is being transformed into the github issues
{| class="wikitable sortable"
|-
! component !! task !! complexity !! potential/willing assignee !! comments
|-
| tests || Automate code coverage measurement || easy || - || Collecting code coverage needs to be automated. We have code coverage results measured in 2012. It would be nice to get up-to-date results periodically and without manual steps.
|-
| crtools || Inherit resources, not restore || medium || - || Sigactions are restored for every task before it fork()-s. Each child then checks whether the sa_action from its image matches the one it got from its parent. We need to do the same for rlimits, and maybe for other resources too.
|-
| crtools || Implement restorer v2 || hard (v2) || - ||
|-
| crtools || New images format || medium (v2) || - || See what's bad with V1 images
|-
| kernel/crtools || Tune the start-time of tasks || medium || - || When we restore tasks their start-time moves forward (since we effectively create new tasks). This needs to be addressed somehow, most likely with the time namespace.
|-
| crtools || Support chroot-ed mount namespace || medium || - || If the root task lives in another mount namespace and has its root moved (with chroot()), CRIU dump fails with errors about inability to resolve files' paths. This is because CRIU treats the mount namespace's root as the init task's root, which should be "/".
|-
| crtools || Non-stop memory (first?) pre-dump || medium || - || When reading only the memory we can avoid freezing tasks and draining memory with the parasite. The process_vm_readv() system call can help us access another task's memory. The disadvantage of this approach is the need for additional memory. We can keep this under control by reading memory in chunks and not allocating too many additional buffers.
|-
| kernel/crtools || Speed up fetching info about tasks || medium || Andrey Vagin || Using proc to get info about tasks is nice but too slow. We have measured that a socket-based engine that fetches info about tasks from the kernel speeds things up significantly. Andrey is working on the Task-diag patchset that would implement this.
|-
| kernel || Make pipes swappable || hard || - || When pre-dumping memory we pull all of the task's memory into a pipe with vmsplice and then send it over the network by splicing the pages into a socket. During this period all the memory is effectively pinned, as pages in a pipe are not swappable.
|-
| kernel/crtools || Adjust per-task/-container timers offsets || medium || - || Absolute timers differ on different nodes. When live migrating a task/container this difference may (and will) screw the timers up.
|-
| crtools || Shift timers' timeouts according to the actual C-to-R delay || medium || - || If we pause tasks between C and R we probably need to adjust timers accordingly. "Medium" complexity is because it's unclear what to do, not how.
|-
| kernel/crtools || Put call to mmap into VDSO || easy || Cyrill || To put the parasite code into the target process we modify its code to call the mmap() system call (and then restore the original code) and put the parasite into the new area. Oleg Nesterov suggests not patching the victim, but always having such a call in the VDSO.
|-
| crtools || Integration with other projects || hard || - || CRIU does not work well in isolation. There is always something specific about what a user wants to dump. Integrating CRIU with other projects will make CRIU work at its best.
|-
| crtools || Restore tasks into fresh new pid namespace || easy || - || Processes that did not live in a pid namespace when they were dumped can be hard to restore because of PID conflicts. It would be nice to be able to ask CRIU to create a pid namespace for them and restore them there. One thing to worry about is the new namespace's init task.
|-
| crtools || Rollback tree state || medium || - || When we checkpoint a process tree with the -R option (let the tasks run after checkpoint) we might want to return the tasks to the checkpointed state on the same machine. Currently this can only be done by killing the processes and restoring them from scratch. If we could ask CRIU to restore the images into the already running processes, that could speed things up, especially when combined with careful memory changes tracking.
|-
| crtools || Restore arbitrary mountpoints tree || hard || - || The Linux kernel can construct tricky knots with mount points. We don't support arbitrary configurations of such things, only those that are in active use by software. This needs to be fixed.
|-
| crtools || Lazy restore using userfaultfd || medium || xemul || It might make sense to restore tasks w/o putting all the memory into the respective places. Instead, the VMAs in question can be marked as "lazy" and pages will be filled into them in the background and, upon demand, out of order. The functionality is related to the lazy migration and seamless kernel upgrade tasks.
|-
| crtools || Lazy migration using userfaultfd || medium || xemul || Lazy migration is when we move all the tasks to another node, but leave their memory on the source one. To keep tasks from reading garbage from an empty address space we protect all of it as inaccessible. When tasks start reading or writing the memory they take page faults. With the userfaultfd technology it should be possible to intercept the #PF, pull the page from the source node and map it at the expected address.
|-
| crtools || Speed up logging || medium || Cyrill || Synchronous formatting and writes into log files slow things down. On the other hand, turning logs off makes it impossible to troubleshoot.
|-
| crtools || Sanitize logging messages || hard || - || Currently log messages are printed w/o any logic, so it's hard to analyze what has happened when CRIU fails. Need to improve that by, e.g., categorizing images and explaining them in more detail.
|-
| crtools || Page transfer filters || medium || - || The page-xfer engine just splices the pages from the stealing pipes into a socket. Packing or encrypting the data would be nice. Maybe it's purely for P.Haul?
|-
| crtools || TCP socket migration with changed IP || medium || - || It might make sense to migrate a TCP connection on a box with a changed IP address ''if'' both boxes are NAT-ed to the destination. In that case we will have to go to the NAT box and fix the conntracks, and use the CRIT image modification facilities.
|-
| crtools || Applying images || hard (v2) || xemul@ w/ students || Think about the ability to take images and apply them to living task(s), as described in the "rollback" feature above. Another example: repopulate the fdtable according to data from an image. Yet another use case: when doing partial migration (see below) we'll need to modify one part to switch from pipes to sockets. What else? With constant replication of tree state we can do incremental dumps on the source node and apply those increments to pre-created replicas on the destination node.
|-
| crtools || Partial migration || hard || - || If a task subtree has connections to the rest of the tree (e.g. via pipes or unix sockets) we try to detect this and refuse the dump. It should be possible to take part of the tree, migrate it somewhere and recreate the mentioned links with some other appropriate IPC channel, e.g. replace pipes with sockets, shared memory with distributed shared memory and so on.
|-
| crtools || Shared objects (mm/fs) support || medium || - || Things created with CLONE_FOO flags are not supported now (the exception is full threads). Now that we have the kcmp syscall we can do it. The shared fdtable (CLONE_FILES) is supported; the next candidate is mm sharing, since we know that MySQL sometimes does this.
|-
| crtools || Smart paths resolution || hard || - || Files can be overmounted. In this case CRIU will refuse the dump saying that the file is not alive but inaccessible by its name. We need a way to resolve paths to such files. There are two ways: 1. Temporarily move the mounts that overlap the desired path, open the file, then move the mountpoint back. 2. When creating a new mount, pre-open an fd that keeps the mountpoint. Later, do an accurate path resolution and call openat() on the proper mountpoint fd.
|-
| kernel/crtools || TCP repair fixes || hard || - || We can dump and restore a live TCP connection. There are some issues with it that should be fixed.
|-
| kernel || Seamless kernel upgrade || hard || xemul || Briefly: dump tasks (into memory), change the kernel w/ kexec, then restore the tasks. From the perspective of the tasks and remote clients, the tasks have just stopped and then resumed on the newer kernel. This can be a good complement to the classic live-patching technology.
|-
| crtools || Restore arbitrary process tree || hard || - || We need to be able to restore any process tree that can be created with the help of PR_SET_CHILD_SUBREAPER and CLONE_PARENT. Processes can also share other resources (see clone(2)). Look at session02. The task of resolving the given images into the operations we might need to perform seems to be NP (not proven though).
|-
| crtools || C/R X applications || hard || - || Dump/restore of graphical applications (see integration). For an X app, part of its state is stored in the X server. We need a way to fetch this state during dump and put it back into the server on restore. Requires fixing the X server software too.
|-
| crtools || More detailed RPC fail codes || easy || - || Currently only 3 typical errors are reported (see include/cr-errno.h). This set needs to be extended, as currently it's hard to understand what has happened w/o analysing CRIU log files.
|-
| crtools || Make CRIU work on AArch32 with CONFIG_KUSER_HELPERS=n || medium || cov || CRIU currently fails on AArch32 kernels built with CONFIG_KUSER_HELPERS=n.
|-
| tests || Run many/all tests in "container" || medium || - || Currently we run zdtm tests one by one. It would be nice to run them all in one pseudo-container and C/R them as one big subtree.
|-
| tests || Trinity-like (fuzz) testing || hard || - || The existing suite is 99% functionality testing. We need more sophisticated testing: take a process that has done a random set of actions, C/R it, and check that all is OK. The latter is the most complicated part.
|-
| crtools || Set checkpoint tokens without recompiling || medium || - || Sometimes you need to take a checkpoint at some particular point in the code. The way to do it now is to recompile the app with a criu_dump() call where needed. But it is quite a bummer to recompile, repackage and redistribute an app you want to C/R. It would be great if one could set a token in the app source and then let CRIU find that point in the running task and take a snapshot. The best place to do it might be libcriu.
|-
| crtools || Large ghost files support || medium || - || If we have a large ghost (open but unlinked) file, it's inefficient to copy it to another node via CRIU dump. Such files need to be migrated independently, iteratively, using memory tracking.
|}
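
For the "Large ghost files support" row, the sketch below shows how an open but unlinked file can be recognised from its descriptor and then read piecewise instead of in a single copy. It is an illustration under stated assumptions, not CRIU code: <code>is_ghost</code> and <code>iter_chunks</code> are hypothetical helper names, and the change tracking the row asks for is not modelled here.

```python
import os


def is_ghost(fd):
    """True if `fd` refers to a file with no remaining names (st_nlink == 0)."""
    return os.fstat(fd).st_nlink == 0


def iter_chunks(fd, chunk_size=1 << 20):
    """Yield (offset, data) pairs covering the file, one chunk at a time.

    os.pread() is used so that the descriptor's own file position,
    which belongs to the task being dumped, is left untouched.
    """
    offset = 0
    while True:
        data = os.pread(fd, chunk_size, offset)
        if not data:
            break
        yield offset, data
        offset += len(data)
```

Each (offset, data) pair could be sent to the destination node independently, so a later pass would only need to resend the chunks that changed.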