Changes

4,419 bytes removed, 14:36, 4 April 2018
vdso remapping fixed for arm64/arm32/ppc64/ia32 running on x86_64; s390 doesn't need this. It looks like each supported architecture is covered; for any new architecture there is now an easy way in the kernel to add .mremap() to special_mapping_ops
Line 1: Line 1:  +
{{note|This list is being transformed into the [https://github.com/checkpoint-restore/criu/issues github issues]}}
 +
 
{| class="wikitable sortable"
 
{| class="wikitable sortable"
 
|-
 
|-
Line 6: Line 8:  
! potential/willing assignee
 
! potential/willing assignee
 
! comments
 
! comments
|-
  −
| crtools/crit || Library/API to access pagemap+page images || medium || - || The [[memory dumps]] are quite sophisticated. It would be nice to have a library or API to access the data in them using some simple API. IOW -- API-ize the page_read.c
  −
|-
  −
| crtools || Zombies with threads :) || easy || - || When a multi-threaded task's leader thread exits, the task turns into a zombie, but the other threads keep running. Need to support this (zdt test pthread02).
   
|-
 
|-
 
| tests || Automate code coverage measurement || easy || - || The process of collecting code coverage needs to be automated. We have code coverage results [http://criu.org/cov/ measured in 2012]. It would be nice to get up-to-date results periodically, without manual steps.
 
| tests || Automate code coverage measurement || easy || - || The process of collecting code coverage needs to be automated. We have code coverage results [http://criu.org/cov/ measured in 2012]. It would be nice to get up-to-date results periodically, without manual steps.
|-
  −
| crtools || Non-full mntns dump || medium || - || Systemd launches services in a new mount namespace with a single change -- /tmp is re-mounted into a private one (PrivateTmp option). Need to invent an API for dumping only a part of mntns.
  −
|-
  −
| crtools || Make dump and restore work under [[selinux]] || medium || - || Selinux imposes more restrictions on the stuff we typically do.
   
|-
 
|-
 
| crtools || Inherit resources, not restore || medium || - || Sigactions are restored for every task before it fork()-s. Then each child checks whether the sa_action from its image matches the one it got from its parent. Need to do the same for rlimits, maybe other resources too.
 
| crtools || Inherit resources, not restore || medium || - || Sigactions are restored for every task before it fork()-s. Then each child checks whether the sa_action from its image matches the one it got from its parent. Need to do the same for rlimits, maybe other resources too.
Line 36: Line 30:  
|-
 
|-
 
| crtools || [[time namespace|Shift timers' timeouts]] according to the actual C-to-R delay || medium || - || If we pause tasks between C and R, we probably need to adjust the timers accordingly. "Medium" complexity is because it's unclear ''what'' to do, not ''how''.
 
| crtools || [[time namespace|Shift timers' timeouts]] according to the actual C-to-R delay || medium || - || If we pause tasks between C and R, we probably need to adjust the timers accordingly. "Medium" complexity is because it's unclear ''what'' to do, not ''how''.
|-
  −
| crtools || Show what was left in the system after dump || easy || - || When we use [[Invisible files|--link-remap]] option or [[TCP connection|--tcp-established]] one CRIU leaves some traces in the system, in particular -- temporary hard links in the former case and iptables rules in the latter. Need some way to show these to the user.
   
|-
 
|-
 
| kernel/crtools || Put call to mmap into VDSO || easy || Cyrill || To put the [[parasite code]] into the target process we modify its code to call the <code>mmap()</code> system call (and then restore the original code) and put the parasite into the new area. Oleg Nesterov suggests not patching the victim, but always having such a call in the VDSO.
 
| kernel/crtools || Put call to mmap into VDSO || easy || Cyrill || To put the [[parasite code]] into the target process we modify its code to call the <code>mmap()</code> system call (and then restore the original code) and put the parasite into the new area. Oleg Nesterov suggests not patching the victim, but always having such a call in the VDSO.
Line 56: Line 48:  
|-
 
|-
 
| crtools || Sanitize [[logging]] messages || hard || - || Currently log messages are printed w/o any logic, so it's hard to analyze what has happened when CRIU fails. Need to improve that by, e.g. categorizing images and [[When C/R fails|explaining them]] in more detail.
 
| crtools || Sanitize [[logging]] messages || hard || - || Currently log messages are printed w/o any logic, so it's hard to analyze what has happened when CRIU fails. Need to improve that by, e.g. categorizing images and [[When C/R fails|explaining them]] in more detail.
|-
  −
| crtools || Optimize kcmp calls || medium || - || CRIU builds [[kcmp trees]] to find out the IDs of such objects as MM, FDT and others. Currently we kcmp all tasks to get the ID, but we can improve that by pre-generating an ID based on objects that live on MM, FS, etc. If the pre-IDs of two tasks match, then we call kcmp; if not -- the objects ''are'' different.
   
|-
 
|-
 
| crtools || Page transfer filters || medium || - || The page-xfer engine just splices the pages from stealing pipes into socket. Packing or encrypting the data would be nice. Maybe it's purely for [[P.Haul]]?
 
| crtools || Page transfer filters || medium || - || The page-xfer engine just splices the pages from stealing pipes into socket. Packing or encrypting the data would be nice. Maybe it's purely for [[P.Haul]]?
|-
  −
| crtools || [[FUSE]] mount points || hard || - || When dumping mountpoints we explicitly check the filesystem mounted. The thing is -- not all filesystems can be just ignored on dump. E.g. FUSE mount involves a user-space daemon that is responsible for the files tree contents. If we just kill one on dump we might not be able to restore it. Need to special-care one.
  −
|-
  −
| crtools || Modify restored resources run-time in [[CRIT]] daemon || medium || - || Sometimes it might make sense to tune the objects from images on restore. E.g. change the IP address of sockets from task above or fix file paths to be "chroot-ed". The best solution seems to be in launching CRIT in daemon mode, telling it what images and how to modify and teaching CRIU to "filter" the pb objects read from images through this daemon.
   
|-
 
|-
 
| crtools || TCP socket migration with changed IP || medium || - ||  It might make sense to migrate a TCP connection to a box with a changed IP address _if_ both boxes are NAT-ed to the destination. In that case we will have to go to the NAT box and fix the conntracks, and use the CRIT image modification facilities.
 
| crtools || TCP socket migration with changed IP || medium || - ||  It might make sense to migrate a TCP connection to a box with a changed IP address _if_ both boxes are NAT-ed to the destination. In that case we will have to go to the NAT box and fix the conntracks, and use the CRIT image modification facilities.
Line 76: Line 62:  
|-
 
|-
 
| kernel/crtools || [[TCP repair TODO|TCP repair fixes]] || hard || - || We can dump and restore a live [[TCP connection]]. There are some issues with it that should be fixed.
 
| kernel/crtools || [[TCP repair TODO|TCP repair fixes]] || hard || - || We can dump and restore a live [[TCP connection]]. There are some issues with it that should be fixed.
|-
  −
| kernel?/crtools || TCP conntrack-ed connections || medium || - || When a container uses conntracks inside, we cannot just dump and restore alive TCP connection. Otherwise on restore the resurrected packets will be blocked by connection tracker as they would not be recognized as established connection. Need to check whether connection tracking is ON, dump the needed conntrack info and put the tracker back.
  −
|-
  −
| crtools/kernel || [[NFS mount points]] support || hard || - || NFS mount points from inside container cannot be easily restored. The thing is -- if we want to restore opened file we will go ahead and [[How hard is it to open a file|call]] the open system call. If the file in question resides on NFS, the latter might need to go to network to check whether the file actually exists and set up the handle. But if the networking is still not restored this operation would fail and we'll have to fail the whole restore. In order to untie this chicken-and-egg problem we may go in two directions.
   
|-
 
|-
 
| kernel || [[Seamless kernel upgrade]] || hard || xemul || Briefly: dump tasks (into memory), change the kernel w/ kexec, then restore the tasks. From the perspective of the tasks and remote clients, the tasks have just stopped and then resumed on the newer kernel. Can be a good complement to the classic live-patching technology.
 
| kernel || [[Seamless kernel upgrade]] || hard || xemul || Briefly: dump tasks (into memory), change the kernel w/ kexec, then restore the tasks. From the perspective of the tasks and remote clients, the tasks have just stopped and then resumed on the newer kernel. Can be a good complement to the classic live-patching technology.
 
|-
 
|-
| crtools || Restore arbitrary process tree || hard ||  - || Need to restore any process tree that could be created with the help of PR_SET_CHILD_SUBREAPER and CLONE_PARENT. Processes can share other resources, see [http://man7.org/linux/man-pages/man2/clone2.2.html clone(2)]. Look at [http://git.criu.org/?p=crtools.git;a=blob;f=test/zdtm/live/static/session02.c;hb=HEAD session02]. The task of resolving the given images into the operations we might need to perform seems to be NP (not proven though).
+
| crtools || Restore arbitrary process tree || hard ||  - || Need to restore any process tree that could be created with the help of PR_SET_CHILD_SUBREAPER and CLONE_PARENT. Processes can share other resources, see [http://man7.org/linux/man-pages/man2/clone2.2.html clone(2)]. Look at [https://github.com/checkpoint-restore/criu/blob/master/test/zdtm/static/session02.c session02]. The task of resolving the given images into the operations we might need to perform seems to be NP (not proven though).
 
|-
 
|-
 
| crtools || C/R [[X applications]] || hard || Ruslan Kuprieiev || Dump/restore of graphical applications (see about [[integration]]). In the case of an X app, part of its state is stored in the X server. Need a way to fetch this state during dump and put it back into the server on restore. Requires fixing the X server software too.
 
| crtools || C/R [[X applications]] || hard || Ruslan Kuprieiev || Dump/restore of graphical applications (see about [[integration]]). In the case of an X app, part of its state is stored in the X server. Need a way to fetch this state during dump and put it back into the server on restore. Requires fixing the X server software too.
 
|-
 
|-
| crtools/kernel || Undo semaphores || medium || Cyrill Gorcunov || These are SysVIPC objects created with semctl() and SEM_UNDO flag. Shame on us, we don't even detect these are created. Fortunately they are not in active use. Need to do it -- dump and restore. Requires modifications from both sides — criu and kernel.
+
| crtools || More detailed RPC fail codes || easy || - || Currently only 3 typical errors are reported (see [https://github.com/checkpoint-restore/criu/blob/master/criu/include/cr-errno.h#L8 include/cr-errno.h]). Need to extend this set as currently it's hard to understand what has happened w/o analysing CRIU log files.
|-
  −
| crtools || More detailed RPC fail codes || easy || - || Currently only 3 typical errors are reported (see [https://github.com/xemul/criu/blob/master/include/cr-errno.h#L8 include/cr-errno.h]). Need to extend this set as currently it's hard to understand what has happened w/o analysing CRIU log files.
  −
|-
  −
| kernel/criu || FS-notify queues || hard || - || We dump [[Fsnotify]] files, but when they contain events inside -- just ignore those. Need to fetch them and put them back on restore. The difficulty here is that while dumping/restoring, CRIU may touch files that are monitored and thus produce unwanted events in the queue.
   
|-
 
|-
 
| crtools || Make CRIU work on AArch32 with CONFIG_KUSER_HELPERS=n || medium || cov || CRIU currently fails on AArch32 kernels built with CONFIG_KUSER_HELPERS=n.
 
| crtools || Make CRIU work on AArch32 with CONFIG_KUSER_HELPERS=n || medium || cov || CRIU currently fails on AArch32 kernels built with CONFIG_KUSER_HELPERS=n.
|-
  −
| kernel || Fix VDSO remapping on non-x86 architectures || medium || Laurent Dufour, cov || Some architectures, like PowerPC and ARM, keep a reference to the VDSO base address in order to build the signal return stack frame by calling the VDSO sigreturn service. So once the VDSO has been moved, this reference is no longer valid and signal frames built later are not usable.
   
|-
 
|-
 
| tests || Run many/all tests in "container" || medium || - || Currently we run zdtm tests one-by-one. It would be nice to run them all in one pseudo-container and C/R them as one big subtree.
 
| tests || Run many/all tests in "container" || medium || - || Currently we run zdtm tests one-by-one. It would be nice to run them all in one pseudo-container and C/R them as one big subtree.
 
|-
 
|-
| tests || Trinity-like (fuzz) testing || hard || - || The existing suite is 99% functionality testing. Need more sophisticated testing -- take a process that has done a random set of actions, C/R one, check that all is OK. The latter is the most complicated thing.
+
| tests || [[Fuzz testing|Trinity-like (fuzz) testing]] || hard || - || The existing suite is 99% functionality testing. Need more sophisticated testing -- take a process that has done a random set of actions, C/R one, check that all is OK. The latter is the most complicated thing.
|-
  −
| tests || Split mountpoints.c test into pieces || easy || - || Currently this one is one big set of tests. Need more fine-grained set.
  −
|-
  −
| tests/infrastructure || Run tests on patches sent to the mailing lists || medium || Ruslan Kuprieiev || It's quite typical that a set sent to the mailing list fails some tests. Need a robot that would monitor the list, check the patches and send the result back.
  −
|-
  −
| tests || Fault injection || hard || - || Need some way to test error paths in CRIU. Right now we rely on the developers to write correct code :\ This is the most critical on dump.
  −
|-
  −
| crtools || Zombies with threads || medium || - || Support processes with alive threads and a dead leader
   
|-
 
|-
| crtools || Unix sender address || medium || - || Restore sender addresses for unix socket messages
+
| crtools || Set checkpoint tokens without recompiling || medium || - || Sometimes you need to call checkpoint in some particular point of the code. The way to do it now is to recompile app with criu_dump() call where needed. But it is quite a bummer to recompile, repackage and redistribute an app you want to c/r. It would be great if one could set a token in app source and then let criu find that point in running task and take a snapshot. The best way to do it might be in libcriu.
 
|-
 
|-
| crtools || --leave-stopped for restore || easy || - || Restore task but leave it stopped
+
| crtools || Large ghost files support || medium || - || If we have a large ghost (opened unlinked) file, it's inefficient to copy it to another node via CRIU dump. Need to migrate them independently, iteratively, using memory tracking.
 
|}
 
|}
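
The "Inherit resources, not restore" row above describes a compare-then-set pattern that could be sketched roughly as follows. This is only an illustration in Python, not CRIU code: the function name <code>plan_rlimit_restore</code> and the dict-based representation of rlimits are assumptions made for the example. The idea is that a child compares the values recorded in its image with what it inherited from its parent across fork(), and issues a set call only for the ones that differ.

```python
def plan_rlimit_restore(inherited, from_image):
    """Return the rlimits a child would have to set explicitly after fork().

    inherited  -- rlimits the child got from its parent
                  (hypothetical layout: resource name -> (soft, hard))
    from_image -- rlimits recorded for this child in its image
    """
    to_set = {}
    for resource, wanted in from_image.items():
        # A limit that was inherited with the right value needs no syscall.
        if inherited.get(resource) != wanted:
            to_set[resource] = wanted
    return to_set


# Hypothetical values, chosen only to show one matching and one differing limit.
parent = {"RLIMIT_NOFILE": (1024, 4096), "RLIMIT_STACK": (8388608, -1)}
child_image = {"RLIMIT_NOFILE": (1024, 4096), "RLIMIT_STACK": (4194304, -1)}
print(plan_rlimit_restore(parent, child_image))
```

In this sketch only the stack limit would be set explicitly; the open-files limit is left as inherited, which mirrors how the row describes sigactions being handled today.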
    
[[Category:Development]]
 
[[Category:Development]]
 
[[Category:Plans]]
 
[[Category:Plans]]