Difference between revisions of "Google Summer of Code Ideas"

From CRIU

Revision as of 16:57, 28 March 2019

Google Summer of Code (GSoC) is a global program that offers post-secondary students an opportunity to be paid for contributing to an open source project over a three-month period.

This page contains project ideas for the upcoming Google Summer of Code.

Suggested ideas

Post-copy for shared memory and hugetlbfs

Summary: extend post-copy memory restore and migration to support shared memory and hugetlbfs.

CRIU relies on the userfaultfd mechanism in the Linux kernel to implement demand paging in userspace and to allow post-copy (or lazy) memory migration. However, this support is currently limited to anonymous private memory mappings, although the kernel also supports shared memory areas and hugetlbfs-backed memory.

Shared memory support for lazy migration can be added to CRIU without kernel modifications, while proper handling of hugetlbfs would require userfaultfd callbacks for the fallocate(FALLOC_FL_PUNCH_HOLE) and madvise(MADV_REMOVE) system calls.

Links:

Details:

  • Skill level: most probably advanced?
  • Language: C
  • Mentor: Mike Rapoport <rppt@linux.ibm.com> / Andrey Vagin <avagin@gmail.com>
  • Suggested by: Mike Rapoport <rppt@linux.ibm.com>

Optimize logging engine

Summary: CRIU writes a lot of log messages while doing its job. Logging is done with simple fprintf()-style calls. The logs are typically useless, but if some operation fails they are the only way to find the reason for the failure.

At the same time, the printf family of functions is known to be slow: they need to scan the format string for %-specifiers and then convert the arguments into strings. Comparing criu dump with and without logging shows a notable time difference (15-20%), so speeding up logging would improve criu performance.

One possible solution is binary logging. The drawback of binary logs is the amount of effort needed to convert the existing log calls to a binary form. Preferably, the switch to binary logging should either keep the existing log() calls intact or provide some automation to convert them.

Keeping the log() calls intact could be achieved with a pre-compilation pass over the sources. In this pass each log(fmt, ...) call is translated into a call to a binary log function that saves an identifier of fmt and copies all the arguments as-is into the log file. A binary log decoding utility, required in this case, then finds the fmt string by its ID and prints the resulting message.
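A small C sketch can illustrate the binary logging idea. Every name here is hypothetical. The format table stands in for what the pre-compilation pass would generate, and the record layout is arbitrary. To keep the example short, all arguments are assumed to be passed as long. The hot path, binlog(), only copies a format ID and the raw arguments. Formatting is deferred to the offline decoder, binlog_decode().

```c
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical format table: a pre-compilation pass would collect every
 * log(fmt, ...) format string and assign it an ID. */
static const char *const fmt_table[] = {
    "Dumping pid %ld\n",
    "Collected %ld VMAs, %ld pages\n",
};
#define NR_FMTS (sizeof(fmt_table) / sizeof(fmt_table[0]))
#define MAX_ARGS 4

/* One fixed-size binary record per log call. */
struct binlog_rec {
    uint32_t fmt_id;
    uint32_t nargs;
    long args[MAX_ARGS];
};

/* Hot path: no format parsing, only copy the ID and the raw arguments.
 * Simplifying assumption: every argument is passed as a long. */
static void binlog(FILE *f, uint32_t fmt_id, uint32_t nargs, ...)
{
    struct binlog_rec r;
    va_list ap;

    memset(&r, 0, sizeof(r));
    r.fmt_id = fmt_id;
    r.nargs = nargs < MAX_ARGS ? nargs : MAX_ARGS;
    va_start(ap, nargs);
    for (uint32_t i = 0; i < r.nargs; i++)
        r.args[i] = va_arg(ap, long);
    va_end(ap);
    fwrite(&r, sizeof(r), 1, f);
}

/* Offline decoder: look up each format by ID and only now run printf. */
static int binlog_decode(FILE *in, char *out, size_t outlen)
{
    struct binlog_rec r;
    size_t o = 0;

    if (outlen == 0)
        return -1;
    out[0] = '\0';
    while (fread(&r, sizeof(r), 1, in) == 1) {
        if (r.fmt_id >= NR_FMTS)
            return -1;
        /* Arguments the format does not consume are ignored by snprintf. */
        int n = snprintf(out + o, outlen - o, fmt_table[r.fmt_id],
                         r.args[0], r.args[1], r.args[2], r.args[3]);
        if (n < 0 || (size_t)n >= outlen - o)
            return -1;
        o += (size_t)n;
    }
    return 0;
}
```

For example, binlog(f, 1, 2, 7L, 128L) stores one fixed-size record, and the decoder later turns it into "Collected 7 VMAs, 128 pages". A real implementation would also have to record argument types. String arguments in particular cannot just be copied as pointers.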


Links:

Details:

  • Skill level: intermediate
  • Language: C, though decoder/preprocessor can be in any language
  • Mentor: Andrei Vagin <avagin@gmail.com> / Pavel Emelyanov <xemul@openvz.org>
  • Suggested by: Andrei Vagin <avagin@gmail.com>

Add support for checkpoint/restore of corked UDP sockets

Summary: Support C/R of corked UDP sockets

There is a UDP_CORK option for UDP sockets. As the udp(7) man page says:

    If this option is enabled, then all data output on this socket
    is accumulated into a single datagram that is transmitted when
    the option is disabled.  This option should not be used in
    code intended to be portable.

Currently criu refuses to dump such sockets, which is effectively a bug. Supporting them requires extending the kernel API so that criu can read back the write queue of the socket (see, for example, how this is done for TCP sockets). The queue is then written into the image and restored into the socket (with the CORK bit set as well).
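The C sketch below shows the state criu would have to save. It corks a connected UDP socket over loopback, sends two buffers, and then uncorks it. The name corked_roundtrip() is hypothetical. While UDP_CORK is set, both sends only append to the socket's pending write queue, and criu currently has no way to read that queued data back. After uncorking, the receiver should get a single 11-byte datagram.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/udp.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send two buffers through a corked UDP socket over loopback, then uncork.
 * Returns the size of the datagram received, or -1 on error. */
ssize_t corked_roundtrip(char *out, size_t outlen)
{
    struct sockaddr_in addr = { .sin_family = AF_INET };
    socklen_t alen = sizeof(addr);
    int on = 1, off = 0;
    ssize_t ret = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);

    if (rx < 0 || tx < 0)
        goto out;

    /* Port 0 lets the kernel pick a free port; read it back for connect(). */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(rx, (struct sockaddr *)&addr, &alen) < 0 ||
        connect(tx, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;

    /* While corked, each send() only appends to the pending write queue:
     * this queued data is the state criu would have to dump and restore. */
    if (setsockopt(tx, IPPROTO_UDP, UDP_CORK, &on, sizeof(on)) < 0 ||
        send(tx, "hello ", 6, 0) != 6 ||
        send(tx, "world", 5, 0) != 5)
        goto out;

    /* Clearing the option flushes the queue as a single datagram. */
    if (setsockopt(tx, IPPROTO_UDP, UDP_CORK, &off, sizeof(off)) < 0)
        goto out;
    ret = recv(rx, out, outlen, 0);
out:
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return ret;
}
```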

Links:

Details:

  • Skill level: intermediate (+linux kernel)
  • Language: C
  • Mentor: Pavel Emelianov <xemul@virtuozzo.com>
  • Suggested by: Pavel Emelianov <xemul@virtuozzo.com>

Optimize the pre-dump algorithm

Summary: Optimize the pre-dump algorithm to avoid pinning too much memory in RAM

The current pre-dump mode is used to write task memory contents into image files without stopping the task for too long. It does this by stopping the task, infecting it and draining all the memory into a set of pipes. Then the task is cured and resumed, and the pipes' contents are written into images (or sent to a page server). Unfortunately, this approach puts a lot of stress on the memory subsystem. Keeping all the memory in pipes creates a lot of unreclaimable memory, because pages in pipes are not swappable. In addition, the number of pipes can itself be huge, since one pipe cannot store more than a fixed amount of data (see the pipe(7) man page).

A solution to this problem is to use the process_vm_readv() syscall, which would mitigate all of the above. To do this, criu needs to allocate a temporary buffer. It then walks the target process's address space, copying the memory piece by piece into that buffer, flushes the data into the image (or page server), and repeats.
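The buffered loop described above could look roughly like the following C sketch. The names dump_region() and CHUNK_SIZE are hypothetical. The fd argument stands in for an image file or a page-server connection. Reading another task requires ptrace-level access to it, while reading the calling process's own memory is always permitted. A real dumper would iterate over all of the task's VMAs instead of a single region.

```c
#define _GNU_SOURCE /* for process_vm_readv() */
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define CHUNK_SIZE 4096 /* hypothetical size of the single bounce buffer */

/* Copy [addr, addr + len) of process pid to fd through one small buffer,
 * so at most CHUNK_SIZE bytes of the target's memory are held at a time. */
int dump_region(pid_t pid, unsigned long addr, size_t len, int fd)
{
    static char buf[CHUNK_SIZE];

    while (len > 0) {
        size_t n = len < CHUNK_SIZE ? len : CHUNK_SIZE;
        struct iovec local = { .iov_base = buf, .iov_len = n };
        struct iovec remote = { .iov_base = (void *)addr, .iov_len = n };
        ssize_t got = process_vm_readv(pid, &local, 1, &remote, 1, 0);

        if (got < 0)
            return -1; /* errno set by process_vm_readv() */
        if (got == 0) {
            errno = EIO; /* nothing copied: avoid looping forever */
            return -1;
        }

        /* Flush the chunk (image file or page server) before reuse. */
        for (ssize_t done = 0; done < got;) {
            ssize_t w = write(fd, buf + done, (size_t)(got - done));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            done += w;
        }
        addr += (unsigned long)got;
        len -= (size_t)got;
    }
    return 0;
}
```

With this approach, the memory pinned at any moment is bounded by CHUNK_SIZE rather than by the size of the task. The price is an extra copy through the buffer.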

Ideally, there should be a sys_splice_process_vm() syscall in the kernel that does the same as process_vm_readv() but vmsplices the data.

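Links:

  • Memory pre dump
  • https://github.com/checkpoint-restore/criu/issues/351
  • Memory dumping and restoring, Memory changes tracking
  • process_vm_readv(2): http://man7.org/linux/man-pages/man2/process_vm_readv.2.html
  • vmsplice(2): http://man7.org/linux/man-pages/man2/vmsplice.2.html
  • RFC for splice_process_vm syscall: https://lkml.org/lkml/2018/1/9/32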

Details:

  • Skill level: advanced
  • Language: C
  • Mentor: Pavel Emelianov <xemul@virtuozzo.com>
  • Suggested by: Pavel Emelianov <xemul@virtuozzo.com>

Anonymise image files

Summary: Teach CRIT to remove sensitive information from images

When reporting a bug, it may not be acceptable for the reporter to send us raw images, as they may contain sensitive data. CRIT needs to be taught to "anonymise" images for publication.

List of data to shred:

  • Memory contents. For the sake of investigation, all the memory contents can simply be removed; only the sizes of the pages*.img files are needed.
  • Paths to files. Here we should preserve the relations between paths. The simplest way seems to be replacing file names with "random" (or sequential) strings, BUT (!) making sure that this mapping is 1:1. Note that file paths may also be stored in sk-unix.img.
  • Registers.
  • Process names (but the relations between them should be kept).
  • Contents of streams, i.e. pipe/fifo data, sk-queue, tcp-stream and tty data.
  • Ghost files.
  • Tarballs with tmpfs contents.
  • IP addresses in sk-inet images, ip tool dumps and net*.img.
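One simple way to get a 1:1 mapping for paths is to replace each distinct path component with a sequential name. A single table is shared by all images, so the same component always maps to the same replacement. Because the mapping of components is injective, the mapping of whole paths is injective too, and parent/child relations between paths are preserved. CRIT itself is written in Python, so this C sketch only illustrates the idea. anon_path() and the table limits are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define MAX_NAMES 256
#define MAX_NAME_LEN 256

/* One global table for all images, so sk-unix.img and other images that
 * mention the same file get the same replacement. */
static char names[MAX_NAMES][MAX_NAME_LEN];
static int nr_names;

/* Return a stable index for a path component, adding it on first use. */
static int name_id(const char *s, size_t len)
{
    for (int i = 0; i < nr_names; i++)
        if (strlen(names[i]) == len && memcmp(names[i], s, len) == 0)
            return i;
    if (nr_names == MAX_NAMES || len >= MAX_NAME_LEN)
        return -1;
    memcpy(names[nr_names], s, len);
    names[nr_names][len] = '\0';
    return nr_names++;
}

/* Rewrite every component of path as "n<ID>", keeping '/' separators and
 * the special "." and ".." components. Returns 0 on success. */
int anon_path(const char *path, char *out, size_t outlen)
{
    size_t o = 0;
    const char *p = path;

    if (outlen == 0)
        return -1;
    while (*p) {
        if (*p == '/') {
            if (o + 1 >= outlen)
                return -1;
            out[o++] = '/';
            p++;
            continue;
        }
        size_t len = strcspn(p, "/");
        if ((len == 1 && p[0] == '.') ||
            (len == 2 && p[0] == '.' && p[1] == '.')) {
            if (o + len >= outlen)
                return -1;
            memcpy(out + o, p, len);
            o += len;
        } else {
            int id = name_id(p, len);
            if (id < 0)
                return -1;
            int n = snprintf(out + o, outlen - o, "n%d", id);
            if (n < 0 || (size_t)n >= outlen - o)
                return -1;
            o += (size_t)n;
        }
        p += len;
    }
    out[o] = '\0';
    return 0;
}
```

For example, /etc/passwd becomes /n0/n1, and a later /etc/shadow becomes /n0/n2. Both files therefore still appear to be in the same directory.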

Links:

  • Anonymize image files
  • https://github.com/checkpoint-restore/criu/issues/360
  • CRIT, Images

Details:

  • Skill level: beginner
  • Language: Python
  • Mentor: Pavel Emelianov <xemul@virtuozzo.com>
  • Suggested by: Pavel Emelianov <xemul@virtuozzo.com>

Porting CRIT functionality to Go

Summary: Implement image view and manipulation in Go

CRIU's checkpoint images are stored on disk using protobuf. For easier analysis of checkpoint files, CRIU has a tool called CRiu Image Tool (CRIT). It can decode CRIU image files from binary protobuf to JSON for display, and it can encode JSON files back to the binary format. As CRIU becomes more closely integrated into container runtimes, it is increasingly important to be able to view the CRIU output files. This is needed either to manipulate them before restoring or to read checkpoint statistics (memory pages written to disk, memory pages skipped, process downtime).

Currently CRIT is implemented in Python. For easier integration into Go projects, it is important to have image manipulation and analysis available from Go. This means we need a Go-based library to read, modify, write, encode and decode CRIU's image files. Based on this library, a Go implementation of CRIT would be useful.

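Links:

  • CRIT
  • Possible use case, see LXD: https://github.com/lxc/lxd/blob/cb55b1c5a484a43e0c21c6ae8c4a2e30b4d45be3/lxd/migrate_container.go#L179
  • https://github.com/lxc/lxd/pull/4072
  • https://github.com/checkpoint-restore/go-criu/blob/master/phaul/stats.go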

Details:

  • Skill level: beginner
  • Language: Go
  • Mentor: Adrian Reber <areber@redhat.com>
  • Suggested by: Adrian Reber <areber@redhat.com>

Memory changes tracking with userfaultfd-WP

Summary: add the ability to track memory changes of the snapshotted processes using userfaultfd-WP

Currently CRIU uses the soft-dirty mechanism in the Linux kernel to track memory changes. This mechanism can be complemented (or even completely replaced) with the recently proposed write-protection support for userfaultfd (userfaultfd-WP).

Userfaultfd allows paging to be implemented in userspace. It lets an application receive notifications about page faults and provide the desired memory contents for the faulting pages. In the current upstream kernels only missing-page faults are supported, but there is ongoing work to allow notifications for write faults as well. Using such notifications it would be possible to precisely track memory changes during pre-dump iterations. This approach may prove to be more efficient than soft-dirty.

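Links:

  • Userfaultfd-WP: https://github.com/xzpeter/linux/tree/uffd-wp-merged
  • Soft-Dirty: https://www.kernel.org/doc/html/latest/admin-guide/mm/soft-dirty.html?highlight=soft%20dirty
  • https://lwn.net/Articles/777258/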

Details:

  • Skill level: most probably advanced?
  • Language: C
  • Mentor: Mike Rapoport <rppt@linux.ibm.com>
  • Suggested by: Mike Rapoport <rppt@linux.ibm.com>

Use eBPF to lock and unlock the network

Summary: Use eBPF instead of the external iptables-restore tool for network lock and unlock.

During checkpoint and restore, CRIU locks the network to make sure that no packets are accepted by the network stack while the process is being checkpointed. Currently CRIU calls out to iptables-restore to create and delete the corresponding iptables rules. An alternative approach, which avoids calling the external iptables-restore binary, would be to inject eBPF programs directly. Users have reported that iptables-restore fails in some cases, and eBPF would remove this external dependency.

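Links:

  • https://www.criu.org/TCP_connection#Checkpoint_and_restore_TCP_connection
  • https://github.com/systemd/systemd/blob/master/src/core/bpf-firewall.c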

Details:

  • Skill level: intermediate
  • Language: C
  • Mentor: Adrian Reber <areber@redhat.com>
  • Suggested by: Adrian Reber <areber@redhat.com>