Google Summer of Code Ideas

From CRIU

Latest revision as of 18:52, 10 March 2025

Google Summer of Code (GSoC) is a global program that offers post-secondary students an opportunity to be paid for contributing to an open source project over a three-month period.

This page contains project ideas for upcoming Google Summer of Code.

Contact

First, make sure to go through the GSoC Students Recommendations. Once you have built CRIU locally and successfully checkpointed and restored (C/R) a simple process, please contact the mentor for the idea you are interested in. For general questions, feel free to send an email to the mailing list (criu@lists.linux.dev) or write in Gitter (https://gitter.im/save-restore/criu).

Project ideas

Add support for memory compression

Summary: Support compression for page images

We would like CRIU to support compression of memory page image files using one of the fastest available algorithms (which one to choose is a matter for discussion!).

This task does not require any Linux kernel modifications, and its scope is limited to CRIU itself. At the same time, it is complex enough: it touches the memory dump/restore code path in CRIU and must handle many corner cases, such as the page server.
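The sketch below is a rough illustration of where compression could fit. It compresses a buffer of pages one page at a time and prefixes each chunk with a small header holding the raw and compressed lengths, so that a reader such as the restorer or a page server could decompress pages independently. The run-length encoding is only a placeholder for whichever fast algorithm is eventually chosen (LZ4 or zstd, for example). The header layout and the names `page_chunk_hdr`, `compress_pages` and `decompress_pages` are hypothetical and are not part of CRIU's image format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SZ 4096

/* Hypothetical per-page header; not CRIU's actual image format. */
struct page_chunk_hdr {
	uint32_t raw_len;  /* bytes before compression (one page) */
	uint32_t comp_len; /* bytes of compressed payload after this header */
};

/* Placeholder codec: run-length encoding as (count, byte) pairs.
 * A real implementation would call the chosen fast compressor here. */
static size_t rle_encode(const uint8_t *in, size_t n, uint8_t *out)
{
	size_t o = 0;

	for (size_t i = 0; i < n;) {
		uint8_t b = in[i];
		size_t run = 1;

		while (i + run < n && in[i + run] == b && run < 255)
			run++;
		out[o++] = (uint8_t)run;
		out[o++] = b;
		i += run;
	}
	return o;
}

static size_t rle_decode(const uint8_t *in, size_t n, uint8_t *out, size_t cap)
{
	size_t o = 0;

	for (size_t i = 0; i + 1 < n; i += 2) {
		assert(o + in[i] <= cap); /* never overrun the destination page */
		memset(out + o, in[i + 1], in[i]);
		o += in[i];
	}
	return o;
}

/* Compress npages pages from src into dst and return the bytes written.
 * dst needs room for npages * (sizeof(struct page_chunk_hdr) + 2 * PAGE_SZ)
 * bytes, which is the RLE worst case. */
static size_t compress_pages(const uint8_t *src, size_t npages, uint8_t *dst)
{
	size_t off = 0;

	for (size_t p = 0; p < npages; p++) {
		struct page_chunk_hdr h = { .raw_len = PAGE_SZ };

		h.comp_len = (uint32_t)rle_encode(src + p * PAGE_SZ, PAGE_SZ,
						  dst + off + sizeof(h));
		memcpy(dst + off, &h, sizeof(h));
		off += sizeof(h) + h.comp_len;
	}
	return off;
}

/* Walk the chunk headers, restore pages into dst, return pages restored. */
static size_t decompress_pages(const uint8_t *src, size_t len, uint8_t *dst)
{
	size_t off = 0, p = 0;

	while (off + sizeof(struct page_chunk_hdr) <= len) {
		struct page_chunk_hdr h;

		memcpy(&h, src + off, sizeof(h));
		off += sizeof(h);
		rle_decode(src + off, h.comp_len, dst + p * PAGE_SZ, PAGE_SZ);
		off += h.comp_len;
		p++;
	}
	return p;
}
```

Compressing each page separately lets pages be restored lazily or streamed out of order. The cost is a slightly worse compression ratio than compressing the whole image as a single stream.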

Details:

  • Skill level: intermediate
  • Language: C
  • Expected size: 350 hours
  • Suggested by: Andrei Vagin <avagin@gmail.com>
  • Mentors: Radostin Stoyanov <rstoyanov@fedoraproject.org>, Alexander Mikhalitsyn <alexander@mihalicyn.com>, Andrei Vagin <avagin@gmail.com>

Use eBPF to lock and unlock the network

Summary: Use eBPF instead of external iptables-restore tool for network lock and unlock.

During checkpoint and restore, CRIU locks the network to make sure that the network stack accepts no packets while the process is being checkpointed. Currently CRIU calls out to iptables-restore to create and delete the corresponding iptables rules. An alternative that avoids calling the external iptables-restore binary would be to inject eBPF programs directly. Users have reported various iptables-restore failures, and using eBPF would remove this external dependency.

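Links:

  • https://www.criu.org/TCP_connection#Checkpoint_and_restore_TCP_connection
  • https://github.com/systemd/systemd/blob/master/src/core/bpf-firewall.c
  • https://blog.zeyady.com/2021-08-16/gsoc-criu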

Details:

  • Skill level: intermediate
  • Language: C
  • Expected size: 350 hours
  • Mentors: Radostin Stoyanov <rstoyanov@fedoraproject.org>, Prajwal S N <prajwalnadig21@gmail.com>
  • Suggested by: Adrian Reber <areber@redhat.com>

Files on detached mounts

Summary: Initial support for open files on "detached" mounts

When CRIU dumps a process with an open file descriptor (fd), it gets the mount identifier (mnt_id) from /proc/<pid>/fdinfo/<fd>, so that CRIU knows from which exact mount the file was originally opened. This way CRIU can restore the fd by opening the same file from the topologically equivalent mount in the restored mount tree.

Restoring the fd on the right mount matters in several cases. One is when the process later resolves paths relative to the fd, because resolving from the same file on a different mount can lead to different resolved paths. Another is when the process checks the path of the file via /proc/<pid>/fd/<fd>.

However, CRIU cannot determine on which mount to reopen the file at restore time if it knows the mnt_id but cannot find that mnt_id in /proc/<pid>/mountinfo.

The mountinfo file shows the mount tree topology of the current mount namespace (mntns): parent-child relations, sharing group information, mountpoints and filesystem root information. If the mnt_id is not listed there, we know nothing about that mount.

This can happen in two cases:

  • 1) external mount or file: if the file was opened from, e.g., the host, its mount is not visible in the container's mountinfo;
  • 2) the mount was lazily unmounted.

In case 1), CRIU has options that help it handle external dependencies.

In case 2), or if no such options are provided, CRIU cannot resolve the mnt_id in mountinfo and fails.

Solution: We can handle case 2) as follows. First, resolve the file's major/minor via fstat. Then use name_to_handle_at and open_by_handle_at to open the same file on any other available mount of the same superblock (same major/minor) in the container. This yields fd2, which refers to the same file as fd but lives on an existing mount, so we can dump it as usual and mark it as "detached" in the image. On restore, CRIU then knows where to find the file. However, instead of simply opening fd2 from the actually restored mount, it creates a temporary bind mount, opens the file through it, and lazily unmounts the bind mount right after the open, so that the file appears to be on a detached mount.
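Here is a minimal sketch of the dump-side steps of this solution, assuming the filesystem supports file handles. It reads the device numbers with fstat and exports a handle for the fd with name_to_handle_at (AT_EMPTY_PATH makes it encode the fd itself). It then reopens the same inode through another mount of that superblock with open_by_handle_at. The helper names are hypothetical. Choosing the alternative mount, the caveats listed below, and the temporary bind mount on restore are not shown. Note that open_by_handle_at requires CAP_DAC_READ_SEARCH, and name_to_handle_at fails with EOPNOTSUPP on filesystems without handle support.

```c
#define _GNU_SOURCE /* name_to_handle_at, open_by_handle_at, struct file_handle */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

/* Report the device (superblock) that fd lives on as major/minor. */
static int fd_dev(int fd, unsigned int *maj, unsigned int *min)
{
	struct stat st;

	if (fstat(fd, &st) < 0)
		return -1;
	*maj = major(st.st_dev);
	*min = minor(st.st_dev);
	return 0;
}

/* Encode a file handle for the file behind fd; the caller frees *out.
 * *mnt_id receives the mount the kernel resolved the fd on. */
static int fd_to_handle(int fd, struct file_handle **out, int *mnt_id)
{
	struct file_handle *fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);

	if (!fh)
		return -1;
	fh->handle_bytes = MAX_HANDLE_SZ;
	if (name_to_handle_at(fd, "", fh, mnt_id, AT_EMPTY_PATH) < 0) {
		free(fh);
		return -1;
	}
	*out = fh;
	return 0;
}

/* Reopen the same inode through another mount of the same superblock.
 * mount_fd is any fd on that mount, e.g. its root directory (this becomes
 * "fd2" in the text above). Needs CAP_DAC_READ_SEARCH. */
static int reopen_via_mount(int mount_fd, struct file_handle *fh, int flags)
{
	return open_by_handle_at(mount_fd, fh, flags);
}
```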

Known problems with this approach:

  • stat on btrfs reports the wrong major/minor;
  • file handles do not work on every filesystem;
  • file handles can return fd2 for a deleted file or for another hardlink, which needs special handling.

Additionally (optional part): We could export the real major/minor in fdinfo (a kernel change). We could also consider a new kernel interface that reports a mount's major/minor and root (the shift from the filesystem root) for detached mounts. With such an interface, we would not need the file-handle hack to find the file on another mount. See the fsinfo or getvalues kernel patches on LKML: could this information be added there?

Details:

  • Skill level: intermediate
  • Language: C
  • Expected size: 350 hours
  • Mentor: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
  • Suggested by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>

Checkpointing of POSIX message queues

Summary: Add support for checkpoint/restore of POSIX message queues

POSIX message queues are a widely used inter-process communication mechanism. Message queues are implemented as files on a virtual filesystem (mqueue), where a file descriptor (message queue descriptor) is used to perform operations such as sending or receiving messages. To support checkpoint/restore of POSIX message queues, we need a kernel interface (similar to MSG_PEEK) that would enable the retrieval of messages from a queue without removing them. This project aims to implement such an interface that allows retrieving all messages and their priorities from a POSIX message queue.
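This sketch shows the state that has to be captured, using the only approach available from userspace today. It is for illustration only. It drains the queue with mq_receive, which returns the highest-priority, oldest message first, and records each message with its priority. It then uses mq_send to put everything back in the same order. The approach is destructive and racy: another process may read or write the queue in the meantime, and re-sending could fire mq_notify registrations. That is why a non-destructive, MSG_PEEK-like kernel interface is needed. The struct and function names are hypothetical.

```c
#include <fcntl.h>
#include <mqueue.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical in-memory image of one queued message. */
struct queued_msg {
	char *data;
	size_t len;
	unsigned int prio;
};

/* Drain every message from mqd into msgs[] (capacity max), then put them
 * back. Returns the number of messages captured, or -1 on error. */
static long snapshot_mqueue(mqd_t mqd, struct queued_msg *msgs, long max)
{
	struct mq_attr attr;
	long n = 0;

	if (mq_getattr(mqd, &attr) < 0 || attr.mq_curmsgs > max)
		return -1;
	for (long i = 0; i < attr.mq_curmsgs; i++) {
		char *buf = malloc(attr.mq_msgsize);
		unsigned int prio;
		ssize_t len;

		if (!buf)
			return -1;
		/* Highest priority first, FIFO within the same priority. */
		len = mq_receive(mqd, buf, attr.mq_msgsize, &prio);
		if (len < 0) {
			free(buf);
			return -1;
		}
		msgs[n++] = (struct queued_msg){ .data = buf, .len = (size_t)len, .prio = prio };
	}
	/* Re-sending in receive order recreates the same queue ordering. */
	for (long i = 0; i < n; i++)
		if (mq_send(mqd, msgs[i].data, msgs[i].len, msgs[i].prio) < 0)
			return -1;
	return n;
}
```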

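Links:

  • https://github.com/checkpoint-restore/criu/issues/2285
  • https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/ipc/mqueue.c
  • https://www.man7.org/tlpi/download/TLPI-52-POSIX_Message_Queues.pdf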

Details:

  • Skill level: intermediate
  • Language: C
  • Expected size: 350 hours
  • Mentors: Radostin Stoyanov <rstoyanov@fedoraproject.org>, Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
  • Suggested by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>


Add support for arm64 Guarded Control Stack (GCS)

Summary: Support arm64 Guarded Control Stack (GCS)

The arm64 Guarded Control Stack (GCS) feature provides support for hardware-protected stacks of return addresses. It is intended to provide hardening against return-oriented programming (ROP) attacks and to make it easier to gather call stacks for applications such as profiling (taken from [1]). We would like to support arm64 GCS in CRIU, which means that CRIU should be able to checkpoint/restore applications that use GCS.

This task should not require any Linux kernel modifications, but it will take considerable effort to understand the Linux kernel and glibc support patches. CRIU's support for the x86 shadow stack [4] is a good example to follow.

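Links:

  • [1] kernel support: https://lore.kernel.org/all/20241001-arm64-gcs-v13-0-222b78d87eee@kernel.org
  • [2] libc support: https://inbox.sourceware.org/libc-alpha/20250117174119.3254972-1-yury.khrustalev@arm.com
  • [3] libc tests: https://inbox.sourceware.org/libc-alpha/20250210114538.1723249-1-yury.khrustalev@arm.com
  • [4] x86 support: https://github.com/checkpoint-restore/criu/pull/2306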

Details:

  • Skill level: expert (a lot of moving parts: Linux kernel / libc / CRIU)
  • Language: C
  • Expected size: 350 hours
  • Suggested by: Mike Rapoport <rppt@kernel.org>
  • Mentors: Mike Rapoport <rppt@kernel.org>, Andrei Vagin <avagin@gmail.com>, Alexander Mikhalitsyn <alexander@mihalicyn.com>

Coordinated checkpointing of distributed applications

Summary: Enable coordinated container checkpointing with Kubernetes.

Checkpointing support has recently been introduced in Kubernetes, where the smallest deployable unit is a Pod (a group of containers). Kubernetes is often used to deploy applications that are distributed across multiple nodes. However, checkpointing such distributed applications requires a coordination mechanism to synchronize the checkpoint and restore operations. To address this challenge, we have developed a new tool called criu-coordinator that relies on the action-script functionality of CRIU to enable synchronization in distributed environments. This project aims to extend this tool so that it integrates seamlessly with the checkpointing functionality of Kubernetes.
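Some context on the mechanism this relies on: CRIU runs each configured action script with the current hook name in the CRTOOLS_SCRIPT_ACTION environment variable. Example hook names are pre-dump, post-dump, pre-restore, post-restore, network-lock, network-unlock and pre-resume. A coordinator can treat some of these hooks as barriers that every participating node must reach before any of them proceeds. The C sketch below shows only the dispatch on the hook name. The choice of which hooks act as barriers and the `is_sync_point`/`handle_action` helpers are assumptions for illustration, not criu-coordinator's actual design. The network communication between peers is omitted.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hooks at which this hypothetical coordinator waits for all peers. */
static const char *const sync_hooks[] = {
	"network-lock", /* every node has locked its network */
	"post-dump",    /* every node has finished its dump */
	"pre-resume",   /* every node is about to resume its restored tasks */
};

static bool is_sync_point(const char *action)
{
	if (!action)
		return false;
	for (size_t i = 0; i < sizeof(sync_hooks) / sizeof(sync_hooks[0]); i++)
		if (strcmp(action, sync_hooks[i]) == 0)
			return true;
	return false;
}

/* Entry point of an action script: CRIU passes the hook name in the
 * CRTOOLS_SCRIPT_ACTION environment variable. Returns 0 (success). */
static int handle_action(void)
{
	const char *action = getenv("CRTOOLS_SCRIPT_ACTION");

	if (is_sync_point(action)) {
		/* A real coordinator would notify its peers here and block until
		 * all of them have reached the same hook (omitted in this sketch). */
		fprintf(stderr, "barrier at %s\n", action);
	}
	return 0;
}
```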

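Links:

  • https://github.com/checkpoint-restore/criu-coordinator
  • https://lpc.events/event/18/contributions/1803/
  • https://sched.co/1YeT4
  • https://kubernetes.io/blog/2022/12/05/forensic-container-checkpointing-alpha/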

Details:

  • Skill level: intermediate
  • Language: Rust / Go / C
  • Expected size: 350 hours
  • Mentors: Radostin Stoyanov <rstoyanov@fedoraproject.org>, Prajwal S N <prajwalnadig21@gmail.com>
  • Suggested by: Radostin Stoyanov <rstoyanov@fedoraproject.org>

Suspended project ideas

Listed here are tasks that seem suitable for GSoC but currently have nobody to mentor them.

Optimize logging engine

Summary: CRIU writes a lot of log messages while doing its job. Logging is done with simple fprintf-style calls. These logs are usually not needed, but if some operation fails, they are the only way to find the reason for the failure.

At the same time, the printf family of functions is known to be relatively slow: they need to scan the format string for %-specifiers and then convert the arguments into strings. The time difference between criu dump with and without logging is notable (15%-20%), so speeding up logging will help improve CRIU performance.

One possible solution is binary logging. The difficulty with binary logs is the amount of effort needed to convert the existing log calls to binary form. Preferably, the switch to binary logging either keeps the existing log() calls intact or provides some automation to convert them.

One way to keep log() calls intact is a pre-compilation pass over the sources. In this pass, each log(fmt, ...) call is translated into a call to a binary log function that saves the fmt identifier and copies all the arguments as is into the log file. A binary log decoding utility, which this approach requires, then looks up the fmt string by its ID and prints the resulting message.

Links:

  • Better logging

Details:

  • Skill level: intermediate
  • Language: C, though decoder/preprocessor can be in any language
  • Expected size: 350 hours
  • Suggested by: Andrei Vagin
  • Mentors: Alexander Mikhalitsyn <alexander@mihalicyn.com>

IOUring support

The io_uring Asynchronous I/O (AIO) framework is a new Linux I/O interface, first introduced in upstream Linux kernel version 5.1 (March 2019). It provides a low-latency and feature-rich interface for applications that require AIO functionality.

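Links:

  • https://blogs.oracle.com/linux/an-introduction-to-the-io_uring-asynchronous-io-framework
  • https://github.com/axboe/liburing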

Details:

  • Skill level: expert (+linux kernel)
  • Expected size: 350 hours

Add support for SPFS

Summary: SPFS is a special filesystem that enables checkpoint and restore of things such as NFS and FUSE.

NFS support is already implemented in Virtuozzo CRIU, and porting it to mainline CRIU would be very beneficial. An important part of this work is integrating the Stub-Proxy File System (SPFS) with LXC or other container environments.

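Links:

  • https://github.com/checkpoint-restore/criu/issues/60
  • https://github.com/checkpoint-restore/criu/issues/53
  • https://github.com/skinsbursky/spfs
  • https://patchwork.criu.org/series/137/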

Details:

  • Skill level: expert
  • Language: C
  • Mentor: Alexander Mikhalitsyn <alexander@mihalicyn.com>
  • Suggested by: Alexander Mikhalitsyn <alexander@mihalicyn.com>


Anonymise image files

Summary: Teach CRIT to remove sensitive information from images

When reporting a bug, it may not be acceptable for the reporter to send us raw images, as they may contain sensitive data. We need to teach CRIT to "anonymise" images for publication.

List of data to shred:

  • Memory contents. For the sake of investigation, all memory contents can simply be removed; the sizes of the pages*.img files are enough.
  • Paths to files. Here we should preserve the relations between paths. The simplest way seems to be replacing file names with "random" (or sequential) strings, BUT (!) making sure that this mapping is 1:1. Note that file paths may also appear in sk-unix.img.
  • Registers.
  • Process names (but relations between them should be kept).
  • Contents of streams, i.e. pipe/fifo data, sk-queue, tcp-stream, tty data.
  • Ghost files.
  • Tarballs with tmpfs-s.
  • IP addresses in sk-inet-s, ip tool dumps and net*.img.
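For the file-path item, one possible approach keeps the mapping 1:1 while preserving the relations between paths. Each distinct path component gets a sequential name, and that same replacement is reused whenever the component reappears. The sketch is in C for consistency with the other examples on this page, although the project itself targets Python (CRIT). The fixed table size and the function names are hypothetical. Mapping per component, rather than per full path, keeps parent/child relations intact: /a/b and /a/c still share a parent. It also maps equal names in different directories to the same string, which a real implementation may or may not want.

```c
#include <stdio.h>
#include <string.h>

#define MAX_NAMES 256

/* Hypothetical 1:1 table: index i is the replacement "n<i>" for orig_names[i]. */
static char orig_names[MAX_NAMES][256];
static int nr_names;

/* Return the stable index for a component, adding it on first sight. */
static int name_index(const char *name, size_t len)
{
	for (int i = 0; i < nr_names; i++)
		if (strlen(orig_names[i]) == len && memcmp(orig_names[i], name, len) == 0)
			return i;
	if (nr_names == MAX_NAMES || len >= sizeof(orig_names[0]))
		return -1;
	memcpy(orig_names[nr_names], name, len);
	orig_names[nr_names][len] = '\0';
	return nr_names++;
}

/* Rewrite a path component by component, e.g. "/home/alice" -> "/n0/n1".
 * Returns 0 on success, -1 if the table or the output buffer is too small. */
static int anonymise_path(const char *path, char *out, size_t cap)
{
	size_t o = 0;

	while (*path) {
		if (*path == '/') {
			path++;
			continue;
		}
		size_t len = strcspn(path, "/");
		int idx = name_index(path, len);
		int w;

		if (idx < 0)
			return -1;
		w = snprintf(out + o, cap - o, "/n%d", idx);
		if (w < 0 || (size_t)w >= cap - o)
			return -1;
		o += (size_t)w;
		path += len;
	}
	if (o == 0) { /* the root directory itself */
		if (cap < 2)
			return -1;
		out[o++] = '/';
	}
	out[o] = '\0';
	return 0;
}
```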

Links:

Details:

  • Skill level: beginner
  • Language: Python

Add support for checkpoint/restore of CORK-ed UDP socket

Summary: Support C/R of corked UDP socket

There is a UDP_CORK socket option. As the udp(7) man page says:

    If this option is enabled, then all data output on this socket
    is accumulated into a single datagram that is transmitted when
    the option is disabled.  This option should not be used in
    code intended to be portable.

Currently CRIU refuses to dump corked UDP sockets, which is effectively a bug. Supporting them will require extending the kernel API so that CRIU can read back the socket's write queue (see how it is done for TCP sockets, for example). The queue is then written into the image and restored into the socket, with the CORK bit set as well.
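This sketch shows the state that makes corked sockets hard to dump. With UDP_CORK set, two send calls on a loopback socket are held in the sender's write queue, and nothing reaches the receiver. Clearing the option flushes them as a single datagram. The sketch also shows how a dumper can detect the condition with getsockopt. It uses only the standard socket API, and the helper names are hypothetical. It does not show how CRIU would read the queued data back, because that requires the kernel API extension described above.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/udp.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if UDP_CORK is set on fd, 0 if not, -1 on error. */
static int udp_is_corked(int fd)
{
	int val;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, IPPROTO_UDP, UDP_CORK, &val, &len) < 0)
		return -1;
	return val != 0;
}

/* Setting the option to 0 pushes out whatever was accumulated. */
static int udp_set_cork(int fd, int on)
{
	return setsockopt(fd, IPPROTO_UDP, UDP_CORK, &on, sizeof(on));
}

/* Receiver bound to 127.0.0.1 on an ephemeral port, sender connected to it. */
static int udp_loopback_pair(int *rx, int *tx)
{
	struct sockaddr_in addr = { .sin_family = AF_INET };
	socklen_t alen = sizeof(addr);

	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	*rx = socket(AF_INET, SOCK_DGRAM, 0);
	*tx = socket(AF_INET, SOCK_DGRAM, 0);
	if (*rx < 0 || *tx < 0)
		return -1;
	if (bind(*rx, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    getsockname(*rx, (struct sockaddr *)&addr, &alen) < 0 ||
	    connect(*tx, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		return -1;
	return 0;
}

/* Non-zero if a datagram is readable on fd within timeout_ms. */
static int udp_pending(int fd, int timeout_ms)
{
	struct pollfd p = { .fd = fd, .events = POLLIN };

	return poll(&p, 1, timeout_ms) > 0;
}
```

In this scenario, the data a dump would have to capture is the four bytes sitting in the sender's queue between the second send and the uncork.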

Notes:

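We have already made three attempts at this problem, including:

  • The UDP_REPAIR approach, which did not succeed: https://lore.kernel.org/netdev/721a2e32-c930-ad6b-5055-631b502ed11b@gmail.com/, https://lore.kernel.org/netdev/?q=udp_repair
  • The eBPF (CRIB) approach: the socket queue iterator was not merged (https://lore.kernel.org/netdev/AM6PR03MB5848EDA002E3D7EACA7C6BDA99A52@AM6PR03MB5848.eurprd03.prod.outlook.com/), and there are general objections to the CRIB approach (https://lore.kernel.org/bpf/CAHk-=wjLWFa3i6+Tab67gnNumTYipj_HuheXr2RCq4zn0tCTzA@mail.gmail.com/).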

We still have one idea we have not tried. Since UDP allows packets to be lost in transit, on restore we could somehow mark the socket to drop all data queued before UNCORK. This way we would not need to restore the contents of a CORK-ed UDP socket's send queue at all.

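Links:

  • https://github.com/checkpoint-restore/criu/issues/409
  • https://github.com/criupatchwork/criu/commit/a532312
  • Sockets, TCP connection
  • UDP cork explained: https://groups.google.com/forum/#!topic/comp.os.linux.networking/Uz8PYiTCZSg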

Details:

  • Skill level: intermediate (+linux kernel)
  • Language: C
  • Expected size: 350 hours
  • Mentors: Alexander Mikhalitsyn <alexander@mihalicyn.com>, Pavel Tikhomirov <ptikhomirov@virtuozzo.com>, Andrei Vagin <avagin@gmail.com>