Difference between revisions of "Google Summer of Code Ideas"

Latest revision as of 11:34, 31 March 2026

Google Summer of Code (GSoC) is a global program that offers post-secondary students an opportunity to be paid for contributing to an open source project over a three-month period.

This page contains project ideas for upcoming Google Summer of Code.

Contact

First, make sure to go through the GSoC Students Recommendations. Once you have built CRIU locally and successfully checkpointed and restored (C/R) a simple process, please contact the respective mentor for the idea you are interested in. For general questions, feel free to send an email to the mailing list or write on Gitter.

Project ideas

Kubernetes Operator for Automated Checkpointing

Summary: Extend the Checkpoint/Restore Operator with support for automated policy-based checkpointing.

The Checkpoint/Restore Operator for Kubernetes currently supports only policies and parameters that limit the number of checkpoints. This project aims to extend the current support with automated policy-based checkpointing, allowing users to define triggers for checkpoint creation, such as time-based schedules, resource thresholds (CPU, memory, I/O usage), Kubernetes events (node drain, pod eviction, preemption), and application-level signals or annotations.

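Links:

https://github.com/checkpoint-restore/checkpoint-restore-operator
https://kubernetes.io/docs/reference/node/kubelet-checkpoint-api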

Details:

Skill level: intermediate
Language: Go
Expected size: 350 hours
Mentors: Prajwal S N <prajwalnadig21@gmail.com>, Radostin Stoyanov <rstoyanov@fedoraproject.org>, Adrian Reber <areber@redhat.com>

Forensic Checkpointing Framework for Kubernetes

Kubernetes provides a highly dynamic and ephemeral environment where workloads can start and disappear very quickly and are continuously being rescheduled across different nodes in the cluster. One of the key challenges with forensic investigations in Kubernetes is capturing and preserving the evidence during security incidents. This project aims to address this problem by developing a framework for efficiently capturing and preserving the state of all running applications in a container at a specific point in time, along with the associated container configurations and metadata. These artifacts would allow investigators to accurately reconstruct the events, create a timeline, and analyze security incidents without impacting the running cluster. This is an important step towards enabling forensic readiness for Kubernetes, where cluster administrators proactively ensure the environments are prepared to collect and preserve evidence before a security incident occurs.

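Links:

https://github.com/checkpoint-restore/checkpointctl
Investigating Security Incidents with Forensic Snapshots in Kubernetes https://fosdem.org/2026/events/attachments/F9RANH-forensic-snapshots-in-kubernetes/slides/267371/fosdem_2_4dh73ni.pdf
Cloud Native Security Whitepaper https://www.cncf.io/reports/cloud-native-security-whitepaper/
Kubernetes Hardening Guide https://media.defense.gov/2022/Aug/29/2003066362/-1/-1/0/CTR_KUBERNETES_HARDENING_GUIDANCE_1.2_20220829.PDF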

Details:

Skill level: intermediate
Language: Go
Expected size: 350 hours
Mentors: Lorena Goldoni <lory.goldoni@gmail.com>, Radostin Stoyanov <rstoyanov@fedoraproject.org>, Adrian Reber <areber@redhat.com>

Enabling Checkpoint/Restore of Rootless Containers

Rootless containers are containers that can be created, run, and managed by unprivileged users. Container engines such as Podman natively support running containers in a rootless mode to improve security and usability. While checkpoint/restore functionality is already available for rootful containers and unprivileged checkpointing is possible with the CAP_CHECKPOINT_RESTORE capability, container engines do not yet support native checkpointing of containers running in rootless mode. This project aims to explore and address the remaining challenges required to enable unprivileged checkpoint/restore for rootless containers.

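Links:

https://github.com/checkpoint-restore/criu/pull/1930
https://github.com/torvalds/linux/commit/124ea650d3072b005457faed69909221c2905a1f
https://src.fedoraproject.org/rpms/criu/pull-request/10#request_diff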

Details:

Skill level: intermediate
Language: C, Go
Expected size: 350 hours
Mentors: Radostin Stoyanov <rstoyanov@fedoraproject.org>, Adrian Reber <areber@redhat.com>

Checkpointing of POSIX message queues

Summary: Add support for checkpoint/restore of POSIX message queues

POSIX message queues are a widely used inter-process communication mechanism. Message queues are implemented as files on a virtual filesystem (mqueue), where a file descriptor (message queue descriptor) is used to perform operations such as sending or receiving messages. To support checkpoint/restore of POSIX message queues, we need a kernel interface (similar to MSG_PEEK) that would enable the retrieval of messages from a queue without removing them. This project aims to implement such an interface that allows retrieving all messages and their priorities from a POSIX message queue.
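To illustrate why a peek-style interface is needed, here is a minimal sketch showing that the existing mq_receive() call removes the message it returns. The function name mq_destructive_read_demo and the queue name are made up for this example; the mq_* calls are the standard POSIX API. On older glibc versions the program may need to be linked with -lrt.

```c
#include <fcntl.h>
#include <mqueue.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Illustrative helper (not part of CRIU): queue two messages, receive one,
 * and return how many messages are left in the queue, or -1 on error.
 */
long mq_destructive_read_demo(void)
{
    struct mq_attr attr = { .mq_maxmsg = 4, .mq_msgsize = 32 };
    char name[64], buf[32];
    unsigned int prio;
    long left = -1;
    mqd_t q;

    /* Hypothetical queue name, made unique per process. */
    snprintf(name, sizeof(name), "/criu-mq-demo-%ld", (long)getpid());
    q = mq_open(name, O_CREAT | O_EXCL | O_RDWR, 0600, &attr);
    if (q == (mqd_t)-1)
        return -1;

    if (mq_send(q, "low", 4, 1) < 0 || mq_send(q, "high", 5, 7) < 0)
        goto out;

    /* The highest-priority message is returned first and removed. */
    if (mq_receive(q, buf, sizeof(buf), &prio) < 0)
        goto out;
    if (prio != 7 || strcmp(buf, "high") != 0)
        goto out;

    /* mq_curmsgs reports the number of messages still queued. */
    if (mq_getattr(q, &attr) < 0)
        goto out;
    left = attr.mq_curmsgs;
out:
    mq_close(q);
    mq_unlink(name);
    return left;
}
```

Calling mq_destructive_read_demo() leaves one message in the queue: reading the contents with today's API changes the state a checkpoint is supposed to capture.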

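Links:

https://github.com/checkpoint-restore/criu/issues/2285
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/ipc/mqueue.c
https://www.man7.org/tlpi/download/TLPI-52-POSIX_Message_Queues.pdf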

Details:

Skill level: intermediate
Language: C
Expected size: 350 hours
Mentors: Radostin Stoyanov <rstoyanov@fedoraproject.org>, Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Suggested by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>

Add support for SCM_CREDENTIALS / SCM_PIDFD and friends

Summary: Support for SCM_CREDENTIALS / SCM_PIDFD

SCM_CREDENTIALS and SCM_PIDFD are types of SCM (socket-level control messages). They play a crucial role in systemd and many other user-space applications. This project is about adding support for saving and restoring these SCMs with CRIU. There is existing code in the OpenVZ CRIU fork, see [1] and [2]. The first goal is to port this code properly, cover it with extensive tests, and ensure that SCM_PIDFD / SO_PEERPIDFD are handled correctly. We also expect to cover things like SO_PASSRIGHTS and SO_PASSPIDFD.

An extra source of complexity is that pidfds can be "stale" (see PIDFD_STALE in the Linux kernel), and we need to ensure that those cases are covered properly.
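For readers unfamiliar with these control messages, the following is a minimal sketch of the kind of state involved: one byte sent over a Unix socketpair with an SCM_CREDENTIALS message attached, then read back on the other end. The helper name scm_credentials_roundtrip is invented for this example and does not reflect how CRIU or the OpenVZ fork handle it; SO_PASSCRED, sendmsg() and recvmsg() are the standard Linux interfaces.

```c
#define _GNU_SOURCE /* struct ucred, SCM_CREDENTIALS */
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Illustrative helper: send our own credentials over a socketpair and
 * copy what the receiver sees into *out. Returns 0 on success, -1 on error.
 */
int scm_credentials_roundtrip(struct ucred *out)
{
    union {
        char buf[CMSG_SPACE(sizeof(struct ucred))];
        struct cmsghdr align; /* forces correct alignment of buf */
    } ctl;
    struct ucred creds = { .pid = getpid(), .uid = getuid(), .gid = getgid() };
    char data = 'x', rdata = 0;
    struct iovec iov = { .iov_base = &data, .iov_len = 1 };
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int sv[2], one = 1, ret = -1;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;
    /* The receiver must opt in to get credentials delivered. */
    if (setsockopt(sv[1], SOL_SOCKET, SO_PASSCRED, &one, sizeof(one)) < 0)
        goto out;

    memset(&msg, 0, sizeof(msg));
    memset(&ctl, 0, sizeof(ctl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);
    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_CREDENTIALS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(creds));
    memcpy(CMSG_DATA(cmsg), &creds, sizeof(creds));
    if (sendmsg(sv[0], &msg, 0) != 1)
        goto out;

    /* At this point the message and its credentials sit in the receive
     * queue of sv[1]; this is the state a dump would have to preserve. */
    memset(&ctl, 0, sizeof(ctl));
    iov.iov_base = &rdata;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);
    if (recvmsg(sv[1], &msg, 0) != 1)
        goto out;

    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET &&
            cmsg->cmsg_type == SCM_CREDENTIALS) {
            memcpy(out, CMSG_DATA(cmsg), sizeof(*out));
            ret = 0;
        }
    }
out:
    close(sv[0]);
    close(sv[1]);
    return ret;
}
```

After a successful call, out->pid equals the sender's PID. A checkpoint taken between sendmsg() and recvmsg() would need to keep that association intact across restore.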

Links:

[1] openvz-criu https://bitbucket.org/openvz/criu.ovz/history-node/918653a0a343194385592d7b50b5bd7a8fbe1cc1/criu/sk-unix.c?at=hci-dev
[2] openvz-criu https://bitbucket.org/openvz/criu.ovz/history-node/918653a0a343194385592d7b50b5bd7a8fbe1cc1/criu/sk-queue.c?at=hci-dev
[3] Linux kernel https://github.com/torvalds/linux/commit/5e2ff6704a275be009be8979af17c52361b79b89
[4] Linux kernel https://github.com/torvalds/linux/commit/c679d17d3f2d895b34e660673141ad250889831f

Details:

Skill level: intermediate / advanced
Language: C
Expected size: 350 hours
Suggested by: Alexander Mikhalitsyn <alexander@mihalicyn.com>
Mentors: Andrei Vagin <avagin@gmail.com>, Alexander Mikhalitsyn <alexander@mihalicyn.com>

Integrate with Live Update Orchestrator (LUO)

Summary: Integrate with Live Update Orchestrator (LUO)

Live Update Orchestrator (LUO) is a framework for Linux kernel live updates (via kexec). The idea behind it is to provide kernel and user-space APIs for preserving specific system resources across a kexec reboot.

This research project explores how CRIU can be integrated with LUO. For example, if a user is running memcached on a node, the current approach requires a full CRIU dump that saves the entire process memory to disk, followed by a restore after the kernel live update.

Instead, CRIU could be extended to leverage the LUO API. When instructed, it could preserve selected memory regions directly across the kexec reboot, avoiding a full disk dump and significantly accelerating the restore process after the kernel update.

Links:

[1] LUO kernel documentation https://docs.kernel.org/core-api/liveupdate.html
[2] LUO memfd doc https://docs.kernel.org/mm/memfd_preservation.html

Details:

Skill level: intermediate / advanced
Language: C
Expected size: 350 hours
Suggested by: Andrei Vagin <avagin@gmail.com>
Mentors: Andrei Vagin <avagin@gmail.com>, Alexander Mikhalitsyn <alexander@mihalicyn.com>

Optimize COW memory dumping

Summary: Optimize COW memory dumping

The Linux kernel memory management subsystem is highly optimized not only for performance, but also to minimize unnecessary memory consumption. A key example of this is how the kernel handles private VMAs when user space invokes the fork() system call.

Rather than duplicating the entire VMA tree along with all memory contents, the kernel creates optimized copies of inherited VMAs using the Copy-on-Write (COW) mechanism. When a process writes to a page within a COW-ed VMA, a write page fault occurs, and the kernel creates a private copy of that page before applying the modification. However, if the page is only read, no copying is performed.

This approach significantly improves fork() performance and can dramatically reduce memory usage in many workloads.
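The COW behaviour described above can be observed from user space with a small sketch like the one below. The function name cow_demo is invented for this example; it only uses standard mmap(), fork() and waitpid() calls and is not CRIU code.

```c
#define _GNU_SOURCE /* MAP_ANONYMOUS */
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Illustrative helper: returns 0 if a write in the child leaves the
 * parent's view of a private page untouched, 1 if the parent's page
 * changed, -1 on error.
 */
int cow_demo(void)
{
    long psz = sysconf(_SC_PAGESIZE);
    int status, ret = -1;
    char *page;
    pid_t pid;

    page = mmap(NULL, psz, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;
    strcpy(page, "parent");

    /* After fork() both processes share the same physical page. */
    pid = fork();
    if (pid < 0)
        goto out;
    if (pid == 0) {
        /* Write fault: the kernel gives the child its own copy. */
        strcpy(page, "child");
        _exit(strcmp(page, "child") == 0 ? 0 : 1);
    }

    if (waitpid(pid, &status, 0) != pid ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        goto out;

    /* The parent still sees its original data. */
    ret = strcmp(page, "parent") == 0 ? 0 : 1;
out:
    munmap(page, psz);
    return ret;
}
```

Pages that neither process writes to stay shared, which is why dumping each process's view independently can store the same page more than once.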

In CRIU, when dumping VMAs and their associated memory pages, this COW optimization is not currently taken into account during the dump phase. As a result, for COW-backed VMAs, CRIU may generate multiple copies of identical memory pages in the dump image.

During restore, however, CRIU explicitly handles this situation (see [1] and [2]) and attempts to reconstruct COW relationships inside the kernel. This step is critical: without it, a checkpoint/restore (C/R) cycle could lead to a substantial increase in memory consumption for the same process tree. For example, a workload that originally consumed 500 MiB could expand to 800 MiB after restore, which is clearly unacceptable.

This project aims to improve the dumping algorithm so that it avoids producing multiple unnecessary copies of identical pages belonging to COW-ed VMAs.

The project requires some understanding of Linux memory management internals and CRIU’s architecture. We strongly encourage GSoC contributors to study references [1] and [2] and experiment with the relevant code paths before applying. We are happy to answer questions and provide guidance along the way.

Links:

[1] preparing COW VMAs https://github.com/checkpoint-restore/criu/blob/c180188db036f8ea4c08bfee28cbcdbdd52cdfc3/criu/mem.c#L878
[2] private vma content restore cow case https://github.com/checkpoint-restore/criu/blob/c180188db036f8ea4c08bfee28cbcdbdd52cdfc3/criu/mem.c#L1219

Details:

Skill level: intermediate / advanced
Language: C
Expected size: 350 hours
Suggested by: Andrei Vagin <avagin@gmail.com>
Mentors: Andrei Vagin <avagin@gmail.com>, Alexander Mikhalitsyn <alexander@mihalicyn.com>

Suspended project ideas

Listed here are tasks that seem suitable for GSoC but currently do not have anybody to mentor them.

Optimize logging engine

Summary: CRIU writes a lot of log output while doing its job. Logging is done with plain fprintf calls. Most of the time the logs go unread, but when an operation fails they are the only way to find the reason for the failure.

At the same time, the printf family of functions is known to be relatively slow: each call has to scan the format string for % conversions and then convert the arguments into strings. Comparing criu dump with and without logs shows a notable time difference (15%-20%), so speeding up logging will help improve CRIU performance.

One possible solution is binary logging. The difficulty with binary logs is the amount of effort needed to convert the existing log calls to binary form. Preferably, the switch to binary logging either keeps the existing log() calls intact or provides an automated way to convert them.

One way to keep the log() calls intact is a pre-compilation pass over the sources. In this pass each log(fmt, ...) call is translated into a call to a binary log function that saves the fmt identifier and copies all the args as is into the log file. The binary log decoding utility, required in this case, then finds the fmt string by its ID in the log file and prints the resulting message.
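The fmt-identifier idea can be sketched as follows. Everything here is hypothetical: the fmt_table contents, the blog_rec record layout, and the blog_write / blog_decode names are assumptions made for illustration, and the sketch limits itself to at most two long arguments per message. A real design would have to cover CRIU's full range of format specifiers.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical table that a pre-compilation pass would generate:
 * the index of a string is its fmt identifier. */
static const char *const fmt_table[] = {
    "Dumping task %ld\n",
    "Mapped %ld pages at offset %ld\n",
};

#define BLOG_MAX_ARGS 2

/* Hypothetical on-disk record: identifier plus raw argument values. */
struct blog_rec {
    uint32_t fmt_id;
    uint32_t nargs;
    long args[BLOG_MAX_ARGS];
};

/* Writer side: no format parsing, only a fixed-size copy.
 * Returns the number of bytes written, or 0 on error. */
size_t blog_write(void *buf, size_t size, uint32_t fmt_id,
                  uint32_t nargs, const long *args)
{
    struct blog_rec rec;

    if (nargs > BLOG_MAX_ARGS || size < sizeof(rec))
        return 0;
    memset(&rec, 0, sizeof(rec));
    rec.fmt_id = fmt_id;
    rec.nargs = nargs;
    if (nargs)
        memcpy(rec.args, args, nargs * sizeof(long));
    memcpy(buf, &rec, sizeof(rec));
    return sizeof(rec);
}

/* Decoder side: look the format up by ID and do the expensive
 * formatting offline. Returns the snprintf() result, or -1 on error. */
int blog_decode(const void *buf, char *out, size_t out_size)
{
    struct blog_rec rec;

    memcpy(&rec, buf, sizeof(rec));
    if (rec.fmt_id >= sizeof(fmt_table) / sizeof(fmt_table[0]))
        return -1;
    /* Unused trailing arguments are ignored by snprintf(). */
    return snprintf(out, out_size, fmt_table[rec.fmt_id],
                    rec.args[0], rec.args[1]);
}
```

In this sketch the dump path pays only for two memcpy() calls per message, while the string conversion cost moves to the decoder, which runs only when somebody actually reads the log.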

Links:

Better logging

Details:

Skill level: intermediate
Language: C, though decoder/preprocessor can be in any language
Expected size: 350 hours
Suggested by: Andrei Vagin
Mentors: Alexander Mikhalitsyn <alexander@mihalicyn.com>

IOUring support

The io_uring Asynchronous I/O (AIO) framework is a Linux I/O interface first introduced in upstream Linux kernel version 5.1 (merged in March 2019, released in May 2019). It provides a low-latency and feature-rich interface for applications that require AIO functionality.

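Links:

https://blogs.oracle.com/linux/an-introduction-to-the-io_uring-asynchronous-io-framework
https://github.com/axboe/liburing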

Details:

Skill level: expert (+linux kernel)
Expected size: 350 hours

Add support for SPFS

Summary: SPFS is a special filesystem that allows checkpoint and restore of things such as NFS and FUSE

NFS support is already implemented in Virtuozzo CRIU, and porting it to mainline CRIU would be very beneficial. An important part of this work is integrating the Stub-Proxy File System (SPFS) with LXC/yet_another_containers_environment.

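Links:

https://github.com/checkpoint-restore/criu/issues/60
https://github.com/checkpoint-restore/criu/issues/53
https://github.com/skinsbursky/spfs
https://patchwork.criu.org/series/137/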

Details:

Skill level: expert
Language: C
Mentor: Alexander Mikhalitsyn <alexander@mihalicyn.com>
Suggested by: Alexander Mikhalitsyn <alexander@mihalicyn.com>

Anonymise image files

Summary: Teach CRIT to remove sensitive information from images

When reporting a bug, it may not be acceptable for the reporter to send us raw images, as they may contain sensitive data. CRIT needs to be taught to "anonymise" images for publication.

List of data to shred:

Memory contents. For the sake of investigation, all memory contents can simply be removed; the sizes of the pages*.img files are enough.
Paths to files. The relations between paths should be kept. The simplest way seems to be replacing file names with "random" (or sequential) strings while making sure the mapping stays 1:1. Note that file paths may also sit in sk-unix.img.
Registers.
Process names. (But relations should be kept).
Contents of streams, i.e. pipe/fifo data, sk-queue, tcp-stream, tty data.
Ghost files.
Tarballs with tmpfs-s.
IP addresses in sk-inet-s, ip tool dumps and net*.img.
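The 1:1 path mapping from the list above could look roughly like the sketch below. It is written in C only to show the logic; CRIT is a Python tool, so a real implementation would live there. The anon_map structure, the anon_id / anon_path names, and the "nN" replacement scheme are all assumptions made for this example.

```c
#include <stdio.h>
#include <string.h>

#define ANON_MAX 64
#define ANON_NAME_LEN 64

/* Hypothetical mapping table: the index of a name is its replacement ID. */
struct anon_map {
    char orig[ANON_MAX][ANON_NAME_LEN];
    int count;
};

/* Return the stable ID for a name, registering it on first use.
 * Returns -1 if the table is full or the name is too long. */
int anon_id(struct anon_map *m, const char *name)
{
    int i;

    for (i = 0; i < m->count; i++)
        if (strcmp(m->orig[i], name) == 0)
            return i;
    if (m->count == ANON_MAX || strlen(name) >= ANON_NAME_LEN)
        return -1;
    strcpy(m->orig[m->count], name);
    return m->count++;
}

/* Rewrite a path component by component, e.g. "/etc/passwd" -> "/n0/n1".
 * Equal components always map to equal replacements, so the directory
 * relations between paths survive. Returns 0 on success, -1 on error. */
int anon_path(struct anon_map *m, const char *path, char *out, size_t size)
{
    char comp[ANON_NAME_LEN];
    size_t used = 0, len;
    int id, n;

    if (size == 0)
        return -1;
    out[0] = '\0';
    while (*path) {
        if (*path == '/') {
            if (used + 1 >= size)
                return -1;
            out[used++] = '/';
            out[used] = '\0';
            path++;
            continue;
        }
        len = strcspn(path, "/");
        if (len >= sizeof(comp))
            return -1;
        memcpy(comp, path, len);
        comp[len] = '\0';
        id = anon_id(m, comp);
        if (id < 0)
            return -1;
        n = snprintf(out + used, size - used, "n%d", id);
        if (n < 0 || (size_t)n >= size - used)
            return -1;
        used += (size_t)n;
        path += len;
    }
    return 0;
}
```

With a zero-initialised map, "/etc/passwd" becomes "/n0/n1" and a later "/etc/shadow" becomes "/n0/n2", so both files are still visibly in the same directory. The same table would have to be shared across all image files, including sk-unix.img, to keep the mapping consistent.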

Links:

Anonymize image files
https://github.com/checkpoint-restore/criu/issues/360
CRIT, Images
External links to mailing lists or web sites

Details:

Skill level: beginner
Language: Python

Add support for checkpoint/restore of CORK-ed UDP socket

Summary: Support C/R of corked UDP socket

Sockets have a UDP_CORK option. As the man page says:

    If this option is enabled, then all data output on this socket
    is accumulated into a single datagram that is transmitted when
    the option is disabled.  This option should not be used in
    code intended to be portable.

Currently CRIU refuses to dump this case, so it's effectively a bug. Supporting it requires extending the kernel API to allow CRIU to read back the write queue of the socket (see how it's done for TCP sockets, for example). The queue is then written into the image and restored into the socket (with the CORK bit set too).
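The corking behaviour quoted from the man page can be reproduced with a short loopback sketch like the one below. The function name udp_cork_demo is invented for this example, and the sketch assumes a working loopback interface; it only uses the standard socket API and the UDP_CORK option.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/udp.h> /* UDP_CORK */
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Illustrative helper: send two chunks through a corked UDP socket and
 * return the size of the datagram that arrives after uncorking,
 * or -1 on error.
 */
ssize_t udp_cork_demo(void)
{
    struct timeval tv = { .tv_sec = 2 }; /* do not block forever */
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    int rx, tx, on = 1, off = 0;
    ssize_t n = -1;
    char buf[64];

    rx = socket(AF_INET, SOCK_DGRAM, 0);
    tx = socket(AF_INET, SOCK_DGRAM, 0);
    if (rx < 0 || tx < 0)
        goto out;

    /* Bind the receiver to an ephemeral loopback port. */
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(rx, (struct sockaddr *)&addr, &alen) < 0 ||
        setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0 ||
        connect(tx, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;

    if (setsockopt(tx, IPPROTO_UDP, UDP_CORK, &on, sizeof(on)) < 0)
        goto out;
    if (send(tx, "hello ", 6, 0) != 6 || send(tx, "world", 5, 0) != 5)
        goto out;

    /* Both chunks now sit in the sender's write queue. This pending
     * data is what a dump would have to read back. */

    if (setsockopt(tx, IPPROTO_UDP, UDP_CORK, &off, sizeof(off)) < 0)
        goto out;

    /* Uncorking flushes the queue as a single datagram. */
    n = recv(rx, buf, sizeof(buf), 0);
out:
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return n;
}
```

The receiver gets one 11-byte datagram rather than two separate ones, which shows that the kernel holds the accumulated data until the cork is removed.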

Notes:

We have already had a few attempts (3) at this problem:

The UDP_REPAIR approach didn't succeed: https://lore.kernel.org/netdev/721a2e32-c930-ad6b-5055-631b502ed11b@gmail.com/, https://lore.kernel.org/netdev/?q=udp_repair
The eBPF (CRIB) approach: the socket queue iterator was not merged: https://lore.kernel.org/netdev/AM6PR03MB5848EDA002E3D7EACA7C6BDA99A52@AM6PR03MB5848.eurprd03.prod.outlook.com/, and there are general objections to the CRIB approach: https://lore.kernel.org/bpf/CAHk-=wjLWFa3i6+Tab67gnNumTYipj_HuheXr2RCq4zn0tCTzA@mail.gmail.com/

There is still one idea we haven't tried: since UDP allows packets to be lost in transit, on restore we could somehow mark the socket to drop all data before UNCORK. This way we wouldn't need to restore the contents of the send queue of CORK-ed UDP sockets.

Links:

Details:

Skill level: intermediate (+linux kernel)
Language: C
Expected size: 350 hours
Mentors: Alexander Mikhalitsyn <alexander@mihalicyn.com>, Pavel Tikhomirov <ptikhomirov@virtuozzo.com>, Andrei Vagin <avagin@gmail.com>

@@ Line 3: / Line 3: @@
 This page contains project ideas for upcoming Google Summer of Code.
-== Suggested ideas ==
+== Contact ==
+First, make sure to go through the [[GSoC Students Recommendations]]. Once you build CRIU locally and C/R a simple process successfully, please contact the respective mentor for the idea you are interested in. For general questions feel free to send an email to the [mailto:criu@lists.linux.dev mailing list] or write in [https://gitter.im/save-restore/criu gitter].
-=== Post-copy for shared memory and hugetlbfs ===
+== Project ideas ==
-'''Summary:''' extend post-copy memory restore and migration to support shared memory and hugetlbfs.
+=== Kubernetes Operator for Automated Checkpointing ===
+'''Summary:''' Extend the Checkpoint/Restore Operator with support for automated policy-based checkpointing.
+The [https://github.com/checkpoint-restore/checkpoint-restore-operator Checkpoint/Restore Operator] for Kubernetes currently supports only policies and parameters that limit the number of checkpoints. This project aims to extend the current support with automated policy-based checkpointing, allowing users to define triggers for checkpoint creation, such as time-based schedules, resource thresholds (CPU, memory, I/O usage), Kubernetes events (node drain, pod eviction, preemption), and application-level signals or annotations.
+'''Links:'''
+* https://github.com/checkpoint-restore/checkpoint-restore-operator
+* https://kubernetes.io/docs/reference/node/kubelet-checkpoint-api
+'''Details:'''
+* Skill level: intermediate
+* Language: Go
+* Expected size: 350 hours
+* Mentors: Prajwal S N <prajwalnadig21@gmail.com>, Radostin Stoyanov <rstoyanov@fedoraproject.org>, Adrian Reber <areber@redhat.com>
+=== Forensic Checkpointing Framework for Kubernetes ===
+Kubernetes provides a highly dynamic and ephemeral environment where workloads can start and disappear very quickly and are continuously being rescheduled across different nodes in the cluster.
+One of the key challenges with forensic investigations in Kubernetes is capturing and preserving the evidence during security incidents. This project aims to address this problem by developing a framework for efficiently capturing and preserving the state of all running applications in a container at a specific point in time, along with the associated container configurations and metadata. These artifacts would allow investigators to accurately reconstruct the events, create a timeline, and analyze security incidents without impacting the running cluster. This is an important step towards enabling forensic readiness for Kubernetes, where cluster administrators proactively ensure the environments are prepared to collect and preserve evidence before a security incident occurs.
+'''Links:'''
+* https://github.com/checkpoint-restore/checkpointctl
+* [https://fosdem.org/2026/events/attachments/F9RANH-forensic-snapshots-in-kubernetes/slides/267371/fosdem_2_4dh73ni.pdf Investigating Security Incidents with Forensic Snapshots in Kubernetes]
+* [https://www.cncf.io/reports/cloud-native-security-whitepaper/ Cloud Native Security Whitepaper]
+* [https://media.defense.gov/2022/Aug/29/2003066362/-1/-1/0/CTR_KUBERNETES_HARDENING_GUIDANCE_1.2_20220829.PDF Kubernetes Hardening Guide]
+'''Details:'''
+* Skill level: intermediate
+* Language: Go
+* Expected size: 350 hours
+* Mentors: Lorena Goldoni <lory.goldoni@gmail.com>, Radostin Stoyanov <rstoyanov@fedoraproject.org>, Adrian Reber <areber@redhat.com>
+=== Enabling Checkpoint/Restore of Rootless Containers ===
+[https://rootlesscontaine.rs/ Rootless containers] are containers that can be created, run, and managed by unprivileged users. Container engines such as Podman natively support running containers in a rootless mode to improve security and usability. While checkpoint/restore functionality is already available for rootful containers and unprivileged checkpointing is possible with the <code>CAP_CHECKPOINT_RESTORE</code> capability, container engines do not yet support native checkpointing of containers running in rootless mode. This project aims to explore and address the remaining challenges required to enable unprivileged checkpoint/restore for rootless containers.
+'''Links:'''
+* https://github.com/checkpoint-restore/criu/pull/1930
+* https://github.com/torvalds/linux/commit/124ea650d3072b005457faed69909221c2905a1f
+* https://src.fedoraproject.org/rpms/criu/pull-request/10#request_diff
+'''Details:'''
+* Skill level: intermediate
+* Language: C, Go
+* Expected size: 350 hours
+* Mentors: Radostin Stoyanov <rstoyanov@fedoraproject.org>, Adrian Reber <areber@redhat.com>
+=== Checkpointing of POSIX message queues ===
+'''Summary:''' Add support for checkpoint/restore of POSIX message queues
+POSIX message queues are a widely used inter-process communication mechanism. Message queues are implemented as files on a virtual filesystem (mqueue), where a file descriptor (message queue descriptor) is used to perform operations such as sending or receiving messages. To support checkpoint/restore of POSIX message queues, we need a kernel interface (similar to [https://github.com/checkpoint-restore/criu/commit/8ce9e947051e43430eb2ff06b96dddeba467b4fd MSG_PEEK]) that would enable the retrieval of messages from a queue without removing them. This project aims to implement such an interface that allows retrieving all messages and their priorities from a POSIX message queue.
+'''Links:'''
+* https://github.com/checkpoint-restore/criu/issues/2285
+* https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/ipc/mqueue.c
+* https://www.man7.org/tlpi/download/TLPI-52-POSIX_Message_Queues.pdf
+'''Details:'''
+* Skill level: intermediate
+* Language: C
+* Expected size: 350 hours
+* Mentors: Radostin Stoyanov <rstoyanov@fedoraproject.org>, Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
+* Suggested by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
+=== Add support for SCM_CREDENTIALS / SCM_PIDFD and friends ===
+'''Summary:''' Support for SCM_CREDENTIALS / SCM_PIDFD
+SCM_CREDENTIALS and SCM_PIDFD are types of SCM (Socket-level Control Messages). They play a crucial role
+in systemd and many other user space applications. This project is about adding support for these
+SCMs to be properly saved and restored back with CRIU. There is an existing code in OpenVZ CRIU fork,
+see [1] and [2]. Goal would be first of all to properly port this code, cover with extensive tests and
+ensure that SCM_PIDFD / SO_PEERPIDFD are handled correctly. Also we expect to cover things like
+SO_PASSRIGHTS and SO_PASSPIDFD.
+There is some extra source of complexity here pidfds can be "stale" (see PIDFD_STALE in Linux kernel)
+and we need to ensure that we properly cover those cases.
+'''Links:'''
+* [1] openvz-criu https://bitbucket.org/openvz/criu.ovz/history-node/918653a0a343194385592d7b50b5bd7a8fbe1cc1/criu/sk-unix.c?at=hci-dev
+* [2] openvz-criu https://bitbucket.org/openvz/criu.ovz/history-node/918653a0a343194385592d7b50b5bd7a8fbe1cc1/criu/sk-queue.c?at=hci-dev
+* [3] Linux kernel https://github.com/torvalds/linux/commit/5e2ff6704a275be009be8979af17c52361b79b89
+* [4] Linux kernel https://github.com/torvalds/linux/commit/c679d17d3f2d895b34e660673141ad250889831f
+'''Details:'''
+* Skill level: intermediate / advanced
+* Language: C
+* Expected size: 350 hours
+* Suggested by: Alexander Mikhalitsyn <alexander@mihalicyn.com>
+* Mentors: Andrei Vagin <avagin@gmail.com>, Alexander Mikhalitsyn <alexander@mihalicyn.com>
+=== Integrate with Live Update Orchestrator (LUO) ===
+'''Summary:''' Integrate with Live Update Orchestrator (LUO)
+Live Update Orchestrator (LUO) is a framework for Linux kernel
+live updates (via kexec). Idea behind it is to provide kernel
+and user space API to save specific system resources across
+kexec reboot.
+This research project explores how CRIU can be integrated with LUO.
+For example, if a user is running memcached on a node, the current
+approach would require a full CRIU dump, then saving the entire
+process memory to disk, then followed by restoring it after the
+kernel live update.
-CRIU relies on [[Userfaultfd]] mechanism in the Linux kernel to implement the demand paging in userspace and allow post-copy memory (or lazy) [[Lazy_migration|migration]]. However, currently this support is limited to anonymous private memory mappings, although kernel also supports shared memory areas and hugetlbfs backed memory.
+Instead, CRIU could be extended to leverage the LUO API. When instructed,
+it could preserve selected memory regions directly across the kexec reboot,
+avoiding a full disk dump and significantly accelerating the restore process
+after the kernel update.
-The shared memory support for lazy migration can be added to CRIU without kernel modifications, while proper handling of hugetlbfs would require userfaultfd callbacks for [http://man7.org/linux/man-pages/man2/fallocate.2.html fallocate(PUNCH_HOLE)] and [http://man7.org/linux/man-pages/man2/madvise.2.html madvise(MADV_REMOVE)] system calls.
 '''Links:'''
-* [https://www.kernel.org/doc/html/latest/admin-guide/mm/userfaultfd.html Userfaultfd]
+* [1] LUO kernel documentation https://docs.kernel.org/core-api/liveupdate.html
-* [https://www.kernel.org/doc/html/latest/admin-guide/mm/hugetlbpage.html hugetlbfs]
+* [2] LUO memfd doc https://docs.kernel.org/mm/memfd_preservation.html
 '''Details:'''
-* Skill level: most probably advanced?
+* Skill level: intermediate / advanced
 * Language: C
-* Mentor: Mike Rapoport <rppt@linux.ibm.com> / Andrey Vagin <avagin@gmail.com>
+* Expected size: 350 hours
-* Suggested by: Mike Rapoport <rppt@linux.ibm.com>
+* Suggested by: Andrei Vagin <avagin@gmail.com>
+* Mentors: Andrei Vagin <avagin@gmail.com>, Alexander Mikhalitsyn <alexander@mihalicyn.com>
+=== Optimize COW memory dumping ===
+'''Summary:''' Optimize COW memory dumping
+The Linux kernel memory management subsystem is highly optimized not only for performance, but also to minimize unnecessary memory consumption. A key example of this is how the kernel handles private VMAs when user space invokes the fork() system call.
+Rather than duplicating the entire VMA tree along with all memory contents, the kernel creates optimized copies of inherited VMAs using the Copy-on-Write (COW) mechanism. When a process writes to a page within a COW-ed VMA, a write page fault occurs, and the kernel creates a private copy of that page before applying the modification. However, if the page is only read, no copying is performed.
+This approach significantly improves fork() performance and can dramatically reduce memory usage in many workloads.
+In CRIU, when dumping VMAs and their associated memory pages, this COW optimization is not currently taken into account during the dump phase. As a result, for COW-backed VMAs, CRIU may generate multiple copies of identical memory pages in the dump image.
+During restore, however, CRIU explicitly handles this situation (see [1] and [2]) and attempts to reconstruct COW relationships inside the kernel. This step is critical: without it, a checkpoint/restore (C/R) cycle could lead to a substantial increase in memory consumption for the same process tree. For example, a workload that originally consumed 500 MiB could expand to 800 MiB after restore, which is clearly unacceptable.
+This project aims to improve the dumping algorithm so that it avoids producing multiple unnecessary copies of identical pages belonging to COW-ed VMAs.
+The project requires some understanding of Linux memory management internals and CRIU’s architecture. We strongly encourage GSoC contributors to study references [1] and [2] and experiment with the relevant code paths before applying. We are happy to answer questions and provide guidance along the way.
+'''Links:'''
+* [1] preparing COW VMAs https://github.com/checkpoint-restore/criu/blob/c180188db036f8ea4c08bfee28cbcdbdd52cdfc3/criu/mem.c#L878
+* [2] private vma content restore cow case https://github.com/checkpoint-restore/criu/blob/c180188db036f8ea4c08bfee28cbcdbdd52cdfc3/criu/mem.c#L1219
+'''Details:'''
+* Skill level: intermediate / advanced
+* Language: C
+* Expected size: 350 hours
+* Suggested by: Andrei Vagin <avagin@gmail.com>
+* Mentors: Andrei Vagin <avagin@gmail.com>, Alexander Mikhalitsyn <alexander@mihalicyn.com>
+== Suspended project ideas ==
+Listed here are tasks that seem suitable for GSoC, but currently do not have anybody to mentor it.
 === Optimize logging engine ===
@@ Line 33: / Line 176: @@
 The option to keep log() calls intact might be in pre-compilation pass of the sources. In this pass each <code>log(fmt, ...)</code> call gets translated into a call to a binary log function that saves <code>fmt</code> identifier copies all the args ''as is'' into the log file. The binary log decode utility, required in this case, should then find the fmt string by its ID in the log file and print the resulting message.
 '''Links:'''
 * [[Better logging]]
@@ Line 40: / Line 182: @@
 * Skill level: intermediate
 * Language: C, though decoder/preprocessor can be in any language
-* Mentor: Andrei Vagin <avagin@gmail.com> / Pavel Emelyanov <xemul@openvz.org>
+* Expected size: 350 hours
-* Suggested by: Andrei Vagin <avagin@gmail.com>
+* Suggested by: Andrei Vagin
+* Mentors: Alexander Mikhalitsyn <alexander@mihalicyn.com>
-=== Add support for checkpoint/restore of CORK-ed UDP socket ===
+=== IOUring support ===
+The io_uring Asynchronous I/O (AIO) framework is a new Linux I/O interface, first introduced in upstream Linux kernel version 5.1 (March 2019). It provides a low-latency and feature-rich interface for applications that require AIO functionality.
-'''Summary:''' Support C/R of corked UDP socket
-There's UDP_CORK option for sockets. As man page says:
-<pre>
-    If this option is enabled, then all data output on this socket
-    is accumulated into a single datagram that is transmitted when
-    the option is disabled.  This option should not be used in
-    code intended to be portable.
-</pre>
-Currently criu refuses to dump this case, so it's effectively a bug. Supporting
-this will need extending the kernel API to allow criu read back the write queue
-of the socket (see [[TCP connection|how it's done]] for TCP sockets, for example). Then
-the queue is written into the image and is restored into the socket (with the CORK
-bit set too).
 '''Links:'''
-* https://github.com/checkpoint-restore/criu/issues/409
+* https://blogs.oracle.com/linux/an-introduction-to-the-io_uring-asynchronous-io-framework
-* [[Sockets]], [[TCP connection]]
+* https://github.com/axboe/liburing
-* [[https://groups.google.com/forum/#!topic/comp.os.linux.networking/Uz8PYiTCZSg UDP cork explained]]
 '''Details:'''
-* Skill level: intermediate (+linux kernel)
+* Skill level: expert (+linux kernel)
-* Language: C
+* Expected size: 350 hours
-* Mentor: Pavel Emelianov <xemul@virtuozzo.com>
-* Suggested by: Pavel Emelianov <xemul@virtuozzo.com>
-=== Optimize the pre-dump algorithm ===
+=== Add support for SPFS ===
-'''Summary:''' Optimize the pre-dump algorithm to avoid pinning to many memory in RAM
+'''Summary:''' The SPFS is a special filesystem that allows checkpoint and restore of such things as NFS and FUSE
-Current [[CLI/cmd/pre-dump|pre-dump]] mode is used to write task memory contents into image
+NFS support is already implemented in Virtuozzo CRIU, but it's very beneficial to port it to mainline CRIU. The importaint part of it is the need to implement the integration of Stub-Proxy File System (SPFS) with LXC/yet_another_containers_environment.
-files w/o stopping the task for too long. It does this by stopping the task, infecting it and
-draining all the memory into a set of pipes. Then the task is cured, resumed and the pipes'
-contents is written into images (maybe a [[page server]]). Unfortunately, this approach creates
-a big stress on the memory subsystem, as keeping all memory in pipes creates a lot of unreclaimable
-memory (pages in pipes are not swappable), as well as the number of pipes themselves can be huge, as
-one pipe doesn't store more than a fixed amount of data (see pipe(7) man page).
-A solution for this problem is to use a sys_read_process_vm() syscall, which will mitigate
+'''Links'''
-all of the above. To do this we need to allocate a temporary buffer in criu, then walk the
+* https://github.com/checkpoint-restore/criu/issues/60
-target process vm by copying the memory piece-by-piece into it, then flush the data into image
+* https://github.com/checkpoint-restore/criu/issues/53
-(or page server), and repeat.
+* https://github.com/skinsbursky/spfs
+* https://patchwork.criu.org/series/137/
-Ideally there should be sys_splice_process_vm() syscall in the kernel, that does the same as
-the read_process_vm does, but vmsplices the data
-'''Links:'''
-* [[Memory pre dump]]
-* https://github.com/checkpoint-restore/criu/issues/351
-* [[Memory dumping and restoring]], [[Memory changes tracking]]
-* [http://man7.org/linux/man-pages/man2/process_vm_readv.2.html process_vm_readv(2)] [http://man7.org/linux/man-pages/man2/vmsplice.2.html vmsplice(2)] [https://lkml.org/lkml/2018/1/9/32 RFC for splice_process_vm syscall]
 '''Details:'''
-* Skill level: advanced
+* Skill level: expert
 * Language: C
-* Mentor: Pavel Emelianov <xemul@virtuozzo.com>
+* Mentor: Alexander Mikhalitsyn <alexander@mihalicyn.com>
-* Suggested by: Pavel Emelianov <xemul@virtuozzo.com>
+* Suggested by: Alexander Mikhalitsyn <alexander@mihalicyn.com>
 === Anonymise image files ===
@@ Line 130: / Line 242: @@
 * Skill level: beginner
 * Language: Python
-* Mentor: Pavel Emelianov <xemul@virtuozzo.com>
-* Suggested by: Pavel Emelianov <xemul@virtuozzo.com>
-=== Porting crit functionalities in GO ===
+=== Add support for checkpoint/restore of CORK-ed UDP socket ===
-'''Summary:''' Implement image view and manipulation in Go
+'''Summary:''' Support C/R of corked UDP socket
-CRIU's checkpoint images are stored on disk using protobuf. For easier analysis of checkpoint files CRIU has a tool called [[CRIT|CRiu Image Tool (CRIT)]]. It can display/decode CRIU image files from binary protobuf to JSON as well as encode JSON files back to the binary format. With closer integration of CRIU in container runtimes it becomes important to be able to view the CRIU output files. Either for manipulation before restoring or for reading checkpoint statistics (memory pages written to disk, memory pages skipped, process downtime).
+There is a UDP_CORK option for UDP sockets. As the udp(7) man page says:
+<pre>
+    If this option is enabled, then all data output on this socket
+    is accumulated into a single datagram that is transmitted when
+    the option is disabled.  This option should not be used in
+    code intended to be portable.
+</pre>
-Currently CRIT is implemented in Python, for easier integration in other Go projects it is important to have image manipulation and analysis available from GO. This means we need a Go based library to read/modify/write/encode/decode CRIU's image files. Based on this library a Go based implementation of CRIT would be useful.
+Currently criu refuses to dump such sockets, which is effectively a bug. Supporting
+this requires extending the kernel API to let criu read back the write queue
+of the socket (see [[TCP connection|how it's done]] for TCP sockets, for example). The
+queue is then written into the image and restored into the socket (with the CORK
+bit set too).
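To make the state in question concrete, here is a minimal sketch of a corked UDP socket on Linux. The function name <code>corked_datagram_size</code> is made up for illustration, and the code only demonstrates the documented UDP_CORK behaviour over loopback; it does not show how criu would dump or restore the queue.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/udp.h>    /* UDP_CORK */
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Send "hello " and "world" through a corked UDP socket over loopback.
 * Stores the first datagram the receiver gets in `out` and returns its
 * size, or -1 on error.
 */
static long corked_datagram_size(char *out, size_t out_len)
{
    struct sockaddr_in addr = { .sin_family = AF_INET };
    struct timeval tmo = { .tv_sec = 1 };   /* do not block forever */
    socklen_t alen = sizeof(addr);
    int on = 1, off = 0;
    long ret = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);

    if (rx < 0 || tx < 0)
        goto out;

    /* Bind the receiver to 127.0.0.1, port 0: the kernel picks a port. */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(rx, (struct sockaddr *)&addr, &alen) < 0 ||
        setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tmo, sizeof(tmo)) < 0 ||
        connect(tx, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;

    /* Cork: from now on written data only accumulates in the socket. */
    if (setsockopt(tx, IPPROTO_UDP, UDP_CORK, &on, sizeof(on)) < 0)
        goto out;
    if (send(tx, "hello ", 6, 0) != 6 || send(tx, "world", 5, 0) != 5)
        goto out;

    /*
     * Here 11 bytes sit in the write queue of `tx`. A dump taken at this
     * moment would have to save (or deliberately drop) them.
     */

    /* Uncork: the accumulated data leaves as a single datagram. */
    if (setsockopt(tx, IPPROTO_UDP, UDP_CORK, &off, sizeof(off)) < 0)
        goto out;

    ret = recv(rx, out, out_len, 0);
out:
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return ret;
}
```

The data pending between the two <code>setsockopt()</code> calls is what currently cannot be read back from userspace.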
-'''Links:'''
+'''Notes:'''
-* [[CRIT]]
-* Possible use case see LXD: https://github.com/lxc/lxd/blob/cb55b1c5a484a43e0c21c6ae8c4a2e30b4d45be3/lxd/migrate_container.go#L179
-* https://github.com/lxc/lxd/pull/4072
-* https://github.com/checkpoint-restore/go-criu/blob/master/phaul/stats.go
-'''Details:'''
-* Skill level: beginner
-* Language: Go
-* Mentor: Adrian Reber <areber@redhat.com>
-* Suggested by: Adrian Reber <areber@redhat.com>
-=== Memory changes tracking with userfaultfd-WP ===
+We have already made a few (3) attempts at solving this problem:
-'''Summary:''' add ability to track memory changes of the snapshotted processes using userfaultfd-WP
-Currently CRIU uses [[Memory_changes_tracking|Soft-dirty]] mechanism in Linux kernel to track memory changes.
+* UDP_REPAIR approach didn't succeed: https://lore.kernel.org/netdev/721a2e32-c930-ad6b-5055-631b502ed11b@gmail.com/, https://lore.kernel.org/netdev/?q=udp_repair
-This mechanism can be complemented (or even completely replaced) with recently proposed write protection support for
+* eBPF (CRIB) approach: the socket queue iterator was not merged: https://lore.kernel.org/netdev/AM6PR03MB5848EDA002E3D7EACA7C6BDA99A52@AM6PR03MB5848.eurprd03.prod.outlook.com/, and there are general objections to the CRIB approach: https://lore.kernel.org/bpf/CAHk-=wjLWFa3i6+Tab67gnNumTYipj_HuheXr2RCq4zn0tCTzA@mail.gmail.com/
-userfaultfd (userfaultfd-WP).
-Userfault allows implementation of paging in userspace. It allows an application to receive notifications about page faults and provide the desired memory contents for the faulting pages. In the current upstream kernels only missing page faults are supported, but there is an ongoing work to allow notifications for write faults as well. Using such notifications it would be possible to precisely track memory changes during pre-dump iterations. This approach may prove to be more efficient than soft-dirty.
+One idea remains untried: since UDP permits packets to be lost in transit, on restore we could mark the socket so that it drops all data written before UNCORK. That way the contents of the send queue of a CORK-ed UDP socket would not need to be restored at all.
 '''Links:'''
-* [https://www.kernel.org/doc/html/latest/admin-guide/mm/userfaultfd.html Userfaultfd]
+* https://github.com/checkpoint-restore/criu/issues/409
-* [https://github.com/xzpeter/linux/tree/uffd-wp-merged Userfaultfd-WP]
+* https://github.com/criupatchwork/criu/commit/a532312
-* [https://www.kernel.org/doc/html/latest/admin-guide/mm/soft-dirty.html?highlight=soft%20dirty Soft-Dirty]
+* [[Sockets]], [[TCP connection]]
-* https://lwn.net/Articles/777258/
+* [https://groups.google.com/forum/#!topic/comp.os.linux.networking/Uz8PYiTCZSg UDP cork explained]
 '''Details:'''
-* Skill level: most probably advanced?
+* Skill level: intermediate (+linux kernel)
 * Language: C
-* Mentor: Mike Rapoport <rppt@linux.ibm.com>
+* Expected size: 350 hours
-* Suggested by: Mike Rapoport <rppt@linux.ibm.com>
+* Mentors: Alexander Mikhalitsyn <alexander@mihalicyn.com>, Pavel Tikhomirov <ptikhomirov@virtuozzo.com>, Andrei Vagin <avagin@gmail.com>
 [[Category:GSoC]]
 [[Category:Development]]