Changes

m
no edit summary
Line 3: Line 3:  
This page contains project ideas for upcoming Google Summer of Code.
 
   −
== Contacts ==
+
== Contact ==
   −
Please contact the respective mentor for the idea you are interested in. For general questions feel free to send an email to the [mailto:criu@openvz.org mailing list] or write in [https://gitter.im/save-restore/criu gitter].
+
First, make sure to go through the [[GSoC Students Recommendations]]. Once you have built CRIU locally and successfully checkpointed and restored (C/R) a simple process, please contact the respective mentor for the idea you are interested in. For general questions, feel free to send an email to the [mailto:criu@lists.linux.dev mailing list] or write on [https://gitter.im/save-restore/criu gitter].
    
== Project ideas ==
 
   −
=== Optimize logging engine ===
+
=== Add support for memory compression ===
  −
'''Summary:''' CRIU puts a lots of logs when doing its job. Logging is done with simple fprintf function. They are typically useless, but ''if'' some operation fails -- the logs are the only way to find what was the reason for failure.
  −
 
  −
At the same time the printf family of functions is known to take some time to work -- they need to scan the format string for %-s and then convert the arguments into strings. If comparing criu dump with and without logs the time difference is notable (15%-20%), so speeding the logs up will help improve criu performance.
  −
 
  −
One of the solutions to the problem might be binary logging. The problem with binary logs is the amount of efforts to convert existing logs to binary form. Preferably, the switch to binary logging either keeps existing log() calls intact, either has some automatics to convert them.
  −
 
  −
The option to keep log() calls intact might be in pre-compilation pass of the sources. In this pass each <code>log(fmt, ...)</code> call gets translated into a call to a binary log function that saves <code>fmt</code> identifier copies all the args ''as is'' into the log file. The binary log decode utility, required in this case, should then find the fmt string by its ID in the log file and print the resulting message.
  −
 
  −
'''Links:'''
  −
* [[Better logging]]
  −
  −
'''Details:'''
  −
* Skill level: intermediate
  −
* Language: C, though decoder/preprocessor can be in any language
  −
* Expected size: 350 hours
  −
* Mentor: Pavel Emelyanov <ovzxemul@gmail.com>
  −
* Suggested by: Andrei Vagin <avagin@gmail.com>
  −
 
  −
=== Add support for checkpoint/restore of CORK-ed UDP socket ===
  −
  −
'''Summary:''' Support C/R of corked UDP socket
   
   
 
   
There's UDP_CORK option for sockets. As man page says:
+
'''Summary:''' Support compression for page images
<pre>
  −
    If this option is enabled, then all data output on this socket
  −
    is accumulated into a single datagram that is transmitted when
  −
    the option is disabled.  This option should not be used in
  −
    code intended to be portable.
  −
</pre>
  −
 
  −
Currently criu refuses to dump this case, so it's effectively a bug. Supporting
  −
this will need extending the kernel API to allow criu read back the write queue
  −
of the socket (see [[TCP connection|how it's done]] for TCP sockets, for example). Then
  −
the queue is written into the image and is restored into the socket (with the CORK
  −
bit set too).
  −
  −
'''Links:'''
  −
* https://github.com/checkpoint-restore/criu/issues/409
  −
* [[Sockets]], [[TCP connection]]
  −
* [[https://groups.google.com/forum/#!topic/comp.os.linux.networking/Uz8PYiTCZSg UDP cork explained]]
  −
  −
'''Details:'''
  −
* Skill level: intermediate (+linux kernel)
  −
* Language: C
  −
* Expected size: 350 hours
  −
* Mentor: Pavel Emelianov <ovzxemul@gmail.com>
  −
* Suggested by: Pavel Emelianov <ovzxemul@gmail.com>
  −
 
  −
=== Add support for pidfd file descriptors ===
  −
 
  −
'''Summary:''' Support C/R of pidfd descriptors
  −
 
  −
There is pidfd_open syscall which allows opening
  −
a special PID file descriptor. A user can send a signal to
  −
the process (pidfd_send_signal syscall), wait for the process
  −
(poll() on pidfd).
  −
 
  −
At the moment CRIU can't dump processes that have pidfd's opened.
  −
 
  −
'''Links:'''
  −
* https://lwn.net/Articles/801319/
  −
* https://lwn.net/Articles/794707/
  −
* https://github.com/torvalds/linux/blob/v5.16/kernel/fork.c#L1877
   
   
 
   
'''Details:'''
+
We would like to support compression of memory page files
* Skill level: intermediate
+
in CRIU using one of the fastest algorithms (it's a matter
* Language: C
+
of discussion which one to choose!).
* Expected size: 350 hours
  −
* Mentors: Alexander Mikhalitsyn <alexander@mihalicyn.com>, Christian Brauner <christian@brauner.io>
  −
* Suggested by: Alexander Mikhalitsyn <alexander@mihalicyn.com>
  −
 
  −
=== Add support for memfd_secret file descriptors ===
  −
 
  −
'''Summary:''' Support C/R of memfd_secret descriptors
  −
 
  −
There is memfd_secret syscall which allows user to open
  −
special memfd which is backed by special memory range which
  −
is inaccessible by another processes (and the kernel too!).
     −
At the moment CRIU can't dump processes that have memfd_secret's opened.
+
This task does not require any Linux kernel modifications
 
+
and its scope is limited to CRIU itself. At the same time, it is
'''Links:'''
+
complex enough, as we need to touch the memory dump/restore code path
* https://lwn.net/Articles/865256/
+
in CRIU and also handle many corner cases, such as the page server.
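
As a rough illustration of the idea (not CRIU code), the sketch below compresses each page independently and keeps an offset index, so that a single page can still be fetched on its own, which is what a page server would need. Everything here is an assumption made for the example: the names <code>compress_pages</code> and <code>read_page</code> are hypothetical, and zlib at level 1 merely stands in for whichever fast algorithm is eventually chosen.

```python
import zlib

PAGE_SIZE = 4096  # assumed page size for the example


def compress_pages(pages):
    """Compress each page on its own and return (index, blob).

    index[n] is (offset, length) of page n inside blob.
    """
    index, blob = [], bytearray()
    for page in pages:
        packed = zlib.compress(page, 1)  # level 1 favours speed over ratio
        index.append((len(blob), len(packed)))
        blob.extend(packed)
    return index, bytes(blob)


def read_page(index, blob, n):
    """Decompress only page n, without touching the other pages."""
    offset, length = index[n]
    return zlib.decompress(blob[offset:offset + length])
```

Compressing pages one by one costs some compression ratio compared to compressing the whole image as a stream, but it keeps random access to pages cheap.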
 
   
 
   
 
'''Details:'''
 
Line 99: Line 26:  
* Language: C
 
 
* Expected size: 350 hours
 
* Mentors: Alexander Mikhalitsyn <alexander@mihalicyn.com>, Mike Rapoport <mike.rapoport@gmail.com>
+
* Suggested by: Andrei Vagin <avagin@gmail.com>
* Suggested by: Alexander Mikhalitsyn <alexander@mihalicyn.com>
+
* Mentors: Radostin Stoyanov <rstoyanov@fedoraproject.org>, Alexander Mikhalitsyn <alexander@mihalicyn.com>, Andrei Vagin <avagin@gmail.com>
 
  −
=== Forensic analysis of container checkpoints ===
  −
 
  −
'''Summary:''' Extending go-crit with capabilities for forensic analysis
  −
 
  −
The go-crit tool was created during GSoC 2022 to enable analysis of CRIU [[images]] with tools written in Go. It allows container management tools such as [https://github.com/checkpoint-restore/checkpointctl checkpointctl] and Podman to provide capabilities similar to CRIT. The goal of this project is to extend go-crit with functionality for forensic analysis of container checkpoints to provide a better user experience.
  −
 
  −
The go-crit tool is still in its early stages of development. To effectively utilise this new feature, the checkpointctl tool would be extended to display information about the processes included in a container checkpoint and their runtime state (e.g., memory, open files, sockets, etc).
  −
 
  −
'''Links:'''
  −
* https://criu.org/CRIT_(Go_library)
  −
* https://github.com/checkpoint-restore/go-criu/tree/master/crit
  −
* https://kubernetes.io/blog/2022/12/05/forensic-container-checkpointing-alpha/
  −
 
  −
'''Details:'''
  −
* Skill level: intermediate
  −
* Language: Go
  −
* Expected size: 350 hours
  −
* Mentor: Radostin Stoyanov <rstoyanov@fedoraproject.org>, Adrian Reber <areber@redhat.com>
  −
* Suggested by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
      
=== Use eBPF to lock and unlock the network ===
 
Line 137: Line 44:  
* Language: C
 
 
* Expected size: 350 hours
 
* Mentor: Radostin Stoyanov <rstoyanov@fedoraproject.org>
+
* Mentors: Radostin Stoyanov <rstoyanov@fedoraproject.org>, Prajwal S N <prajwalnadig21@gmail.com>
 
* Suggested by: Adrian Reber <areber@redhat.com>
 
  −
=== CGroup-v2 support ===
  −
  −
'''Summary:''' cgroup is a mechanism to organize processes hierarchically and distribute system resources along the hierarchy in a controlled and configurable manner. cgroup v2 is a new version of the cgroup file system. Unlike v1, cgroup v2 has only single hierarchy. CRIU has to dump/restore a container cgroup hierarchy along with all per-cgroup options. The cgroupv2 support in CRIU has to be compatible with Docker, containerd and cri-o.
  −
  −
'''Links:'''
  −
* [[CGroups]]
  −
* https://github.com/checkpoint-restore/criu/issues/252
  −
* https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html
  −
'''Details:'''
  −
* Skill level: intermediate
  −
* Language: C
  −
* Expected size: 350 hours
  −
* Mentor: Andrei Vagin <avagin@gmail.com>
  −
* Suggested by: Andrei Vagin <avagin@gmail.com>
  −
  −
=== Dump shmem in user-mode (unprivileged-mode) ===
  −
  −
CRIU uses /proc/pid/map_files to dump and restore anonymous shared memory regions, but map_files is restricted to the global CAP_SYS_ADMIN capability. In most cases, it is possible to dump/restore shared memory region without map_files and we need to implement this in CRIU.
  −
  −
'''Links:'''
  −
* [[User-mode]]
  −
  −
'''Details:'''
  −
* Skill level: intermediate
  −
* Language: C
  −
* Expected size: 350 hours
  −
* Suggested by: Andrei Vagin <avagin@gmail.com>
  −
* Suggested by: Pavel Emelyanov <ovzxemul@gmail.com>
  −
* Mentor: Pavel Emelyanov <ovzxemul@gmail.com>
      
=== Files on detached mounts ===
 
Line 207: Line 84:  
* Skill level: intermediate
 
 
* Language: C
 
 +
* Expected size: 350 hours
 
* Mentor: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
 
 
* Suggested by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
 
 +
 +
=== Checkpointing of POSIX message queues ===
 +
 +
'''Summary:''' Add support for checkpoint/restore of POSIX message queues
 +
 +
POSIX message queues are a widely used inter-process communication mechanism. Message queues are implemented as files on a virtual filesystem (mqueue), where a file descriptor (message queue descriptor) is used to perform operations such as sending or receiving messages. To support checkpoint/restore of POSIX message queues, we need a kernel interface (similar to [https://github.com/checkpoint-restore/criu/commit/8ce9e947051e43430eb2ff06b96dddeba467b4fd MSG_PEEK]) that would enable the retrieval of messages from a queue without removing them. This project aims to implement such an interface that allows retrieving all messages and their priorities from a POSIX message queue.
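
To make the requirement concrete, here is a toy in-memory model (not the kernel API and not CRIU code). It mimics the POSIX delivery order, highest priority first and FIFO among equal priorities, and adds a hypothetical <code>peek_all</code> operation playing the role of the non-destructive interface this project would have to provide. The class and function names are assumptions made for the example.

```python
import heapq
import itertools


class ToyMqueue:
    """Toy stand-in for a POSIX message queue (illustration only)."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()

    def send(self, data, prio):
        # Highest priority is delivered first; FIFO among equal priorities.
        heapq.heappush(self._heap, (-prio, next(self._seq), data))

    def receive(self):
        # Like mq_receive(): returns the message and removes it.
        neg_prio, _, data = heapq.heappop(self._heap)
        return data, -neg_prio

    def peek_all(self):
        """Hypothetical non-destructive read of all messages and priorities."""
        return [(data, -neg_prio) for neg_prio, _, data in sorted(self._heap)]


def checkpoint(queue):
    """Dump the queue contents without disturbing the running queue."""
    return queue.peek_all()


def restore(image):
    """Rebuild a queue by re-sending messages in their delivery order."""
    queue = ToyMqueue()
    for data, prio in image:
        queue.send(data, prio)
    return queue
```

Because the image lists messages in delivery order together with their priorities, re-sending them in that order reproduces the same queue on restore.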
 +
 +
'''Links:'''
 +
* https://github.com/checkpoint-restore/criu/issues/2285
 +
* https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/ipc/mqueue.c
 +
* https://www.man7.org/tlpi/download/TLPI-52-POSIX_Message_Queues.pdf
 +
 +
'''Details:'''
 +
* Skill level: intermediate
 +
* Language: C
 +
* Expected size: 350 hours
 +
* Mentors: Radostin Stoyanov <rstoyanov@fedoraproject.org>, Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
 +
* Suggested by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
 +
 +
 +
=== Add support for arm64 Guarded Control Stack (GCS) ===
 +
 +
'''Summary:''' Support arm64 Guarded Control Stack (GCS)
 +
 +
The arm64 Guarded Control Stack (GCS) feature provides support for
 +
hardware protected stacks of return addresses, intended to provide
 +
hardening against return oriented programming (ROP) attacks and to make
 +
it easier to gather call stacks for applications such as profiling (taken from [1]).
 +
We would like to support arm64 Guarded Control Stack (GCS) in CRIU, which means
 +
that CRIU should be able to checkpoint/restore applications using GCS.
 +
 +
This task should not require any Linux kernel modifications
 +
but will require a lot of effort to understand the Linux kernel and
 +
glibc support patches. We have a good example of support for
 +
x86 shadow stack [4].
 +
 +
'''Links:'''
 +
* [1] kernel support https://lore.kernel.org/all/20241001-arm64-gcs-v13-0-222b78d87eee@kernel.org
 +
* [2] libc support https://inbox.sourceware.org/libc-alpha/20250117174119.3254972-1-yury.khrustalev@arm.com
 +
* [3] libc tests https://inbox.sourceware.org/libc-alpha/20250210114538.1723249-1-yury.khrustalev@arm.com
 +
* [4] x86 support https://github.com/checkpoint-restore/criu/pull/2306
 +
 +
'''Details:'''
 +
* Skill level: expert (a lot of moving parts: Linux kernel / libc / CRIU)
 +
* Language: C
 +
* Expected size: 350 hours
 +
* Suggested by: Mike Rapoport <rppt@kernel.org>
 +
* Mentors: Mike Rapoport <rppt@kernel.org>, Andrei Vagin <avagin@gmail.com>, Alexander Mikhalitsyn <alexander@mihalicyn.com>
    
== Suspended project ideas ==
 
    
Listed here are tasks that seem suitable for GSoC, but currently do not have anybody to mentor them.
 
 +
 +
=== Optimize logging engine ===
 +
 +
'''Summary:''' CRIU writes a lot of logs while doing its job. Logging is done with a simple fprintf call. The logs are typically not needed, but ''if'' some operation fails, they are the only way to find the reason for the failure.
 +
 +
At the same time, the printf family of functions is known to be slow: they need to scan the format string for % specifiers and then convert the arguments into strings. Comparing criu dump with and without logs shows a notable time difference (15%-20%), so speeding up logging will help improve criu performance.
 +
 +
One solution to the problem might be binary logging. The difficulty with binary logs is the amount of effort needed to convert existing logs to binary form. Preferably, the switch to binary logging either keeps existing log() calls intact or provides some automation to convert them.
 +
 +
One way to keep log() calls intact is a pre-compilation pass over the sources. In this pass each <code>log(fmt, ...)</code> call gets translated into a call to a binary log function that saves the <code>fmt</code> identifier and copies all the args ''as is'' into the log file. The binary log decode utility, required in this case, should then find the fmt string by its ID in the log file and print the resulting message.
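
The scheme can be sketched in a few lines of Python (the decoder may be written in any language). This is only an illustration under several assumptions: the record layout (16-bit format ID, 8-bit argument count, 64-bit signed arguments), the function names, and the restriction to integer arguments are all made up for the example, and the format table is filled at run time rather than by a real pre-compilation pass.

```python
import struct

# Hypothetical format-string table. In a real build it would be generated
# by the pre-compilation pass; here it is filled in on first use.
FORMATS = []
FORMAT_IDS = {}


def fmt_id(fmt):
    """Return a stable numeric ID for a format string."""
    if fmt not in FORMAT_IDS:
        FORMAT_IDS[fmt] = len(FORMATS)
        FORMATS.append(fmt)
    return FORMAT_IDS[fmt]


def log_binary(out, fmt, *args):
    """Append one record to the bytearray `out` without formatting anything."""
    out.extend(struct.pack("<HB", fmt_id(fmt), len(args)))
    for arg in args:
        out.extend(struct.pack("<q", arg))  # integer arguments only


def decode(buf):
    """Turn the binary log back into text; formatting happens only here."""
    lines, pos = [], 0
    while pos < len(buf):
        ident, nargs = struct.unpack_from("<HB", buf, pos)
        pos += 3
        args = struct.unpack_from("<%dq" % nargs, buf, pos)
        pos += 8 * nargs
        lines.append(FORMATS[ident] % args)
    return lines
```

The point of the design is that the hot path only copies fixed-size values, while the expensive format-string scan is deferred to the decoder, which runs only when somebody actually needs to read the log.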
 +
 +
'''Links:'''
 +
* [[Better logging]]
 +
 +
'''Details:'''
 +
* Skill level: intermediate
 +
* Language: C, though decoder/preprocessor can be in any language
 +
* Expected size: 350 hours
 +
* Suggested by: Andrei Vagin
 +
* Mentors: Alexander Mikhalitsyn <alexander@mihalicyn.com>
    
=== IOUring support ===
 
Line 224: Line 170:  
* Skill level: expert (+linux kernel)
 
 
* Expected size: 350 hours
 
* Suggested by: Pavel Emelyanov <ovzxemul@gmail.com>
  −
* Mentor: Pavel Emelyanov <ovzxemul@gmail.com>
      
=== Add support for SPFS ===
 
Line 242: Line 186:  
* Skill level: expert
 
 
* Language: C
 
* Mentor: Alexander Mikhalitsyn <alexander@mihalicyn.com> / <alexander.mikhalitsyn@virtuozzo.com>
+
* Mentor: Alexander Mikhalitsyn <alexander@mihalicyn.com>
* Suggested by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
+
* Suggested by: Alexander Mikhalitsyn <alexander@mihalicyn.com>
      Line 272: Line 216:  
* Skill level: beginner
 
 
* Language: Python
 
* Mentor: Pavel Emelianov <xemul@virtuozzo.com>
+
 
* Suggested by: Pavel Emelianov <xemul@virtuozzo.com>
+
=== Add support for checkpoint/restore of CORK-ed UDP socket ===
 +
 +
'''Summary:''' Support C/R of corked UDP socket
 +
 +
There's a UDP_CORK option for sockets. As the man page says:
 +
<pre>
 +
    If this option is enabled, then all data output on this socket
 +
    is accumulated into a single datagram that is transmitted when
 +
    the option is disabled.  This option should not be used in
 +
    code intended to be portable.
 +
</pre>
 +
 
 +
Currently criu refuses to dump this case, so it's effectively a bug. Supporting
 +
this will need extending the kernel API to allow criu to read back the write queue
 +
of the socket (see [[TCP connection|how it's done]] for TCP sockets, for example). Then
 +
the queue is written into the image and is restored into the socket (with the CORK
 +
bit set too).
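
The state in question can be observed from user space with a small Linux-only experiment. The sketch below is not CRIU code; the function name is hypothetical, and the numeric value of <code>UDP_CORK</code> is taken from <code>linux/udp.h</code> because Python's socket module is assumed not to export it.

```python
import socket

UDP_CORK = 1  # value from <linux/udp.h>; an assumption, Linux only


def cork_demo():
    """Return (got_data_while_corked, datagram_after_uncork)."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        rx.bind(("127.0.0.1", 0))
        tx.connect(rx.getsockname())

        tx.setsockopt(socket.IPPROTO_UDP, UDP_CORK, 1)
        tx.send(b"hello ")
        tx.send(b"world")

        # Nothing is on the wire yet: both writes sit in the send queue.
        rx.setblocking(False)
        try:
            rx.recv(65535)
            got_early = True
        except BlockingIOError:
            got_early = False

        # Clearing the option flushes the queue as a single datagram.
        tx.setsockopt(socket.IPPROTO_UDP, UDP_CORK, 0)
        rx.settimeout(1.0)
        return got_early, rx.recv(65535)
    finally:
        rx.close()
        tx.close()
```

A checkpoint taken between the two <code>send()</code> calls and the uncork would have to preserve the queued bytes, which is exactly the data user space currently has no way to read back.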
 +
 
 +
'''Notes:'''
 +
 
 +
There have already been several (3) attempts at this problem:
 +
 
 +
* UDP_REPAIR approach didn't succeed: https://lore.kernel.org/netdev/721a2e32-c930-ad6b-5055-631b502ed11b@gmail.com/, https://lore.kernel.org/netdev/?q=udp_repair
 +
* eBPF (CRIB) approach: the socket queue iterator was not merged: https://lore.kernel.org/netdev/AM6PR03MB5848EDA002E3D7EACA7C6BDA99A52@AM6PR03MB5848.eurprd03.prod.outlook.com/, and there are general objections to the CRIB approach: https://lore.kernel.org/bpf/CAHk-=wjLWFa3i6+Tab67gnNumTYipj_HuheXr2RCq4zn0tCTzA@mail.gmail.com/
 +
 
 +
There is still one idea we have not tried: since UDP allows packets to be lost on the way, on restore we could somehow mark the socket to drop all data before UNCORK. This way we would not need to restore the contents of the send queue of CORK-ed UDP sockets.
 +
 +
'''Links:'''
 +
* https://github.com/checkpoint-restore/criu/issues/409
 +
* https://github.com/criupatchwork/criu/commit/a532312
 +
* [[Sockets]], [[TCP connection]]
 +
[https://groups.google.com/forum/#!topic/comp.os.linux.networking/Uz8PYiTCZSg UDP cork explained]
 +
 +
'''Details:'''
 +
* Skill level: intermediate (+linux kernel)
 +
* Language: C
 +
* Expected size: 350 hours
 +
* Mentors: Alexander Mikhalitsyn <alexander@mihalicyn.com>, Pavel Tikhomirov <ptikhomirov@virtuozzo.com>, Andrei Vagin <avagin@gmail.com>
 +
 
       
[[Category:GSoC]]
 
 
[[Category:Development]]
 