Changes

Line 1: Line 1:  +
=== Coordinated checkpointing of distributed applications ===
 +
 +
'''Summary:''' Enable coordinated container checkpointing with Kubernetes.
 +
 +
Checkpointing support has recently been introduced in Kubernetes, where the
 +
smallest deployable unit is a Pod (a group of containers).  Kubernetes is often
 +
used to deploy applications that are distributed across multiple nodes.
 +
However, checkpointing such distributed applications requires a coordination
 +
mechanism to synchronize the checkpoint and restore operations. To address this
 +
challenge, we have developed a new tool called <code>criu-coordinator</code>
 +
that relies on the action-script functionality of CRIU to enable synchronization
 +
in distributed environments. This project aims to extend this tool to enable
 +
seamless integration with the checkpointing functionality of Kubernetes.
 +
 +
'''Details:'''
 +
* Contributor: [https://github.com/behouba Behouba Manassé]
 +
* [https://github.com/behouba/gsoc-2025 Final Report]
 +
* [https://docs.google.com/presentation/d/e/2PACX-1vTNV6nObjjlGwH9wR555WJVC8k1jYhTCJ9GeHsFXlRUe91YRXESPHw2GtYKzsUJ1l907z_cic6PQHio/pub Presentation Slides]
 +
* [https://youtu.be/fzXDqbq2X3s Presentation Recording]
 +
* Skill level: intermediate
 +
* Language: Rust / Go / C
 +
* Expected size: 350 hours
 +
* Mentors: Radostin Stoyanov <rstoyanov@fedoraproject.org>, Prajwal S N <prajwalnadig21@gmail.com>, Adrian Reber <areber@redhat.com>
 +
 +
=== Add support for arm64 Guarded Control Stack (GCS) ===
 +
 +
'''Summary:''' Support arm64 Guarded Control Stack (GCS)
 +
 +
The arm64 Guarded Control Stack (GCS) feature provides support for
 +
hardware protected stacks of return addresses, intended to provide
 +
hardening against return oriented programming (ROP) attacks and to make
 +
it easier to gather call stacks for applications such as profiling (taken from [1]).
 +
We would like to support arm64 Guarded Control Stack (GCS) in CRIU, which means
 +
that CRIU should be able to Checkpoint/Restore applications using GCS.
 +
 +
This task should not require any Linux kernel modifications
 +
but will require a lot of effort to understand the Linux kernel and
 +
glibc support patches. We have a good example of support for
 +
x86 shadow stack [4].
 +
 +
'''Links:'''
 +
* [1] kernel support https://lore.kernel.org/all/20241001-arm64-gcs-v13-0-222b78d87eee@kernel.org
 +
* [2] libc support https://inbox.sourceware.org/libc-alpha/20250117174119.3254972-1-yury.khrustalev@arm.com
 +
* [3] libc tests https://inbox.sourceware.org/libc-alpha/20250210114538.1723249-1-yury.khrustalev@arm.com
 +
* [4] x86 support https://github.com/checkpoint-restore/criu/pull/2306
 +
 +
'''Details:'''
 +
* Contributor: [https://github.com/svilenkov Igor Svilenkov Bozic]
 +
* [https://github.com/checkpoint-restore/criu/pull/2725 Pull Request for CRIU]
 +
* [https://drive.google.com/file/d/1Uoz_E5K-1zRcZwEWXKVcsNtxmdzDIpiY/view?usp=sharing Presentation Recording]
 +
* Linux Plumbers Conference Talk: [https://lpc.events/event/19/contributions/2237/ Guarded Control Stack on arm64: Challenges in Enabling Shadow Stack Support for CRIU]
 +
* Skill level: expert (a lot of moving parts: Linux kernel / libc / CRIU)
 +
* Language: C
 +
* Expected size: 350 hours
 +
* Suggested by: Mike Rapoport <rppt@kernel.org>
 +
* Mentors: Mike Rapoport <rppt@kernel.org>, Andrei Vagin <avagin@gmail.com>, Alexander Mikhalitsyn <alexander@mihalicyn.com>
 +
 +
=== Kubernetes operator for managing container checkpoints ===
 +
 +
'''Summary:''' Develop a Kubernetes operator that automates the management of container checkpoints
 +
 +
Container checkpointing has recently been introduced as an alpha feature in Kubernetes.
 +
To enable this feature, the kubelet API was extended with an endpoint that enables the
 +
creation of checkpoints for individual containers. By default, all container checkpoints
 +
are stored as tar archives in <code>/var/lib/kubelet/checkpoints</code> using the following
 +
file name format: <code>checkpoint-<pod-name>_<namespace-name>-<container-name>-<timestamp>.tar</code>.
 +
However, the current implementation does not provide a mechanism for limiting the number
 +
of checkpoints, so they can accumulate until the disk is full. This project aims to
 +
develop a Kubernetes operator that automates the management of checkpoints and provides
 +
a garbage collection mechanism to discard obsolete checkpoints.
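
The retention idea above can be sketched in a few lines. This is only an illustration, not the operator's actual implementation: the helper names <code>parse_checkpoint_name</code> and <code>select_obsolete</code> and the "keep the newest N per container" policy are hypothetical, and the sketch assumes the <code><timestamp></code> part of the file name is in RFC 3339 form. Because namespace and container names may both contain hyphens, the sketch does not try to split them and groups on the combined string instead.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumption: <timestamp> is RFC 3339, e.g. 2024-05-01T10:00:00Z.
CHECKPOINT_RE = re.compile(
    r"^checkpoint-(?P<pod>[^_]+)_(?P<ns_container>.+)-"
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:Z|[+-]\d{2}:\d{2}))\.tar$"
)


def parse_checkpoint_name(name):
    """Return ((pod, namespace-container), timestamp), or None if no match."""
    match = CHECKPOINT_RE.match(name)
    if match is None:
        return None
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    ts = datetime.fromisoformat(match.group("ts").replace("Z", "+00:00"))
    return (match.group("pod"), match.group("ns_container")), ts


def select_obsolete(names, keep):
    """Return the archive names to discard, keeping the `keep` newest per group."""
    groups = defaultdict(list)
    for name in names:
        parsed = parse_checkpoint_name(name)
        if parsed is None:
            continue  # never touch files that are not checkpoint archives
        key, ts = parsed
        groups[key].append((ts, name))

    obsolete = []
    for entries in groups.values():
        entries.sort()  # oldest first
        cut = max(len(entries) - keep, 0)
        obsolete.extend(name for _, name in entries[:cut])
    return sorted(obsolete)
```

The function only selects file names and deletes nothing, so a caller could log or review the result before removing anything from <code>/var/lib/kubelet/checkpoints</code>.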
 +
 +
'''Links:'''
 +
* https://github.com/checkpoint-restore/checkpoint-restore-operator
 +
* https://kubernetes.io/docs/reference/node/kubelet-checkpoint-api/
 +
* https://kubernetes.io/blog/2022/12/05/forensic-container-checkpointing-alpha/
 +
* https://kubernetes.io/blog/2023/03/10/forensic-container-analysis/
 +
* https://github.com/kubernetes/kubernetes/pull/115888
 +
* https://github.com/kubernetes/enhancements/issues/2008
 +
 +
'''Details:'''
 +
* Contributor: [https://github.com/Parthiba-Hazra Parthiba Hazra]
 +
* [https://github.com/Parthiba-Hazra/gsoc-2024 Final Report]
 +
* Skill level: intermediate
 +
* Language: Go
 +
* Expected size: 350 hours
 +
* Mentors: Adrian Reber <areber@redhat.com>, Radostin Stoyanov <rstoyanov@fedoraproject.org>, Prajwal S N <prajwalnadig21@gmail.com>
 +
* Suggested by: Adrian Reber
 +
 +
 +
=== Add support for pidfd file descriptors ===
 +
 +
'''Summary:''' Support C/R of pidfd descriptors
 +
 +
The pidfd_open syscall opens a special file descriptor that
 +
refers to a process. Through it, a user can send a signal to
 +
the process (pidfd_send_signal syscall) or wait for the process
 +
to exit (poll() on the pidfd).
 +
 +
At the moment CRIU can't dump processes that have open pidfds.
 +
 +
'''Links:'''
 +
* https://lwn.net/Articles/801319/
 +
* https://lwn.net/Articles/794707/
 +
* https://github.com/torvalds/linux/blob/v5.16/kernel/fork.c#L1877
 +
 +
'''Details:'''
 +
* Skill level: intermediate
 +
* Language: C
 +
* Expected size: 350 hours
 +
* Mentors: Alexander Mikhalitsyn <alexander@mihalicyn.com>, Christian Brauner <christian@brauner.io>
 +
* Suggested by: Alexander Mikhalitsyn <alexander@mihalicyn.com>
 +
 +
=== Add support for memfd_secret file descriptors ===
 +
 +
'''Summary:''' Support C/R of memfd_secret descriptors
 +
 +
The memfd_secret syscall lets a user open a
 +
special memfd backed by a memory range that
 +
is inaccessible to other processes (and to the kernel too!).
 +
 +
At the moment CRIU can't dump processes that have open memfd_secret descriptors.
 +
 +
'''Links:'''
 +
* https://lwn.net/Articles/865256/
 +
* https://warusadura.github.io/gsoc23-final-report.html
 +
* https://github.com/checkpoint-restore/criu/pull/2247
 +
 +
'''Details:'''
 +
* Skill level: intermediate
 +
* Language: C
 +
* Expected size: 350 hours
 +
* Mentors: Alexander Mikhalitsyn <alexander@mihalicyn.com>, Mike Rapoport <mike.rapoport@gmail.com>
 +
* Suggested by: Alexander Mikhalitsyn <alexander@mihalicyn.com>
    
=== Forensic analysis of container checkpoints ===
 
=== Forensic analysis of container checkpoints ===
   −
'''Summary:''' Extending go-crit with capabilities for forensic analysis
+
'''Summary:''' Extending go-crit and checkpointctl with capabilities for forensic analysis
    
'''Merged:''' https://github.com/checkpoint-restore/checkpointctl
 
'''Merged:''' https://github.com/checkpoint-restore/checkpointctl
   −
The go-crit tool was created during GSoC 2022 to enable analysis of CRIU [[images]] with tools written in Go. It allows container management tools such as [https://github.com/checkpoint-restore/checkpointctl checkpointctl] and Podman to provide capabilities similar to CRIT. The goal of this project is to extend go-crit with functionality for forensic analysis of container checkpoints to provide a better user experience.
     −
The go-crit tool is still in its early stages of development. To effectively utilise this new feature, the checkpointctl tool would be extended to display information about the processes included in a container checkpoint and their runtime state (e.g., memory, open files, sockets, etc).
+
The Go implementation of the [[crit]] tool was developed during GSoC 2022 to enable native Go-based decoding and encoding of CRIU [[images]]. In GSoC 2023, this tool was integrated with [https://github.com/checkpoint-restore/checkpointctl checkpointctl] to enable forensic analysis capabilities for container checkpoints. Behouba Manassé implemented support for memory forensics by extending the Go version of the crit tool and checkpointctl with support for parsing memory pages (<code>checkpointctl memparse</code>) and for displaying command-line arguments and environment variables when analysing checkpoints with the <code>checkpointctl inspect</code> command. Prajwal Nadig built upon his previous work from GSoC 2022 by implementing capabilities for analysing the process tree, open files, and sockets within a checkpoint, as well as introducing CI tests.
    
'''Links:'''
 
'''Links:'''
Line 15: Line 147:  
* https://kubernetes.io/blog/2022/12/05/forensic-container-checkpointing-alpha/
 
* https://kubernetes.io/blog/2022/12/05/forensic-container-checkpointing-alpha/
    +
'''Details:'''
 +
* Contributor: [https://github.com/behouba Behouba Manassé] and [https://github.com/snprajwal Prajwal Nadig]
 +
* Final Report: [https://github.com/behouba/gsoc-2023 Behouba Manassé], [https://github.com/snprajwal/gsoc-2023 Prajwal Nadig]
 +
* Skill level: intermediate
 +
* Language: Go
 +
* Expected size: 350 hours
 +
* Mentors: Radostin Stoyanov <rstoyanov@fedoraproject.org>, Adrian Reber <areber@redhat.com>
    
=== Restrict checks for open/mmaped files ===
 
=== Restrict checks for open/mmaped files ===
Line 68: Line 207:  
'''Links:'''
 
'''Links:'''
 
* [[CRIT (Go library)]]
 
* [[CRIT (Go library)]]
* https://github.com/snprajwal/gsoc-2022
+
* [https://github.com/snprajwal/gsoc-2022 Final Report]
 +
 
 +
=== Use eBPF to lock and unlock the network ===
 +
 +
'''Summary:''' Use eBPF instead of external iptables-restore tool for network lock and unlock.
 +
 
 +
During checkpointing and restoring, CRIU locks the network so that the network stack accepts no packets while the process is checkpointed. Currently, CRIU calls out to iptables-restore to create and delete the corresponding iptables rules. An alternative that avoids calling the external iptables-restore binary is to inject eBPF rules directly. Users have reported iptables-restore failures, and eBPF would remove this external dependency.
 +
 
 +
'''Links:'''
 +
* https://www.criu.org/TCP_connection#Checkpoint_and_restore_TCP_connection
 +
* https://github.com/systemd/systemd/blob/master/src/core/bpf-firewall.c
 +
* https://blog.zeyady.com/2021-08-16/gsoc-criu
 +
 
 +
'''Details:'''
 +
* Contributor: [https://github.com/ZeyadYasser Zeyad Yasser]
 +
* [https://github.com/checkpoint-restore/criu/pull/1539 CRIU Pull Request]
 +
* Skill level: intermediate
 +
* Language: C
 +
* Expected size: 350 hours
 +
* Mentors: Radostin Stoyanov <rstoyanov@fedoraproject.org>, Prajwal S N <prajwalnadig21@gmail.com>
 +
* Suggested by: Adrian Reber <areber@redhat.com>
 +
 
    
=== Support sparse ghosts ===
 
=== Support sparse ghosts ===