GSoC completed projects
Latest revision as of 12:53, 25 January 2026
Coordinated checkpointing of distributed applications
Summary: Enable coordinated container checkpointing with Kubernetes.
Checkpointing support has been recently introduced in Kubernetes, where the
smallest deployable unit is a Pod (a group of containers). Kubernetes is often
used to deploy applications that are distributed across multiple nodes.
However, checkpointing such distributed applications requires a coordination
mechanism to synchronize the checkpoint and restore operations. To address this
challenge, we have developed a new tool called criu-coordinator
that relies on the action-script functionality of CRIU to enable synchronization
in distributed environments. This project aims to extend this tool to enable
seamless integration with the checkpointing functionality of Kubernetes.
Details:
- Contributor: Behouba Manassé
- Final Report
- Presentation Slides
- Presentation Recording
- Skill level: intermediate
- Language: Rust / Go / C
- Expected size: 350 hours
- Mentors: Radostin Stoyanov <rstoyanov@fedoraproject.org>, Prajwal S N <prajwalnadig21@gmail.com>, Adrian Reber <areber@redhat.com>
Add support for arm64 Guarded Control Stack (GCS)
Summary: Support arm64 Guarded Control Stack (GCS)
The arm64 Guarded Control Stack (GCS) feature provides support for hardware protected stacks of return addresses, intended to provide hardening against return oriented programming (ROP) attacks and to make it easier to gather call stacks for applications such as profiling (taken from [1]). We would like to support arm64 Guarded Control Stack (GCS) in CRIU, which means that CRIU should be able to Checkpoint/Restore applications using GCS.
This task should not require any Linux kernel modifications but will require a lot of effort to understand Linux kernel and glibc support patches. We have a good example of support for x86 shadow stack [4].
Links:
- [1] kernel support https://lore.kernel.org/all/20241001-arm64-gcs-v13-0-222b78d87eee@kernel.org
- [2] libc support https://inbox.sourceware.org/libc-alpha/20250117174119.3254972-1-yury.khrustalev@arm.com
- [3] libc tests https://inbox.sourceware.org/libc-alpha/20250210114538.1723249-1-yury.khrustalev@arm.com
- [4] x86 support https://github.com/checkpoint-restore/criu/pull/2306
Details:
- Contributor: Igor Svilenkov Bozic
- Pull Request for CRIU
- Presentation Recording
- Linux Plumbers Conference Talk: Guarded Control Stack on arm64: Challenges in Enabling Shadow Stack Support for CRIU
- Skill level: expert (a lot of moving parts: Linux kernel / libc / CRIU)
- Language: C
- Expected size: 350 hours
- Suggested by: Mike Rapoport <rppt@kernel.org>
- Mentors: Mike Rapoport <rppt@kernel.org>, Andrei Vagin <avagin@gmail.com>, Alexander Mikhalitsyn <alexander@mihalicyn.com>
Kubernetes operator for managing container checkpoints
Summary: Develop a Kubernetes operator that automates the management of container checkpoints
Container checkpointing has recently been introduced as an alpha feature in Kubernetes.
To enable this feature, the kubelet API was extended with an endpoint that enables the
creation of checkpoints for individual containers. By default, all container checkpoints
are stored as tar archives in /var/lib/kubelet/checkpoints using the following
file name format: checkpoint-<pod-name>_<namespace-name>-<container-name>-<timestamp>.tar.
However, the current implementation does not provide a mechanism for limiting the number
of checkpoints, which may lead to filling up all existing disk space. This project aims to
develop a Kubernetes operator that automates the management of checkpoints and provides
a garbage collection mechanism to discard obsolete checkpoints.
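The garbage collection idea can be sketched as a simple retention policy. This is a minimal illustration in Python, not the operator's actual code: the function name `prune_checkpoints`, the `max_checkpoints` parameter, the choice to order archives by modification time, and the file names used in `demo()` are all assumptions made for the example.

```python
import os
import tempfile
from pathlib import Path


def prune_checkpoints(directory, max_checkpoints):
    """Delete the oldest checkpoint archives, keeping the newest max_checkpoints.

    Hypothetical policy: archives are matched by the "checkpoint-*.tar"
    pattern and ordered by modification time.
    """
    if max_checkpoints < 0:
        raise ValueError("max_checkpoints must not be negative")
    archives = sorted(
        Path(directory).glob("checkpoint-*.tar"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,  # newest first
    )
    removed = []
    for path in archives[max_checkpoints:]:
        path.unlink()
        removed.append(path.name)
    return removed


def demo():
    """Create four fake archives in a temporary directory and keep two."""
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(4):
            # The "t<i>" suffix is a placeholder, not the real timestamp format.
            path = Path(tmp) / f"checkpoint-web_default-nginx-t{i}.tar"
            path.write_bytes(b"")
            os.utime(path, (1_000_000 + i, 1_000_000 + i))
        removed = prune_checkpoints(tmp, max_checkpoints=2)
        return sorted(removed), sorted(os.listdir(tmp))
```

A real operator would more likely apply such limits per container, pod, or namespace, and might also bound the total size on disk; the sketch only shows the count-based case.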
Links:
- https://github.com/checkpoint-restore/checkpoint-restore-operator
- https://kubernetes.io/docs/reference/node/kubelet-checkpoint-api/
- https://kubernetes.io/blog/2022/12/05/forensic-container-checkpointing-alpha/
- https://kubernetes.io/blog/2023/03/10/forensic-container-analysis/
- https://github.com/kubernetes/kubernetes/pull/115888
- https://github.com/kubernetes/enhancements/issues/2008
Details:
- Contributor: Parthiba Hazra
- Final Report
- Skill level: intermediate
- Language: Go
- Expected size: 350 hours
- Mentors: Adrian Reber <areber@redhat.com>, Radostin Stoyanov <rstoyanov@fedoraproject.org>, Prajwal S N <prajwalnadig21@gmail.com>
- Suggested by: Adrian Reber
Add support for pidfd file descriptors
Summary: Support C/R of pidfd descriptors
The pidfd_open syscall opens a special file descriptor that refers to a process (a pidfd). Through it, a user can send a signal to the process (pidfd_send_signal syscall) or wait for the process to exit (poll() on the pidfd).
At the moment CRIU can't dump processes that have open pidfds.
Links:
- https://lwn.net/Articles/801319/
- https://lwn.net/Articles/794707/
- https://github.com/torvalds/linux/blob/v5.16/kernel/fork.c#L1877
Details:
- Skill level: intermediate
- Language: C
- Expected size: 350 hours
- Mentors: Alexander Mikhalitsyn <alexander@mihalicyn.com>, Christian Brauner <christian@brauner.io>
- Suggested by: Alexander Mikhalitsyn <alexander@mihalicyn.com>
Add support for memfd_secret file descriptors
Summary: Support C/R of memfd_secret descriptors
The memfd_secret syscall lets a user open a special memfd backed by a memory range that is inaccessible to other processes (and to the kernel too!).
At the moment CRIU can't dump processes that have open memfd_secret descriptors.
Links:
- https://lwn.net/Articles/865256/
- https://warusadura.github.io/gsoc23-final-report.html
- https://github.com/checkpoint-restore/criu/pull/2247
Details:
- Skill level: intermediate
- Language: C
- Expected size: 350 hours
- Mentors: Alexander Mikhalitsyn <alexander@mihalicyn.com>, Mike Rapoport <mike.rapoport@gmail.com>
- Suggested by: Alexander Mikhalitsyn <alexander@mihalicyn.com>
Forensic analysis of container checkpoints
Summary: Extending go-crit and checkpointctl with capabilities for forensic analysis
Merged: https://github.com/checkpoint-restore/checkpointctl
The Go implementation of the crit tool was developed during GSoC 2022 to enable native Go-based decoding and encoding of CRIU images. In GSoC 2023, this tool was integrated with checkpointctl to enable forensic analysis capabilities for container checkpoints. Behouba Manassé implemented support for memory forensics by extending the Go version of the crit tool and checkpointctl with support for parsing memory pages (checkpointctl memparse) and for displaying information about the command-line arguments and environment variables when analysing checkpoints with the checkpointctl inspect command. Prajwal Nadig built upon his previous work from GSoC 2022 by implementing capabilities for analysing the process tree, open files, and sockets within a checkpoint, as well as introducing CI tests.
Links:
- https://criu.org/CRIT_(Go_library)
- https://github.com/checkpoint-restore/go-criu/tree/master/crit
- https://kubernetes.io/blog/2022/12/05/forensic-container-checkpointing-alpha/
Details:
- Contributor: Behouba Manassé and Prajwal Nadig
- Final Report: Behouba Manassé, Prajwal Nadig
- Skill level: intermediate
- Language: Go
- Expected size: 350 hours
- Mentors: Radostin Stoyanov <rstoyanov@fedoraproject.org>, Adrian Reber <areber@redhat.com>
Restrict checks for open/mmaped files
Summary: Make sure the file opened (for fd or mapping) at restore is "the same" as it was on dump
Merged: https://github.com/checkpoint-restore/criu/pull/1123
CRIU doesn't carry file contents (except for ghost files) into images. Thus on dump it saves some "meta" for each file to validate that it's "the same" on restore. Currently this meta includes only the file size. The task is to add some cookie value that's somehow affected by the file's contents. This is primarily needed to reduce the chance of restoring with the wrong libraries.
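One way to picture such a cookie is a checksum stored next to the file size. The sketch below is only an illustration under that assumption: it uses a CRC32 over the whole file, and the names `file_meta` and `is_same_file` are hypothetical. It does not describe what the merged CRIU change actually records.

```python
import os
import tempfile
import zlib


def file_meta(path, chunk_size=1 << 16):
    """Return (size, crc32) for a file, reading it in fixed-size chunks."""
    crc = 0
    size = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)
            size += len(chunk)
    return size, crc


def is_same_file(path, saved_meta):
    """Compare the file found at restore time with the meta saved on dump."""
    return file_meta(path) == saved_meta


def demo():
    """Replace a file with same-sized but different contents after the 'dump'."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "libexample.so")
        with open(path, "wb") as f:
            f.write(b"version 1")
        saved = file_meta(path)  # taken at dump time
        unchanged = is_same_file(path, saved)
        with open(path, "wb") as f:
            f.write(b"version 2")  # same size, different contents
        size_matches = saved[0] == os.path.getsize(path)
        return unchanged, size_matches, is_same_file(path, saved)
```

In `demo()` the size-only check would still pass after the file is replaced, while the checksum comparison fails, which is the gap the cookie is meant to close. Checksumming whole files can be slow for large libraries, so a real implementation would probably look at a cheaper identifier.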
Links:
Optimize the pre-dump algorithm
Summary: Optimize the pre-dump algorithm to avoid pinning too much memory in RAM
Merged: https://github.com/checkpoint-restore/criu/commit/98608b90de0f853b1c8a6e15b312320e1441c359
The current pre-dump mode is used to write task memory contents into image files without stopping the task for too long. It does this by stopping the task, infecting it and draining all the memory into a set of pipes. Then the task is cured and resumed, and the pipes' contents are written into images (or maybe sent to a page server). Unfortunately, this approach puts a lot of stress on the memory subsystem: keeping all memory in pipes creates a lot of unreclaimable memory (pages in pipes are not swappable), and the number of pipes themselves can be huge, as one pipe doesn't store more than a fixed amount of data (see the pipe(7) man page).
A solution for this problem is to use a sys_read_process_vm() syscall (available in Linux as process_vm_readv(2)), which mitigates all of the above. To do this we need to allocate a temporary buffer in criu, then walk the target process's address space, copying the memory piece-by-piece into the buffer, flushing the data into an image (or page server), and repeating.
Ideally there should be a sys_splice_process_vm() syscall in the kernel that does the same as read_process_vm does, but vmsplices the data.
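The piece-by-piece copy through a temporary buffer can be sketched as follows. This is a simulation, not CRIU code: `read_mem` is a hypothetical stand-in for a process_vm_readv(2) call on the target task, `flush` stands in for writing to an image or page server, and the region list is invented for the example.

```python
def drain_memory(read_mem, regions, buf_size, flush):
    """Copy each (start, length) region through a buffer of at most buf_size bytes.

    Returns the largest amount of data held at any one time, which is the
    quantity the pipe-based approach fails to bound.
    """
    if buf_size <= 0:
        raise ValueError("buf_size must be positive")
    peak = 0
    for start, length in regions:
        offset = 0
        while offset < length:
            want = min(buf_size, length - offset)
            chunk = read_mem(start + offset, want)  # stand-in for process_vm_readv(2)
            if not chunk:
                raise IOError("short read at address %d" % (start + offset))
            peak = max(peak, len(chunk))
            flush(start + offset, chunk)  # write out before reading more
            offset += len(chunk)
    return peak


def demo():
    """Drain two regions of fake task memory through a 4 KiB buffer."""
    memory = bytes(range(256)) * 64  # 16 KiB of fake task memory
    regions = [(0, 5000), (8192, 3000)]
    image = bytearray()
    peak = drain_memory(
        lambda addr, n: memory[addr:addr + n],
        regions,
        buf_size=4096,
        flush=lambda addr, chunk: image.extend(chunk),
    )
    expected = memory[0:5000] + memory[8192:11192]
    return peak, bytes(image) == expected
```

However large the regions are, no more than `buf_size` bytes are held between a read and the following flush, in contrast to keeping the whole address space pinned in pipes.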
Links:
- Memory pre dump
- https://github.com/checkpoint-restore/criu/issues/351
- Memory dumping and restoring, Memory changes tracking
- process_vm_readv(2) vmsplice(2) RFC for splice_process_vm syscall
Porting crit functionalities in GO
Summary: Implement image view and manipulation in Go
Merged: https://github.com/checkpoint-restore/go-criu/pull/66
CRIU's checkpoint images are stored on disk using protobuf. For easier analysis of checkpoint files CRIU has a tool called CRiu Image Tool (CRIT). It can display/decode CRIU image files from binary protobuf to JSON as well as encode JSON files back to the binary format. With closer integration of CRIU in container runtimes it becomes important to be able to view the CRIU output files, either for manipulation before restoring or for reading checkpoint statistics (memory pages written to disk, memory pages skipped, process downtime).
Currently CRIT is implemented in Python. For easier integration in other Go projects it is important to have image manipulation and analysis available from Go. This means we need a Go-based library to read/modify/write/encode/decode CRIU's image files. Based on this library, a Go-based implementation of CRIT would be useful.
Links:
Use eBPF to lock and unlock the network
Summary: Use eBPF instead of external iptables-restore tool for network lock and unlock.
During checkpointing and restoring, CRIU locks the network to make sure no network packets are accepted by the network stack while the process is checkpointed. Currently CRIU calls out to iptables-restore to create and delete the corresponding iptables rules. Another approach, which avoids calling out to the external iptables-restore binary, would be to directly inject eBPF rules. Users have reported that iptables-restore fails in some cases, and eBPF could avoid this external dependency.
Links:
- https://www.criu.org/TCP_connection#Checkpoint_and_restore_TCP_connection
- https://github.com/systemd/systemd/blob/master/src/core/bpf-firewall.c
- https://blog.zeyady.com/2021-08-16/gsoc-criu
Details:
- Contributor: Zeyad Yasser
- CRIU Pull Request
- Skill level: intermediate
- Language: C
- Expected size: 350 hours
- Mentors: Radostin Stoyanov <rstoyanov@fedoraproject.org>, Prajwal S N <prajwalnadig21@gmail.com>
- Suggested by: Adrian Reber <areber@redhat.com>
Support sparse ghosts
Summary: While sparse ghost files have been partially supported for quite some time, we were still not able to handle big sparse ghost files and highly fragmented sparse ghost files effectively.
Merged: https://github.com/checkpoint-restore/criu/pull/1944 https://github.com/checkpoint-restore/criu/pull/1963
When criu dumps processes it also dumps the files they have open. It does this by saving the names by which the files are accessible. But sometimes a file has no name, for example when a task opened a file and then removed it. To dump such a file criu cannot save its name (because the name doesn't exist). Instead criu saves the whole file. This is called a "ghost file". Since saving the whole file is very expensive (copying lots of data on disk), criu limits the maximum size of a ghost file. This limit is also a problem, because there are "sparse" files that are large in size but may be small in terms of real disk usage. The goal of the task is to support sparse ghost files, i.e. to limit the size of the ghost not by its length but by its disk usage, and, when copying the data, to detect the used blocks and save only those.
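Both parts of that goal, measuring disk usage rather than length and copying only the used blocks, can be sketched with standard file APIs. The example below is a minimal Python illustration that assumes a Linux filesystem supporting SEEK_DATA/SEEK_HOLE; the function names are hypothetical and this is not how CRIU's C implementation is structured.

```python
import errno
import os
import tempfile


def data_extents(fd):
    """Return [(offset, length), ...] for the data regions of an open file."""
    size = os.fstat(fd).st_size
    extents = []
    offset = 0
    while offset < size:
        try:
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as err:
            if err.errno == errno.ENXIO:  # only a hole remains up to EOF
                break
            raise
        end = os.lseek(fd, start, os.SEEK_HOLE)
        extents.append((start, end - start))
        offset = end
    return extents


def disk_usage(fd):
    """Bytes actually allocated on disk (st_blocks counts 512-byte units)."""
    return os.fstat(fd).st_blocks * 512


def copy_sparse(src_fd, dst_fd, chunk_size=1 << 20):
    """Copy only the data regions of src_fd; holes stay holes in dst_fd."""
    os.ftruncate(dst_fd, os.fstat(src_fd).st_size)
    for start, length in data_extents(src_fd):
        done = 0
        while done < length:
            chunk = os.pread(src_fd, min(chunk_size, length - done), start + done)
            if not chunk:
                break
            pos = 0
            while pos < len(chunk):  # handle partial writes
                pos += os.pwrite(dst_fd, chunk[pos:], start + done + pos)
            done += len(chunk)


def demo():
    """Build an 8 MiB sparse file with a few bytes of data and copy it."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.open(os.path.join(tmp, "ghost"), os.O_RDWR | os.O_CREAT, 0o600)
        dst = os.open(os.path.join(tmp, "copy"), os.O_RDWR | os.O_CREAT, 0o600)
        try:
            os.ftruncate(src, 8 << 20)         # 8 MiB apparent size
            os.pwrite(src, b"ghost", 4 << 20)  # a little data in the middle
            copy_sparse(src, dst)
            same = os.pread(src, 8 << 20, 0) == os.pread(dst, 8 << 20, 0)
            covered = any(s <= (4 << 20) < s + n for s, n in data_extents(src))
            return same, covered, os.fstat(dst).st_size
        finally:
            os.close(src)
            os.close(dst)
```

A size limit for ghost files would then compare `disk_usage(fd)` against the threshold instead of `st_size`. On filesystems that do not track holes, the whole file is reported as one data extent, so the copy still works but saves nothing.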
Links: