Changes

Line 9: Line 9:  
== Project ideas ==
 
== Project ideas ==
   −
=== Add support for memory compression ===
+
=== Kubernetes Operator for Automated Checkpointing ===
+
 
'''Summary:''' Support compression for page images
+
'''Summary:''' Extend the Checkpoint/Restore Operator with support for automated policy-based checkpointing.
+
 
We would like to support compression of memory page files
+
The [https://github.com/checkpoint-restore/checkpoint-restore-operator Checkpoint/Restore Operator] for Kubernetes currently supports only policies and parameters that limit the number of checkpoints. This project aims to extend the current support with automated policy-based checkpointing, allowing users to define triggers for checkpoint creation, such as time-based schedules, resource thresholds (CPU, memory, I/O usage), Kubernetes events (node drain, pod eviction, preemption), and application-level signals or annotations.
in CRIU using one of the fastest algorithms (it's a matter
+
 
of discussion which one to choose!).
+
'''Links:'''
 +
* https://github.com/checkpoint-restore/checkpoint-restore-operator
 +
* https://kubernetes.io/docs/reference/node/kubelet-checkpoint-api
 +
 
 +
'''Details:'''
 +
* Skill level: intermediate
 +
* Language: Go
 +
* Expected size: 350 hours
 +
* Mentors: Viktória Spišaková <spisakova@ics.muni.cz>, Radostin Stoyanov <rstoyanov@fedoraproject.org>, Adrian Reber <areber@redhat.com>
 +
 
 +
=== Forensic Checkpointing Framework for Kubernetes ===
 +
 
 +
Kubernetes provides a highly dynamic and ephemeral environment where workloads can start and disappear very quickly and are continuously being rescheduled across different nodes in the cluster.
 +
One of the key challenges with forensic investigations in Kubernetes is capturing and preserving the evidence during security incidents. This project aims to address this problem by developing a framework for efficiently capturing and preserving the state of all running applications in a container at a specific point in time, along with the associated container configurations and metadata. These artifacts would allow investigators to accurately reconstruct the events, create a timeline, and analyze security incidents without impacting the running cluster. This is an important step towards enabling forensic readiness for Kubernetes, where cluster administrators proactively ensure the environments are prepared to collect and preserve evidence before a security incident occurs.
 +
 
 +
'''Links:'''
 +
* https://github.com/checkpoint-restore/checkpointctl
 +
* [https://fosdem.org/2026/events/attachments/F9RANH-forensic-snapshots-in-kubernetes/slides/266249/fosdem_2_4dh73ni.pdf Investigating Security Incidents with Forensic Snapshots in Kubernetes]
 +
* [https://www.cncf.io/reports/cloud-native-security-whitepaper/ Cloud Native Security Whitepaper]
 +
* [https://media.defense.gov/2022/Aug/29/2003066362/-1/-1/0/CTR_KUBERNETES_HARDENING_GUIDANCE_1.2_20220829.PDF Kubernetes Hardening Guide]
   −
This task does not require any Linux kernel modifications
  −
and its scope is limited to CRIU itself. At the same time, it's
  −
complex enough, as we need to touch the memory dump/restore code path
  −
in CRIU and also handle many corner cases, such as the page server.
  −
   
'''Details:'''
 
'''Details:'''
 
* Skill level: intermediate
 
* Skill level: intermediate
* Language: C
+
* Language: Go
 
* Expected size: 350 hours
 
* Expected size: 350 hours
* Suggested by: Andrei Vagin <avagin@gmail.com>
+
* Mentors: Lorena Goldoni <lory.goldoni@gmail.com>, Radostin Stoyanov <rstoyanov@fedoraproject.org>, Adrian Reber <areber@redhat.com>
* Mentors: Radostin Stoyanov <rstoyanov@fedoraproject.org>, Alexander Mikhalitsyn <alexander@mihalicyn.com>, Andrei Vagin <avagin@gmail.com>
     −
=== Use eBPF to lock and unlock the network ===
+
=== Enabling Checkpoint/Restore of Rootless Containers ===
  −
'''Summary:''' Use eBPF instead of external iptables-restore tool for network lock and unlock.
     −
During checkpoint and restore, CRIU locks the network so that the network stack accepts no packets while the process is being checkpointed. Currently, CRIU calls out to iptables-restore to create and delete the corresponding iptables rules. Another approach, which avoids calling the external iptables-restore binary, would be to inject eBPF rules directly. Users have reported failures of iptables-restore, and eBPF could avoid this external dependency.
+
[https://rootlesscontaine.rs/ Rootless containers] are containers that can be created, run, and managed by unprivileged users. Container engines such as Podman natively support running containers in a rootless mode to improve security and usability. While checkpoint/restore functionality is already available for rootful containers and unprivileged checkpointing is possible with the <code>CAP_CHECKPOINT_RESTORE</code> capability, container engines do not yet support native checkpointing of containers running in rootless mode. This project aims to explore and address the remaining challenges required to enable unprivileged checkpoint/restore for rootless containers.
    
'''Links:'''
 
'''Links:'''
* https://www.criu.org/TCP_connection#Checkpoint_and_restore_TCP_connection
+
* https://github.com/checkpoint-restore/criu/pull/1930
* https://github.com/systemd/systemd/blob/master/src/core/bpf-firewall.c
+
* https://github.com/torvalds/linux/commit/124ea650d3072b005457faed69909221c2905a1f
* https://blog.zeyady.com/2021-08-16/gsoc-criu
+
* https://src.fedoraproject.org/rpms/criu/pull-request/10#request_diff
    
'''Details:'''
 
'''Details:'''
 
* Skill level: intermediate
 
* Skill level: intermediate
* Language: C
+
* Language: C, Go
 
* Expected size: 350 hours
 
* Expected size: 350 hours
* Mentors: Radostin Stoyanov <rstoyanov@fedoraproject.org>, Prajwal S N <prajwalnadig21@gmail.com>
+
* Mentors: Radostin Stoyanov <rstoyanov@fedoraproject.org>, Adrian Reber <areber@redhat.com>
* Suggested by: Adrian Reber <areber@redhat.com>
      
=== Files on detached mounts ===
 
=== Files on detached mounts ===
Line 103: Line 113:  
* Language: C
 
* Language: C
 
* Expected size: 350 hours
 
* Expected size: 350 hours
* Mentors: Radostin Stoyanov <rstoyanov@fedoraproject.org>, Pavel Tikhomirov <ptikhomirov@virtuozzo.com>, Prajwal S N <prajwalnadig21@gmail.com>
+
* Mentors: Radostin Stoyanov <rstoyanov@fedoraproject.org>, Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
 
* Suggested by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
 
* Suggested by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
    +
=== Add support for SCM_CREDENTIALS / SCM_PIDFD and friends ===
   −
=== Add support for arm64 Guarded Control Stack (GCS) ===
+
'''Summary:''' Support for SCM_CREDENTIALS / SCM_PIDFD
  −
'''Summary:''' Support arm64 Guarded Control Stack (GCS)
  −
  −
The arm64 Guarded Control Stack (GCS) feature provides support for
  −
hardware protected stacks of return addresses, intended to provide
  −
hardening against return oriented programming (ROP) attacks and to make
  −
it easier to gather call stacks for applications such as profiling (taken from [1]).
  −
We would like to support arm64 Guarded Control Stack (GCS) in CRIU, which means
  −
that CRIU should be able to Checkpoint/Restore applications using GCS.
     −
This task should not require any Linux kernel modifications
+
SCM_CREDENTIALS and SCM_PIDFD are types of SCM (Socket-level Control Messages). They play a crucial role
but will require a lot of effort to understand the Linux kernel and
+
in systemd and many other user space applications. This project is about adding support for these
glibc support patches. We have a good example of support for
+
SCMs so that CRIU can save and restore them properly. There is existing code in the OpenVZ CRIU fork,
x86 shadow stack [4].
+
see [1] and [2]. The first goal is to port this code properly, cover it with extensive tests, and
 +
ensure that SCM_PIDFD / SO_PEERPIDFD are handled correctly. We also expect to cover options such as
 +
SO_PASSRIGHTS and SO_PASSPIDFD.
   −
'''Links:'''
+
There is an extra source of complexity here: pidfds can be "stale" (see PIDFD_STALE in the Linux kernel),
* [1] kernel support https://lore.kernel.org/all/20241001-arm64-gcs-v13-0-222b78d87eee@kernel.org
+
and we need to ensure that we properly cover those cases.
* [2] libc support https://inbox.sourceware.org/libc-alpha/20250117174119.3254972-1-yury.khrustalev@arm.com
  −
* [3] libc tests https://inbox.sourceware.org/libc-alpha/20250210114538.1723249-1-yury.khrustalev@arm.com
  −
* [4] x86 support https://github.com/checkpoint-restore/criu/pull/2306
  −
  −
'''Details:'''
  −
* Skill level: expert (a lot of moving parts: Linux kernel / libc / CRIU)
  −
* Language: C
  −
* Expected size: 350 hours
  −
* Suggested by: Mike Rapoport <rppt@kernel.org>
  −
* Mentors: Mike Rapoport <rppt@kernel.org>, Andrei Vagin <avagin@gmail.com>, Alexander Mikhalitsyn <alexander@mihalicyn.com>
  −
 
  −
=== Coordinated checkpointing of distributed applications ===
  −
  −
'''Summary:''' Enable coordinated container checkpointing with Kubernetes.
  −
 
  −
Checkpointing support has been recently introduced in Kubernetes, where the
  −
smallest deployable unit is a Pod (a group of containers).  Kubernetes is often
  −
used to deploy applications that are distributed across multiple nodes.
  −
However, checkpointing such distributed applications requires a coordination
  −
mechanism to synchronize the checkpoint and restore operations. To address this
  −
challenge, we have developed a new tool called <code>criu-coordinator</code>
  −
that relies on the action-script functionality of CRIU to enable synchronization
  −
in distributed environments. This project aims to extend this tool to enable
  −
seamless integration with the checkpointing functionality of Kubernetes.
      
'''Links:'''
 
'''Links:'''
* https://github.com/checkpoint-restore/criu-coordinator
+
* [1] openvz-criu https://bitbucket.org/openvz/criu.ovz/history-node/918653a0a343194385592d7b50b5bd7a8fbe1cc1/criu/sk-unix.c?at=hci-dev
* https://lpc.events/event/18/contributions/1803/
+
* [2] openvz-criu https://bitbucket.org/openvz/criu.ovz/history-node/918653a0a343194385592d7b50b5bd7a8fbe1cc1/criu/sk-queue.c?at=hci-dev
* https://sched.co/1YeT4
+
* [3] Linux kernel https://github.com/torvalds/linux/commit/5e2ff6704a275be009be8979af17c52361b79b89
* https://kubernetes.io/blog/2022/12/05/forensic-container-checkpointing-alpha/
+
* [4] Linux kernel https://github.com/torvalds/linux/commit/c679d17d3f2d895b34e660673141ad250889831f
    
'''Details:'''
 
'''Details:'''
* Skill level: intermediate
+
* Skill level: intermediate / advanced
* Language: Rust / Go / C
+
* Language: C
 
* Expected size: 350 hours
 
* Expected size: 350 hours
* Mentors: Radostin Stoyanov <rstoyanov@fedoraproject.org>, Prajwal S N <prajwalnadig21@gmail.com>
+
* Suggested by: Alexander Mikhalitsyn <alexander@mihalicyn.com>
* Suggested by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
+
* Mentors: Andrei Vagin <avagin@gmail.com>, Alexander Mikhalitsyn <alexander@mihalicyn.com>
    
== Suspended project ideas ==
 
== Suspended project ideas ==