Changes

1,982 bytes added, 06:37, 13 December 2025
Line 1: Line 1:
[[Image:K8s-cr-arch-v2.png|right|500px|thumb|Overview of container checkpoint/restore in Kubernetes.]]

Container checkpointing was introduced as an alpha feature in Kubernetes v1.25 and graduated to beta in Kubernetes v1.30. This functionality allows running containers to be transparently checkpointed to persistent storage and later restored to resume execution, or migrated across nodes and clusters.

The content of container checkpoints can be analyzed with the [https://github.com/checkpoint-restore/checkpointctl checkpointctl] tool. This enables forensic analysis of security incidents (e.g., suspected compromise, data exfiltration) or application failures by inspecting the saved process memory, open files, sockets, and execution context captured in the checkpoint.
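
Before using a dedicated tool, it can help to see what a checkpoint contains at the file level. The following is a minimal sketch, assuming the checkpoint is available locally as a tar archive; the function name <code>list_checkpoint_entries</code> and the archive name <code>checkpoint.tar</code> are hypothetical and used only for illustration. checkpointctl provides much richer views of the same data; consult its documentation for the subcommands available in your version.

```shell
#!/bin/sh
# Sketch: list the unique top-level entries of a checkpoint archive.
# Assumes the archive is a tar file readable by the local tar;
# the archive path is supplied by the caller.
list_checkpoint_entries() {
    archive=$1
    # 1. List every member of the archive.
    # 2. Drop a leading "./" and everything after the first path component.
    # 3. Remove empty lines and duplicates.
    tar -tf "$archive" \
        | sed -e 's|^\./||' -e 's|/.*$||' \
        | sed '/^$/d' \
        | sort -u
}

# Example (hypothetical archive name):
#   list_checkpoint_entries checkpoint.tar
```

The function only reads the archive, so it is safe to run against a checkpoint that is being preserved as evidence.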

This feature is developed as a community-driven effort by the [https://github.com/kubernetes/community/tree/master/wg-checkpoint-restore Kubernetes Checkpoint/Restore Working Group]. To get involved and contribute to Kubernetes, join our [https://groups.google.com/a/kubernetes.io/g/wg-checkpoint-restore mailing list] and the [https://kubernetes.slack.com/messages/wg-checkpoint-restore #wg-checkpoint-restore] Slack channel.
    
== Kubelet Checkpoint API ==
Line 163: Line 169:  
=== Restoring Container ===
To restore a container from a checkpoint, specify the OCI image containing the checkpoint in the container's <code>image</code> field. When creating a container, CRI-O and containerd detect OCI images that carry a checkpoint annotation and restore the container from the checkpoint instead of starting it normally. The following example shows how the YAML file used above can be modified to restore the container from a checkpoint:
    
<pre>
Line 176: Line 182:
       image: quay.io/foo/bar:latest  # Replace with checkpoint image URI
EOF
kubectl apply -f restore-pod.yaml
</pre>
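
The image substitution described above can also be scripted. The following is a minimal sketch, not part of the documented workflow: the function name <code>set_checkpoint_image</code>, the input file <code>pod.yaml</code>, and the image URI <code>localhost/checkpoint-image:latest</code> are hypothetical. It assumes that each <code>image</code> field sits on its own line and that the URI contains no <code>|</code> or <code>&</code> characters.

```shell
#!/bin/sh
# Sketch: print a copy of a pod manifest with the value of every
# "image:" field replaced by the given checkpoint image URI.
# Anything after "image:" on that line (including comments) is dropped.
set_checkpoint_image() {
    manifest=$1
    image=$2
    # Keep the indentation, an optional list dash, and the "image:" key;
    # replace the rest of the line with the new image URI.
    sed -E "s|^([[:space:]]*(-[[:space:]]+)?image:[[:space:]]*).*\$|\\1${image}|" "$manifest"
}

# Example (hypothetical file and image names):
#   set_checkpoint_image pod.yaml localhost/checkpoint-image:latest > restore-pod.yaml
#   kubectl apply -f restore-pod.yaml
```

Because the function rewrites every <code>image</code> field, it is only suitable for manifests in which all containers should be restored from the same checkpoint image; for multi-container pods, edit the manifest by hand.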
== Related Publications, Talks & Blog Posts ==
* Research Papers
** [https://radostin.io/files/vspisakova-jsspp25.pdf Kubernetes Scheduling with Checkpoint/Restore: Challenges and Open Problems]
** [https://doi.org/10.48550/arXiv.2502.16631 CRIUgpu: Transparent Checkpointing of GPU-Accelerated Workloads]
** [https://doi.org/10.1145/3678015.3680477 Towards Efficient End-to-End Encryption for Container Checkpointing Systems]

* KubeCon & CloudNative Talks
** [https://kccnceu2025.sched.com/event/1tx7i Efficient Transparent Checkpointing of AI ML Workloads in Kubernetes]
** [https://sched.co/1dCVs End-to-End Encryption for Container Checkpointing in Kubernetes]
** [https://sched.co/1YeT4 Enabling Coordinated Checkpointing for Distributed HPC Applications]

* Kubernetes Blog Articles
** [https://kubernetes.io/blog/2023/03/10/forensic-container-analysis/ Forensic Container Analysis]
** [https://kubernetes.io/blog/2022/12/05/forensic-container-checkpointing-alpha/ Forensic Container Checkpointing in Kubernetes]

* NVIDIA Technical Blog
** [https://developer.nvidia.com/blog/checkpointing-cuda-applications-with-criu Checkpointing CUDA Applications with CRIU]