Forensic analysis of container checkpoints
Summary: Extending go-crit with capabilities for forensic analysis
Merged: https://github.com/checkpoint-restore/checkpointctl
The go-crit tool was created during GSoC 2022 to enable analysis of CRIU images with tools written in Go. It allows container management tools such as checkpointctl and Podman to provide capabilities similar to CRIT. The goal of this project is to extend go-crit with functionality for forensic analysis of container checkpoints to provide a better user experience.
The go-crit tool is still in its early stages of development. To effectively utilise this new feature, the checkpointctl tool would be extended to display information about the processes included in a container checkpoint and their runtime state (e.g., memory, open files, sockets, etc).
Links:
- https://criu.org/CRIT_(Go_library)
- https://github.com/checkpoint-restore/go-criu/tree/master/crit
- https://kubernetes.io/blog/2022/12/05/forensic-container-checkpointing-alpha/
Restrict checks for open/mmaped files
Summary: Make sure the file opened (for fd or mapping) at restore is "the same" as it was on dump
Merged: https://github.com/checkpoint-restore/criu/pull/1123
CRIU doesn't carry files contents (except for ghost ones) into images. Thus on dump it saves some "meta" for file to validate it's "the same" on restore. Currently this meta includes only the file size. The task is to add some cookie value that's somehow affected by file's contents. This is primarily needed to reduce the possibility to restore with wrong libraries.
Links:
Optimize the pre-dump algorithm
Summary: Optimize the pre-dump algorithm to avoid pinning to many memory in RAM
Merged: https://github.com/checkpoint-restore/criu/commit/98608b90de0f853b1c8a6e15b312320e1441c359
Current pre-dump mode is used to write task memory contents into image files w/o stopping the task for too long. It does this by stopping the task, infecting it and draining all the memory into a set of pipes. Then the task is cured, resumed and the pipes' contents is written into images (maybe a page server). Unfortunately, this approach creates a big stress on the memory subsystem, as keeping all memory in pipes creates a lot of unreclaimable memory (pages in pipes are not swappable), as well as the number of pipes themselves can be huge, as one pipe doesn't store more than a fixed amount of data (see pipe(7) man page).
A solution for this problem is to use a sys_read_process_vm() syscall, which will mitigate all of the above. To do this we need to allocate a temporary buffer in criu, then walk the target process vm by copying the memory piece-by-piece into it, then flush the data into image (or page server), and repeat.
Ideally there should be sys_splice_process_vm() syscall in the kernel, that does the same as the read_process_vm does, but vmsplices the data
Links:
Porting crit functionalities in GO
Summary: Implement image view and manipulation in Go
Merged: https://github.com/checkpoint-restore/go-criu/pull/66
CRIU's checkpoint images are stored on disk using protobuf. For easier analysis of checkpoint files CRIU has a tool called CRiu Image Tool (CRIT). It can display/decode CRIU image files from binary protobuf to JSON as well as encode JSON files back to the binary format. With closer integration of CRIU in container runtimes it becomes important to be able to view the CRIU output files. Either for manipulation before restoring or for reading checkpoint statistics (memory pages written to disk, memory pages skipped, process downtime).
Currently CRIT is implemented in Python, for easier integration in other Go projects it is important to have image manipulation and analysis available from GO. This means we need a Go based library to read/modify/write/encode/decode CRIU's image files. Based on this library a Go based implementation of CRIT would be useful.
Links:
Support sparse ghosts
Summary: While sparse ghost files were in part supported for quiet some time, we still was not able to handle big sparse ghost files and highly fragmented sparse ghost files effectively.
Merged: https://github.com/checkpoint-restore/criu/pull/1944 https://github.com/checkpoint-restore/criu/pull/1963
When criu dumps processes it also dumps files that are opened by them. It does this by saving file names by which the files are accessible. But sometimes files can have no names. It may happen if a task opened a file and then removed it. To dump this file criu cannot save its name (because the name doesn't exist). Instead criu saves the whole file. This is called "ghost file". Since saving the whole file is very expensive (copying lots of data on disk) criu limits the maximum size of a ghost file. The latter is also not good, because there are "sparse" files, that are large in size, but may be small from the real disk usage perspective. The goal of the task is to support sparse ghost files, i.e. limit the size of the ghost not by its length but by disk usage and when copying the data detect the used blocks and save only those.
Links: