Schedule
CRIU 2.x
With 2.x we've decided to make several technologies available as
standalone projects (e.g. Compel) and to tune the development and
release scheme to release new stuff faster than once every 3 months.
v. 2.0
New features
New code layout for sub-projects (e.g. Compel)
Unprivileged dump
Dump/check cpuinfo support for PPC
Explorers for CRIT
Added "post-setup-namespaces" to action scripts
Added timeout for dump procedure (5 sec by default)
Ability to override LSM profile on restore with CLI/RPC option
External bind mounts can be fs-root mounts too
Skip netns' internals on dump and restore (for Docker integration)
Advanced support for external files
C/R for
Mode and uid/gid of cgroup files and dirs
Freeze cgroup state (frozen/thawed)
Task's loginuid and oom score
Per-thread credentials
Filter mode of seccomp
Ghost file in removed directory
Ghost files lutimes
Binfmt-misc FS contents
Netfilter conntracks and expectations
Multi-headed cgroups
CGroup namespaces (no nesting)
Optimizations/improvements
Align parasite stack on 16 bytes for correctness
Compilation with native libc syscall wrappers and helpers
Parasite code injection done via memfd system call
Make vaddr to pfn conversion with one less syscall
CRIT shows device numbers in "maj:min" manner
CRIT shows mmap's status in verbose
Docker files for builds on all supported arches
Fixes
Absent readlink syscall on ARM (use readlinkat instead) could cause dump to fail
Wrong argument to timer_create system call could cause restore to crash
Extra tasks in freeze cgroup caused dump to fail/hang/crash
Unaligned restore-time object allocations caused lock operations to fail
Opened /proc/pid dir of dead task failed the dump
Unaligned stacks caused criu to fail on aarch64
Changed device numbers on restore side could cause random failures
Fixes in mount points sharing/slavery/propagation restore
Race between mntns creation and fds closing in different tasks could cause restore to fail
Hard kernel limit on TCP repair recv queue restore could cause big queue restore to fail
Unconnected dgram UNIX socket with data lost packets on restore
CRIT didn't show IPC objects
CRIT didn't convert IP addresses in images
Logs from PIE code contained corrupted addresses and sizes
Not loaded netfilter modules could cause dump/restore to get stuck on dumping netlink socket
Shared external mounts were restored with error
Security
User-mode
When checking for namespaces' CRIU entered userns with host creds
Deprecated/removed
Completely removed 'show' action. Use CRIT instead.
CRIU 1.x
At this point the project has proven to be doable and we've
concentrated on fulfilling its functionality, improving stability
and performance.
v. 1.8
New features
Ability to check CRIU features via RPC
New zdtm.py test suite
Pre-dump and pre-restore action scripts
The "info" action in CRIT showing stats about image file
More user-friendly output by CRIT
Python API -- pycriu
Ability to add custom paths to irmap scan
C/R of
read-only bind mounts
IPv6 routes and iptables rules
ip rules (if ip tool supports such)
ignore_routes_with_linkdown netns devconf
empty bridges in netns
FILTER mode of seccomp
IP_FREEBIND socket option
Optimizations/improvements
Shared pie/non-pie .c files are built two times with proper flags
VDSO code re-shuffled for better re-use between arches
Failures of action scripts are reported in logs
OpenVZ's VENET handling is tuned to fit the current kernel state
Do not use hardcoded /dev/rtc maj:min numbers
Unsupported socket protocols are reported at expected place
Slightly faster access to /proc files by using O_PATH open mode
Improved page-server dump speed by keeping control over the Nagle algorithm
Read pages.img in more optimal manner rather than page-by-page
Fewer "Error"-s in logs that don't actually lead to errors
Slightly faster /proc/pid/status parsing
Dead/live-locks on internal criu locks now emit a warning into logs
Fixes
Page server flooded node with tw buckets during migration
Turned off cgroups controllers weren't detected as such
Netns sysctls from old images weren't properly restored
Running process could be mistakenly stopped after --leave-running dump
Helper processes run by CRIU produced fake error messages in logs
Error code from sigaction restore could be missed
Several potential buffer overruns due to missed '\0' after strcpy-s existed
Killed processes after dump survived in zombie state for some time holding PIDs and resources
If task had MANY children, the latter could be skipped on dump
Task dying while being frozen could fail the dump
On Aarch64 the upper limit for user memory was not properly detected sometimes
Guess for TCP buffer max segment size was too optimistic (could fail the restore on low-mem machines)
CRIT didn't decode userns images
Ghost files were left in the FS tree after failed restore (blocking the next restore attempt)
Some log messages from pie code were lost
Some net/ipc/uts sysctls failed to restore in userns
Moving tasks into cgroups failed in userns
Unsupported filesystems silently failed the dump
External tmpfs (and some other) mounts generated tarballs with their contents
Privately mapped files were picked from wrong mount namespace
Controlling tty could be restored on wrong tty end
Tmpfs mount of sub-namespace was restored from wrong image file
Potential stack overflow in libcriu
Partially-restored tasks could be left after failed restore
In-container TCP connection sometimes failed to restore
Race in sending SIGSTOP vs dump might cause dump to fail
Post-restore actions could generate stats files in wrong directories
Freeze-cgroup didn't take sub-cgroups' tasks into account
Tentative state in IPv6 sockets binding prevented socket from being bound immediately
Restoring from images with files pointing to /proc file of dead tasks could crash
Tasks with STOP in queue (i.e. -- not yet stopped) were CONT-ed in case of --leave-running dump
Stopped task with one more STOP in queue caused dump to get stuck
If parent task left the MNT namespace it created for children, restore could BUG()
Link-local IPv6 addresses sometimes failed to bind() at restore
Security
Service run as root could allow users to violate ptrace policies
Service run as root could give users access to privileged files and directories
v. 1.7.2
Fixes
Mounting container root on restore could sometimes switch to wrong root path
The slave/shared option for CT root was lost on restore
Duplicate slave mount points could appear on restore
Fanotifies (inotifies) could be restored in wrong mount namespace (though on correct inode)
Fanotifies (inotifies) on bind-mount-ed tmpfs file could fail the dump
Kernel threads found in tree (OpenVZ containers case) blocked the dump
Flat user namespace (0; \infty) restore failed
Off-by-one in unix socket name handling
IPC objects' UIDs and GIDs were not treated as userns ones
Rcv and Snd buffers for sockets grew 2 times on restore
v. 1.7
New features
More flexible CGroups managing on restore
Support for seccomp strict mode
Support for stream unix sockets inheritance
Support uid/gid-restricted mounts in userns
Support deleted bind-mounts
Freezer cgroups can be used on dump to freeze fast-spawning processes
Ability to specify maximum ghost file size
OverlayFS support
Support relative unix sockets' bind paths
In libcriu
New set of calls using non-global opts
Ability to pass existing connection to service
Ability to start criu in swrk mode for all requests
Arch-specific improvements
Altivec and VSX support for PPC
Small PIE loader
Preparations for 32-bit x86
Optimizations/improvements
Temporary proc mountpoint is mounted with nosuid, noexec and nodev
Less memory copies when preparing restorer binary
CRIT action "show" for less keystrokes on common use-case
Fsnotify log messages now use hex everywhere :)
CRIT output doesn't mix fields any more
Fixes
CRIU binary couldn't be installed independently from man pages
Root dir ignored in install: target
Bug in restoring PPC floating point register
SYSVIPC shmem was not attached on restore with PPC
AIO ring ID was erroneously close()-d
After dump+kill tasks remained in zombie states
Race in zombies vs proc proxy tasks deaths resulted in spurious restore failure
Restore got stuck when CRIU was called with blocked SIGCHLD
Wrong page size value could be used on some ARM compilations
Potential memory corruption when restoring an LSM profile
Opened /dev/kmsg in WRONLY mode failed the restore
Weird paths on tmpfs caused tar to fail
Temporary cgroup mount set (cgyard) got propagated into the host tree
Restore of inherited shared pipe failed
Spaces, tabs and backslashes in mountpoints' paths caused dump to fail
Tmpfs mounted with empty source caused dump to fail
The criu.pc file contained bad version when built from tarball
Deprecated -n option found in docs
On aarch64 the maximum virtual address available for user-space was wrongly hardcoded
v. 1.6.1
New features
Support for relative paths for unix sockets
Fixes
Crash when restoring netns from older images
Race between unix sockets' connect and listen may cause restore to fail
Multiple unix datagram clients restored server queue multiple times
v. 1.6
New features
PowerPC 64bit LE support
Makefile.local for 3rd-party build rules
Ability to "enable" filesystem on dump (--enable-fs)
Ability to skip mountpoint on dump (--skip-mnt)
Prepare to deprecate "criu show" command
External mounts auto-detection
External siblings resolving
External sharing resolving
/dev/tty (current terminal) support
Netdev and netns (all/default) confs C/R
Images v1.1 with extra magic at head
Support fusectl (only ctl) mountpoint
Sub-version format is now as of git-describe
AppArmor labels C/R support
Optimizations
Empty image files are not generated in image dir
/proc/pid/fd/locks support for faster and non-intrusive locks dump
Fixes
Cscope scanned symlinks on make tags
Compilation with clang failed
Improper PAGE_SIZE constant was used on Aarch64
Selinux blocks attempt to inject parasite w/o any reasonable message
Error code masked on some error paths
O_APPEND files' changed size aborted restore
Errno value could be overwritten by logging
Mount namespace w/o /tmp could not be dumped
Stats file generated in wrong dir sometimes
MS_STRICTATIME mountpoint option was dropped on dump
Read-only tmpfs mount failed to restore
Some files were put into wrong places upon install target
Service couldn't be enabled via systemd ctl after manual installation
Parent's /proc/self files could be accessed by criu processes on restore
When meeting unknown image file CRIT exited with exception instead of printing sane error message
v. 1.5.2
Fixes
Multi-threaded tasks restored with error when --restore-sibling (Docker and LXC cases)
Service (and swrk) couldn't receive too big RPC messages
v. 1.5.1
New features
Inheriting FDs now works in "swrk" RPC mode
Restored pid is reported in post-restore RPC notification
Fixes
Uninitialized ss in sigframe causes C/R failures on 4.0 kernel
Cgroups' properties are initialized too late on restore
Cgroups' destruction isn't performed in non detached mode
Cgroups' destruction can fail on error paths
v. 1.5
New features
CRIT tool
Ability to request CPU compatibility on instructions level only
C/R of empty AIO rings
More detailed errno report via RPC
Per-feature "criu check"
Inheriting FDs on restore
Ability to automatically move veth device to host-side bridge on netns restore
VT terminals support
More user namespaces C/R stuff
Optimizations
TCP send queue is restored in the maximal portions allowed by the kernel
Pre-loading sock-diag modules now happens in a more elegant way
Fixes
Multi-threaded tasks on 64bit ARM could segfault upon restore
When doing "check" CRIU could leave un-killed piggie task
The --cpu-cap option argument was parsed with errors
Incorrect handling of --cpu-cap fpu compatibility mode on restore
Criu ignored trailing CLI arguments that resulted in usage confusions
Irmap hints didn't include common "/" path
When run per user request, CRIU left log and pid files belonging to root
Mappings on AUFS could be looked up on wrong mount point
Fixed compilation on Centos6.5
Wrong /proc was used when reading the list of FDs to close on restore
Race in restoring TCP established and listening sockets results in failed bind() on the latter
Legacy ttys were erroneously treated as unix98
TTY pairs slavery setup could pick wrong peer
For user-dump the log and pid files still belonged to root
Task could die while being frozen thus causing dump to fail or save wrong task state
Failures in mount points validation and sharing resolving didn't abort the dump (error arose on restore)
v. 1.4
New features
Dump and check cpuinfo. Needed to make sure the CPU is capable of running the images after restore, e.g. during live migration
Initial support for user namespaces
Use memfd to restore shared memory segments
New (slightly faster) API for mm stuff restore via prctl
[UG]ID-s are dumped from parasite, not from /proc files
The docker_cr.sh script to show how Docker container C/R should (will) look like
New API for writing plugins (old one is still possible)
Service workers change their title to better look in ps output
Ability to feed socket for pre-dump and page-server in swrk mode
Page-server can auto-bind its port
Ability to perform several actions during one connection to RPC service
C/R of opened /proc/$pid/foo files of dead tasks
C/R of /dev/console
C/R of virtualized devtmpfs (openvz and future upstream kernels)
C/R of empty mqueue fs (posix message queues)
C/R of shared bind-mounts
Optimizations
BFD engine
Faster than glibc's FILE * buffered read from /proc files
Buffered image files IO
Faster parasite/restorer unload
Use HW breakpoints
Less ptrace GETREGS calls sometimes
Wake pie after sending the FINI command to socket
Merged some pairs of images into one
eventpoll and -tfd
inotify and -wd
fsnotify and -mark
Less setns()-s on dump is much faster on older kernels
Faster access to /proc/self files -- cached fd of /proc/self and openat(this_cache)
Fixes
Sibling restore mode didn't set up CRIU signals properly
Unpredictable sibling/child root task restore. Fixed with explicit CLI option
Validation for leaf mount points was skipped
Mount options were corrupted on dump, which resulted in errors in bind mounts detection
Uninitialized properties of some cgroups prevented moving tasks into them (e.g. empty cpuset masks and low memcg limit)
File locks that belonged to a task with a different pid (inherited on fork) blocked the dump
Bogus error printed in logs about SIGCHLD catch (was caused by thread dump using traps)
Irmap engine accessed freed root_task on pre-dump
Restore of net namespace could always fail (pid mismatch on fork) if kernel thread was created on netns setup
Cgroups service descriptor was closed too early and failed restore
Auto-loaded *diag modules caused audit netlink socket to contain data on dump (dump fails in this case)
The "(deleted)" prefix accumulated in unlinked files while doing C/R
The devpts filesystem and ptmx file were only dumped when found on /dev/pts and /dev respectively
Data in netlink socket and fanotify was lost after C/R (now dump is aborted if data found in it)
Fanotify mark was restored in different mount namespace
Images were writable by group. Not secure when user-dump was requested
Rootfs has parent id equal to self. CRIU didn't expect this and failed the dump
Shared mount of the --root path failed the restore
Absence (e.g. not compiled in) of any namespace in the kernel failed the dump
Page-server incremental dump didn't detect new tasks properly and failed the stage
Big TCP queues sometimes failed to get restored
Incremental pre-dump could lose track of memory changes by task
v. 1.3.1
Fixes
Sibling restore mode didn't set up CRIU signals properly
Unpredictable sibling/child root task restore. Fixed with explicit CLI option
Validation for leaf mount points was skipped
Mount options were corrupted on dump, which resulted in errors in bind mounts detection
v. 1.3
New features
TimerFD support
VVAR area (newer kernels' part of VDSO) support
CGroups hierarchies support
AUFS support (for Docker)
PDeathSig support
Check that opened file's size on dump and restore is the same
Ability to restore tasks as children using libcriu (criu_restore_child)
Add pkgconfig file for libcriu
CRTOOLS_IMAGE_DIR variable available in action scripts
Optimizations
Merged images with pending signal into core
Per-task images with file locks are merged into one big image
Smaller tasks orchestration memory area on restore
Sigactions are inherited on restore when possible, not overwritten
ZDTM suite now executes tests in parallel
Fixes
Dump failed if robust lists were off
Link remaps on tmpfs mounts were not dumped
Non-root tasks with custom groups couldn't dump their peers (Security)
Opened and unlinked FIFOs, dirs and devices were restored as regular files
Files opened from alien mount namespace were restored in the local one
Link remap name sometimes was generated with error
Opened and removed cwd couldn't be restored
Sysctl kernel.msgmni was overwritten by subsequent auto_msgmni
Library and RPC APIs didn't match the CLI one
Some external mounts were constantly "postponed" and never got mounted
The self.mm_dumpable prctl value of 2 caused restore to fail
Errors when writing sysctls with tail \n
The criu show printed nested repeated fields corrupted
Dump stats were initialized with garbage
Restore sometimes stuck on waiting for inet socket port bind
Spurious SIGHUP when restoring slave ttys
Restore wasn't aborted if sub-task failed early
v. 1.3-rc2
New features
Native (w/o plugins) c/r of external bind mounts
C/R of the info in which cgroups tasks live
C/R of task's dumpable flag
Dump pstore, securityfs, fusectl and debugfs mountpoints
Fixes
VDSO was searched on stack's guard page
Mount namespace w/o /proc mount blocked the restore
Several misses in searching for COW VMA resulted in sub-optimal pages sharing on restore
FIFO-s path was restored in wrong mount namespace
Mountpoint fsnotify could be restored on a bind-mount
One tmpfs mounted several times was dumped several times
Bind-mount's root path of the top mount was calculated with error
Fix device number calculation out of major:minor on some distros
Devpts mount options got lost on dump
Page-pipes grew endlessly resulting in dump failures on big VMAs
CRIU tried to dump IO and PF mappings
Two merged MAP_GROWSDOWN VMAs got dumped with overlapping guard page
Too small shared area was used to fetch tasks mappings that resulted in failed dump of huge mappings
Many fixes in build system
Zdtm's COW test sometimes ignored COW failures
v. 1.3-rc1
New features
AArch64
Multiple mount namespaces
FPU state restore control
Restore old FPU state on newer CPUs
Ability to ignore FPU restoration
Support stopped multi-threaded tasks
CRIU now can execv() other binary right after restore is complete
Inode-reverse mapping can be enforced to allow live-migration with FS copying
Gold linker can now be used to compile CRIU
"Berserker" test to check CRIU scalability
Punch pages from mem images on restore (optimizes live-migration)
Optimizations
Batched deduplication of memory images
Packed rlimits into core image
Packed timers into core image
Fixes
Bad checks for kcmp() ret codes resulted in errors in file sharing detection
Multiple mmaps of same files with different flags blocked the restore
Integer overflow in huge mapping restore caused restoration failure
devpts's newinstance option was lost during dump
Subsequent dump could try to find old mem dump for newly forked task
Bad detection of overmounted mountpoints on fsnotify restore
Page-server could read partial message and failed
Errors in dumping of two subsequent anon VMAs in some cases
Irmap mis-compared devices for disk FSs
TMPFS handles always change during dump/restore
Pre-dump sometimes hangs on FIFOs
Post-restore script fails too late (if it does)
v. 1.2
New features
Performance improvements
Shared entries in reg-files image
Fewer accesses to /proc/$pid/map_files links
Cache for /proc/$pid/pagemap reads
VDSO page is looked up only in anonymous mappings
Task's auxv is read in one call
Merged mm and vma image files for better packing
NFS inodes' path resolution (for fsnotify) cache
One readlink() call when checking anon inodes
Don't dump kernel's zero-page
Parse fast /proc/self/maps when searching for hole for restorer
A bit faster write into image files with writev()
Library versioning
RPC API got closer to CLI
New "post-restore" call in action scripts
Logrotate rules file
Default log file for service when starting via systemd
Fixes
A lot for ARM cross-compile
Fsnotifies dumping didn't work on NFS
Images auto-deduplication only worked one level up
Packet socket ID was treated as file-descriptor and close()-d
Badly counted pages stats on restore
Linked remap name conflict when dump and restore on NFS
Sporadic failures in memory draining due to huge pipes used
Broken criu show of repeated fields
Failure to open mountpoint in foreign pid namespace
Unlinked bound unix socket dump error
Small memory leak when writing to incremental image(s)
Restoring fsnotify for links results in ELOOP
Host's PATH is not suitable when execv-ing tar/ip/iptable to restore namespace (workaround, proper fix will be in 1.3)
Using subdirs in log file name via RPC breaks security
v. 1.1
Fixes
Errors from memory dumping are not handled resulting in corrupted dumps
EOF detection in stacked images is done with error
Stacked images don't work on non-shared FS (missing pagemap-s)
v. 1.1-rc2
Fixes
Crash in criu check
RPC check always fails on 3.11 kernel
Failed fork() didn't abort restore
Dump fail not reported via RPC
RPC client disconnect wasn't handled
Page server could connect to self for writing images
Hang on pre-dumping task living in net-namespace
VDSO page mis-handle on pre-dump
FPU state loss on pre-dump
Memory tracking turns ON w/o request
Various fixes (and improvements) in build system
v. 1.1-rc1
New features
libcriu.so -- wrapper library for RPC clients
Plugins
External unix sockets
External bind mounts
External net devices
Unknown file types
Images deduplication in incremental dumps
Integration with systemd
Filtering of criu show output
Note: The API defined in the first two items above may change after -rc1
Fixes
Errors in unlinked files/sockets detection on BTRFS
NFS silly-rename files are not treated as unlinked
Freezer fails to seize quickly forking/pthread_create-ing tasks
Extra stop signal queued for stopped tasks after pre-dump
Wrong dying task state detection
Lost RPC dump response
Crash when reporting restore error via RPC
Negative return code into shell
Tasks left in wrong states after failed dump
A little bit more verbose check action
Coverity checks fail here and there
v. 1.0
Fixes
After --leave-running linked remaps were not cleaned
TCP was left locked after --leave-running
Weak criteria in memory COW detection
Private mapping's premmapped address overwrote file ID
Restorer memory could overlap with timers/signals arrays
RPC worker reused options from service task
Suboptimal memory utilization by restorer arguments
TCP unsent/unacked data boundary was lost
Wrong dev_t decoding on 64 bit
Unpredictable daemons (service and page-service) working dir
Parasite stack could be corrupted by its arguments
Error from exe link restore was ignored
Artificial small limit on the number of shared memory segments to restore
Bug in ARM VFP restore
VDSO proxy was unmapped at the very end of restore
New features
-W option to specify working dir
CHECK request in RPC
Optimized headers
More info in logs about undumpable files
More comments about tricky dump/restore places
Generic memory allocation for restorer
Proof-of-concept stage