Performance research
Written here are performance issues found.
Timing stats of live migration of a small container with 11 tasks is
- Total time ~3.5 seconds
- Frozen time ~3.0 seconds
- Pre-dump stages ~0.5 seconds each
- Restore time ~1.9 seconds
- Images transfer time ~0.3 seconds
Below is the list of issues found
Images IO
Too many write-s and read-s. Each small object issues a write/read request into file. Need to improve.
Dump
Surprisingly, but the mem-drain time is not the biggest. It's "only" ~0.02 seconds. There are places in code that take longer.
parse_smaps
Time spent in this routine is up to 0.2 seconds on dump. This one exploits /proc heavily. For a container with 11 tasks the syscall stats look like
834 read 1451 fstat 1462 close 1642 openat
while opens and stats happen on
193 openat(4, "map-symlink", O_RDONLY) = -1 ENOENT (No such file or directory) 1438 openat(4, "map-symlink", O_RDONLY) = 5 11 openat(AT_FDCWD, "/proc/$pid/map_files", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4
setns
Starting from the 2nd call, setns() calls synchronize_rcu in kernel. Result -- 4 setns are 6% of 4 tests dump time.
fopen/fgets/fclose
On some (all?) glibcs when doing fopen, fgets, fclose glibc mmap-s and then immediatelly unmaps one anon page. The thing is that for a tree of ~20 tasks these mmaps/munmaps take up to 8% of time and most likely work on the same virtual address. Knowing how kernel works, I can say that each access to this virtual address results in fresh page allocation. Keeping this page "in cache" might save some time.
Restore
Fork vs VMA restore
We restore task's mappings before it goes forking to handle COW. This effectively serializes forking.
Restoring VMAs
There are 4 stages in VMA restore. Relative times of each are below
- Reading images 1%
- Mapping huuge premap area << 1%
- (Re-)mapping sub-areas 73%
- Filling area with data 26%
The 3rd stage has two parts. With timings:
- Opening filemap fd 85%
- Maping vma 15%
Parse maps instead of smaps
When parsing self maps for restorer hint, it's enough to read smaller /proc/self/maps
Perf stats
Here's how perf tree looks like
10.69% criu [.] pb_write_one | --- pb_write_one | |--18.74%-- dump_task_rlims | dump_one_task | cr_dump_tasks | main | __libc_start_main | |--18.63%-- dump_task_mappings | dump_one_task | cr_dump_tasks | main | __libc_start_main | |--18.51%-- parasite_dump_sigacts_seized | dump_one_task | cr_dump_tasks | main | __libc_start_main | |--9.51%-- dump_one_task | cr_dump_tasks | main | __libc_start_main | |--9.30%-- dump_one_reg_file | dump_task_exe_link | dump_task_mm | dump_one_task | cr_dump_tasks | main | __libc_start_main | |--9.20%-- dump_task_mm | dump_one_task | cr_dump_tasks | main | __libc_start_main | |--8.57%-- dump_task_fs | dump_one_task | cr_dump_tasks | main | __libc_start_main | --7.54%-- write_pagemap_loc page_xfer_dump_pages __parasite_dump_pages_seized parasite_dump_pages_seized dump_one_task cr_dump_tasks main __libc_start_main
8.86% criu [.] generate_iovs | --- generate_iovs __parasite_dump_pages_seized parasite_dump_pages_seized dump_one_task cr_dump_tasks main __libc_start_main
8.59% criu [.] should_dump_page | --- should_dump_page | |--81.22%-- generate_iovs | __parasite_dump_pages_seized | parasite_dump_pages_seized | dump_one_task | cr_dump_tasks | main | __libc_start_main | --18.78%-- __parasite_dump_pages_seized parasite_dump_pages_seized dump_one_task cr_dump_tasks main __libc_start_main
5.01% criu [.] is_vma_range_fmt | --- is_vma_range_fmt | |--80.15%-- parse_smaps | collect_mappings | dump_one_task | cr_dump_tasks | main | __libc_start_main | --19.85%-- collect_mappings dump_one_task cr_dump_tasks main __libc_start_main
3.04% criu [.] print_on_level | --- print_on_level | |--33.73%-- parasite_wait_ack | __parasite_wait_daemon_ack | parasite_execute_daemon | parasite_fixup_vdso | dump_one_task | cr_dump_tasks | main | __libc_start_main | |--33.28%-- parasite_start_daemon | parasite_infect_seized | dump_one_task | cr_dump_tasks | main | __libc_start_main | --32.99%-- __parasite_wait_daemon_ack parasite_drain_fds_seized dump_task_files_seized dump_one_task cr_dump_tasks main __libc_start_main
3.03% criu [.] open_image_at | --- open_image_at | |--66.87%-- cr_fdset_open_range | cr_task_fdset_open | dump_one_task | cr_dump_tasks | main | __libc_start_main | --33.13%-- dump_task_files_seized dump_one_task cr_dump_tasks main __libc_start_main
3.00% criu [.] parse_vmflags | --- parse_vmflags | |--66.59%-- parse_smaps | collect_mappings | dump_one_task | cr_dump_tasks | main | __libc_start_main | --33.41%-- collect_mappings dump_one_task cr_dump_tasks main __libc_start_main
3.00% criu [.] vma_opt_str | --- vma_opt_str | |--66.31%-- pr_vma | | | |--50.06%-- dump_task_mappings | | dump_one_task | | cr_dump_tasks | | main | | __libc_start_main | | | --49.94%-- collect_mappings | dump_one_task | cr_dump_tasks | main | __libc_start_main | --33.69%-- collect_mappings dump_one_task cr_dump_tasks main __libc_start_main
2.19% criu [.] do_open_proc | --- do_open_proc | |--36.18%-- parse_pid_stat_small | seize_task | get_children | collect_task | collect_pstree | cr_dump_tasks | main | __libc_start_main | |--31.99%-- parse_threads | collect_threads | collect_task | get_children | collect_task | collect_pstree | cr_dump_tasks | main | __libc_start_main | --31.84%-- seize_task get_children collect_task collect_pstree cr_dump_tasks main __libc_start_main
2.03% criu [.] parse_smaps | --- parse_smaps collect_mappings dump_one_task cr_dump_tasks main __libc_start_main
2.02% criu [.] try_add_page_to | --- try_add_page_to try_add_page page_pipe_add_page generate_iovs __parasite_dump_pages_seized parasite_dump_pages_seized dump_one_task cr_dump_tasks main __libc_start_main
2.02% criu [.] ptrace_peek_area | --- ptrace_peek_area ptrace_swap_area __parasite_execute_syscall syscall_seized mmap_seized parasite_map_exchange parasite_infect_seized dump_one_task cr_dump_tasks main __libc_start_main
2.01% criu [.] collect_mappings | --- collect_mappings dump_one_task cr_dump_tasks main __libc_start_main
2.01% criu [.] __parasite_dump_pages_seized | --- __parasite_dump_pages_seized parasite_dump_pages_seized dump_one_task cr_dump_tasks main __libc_start_main
2.00% criu [.] get_task_regs | --- get_task_regs parasite_start_daemon parasite_infect_seized dump_one_task cr_dump_tasks main __libc_start_main
2.00% criu [.] parasite_args_s | --- parasite_args_s | |--50.05%-- parasite_dump_pages_seized | dump_one_task | cr_dump_tasks | main | __libc_start_main | --49.95%-- dump_one_task cr_dump_tasks main __libc_start_main
2.00% criu [.] parasite_stop_on_syscall | --- parasite_stop_on_syscall parasite_unmap parasite_cure_remote parasite_cure_seized dump_one_task cr_dump_tasks main __libc_start_main
1.97% criu [.] open_pid_proc | --- open_pid_proc | |--51.54%-- parasite_fixup_vdso | dump_one_task | cr_dump_tasks | main | __libc_start_main | --48.46%-- do_open_proc dump_task_fs dump_one_task cr_dump_tasks main __libc_start_main
1.55% criu [.] dump_one_file | --- dump_one_file dump_task_files_seized dump_one_task cr_dump_tasks main __libc_start_main
1.13% criu [.] alloc_cr_fdset | --- alloc_cr_fdset cr_task_fdset_open dump_one_task cr_dump_tasks main __libc_start_main
1.03% criu [.] unseize_task_and_threads | --- unseize_task_and_threads pstree_switch_state cr_dump_tasks main __libc_start_main
1.02% criu [.] timeval_accumulate | --- timeval_accumulate timing_stop __parasite_dump_pages_seized parasite_dump_pages_seized dump_one_task cr_dump_tasks main __libc_start_main
1.01% criu [.] parasite_dump_misc_seized | --- parasite_dump_misc_seized dump_one_task cr_dump_tasks main __libc_start_main
1.01% criu [.] parse_posix_timers | --- parse_posix_timers dump_one_task cr_dump_tasks main __libc_start_main
1.01% criu [.] INIT_LIST_HEAD | --- INIT_LIST_HEAD __parasite_dump_pages_seized parasite_dump_pages_seized dump_one_task cr_dump_tasks main __libc_start_main