Written here are performance issues found.
Timing stats of live migration of a small container with 11 tasks is
- Total time ~3.5 seconds
- Frozen time ~3.0 seconds
- Pre-dump stages ~0.5 seconds each
- Restore time ~1.9 seconds
- Images transfer time ~0.3 seconds
Below is the list of issues found
Images IO
Too many write-s and read-s. Each small object issues a write/read request into file. Need to improve.
Dump
Surprisingly, but the mem-drain time is not the biggest. It's "only" ~0.02 seconds. There are places in code that take longer.
parse_smaps
Time spent in this routine is up to 0.2 seconds on dump. This one exploits /proc heavily. For a container with 11 tasks the syscall stats look like
834 read 1451 fstat 1462 close 1642 openat
while opens and stats happen on
193 openat(4, "map-symlink", O_RDONLY) = -1 ENOENT (No such file or directory)
1438 openat(4, "map-symlink", O_RDONLY) = 5
11 openat(AT_FDCWD, "/proc/$pid/map_files", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4
setns
Starting from the 2nd call, setns() calls synchronize_rcu in kernel. Result -- 4 setns are 6% of 4 tests dump time.
fopen/fgets/fclose
On some (all?) glibcs when doing fopen, fgets, fclose glibc mmap-s and then immediatelly unmaps one anon page. The thing is that for a tree of ~20 tasks these mmaps/munmaps take up to 8% of time and most likely work on the same virtual address. Knowing how kernel works, I can say that each access to this virtual address results in fresh page allocation. Keeping this page "in cache" might save some time.
Restore
Fork vs VMA restore
We restore task's mappings before it goes forking to handle COW. This effectively serializes forking.
Restoring VMAs
There are 4 stages in VMA restore. Relative times of each are below
- Reading images 1%
- Mapping huuge premap area << 1%
- (Re-)mapping sub-areas 73%
- Filling area with data 26%
The 3rd stage has two parts. With timings:
- Opening filemap fd 85%
- Maping vma 15%
Parse maps instead of smaps
When parsing self maps for restorer hint, it's enough to read smaller /proc/self/maps
Perf stats
Here's how perf tree looks like
10.69% criu [.] pb_write_one
|
--- pb_write_one
|
|--18.74%-- dump_task_rlims
| dump_one_task
| cr_dump_tasks
| main
| __libc_start_main
|
|--18.63%-- dump_task_mappings
| dump_one_task
| cr_dump_tasks
| main
| __libc_start_main
|
|--18.51%-- parasite_dump_sigacts_seized
| dump_one_task
| cr_dump_tasks
| main
| __libc_start_main
|
|--9.51%-- dump_one_task
| cr_dump_tasks
| main
| __libc_start_main
|
|--9.30%-- dump_one_reg_file
| dump_task_exe_link
| dump_task_mm
| dump_one_task
| cr_dump_tasks
| main
| __libc_start_main
|
|--9.20%-- dump_task_mm
| dump_one_task
| cr_dump_tasks
| main
| __libc_start_main
|
|--8.57%-- dump_task_fs
| dump_one_task
| cr_dump_tasks
| main
| __libc_start_main
|
--7.54%-- write_pagemap_loc
page_xfer_dump_pages
__parasite_dump_pages_seized
parasite_dump_pages_seized
dump_one_task
cr_dump_tasks
main
__libc_start_main
8.86% criu [.] generate_iovs
|
--- generate_iovs
__parasite_dump_pages_seized
parasite_dump_pages_seized
dump_one_task
cr_dump_tasks
main
__libc_start_main
8.59% criu [.] should_dump_page
|
--- should_dump_page
|
|--81.22%-- generate_iovs
| __parasite_dump_pages_seized
| parasite_dump_pages_seized
| dump_one_task
| cr_dump_tasks
| main
| __libc_start_main
|
--18.78%-- __parasite_dump_pages_seized
parasite_dump_pages_seized
dump_one_task
cr_dump_tasks
main
__libc_start_main
5.01% criu [.] is_vma_range_fmt
|
--- is_vma_range_fmt
|
|--80.15%-- parse_smaps
| collect_mappings
| dump_one_task
| cr_dump_tasks
| main
| __libc_start_main
|
--19.85%-- collect_mappings
dump_one_task
cr_dump_tasks
main
__libc_start_main
3.04% criu [.] print_on_level
|
--- print_on_level
|
|--33.73%-- parasite_wait_ack
| __parasite_wait_daemon_ack
| parasite_execute_daemon
| parasite_fixup_vdso
| dump_one_task
| cr_dump_tasks
| main
| __libc_start_main
|
|--33.28%-- parasite_start_daemon
| parasite_infect_seized
| dump_one_task
| cr_dump_tasks
| main
| __libc_start_main
|
--32.99%-- __parasite_wait_daemon_ack
parasite_drain_fds_seized
dump_task_files_seized
dump_one_task
cr_dump_tasks
main
__libc_start_main
3.03% criu [.] open_image_at
|
--- open_image_at
|
|--66.87%-- cr_fdset_open_range
| cr_task_fdset_open
| dump_one_task
| cr_dump_tasks
| main
| __libc_start_main
|
--33.13%-- dump_task_files_seized
dump_one_task
cr_dump_tasks
main
__libc_start_main
3.00% criu [.] parse_vmflags
|
--- parse_vmflags
|
|--66.59%-- parse_smaps
| collect_mappings
| dump_one_task
| cr_dump_tasks
| main
| __libc_start_main
|
--33.41%-- collect_mappings
dump_one_task
cr_dump_tasks
main
__libc_start_main
3.00% criu [.] vma_opt_str
|
--- vma_opt_str
|
|--66.31%-- pr_vma
| |
| |--50.06%-- dump_task_mappings
| | dump_one_task
| | cr_dump_tasks
| | main
| | __libc_start_main
| |
| --49.94%-- collect_mappings
| dump_one_task
| cr_dump_tasks
| main
| __libc_start_main
|
--33.69%-- collect_mappings
dump_one_task
cr_dump_tasks
main
__libc_start_main
2.19% criu [.] do_open_proc
|
--- do_open_proc
|
|--36.18%-- parse_pid_stat_small
| seize_task
| get_children
| collect_task
| collect_pstree
| cr_dump_tasks
| main
| __libc_start_main
|
|--31.99%-- parse_threads
| collect_threads
| collect_task
| get_children
| collect_task
| collect_pstree
| cr_dump_tasks
| main
| __libc_start_main
|
--31.84%-- seize_task
get_children
collect_task
collect_pstree
cr_dump_tasks
main
__libc_start_main
2.03% criu [.] parse_smaps
|
--- parse_smaps
collect_mappings
dump_one_task
cr_dump_tasks
main
__libc_start_main
2.02% criu [.] try_add_page_to
|
--- try_add_page_to
try_add_page
page_pipe_add_page
generate_iovs
__parasite_dump_pages_seized
parasite_dump_pages_seized
dump_one_task
cr_dump_tasks
main
__libc_start_main
2.02% criu [.] ptrace_peek_area
|
--- ptrace_peek_area
ptrace_swap_area
__parasite_execute_syscall
syscall_seized
mmap_seized
parasite_map_exchange
parasite_infect_seized
dump_one_task
cr_dump_tasks
main
__libc_start_main
2.01% criu [.] collect_mappings
|
--- collect_mappings
dump_one_task
cr_dump_tasks
main
__libc_start_main
2.01% criu [.] __parasite_dump_pages_seized
|
--- __parasite_dump_pages_seized
parasite_dump_pages_seized
dump_one_task
cr_dump_tasks
main
__libc_start_main
2.00% criu [.] get_task_regs
|
--- get_task_regs
parasite_start_daemon
parasite_infect_seized
dump_one_task
cr_dump_tasks
main
__libc_start_main
2.00% criu [.] parasite_args_s
|
--- parasite_args_s
|
|--50.05%-- parasite_dump_pages_seized
| dump_one_task
| cr_dump_tasks
| main
| __libc_start_main
|
--49.95%-- dump_one_task
cr_dump_tasks
main
__libc_start_main
2.00% criu [.] parasite_stop_on_syscall
|
--- parasite_stop_on_syscall
parasite_unmap
parasite_cure_remote
parasite_cure_seized
dump_one_task
cr_dump_tasks
main
__libc_start_main
1.97% criu [.] open_pid_proc
|
--- open_pid_proc
|
|--51.54%-- parasite_fixup_vdso
| dump_one_task
| cr_dump_tasks
| main
| __libc_start_main
|
--48.46%-- do_open_proc
dump_task_fs
dump_one_task
cr_dump_tasks
main
__libc_start_main
1.55% criu [.] dump_one_file
|
--- dump_one_file
dump_task_files_seized
dump_one_task
cr_dump_tasks
main
__libc_start_main
1.13% criu [.] alloc_cr_fdset
|
--- alloc_cr_fdset
cr_task_fdset_open
dump_one_task
cr_dump_tasks
main
__libc_start_main
1.03% criu [.] unseize_task_and_threads
|
--- unseize_task_and_threads
pstree_switch_state
cr_dump_tasks
main
__libc_start_main
1.02% criu [.] timeval_accumulate
|
--- timeval_accumulate
timing_stop
__parasite_dump_pages_seized
parasite_dump_pages_seized
dump_one_task
cr_dump_tasks
main
__libc_start_main
1.01% criu [.] parasite_dump_misc_seized
|
--- parasite_dump_misc_seized
dump_one_task
cr_dump_tasks
main
__libc_start_main
1.01% criu [.] parse_posix_timers
|
--- parse_posix_timers
dump_one_task
cr_dump_tasks
main
__libc_start_main
1.01% criu [.] INIT_LIST_HEAD
|
--- INIT_LIST_HEAD
__parasite_dump_pages_seized
parasite_dump_pages_seized
dump_one_task
cr_dump_tasks
main
__libc_start_main