|
|
(4 intermediate revisions by one other user not shown) |
Line 10: |
Line 10: |
| | | |
| Below is the list of issues found | | Below is the list of issues found |
− |
| |
− | == Images IO ==
| |
− |
| |
− | Too many write-s and read-s. Each small object issues a write/read request into file. Need to improve.
| |
| | | |
| == Dump == | | == Dump == |
Line 33: |
Line 29: |
| 1438 openat(4, "map-symlink", O_RDONLY) = 5 | | 1438 openat(4, "map-symlink", O_RDONLY) = 5 |
| 11 openat(AT_FDCWD, "/proc/$pid/map_files", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4 | | 11 openat(AT_FDCWD, "/proc/$pid/map_files", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4 |
− |
| |
− | === setns ===
| |
− |
| |
− | Starting from the 2nd call, setns() calls synchronize_rcu in kernel. Result -- 4 setns are 6% of 4 tests dump time.
| |
− |
| |
− | === fopen/fgets/fclose ===
| |
− |
| |
− | On some (all?) glibcs when doing fopen, fgets, fclose glibc mmap-s and then immediatelly unmaps one anon page. The thing is that for a tree of ~20 tasks these mmaps/munmaps take up to 8% of time and most likely work on the same virtual address. Knowing how kernel works, I can say that each access to this virtual address results in fresh page allocation. Keeping this page "in cache" might save some time.
| |
− |
| |
| | | |
| == Restore == | | == Restore == |
Line 54: |
Line 41: |
| | | |
| * Reading images 1% | | * Reading images 1% |
− | * Mapping huuge premap area << 1% | + | * Mapping huge premap area << 1% |
| * (Re-)mapping sub-areas 73% | | * (Re-)mapping sub-areas 73% |
| * Filling area with data 26% | | * Filling area with data 26% |
Line 63: |
Line 50: |
| * Maping vma 15% | | * Maping vma 15% |
| | | |
− | === Parse maps instead of smaps ===
| |
− |
| |
− | When parsing self maps for restorer hint, it's enough to read smaller /proc/self/maps
| |
− |
| |
− | == Perf stats ==
| |
− |
| |
− | Here's how perf tree looks like
| |
− |
| |
− | 10.69% criu [.] pb_write_one
| |
− | |
| |
− | --- pb_write_one
| |
− | |
| |
− | |--18.74%-- dump_task_rlims
| |
− | | dump_one_task
| |
− | | cr_dump_tasks
| |
− | | main
| |
− | | __libc_start_main
| |
− | |
| |
− | |--18.63%-- dump_task_mappings
| |
− | | dump_one_task
| |
− | | cr_dump_tasks
| |
− | | main
| |
− | | __libc_start_main
| |
− | |
| |
− | |--18.51%-- parasite_dump_sigacts_seized
| |
− | | dump_one_task
| |
− | | cr_dump_tasks
| |
− | | main
| |
− | | __libc_start_main
| |
− | |
| |
− | |--9.51%-- dump_one_task
| |
− | | cr_dump_tasks
| |
− | | main
| |
− | | __libc_start_main
| |
− | |
| |
− | |--9.30%-- dump_one_reg_file
| |
− | | dump_task_exe_link
| |
− | | dump_task_mm
| |
− | | dump_one_task
| |
− | | cr_dump_tasks
| |
− | | main
| |
− | | __libc_start_main
| |
− | |
| |
− | |--9.20%-- dump_task_mm
| |
− | | dump_one_task
| |
− | | cr_dump_tasks
| |
− | | main
| |
− | | __libc_start_main
| |
− | |
| |
− | |--8.57%-- dump_task_fs
| |
− | | dump_one_task
| |
− | | cr_dump_tasks
| |
− | | main
| |
− | | __libc_start_main
| |
− | |
| |
− | --7.54%-- write_pagemap_loc
| |
− | page_xfer_dump_pages
| |
− | __parasite_dump_pages_seized
| |
− | parasite_dump_pages_seized
| |
− | dump_one_task
| |
− | cr_dump_tasks
| |
− | main
| |
− | __libc_start_main
| |
− |
| |
− | 8.86% criu [.] generate_iovs
| |
− | |
| |
− | --- generate_iovs
| |
− | __parasite_dump_pages_seized
| |
− | parasite_dump_pages_seized
| |
− | dump_one_task
| |
− | cr_dump_tasks
| |
− | main
| |
− | __libc_start_main
| |
− |
| |
− | 8.59% criu [.] should_dump_page
| |
− | |
| |
− | --- should_dump_page
| |
− | |
| |
− | |--81.22%-- generate_iovs
| |
− | | __parasite_dump_pages_seized
| |
− | | parasite_dump_pages_seized
| |
− | | dump_one_task
| |
− | | cr_dump_tasks
| |
− | | main
| |
− | | __libc_start_main
| |
− | |
| |
− | --18.78%-- __parasite_dump_pages_seized
| |
− | parasite_dump_pages_seized
| |
− | dump_one_task
| |
− | cr_dump_tasks
| |
− | main
| |
− | __libc_start_main
| |
− |
| |
− | 5.01% criu [.] is_vma_range_fmt
| |
− | |
| |
− | --- is_vma_range_fmt
| |
− | |
| |
− | |--80.15%-- parse_smaps
| |
− | | collect_mappings
| |
− | | dump_one_task
| |
− | | cr_dump_tasks
| |
− | | main
| |
− | | __libc_start_main
| |
− | |
| |
− | --19.85%-- collect_mappings
| |
− | dump_one_task
| |
− | cr_dump_tasks
| |
− | main
| |
− | __libc_start_main
| |
− |
| |
− | 3.04% criu [.] print_on_level
| |
− | |
| |
− | --- print_on_level
| |
− | |
| |
− | |--33.73%-- parasite_wait_ack
| |
− | | __parasite_wait_daemon_ack
| |
− | | parasite_execute_daemon
| |
− | | parasite_fixup_vdso
| |
− | | dump_one_task
| |
− | | cr_dump_tasks
| |
− | | main
| |
− | | __libc_start_main
| |
− | |
| |
− | |--33.28%-- parasite_start_daemon
| |
− | | parasite_infect_seized
| |
− | | dump_one_task
| |
− | | cr_dump_tasks
| |
− | | main
| |
− | | __libc_start_main
| |
− | |
| |
− | --32.99%-- __parasite_wait_daemon_ack
| |
− | parasite_drain_fds_seized
| |
− | dump_task_files_seized
| |
− | dump_one_task
| |
− | cr_dump_tasks
| |
− | main
| |
− | __libc_start_main
| |
− |
| |
− | 3.03% criu [.] open_image_at
| |
− | |
| |
− | --- open_image_at
| |
− | |
| |
− | |--66.87%-- cr_fdset_open_range
| |
− | | cr_task_fdset_open
| |
− | | dump_one_task
| |
− | | cr_dump_tasks
| |
− | | main
| |
− | | __libc_start_main
| |
− | |
| |
− | --33.13%-- dump_task_files_seized
| |
− | dump_one_task
| |
− | cr_dump_tasks
| |
− | main
| |
− | __libc_start_main
| |
− |
| |
− | 3.00% criu [.] parse_vmflags
| |
− | |
| |
− | --- parse_vmflags
| |
− | |
| |
− | |--66.59%-- parse_smaps
| |
− | | collect_mappings
| |
− | | dump_one_task
| |
− | | cr_dump_tasks
| |
− | | main
| |
− | | __libc_start_main
| |
− | |
| |
− | --33.41%-- collect_mappings
| |
− | dump_one_task
| |
− | cr_dump_tasks
| |
− | main
| |
− | __libc_start_main
| |
− |
| |
− | 3.00% criu [.] vma_opt_str
| |
− | |
| |
− | --- vma_opt_str
| |
− | |
| |
− | |--66.31%-- pr_vma
| |
− | | |
| |
− | | |--50.06%-- dump_task_mappings
| |
− | | | dump_one_task
| |
− | | | cr_dump_tasks
| |
− | | | main
| |
− | | | __libc_start_main
| |
− | | |
| |
− | | --49.94%-- collect_mappings
| |
− | | dump_one_task
| |
− | | cr_dump_tasks
| |
− | | main
| |
− | | __libc_start_main
| |
− | |
| |
− | --33.69%-- collect_mappings
| |
− | dump_one_task
| |
− | cr_dump_tasks
| |
− | main
| |
− | __libc_start_main
| |
− |
| |
− | 2.19% criu [.] do_open_proc
| |
− | |
| |
− | --- do_open_proc
| |
− | |
| |
− | |--36.18%-- parse_pid_stat_small
| |
− | | seize_task
| |
− | | get_children
| |
− | | collect_task
| |
− | | collect_pstree
| |
− | | cr_dump_tasks
| |
− | | main
| |
− | | __libc_start_main
| |
− | |
| |
− | |--31.99%-- parse_threads
| |
− | | collect_threads
| |
− | | collect_task
| |
− | | get_children
| |
− | | collect_task
| |
− | | collect_pstree
| |
− | | cr_dump_tasks
| |
− | | main
| |
− | | __libc_start_main
| |
− | |
| |
− | --31.84%-- seize_task
| |
− | get_children
| |
− | collect_task
| |
− | collect_pstree
| |
− | cr_dump_tasks
| |
− | main
| |
− | __libc_start_main
| |
− |
| |
− | 2.03% criu [.] parse_smaps
| |
− | |
| |
− | --- parse_smaps
| |
− | collect_mappings
| |
− | dump_one_task
| |
− | cr_dump_tasks
| |
− | main
| |
− | __libc_start_main
| |
− |
| |
− | 2.02% criu [.] try_add_page_to
| |
− | |
| |
− | --- try_add_page_to
| |
− | try_add_page
| |
− | page_pipe_add_page
| |
− | generate_iovs
| |
− | __parasite_dump_pages_seized
| |
− | parasite_dump_pages_seized
| |
− | dump_one_task
| |
− | cr_dump_tasks
| |
− | main
| |
− | __libc_start_main
| |
− |
| |
− | 2.02% criu [.] ptrace_peek_area
| |
− | |
| |
− | --- ptrace_peek_area
| |
− | ptrace_swap_area
| |
− | __parasite_execute_syscall
| |
− | syscall_seized
| |
− | mmap_seized
| |
− | parasite_map_exchange
| |
− | parasite_infect_seized
| |
− | dump_one_task
| |
− | cr_dump_tasks
| |
− | main
| |
− | __libc_start_main
| |
− |
| |
− | 2.01% criu [.] collect_mappings
| |
− | |
| |
− | --- collect_mappings
| |
− | dump_one_task
| |
− | cr_dump_tasks
| |
− | main
| |
− | __libc_start_main
| |
− |
| |
− | 2.01% criu [.] __parasite_dump_pages_seized
| |
− | |
| |
− | --- __parasite_dump_pages_seized
| |
− | parasite_dump_pages_seized
| |
− | dump_one_task
| |
− | cr_dump_tasks
| |
− | main
| |
− | __libc_start_main
| |
− |
| |
− | 2.00% criu [.] get_task_regs
| |
− | |
| |
− | --- get_task_regs
| |
− | parasite_start_daemon
| |
− | parasite_infect_seized
| |
− | dump_one_task
| |
− | cr_dump_tasks
| |
− | main
| |
− | __libc_start_main
| |
− |
| |
− | 2.00% criu [.] parasite_args_s
| |
− | |
| |
− | --- parasite_args_s
| |
− | |
| |
− | |--50.05%-- parasite_dump_pages_seized
| |
− | | dump_one_task
| |
− | | cr_dump_tasks
| |
− | | main
| |
− | | __libc_start_main
| |
− | |
| |
− | --49.95%-- dump_one_task
| |
− | cr_dump_tasks
| |
− | main
| |
− | __libc_start_main
| |
− |
| |
− | 2.00% criu [.] parasite_stop_on_syscall
| |
− | |
| |
− | --- parasite_stop_on_syscall
| |
− | parasite_unmap
| |
− | parasite_cure_remote
| |
− | parasite_cure_seized
| |
− | dump_one_task
| |
− | cr_dump_tasks
| |
− | main
| |
− | __libc_start_main
| |
− |
| |
− | 1.97% criu [.] open_pid_proc
| |
− | |
| |
− | --- open_pid_proc
| |
− | |
| |
− | |--51.54%-- parasite_fixup_vdso
| |
− | | dump_one_task
| |
− | | cr_dump_tasks
| |
− | | main
| |
− | | __libc_start_main
| |
− | |
| |
− | --48.46%-- do_open_proc
| |
− | dump_task_fs
| |
− | dump_one_task
| |
− | cr_dump_tasks
| |
− | main
| |
− | __libc_start_main
| |
− |
| |
− | 1.55% criu [.] dump_one_file
| |
− | |
| |
− | --- dump_one_file
| |
− | dump_task_files_seized
| |
− | dump_one_task
| |
− | cr_dump_tasks
| |
− | main
| |
− | __libc_start_main
| |
− |
| |
− | 1.13% criu [.] alloc_cr_fdset
| |
− | |
| |
− | --- alloc_cr_fdset
| |
− | cr_task_fdset_open
| |
− | dump_one_task
| |
− | cr_dump_tasks
| |
− | main
| |
− | __libc_start_main
| |
| | | |
− | 1.03% criu [.] unseize_task_and_threads
| + | === Opening files for mappings === |
− | |
| |
− | --- unseize_task_and_threads
| |
− | pstree_switch_state
| |
− | cr_dump_tasks
| |
− | main
| |
− | __libc_start_main
| |
| | | |
− | 1.02% criu [.] timeval_accumulate
| + | The <code>get_filemap_fd()</code> opens new fd every time. If a file is mapped several |
− | |
| + | times (e.g. -- a library) we can share one fd for that. |
− | --- timeval_accumulate
| |
− | timing_stop
| |
− | __parasite_dump_pages_seized
| |
− | parasite_dump_pages_seized
| |
− | dump_one_task
| |
− | cr_dump_tasks
| |
− | main
| |
− | __libc_start_main
| |
| | | |
− | 1.01% criu [.] parasite_dump_misc_seized
| + | === Staging === |
− | |
| |
− | --- parasite_dump_misc_seized
| |
− | dump_one_task
| |
− | cr_dump_tasks
| |
− | main
| |
− | __libc_start_main
| |
| | | |
− | 1.01% criu [.] parse_posix_timers
| + | When restoring a single task CRIU uses [[stages of restoring]] which slows things down. Need either special-care the single task restore, or introduce fine-grained locking for such things. |
− | |
| |
− | --- parse_posix_timers
| |
− | dump_one_task
| |
− | cr_dump_tasks
| |
− | main
| |
− | __libc_start_main
| |
| | | |
− | 1.01% criu [.] INIT_LIST_HEAD
| + | [[Category: Development]] |
− | |
| + | [[Category: Thinkers]] |
− | --- INIT_LIST_HEAD
| |
− | __parasite_dump_pages_seized
| |
− | parasite_dump_pages_seized
| |
− | dump_one_task
| |
− | cr_dump_tasks
| |
− | main
| |
− | __libc_start_main
| |
Written here are performance issues found.
Timing stats of live migration of a small container with 11 tasks is
- Total time ~3.5 seconds
- Frozen time ~3.0 seconds
- Pre-dump stages ~0.5 seconds each
- Restore time ~1.9 seconds
- Images transfer time ~0.3 seconds
Below is the list of issues found
Surprisingly, but the mem-drain time is not the biggest. It's "only" ~0.02 seconds. There are places in code that take longer.
parse_smaps
[edit]
Time spent in this routine is up to 0.2 seconds on dump. This one exploits /proc heavily. For a container with 11 tasks the syscall stats look like
834 read
1451 fstat
1462 close
1642 openat
while opens and stats happen on
193 openat(4, "map-symlink", O_RDONLY) = -1 ENOENT (No such file or directory)
1438 openat(4, "map-symlink", O_RDONLY) = 5
11 openat(AT_FDCWD, "/proc/$pid/map_files", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4
Restore[edit]
Fork vs VMA restore[edit]
We restore task's mappings before it goes forking to handle COW. This effectively serializes forking.
Restoring VMAs[edit]
There are 4 stages in VMA restore. Relative times of each are below
- Reading images 1%
- Mapping huge premap area << 1%
- (Re-)mapping sub-areas 73%
- Filling area with data 26%
The 3rd stage has two parts. With timings:
- Opening filemap fd 85%
- Maping vma 15%
Opening files for mappings[edit]
The get_filemap_fd()
opens new fd every time. If a file is mapped several
times (e.g. -- a library) we can share one fd for that.
Staging[edit]
When restoring a single task CRIU uses stages of restoring which slows things down. Need either special-care the single task restore, or introduce fine-grained locking for such things.