Line 1: |
Line 1: |
− | Written here are performance issues found | + | Written here are performance issues found. |
| | | |
− | == <code>parse_smaps</code> ==
| + | Timing stats for live migration of a small container with 11 tasks are: |
| | | |
− | This guy exploits /proc heavily. For a container with 11 tasks the syscall stats look like | + | * Total time ~3.5 seconds |
| + | * Frozen time ~3.0 seconds |
| + | ** Pre-dump stages ~0.5 seconds each |
| + | ** Restore time ~1.9 seconds |
| + | ** Images transfer time ~0.3 seconds |
| + | |
| + | Below is the list of issues found |
| + | |
| + | == Dump == |
| + | |
| + | Surprisingly, the mem-drain time is not the biggest contributor: it is "only" ~0.02 seconds. Other places in the code take longer. |
| + | |
| + | === <code>parse_smaps</code> === |
| + | |
| + | Time spent in this routine is up to 0.2 seconds on dump. It relies heavily on /proc. For a container with 11 tasks the syscall stats look like this: |
| | | |
| 834 read | | 834 read |
Line 16: |
Line 30: |
| 11 openat(AT_FDCWD, "/proc/$pid/map_files", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4 | | 11 openat(AT_FDCWD, "/proc/$pid/map_files", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4 |
| | | |
| + | == Restore == |
| + | |
| + | === Fork vs VMA restore === |
| + | |
| + | We restore a task's mappings before it forks its children, so that COW is handled correctly. This effectively serializes forking. |
| + | |
| + | === Restoring VMAs === |
| + | |
| + | VMA restore has 4 stages. The relative time of each is: |
| + | |
| + | * Reading images 1% |
| + | * Mapping huge premap area << 1% |
| + | * (Re-)mapping sub-areas 73% |
| + | * Filling area with data 26% |
| + | |
| + | The 3rd stage has two parts, with the following relative timings: |
| + | |
| + | * Opening filemap fd 85% |
| + | * Mapping VMA 15% |
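The two breakdowns above can be combined to see how much of the whole VMA restore each part of the 3rd stage accounts for. The sketch below is plain arithmetic on the percentages listed; the dictionary and function names are illustrative only, and the "<< 1%" premap stage is left out because no exact figure is given. Under these numbers, opening filemap fds comes out at roughly 62% of VMA restore time and the mapping itself at roughly 11%.

```python
# Relative times copied from the lists above, as fractions of the parent stage.
# The "mapping huge premap area" stage is omitted: only "<< 1%" is known.
vma_restore_stages = {
    "reading images": 0.01,
    "(re-)mapping sub-areas": 0.73,
    "filling area with data": 0.26,
}

# Breakdown of the "(re-)mapping sub-areas" stage.
remap_parts = {
    "opening filemap fd": 0.85,
    "mapping vma": 0.15,
}


def share_of_vma_restore(part):
    """Fraction of the whole VMA restore spent in one part of stage 3."""
    return vma_restore_stages["(re-)mapping sub-areas"] * remap_parts[part]


open_fd_share = share_of_vma_restore("opening filemap fd")
map_share = share_of_vma_restore("mapping vma")

if __name__ == "__main__":
    print(f"opening filemap fd: {open_fd_share:.2f} of VMA restore")
    print(f"mapping vma:        {map_share:.2f} of VMA restore")
```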
| + | |
| + | |
| + | === Opening files for mappings === |
| + | |
| + | The <code>get_filemap_fd()</code> opens a new fd every time. If a file is mapped several
| + | times (e.g. a library), we can share one fd for all of its mappings. |
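A minimal sketch of the fd-sharing idea, written in Python for brevity rather than in CRIU's C. The <code>FilemapFdCache</code> class, its method names, and the choice of (path, flags) as the cache key are assumptions made for illustration, not CRIU's actual code or data structures. The point is only that repeated mappings of the same file with the same open flags reuse one descriptor instead of paying for a new open each time.

```python
import os
import tempfile


class FilemapFdCache:
    """Hypothetical cache that hands out one fd per (path, open flags) pair."""

    def __init__(self):
        self._fds = {}
        self.opens = 0  # number of real open() calls, for demonstration

    def get(self, path, flags=os.O_RDONLY):
        key = (path, flags)
        fd = self._fds.get(key)
        if fd is None:
            # First mapping of this file with these flags: really open it.
            fd = os.open(path, flags)
            self.opens += 1
            self._fds[key] = fd
        return fd

    def close_all(self):
        # Called once all mappings are set up; the mappings keep the file alive.
        for fd in self._fds.values():
            os.close(fd)
        self._fds.clear()


def demo():
    """Request an fd three times for the same file, as if it were mapped thrice."""
    with tempfile.NamedTemporaryFile() as lib:
        lib.write(b"x" * 4096)
        lib.flush()
        cache = FilemapFdCache()
        fds = [cache.get(lib.name) for _ in range(3)]
        opens = cache.opens
        cache.close_all()
        return fds, opens


if __name__ == "__main__":
    fds, opens = demo()
    print("fds handed out:", fds, "real opens:", opens)
```

The flags are part of the key because mappings with different access modes may need differently opened descriptors; a real implementation would likely key on whatever identifies the file in the images rather than on a path.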
| + | |
| + | === Staging === |
| + | |
| + | When restoring a single task CRIU uses [[stages of restoring]], which slows things down. We need to either special-case single-task restore or introduce fine-grained locking for such cases. |
| | | |
− | Time spent in this routine is about 0.1 seconds on dump.
| + | [[Category: Development]] |
| + | [[Category: Thinkers]] |