Changes

1,329 bytes added ,  20:48, 10 December 2015
Line 1: Line 1: −
Written here are performance issues found
+
Written here are performance issues found.
 +
 
 +
Timing stats for live migration of a small container with 11 tasks:
 +
 
 +
* Total time ~3.5 seconds
 +
* Frozen time ~3.0 seconds
 +
** Pre-dump stages ~0.5 seconds each
 +
** Restore time ~1.9 seconds
 +
** Images transfer time ~0.3 seconds
 +
 
 +
Below is the list of issues found.
    
== Dump ==
   −
== <code>parse_smaps</code> ==
+
Surprisingly, the mem-drain time is not the biggest contributor. It's "only" ~0.02 seconds. There are places in the code that take longer.
   −
This guy exploits /proc heavily. For a container with 11 tasks the syscall stats look like
+
=== <code>parse_smaps</code> ===
 +
 
 +
Time spent in this routine is up to 0.2 seconds on dump. It uses /proc heavily. For a container with 11 tasks the syscall stats look like this:
    
     834 read
Line 18: Line 30:  
     11 openat(AT_FDCWD, "/proc/$pid/map_files", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4
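Per-syscall counts like the ones above can be tallied from an <code>strace</code> log. The following is a minimal sketch, not the tool that produced the numbers on this page: the helper name <code>tally_syscalls</code> and the sample log lines (including PID 42) are made up for illustration, and the regular expression only assumes the usual <code>name(args) = result</code> layout with an optional PID column.

```python
import re
from collections import Counter

# Matches the syscall name at the start of an strace line, optionally
# preceded by a PID column ("1234 read(...)" or "[pid 1234] read(...)").
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")


def tally_syscalls(lines):
    """Count how many times each syscall name appears in strace output."""
    counts = Counter()
    for line in lines:
        m = SYSCALL_RE.match(line)
        if m:  # skip signal lines, exit markers, and so on
            counts[m.group(1)] += 1
    return counts


# Hypothetical log excerpt, only to exercise the helper.
sample = [
    'openat(AT_FDCWD, "/proc/42/smaps", O_RDONLY) = 3',
    'read(3, "00400000-0040b000 r-xp"..., 4096) = 4096',
    'read(3, "", 4096) = 0',
    'close(3) = 0',
    '+++ exited with 0 +++',
]

for name, n in tally_syscalls(sample).most_common():
    print(f"{n:8d} {name}")
```

Grouping by syscall name only is a deliberate simplification; the list above also groups <code>openat</code> calls by path pattern, which would need the first argument to be normalized as well.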
    +
== Restore ==
   −
Time spent in this routine is about 0.1 seconds on dump.
+
=== Fork vs VMA restore ===
   −
== Restore ==
+
We restore a task's mappings before it starts forking its children, so that COW is handled correctly. This effectively serializes forking.
 +
 
 +
=== Restoring VMAs ===
 +
 
 +
There are 4 stages in VMA restore. The relative time of each is listed below:
 +
 
 +
* Reading images                  1%
 +
* Mapping huge premap area       << 1%
 +
* (Re-)mapping sub-areas          73%
 +
* Filling area with data          26%
 +
 
 +
The 3rd stage has two parts, with the following relative timings:
 +
 
 +
* Opening filemap fd              85%
 +
* Mapping VMA                     15%
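Combining the two breakdowns shows how much of the whole VMA restore goes into each part of the 3rd stage. This is plain arithmetic on the percentages quoted above; the dictionary names are only for illustration, and the premap stage is left out because it is listed only as far below 1%.

```python
# Relative times quoted above, as fractions of the whole VMA restore.
stages = {
    "reading images": 0.01,
    "remapping sub-areas": 0.73,
    "filling with data": 0.26,
}

# Split of the remapping stage into its two parts.
remap_parts = {
    "opening filemap fd": 0.85,
    "mapping vma": 0.15,
}

# Share of the whole VMA restore taken by each part of the remapping stage.
overall = {
    name: stages["remapping sub-areas"] * share
    for name, share in remap_parts.items()
}

for name, frac in overall.items():
    print(f"{name}: {frac:.0%} of VMA restore")
```

So opening file descriptors for mappings accounts for roughly 62% of the VMA restore time, and the mapping calls themselves for roughly 11%.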
 +
 
 +
 
 +
=== Opening files for mappings ===
 +
 
 +
<code>get_filemap_fd()</code> opens a new fd every time. If a file is mapped several
 +
times (e.g. a library), we can share one fd for all of its mappings.
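The sharing idea can be sketched as a small cache that opens each file once and hands the same descriptor to every later request. This is an illustration only, not CRIU code: CRIU is written in C, the <code>FdCache</code> class is hypothetical, and keying the cache on path and open flags is an assumption (a real implementation would more likely key on whatever identifies the file in the images).

```python
import os
import tempfile


class FdCache:
    """Hand out one shared descriptor per (path, flags) pair."""

    def __init__(self):
        self._fds = {}
        self.opens = 0  # number of real open() calls, for illustration

    def get(self, path, flags=os.O_RDONLY):
        key = (path, flags)
        fd = self._fds.get(key)
        if fd is None:
            # First request for this file: pay for the open() once.
            fd = os.open(path, flags)
            self.opens += 1
            self._fds[key] = fd
        return fd

    def close_all(self):
        # Call once all mappings are set up; existing mappings stay
        # valid after the descriptor they were created from is closed.
        for fd in self._fds.values():
            os.close(fd)
        self._fds.clear()


# A temporary file stands in for a library mapped several times.
with tempfile.NamedTemporaryFile() as lib:
    cache = FdCache()
    first = cache.get(lib.name)
    second = cache.get(lib.name)
    print(first == second, cache.opens)
    cache.close_all()
```

The cached descriptors have to be closed only after the last mapping that needs them has been created, which is why the sketch closes them in one pass at the end.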
 +
 
 +
=== Staging ===
 +
 
 +
Even when restoring a single task, CRIU goes through the [[stages of restoring]], which slows things down. We need to either special-case the single-task restore or introduce fine-grained locking for such things.
 +
 
 +
[[Category: Development]]
 +
[[Category: Thinkers]]
