Performance research

From CRIU
Revision as of 10:16, 19 March 2015 by Xemul (talk | contribs) (→‎Restore)

This page describes the performance issues found so far.

Timing stats for live migration of a small container with 11 tasks are:

  • Total time ~3.5 seconds
  • Frozen time ~3.0 seconds
    • Pre-dump stages ~0.5 seconds each
    • Restore time ~1.9 seconds
    • Images transfer time ~0.3 seconds

Below is the list of issues found.

Dump

Surprisingly, the memory-drain time is not the biggest contributor: it's "only" ~0.02 seconds. There are places in the code that take longer.

parse_smaps

Up to 0.2 seconds of dump time is spent in this routine. It exploits /proc heavily: for a container with 11 tasks, the syscall stats look like

   834 read
  1451 fstat
  1462 close
  1642 openat

with the opens distributed as follows:

   193 openat(4, "map-symlink", O_RDONLY) = -1 ENOENT (No such file or directory)
  1438 openat(4, "map-symlink", O_RDONLY) = 5
    11 openat(AT_FDCWD, "/proc/$pid/map_files", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4

Restore

Fork vs VMA restore

We restore a task's mappings before it forks its children, so that COW-shared memory is handled correctly. This effectively serializes forking.

Restoring VMAs

There are 4 stages in VMA restore. The relative time of each is below:

  • Reading images 1%
  • Mapping huge premap area << 1%
  • (Re-)mapping sub-areas 73%
  • Filling area with data 26%

The 3rd stage has two parts, with timings:

  • Opening filemap fd 85%
    • Mapping VMA 15%

Staging

When restoring a single task, CRIU still goes through all the restore stages, which slows things down. We need either to special-case the single-task restore, or to introduce fine-grained locking for such things.