Difference between revisions of "Page server"

From CRIU
Latest edit summary (minor edit, Limitations section): Support for DNS name resolution was added with commit a7c384f6eebcebc8b49655601956c18c51735672
 
Latest revision as of 12:54, 13 January 2019

The page server is a component of CRIU that copies user memory directly to a destination system during live migration, rather than dumping it to disk first. It is also used for lazy migration.

Rationale

When migrating a process tree, the largest part of the transferred data is the memory used by the processes. Optimizing this memory transfer is therefore beneficial.

Memory migration without page server

Without the page server, migrating the user memory pages consists of:

  • dumping (writing) the memory to files on disk;
  • reading the files and sending the data over the network to the destination host;
  • receiving the data on the destination host and writing it to files on disk;
  • restoring by reading the files into memory.

In other words, all the memory is written to disk twice and read from disk twice. This incurs significant I/O overhead and slows down the migration. The overhead is multiplied further when using criu pre-dump (such as for iterative migration), because in that case memory is dumped not once but several times.

One way to mitigate the disk I/O overhead is to use tmpfs, a filesystem that uses RAM as storage (see Disk-less migration for details). This eliminates the disk I/O, but not the double read/write. The page server mechanism was implemented to eliminate that as well.

Operation

Memory migration with page server

When the page server is used, memory is dumped not to disk but directly to the network, which eliminates all disk reads and writes on the source. A page server runs on the destination system, receiving the data from the network and writing it to files on disk or tmpfs.

Note that the page server is only used to migrate user memory, i.e. the pagemap.img and pages.img files (see memory dumps if you don't know what that means). Everything else, including, say, process mappings (mm.img), is dumped in the traditional manner and must be moved over separately when doing a migration.

Pages deduplication

Main article: Memory images deduplication.

When the iterative memory dumping feature (criu pre-dump) is used, memory is sent to the page server several times. The page server can automatically deduplicate pages by punching holes in parent images wherever the child image replaces an existing page. This functionality is turned on by the --auto-dedup option of the criu page-server command.

Usage

The obvious use case for the page server is live migration. In most cases you can use P.Haul, which hides these details from you. Otherwise, read on.

Running page server

First, run a page server on a destination node:

[dst]# criu page-server --images-dir $DIR --port $PORT [--auto-dedup] [...]

Note that criu page-server is a "one shot" service: you need to run it for every migration, and it exits as soon as all pages are transferred (or on an error). It also makes sense to check its exit code to make sure there was no error.

The options are:

--images-dir $DIR
    A directory to write memory images to. To speed things up, you might want to use tmpfs (see disk-less migration).
--port $PORT
    A port number to listen on.
--auto-dedup
    Perform automatic deduplication of images. Useful with iterative memory dumps.

Some other options might also be useful, such as:

--ps-socket $FD
    Use the provided file descriptor as the socket for the incoming connection. In this case --address and --port are ignored. Useful for intercepting page-server traffic, e.g. to add encryption or authentication.
-o|--logfile $FILE
    A file to write the log to.
-v$N|-vvv[..]
    Set the logging level (verbosity). For example, -v4 (or -vvvv, which means the same) enables tons of debug info.
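
The following is a minimal sketch of the destination side, showing one way to start the one-shot page server and check its exit code as recommended above. The helper name run_page_server, the CRIU variable (which lets you substitute another command for a dry run), and the log file name page-server.log are assumptions made for illustration; only the criu options themselves come from this page.

```shell
#!/bin/sh
# Destination-side sketch. CRIU, run_page_server and the log file name
# are illustrative assumptions, not part of CRIU itself.

CRIU=${CRIU:-criu}   # override for a dry run, e.g. CRIU=echo

# run_page_server DIR PORT [dedup]
# Starts a one-shot page server and reports how it exited.
run_page_server() {
    dir=$1
    port=$2
    # Reuse the positional parameters to hold the optional flag.
    if [ "${3:-}" = dedup ]; then
        set -- --auto-dedup
    else
        set --
    fi
    if "$CRIU" page-server --images-dir "$dir" --port "$port" "$@" \
            -o page-server.log -v4; then
        echo "page server finished: all pages received"
    else
        status=$?
        echo "page server failed with exit code $status" >&2
        return "$status"
    fi
}
```

For example, run_page_server /tmp/images 1234 dedup would listen on port 1234, write the received images to /tmp/images and enable deduplication. Because the server exits after one transfer, the function has to be called again for every migration.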

criu pre-dump/dump

On the source system, run criu pre-dump or criu dump as usual (see live migration, iterative migration etc.). To use the page server, you need to specify a few additional arguments:

--page-server
    Send pages to a page server (rather than writing them to disk files).
--address $IPADDR
    IP address of the page server (destination host).
--port $PORT
    Port number the page server is listening on.
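
As a rough sketch of the source side, the helper below wraps criu pre-dump and criu dump with the three page-server arguments listed above. The function name send_memory and the CRIU variable are hypothetical, and selecting the process tree with --tree is an assumption about how the dump is normally invoked; adjust the remaining options to whatever your usual dump command needs.

```shell
#!/bin/sh
# Source-side sketch. CRIU and send_memory are illustrative assumptions.

CRIU=${CRIU:-criu}   # override for a dry run, e.g. CRIU=echo

# send_memory ACTION PID IMAGES_DIR ADDR PORT
# ACTION is pre-dump or dump. Memory pages go to the page server;
# all other image files are still written to IMAGES_DIR.
send_memory() {
    action=$1
    case $action in
        pre-dump|dump) ;;
        *)
            echo "usage: send_memory pre-dump|dump PID DIR ADDR PORT" >&2
            return 2
            ;;
    esac
    "$CRIU" "$action" --tree "$2" --images-dir "$3" \
        --page-server --address "$4" --port "$5"
}
```

A call such as send_memory dump 4242 /tmp/src-images 192.0.2.10 1234 sends the memory of process tree 4242 to a page server at 192.0.2.10:1234. The non-memory images left in /tmp/src-images still have to be copied to the destination by other means before restoring.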

Video

To view an example video of using a page server, please go to https://asciinema.org/a/15847.

Limitations

  • Currently it only works via TCP.
  • No encryption and no compression: data is passed over the network as is.

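See also

  • Memory dumping and restoring
  • P.Haul
  • Live migration
  • Disk-less migration
  • Iterative migration
  • Incremental dumps
  • Memory images deduplication
  • Lazy migration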

External links

  • Using page server: https://asciinema.org/a/15847