Difference between revisions of "Userfaultfd"

m (Add link to userfaultfd man2 page)
 
(4 intermediate revisions by 4 users not shown)
Line 7: Line 7:
  
 
* The <code>restore</code> action accepts yet another API switch: option <code>--lazy-pages</code>. In this mode, <code>restore</code> skips injection of lazy pages into the processes address space, but rather registers lazy memory areas with userfaultfd.
 
* The <code>restore</code> action accepts yet another API switch: option <code>--lazy-pages</code>. In this mode, <code>restore</code> skips injection of lazy pages into the processes address space, but rather registers lazy memory areas with userfaultfd.
* The lazy pages are completely handled by dedicated <code>lazy-pages</code> daemon. The daemon recieves userfault file descriptors from <code>restore</code> via UNIX socket. The userfault file descriptors allow reception of page-fault and other events and resolution of these events by the daemon.
+
* The lazy pages are completely handled by dedicated <code>lazy-pages</code> daemon. The daemon receives userfault file descriptors from <code>restore</code> via UNIX socket. The userfault file descriptors allow reception of page-fault and other events and resolution of these events by the daemon.
 
* For the migration case, the <code>dump</code> action also accepts API switch: option <code>--lazy-pages</code>. When this option is used, the <code>dump</code> keeps the memory pages and allows the <code>lazy-pages</code> daemon to request these pages via TCP connection.
 
* For the migration case, the <code>dump</code> action also accepts API switch: option <code>--lazy-pages</code>. When this option is used, the <code>dump</code> keeps the memory pages and allows the <code>lazy-pages</code> daemon to request these pages via TCP connection.
  
Line 16: Line 16:
 
Tasks after restore have lazy VMAs registered with userfaultfd, the fd itself is sent before resume to <code>lazy-pages</code> daemon and closed. The daemon monitors the UFFD events and repopulates the tasks address space. The <code>lazy-pages</code> daemon can get pages either from images (both local and remote) or directly from the remote side <code>dump</code>.
 
Tasks after restore have lazy VMAs registered with userfaultfd, the fd itself is sent before resume to <code>lazy-pages</code> daemon and closed. The daemon monitors the UFFD events and repopulates the tasks address space. The <code>lazy-pages</code> daemon can get pages either from images (both local and remote) or directly from the remote side <code>dump</code>.
  
When the restored task accesses a missing memory page, it causes a page fault. The <copde>lazy-pages</code> daemon receives the page fault notification and resolves it by populating the faulting task memory. If there were no page faults for some time, the daemon copies the task's remaining memory pages in the background.
+
When the restored task accesses a missing memory page, it causes a page fault. The <code>lazy-pages</code> daemon receives the page fault notification and resolves it by populating the faulting task memory. If there were no page faults for some time, the daemon copies the task's remaining memory pages in the background.
  
 
==== Local images ====
 
==== Local images ====
Line 24: Line 24:
 
==== Remote images ====
 
==== Remote images ====
  
* The [[page-server]] is run on the remote side with <code>--lazy-pages</code> option.
+
* The [[page server]] is run on the remote side with <code>--lazy-pages</code> option.
* The lazy-pages daemon connects to the remote [[page server]] with <code>--page-server</code> option. The <code>--address</code> and <code>--port</code> options allow setting of IP addrees and port of the listening [[page server]].
+
* The lazy-pages daemon connects to the remote [[page server]] with <code>--page-server</code> option. The <code>--address</code> and <code>--port</code> options allow setting of IP address and port of the listening [[page server]].
* Current protocol allows the lazy-pages daemon to request several continous pages.
+
* Current protocol allows the lazy-pages daemon to request several continuous pages.
  
 
==== Migration ====
 
==== Migration ====
* The <code>dump</code> collects the pages into pipes and starts the [[page-server]] in a mode that allows <code>lazy-pages</code> daemon to connect to it and request the memory pages
+
* The <code>dump</code> collects the pages into pipes and starts the [[page server]] in a mode that allows <code>lazy-pages</code> daemon to connect to it and request the memory pages
* When the restored task accesses a missing memory page, the <code>lazy-pages</code> daemon request the page from the [[page-server]] running on the dump side
+
* When the restored task accesses a missing memory page, the <code>lazy-pages</code> daemon request the page from the [[page server]] running on the dump side
 
* After the page is received, the <code>lazy-pages</code> daemon injects it into the task's address space using userfautlfd
 
* After the page is received, the <code>lazy-pages</code> daemon injects it into the task's address space using userfautlfd
  
Line 37: Line 37:
 
* Currently only MAP_PRIVATE | MAP_ANONYMOUS is supported. Newer kernels (4.11+) allow userfaultfd for hugetlbfs and shared memory, yet to be implemented in CRIU.
 
* Currently only MAP_PRIVATE | MAP_ANONYMOUS is supported. Newer kernels (4.11+) allow userfaultfd for hugetlbfs and shared memory, yet to be implemented in CRIU.
 
* Userfault is known not to map one page into two places. Thus -- COW-ed pages will get COW-ed.
 
* Userfault is known not to map one page into two places. Thus -- COW-ed pages will get COW-ed.
* The [[Lazy migration]] use-case might be racy because there is no means to synchronize between pending forks, remote pages transfers and page faults.
 
  
 
== See also ==
 
== See also ==
Line 45: Line 44:
  
 
[[Category:Memory]]
 
[[Category:Memory]]
[[Category:Plans]]
+
[[Category:New features]]
[[Category:Development]]
 
 
[[Category:Under the hood]]
 
[[Category:Under the hood]]

Latest revision as of 04:39, 24 October 2019

This article describes usage of userfaultfd for lazy restore and lazy migration in CRIU.

Background

The userfaultfd mechanism is designed to allow user-space paging. Its initial implementation, merged in Linux 4.3, was designed for the KVM/QEMU use case and lacked some functionality necessary for CRIU. In Linux 4.11, userfaultfd was extended with a so-called "non-cooperative" mode that allows, at least in theory, lazy (or post-copy) restore in CRIU.

Concepts

  • The restore action accepts one more API switch, the --lazy-pages option. In this mode, restore does not inject lazy pages into the process's address space; instead, it registers the lazy memory areas with userfaultfd.
  • The lazy pages are handled entirely by a dedicated lazy-pages daemon. The daemon receives the userfault file descriptors from restore via a UNIX socket. These descriptors allow the daemon to receive page-fault and other events and to resolve them.
  • For the migration case, the dump action also accepts the --lazy-pages option. When this option is used, dump keeps the memory pages and allows the lazy-pages daemon to request them over a TCP connection.
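
The descriptor handoff over a UNIX socket can be illustrated with a minimal sketch. It is not CRIU code: the helper names send_fd, recv_fd and demo_handoff are hypothetical, and a plain pipe stands in for the userfault file descriptor. It only shows the general SCM_RIGHTS mechanism by which one process passes an open descriptor to another and then closes its own copy.

```python
import array
import os
import socket


def send_fd(sock, fd):
    """Send one open file descriptor as SCM_RIGHTS ancillary data."""
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", [fd]))]
    sock.sendmsg([b"F"], ancillary)


def recv_fd(sock):
    """Receive one file descriptor sent with send_fd()."""
    fds = array.array("i")
    _msg, ancdata, _flags, _addr = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any trailing bytes that do not form a whole descriptor.
            usable = len(data) - (len(data) % fds.itemsize)
            fds.frombytes(data[:usable])
    if not fds:
        raise RuntimeError("no file descriptor received")
    return fds[0]


def demo_handoff():
    """Model the restore -> daemon handoff inside a single process."""
    restore_side, daemon_side = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    # Assumption: the read end of a pipe stands in for the userfault fd.
    event_fd, writer = os.pipe()
    send_fd(restore_side, event_fd)
    os.close(event_fd)  # the sender closes its copy after the handoff
    received = recv_fd(daemon_side)
    os.write(writer, b"event")
    data = os.read(received, 5)  # the receiver's copy is still usable
    for fd in (received, writer):
        os.close(fd)
    restore_side.close()
    daemon_side.close()
    return data
```

Calling demo_handoff() reads the data through the received descriptor even though the sender has already closed its own copy, which is the property the handoff relies on.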

 

Daemon

After restore, tasks have their lazy VMAs registered with userfaultfd; the fd itself is sent to the lazy-pages daemon before resume and then closed. The daemon monitors the UFFD events and repopulates the task's address space. The lazy-pages daemon can get pages either from images (both local and remote) or directly from the remote-side dump.

When the restored task accesses a missing memory page, it causes a page fault. The lazy-pages daemon receives the page-fault notification and resolves it by populating the faulting task's memory. If no page faults occur for some time, the daemon copies the task's remaining memory pages in the background.
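
The policy described above (serve faults first, fall back to background copying when idle) can be modelled in a few lines. This is a simplified sketch, not the daemon's implementation: run_daemon, the in-memory fault queue and the idle_timeout value are all assumptions made for illustration, whereas a real daemon would read events from the userfault file descriptor and copy pages into the task with the kernel's userfaultfd interface.

```python
import queue

PAGE_SIZE = 4096


def run_daemon(source_pages, faults, idle_timeout=0.01):
    """Populate every page; return the memory and the population order.

    source_pages: dict mapping page address -> page contents
    faults: queue.Queue of faulting page addresses (hypothetical event source)
    """
    resident = {}
    log = []
    remaining = sorted(source_pages)
    while remaining:
        try:
            # A pending page fault always takes priority.
            addr = faults.get(timeout=idle_timeout)
            reason = "fault"
        except queue.Empty:
            # No fault arrived within the timeout: copy the next page
            # that is still missing, in address order.
            addr = remaining[0]
            reason = "background"
        if addr in resident or addr not in source_pages:
            continue  # already populated, or not a lazy page
        resident[addr] = source_pages[addr]
        remaining.remove(addr)
        log.append((addr, reason))
    return resident, log


def demo_daemon():
    pages = {n * PAGE_SIZE: bytes([n]) * PAGE_SIZE for n in range(4)}
    faults = queue.Queue()
    faults.put(3 * PAGE_SIZE)  # the task touches page 3 first
    faults.put(1 * PAGE_SIZE)  # and then page 1
    return run_daemon(pages, faults)
```

In this model the two faulting pages are populated first, in fault order, and the remaining pages follow in address order once the queue stays empty for idle_timeout seconds.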

Local images

The daemon uses the local page-read engine to read pages from images.

Remote images

  • The page server is run on the remote side with the --lazy-pages option.
  • The lazy-pages daemon connects to the remote page server when started with the --page-server option. The --address and --port options set the IP address and port of the listening page server.
  • The current protocol allows the lazy-pages daemon to request several contiguous pages in a single request.
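
Requesting a run of adjacent pages at once means the daemon can describe what it needs as (start address, page count) pairs instead of one request per page. The following sketch shows one way to group page addresses into such runs; coalesce_pages is a hypothetical helper and the 4096-byte page size is an assumption, neither taken from CRIU.

```python
PAGE_SIZE = 4096  # assumed page size


def coalesce_pages(addresses, page_size=PAGE_SIZE):
    """Group page addresses into (start_address, page_count) runs."""
    runs = []
    for addr in sorted(set(addresses)):
        if runs:
            start, count = runs[-1]
            if addr == start + count * page_size:
                # The page directly follows the current run: extend it.
                runs[-1] = (start, count + 1)
                continue
        runs.append((addr, 1))
    return runs
```

For example, the addresses 0x1000, 0x2000, 0x3000 and 0x8000 collapse into two runs: three pages starting at 0x1000 and a single page at 0x8000.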

Migration

  • The dump collects the pages into pipes and starts the page server in a mode that allows the lazy-pages daemon to connect to it and request the memory pages.
  • When the restored task accesses a missing memory page, the lazy-pages daemon requests the page from the page server running on the dump side.
  • After the page is received, the lazy-pages daemon injects it into the task's address space using userfaultfd.
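
The request/receive/inject sequence above can be sketched as a small exchange between two endpoints. Everything specific here is an assumption for illustration: the wire format (an 8-byte start address plus a 4-byte page count) is invented and is not CRIU's page-server protocol, socket.socketpair stands in for the TCP connection, and "injecting" a page is modelled as storing it in a dict.

```python
import socket
import struct
import threading

PAGE_SIZE = 4096
# Hypothetical request header: start address (u64) and page count (u32).
REQUEST = struct.Struct("<QI")


def recv_exact(sock, size):
    """Read exactly size bytes or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def serve_pages(sock, dumped_pages):
    """Dump side: answer page requests until the daemon disconnects."""
    while True:
        try:
            header = recv_exact(sock, REQUEST.size)
        except ConnectionError:
            return
        start, count = REQUEST.unpack(header)
        for i in range(count):
            sock.sendall(dumped_pages[start + i * PAGE_SIZE])


def fetch_pages(sock, start, count):
    """Daemon side: request count pages starting at start."""
    sock.sendall(REQUEST.pack(start, count))
    data = recv_exact(sock, count * PAGE_SIZE)
    return {
        start + i * PAGE_SIZE: data[i * PAGE_SIZE:(i + 1) * PAGE_SIZE]
        for i in range(count)
    }


def demo_migration():
    dumped = {0x1000: b"a" * PAGE_SIZE, 0x2000: b"b" * PAGE_SIZE}
    dump_side, daemon_side = socket.socketpair()
    server = threading.Thread(target=serve_pages, args=(dump_side, dumped))
    server.start()
    task_memory = {}
    # A fault at 0x1000 triggers a request; received pages are "injected".
    task_memory.update(fetch_pages(daemon_side, 0x1000, 2))
    daemon_side.close()
    server.join()
    dump_side.close()
    return task_memory
```

The dump side keeps the pages and only sends them when asked, so nothing is transferred until the daemon issues a request.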

Limitations

  • Currently only MAP_PRIVATE | MAP_ANONYMOUS memory is supported. Newer kernels (4.11+) allow userfaultfd for hugetlbfs and shared memory, but this is not yet implemented in CRIU.
  • Userfaultfd is known not to map one page into two places. Thus, pages that were shared copy-on-write (COW) will be copied rather than shared.
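
The first limitation amounts to a simple check on the mapping flags. The sketch below is illustrative only: can_be_lazy is a hypothetical helper, not a CRIU function, and it uses the flag constants exposed by Python's mmap module on Linux.

```python
import mmap

# Only private anonymous mappings qualify for lazy handling.
LAZY_FLAGS = mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS


def can_be_lazy(flags):
    """Return True if both MAP_PRIVATE and MAP_ANONYMOUS are set."""
    return (flags & LAZY_FLAGS) == LAZY_FLAGS
```

Under this check, a shared anonymous mapping or a private file-backed mapping would be restored eagerly instead of being left to the daemon.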

See also