This article describes usage of userfaultfd for lazy restore and lazy migration in CRIU.
The userfaultfd mechanism is designed to allow user-space paging. Its initial implementation, merged in Linux 4.3, was designed for the KVM/QEMU use case and lacked some functionality necessary for CRIU. In Linux 4.11, userfaultfd was extended with the so-called "non-cooperative" mode, which allows, at least in theory, lazy (or post-copy) restore in CRIU.
The restore action accepts yet another API switch: the --lazy-pages option. In this mode, restore does not inject lazy pages into the processes' address spaces; instead, it registers the lazy memory areas with userfaultfd.
- The lazy pages are handled entirely by a dedicated lazy-pages daemon. The daemon receives userfault file descriptors from restore via a UNIX socket. These file descriptors let the daemon receive page-fault and other events and resolve them.
- For the migration case, the dump action also accepts an API switch: the --lazy-pages option. When this option is used, dump keeps the memory pages and allows the lazy-pages daemon to request them over a TCP connection.
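The hand-off of userfault file descriptors over a UNIX socket can be illustrated with a small sketch. On Linux, an open descriptor crosses a process boundary as SCM_RIGHTS ancillary data, which Python exposes through socket.send_fds and socket.recv_fds (Python 3.9+). This is not CRIU's code: a pipe stands in for the userfault descriptor, both ends run in one process, and the "uffd" tag and the helper names are assumptions made for the illustration.

```python
import os
import socket


def send_fd(sock: socket.socket, fd: int, tag: bytes = b"uffd") -> None:
    """Send one open file descriptor plus a short tag over a UNIX socket."""
    socket.send_fds(sock, [tag], [fd])


def recv_fd(sock: socket.socket) -> tuple[bytes, int]:
    """Receive the tag and a new descriptor referring to the same open file."""
    msg, fds, _flags, _addr = socket.recv_fds(sock, 1024, 1)
    return msg, fds[0]


def demo() -> bytes:
    # A pipe stands in for the userfault descriptor (assumption of this sketch).
    read_end, write_end = os.pipe()
    restore_side, daemon_side = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        send_fd(restore_side, read_end)
        os.close(read_end)  # the sender closes its own copy after sending
        tag, received = recv_fd(daemon_side)
        # The receiver's descriptor still works although the sender closed its copy.
        os.write(write_end, b"event")
        data = os.read(received, 5)
        os.close(received)
        return tag + b":" + data
    finally:
        os.close(write_end)
        restore_side.close()
        daemon_side.close()
```

Calling demo() shows that the receiving side can keep reading events from the descriptor after the sending side has closed its copy, which is the property the restore-to-daemon hand-off relies on.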
After restore, tasks have their lazy VMAs registered with userfaultfd; before the tasks resume, the fd itself is sent to the lazy-pages daemon and closed. The daemon monitors the UFFD events and repopulates the tasks' address space. The lazy-pages daemon can get pages either from images (both local and remote) or directly from the remote side.
When the restored task accesses a missing memory page, it causes a page fault. The lazy-pages daemon receives the page fault notification and resolves it by populating the faulting task's memory. If there have been no page faults for some time, the daemon copies the task's remaining memory pages in the background.
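The interplay between fault-driven population and background copying can be modelled without touching userfaultfd at all. The sketch below is a toy model under stated assumptions: dictionaries stand in for the page source and the task's memory, a queue of addresses stands in for page-fault notifications, and the idle timeout value and the one-page-per-idle-period policy are arbitrary choices, not CRIU's actual behaviour.

```python
import queue

PAGE_SIZE = 4096


class LazyPagesModel:
    """Toy model: 'source' holds undelivered pages, 'task' is the restored memory."""

    def __init__(self, source: dict[int, bytes]):
        self.source = dict(source)  # pages not yet delivered, keyed by address
        self.task: dict[int, bytes] = {}  # pages already populated
        self.log: list[tuple[str, int]] = []  # (reason, page address) per copy

    def _populate(self, addr: int, reason: str) -> None:
        page = self.source.pop(addr, None)
        if page is not None:  # a fault on an already populated page is ignored
            self.task[addr] = page
            self.log.append((reason, addr))

    def run(self, faults: "queue.Queue[int]", idle_timeout: float = 0.01) -> None:
        while self.source:
            try:
                addr = faults.get(timeout=idle_timeout)
            except queue.Empty:
                # No faults for a while: copy one remaining page in the background.
                self._populate(min(self.source), "background")
            else:
                # Round the faulting address down to its page boundary.
                self._populate(addr & ~(PAGE_SIZE - 1), "fault")


def demo() -> LazyPagesModel:
    pages = {i * PAGE_SIZE: bytes([i]) * PAGE_SIZE for i in range(4)}
    faults: "queue.Queue[int]" = queue.Queue()
    faults.put(2 * PAGE_SIZE + 123)  # the task touches the third page first
    model = LazyPagesModel(pages)
    model.run(faults)
    return model
```

In demo(), the page containing the faulting address is delivered first and the remaining three pages follow as background copies once the fault queue stays empty.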
The daemon uses the local page-read engine to read pages from images.
- The page server is run on the remote side with
- The lazy-pages daemon connects to the remote page server with
The --address and --port options set the IP address and port of the listening page server.
- The current protocol allows the lazy-pages daemon to request several contiguous pages at once.
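To make the idea of requesting a run of pages concrete, here is a toy page server and client. Everything about the wire format is hypothetical: the request layout (a 64-bit start address followed by a 32-bit page count), the helper names, and the use of a local socket pair in place of a TCP connection are assumptions of this sketch and do not reflect CRIU's real protocol.

```python
import socket
import struct
import threading

PAGE_SIZE = 4096
REQUEST = struct.Struct("!QI")  # hypothetical request: start address, page count


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes or raise if the peer closes the connection."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return bytes(buf)


def serve(sock: socket.socket, dumped: dict[int, bytes]) -> None:
    """Dump side: answer page requests until the peer disconnects."""
    with sock:
        while True:
            try:
                start, nr = REQUEST.unpack(recv_exact(sock, REQUEST.size))
            except ConnectionError:
                return
            for i in range(nr):
                sock.sendall(dumped[start + i * PAGE_SIZE])


def fetch(sock: socket.socket, start: int, nr: int) -> list[bytes]:
    """Restore side: request nr contiguous pages and return them in order."""
    sock.sendall(REQUEST.pack(start, nr))
    data = recv_exact(sock, nr * PAGE_SIZE)
    return [data[i * PAGE_SIZE:(i + 1) * PAGE_SIZE] for i in range(nr)]


def demo() -> dict[int, bytes]:
    dumped = {0x1000 * i: bytes([i]) * PAGE_SIZE for i in range(1, 4)}
    dump_side, restore_side = socket.socketpair()  # stands in for TCP
    server = threading.Thread(target=serve, args=(dump_side, dumped))
    server.start()
    task_memory: dict[int, bytes] = {}
    with restore_side:
        fault_addr = 0x2000
        # One request covers the faulting page and the page right after it.
        for offset, page in enumerate(fetch(restore_side, fault_addr, 2)):
            task_memory[fault_addr + offset * PAGE_SIZE] = page
    server.join()
    return task_memory
```

A single round trip in demo() brings back two adjacent pages, which illustrates why a range request is cheaper than asking for each page separately.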
dump collects the pages into pipes and starts the page server in a mode that allows the lazy-pages daemon to connect to it and request the memory pages.
- When the restored task accesses a missing memory page, the lazy-pages daemon requests the page from the page server running on the dump side.
- After the page is received, the lazy-pages daemon injects it into the task's address space using userfaultfd.
- Currently only MAP_PRIVATE | MAP_ANONYMOUS mappings are supported. Newer kernels (4.11+) allow userfaultfd for hugetlbfs and shared memory, but CRIU does not implement this yet.
- Userfaultfd cannot map one page into two places, so pages that were shared copy-on-write before the dump end up as separate copies after lazy restore.
- The lazy migration use case might be racy because there is no means to synchronize between pending forks, remote page transfers and page faults.