Self dump

Revision as of 19:51, 10 March 2015 by Xemul

NOTE: Information on this page is outdated. Features described here are implemented in RPC.

Some applications want to dump (checkpoint) themselves using CRIU. This page describes how this should be implemented in CRIU.

Difficulties

First of all, CRIU still requires root privileges to run, because some of the kernel APIs it uses are restricted to root. The situation will probably change in the future, but it will not be resolved quickly.

Second, if CRIU is spawned from, or somehow linked into, a program that wants to checkpoint itself, CRIU has to tell the resources of that application apart from those owned by CRIU itself. This task is complex, although CRIU will probably work this way in the future.

Solution

Thus, the solution is a CRIU service that is asked to dump a process (or several processes). The self-dump request should work like this (it is not yet implemented):

  • a CRIU daemon is launched in the background and listens for connections on a unix socket
  • an application that wants to checkpoint itself opens a connection and asks for a dump
    • the service gets the pid of the process to dump using the SO_PEERCRED socket option. It is important not to ask the requestor for the pid, as a malicious user could ask to dump someone else's process and then study its images.
  • the CRIU service spawns a process that executes a regular dump
  • during the dump CRIU closes the connection, so the application can read() from the socket to wait for the dump to end
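
The credential lookup and the "wait for close" step above can be sketched as follows. This is only an illustration under stated assumptions, not CRIU's actual service code: `socket.socketpair()` stands in for the daemon's listen/accept and the application's connect(), and `peer_credentials` is a hypothetical helper name. On Linux the option is spelled SO_PEERCRED and returns a `struct ucred` (pid, uid, gid) filled in by the kernel, so the requestor cannot forge it.

```python
import os
import socket
import struct

# Linux struct ucred: pid_t pid; uid_t uid; gid_t gid;
UCRED = struct.Struct("iII")


def peer_credentials(conn):
    """Return (pid, uid, gid) of the process on the other end of a Unix socket.

    Hypothetical helper; the values come from the kernel, not from the peer.
    """
    raw = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, UCRED.size)
    return UCRED.unpack(raw)


# Stand-in for the daemon's accepted connection and the application's socket.
service_end, app_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# Service side: learn who is asking without trusting anything the peer sends.
pid, uid, gid = peer_credentials(service_end)

# Service side: closing the connection signals the application.
service_end.close()

# Application side: a blocking read returns b"" once the service has closed.
data = app_end.recv(1)
app_end.close()
```

Because both ends live in one process here, the reported pid is the pid of this script; with a real daemon it would be the pid of the connecting application.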

Details

Some things to keep in mind:

  1. The call to connect() should be wrapped in a library function. Giving raw access to this socket is undesirable, as the communication protocol is likely to change and keeping it backward-compatible would be a burden.
  2. The application should somehow pass arguments to the CRIU service, at the very least the directory to put the images in. If wrapped in a library call, these arguments should be declared as a structure in memory, whose contents are then encoded into a stream. Requiring "service version == library version" removes the need to keep this protocol compatible.
  3. Paths are better transferred as opened files (SCM_RIGHTS)
  4. Waiting for the socket to be closed by CRIU is a good way to wait for the checkpoint to complete. In this case CRIU should close the socket after the freeze but before the dump, so that the half-closed connection in the program is dumped properly.
  5. What if an application runs another one and then asks to dump itself together with its children? In this case the application would be able to inspect its sub-task's internals, which would otherwise be impossible.
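
Items 2 and 3 can be illustrated together. The sketch below packs a small fixed-layout request and attaches the images directory as an open file descriptor. Open descriptors are passed over Unix sockets with SCM_RIGHTS ancillary messages. The request layout (a version word and a flags word) and the names `send_request` and `recv_request` are assumptions made for this example; they are not the CRIU protocol.

```python
import array
import os
import socket
import struct
import tempfile

# Hypothetical request layout: protocol version and a flags word.
REQUEST = struct.Struct("!II")


def send_request(sock, version, flags, images_dir_fd):
    """Send the encoded arguments plus the images directory as an open fd."""
    fds = array.array("i", [images_dir_fd])
    sock.sendmsg([REQUEST.pack(version, flags)],
                 [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_request(sock):
    """Receive the arguments and the descriptor duplicated by the kernel."""
    fds = array.array("i")
    data, ancdata, _flags, _addr = sock.recvmsg(
        REQUEST.size, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any trailing bytes that do not form a whole int.
            usable = len(cdata) - (len(cdata) % fds.itemsize)
            fds.frombytes(cdata[:usable])
    version, flags = REQUEST.unpack(data)
    return version, flags, fds[0]


# socketpair() stands in for the application/service connection.
app_end, service_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
images_dir = tempfile.mkdtemp()
dir_fd = os.open(images_dir, os.O_RDONLY | os.O_DIRECTORY)

send_request(app_end, 1, 0, dir_fd)
version, flags, received_fd = recv_request(service_end)

# The received descriptor has a different number but names the same directory.
sent, got = os.fstat(dir_fd), os.fstat(received_fd)
same_directory = (sent.st_dev, sent.st_ino) == (got.st_dev, got.st_ino)

for fd in (dir_fd, received_fd):
    os.close(fd)
app_end.close()
service_end.close()
os.rmdir(images_dir)
```

Passing a descriptor instead of a path string means the service writes into a directory the application has already been allowed to open, so the service does not have to resolve, or check permissions on, a path supplied by the requestor.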