CGroups

From CRIU

Latest revision as of 22:48, 5 February 2020

This page describes how CRIU manages CGroups.

Overview

When talking about checkpoint/restore (C/R) of CGroups info, we mean three things:

  1. The groups tasks live in
  2. The groups that exist and are visible to tasks
  3. Mountpoints of "cgroup" file system

CRIU has supported this info since version 1.3-rc1. Here's how it works.

CGroups tasks live in

CRIU defines a "set" of cgroups. A set is a per-controller list of paths where a task lives. If the paths to groups for two tasks differ for at least one controller, the tasks are considered to live in different sets.
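
To make the notion of a set concrete, here is a minimal sketch, not CRIU's actual code. It assumes the per-task paths come from text in the /proc/<pid>/cgroup format (one "hierarchy-ID:controller-list:cgroup-path" entry per line); the names parse_cgroup_file and SetRegistry are hypothetical, and the numbering of IDs is an arbitrary choice made for illustration.

```python
def parse_cgroup_file(text):
    """Map each controller list to its cgroup path for one task."""
    paths = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line is "hierarchy-ID:controller-list:cgroup-path".
        _hier_id, controllers, path = line.split(":", 2)
        paths[controllers] = path
    return paths


class SetRegistry:
    """Hands out one ID per distinct per-controller list of paths."""

    def __init__(self):
        self._ids = {}

    def set_id(self, paths):
        # Two tasks share a set only if every controller's path matches.
        key = tuple(sorted(paths.items()))
        if key not in self._ids:
            self._ids[key] = len(self._ids) + 1
        return self._ids[key]


registry = SetRegistry()
task_a = parse_cgroup_file("2:cpu,cpuacct:/ct1\n1:memory:/ct1\n")
task_b = parse_cgroup_file("2:cpu,cpuacct:/ct1\n1:memory:/ct1\n")
task_c = parse_cgroup_file("2:cpu,cpuacct:/ct1\n1:memory:/ct1/web\n")

# task_a and task_b share a set; task_c differs in the memory controller.
print(registry.set_id(task_a), registry.set_id(task_b), registry.set_id(task_c))
```

In this sketch the third task gets a new ID because its memory path differs, even though its cpu,cpuacct path is the same as the others'.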

For every set CRIU generates an ID, which is then stored in the tc.cg_set field of the task's core image. CRIU also generates a set for the cgroups it lives in itself during dump and saves it in the inventory image. The set in which the root task lives is special as well: every other set (except CRIU's own) is checked to contain only subdirectories of the respective paths in the root task's set. Otherwise the dump fails.
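
The subdirectory check described above could look roughly like the following sketch. It is an illustration under assumptions, not CRIU's implementation: sets are modelled as plain dicts from controller name to path, the helper names is_subdir and check_set are hypothetical, and a path equal to the root task's path is treated as acceptable.

```python
import posixpath


def is_subdir(path, root):
    """True if path equals root or lies somewhere below it."""
    path = posixpath.normpath(path)
    root = posixpath.normpath(root)
    if root == "/":
        return path.startswith("/")
    # Compare on a "/" boundary so "/ct10" is not taken for a child of "/ct1".
    return path == root or path.startswith(root + "/")


def check_set(root_set, other_set):
    """Return the controllers whose path escapes the root task's path."""
    bad = []
    for controller, path in other_set.items():
        root = root_set.get(controller)
        if root is None or not is_subdir(path, root):
            bad.append(controller)
    return sorted(bad)


root_set = {"memory": "/ct1", "cpu,cpuacct": "/ct1"}
inside = {"memory": "/ct1/web", "cpu,cpuacct": "/ct1"}
outside = {"memory": "/ct10", "cpu,cpuacct": "/ct1"}

print(check_set(root_set, inside))
print(check_set(root_set, outside))
```

A non-empty result corresponds to the case in which the dump would fail.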

On restore, each task is moved into its respective set. If a task's set coincides with CRIU's own, the task is not moved anywhere and remains in whatever cgroups CRIU restore was started in.

CGroups that are visible to tasks

Besides the CGroups collected with tasks, there can be other groups in which no tasks live. To pick those up, CRIU takes the root set and saves the whole CGroups tree starting from it. This information is stored in the cgroup.controllers image. In the same place CRIU saves the properties of CGroups (i.e. values read from CGroup configuration files). Note that since CRIU starts from the root set and scans the directory tree, all the paths in this section are also subdirectories of the root set's paths.

To make CRIU handle this information on dump and restore, specify the --manage-cgroups option.

Dumping more cgroups than are visible

In some cases, it can be useful to dump a specific cgroup subtree, regardless of what cgroups the container's tasks are in. For example, systemd-based containers like Ubuntu 16.04 will put all of their tasks in one of /init.scope, /system.slice/..., or /user.slice/.... By default, then, CRIU's cgroup engine will not dump the root of the cgroup tree /. The problem is that systemd opens / as a directory FD and changes the permissions on it, resulting in errors like

(00.361723) 1: Error (criu/files-reg.c:1487): File sys/fs/cgroup/systemd has bad mode 040755 (expect 040775)

The solution is for the container engine to tell CRIU the root of the tree to start dumping at, via --cgroup-root on dump, so that these permissions are preserved when checkpointing the cgroup tree.

Mountpoints of "cgroup" file system

If a cgroup mount is found in the list of mounts, CRIU dumps it, but only the "root" mount will work. If you bind-mounted some subgroups into the container, the CRIU dump fails.

Restoring into different CGroups

The option syntax is --cgroup-root [controller:]/path. Without this option, CRIU restores tasks and groups that live in the subtrees starting from the root task's directories. When this option is given, the respective controllers are restored under the given paths instead.
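
As an illustration of the [controller:]/path syntax, the sketch below splits such arguments and works out a new root per controller. The helpers parse_cgroup_root and new_roots are hypothetical, and the code assumes that a path given without a controller acts as the default for every controller not named explicitly; check the CRIU manual before relying on that behaviour.

```python
def parse_cgroup_root(arg):
    """Split "[controller:]/path" into (controller or None, path)."""
    if arg.startswith("/"):
        return None, arg
    controller, sep, path = arg.partition(":")
    if not sep or not controller or not path.startswith("/"):
        raise ValueError(f"expected [controller:]/path, got {arg!r}")
    return controller, path


def new_roots(args, controllers):
    """Pick the new root path for each controller.

    Assumption: a bare path is the default for controllers that have no
    entry of their own. None means "keep the root recorded in the image".
    """
    default = None
    per_controller = {}
    for arg in args:
        controller, path = parse_cgroup_root(arg)
        if controller is None:
            default = path
        else:
            per_controller[controller] = path
    return {c: per_controller.get(c, default) for c in controllers}


print(new_roots(["/ct", "memory:/ct/mem"], ["cpu", "memory"]))
```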

CGroups restoring strategy

When restoring cgroups, CRIU may encounter cgroup controllers that already exist, so it relies on the user to choose how to behave in that case: should it overwrite existing properties with values from the image, or should it ignore them? Or is it unacceptable to modify any existing cgroup at all?

To settle this, CRIU supports named restore modes, which are specified as the argument of the --manage-cgroups=mode option. The mode argument may be one of the following:

  • none. Do not restore cgroup properties, but require the cgroup to pre-exist at the moment of restore.
  • props. Restore cgroup properties and require the cgroup to pre-exist.
  • soft. Restore cgroup properties only if the cgroup has been created by CRIU; otherwise do not restore properties.
  • full. Always restore all cgroups and their properties.
  • strict. Restore all cgroups and their properties from scratch, requiring them not to be present in the system.
  • ignore. Don't deal with cgroups and pretend that they don't exist.

By default, soft is used if the --manage-cgroups option is passed without an argument (i.e. it is the same as --manage-cgroups=soft).
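
The modes listed above can be read as a small decision table. The following sketch encodes one reading of that list; it is not taken from CRIU's source. The Plan type and the plan_restore function are hypothetical, and "exists" means the cgroup is already present when restore reaches it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    create: bool          # create the cgroup directory
    restore_props: bool   # write property values from the image


def plan_restore(mode, exists):
    """Decide what to do with one cgroup under the given restore mode."""
    if mode == "ignore":
        return Plan(create=False, restore_props=False)
    if mode in ("none", "props"):
        # Both modes require the cgroup to be there already.
        if not exists:
            raise FileNotFoundError(f"mode {mode!r} requires an existing cgroup")
        return Plan(create=False, restore_props=(mode == "props"))
    if mode == "soft":
        # Properties are restored only for cgroups created during restore.
        return Plan(create=not exists, restore_props=not exists)
    if mode == "full":
        return Plan(create=not exists, restore_props=True)
    if mode == "strict":
        if exists:
            raise FileExistsError("mode 'strict' requires the cgroup to be absent")
        return Plan(create=True, restore_props=True)
    raise ValueError(f"unknown mode {mode!r}")


for mode in ("soft", "full"):
    print(mode, plan_restore(mode, exists=True), plan_restore(mode, exists=False))
```

The difference between soft and full shows up only for cgroups that already exist: soft leaves their properties alone, while full overwrites them.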

External CGroup yard

The option syntax is --cgroup-yard path.

Instead of having CRIU try to mount cgroups itself, provide a path to a directory containing an already created cgroup yard. This is useful if you don't want to grant CAP_SYS_ADMIN to CRIU. For every cgroup mount there should be exactly one directory. If there is only one controller in the mount, the directory's name should be just the name of the controller. If multiple controllers are co-mounted, the directory name should be a comma-separated list of the controllers.
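
The naming rule for yard directories can be sketched as follows. This only creates empty directories with the expected names; presumably each directory would then need the matching cgroup hierarchy mounted on it, and that step is left out here. The helper names and the example controller lists are hypothetical, and the order of controllers within a comma-separated name is assumed to follow the order given by the caller.

```python
import os
import tempfile


def yard_dir_name(controllers):
    """One controller gives its own name; co-mounted ones are comma-joined."""
    return ",".join(controllers)


def make_yard_layout(yard, mounts):
    """Create exactly one directory per cgroup mount under the yard path."""
    paths = []
    for controllers in mounts:
        path = os.path.join(yard, yard_dir_name(controllers))
        os.makedirs(path, exist_ok=True)
        paths.append(path)
    return paths


# Example layout: memory mounted alone, cpu and cpuacct co-mounted.
mounts = [["memory"], ["cpu", "cpuacct"]]
with tempfile.TemporaryDirectory() as yard:
    make_yard_layout(yard, mounts)
    created = sorted(os.listdir(yard))
    print(created)
```

The resulting directory would then be the path handed to --cgroup-yard.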