Nested namespaces

Revision as of 13:54, 20 April 2017 by Avagin

Currently supported

We have experimental support for nested user and net namespaces (in the criu-dev branch); support for nested pid namespaces is currently under development.

Pid namespace

Status

The latest version of the patchset is "[PATCH v3 00/55] Nested pid namespaces support": https://lists.openvz.org/pipermail/criu/2017-April/036952.html It makes the criu checkpoint/restore code work with multi-level pids and introduces new helper processes. The helpers are used to restore processes with the right pids across the whole pid namespace hierarchy, because a process can set ns_last_pid only in its currently active pid namespace and cannot write to the parent's (see the kernel's pid_ns_ctl_handler() for details).
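
The ns_last_pid mechanism can be sketched as follows. This is a minimal illustration, not the criu code: the function name request_next_pid and its path parameter are assumptions made for the example. Writing N to /proc/sys/kernel/ns_last_pid asks the kernel to try N+1 for the next process created in the writer's own pid namespace, which is why a helper has to run at each namespace level where a specific pid is needed.

```python
# Default location of the per-namespace "last allocated pid" knob.
NS_LAST_PID = "/proc/sys/kernel/ns_last_pid"


def request_next_pid(target_pid: int, path: str = NS_LAST_PID) -> None:
    """Ask the kernel to give target_pid to the next created process.

    The kernel starts its search at last_pid + 1, so target_pid - 1 is
    written. The write only affects the pid namespace of the caller.
    `path` is a parameter only so the sketch can be tried on a plain file.
    """
    if target_pid < 1:
        raise ValueError("pid must be positive")
    with open(path, "w") as f:
        f.write(str(target_pid - 1))
```

Writing to the real file requires privileges in the namespace, and the request is only a hint: if another process is created in the same namespace first, or the pid is already taken, the next child gets a different pid. A real restorer would therefore serialize the write with the following fork() or clone() and check the resulting pid.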

The patchset has limitations. It does not support restoring sid and pgid when a nested pid namespace is present, because in the general case an unmodified process tree is not restorable. New helpers need to be introduced, as is already done for the one-level pid namespace case, and this functionality will be implemented in a separate patchset later; Pavel Tikhomirov is currently working on this problem. There is also no sorting of children, so a pid namespace's child reaper may come after another process of that pid namespace in the parent's pstree_item::children list. This will also be addressed in Pavel's patchset.

There is also no support for zombies in a child pid namespace; this functionality will be implemented in v4 of the series.

Implementation

The patchset teaches criu primitives to work with dynamically allocated pids, sids, pgids and thread ids, and allows them to have up to 32 nesting levels (the current maximum in the Linux kernel). Pids (etc.) are linked into every ns_id::pid::rb_root they belong to, instead of through a single rb_node in the global one-level pid_root_rb, which makes it possible to find a free pid across the whole pid namespace hierarchy. Before the patchset, pids (etc.) were taken from the parasite; now, if there are several pid namespaces, they are read from the NSpid (etc.) fields in /proc/[pid]/status.
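
To illustrate the per-level ids exposed by /proc/[pid]/status, here is a minimal sketch of parsing the NSpid, NStgid, NSpgid and NSsid fields. It is not the criu parser; the function names parse_ns_ids and read_ns_ids are assumptions made for the example. Each field lists one value per pid namespace the task is a member of, starting with the namespace of the procfs mount and ending with the innermost one.

```python
from typing import Dict, List

# Fields of /proc/[pid]/status that carry one id per pid namespace level.
NS_FIELDS = ("NSpid", "NStgid", "NSpgid", "NSsid")


def parse_ns_ids(status_text: str) -> Dict[str, List[int]]:
    """Extract the per-namespace id lists from the text of a status file."""
    result: Dict[str, List[int]] = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in NS_FIELDS:
            # Values are whitespace-separated, outermost namespace first.
            result[key] = [int(v) for v in value.split()]
    return result


def read_ns_ids(pid: int) -> Dict[str, List[int]]:
    """Read and parse /proc/<pid>/status for a live task (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_ns_ids(f.read())
```

For a task two levels below the procfs mount's namespace, a line such as "NSpid: 4321 17 1" yields [4321, 17, 1]; the length of the list minus one gives the nesting depth relative to the reader.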

The patchset also unifies pstree_item::ids. The code currently contains many checks of the form "if (item->ids->has_xxx_ns_id)", which are noisy and scattered throughout. The patchset determines and populates the missing ids for tasks in a single place, at the image reading stage. After the patchset, all of these noisy checks can be deleted, making the code simpler and cleaner. The patchset itself needs this in order to know which pid_ns a task belongs to.
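
The idea of filling in ids once, instead of checking for their presence everywhere, can be sketched as below. This is only an illustration under stated assumptions: the name fill_missing_ids, the set of id kinds in NS_KINDS, and the rule of inheriting an absent id from the parent task are all hypothetical and not taken from the patchset.

```python
from typing import Dict, Optional

# Hypothetical id kinds for the example; the real set is defined by criu images.
NS_KINDS = ("pid_ns_id", "net_ns_id", "user_ns_id")


def fill_missing_ids(ids: Dict[str, Optional[int]],
                     parent_ids: Dict[str, int]) -> Dict[str, int]:
    """Return a complete id set for a task.

    Any id that is absent (or None) in `ids` is taken from `parent_ids`
    (an assumed policy), so later code never needs a has_xxx_ns_id check.
    """
    return {
        kind: ids[kind] if ids.get(kind) is not None else parent_ids[kind]
        for kind in NS_KINDS
    }
```

Once every task passes through such a step while images are read, later code can access the id directly rather than guarding each use.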

Known issues

  • Need to restore pid namespaces in specified user namespaces
  • Sessions for processes in sub-namespaces are restored incorrectly
  • Unable to handle the case where two processes from one pid namespace have a parent from another pid namespace.