Line 1: |
Line 1: |
− | = The problem = | + | = Problem = |
− | A mount namespace is a tree of mount points. In addition, mounts have another type of dependencies which is called groups. Each mount can be a member of two groups, it can be a slave in one group and a member of another group. Currently groups can’t be set, it can be only inherited from a source mount. It is always a problem when you need to restore this sort of properties. This means that we have to find a sequence of steps to get a required step. | + | A mount namespace is a tree of mount points. In addition, mounts have another type of dependency, called groups. Each mount can belong to two groups: it can be a slave in one group and a member of another. Currently a group can’t be set directly; it can only be inherited from a source mount. It is always a problem when more than one type of property has to be restored in one call. This means that we have to find a sequence of steps to reach the required state. |
| In the case of mount namespaces, one more problem is over-mounts. Some mounts may be over-mounted, or processes may hold file descriptors that refer to over-mounted files. | | In the case of mount namespaces, one more problem is over-mounts. Some mounts may be over-mounted, or processes may hold file descriptors that refer to over-mounted files. |
| Another difficulty is that we are not able to create bind-mounts between namespaces, yet each file system has to be mounted from a specified user namespace. | | Another difficulty is that we are not able to create bind-mounts between namespaces, yet each file system has to be mounted from a specified user namespace. |
| | | |
− | = The solution = | + | = Solution = |
− | When we see all these variation of commands to build a mount tree, we can understand that the final picture may be very complicated to be repeated, so we suggest to add a new flag to the mount() syscall, which allows us to add a mount into an existing group. | + | Given all the variations of commands that can build a mount tree, the final picture may be too complicated to reproduce, so we suggest adding [https://patchwork.kernel.org/patch/9703885/ a new flag] to the mount() syscall, which allows us to add a mount to an existing group. |
| | | |
| In this case the restore algorithm will be very simple. | | In this case the restore algorithm will be very simple. |
− | Create a temporary mount which is called “root yard” | + | * Create a temporary mount which is called “root yard” |
− | Create all namespaces (in specified user namespaces) | + | * Create all namespaces (in specified user namespaces) |
− | Add root yards from all namespaces into one shared group, so a mount is created in one mntns, will be propagated into others. | + | * Add root yards from all namespaces into one shared group, so that a mount created in one mntns is propagated into the others. |
− | Create all mounts in separate directories in the root yards. | + | * Create all mounts in separate directories in the root yards. |
− | Restore opened files (nothing is over-mounted at this point) | + | * Restore opened files (nothing is over-mounted at this point) |
− | Build mount trees in namespaces by moving mounts to right places | + | * Build mount trees in namespaces by moving mounts to the right places |
− | Do pivot_root() in all namespaces | + | * Do pivot_root() in all namespaces |
| | | |
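The steps above can be sketched as a small planner. This is only an illustration under stated assumptions, not CRIU's code: the function name `plan_restore`, the `mnt-<id>` staging directory names, the default yard path, and the tuple-based step format are all invented for this example. It computes the order of operations and does not call mount() or pivot_root() itself.

```python
from collections import deque


def plan_restore(mounts, yard="/tmp/root-yard"):
    """Return an ordered list of restore steps for one mount namespace.

    mounts maps a mount id to (parent id or None, mountpoint), where the
    mountpoint is an absolute path inside the namespace. All names here
    are hypothetical and exist only for this sketch.
    """
    children = {}
    root_id = None
    for mnt_id, (parent_id, _mountpoint) in mounts.items():
        if parent_id is None:
            root_id = mnt_id
        else:
            children.setdefault(parent_id, []).append(mnt_id)
    if root_id is None:
        raise ValueError("no root mount in the tree")

    # Every mount is first created in its own directory in the root yard.
    staging = {mnt_id: f"{yard}/mnt-{mnt_id}" for mnt_id in mounts}
    steps = [("create", staging[mnt_id]) for mnt_id in sorted(mounts)]

    # Nothing is over-mounted yet, so opened files can be restored now.
    steps.append(("restore_files",))

    # Move mounts into place, parents before children, so that a mount
    # placed over another one is always moved after the one beneath it.
    root_dir = staging[root_id]
    queue = deque([root_id])
    while queue:
        parent = queue.popleft()
        for child in sorted(children.get(parent, [])):
            steps.append(("move", staging[child], root_dir + mounts[child][1]))
            queue.append(child)

    steps.append(("pivot_root", root_dir))
    return steps
```

For a tree where mount 4 is mounted over mount 3 on `/mnt`, the plan moves mount 3 first and mount 4 second, because mount 4 has mount 3 as its parent.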
| Let’s look at the following example: | | Let’s look at the following example: |
Line 87: |
Line 87: |
| | | |
| [[File:mntns-2.0-tree-3.svg]] | | [[File:mntns-2.0-tree-3.svg]] |
− |
| |
| | | |
| = Restore of unix sockets = | | = Restore of unix sockets = |
Line 93: |
Line 92: |
| Unix sockets can be bound to a file. The problem is that, for unix sockets, an address and a file are not connected to each other. For example, if you move a socket file, ss still shows the original address, and you can’t find the file to which the socket is bound. Another example is that an address may contain a relative path (../socket_name). | | Unix sockets can be bound to a file. The problem is that, for unix sockets, an address and a file are not connected to each other. For example, if you move a socket file, ss still shows the original address, and you can’t find the file to which the socket is bound. Another example is that an address may contain a relative path (../socket_name). |
| | | |
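The disconnect between an address and a file can be observed without any special privileges. The following is a minimal sketch; the function name and the file names `old.sock` and `new.sock` are made up for this example. It binds a socket, renames the socket file, and checks that the socket still reports the old address while connections through the new name succeed.

```python
import os
import socket
import tempfile


def bound_address_after_rename():
    """Bind a unix socket, rename its file, and report what the socket says.

    Returns (address_unchanged, old_path_still_exists).
    """
    with tempfile.TemporaryDirectory() as tmp:
        old_path = os.path.join(tmp, "old.sock")
        new_path = os.path.join(tmp, "new.sock")

        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(old_path)
        server.listen(1)

        # Move the socket file; the socket itself is not told about it.
        os.rename(old_path, new_path)

        # The socket still reports the address it was bound with.
        reported = server.getsockname()

        # connect() looks the socket up through the file, so the new
        # name works even though the reported address is the old one.
        client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        client.connect(new_path)

        client.close()
        server.close()
        return reported == old_path, os.path.exists(old_path)
```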
− | Currently socket_diag shows a device and an inode number for a socket file, but it says nothing about a path to this file and about its mount point. We can introduce a new ioctl, which will return a file descriptor to a socket file. | + | Currently socket_diag shows a device and an inode number for a socket file, but it says nothing about the path to this file or its mount point. We introduced the SIOCUNIXFILE ioctl, which returns a file descriptor to a socket file. |
| In this case to restore a unix socket we have to: | | In this case to restore a unix socket we have to: |
− | create a temporary directory and mount tmpfs into it before restoring sockets | + | * create a temporary directory and mount tmpfs into it before restoring sockets |
− | Restore sockets | + | * Restore sockets |
− | create a socket address directory where is the last part is a symlink to a proper directory on a required mount point | + | * create the directory for the socket address, where the last component is a symlink to the proper directory on the required mount point |
− | call chroot() to the temporary directory | + | * call chroot() to the temporary directory |
− | bind the socket to a specified address | + | * bind the socket to a specified address |
| if we restored a server socket, we can get a file descriptor for its file and use it to restore client sockets by calling connect() for /proc/self/fd/[SK_FILE_FD] | | if we restored a server socket, we can get a file descriptor for its file and use it to restore client sockets by calling connect() for /proc/self/fd/[SK_FILE_FD] |
| umount tmpfs from the temporary directory and remove the directory after restoring all sockets | | umount tmpfs from the temporary directory and remove the directory after restoring all sockets |
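The client step, connecting through /proc/self/fd/[SK_FILE_FD], can be illustrated as follows. This sketch assumes Linux with /proc mounted. It does not call the SIOCUNIXFILE ioctl; instead it opens the socket file with `O_PATH` as a stand-in for the descriptor that the ioctl would return. The function name and the `server.sock` file name are hypothetical.

```python
import os
import socket
import tempfile


def connect_via_fd():
    """Connect a client to a server socket through /proc/self/fd/<fd>."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "server.sock")

        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(path)
        server.listen(1)

        # Stand-in for the descriptor SIOCUNIXFILE would return:
        # an O_PATH descriptor that refers to the socket file itself.
        sk_file_fd = os.open(path, os.O_PATH)
        try:
            client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            # The path lookup follows the /proc link to the socket file,
            # so the client does not need to know the original path.
            client.connect(f"/proc/self/fd/{sk_file_fd}")

            conn, _addr = server.accept()
            client.sendall(b"ping")
            data = conn.recv(4)

            conn.close()
            client.close()
        finally:
            os.close(sk_file_fd)
            server.close()
        return data
```

The point of going through a descriptor is that it keeps working regardless of how the path to the socket file looks from the current mount namespace or root directory.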
| + | |
| + | = Source code = |
| + | * [https://github.com/avagin/criu/tree/mntns-2.0 github.com/avagin/criu/tree/mntns-2.0] |
| + | * [https://lkml.org/lkml/2017/5/9/634 [PATCH] fs: add an ioctl to get an owning userns for a superblock] |
| + | |
| + | [[Category: Under the hood]] |