External bind mounts

From CRIU
Latest revision as of 09:41, 16 May 2022

One of the typical external resources encountered when dumping a container (especially an LXC/Docker one) is a mount point whose root sits outside the container's root. This situation was originally intended to be handled by plugins, but it turned out to be common enough to justify a built-in way of handling it.

What is an external bind mount

Creating one is as simple as

mkdir -p /root/bar
mount --bind /foo /root/bar
chroot /root

That's it. From now on, inside the chroot, /bar is a mountpoint whose root (the source, /foo) is not directly accessible.

If you look at the /proc/$pid/mountinfo file of a task inside the chroot, you would see something like

23 11 8:3 /root / ... - ext4 /dev/sda1 ...
34 23 8:3 /foo /bar ... - ext4 /dev/sda1 ...

Columns 4 and 5 are the root and the mountpoint, respectively. You can see that / is /root on the /dev/sda1 device, and /bar is a mountpoint whose root is /foo on the same device.
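To inspect these fields on a live system, the mount ID, root and mountpoint can be pulled out of a mountinfo stream with awk. This is a minimal sketch; the function name mi_root_mountpoint is made up for illustration. It relies on the kernel escaping whitespace inside paths as octal sequences (such as \040), so splitting on blanks is safe.

```shell
# Hypothetical helper: print mount ID (field 1), root (field 4) and
# mountpoint (field 5) for each line of a mountinfo stream on stdin.
# Paths keep the kernel's octal escapes (e.g. \040 for a space).
mi_root_mountpoint() {
  awk '{ print $1, $4, $5 }'
}
```

For example, mi_root_mountpoint < /proc/$pid/mountinfo lists every mount of that task together with its root.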

How to teach CRIU to dump them

By default CRIU doesn't dump such mountpoints, because it would have no way to restore them: the root of these mounts is outside the scope of what CRIU dumped. In the logs you would see a message like

34:/bar doesn't have a proper root mount

which means that the mountpoint /bar has an inaccessible root.
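CRIU's own check is more involved, but the idea can be approximated. The hypothetical find_unreachable_roots function below reads mountinfo on stdin and reports every non-root mount whose root is neither "/" nor located at or below the root of another mount of the same device, printing it in the same ID:mountpoint form as the log message. It is a simplified sketch, not CRIU's algorithm: it ignores file system types and mount namespaces, so treat its output as a hint only.

```shell
# Hypothetical heuristic (not CRIU code): flag mounts whose root cannot be
# reached through any other mount of the same device listed on stdin.
find_unreachable_roots() {
  awk '
    { id[NR] = $1; dev[NR] = $3; root[NR] = $4; mp[NR] = $5 }
    END {
      for (i = 1; i <= NR; i++) {
        # Skip the root mount itself and mounts showing a whole file system.
        if (mp[i] == "/" || root[i] == "/") continue
        reachable = 0
        for (j = 1; j <= NR; j++) {
          if (j == i || dev[j] != dev[i]) continue
          # Is root[i] equal to, or below, root[j]?
          if (root[j] == "/" || root[i] == root[j] || index(root[i], root[j] "/") == 1) {
            reachable = 1
            break
          }
        }
        if (!reachable) print id[i] ":" mp[i]
      }
    }'
}
```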

To dump and restore such mounts, use the --external mnt[KEY]:VAL option, which sets up a mapping for the roots of external mounts.

On dump, KEY is a mountpoint inside the container, and the corresponding VAL is a string that will be written into the image as the mountpoint's root value.

On restore, KEY is the value from the image (the VAL used on dump), and VAL is the path on the host that will be bind-mounted into the container (at the mountpoint path recorded in the image).
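Because the identifier links the two phases, it can help to keep the mapping in one place. The sketch below assumes a hypothetical plain-text table on stdin with one external mount per line (mountpoint in the container, identifier, host path) and prints the mnt[KEY]:VAL values for either phase. Each printed value is meant to follow its own --external argument. The function name and table layout are illustrative only.

```shell
# Hypothetical mapping table on stdin, one external mount per line:
#   <mountpoint in container> <identifier stored in image> <host path>
# $1 selects the phase ("dump" or "restore"); prints one mnt[KEY]:VAL per line.
ext_mount_args() {
  local phase mp id src
  phase=$1
  while read -r mp id src; do
    [ -z "$mp" ] && continue                    # skip blank lines
    case $phase in
      dump)    printf '%s\n' "mnt[$mp]:$id" ;;   # KEY = mountpoint, VAL = identifier
      restore) printf '%s\n' "mnt[$id]:$src" ;;  # KEY = identifier, VAL = host path
      *) echo "unknown phase: $phase" >&2; return 1 ;;
    esac
  done
}
```

For instance, the line /bar barmount /foo yields mnt[/bar]:barmount for dump and mnt[barmount]:/foo for restore, matching the commands in the example that follows. Passing each value as a quoted argument (--external "$v") keeps the shell from ever treating the square brackets as a glob pattern.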

For example, to dump the task above, we should call

criu dump ... --external mnt[/bar]:barmount

The word barmount is an arbitrary identifier that will be stored in the image file instead of the original root path:

criu show -f mountpoints.img -F mnt_id,root,mountpoint
mnt_id: 0x22 root: barmount mountpoint: /bar
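Note that criu show prints mnt_id in hexadecimal: 0x22 is mount ID 34, the same ID that appears in the log message earlier. A small sketch that converts such a line, assuming the single-line layout shown above (the helper name is hypothetical, and other criu show output layouts are not handled):

```shell
# Hypothetical helper: turn "mnt_id: 0x22 root: barmount mountpoint: /bar"
# into "34 barmount /bar" (decimal mount ID, root, mountpoint).
parse_show_line() {
  printf '%s\n' "$1" | {
    read -r k1 id k2 root k3 mp
    printf '%d %s %s\n' "$id" "$root" "$mp"   # printf %d accepts 0x-prefixed hex
  }
}
```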

On restore, we should tell CRIU where to bind-mount barmount from, like this:

criu restore ... --external mnt[barmount]:/foo

With this, CRIU will bind-mount /foo onto the proper mountpoint (/bar).

Note: mounts that share a superblock should still share a superblock after migration. The --external mnt[smth]:/smth option forces CRIU to bind-mount from the provided source, so mounts that were on the same superblock before dump may end up on different superblocks after restore. This is wrong, so the option should be used carefully (it can break the restore of sharing groups).
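Before choosing mnt[KEY]:VAL mappings, it can be worth checking which mounts currently share a superblock. As a sketch, the hypothetical shared_superblocks function groups mountpoints by field 3 of mountinfo (major:minor, the device number of the mount's superblock) and prints only the devices used by more than one mount:

```shell
# Hypothetical helper: group mountpoints (field 5) by device number (field 3)
# from a mountinfo stream on stdin; print "<major:minor> <mountpoint>..."
# for every device that backs two or more mounts. Output order is not fixed.
shared_superblocks() {
  awk '
    {
      if ($3 in mps) mps[$3] = mps[$3] " " $5
      else mps[$3] = $5
      cnt[$3]++
    }
    END { for (d in cnt) if (cnt[d] > 1) print d, mps[d] }'
}
```

Run against the container mountinfo from the Sharing section, it reports 0:62 with /external_sharing and /internal_sharing: two mounts that should stay on one superblock after restore.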

Auto detection

If one wants CRIU to auto-detect and dump all external bind mounts, and there is no need to change the host mount points on restore, one can use a special syntax:

criu dump ... --external mnt[]:flags

Note that there is nothing inside the square brackets, and the optional :flags argument can contain the following characters:

m
Also enable dumping of external master mounts (as in mount --make-slave)
s
Also enable dumping of external shared mounts (as in mount --make-shared)

By default, neither master nor shared external mounts are dumped (if any are found, the dump is aborted). If no flags are given, the colon can be omitted.
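A tiny sketch that builds the auto-detection value and rejects anything other than the m and s flags described above. The function name is hypothetical, and the check only mirrors this page, not CRIU's own option parser:

```shell
# Hypothetical helper: build the "--external" value for auto-detection.
# $1 holds optional flags: m (external master mounts), s (external shared mounts).
auto_ext_mnt_arg() {
  local flags
  flags=$1
  case $flags in
    *[!ms]*) echo "unsupported flag in '$flags' (only m and s are allowed)" >&2; return 1 ;;
  esac
  if [ -z "$flags" ]; then
    printf '%s\n' 'mnt[]'            # without flags the colon may be omitted
  else
    printf '%s\n' "mnt[]:$flags"
  fi
}
```

For example, criu dump ... --external "$(auto_ext_mnt_arg s)" corresponds to the second example below; quoting the value keeps the square brackets away from shell glob expansion.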

Examples

criu dump ... --external 'mnt[]'

Auto-detect and dump all external bind mounts.

criu dump ... --external 'mnt[]:s'

Auto-detect and dump all external bind mounts, including the shared ones.

criu dump ... --external 'mnt[]:sm'

Auto-detect and dump all external bind mounts, including the shared and the master ones.

Sharing

External bind mounts can have either internal or external sharing. See the following example:

# Preparation
unshare -m --propagation private
mkdir /external_mount_sharing_test
mount -t tmpfs tmpfs /external_mount_sharing_test/
mount --make-private /external_mount_sharing_test/
cd /external_mount_sharing_test
# Source of external mount
mkdir external_mount
mount -t tmpfs tmpfs-external external_mount/
mount --make-shared external_mount/
cat /proc/$$/mountinfo | grep external
# 811 755 0:60 / /external_mount_sharing_test rw,relatime - tmpfs tmpfs rw
# 812 811 0:62 / /external_mount_sharing_test/external_mount rw,relatime shared:290 - tmpfs tmpfs-external rw

# Switch to CT mntns
unshare -m --propagation unchanged sh
mkdir root
mount -t tmpfs tmpfs-root root/
mkdir root/external_sharing root/internal_sharing root/proc

# Create external mount
mount --bind external_mount/ root/external_sharing
mount --bind external_mount/ root/internal_sharing
mount --make-private root/internal_sharing
mount --make-shared root/internal_sharing

# More preparations
mount --bind /proc root/proc
cd root
mkdir bin lib64
SH=$(which sh)
cp $SH bin
cp $(ldd $SH | grep "/lib64" | sed 's/^.*\(\/lib64\S*\)\s.*$/\1/') lib64
CAT=$(which cat)
cp $CAT bin
cp $(ldd $CAT | grep "/lib64" | sed 's/^.*\(\/lib64\S*\)\s.*$/\1/') lib64
PATH=$PATH:/bin
chroot . sh
cat /proc/$$/mountinfo
# 843 841 0:63 / / rw,relatime - tmpfs tmpfs-root rw
# 861 843 0:62 / /external_sharing rw,relatime shared:290 - tmpfs tmpfs-external rw
# 898 843 0:62 / /internal_sharing rw,relatime shared:349 - tmpfs tmpfs-external rw
# 899 843 0:5 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw

Mounts 812 (on the host) and 861 (in the container) are in the same shared peer group (shared:290), which is external sharing, while mount 898 has its own local shared group (shared:349), which is internal sharing. The same applies to master_ids: if we turned these mounts into slaves, an external/internal shared_id would become an external/internal master_id.
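The external/internal distinction can be checked mechanically, since the shared:N peer group IDs are visible from both mount namespaces. The sketch below is a hypothetical helper, not something CRIU provides: it takes the host mountinfo and the container mountinfo as two files and labels each container peer group as external when some host mount belongs to the same group.

```shell
# Hypothetical helper: classify_sharing HOST_MOUNTINFO CT_MOUNTINFO
# Prints "<id> <mountpoint> shared:<N> external|internal" for container mounts.
# Note: the FNR == NR trick assumes the host file is not empty.
classify_sharing() {
  awk '
    # Optional fields sit between field 6 and the "-" separator.
    function groups(prefix,   i, out) {
      out = ""
      for (i = 7; i <= NF && $i != "-"; i++)
        if (index($i, prefix) == 1) out = out " " substr($i, length(prefix) + 1)
      return out
    }
    FNR == NR {                         # first file: host mountinfo
      n = split(groups("shared:"), g, " ")
      for (k = 1; k <= n; k++) host[g[k]] = 1
      next
    }
    {                                   # second file: container mountinfo
      n = split(groups("shared:"), g, " ")
      for (k = 1; k <= n; k++)
        print $1, $5, "shared:" g[k], ((g[k] in host) ? "external" : "internal")
    }' "$1" "$2"
}
```

With the host and container mountinfo from the example, it labels 861 (shared:290) as external and 898 (shared:349) as internal. On a live system the two inputs could be, for instance, the mountinfo of a host shell and /proc/<pid>/mountinfo of a container task.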

Mount-v2 (https://criu.org/Mount-v2) introduces better support for external sharing:

- External sharing is not supported (it is converted to internal sharing after C/R), because reasonable container environments should not allow it for security reasons, and implementing its lookup would hurt performance (it requires reading the host mountinfo).

- External slavery is supported for mountpoint external mounts and for the root mount. It is detected when CRIU can't find the mount's master_id among the shared_ids in the container's mount namespaces. CRIU relies on the mountpoint external source to provide the right shared/slave mount to copy sharing from. Everything else is treated as internal sharing/slavery.
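The detection rule for external slavery can be sketched the same way. Assuming the mountinfo of all container mount namespaces is concatenated on stdin, this hypothetical helper reports each master:N mount and whether N matches some shared:N inside the container (internal) or not (external slavery). It illustrates the rule only and is not CRIU's implementation.

```shell
# Hypothetical helper: for each mount with a master:N optional field, print
# "<id> <mountpoint> master:<N> internal|external" depending on whether
# N matches a shared:N of any mount in the (container) mountinfo on stdin.
find_external_masters() {
  awk '
    {
      id[NR] = $1; mp[NR] = $5
      for (i = 7; i <= NF && $i != "-"; i++) {
        if ($i ~ /^shared:/) shared[substr($i, 8)] = 1
        if ($i ~ /^master:/) master[NR] = substr($i, 8)
      }
    }
    END {
      for (r = 1; r <= NR; r++)
        if (r in master)
          print id[r], mp[r], "master:" master[r], ((master[r] in shared) ? "internal" : "external")
    }'
}
```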

Old days

The same behavior can still be configured with the older --ext-mount-map KEY:VAL option, but this option will soon be deprecated.
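Old invocations translate directly: --ext-mount-map /bar:barmount corresponds to --external mnt[/bar]:barmount, and --ext-mount-map barmount:/foo to --external mnt[barmount]:/foo. A sketch of that rewrite, assuming KEY contains no colon (the helper name is hypothetical):

```shell
# Hypothetical helper: rewrite an old "--ext-mount-map KEY:VAL" value into the
# equivalent "--external mnt[KEY]:VAL" value. Splits on the first colon, so
# this sketch assumes KEY itself contains no ":".
ext_mount_map_to_external() {
  local arg
  arg=$1
  case $arg in
    *:*) ;;
    *) echo "expected KEY:VAL, got '$arg'" >&2; return 1 ;;
  esac
  printf '%s\n' "mnt[${arg%%:*}]:${arg#*:}"
}
```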