Security Enhanced Linux
SELinux protects the file system and the host from attacks that originate inside a container.
The initial SELinux policy for containers was written for a tool called virt-sandbox, which used libvirt (specifically libvirt-lxc) to launch containers.
This first type was called svirt_lxc_t and it is not allowed to have network access.
The successor of svirt_lxc_t is called svirt_lxc_net_t and allows full network access.
The type for content that the
svirt_lxc types could manage is named
This SELinux policy was later adopted by Docker and the aliases
container_file_t were created.
The container policy is defined in the container-selinux package.
By default, containers run with the SELinux type container_t, regardless of which container engine launched them (e.g. podman, cri-o, docker, buildah, moby).
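To see which type a running process actually has, its context can be read from procfs and split into fields. The following is a minimal sketch, assuming an SELinux-enabled Linux host where /proc/<pid>/attr/current is readable; the names SELinuxContext, parse_context and read_process_context are hypothetical helpers, not part of CRIU or any container engine.

```python
from typing import NamedTuple, Union


class SELinuxContext(NamedTuple):
    """The fields of an SELinux context: user:role:type[:level]."""
    user: str
    role: str
    setype: str
    level: str  # MLS/MCS part, e.g. "s0:c1,c2"; empty if absent


def parse_context(raw: str) -> SELinuxContext:
    """Split a context string into its fields.

    The level itself may contain colons, so only the first three
    separators are used for splitting.
    """
    text = raw.strip("\x00\n ")
    parts = text.split(":", 3)
    if len(parts) < 3:
        raise ValueError(f"not an SELinux context: {raw!r}")
    level = parts[3] if len(parts) == 4 else ""
    return SELinuxContext(parts[0], parts[1], parts[2], level)


def read_process_context(pid: Union[int, str] = "self") -> SELinuxContext:
    """Read the context of a process from procfs (hypothetical helper)."""
    with open(f"/proc/{pid}/attr/current", "rb") as f:
        return parse_context(f.read().decode())
```

For a process inside a container, read_process_context(pid).setype would be expected to be container_t on a host using the default policy described above.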
SELinux only allows a
container_t to read/write/execute files labeled
The Docker daemon and Podman are usually running as
container_runtime_t, and the default label for content in
Using the correct SELinux label for the parasite socket
On a system with SELinux enabled, the socket used for communication between the parasite daemon and the main CRIU process needs to be correctly labeled.
In the case of Podman, CRIU is started from runc and it is running as
The parasite code will be running with the same context as the container process (
CRIU interacts with the parasite code via a Unix socket, but allowing a container process to connect via a socket to the outside of the container is not desirable. Thus, CRIU first obtains the context of the root container process and then tells SELinux to label the newly created socket with the same label as the root container process.
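The labeling step can be sketched as follows. This is an illustration of the general mechanism, not CRIU's actual code: it assumes the kernel interface /proc/thread-self/attr/sockcreate, which is what libselinux's setsockcreatecon() writes to, and the function names are hypothetical. The writer is passed in as a parameter so that the ordering can be exercised on hosts without SELinux.

```python
import os
import socket
from typing import Callable, Optional

SOCKCREATE_PATH = "/proc/thread-self/attr/sockcreate"


def read_process_label(pid: int) -> str:
    """Return the SELinux context of a process, e.g. the container's root process."""
    with open(f"/proc/{pid}/attr/current", "rb") as f:
        return f.read().rstrip(b"\x00\n").decode()


def write_sockcreate(label: Optional[str]) -> None:
    """Set (or, with None, reset) the label applied to sockets created next.

    Assumption: a zero-length write resets to the default, mirroring
    what setsockcreatecon(NULL) does in libselinux.
    """
    data = b"" if label is None else label.encode() + b"\x00"
    fd = os.open(SOCKCREATE_PATH, os.O_WRONLY)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)


def create_labeled_socket(
    label: str,
    set_label: Callable[[Optional[str]], None] = write_sockcreate,
) -> socket.socket:
    """Create a Unix socket that carries the given SELinux label."""
    set_label(label)
    try:
        return socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    finally:
        # Always reset, so that unrelated sockets keep the default label.
        set_label(None)
```

A caller would typically combine the two steps as create_labeled_socket(read_process_label(root_pid)), where root_pid is the PID of the container's root process.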
For this to work, the correct SELinux policies must be installed. On Fedora-based systems these are part of the container-selinux package.
Note that the current implementation assumes that all processes to be checkpointed by CRIU are labeled with the same SELinux context, which is the default behaviour for most container engines.
If a child process has a different label, additional SELinux policies might be required.
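Whether this assumption holds for a given process tree can be checked before checkpointing. The sketch below is only an illustration with hypothetical helper names; it assumes the PIDs of the tree are already known and that the labels are readable from /proc/<pid>/attr/current.

```python
from typing import Dict, Iterable


def read_labels(pids: Iterable[int]) -> Dict[int, str]:
    """Read the SELinux context of every listed process from procfs."""
    labels = {}
    for pid in pids:
        with open(f"/proc/{pid}/attr/current", "rb") as f:
            labels[pid] = f.read().rstrip(b"\x00\n").decode()
    return labels


def find_label_mismatches(root_pid: int, labels: Dict[int, str]) -> Dict[int, str]:
    """Return the processes whose label differs from the root process label.

    An empty result means the single-context assumption holds.
    """
    root_label = labels[root_pid]
    return {pid: label for pid, label in labels.items() if label != root_label}
```

Any PID reported by find_label_mismatches points at a process for which additional policy rules may be needed.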
Checkpoint and restore any SELinux process label
For a successful container checkpoint and restore on an SELinux-enabled host, the restored container must have the same process context as before checkpointing.
During dump, CRIU stores the label of each process so that it can be restored. For processes started from the command line, which usually run in the
unconfined_t domain, this just works. For containers,
an additional policy is needed, which is provided by the latest container-selinux package. This policy allows CRIU (when running as
container_runtime_t) to transition the restored process to
Restoring a process that is running under systemd's control (unconfined_service_t) without additional policies is likely to fail, because CRIU will not be allowed to change the context of the restored process.
For every other checkpoint/restore use case on SELinux-enabled systems (that is, anything besides container processes and command-line/shell processes), a dyntransition permission must be granted between the old and new security contexts.
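As a rough aid for writing such a policy, the rule that grants the dyntransition permission can be derived from the two contexts. The sketch below uses hypothetical helper names and only produces the basic allow rule in SELinux policy syntax; it assumes the old context is the one CRIU runs in during restore and the new context is the stored label of the process. Real policies may need more than this single rule, for example other process permissions such as setcurrent, or adjustments for MLS/MCS constraints.

```python
from typing import Optional


def selinux_type(context: str) -> str:
    """Return the type field of a user:role:type[:level] context."""
    fields = context.strip("\x00\n ").split(":")
    if len(fields) < 3:
        raise ValueError(f"not an SELinux context: {context!r}")
    return fields[2]


def dyntransition_rule(old_context: str, new_context: str) -> Optional[str]:
    """Build the allow rule for a dynamic transition between two contexts.

    Returns None when the contexts are identical, because no context
    change (and therefore no transition) takes place in that case.
    """
    if old_context == new_context:
        return None
    source = selinux_type(old_context)
    target = selinux_type(new_context)
    return f"allow {source} {target}:process dyntransition;"
```

For the systemd case mentioned above, the second argument would be a context with the type unconfined_service_t.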
Restoring a multi-threaded process with SELinux
SELinux does not always support changing the process context of a multi-threaded process. The context change of a running multi-threaded process is allowed only if the new security context is bounded by the old security context.
To be able to restore a process without the need to have the new security context bounded by the old security context, CRIU sets the SELinux process context before creating the threads. Thus, all threads are created with the process context of the main process.
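The ordering described above can be illustrated with a small sketch. It is not CRIU's implementation (CRIU is written in C): it only shows the principle of changing the context while the process is still single-threaded and creating the threads afterwards. The function names are hypothetical, and the procfs path /proc/thread-self/attr/current is assumed to be the interface used for changing the current context.

```python
import threading
from typing import Callable, Sequence


def write_current_context(context: str) -> None:
    """Change the SELinux context of the calling task via procfs."""
    with open("/proc/thread-self/attr/current", "wb", buffering=0) as f:
        f.write(context.encode() + b"\x00")


def restore_with_threads(
    context: str,
    thread_bodies: Sequence[Callable[[], None]],
    set_context: Callable[[str], None] = write_current_context,
) -> None:
    """Set the process context first, then create the threads.

    The context change happens while only one thread exists, so the
    bounded-transition restriction for multi-threaded processes does
    not apply. Threads created afterwards start with the new context.
    """
    set_context(context)
    threads = [threading.Thread(target=body) for body in thread_bodies]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

Passing a different set_context callable makes it possible to exercise the ordering on a host without SELinux.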