
bubblewrap: unprivileged sandboxing tool
Many container runtime tools, such as systemd-nspawn, Docker, etc., focus on providing infrastructure for system administrators and orchestration tools (for example, Kubernetes) to run containers.
These tools are not suitable to give to unprivileged users, because it is trivial to turn such access into a fully privileged root shell on the host.
There is an effort in the Linux kernel called user namespaces, which attempts to allow unprivileged users to use container features. Although significant progress has been made, there are still concerns about it, and it is not available to unprivileged users in several production distributions such as CentOS / Red Hat Enterprise Linux 7, Debian Jessie, etc.
See, for example, CVE-2016-3135, which is a local root vulnerability introduced by userns. A March 2016 post on the topic has more discussion.
Bubblewrap could be viewed as a setuid implementation of a subset of user namespaces. Emphasis on subset: specifically relevant to the above CVE, bubblewrap does not allow control over iptables.
The original bubblewrap code existed before user namespaces; it inherits code from xdg-app helper, which in turn distantly derives from linux-user-chroot.
The maintainers of this tool believe that it does not, even when used in combination with typical software installed on the distribution, allow privilege escalation. It may, however, increase the ability of a logged-in user to perform denial-of-service attacks.
In particular, bubblewrap uses PR_SET_NO_NEW_PRIVS to turn off setuid binaries, which are the traditional way to get out of things like chroots.
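The effect of PR_SET_NO_NEW_PRIVS can be observed from Python through ctypes. This is a minimal, Linux-only sketch and not bubblewrap's own code: the constant values 38 and 39 are taken from <linux/prctl.h>, and the helper name no_new_privs_in_child is hypothetical. The flag is set in a forked child because it cannot be cleared once set.

```python
import ctypes
import os

# Values from <linux/prctl.h>; assumed here, not stated in the text above.
PR_SET_NO_NEW_PRIVS = 38
PR_GET_NO_NEW_PRIVS = 39


def no_new_privs_in_child() -> int:
    """Set no_new_privs in a forked child and return the flag the child reads back.

    Returns 1 if the flag was set, 0 if it was not, 255 if prctl failed.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    zero = ctypes.c_ulong(0)

    pid = os.fork()
    if pid == 0:
        # Child: the flag is irreversible, so it is only ever set here.
        status = 255
        try:
            rc = libc.prctl(PR_SET_NO_NEW_PRIVS, ctypes.c_ulong(1), zero, zero, zero)
            if rc == 0:
                flag = libc.prctl(PR_GET_NO_NEW_PRIVS, zero, zero, zero, zero)
                status = flag if flag in (0, 1) else 255
        finally:
            os._exit(status)

    # Parent: collect the child's exit code, which carries the flag value.
    _, wait_status = os.waitpid(pid, 0)
    if not os.WIFEXITED(wait_status):
        raise RuntimeError("child did not exit normally")
    return os.WEXITSTATUS(wait_status)
```

Once the flag is set, execve() in that process and its descendants no longer grants privileges through setuid or setgid bits or file capabilities, which is why a setuid binary inside the sandbox cannot be used to escape it.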
This program can be shared by all container tools that perform non-root operation, such as:
- Flatpak
- rpm-ostree unprivileged
- bwrap-oci
We would also like to see this be available on Kubernetes / OpenShift clusters. Having the ability for unprivileged users to use container features would make it significantly easier to do interactive debugging scenarios and the like.
Bubblewrap works by creating a new, completely empty mount namespace where the root is on a tmpfs that is invisible from the host and is automatically cleaned up when the last process exits. You can then use command-line options to construct the root filesystem, the process environment, and the command to run in the namespace.
There is a larger demo script in the source code, but here is a trimmed-down version that runs a new shell reusing the host's /usr.
    bwrap --ro-bind /usr /usr --symlink usr/lib64 /lib64 --proc /proc --dev /dev --unshare-pid bash
This is an incomplete example, but useful for purposes of illustration. More often, rather than creating a container using the host's filesystem tree, you want to target a chroot. There, rather than creating the symlink lib64 -> usr/lib64 in the tmpfs, you might have already created it in the target rootfs.
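When the sandbox is launched from a program rather than typed at a shell, the same command can be assembled as an argument list. The following is a minimal sketch; build_bwrap_argv is a hypothetical helper name, and the options are exactly those of the example command, with nothing added.

```python
import shlex
from typing import List


def build_bwrap_argv(command: List[str]) -> List[str]:
    """Return the argv for the trimmed-down example, ending with the command to run."""
    return [
        "bwrap",
        "--ro-bind", "/usr", "/usr",         # expose the host's /usr read-only
        "--symlink", "usr/lib64", "/lib64",  # /lib64 -> usr/lib64 inside the tmpfs root
        "--proc", "/proc",                   # mount a new procfs
        "--dev", "/dev",                     # mount a new devtmpfs
        "--unshare-pid",                     # new PID namespace
        *command,
    ]


if __name__ == "__main__":
    # Print the shell-quoted command instead of running it.
    print(shlex.join(build_bwrap_argv(["bash"])))
```

Passing the resulting list to subprocess.run() would start the sandbox on a host where bwrap is installed. Keeping the arguments as a list avoids shell quoting problems when paths contain spaces.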
The goal of bubblewrap is to run an application in a sandbox, where it has restricted access to parts of the operating system or user data such as the home directory.
Bubblewrap always creates a new mount namespace, and the user can specify exactly which parts of the filesystem should be visible in the sandbox. Any such directories you specify are mounted nodev by default and can be made read-only.
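One way to express "exactly which parts of the filesystem are visible" is to generate the bind-mount options from two path lists. This is a sketch under assumptions: filesystem_args is a hypothetical helper, --ro-bind is the option used in this document's example command, and --bind is taken from bwrap(1) as its read-write counterpart, so check it against your installed version.

```python
from typing import Iterable, List


def filesystem_args(read_only: Iterable[str], read_write: Iterable[str]) -> List[str]:
    """Return bwrap options that expose each path at the same location in the sandbox."""
    args: List[str] = []
    for path in read_only:
        args += ["--ro-bind", path, path]  # visible, but not writable
    for path in read_write:
        args += ["--bind", path, path]     # visible and writable (flag assumed from bwrap(1))
    return args
```

Anything not listed simply does not exist in the sandbox, because the new mount namespace starts out empty.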
In addition, you can use these kernel features:
User namespaces (CLONE_NEWUSER): This hides all but the current uid and gid from the sandbox. You can also change what the value of uid/gid should be in the sandbox.
IPC namespaces (CLONE_NEWIPC): The sandbox will get its own copy of all the different forms of IPC, such as SysV shared memory and semaphores.
PID namespaces (CLONE_NEWPID): The sandbox will not see any processes outside the sandbox. In addition, bubblewrap will run a trivial pid 1 inside your container to handle the requirements of reaping children in the sandbox. This avoids what is now known as the Docker pid 1 problem.
Network namespaces (CLONE_NEWNET; see http://linux.die.net/man/2/clone): The sandbox will not see the network. Instead, it will have its own network namespace with only a loopback device.
UTS namespace (CLONE_NEWUTS): The sandbox will have its own hostname.
Seccomp filters: You can pass in seccomp filters that limit which syscalls can be made in the sandbox. For more information, see Seccomp.
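The features listed above map onto bwrap command-line options. The sketch below assumes the option names documented in bwrap(1): the --unshare-* family, --uid, --gid, and --seccomp, which takes a file descriptor. Only --unshare-pid appears in this document, so treat the rest as assumptions to verify. The helper name namespace_args is hypothetical.

```python
from typing import Iterable, List, Optional

# Feature name -> bwrap option (names assumed from bwrap(1), not from the text above).
UNSHARE_FLAGS = {
    "user": "--unshare-user",  # CLONE_NEWUSER
    "ipc": "--unshare-ipc",    # CLONE_NEWIPC
    "pid": "--unshare-pid",    # CLONE_NEWPID
    "net": "--unshare-net",    # CLONE_NEWNET
    "uts": "--unshare-uts",    # CLONE_NEWUTS
}


def namespace_args(
    features: Iterable[str],
    uid: Optional[int] = None,
    gid: Optional[int] = None,
    seccomp_fd: Optional[int] = None,
) -> List[str]:
    """Translate a list of feature names into bwrap options, in the order given."""
    features = list(features)
    args: List[str] = []
    for name in features:
        if name not in UNSHARE_FLAGS:
            raise ValueError(f"unknown feature: {name}")
        args.append(UNSHARE_FLAGS[name])

    # Remapping uid/gid only makes sense inside a user namespace.
    if (uid is not None or gid is not None) and "user" not in features:
        raise ValueError("uid/gid remapping requires the 'user' feature")
    if uid is not None:
        args += ["--uid", str(uid)]
    if gid is not None:
        args += ["--gid", str(gid)]

    # The seccomp program is handed over as an already-open file descriptor.
    if seccomp_fd is not None:
        args += ["--seccomp", str(seccomp_fd)]
    return args
```

For example, namespace_args(["user", "pid", "net"], uid=1000) yields options for a sandbox with its own user, PID, and network namespaces in which the process runs as uid 1000.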
Firejail is similar to Flatpak before bubblewrap was split out, in that it combines a setuid tool with many desktop-specific sandboxing features. For example, Firejail knows about PulseAudio, whereas bubblewrap does not.
The bubblewrap authors believe it is much easier to audit a small setuid program and keep features such as PulseAudio filtering in an unprivileged process, as is now done in Flatpak.
Also, @cgwalters thinks that trying to whitelist file paths is a bad idea, given the myriad ways users have to manipulate paths and the myriad ways in which system administrators may configure a system. The bubblewrap approach is to retain only a few specific Linux capabilities such as CAP_SYS_ADMIN, but to always access the filesystem as the invoking uid. This entirely closes TOCTTOU attacks and the like.
Sandstorm.io requires unprivileged user namespaces to set up its sandbox, though it could easily be adapted to operate in a setuid mode as well. @cgwalters believes its code is fairly good, but it could still make sense to unify on bubblewrap. However, @kentonv (of Sandstorm) feels that while this makes sense in principle, the switching cost outweighs the practical benefits for now. This decision could be re-evaluated in the future, but it is not being actively pursued today.
runC is currently working on supporting rootless containers, without needing setuid or any other privileges during installation of runC (using unprivileged user namespaces rather than setuid), creation, and management of containers. However, the standard mode of using runC is similar to systemd-nspawn in that it is tooling intended to be invoked by root.
The bubblewrap authors believe that runC and systemd-nspawn are not designed to be made setuid, and are far from supporting such a mode. However, with rootless containers, runC can fulfill certain use cases that bubblewrap supports (with the added benefit of being a standardized and complete OCI runtime).
binctr is just a wrapper for runC, so it inherits all of its design trade-offs.