Linux containers are a unit of software containing all the libraries and dependencies it needs to run. Containers are built from isolated Linux namespaces and cgroups.
As per the official Docker website, a container is defined as:
A container is a unit of software that packages code and its dependencies so the application runs quickly and reliably across computing environments.
Okay, that makes sense: a Linux container is a unit of software that also contains the libraries and dependencies needed to run it. However, this definition does not really cover questions like how the isolation is achieved or what a container is actually made of.
In this article, we'll answer some of these questions. Before we jump into the internals of containers, it's important to get a high-level idea of some of the related fundamentals. We'll keep this short and still explain the idea behind containers.
As the goal of containerization is to provide isolation between applications, it is often confused with virtualization. Let's see how containerization and virtualization are related.
You must have seen Docker container definition files (Dockerfiles) starting with a FROM clause:
FROM alpine:3.14
RUN apk add --no-cache mysql-client
ENTRYPOINT ["mysql"]
Typically, FROM refers to the base image the container image is built on top of. So, in the above snippet, is this container built on top of an Alpine OS VM? The answer is no! Containers and virtualization are completely different technologies, though they can help achieve similar goals.
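To make the snippet above concrete, here is a minimal sketch of how such an image might be built and run (the mysql-client tag is an assumed name, not from the original snippet):

docker build -t mysql-client .            # build the image from the Dockerfile above
docker run --rm -it mysql-client --help   # extra arguments are appended to the mysql ENTRYPOINT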
Virtualization, or VMs, means real, complete operating systems running either on a host OS using virtualization software or directly on a hypervisor. A VM's OS has its own kernel and all the baggage you would associate with a full operating system.
Containers, on the other hand, do not have their own kernel; instead, they rely on the kernel of the host they're running on. Containers do not need the extra layers a VM would: emulated hardware-level isolation, virtual processors, memory, storage, network interfaces, etc. Containers can still use these resources, but they do so through the host on which they are running.
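A quick way to see this shared-kernel property for yourself (a sketch assuming Docker is installed; the version number shown is illustrative):

uname -r                               # kernel version on the host, e.g. 5.15.0
docker run --rm alpine:3.14 uname -r   # prints the same version: the container shares the host kernel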
So, now that we know containers do not use virtualization to isolate themselves and the processes running inside them, let's try to understand how exactly that isolation is achieved. To do so, we need to understand a few Linux fundamentals which are the building blocks of containerization.
Namespaces are a feature of the Linux kernel which allows us to create isolated environments for processes. These namespaces partition kernel resources such that one set of processes sees one set of resources while another set of processes sees a different set of resources.
The process tree created in a namespace is isolated from other processes and namespaces. Processes inside a namespace do not have access to the outside world. Namespaces can be nested: a parent namespace can see details about its child namespaces, but the reverse is not possible due to the isolation requirements.
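You can inspect which namespaces a process belongs to under /proc; for example:

ls -l /proc/self/ns    # symlinks like pid:[4026531836], one per namespace type
lsns                   # list the namespaces visible to the current user (util-linux)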
Linux offers the unshare command, backed by the unshare() system call, for creating namespaces:
unshare --user --pid --map-root-user --mount-proc --fork bash
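Inside the shell started by the command above, the isolation is easy to observe (a sketch; exact output will vary):

whoami     # root, because --map-root-user maps our user to root in the new user namespace
echo $$    # 1, because bash is PID 1 in the new PID namespace
ps aux     # only processes started inside this namespace are listed, thanks to --mount-proc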
If you've used Docker containers, this is similar to:
docker exec -it <container> /bin/bash
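Strictly speaking, docker exec joins the namespaces of an already-running container; the plain-Linux counterpart of that is nsenter rather than unshare. A rough sketch:

nsenter --target <pid> --user --pid --mount bash   # join the namespaces of an existing process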
Linux cgroups, or control groups, are a way to create groups of processes and limit resources such as CPU, memory, and networking per group. Cgroups complement Linux namespaces: namespaces control what a process can see, while cgroups control how much of the machine's resources it can use.
We can dynamically add processes to control groups. Control groups are not necessarily tied to namespaces, which gives us the flexibility to share a cgroup between multiple namespaces and divide resources in various combinations.
# create a cgroup named "foo" under the memory controller (cgroup v1 layout)
mkdir -p /sys/fs/cgroup/memory/foo
# cap the cgroup's memory usage at 50000 bytes
echo 50000 > /sys/fs/cgroup/memory/foo/memory.limit_in_bytes
# move an existing process into the cgroup
echo <pid> > /sys/fs/cgroup/memory/foo/cgroup.procs
# confirm which cgroup the process now belongs to
ps -o cgroup -p <pid>
Linux kernel namespaces and cgroups are the kernel components that power the isolation capabilities of containers.
As namespaces and cgroups are built-in features of the Linux kernel, we are saved from creating emulated hardware isolation. Also, because there's no hypervisor and no extra OS running below the isolated processes, containers are quite lightweight compared to virtual machines.
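Putting the two together, here is a rough sketch of a hand-rolled "container" (cgroup v1 paths assumed, as in the earlier example; run as root):

mkdir -p /sys/fs/cgroup/memory/mini
echo 100000000 > /sys/fs/cgroup/memory/mini/memory.limit_in_bytes   # ~100 MB limit
echo $$ > /sys/fs/cgroup/memory/mini/cgroup.procs   # put the current shell into the cgroup first
unshare --pid --mount-proc --fork bash              # the new bash inherits the cgroup, so it runs under the limit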
One important detail to notice here: a container image might contain the processes, tools, and dependencies required for that piece of software, but to run a container we are still dependent on the host kernel. Tools like Docker give us a way to create this host environment, along with other container management tooling.
Not really. The terms container and containerization may have become popular when Docker came into the picture around 2013, but the idea behind process-level containerization goes back to the 1970s, when the chroot utility was introduced.
The chroot operation changes the root directory for the currently running process and its children, isolating them from the rest of the host's filesystem. This is said to be the first step towards process-level isolation.
More on chroot: the Linux chroot documentation.
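A minimal chroot demo, assuming a statically linked busybox binary is available on the host (the paths here are hypothetical):

mkdir -p /tmp/rootfs/bin
cp /bin/busybox /tmp/rootfs/bin/           # static binary, so no shared libraries to copy over
sudo chroot /tmp/rootfs /bin/busybox sh    # inside this shell, / is now /tmp/rootfs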
Around the early 2000s, FreeBSD jails were released as part of FreeBSD 4. Jails were built on top of chroot and offered better isolation: though chroot isolated the file system for chrooted processes, other system processes, users, and network resources remained accessible.
Jails expanded the chroot feature by virtualizing access to the file system, the set of users, and the networking subsystem. More fine-grained controls are available for tuning the access of a jailed environment.
Jails have their own users, who are not allowed to access or perform actions outside of the jail environment.
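For a flavor of what this looks like, a hedged sketch of starting a jail on FreeBSD (assumes a populated root filesystem at /tmp/jailroot; the jail name demo is an assumption):

jail -c name=demo path=/tmp/jailroot host.hostname=demo command=/bin/sh   # start a shell in a new jail
jls                                                                       # list running jails from the host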
Around 2006, Google started working on process containers as an independent project, which allowed resource isolation similar to FreeBSD jails while offering better control. The project was later merged into the Linux kernel and renamed cgroups (control groups); it has been part of the Linux kernel since 2008.
LinuX Containers, or LXC, was the most complete implementation of process containers. It was developed in 2008, using cgroups and Linux namespaces to offer isolation for processes. These containers can run on the Linux kernel without any external patches or software.
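For reference, creating and entering an LXC container looks roughly like this (the container name demo and the Alpine release are assumptions):

lxc-create -t download -n demo -- -d alpine -r 3.14 -a amd64   # create a container from an image template
lxc-start -n demo                                              # start it
lxc-attach -n demo                                             # get a shell inside it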
In 2013, Google released the open-source project lmctfy (Let Me Contain That For You), a container manager for Linux; its core concepts were later contributed to Docker's libcontainer project.