Containers In Depth

Today we learn about containers, what are they, where do they come from, how do they work and why would you want to use them?

What is a Container

You’ve probably heard of things like Docker or Podman, these are tools to help you run containers. First you create an ‘image’, this is created from a file, usually a Dockerfile or similar. A running instance of this image is called a container.

A container allows you to run software in a fully ‘containerised’ environment on a host. It has it’s own process id’s, it’s own storage system and can even have it’s own limits to memory and cpu.

Virtual Machines and Hypervisors

Before containers, there were already things that let us run multiple pieces of software separately on the same hardware. These are virtual machines. The hypervisor is a program that runs on the host operating system, that ‘hosts’ virtual machines. In this way, you could run your software on the same underlying hardware, but in a completely separate way. The hypervisor virtualises the underlying hardware, so that to the host vm, it still seems like it’s running on actual hardware.

An example hypervisor is VMware, which you could use to host any kind of vm, perhaps a linux vm.

Why Containers

The origins of Docker and containers go way back, we could have an entire article here. Originally things started by wanting to run binaries from external sources in a safe way. If you give me some software, I don’t just want to run it in my usual way, I want to run it in such a way that if it breaks something or is malicious, then it’s only constrained to a particular process. An example of this is a [Jail]¹ from FreeBSD.

Other benefits soon emerged, for example now we could use containers not just to run other potentially malicious software, but just other normal software. This increases hardware utilisation rates. Instead of needing a vm for every application, you can run multiple applications on the same hardware. The [original paper introducing containers to linux]² has this usecase in mind.

Today, containers are using to rapidly create and scale workloads across machines globally. Kubernetes emerged as a way to manage containers at scale.

Container Comparison

While hypervisors virtualise hardware, containers do not need this. Containers can (and should) be run directly on the metal (Brian Cantrill has a great [talk]³ on this. There is no requirements for extra layers of virtualisation.

Software running without extra layers of virtualisation is generally more efficient and performant, so in the general case, containers are superior.

Interestingly, when you run an EC2 instance on AWS you are actually getting a VM, not a container. This seems counterintuitive due to the potential performance issues, but also sensible when you consider that customers might want stronger guaranteers around isolation. Microsoft recently released [HyperLight]⁴, which enables running singular functions on top of a hypervisors. The performance here is pretty crazy, and it’s an interesting read. One of the reasons why you’d go for a container is that it’s more lightweight, but Microsoft seems to have nearly solved the overhead for spinning up new vms. Enabling users to use VMS not only for workloads, but individual functions like lambdas.

Hyperlight is able to create new VMs in one to two milliseconds.

This space is a kind of undercurrent to application development, so it will be interesting to see how software deployment practices change over the coming years.

How do Containers Work?

Containers are native to linux. If you’re running docker and not on linux, then docker is actually running some kind of VM to virtualise a linux operating system so that it can run your containers.

The linux requirement exists because there are a number of linux system calls that make containers work. You won’t find these in MacOS or Windows.

Chroot

[Chroot]⁵ means “Change Root Directory”. This system call changes the root directory of the calling process.

This effectively gives us a way to start a process in any location, which is a desired attribute of containers. We don’t want every container to have the same starting location. Ideally we want this location to be completely separate from what the rest of the processes can see.

You can escape the chroot jail by chrooting to another directory from within your container.

Namespaces

https://www.man7.org/linux/man-pages/man7/namespaces.7.html

There are global system resources like process ids, networks etc. We want to be able to wrap these so that within the container it appears they have their own isolated instance of this global resource.

https://www.man7.org/linux/man-pages/man7/mount_namespaces.7.html

https://www.man7.org/linux/man-pages/man2/pivot_root.2.html

Cgroups

https://www.man7.org/linux/man-pages/man7/cgroups.7.html

Previously we created ways to isolated the file system and resources, cgroups allow us to place hardware / memory limits on processes.

Extras

System Call Blacklisting
- https://docs.docker.com/engine/security/seccomp/#significant-syscalls-blocked-by-the-default-profile

Thread Pulling

Can I run a Windows Container on Linux, and vice-versa?

Windows containers require the windows kernel. Linux can be run on windows using docker desktop, which provides a linux vm.

How can I run a operating system container different to the host on linux?

Ubuntu etc is just a set of files. The containers share the same kernel, so can only use the same system calls.

Containers built on x86_64 will not run on an arm os.

Refs

https://en.wikipedia.org/wiki/FreeBSD_jail ↩︎
https://lwn.net/Articles/199643/ ↩︎
https://www.youtube.com/watch?v=coFIEH3vXPw “Run containers on bare metal already! - Brian Cantrill” ↩︎
https://opensource.microsoft.com/blog/2024/11/07/introducing-hyperlight-virtual-machine-based-security-for-functions-at-scale ↩︎
https://www.man7.org/linux/man-pages/man2/chroot.2.html ↩︎