(This is part of a series of entries)

Because Xen and KVM both support unmodified guests, I’d speculate that in the long run their raw CPU performance will converge on whatever limits hardware-assisted virtualization itself presents. And paravirtualization may continue to reign here, or it may not. The harder issues to think about are disk and network I/O.

I was part of an investigation into how to make resource guarantees for workspaces under even the worst conditions on non-dedicated VMMs (Division of Labor: Tools for Growth and Scalability of Grids). The amount of CPU needed to support the guests’ I/O work (what I like to casually call the “on behalf of” work in the service domain) was pretty high, so we looked at how to measure what guarantees the service domain itself needed in order to make sure the guest guarantees were met. That meant writing code to extrapolate the CPU reservations needed across all domains, including the service domain.
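
To make that concrete, here is a minimal sketch of what such an extrapolation might look like. This is not the code from that work; the calibration constants, guest demands, and function names are all made up. It is just a linear model that converts each guest’s network and disk rates into service-domain CPU and then checks whether the combined reservations still fit on the host.

```python
# Toy model (not the code from the paper): estimate the service-domain CPU
# reservation needed to cover the "on behalf of" I/O work, then check that
# the full set of reservations still fits on the host.

# Hypothetical per-guest demands: (cpu_fraction, net_mbits_per_sec, disk_mb_per_sec)
guests = [
    (0.50, 200, 10),
    (0.25, 400, 5),
    (0.25, 100, 20),
]

# Assumed calibration constants: CPU fraction the service domain burns per
# unit of guest I/O. In practice these would come from benchmarking dom0.
CPU_PER_NET_MBIT = 0.0004
CPU_PER_DISK_MB = 0.002
SERVICE_DOMAIN_BASE = 0.05   # idle/housekeeping overhead

TOTAL_CPUS = 2.0             # host CPU capacity, in whole CPUs

def service_domain_reservation(guests):
    """Extrapolate the CPU the service domain needs for the guests' I/O."""
    io_cpu = sum(net * CPU_PER_NET_MBIT + disk * CPU_PER_DISK_MB
                 for _, net, disk in guests)
    return SERVICE_DOMAIN_BASE + io_cpu

def reservations_fit(guests):
    """True if guest reservations plus the service domain fit on the host."""
    total = sum(cpu for cpu, _, _ in guests) + service_domain_reservation(guests)
    return total <= TOTAL_CPUS, total

ok, total = reservations_fit(guests)
print(f"service domain needs {service_domain_reservation(guests):.3f} CPUs; "
      f"total reservation {total:.3f} of {TOTAL_CPUS} -> "
      f"{'fits' if ok else 'overcommitted'}")
```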

One major source of the extra CPU work is context-switching overhead: the service domain needs to switch in to process pending I/O events (on large SMPs, I’ve heard recommendations to simply dedicate a CPU to the service domain). Also, in networking’s case, the packets are zero-copy but they must still traverse the bridging stack in the service domain.

One important thing to consider for the long run on this issue is that a lot of work is being done to make slices of hardware such as InfiniBand available directly to guest VMs, which will obviate the need for a driver domain to context switch in. See High Performance VMM-Bypass I/O in Virtual Machines.

Container-based, kernel-space solutions offer a way out of a lot of this overhead by being implemented directly in the kernel that is doing the “on behalf of” work. They also take advantage of the resource management code already in the Linux kernel.

They can more effectively schedule resources used by their regular userspace right alongside the VMs, and can more easily know what kernel work should be “charged” to what process (I’m assuming on both counts). These two things could prove useful, avoiding some of the monitoring and juggling needed to do that correctly in a Xen environment (see, e.g., the Division of Labor paper mentioned above and the Xen-related work from HP).
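
As a rough illustration of the kind of per-container accounting I’m alluding to, here is a sketch that reads each container’s cumulative CPU charge via the cpuacct cgroup controller found in later Linux kernels. This is a stand-in for the idea, not what VServer actually did at the time, and the mount point and directory layout are assumptions.

```python
# Rough illustration (assumes a cgroup v1 hierarchy with the cpuacct
# controller mounted at /sys/fs/cgroup/cpuacct, one subdirectory per
# container). Reads each container's cumulative CPU time, which is how a
# container host can "charge" work back to the group that caused it.

import os

CPUACCT_ROOT = "/sys/fs/cgroup/cpuacct"  # assumed mount point

def cpu_usage_seconds(container):
    """Cumulative CPU time (user + system) attributed to one container's tasks."""
    path = os.path.join(CPUACCT_ROOT, container, "cpuacct.usage")
    with open(path) as f:
        return int(f.read()) / 1e9   # the file reports nanoseconds

def charge_report():
    """Map every container directory to the CPU it has been charged."""
    report = {}
    for entry in os.listdir(CPUACCT_ROOT):
        if os.path.isdir(os.path.join(CPUACCT_ROOT, entry)):
            report[entry] = cpu_usage_seconds(entry)
    return report

if __name__ == "__main__":
    for name, secs in sorted(charge_report().items()):
        print(f"{name}: {secs:.2f} CPU-seconds")
```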

There is an interesting paper out of Princeton, Container-based Operating System Virtualization: A Scalable, High-performance Alternative to Hypervisors.

The authors contrast Xen and VServer and present cases where the hard partitioning you find in Xen carries too much overhead for grid and high-performance use cases. Where full fault isolation and OS heterogeneity are not needed, they argue that the CPU overhead of Xen I/O and VM context switches can be avoided.

(The idea presented there of updating the kernel live, as you migrate the VM, is interesting. For jobs that take months (and will therefore miss out on kernel updates to their template images) or services that should not be interrupted, this offers a compelling route for important security updates, though for Linux I’m under the impression that security problems are far more common in userspace.)