Many times I come across the question of HyperThreading and its benefits, either in personal computing or, more importantly over the last few years, in virtualization. I’d like to talk about what HyperThreading is for a moment, and show you whether it benefits the virtualized environment.
What is HyperThreading?
Today, HyperThreading (HT) technology is present on almost every Intel processor, be it the Xeon or Core i3/i5/i7 series. Basically, it splits one physical core into two logical cores, but the term “splitting” is somewhat inaccurate and confuses many consumers into thinking that when they run a 2.5GHz 4-core HyperThreaded CPU, they immediately have 8 effective cores carrying the full processing capability of 20GHz. That’s mainly because when you say you split something, you expect it to be divided into two equal parts (or at least that’s what I think, anyway).
The better term in this case is “enhancing”: the already present physical core gains one logical unit so that two threads can be run in one clock cycle. See the picture below:
How it actually works
Every application or service that runs on top of an OS is composed of threads. These can basically be viewed as workers that pull together to create a useful function that gets delivered to your desktop (or shell, if you’d like). Let’s take a look at how these threads appear on Windows 7 using a very smart utility called Process Explorer. I have used the System process, the Windows Kernel, to show you what is running under it:
You can see the Thread ID each thread has, its CPU usage, and the CSwitch Delta, which basically indicates how busy the CPU is processing a certain thread. The Start Address field is the function of an application module in memory that is being processed. At the time, the Windows Kernel alone, responsible for keeping my system running, consisted of 154 threads, and I had around 1,600 threads running in total.
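You don’t need Process Explorer to see that a process is just a collection of threads, each with its own ID. A minimal sketch in Python (the thread names and the trivial workers here are my own invention, purely for illustration):

```python
import threading

def worker(stop_event):
    # Simulated background worker: blocks until told to stop.
    stop_event.wait()

# Spawn a few worker threads alongside the main thread.
stop = threading.Event()
workers = [threading.Thread(target=worker, args=(stop,), name=f"worker-{i}")
           for i in range(3)]
for t in workers:
    t.start()

# Every thread in this process gets its own identifier, much like the
# Thread IDs column in Process Explorer.
for t in threading.enumerate():
    print(t.name, t.ident)

stop.set()
for t in workers:
    t.join()
```

Running it lists four threads: the main thread plus the three workers, each with a distinct ID assigned by the OS.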
Threads are handled in a serial fashion by each physical CPU: one thread gets its share of run-time, executing up to four instructions per clock cycle, and then another follows. Now, the more physical CPUs you have, the better you are at running tasks in parallel. Each CPU has its own pipeline that a thread can be scheduled into and subsequently run in, so less contention occurs overall. That, of course, also depends on how well the application scales with an increasing number of physical cores.
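A toy model makes the serial-vs-parallel point concrete. This is a simplified greedy scheduler of my own (real OS schedulers preempt and time-slice; this sketch just runs each independent thread to completion on whichever core frees up first):

```python
import heapq

def makespan(thread_times, num_cores):
    """Total time to finish all threads when each runs to completion
    on the first core that becomes free (longest threads first)."""
    cores = [0.0] * num_cores  # time at which each core becomes free
    heapq.heapify(cores)
    for t in sorted(thread_times, reverse=True):
        earliest = heapq.heappop(cores)
        heapq.heappush(cores, earliest + t)
    return max(cores)

# Eight equal threads of one time unit each:
print(makespan([1] * 8, 1))  # one core: 8 units, purely serial
print(makespan([1] * 8, 4))  # four physical cores: 2 units
```

With one core everything is serialized; with four cores the same work finishes in a quarter of the time, provided the threads are truly independent, which is exactly the scaling caveat above.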
HyperThreading is different. Simply put, HT enhances its “parent” physical CPU’s pipeline by adding a logical core, so two threads can be run in one clock cycle in parallel, minimizing the physical core’s idle time. This means that mostly highly parallelized applications gain the maximum benefit from this: more stuff can be done in less time! The image below explains it nicely:
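A back-of-envelope way to see why HT helps but never doubles throughput (the numbers below are hypothetical, purely to show the shape of the gain): if one thread keeps a core’s execution slots only partially full because of stalls, the sibling hyperthread can fill some of the leftover slots.

```python
def ht_throughput(util_one_thread, fill_efficiency):
    """Toy model: one thread uses util_one_thread of the core's issue
    slots; the hyperthread sibling fills a fraction (fill_efficiency)
    of the remaining idle slots. Total utilization is capped at 1.0."""
    idle = 1.0 - util_one_thread
    return min(1.0, util_one_thread + idle * fill_efficiency)

# A thread keeping the core 60% busy, sibling filling half the idle slots:
print(ht_throughput(0.6, 0.5))  # 0.6 + 0.4 * 0.5 = 0.8, a ~33% gain
# A thread that already saturates the core leaves nothing for the sibling:
print(ht_throughput(1.0, 0.5))  # 1.0, no gain at all
```

The gain therefore depends entirely on how many idle slots the first thread leaves behind, which is why HT can range from no benefit to a modest one.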
Let’s look at a System Idle Process screenshot taken on a dual-core HyperThreaded processor while the system was idle. This will uncover a little of how the system behaves towards the HyperThreaded cores:
As you can see, the physical cores 0 and 2 have the Windows Kernel’s CPU Idle module running; the system interprets these as “these cycles are free”. On the flip side, the HyperThreaded cores 1 and 3 show only a single context switch, which is handled more like “we have no threads to assist with”.
Does ESXi Benefit from HyperThreading?
With what I have described so far, HyperThreading seems to be an ideal hardware assistant for the hypervisor. I took a look at one of our “most utilized” hosts, a dual-socket, 12-core HyperThreaded Xeon host at roughly 63% CPU utilization running several Citrix terminal servers, and gathered the following results via ESXTOP after selecting the Power States screen with the p key:
According to the picture, the hypervisor decided to load the first NUMA node more than the second, as there is still some capacity left. CPUs 0-23 (HT cores are 0 + even numbers) belong to Physical CPU 1, and CPUs 24-47 (HT cores are odd numbers) belong to Physical CPU 2.
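If you want to work out which socket and physical core a given PCPU number belongs to, a small sketch helps. Note the assumption: this models the common convention where sibling hyperthreads get adjacent PCPU numbers (0/1 share a core, 2/3 share a core, and so on); always verify the numbering on your own host in esxtop, since it can differ.

```python
def pcpu_topology(pcpu, cores_per_socket=12, threads_per_core=2):
    """Map a PCPU number to (socket, physical core, hyperthread index),
    assuming sibling hyperthreads are numbered adjacently.
    Defaults match the dual-socket 12-core host described above."""
    logical_per_socket = cores_per_socket * threads_per_core
    socket = pcpu // logical_per_socket
    core = (pcpu % logical_per_socket) // threads_per_core
    ht = pcpu % threads_per_core
    return socket, core, ht

print(pcpu_topology(0))   # (0, 0, 0): socket 0, core 0, first thread
print(pcpu_topology(1))   # (0, 0, 1): socket 0, core 0, its HT sibling
print(pcpu_topology(24))  # (1, 0, 0): first thread on socket 1
```

On this host that puts PCPUs 0-23 on the first socket and 24-47 on the second, matching the split seen in ESXTOP.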
I wanted to check again, so I tried another “most-stressed” host in our environment, having Dual 8-Core Xeons with HyperThreading:
I have to say I was pretty surprised by how evenly the load was balanced across all cores, and HyperThreading was pretty busy providing a steady flow of instructions to the physical CPUs. Still, my opinion is that the true performance gain could be anywhere between nothing and about 20% relative to the true per-core frequency, because the VMkernel really does its job well when tasked with parallel workloads; after all, that’s what it was built to do.
The Bottom Line
HyperThreading is an enhancement of the physical CPU that helps fill its pipeline with a steady stream of instructions in busy times and when a lot of co-scheduling is needed, allowing it to reach its maximum of four instructions executed per clock cycle. By design, this should reap the most benefit for heavily threaded applications that rely on a steady stream of instructions. As the images showed, the VMkernel uses HyperThreading very well, although the real performance benefit will differ from workload to workload.
There is an article straight from Intel (from which I linked a pair of pictures) about Performance Insights to HyperThreading Technology, which is a worthy read if you’d like to get even more technical.