Category Archives: Practice

Time for some fun! Here, I’ll be making various experiments in the lab. With pictures and stuff!

How much memory should be free for VMkernel?

Recently I did a bit of research to see how much free RAM the VMkernel needs in order to work without any hiccups, taking into account:

  • Memory Reclamation Techniques
  • Memory Reservation for the VMkernel itself

I gathered this data from a live environment. However, one very important metric is not included in the measurements and graph below: the Virtual Machine overhead, which is individual to each environment and depends on the VMs’ memory and vCPU count.

A quick explanation:

  • RAM [GB]: How many Gigabytes of RAM are installed in the Server.
  • VMKernel [MB]: How many MB are reserved for the VMkernel itself (you can find this value in Configuration -> System Resources Tab).
  • Reclamation [MB]: Calculated with a Memory Reclamation Formula (900 MB + 1% of memory above 24GB).
  • Total [MB]: Sum of VMKernel & Reclamation values. This should be the governing baseline value.
  • Free [%]: What percentage of the server’s total memory should be kept free (Total divided by the installed RAM).

RAM [GB]   VMKernel [MB]   Reclamation [MB]   Total [MB]   Free [%]
8          1393            900.0              2293.0       28.0
16         1749            900.0              2649.0       16.2
24         2378            900.0              3278.0       13.3
48         2682            1145.8             3827.8       7.8
64         2745            1309.6             4054.6       6.2
96         3514.5          1637.3             5151.8       5.2
128        3612            1965.0             5577.0       4.3
192        5218.5          2620.3             7838.8       4.0
384        8220            4586.4             12806.4      3.3
512        9985            5897.1             15882.1      3.0
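If you would like to reproduce the Reclamation, Total and Free columns for your own hosts, here is a minimal Python sketch of the arithmetic. The function names are made up for illustration, the VMkernel reservation is something you have to read from your own host, and I am assuming 1 GB = 1024 MB, which is what the table values correspond to.

```python
def reclamation_mb(ram_gb):
    """Reclamation buffer: 900 MB plus 1% of the RAM above 24 GB."""
    extra_mb = max(ram_gb - 24, 0) * 1024
    return 900.0 + 0.01 * extra_mb


def required_free(ram_gb, vmkernel_mb):
    """Return (total MB to keep free, share of installed RAM in percent)."""
    total_mb = vmkernel_mb + reclamation_mb(ram_gb)
    free_pct = 100.0 * total_mb / (ram_gb * 1024)
    return total_mb, free_pct


if __name__ == "__main__":
    # VMkernel reservations below are measured values taken from the table.
    for ram_gb, vmkernel_mb in [(24, 2378), (64, 2745), (512, 9985)]:
        total_mb, free_pct = required_free(ram_gb, vmkernel_mb)
        print(f"{ram_gb} GB: keep {total_mb:.1f} MB free ({free_pct:.1f} %)")
```

Keep in mind that this still leaves out the per-VM overhead, so treat the result as a lower bound rather than a final number.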

And a graph is below:

Memory Reservation Graph

A Graph representing the GB Installed vs. MB reserved memory.

I hope this table comes in handy when deciding how much RAM in your environment is actually available for the hosts to use.

PowerCLI Session 00: Introduction to PowerShell & PowerCLI

A few words to start with

Welcome to PowerCLI Sessions, where I’ll be showing you how VMware’s PowerShell module, PowerCLI, works, along with examples you can analyze, study, and put to good use. I’d like to start with a “0th” session to introduce people to PowerShell-based scripting on Windows Server systems.

Starting with Windows Server 2008, Microsoft has included an object-oriented shell called PowerShell in its systems. It was meant to quickly replace the previous, widely used scripting engine native to the Windows OS family, VBScript (Visual Basic Script). I have written several VBScript scripts in the past, and it was frankly cumbersome to work with (and I guess many of you who have had the pleasure of scripting in VB will agree) – debugging was quite hard and the mnemonics were not that easy to remember. In short, it did its job, but you had to spend some time fiddling with the code, as there were next to no IDEs for this scripting language.

Introducing PowerShell & PowerCLI Module

With PowerShell, the weapons in your arsenal are the so-called commandlets (or cmdlets for short), and they are almost always named on a verb-noun basis, so they are pretty easy to remember. The outputs are almost always objects with various properties (or members) that are incredibly useful, as I will be showing you throughout the lessons. You will need to wrap your mind around the fact that you are now working with objects in a shell environment – no more feeding plain strings everywhere. Sometimes you will need to pass an object into your command, or it will fail.

How does PowerCLI come into play here then? Simply by being a Module (we can also call it a plugin, extension, etc.) for PowerShell – supplying it with many new commandlets to be used exclusively within a VMware environment – all these commandlets operate by communicating with vCenter Server, or the ESXi host itself.

Installation

First things first – make sure your core PowerShell is updated to the most recent version possible (v3 for Windows 2008 R2 and v4 for Windows Server 2012 onwards). You can get PowerCLI version 5.5 from VMware’s repository – you will just need an account at my.vmware.com. The installation is pretty simple. Just download the executable, follow the instructions and then launch the console with the shortcut that has been created either in your Start menu or on the Desktop.

Getting warmed up

The simplest command you will use in PowerCLI is Connect-VIServer (notice the Verb-Noun mnemonic?). This will establish a connection to either the vCenter Server or an ESXi host. If you are unsure how to use the command, just try Get-Help for it.

gethelp

Now you are ready to run the command.

connectvcenter

From there you can use Get-VMHost to list all your ESXi hosts connected to the vCenter Server, and Get-VM to list all the Virtual Machines.

getvmhost

To explore each object’s properties, a VERY useful tool is Get-Member – this will show you what else is hiding behind the values that were just listed. Let’s try it with Get-VMHost:

getmember

As you may have noticed, I have used the pipe and a shortcut for this command, which is gm – you will be using it a lot while learning about objects’ members. To explore a member of the object in the first position of the array, you use a dot like this – let’s use it to check the build number of the first ESXi host:

getmember-buld

To disconnect from the vCenter Server or the ESXi host, just type in Disconnect-VIServer servername and you are done.

Congratulations! You have just tapped into the awesome world of PowerCLI – we’ll continue with an introduction to variables in the next Session.

Online ESXi Firmware and Driver Upgrade on HP Servers

When upgrading firmware and drivers on a large number of servers, it used to be time-consuming to perform the firmware upgrade after a reboot on each and every one of your ESXi hosts to match the standard. Not anymore – since Service Pack for ProLiant 2014.09.0, the NIC firmware can be upgraded online as well, starting with its 10.x version (a bump from the 3.x or 4.x versions, which now share a unified firmware). A huge step forward – now all the applicable firmware can be upgraded in one go – and online! No need to wait to catch the boot menu and go through HP Smart Update Manager on each host individually.

Here’s a step by step walkthrough:

  1. Download the HP Service Pack for ProLiant you wish to apply. You will need to have an HP account and a server under warranty linked to it in order to download the newest releases.
  2. Stage the .iso file to a server that has a good connection to all the ESXi hosts you plan to upgrade (preferably a terminal server inside the Data Center) and unpack it to a location of your liking.
  3. Run \\spplocation\hp\swpackages\x64\hpsum_bin_x64.exe – the binary will depend on your OS flavor.
  4. The following console window will pop up, stating that the HP SUM Web Service has been launched, and the default web browser will launch on the machine, opening the address localhost:63001 and automatically logging you in by passing through your credentials. You can also connect to your terminal server from any other computer that can access its ports 63001 or 63002 (and it is more comfortable that way). I strongly suggest using Google Chrome.
    image001
  5. If you access the web interface, this is what you get.
    image003
  6. Start by clicking on the drop-down arrow in the top left corner and select Baseline Library.
    image005
  7. You will need to manually initiate the inventory process for the selected baseline, so click on the already present one for the process to begin.
    image007
    After a few minutes, the inventory completes.
    image009
  8. Now we need to add our ESXi hosts, so select VM hosts from the drop-down menu.
    image011
  9. Localhost is added automatically and unfortunately can’t be changed. Click on Add Node.
    image013
  10. You can either add a single node by its FQDN or a range of IP addresses separated by a dash. You need to specify the type of device you are adding and the package that is your baseline. Don’t forget to put in the root credentials, or the initialization will fail.
    image015
  11. If you need to select specific nodes inside a range, the second entry under “Select the type of add” has just what you need. You enter the range, and after a scan you select the nodes you desire. Shift+Click and Ctrl+Click work here like a charm.
    image017
  12. After you have added the nodes via the “Node Range” method, select the baseline to apply to them and enter the root credentials.
    image019
  13. Once the hosts have been added successfully, you can select multiple hosts by Shift+Click or Ctrl+Click and the right frame will change to a multiple selection operation.
    image021
  14. Here you will need to select the baseline again by clicking on Select Baselines.
    image023
  15. Select the SPP and click on Add.
    image025
  16. Back in the multi-select frame, enter the root credentials in order to scan the hosts.
    image027
  17. You will see the inventory progress.
    image029
  18. Once SUM determines that an update is needed, input the root credentials again and Deploy the components.
    image031
    image033
  19. You have reached the familiar deploy screen where you choose the components to upgrade. When you choose Deploy, it will initialize and you will see a gray wheel spinning beside the chosen hosts.
    image035

When the deployment is complete, you will see a green light next to the hosts you applied updates to, and the updates will take effect on the next reboot – which is ideal for combining with VMware Update Manager to apply patches & firmware in one take.

My VCP Exam Experience

Hi there! Today I’d like to share my VCP Exam experience with you. I hope this will encourage those of you who are preparing to take it at some point (be it tomorrow or in a few months), and let me share some heartfelt moments with those of you who have already passed this examination. So stay awhile, and listen 🙂

Preparations.

I got my free exam voucher from Karel Novak by winning a drawing at his vmware-veeam blog. This voucher was time-limited to the end of September, so I wanted to schedule it just for that time so I could soak up the most hands-on experience from my newly acquired position as a last level specialist. Since hands-on experience is the king, I have relied primarily on that resource in order to “prepare” myself.

Since I like to soak up as much information as possible on the tech stuff I get my hands on, I purchased Mastering VMware vSphere 5.5, which I reviewed in one of my earlier posts (spoiler: it’s helped me a LOT to pass the exam!). Since the trip to my job takes around an hour, I was reading it on the bus for about a month before I actually got my exam scheduled.

The Actual Exam.

I arrived at the Pearson-certified testing center and everything went full swing. I signed a few papers regarding the NDA on question reproduction and order in the exam room. Then I got my photo taken by a webcam (you have to sit still for about 10 seconds, which was nearly impossible for me), provided an electronic signature and off I went for my exam.

The rumours you might have heard about the draconian exam room rules were all true – you aren’t allowed to drink, chew gum, make any loud noises or do anything else that would disturb other people in the room. You are under surveillance the whole time and the only things you are allowed to carry with you into the exam room are an erasable board (you also give it back with all your scribblings) and a pen you are given by the attendant. If you want to leave, you have to ring for the attendant to pick you up and provide her with your ID. It sounds horrid but trust me, it’s not bad at all when the exam timer starts ticking – then you’ll have the time to worry about.

My initial (and foolish) thoughts were: Pfff, I’ll be done in a few minutes, take my certificate and go home. No need for all the 135 minutes I got for the exam. Oh, I couldn’t have been more wrong – I consumed every single second of that exam and let the timeout seal my fate. Halfway through the exam a sense of impending doom came over me, and I was telling myself “there is no way in hell I’m gonna make this”.

Some of the questions concerned technologies we don’t use in our environment, and some (all right, most of them) had pretty tricky wording. I had to read them several times over to make sure the authors didn’t catch me off guard, which required a tremendous amount of concentration. Overall, this exam tests your knowledge of vSphere thoroughly – configuration, storage, networking, resource management – such a happy concoction. Fortunately, some questions correlated with each other, so if I contradicted myself on some of those, I stopped for a minute (and watched the timer tick away) to re-think my approach to the question. Again, hands-on experience is invaluable in this exam.

In the end, I managed, and a huge boulder fell off my heart.

What’s next?

Passing this certification gave me such a huge boost that I decided to finally start the blog I’d been rambling about for quite some time, become active in the VMware Communities, and set another personal goal to try VCAP-DCD or DCA in the next year. It is a great feeling of accomplishment to have stood the trial by fire, and I hope that all of you who will be undertaking this exam in the future will eventually feel the same way.

DELL Perc H710P Local Storage SSD RAID1 Benchmark

Recently we equipped one of our ESXi hosts with local SSD storage (Product Number: LB806M) to host a database VM. For redundancy we chose RAID1. I ran a small benchmark to compare it to the already present RAID10 array of 4x 1.2 TB 10k RPM drives (PN: ST1200MM0007).

The RAID Controller serving the drives was DELL Perc H710P Mini (Dual processor, 1GB DDR3 NV Cache). I have used the IOmeter application with Access Specification File from my favorite tech-news aggregate site, TechPowerUp. I have run the test on a 1GB Chunk of data. Without further ado, here are the results (click on an image to enlarge):

Throughput Benchmark

IOPs Benchmark

Latency Benchmark

Also, I’ve captured a few interesting screenshots from esxtop over the course of the benchmarks. Notice that the controller doesn’t even break a sweat under that many IOPS:

Installing the Windows VM for benchmarking

Database Benchmark on the Mechanical Hard Drive and SSD running simultaneously.

Nice IOPS 🙂

Sequential Read Benchmark – the Controller Cache comes into play.

Hope you enjoyed the numbers. See you around.

Create a Virtual Floppy with OSFmount utility

In the life of an IT administrator, you sometimes (or quite frequently) need .iso files or .img virtual floppies to do your part of the job. There are many tools available on the web, but one that I really like is OSFmount, and I will show you why.

OSFmount is a handy little utility that lets you mount (even as a ramdisk) and modify the contents of virtual floppy images. Unfortunately, an .iso can only be mounted as read-only media. I’ll show you how to make an empty floppy from scratch – you can follow the same steps to make an .iso.

When you launch the application for the first time, an initial window shows up. Click on Mount New…

mountnew

You will see the following form pop up. Fill in the values as you require – keep the offset at 0 as we are creating a brand new image, and choose the drive size you’d like. Since we are creating a floppy, choose the floppy drive type – but you can also mount the image as a hard drive or CD/DVD. Assume that the Image File location is blank at the moment and click the three dots to define the image’s location.

mountnewdrive

Enter the name of your desired image file – doesn’t matter if the object already exists or not, just hit Open and Windows will ask you if you want to create the file.

createfile

Now, the newly created image isn’t formatted, so right-click the newly mounted .img and do that.

defaultformat

And there you go! You just created a 10MB virtual floppy .img file for yourself, which you can use on remote management consoles where you need attachable media you can read from and write to. Don’t forget to unmount it before you start using it. Pretty simple, right?

Book Review: Mastering VMware vSphere 5.5

Book Cover

Today I’ll be sharing a short review of the book Mastering VMware vSphere 5.5 – Kindle Edition. I purchased it in order to enhance my knowledge, and in the hope that it would help me be better prepared for the VCP510 exam. This book has greatly fulfilled both of my expectations.

I read it from cover to cover, as I usually do with technically oriented books in order to soak up as much knowledge as I can. The book is nicely written, with sidebars providing very useful knowledge from working experience. As can be expected from a book that contains “Mastering” in its name, it focuses on every single aspect of vSphere, starting from the basics in each of its chapters and smoothly moving on to an in-depth level. There is a step-by-step walk-through, along with screenshots, for every action that is a subject of the chapter. Technically complex matters are displayed graphically to aid your imagination, which is always nice.

The paperback version has 840 pages – this translates to roughly 15 hours of reading if you don’t just want to skim through the book or just look for enlightenment in certain chapters. But it is certainly good to read the book bit by bit – I guess reading it all in one sitting would result in an information overload 🙂

I wish I could sum up each element this book covers, but I’d be just typing out all vSphere features and aspects such as networking and storage and their underlying components – that many of you are already familiar with. If you have some vSphere experience – or you grasp the concept of virtualization with some IT background and seek a book that will get you started with vSphere, would like to know how certain things work “under the hood”, or just want to see how the authors tackled some real-world scenarios, this book is for you.

As for myself, I read this book with some ~5 months of last-level operator experience and learned many new things. For example, did you know that during vMotion the memory snapshot of the source VM is sent to the destination ESXi host in clear text? Or that while using an NFS connection to your datastores, only one of the two uplinks is being used for the data transfer, even though LACP is used? And there’s much, much more. While sweating in the exam room answering the VCP questions, I recalled what I read in this book many times. Even though this book revolves around vSphere 5.5 and its features, it is tremendously helpful for use with earlier versions of vSphere 5.x.

I heartily recommend this book to everyone who wants to learn more about the vast number of features in vSphere, be it people recently introduced to virtualization, or seasoned vSphere operators who would like to know more. It will even help you to prepare for VCP if you take your time and indulge in the chapters.

Debugging Machine Check Errors (MCEs)

There comes a time when a hardware failure on one of your ESXi hosts is imminent. You can recognize that when the host crashes while under a certain CPU- or memory-intensive load – or even at random, most of the time without throwing a Purple Screen of Death that would at least give you a notion of what went wrong. VMware KB Article 1005184 covers this issue, and it has been updated significantly since I started to take interest in these errors.

UPDATE: I have published a new CPU Stress Test & Machine Check Error debugging article – check it out if you’d like to learn more.

If you are “lucky”, you can see and decode for yourself what preceded the crash. This is because both AMD and Intel CPUs implement something called the Machine Check Architecture. This architecture enables the CPUs to intelligently determine a fault that happens anywhere on the data transfer path during processor operation. It can capture memory operation errors, CPU bus interconnect errors, cache errors, and much more. How do you determine what has been causing your system to fail? Read on.

You will need to browse to Intel’s website hosting the Intel® 64 and IA-32 Architectures Software Developer Manuals. There, download a manual named “Intel 64 and IA-32 Architectures Software Developer’s Manual Combined Volumes 3A, 3B, and 3C: System Programming Guide”. I highly recommend printing it, because you will be doing some back-and-forth seeking.

Now, to get a list of possible Machine Check Errors captured by the VMkernel, run the following in your SSH session with superuser privileges:

cd /var/log;grep MCE vmkernel.log

This will output something similar to this:

Memory Controller Errors

 

Most of the time, the VMkernel decodes these messages for you – in this image you can see that there are plenty of Memory Controller Read Errors. You can see more closely where the problem originates from:

  • CMCI: This stands for Corrected Machine Check Interrupt – an error was captured but it was corrected and the VMkernel can keep on running. If this were an uncorrectable error, the ESXi host would crash.
  • Logical CPU number where the MCE was detected: This particular host had Dual 8-Core Intel Xeon Processors with HyperThreading enabled. For all other occurrences of this MCE, the cpu# was alternating between 0 and 15, which means the fault was always detected on the first physical CPU.
  • Memory Controller Read/Write/Scrubbing error on Channel x: Means that the error was captured on a certain channel of the physical processor’s NUMA node. Since there is a quad-channel memory controller used for this particular CPU, the channels would range from 0-3. This error is reported on Channel 1, which means one or both of the memory sticks on that channel are faulty.

You can turn to your hardware vendor’s support, indicating that a component might be failing, or nudge them towards a certain component – but always make sure there is a support representative from VMware to back your findings up. Some companies don’t “trust” these error messages, especially if their diagnostics software doesn’t reveal the fault (in the majority of cases, it doesn’t) and their engineers do not know about the Machine Check Architecture – how it is implemented and whether to trust the error codes (they should). This is where leverage from your VMware support engineer comes in very handy – speaking from my experience. In the end, the memory stick replacement solved the issue – how I got to it being a memory problem will be explained in an upcoming article.

If you are curious what these hexadecimal strings mean and would like to know how to decode them manually, here’s a short walk-through (this was captured on the same host, when it had scrubbing errors):

  • Take the Status string from the log line; this is the value you will convert from hexadecimal to binary:

Status:0xcc001c83000800c1 Misc:0x9084003c003c68c Addr:0x112616bc40  — Valid.Overflow.Misc valid.Addr valid.

  • Convert the Status hex value to binary and split it according to Figure 15-6 in the manual:

1 1 0 0 1 1 0 0 0 00 000000001110010 000011 0000000000001000 0000 0000 1100 0001

  • Note down the leading (most significant) bits:

VAL – MCi_STATUS register valid (63) – TRUE
OVER – Error overflow (62) – TRUE
UC – Uncorrected error (61) – FALSE
EN – Error reporting enabled (60) – FALSE
MISCV – Misc register valid (59) – TRUE
ADDRV – Address register valid (58) – TRUE
PCC – Processor context corrupt (57) – FALSE
Corrected error count (52:38): 000000001110010 = 114 errors

The TRUE flags correspond with the Valid.Overflow.Misc valid.Addr valid part of the log line.

  • Note the lowest 16 bits (15:0), which hold the MCA error code:

MCACOD: 0000 0000 1100 0001

  •  Compare the code bits according to table 15-6

UC = FALSE and PCC FALSE, therefore: ECC in caches and memory

  • Decode the compound error code and compare it to the forms listed in section 15.9.2:

The code matches the pattern 0000 0000 1MMM CCCC, therefore the compound error code is “Memory Controller Errors”:

MMM = 100
CCCC = 0001
{100}_channel{0001}_ERR

  •  From there, decode this according to table 15-13:

Memory Controller Scrubbing Error on Channel 1
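If you would rather not do the bit splitting by hand, the walk-through can be sketched in a few lines of Python. This is only an illustration: the function names are made up, the field positions are taken from the IA32_MCi_STATUS layout in the Intel manual as I understand it (flags in bits 63 to 57, corrected error count in bits 52:38, model-specific code in bits 31:16, MCA error code in bits 15:0), and only the memory controller form of the compound codes is handled.

```python
# Request types for the MMM bits of a memory controller error code
# (assumed from the compound error code description in the Intel manual).
MMM_TYPES = {
    0b000: "Generic Undefined Request",
    0b001: "Read",
    0b010: "Write",
    0b011: "Address/Command",
    0b100: "Scrubbing",
}


def decode_status(status):
    """Split a 64-bit MCi_STATUS value into the fields used in the walk-through."""
    def flag(bit):
        return bool((status >> bit) & 1)

    return {
        "VAL": flag(63),
        "OVER": flag(62),
        "UC": flag(61),
        "EN": flag(60),
        "MISCV": flag(59),
        "ADDRV": flag(58),
        "PCC": flag(57),
        "corrected_count": (status >> 38) & 0x7FFF,  # bits 52:38
        "MSCOD": (status >> 16) & 0xFFFF,            # bits 31:16
        "MCACOD": status & 0xFFFF,                   # bits 15:0
    }


def describe_memory_controller_error(mcacod):
    """Decode a 0000 0000 1MMM CCCC code; return None for any other code form."""
    if (mcacod & 0xFF80) != 0x0080:
        return None
    mmm = (mcacod >> 4) & 0b111
    channel = mcacod & 0b1111
    kind = MMM_TYPES.get(mmm, "Reserved")
    # CCCC = 1111 means the channel is not specified.
    where = "an Unspecified Channel" if channel == 0b1111 else f"Channel {channel}"
    return f"Memory Controller {kind} Error on {where}"


if __name__ == "__main__":
    fields = decode_status(0xCC001C83000800C1)
    for name, value in fields.items():
        print(name, value)
    print(describe_memory_controller_error(fields["MCACOD"]))
```

Feed it the Status value from your own vmkernel.log line; for codes that are not memory controller errors the second function simply returns None and you are back to the tables in the manual.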

Pretty easy, right? Let me give you another MCE example. This was captured from an ESXi host that eventually turned out to have 2 faulty memory modules, which the manufacturer only acknowledged once they had exceeded the Corrected ECC threshold. The BIOS marked them as inactive after Memtest86+ had been running on them for 20 hours since that error was detected – the integrated diagnostics utility revealed nothing. I’ll provide a quicker debug here:

1 1 0 0 1 1 0 0 0 00 000000000000111 000000 0000000000000001 0000 0000 1001 1111

  • VAL – MCi_STATUS register valid – TRUE
  • OVER – Error overflow – TRUE
  • UC – Uncorrected error – FALSE
  • EN – Error reporting enabled – FALSE
  • MISCV – TRUE
  • ADDRV – TRUE
  • PCC – FALSE
  • S – FALSE
  • THRESHOLD – FALSE

MCE CODE: 0000 0000 1001 1111

This code relates to error in: ECC in caches and memory

After debug:
{001}_channel{1111}_ERR
Memory Read Error on an Unspecified Channel

I hope this article has shed some light on the Machine Check Architecture for you. I’m open to discussing this topic in the comments, including any MCEs you have run into yourself.