One night we had a situation on our remote site that was running ESX 4.1.0 on a DELL PowerEdge T710 Server. It went to PSOD and then the RAID controller stated that it was unable to boot. The screen captures we got were:
And after a reboot, an unwelcoming screen was shown:
Fortunately, after another reboot the system booted just fine, however it was pretty obvious that the hardware itself was in a pretty unstable state. On iDRAC, we have discovered that we got a critical warning on a component (unfortunately it was late at night and I didn’t think about screenshotting that) with Bus IDs 03:0:0. Listing components via lspci revealed that the following component was sitting on the given ID:
03:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS GEN2 Controller (rev 05)
Even if it was straightforward from the get-go which component might have been failing, it was double-confirmed by the very useful lspci command.
We have faced quite a strange issue with one of our Dell PowerEdge servers on a remote site. When the branded image was deployed on the host, we kept getting bootloops. The system has just started unloading modules after they were seemingly loaded. After inspecting the vmkernel log at boot-time by pressing ALT-F11, I have noticed a few strange warnings:
2014-11-24T04:13:50.237Z cpu2:2631)WARNING: ScsiScan: 1485: Failed to add path vmhba1:C0:T0:L0 : Not found
2014-11-24T04:15:08.990Z cpu7:2792)WARNING: ScsiScan: 1485: Failed to add path vmhba1:C0:T0:L0 : Not found
I have poked around the settings in BIOS to find out what could have been causing the issue that were seemingly coming from the RAID controller itself. I have changed the SATA to report as RAID opposed to AHCI which was set previously, and the next boot was successful.
This didn’t have any effect on already present drives or data because the only device that used the on-board storage controller was the DVD-ROM.
Many times I come across the question of HyperThreading and its benefits – either in personal computing, but more importantly over the last few years, virtualization. I’d like to talk about what HyperThreading is for a moment, and show you if it benefits the virtualized environment.
What is HyperThreading?
Today, you see HyperThreadng (HT) technology is present on almost every Intel processor, be it Xeon or Core i3/i5/i7 Series. Basically, it splits one physical core to two logical cores, but the term splitting is somewhat inaccurate and confuses many consumers. Thinking that when they run a 2.5GHz 4-core, HyperThreaded CPU, they immediately have 8 effective cores carrying the full processing capability of 20 GHz. Mainly because when you say you split something, you think that this has been divided to two equal parts (or at least that’s what I think, anyways). Continue reading →
We had a case on one of our ESXi hosts equipped with an Intel Corporation 82571EB Gigabit Ethernet Controller – although it was 1Gbit in speed, we were unable to achieve autonegotiation higher than 100 Mbit. When setting it manually to 1Gbit, the NIC disconnected itself from the network. Every other setting worked – 10 Mbit and 100Mbit both half and full duplex. We tried investigating with our Network Team, forcing 1Gbit on switch and that has also brought the NIC down.
I delved deeper into this issue and observed the VMkernel log via tail -f when I have forcibly disconnected the NIC and reconnected it again via esxcli. One line appeared that caught my attention:
vmnic6 NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
e1000e 0000:07:00.0: vmnic6: Link Speed was downgraded by SmartSpeed
e1000e 0000:07:00.0: vmnic6: 10/100 speed: disabling TSO
I immediately caugt up on SmartSpeed and tried to find a way to disable it – that is until I have found out on many discussion threads later that SmartSpeed is an intelligent throtlling mechanism that is supposed to keep the connection running on various link speeds when an error somewhere on the link path is detected. The switches were working okay, the NIC didn’t detect any errors, so the next thing to be checked would be the cabling.
I arranged a cable check with the Data Center operators and what do you know – replacing cables for brand new ones eventually solved the issue! Sometimes the failing component causing you a headache for a good few hours can be a “mundane” piece of equipment such as patch cables.