Archives For Virtual Connect

If you run VMware on HP ProLiant servers, then you are probably familiar with http://vibsdepot.hp.com.  In addition to HP customized VMware ESXi ISOs and software bundles, this site also hosts what HP refers to as VMware firmware and software “recipes”.  The “recipes” list the drivers and firmware that HP recommends running with a specific Service Pack for ProLiant (SPP) and certain ESXi versions.  While applying newer firmware and drivers to HP blade enclosures can be a pain, it’s a good idea to perform these updates once or twice a year since each SPP is only supported for one year.

Stacy’s Example:

In the following example, I used the September 2013 “recipe” to apply updates to HP C7000 Blade Enclosures that were already running ESXi 5.0 Update 2 hosts.  There is more than one way to apply these updates, but this is the method I found the easiest.

  • Each HP Blade Enclosure was updated one at a time.
  • For each enclosure, updates were applied to the Onboard Administrators, the Virtual Connect Flex-10 Ethernet modules, and the blades themselves.  (FC switches in the enclosures were handled separately.)
  • The steps detailed below were performed for each enclosure.
  • Note: If your hosts have FC HBAs, check with your storage vendor as well to see if they support the new HBA firmware/drivers.
Blade Driver Updates – VUM
  • Created new VMware Update Manager (VUM) HP Extension/driver baselines based on the September 2013 HP “recipe” (vibsdepot.hp.com).  Reviewed the host hardware for each cluster (i.e., network adapters, RAID controllers, latest offline bundle, etc.) to determine the appropriate drivers to include in the baselines.
  • Attached the appropriate baselines to the appropriate clusters (again, based on the hardware in each cluster and the “recipe”) and scanned.
  • Placed all ESXi hosts in the enclosure being updated into maintenance mode; a scripted sketch of this step follows the list.  (It’s great if you are able to shut down and update all blades in the enclosure at once, but not everyone will have this luxury.)
  • Suspended alerting for the hosts.
  • Remediated the hosts in the blade enclosure using the VUM baselines (Host Extensions).
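For reference, here is a minimal pyVmomi sketch of the “place every host in the enclosure into maintenance mode” step. The vCenter address, credentials, and host names are placeholders, and it assumes your process (DRS or otherwise) has already evacuated or powered off the VMs on each host.

```python
# Minimal sketch: put a list of ESXi hosts into maintenance mode via pyVmomi.
# vCenter address, credentials, and host names below are placeholders.
import ssl
import time
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ENCLOSURE_HOSTS = ["esx01.example.com", "esx02.example.com"]  # hosts in one enclosure

ctx = ssl._create_unverified_context()  # lab only; use proper certs in production
si = SmartConnect(host="vcenter.example.com", user="administrator",
                  pwd="password", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
    hosts = [h for h in view.view if h.name in ENCLOSURE_HOSTS]
    view.Destroy()

    tasks = []
    for host in hosts:
        if not host.runtime.inMaintenanceMode:
            print("Entering maintenance mode: %s" % host.name)
            tasks.append(host.EnterMaintenanceMode_Task(timeout=0))

    # Naive wait loop; a real script would use proper task-wait helpers.
    while any(t.info.state not in (vim.TaskInfo.State.success, vim.TaskInfo.State.error)
              for t in tasks):
        time.sleep(5)
finally:
    Disconnect(si)
```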
Blade Firmware Updates – EFM
  • Used the Enclosure Firmware Management (EFM) feature to update blade firmware.  EFM can mount an SPP ISO from a URL; in this case the ISO was hosted on an internal server running IIS.  Prior to updating the blade firmware, updated the SPP ISO on the IIS server and re-mounted the ISO in EFM.
  • Shut down the hosts (which were still in maintenance mode) using the vSphere Client; a scripted sketch of this step follows the list.
  • Once the hosts were shut down, used the HP EFM feature to manually apply the firmware updates.
  • After the firmware updates completed (which could take an hour), clicked Rack Firmware in the OA and reviewed the current firmware version against the Firmware ISO version.
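The host-shutdown step can also be scripted instead of clicking through the vSphere Client. Below is a minimal pyVmomi sketch (vCenter address, credentials, and host names are placeholders) that shuts down only hosts that are already in maintenance mode.

```python
# Minimal sketch: shut down ESXi hosts that are already in maintenance mode.
# vCenter address, credentials, and host names are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ENCLOSURE_HOSTS = ["esx01.example.com", "esx02.example.com"]

ctx = ssl._create_unverified_context()  # lab only
si = SmartConnect(host="vcenter.example.com", user="administrator",
                  pwd="password", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
    for host in view.view:
        if host.name in ENCLOSURE_HOSTS and host.runtime.inMaintenanceMode:
            print("Shutting down %s" % host.name)
            # force=False refuses to shut down a host that is not in maintenance mode.
            host.ShutdownHost_Task(force=False)
    view.Destroy()
finally:
    Disconnect(si)
```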
Virtual Connect (VC) and Onboard Administrator (OA) Updates – HPSUM from desktop
  • Temporarily disabled the Virtual Connect Domain IP Address (an optional setting) in the Virtual Connect Manager so that HP SUM could discover the Virtual Connects when the Onboard Administrator is added as a target (yes, a workaround for an HP bug).
  • Ran HP SUM from the appropriate HP SPP on my desktop.
  • Added the Active OA hostname or IP address as a target and chose Onboard Administrator as the type.
  • The blade iLO interfaces, the Virtual Connect Manager, and the FC switches were all discovered as associated targets when the OA was added.  De-selected every associated target except the Virtual Connect Manager and clicked OK (the blade iLO interfaces were updated along with the rest of the blade firmware via EFM, and the FC switch firmware is handled separately).
  • The Virtual Connect Manager may then show as unknown in HP SUM.  If so, edit that target, change the target type to Virtual Connect, and enter the appropriate credentials.
  • After applying updates to the OAs and VCs, verified that they were running the expected firmware levels; a scripted spot-check follows the list.
  • Re-enabled the Virtual Connect Domain IP Address setting.
  • Re-enabled alerting.
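The firmware verification can also be done from the OA’s SSH interface instead of the web UI. Here is a rough paramiko sketch; the OA address and credentials are placeholders, and the `show oa info` command (and its availability over an exec channel) is an assumption to check against your OA firmware level.

```python
# Rough sketch: pull the Onboard Administrator firmware report over SSH.
# OA address/credentials are placeholders; "show oa info" is assumed to be
# supported on your OA firmware level.
import paramiko

OA_HOST = "enc01-oa.example.com"

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(OA_HOST, username="Administrator", password="password")
try:
    stdin, stdout, stderr = client.exec_command("show oa info")
    print(stdout.read().decode())  # review the reported firmware versions
finally:
    client.close()
```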

If you run at least some of your ESXi clusters on blades and you have multiple chassis/enclosures, you may choose to distribute the hosts in those clusters across multiple enclosures.  In general, this is a good idea for many environments: although a complete blade enclosure failure is rare, you don’t want a critical issue with a single enclosure taking down an entire cluster.

Recently, I saw one of these rare events impact an enclosure, and it was not pretty.  Sys admins know the feeling: alerts about multiple hosts being “down” come streaming into your inbox, and you start asking which hosts, which cluster, perhaps even which site…is there any commonality?  In this case, the following bug had just hit an HP enclosure:  http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c02623029&lang=en&cc=us&taskId=135&prodSeriesId=3794423

This enclosure was NOT extremely out-of-date in terms of firmware and drivers.  The firmware was at a February 2013 SPP level, and the hosts were built from the latest HP ESXi 5.0 U2 customized ISO.

Here is a summary of what was seen when troubleshooting the issue for the impacted enclosure:

  • Both the Onboard Administrators and the Virtual Connect Manager were still accessible – somewhat.  See next bullet.
  • Virtual Connect Manager could be logged into, but was slow to respond.
  • Virtual Connect Manager showed the “stacking link” in a critical state.
  • Virtual Connect Manager also showed the 10Gb aggregated (LAG) uplinks were in an active/passive state as opposed to active/active, which is how they were originally configured.
  • None of the hosts in the enclosure could be pinged.  That is, every single blade lost network connectivity.  They still had FC connectivity to the FC switches.
  • Some of the ESXi hosts were still running, and some had suffered PSODs as a result of the bug.
  • Hosts that were still up eventually saw themselves as “isolated”.  Since the isolation response was set to “shutdown”, the impacted VMs (luckily, not that many) were shut down and restarted on non-isolated hosts.  (A sketch for reviewing this setting across clusters follows the list.)
  • Exported a log from the Virtual Connect Manager, and HP helped identify the blade triggering the bug.  The host was shut down, and the blade itself was also “reset”; however, this did not restore normal functionality.
  • Reset one of the Virtual Connect modules. This restored network connectivity for some of the blades, but not all.
  • Some of the blades had to be rebooted in order for network connectivity to be completely restored.
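Since the HA isolation response determined what happened to the VMs here, it can be worth spot-checking that setting across your clusters. A small pyVmomi sketch (vCenter address and credentials are placeholders) that prints each cluster’s default isolation response:

```python
# Minimal sketch: print the default HA isolation response for each cluster.
# vCenter address and credentials are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only
si = SmartConnect(host="vcenter.example.com", user="administrator",
                  pwd="password", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(content.rootFolder,
                                                   [vim.ClusterComputeResource], True)
    for cluster in view.view:
        das = cluster.configurationEx.dasConfig
        response = das.defaultVmSettings.isolationResponse if das.defaultVmSettings else "unset"
        print("%s: HA enabled=%s, isolation response=%s" % (cluster.name, das.enabled, response))
    view.Destroy()
finally:
    Disconnect(si)
```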

My plan for preventing this bug from recurring on any enclosure, based on the HP advisory (and, more generally, to bring everything up to the September 2013 HP “recipe” from vibsdepot.hp.com):

  • Using the Enclosure Firmware Management feature to apply the HP September 2013 Service Pack for ProLiant (SPP) to each blade.
  • Running HP SUM from the latest SPP to update the OA and VC firmware.
  • Using Update Manager to apply the recommended ESXi NIC and HBA drivers, as well as the latest HP offline bundle.

As a side note, it appears so far that a different, minor HP bug still remains even with these latest updates, as described in Ben Loveday’s blog post: http://bensjibberjabber.wordpress.com/2013/01/09/storage-alert-bug-running-vsphere-on-hp-bl465c-g7-blades/

Sigh….

Last year, I stumbled upon a great post from VCDX/vExpert blogger Chris Wahl about the steps required to upgrade firmware in an HP C7000 BladeSystem using HP SUM.  This is a simple option for updating the Onboard Administrator (OA), Virtual Connect (VC), or iLO firmware in an HP C7000 blade enclosure.  However, after several months of using HP SUM to update enclosures, I noticed something strange.

The Issue:

HP SUM was no longer discovering the Virtual Connect modules in any of my enclosures when using the OAs as targets.  The rest of the enclosure components were being discovered just fine, and since the VCs were being discovered via the OA in HP SUM before, I wondered…what changed?

I confirmed that I could add the VCs separately as targets in HP SUM successfully.   I also confirmed that the behavior was seen when using various versions of HP SUM, including the October 2012 SPP release and the February 2013 SPP release.

The Cause:

After a bit of troubleshooting with HP, the cause was identified:  when a “Virtual Connect Domain IP Address” has been enabled for a Virtual Connect domain, the VCs are no longer discovered when using the OA as a target.  If the “Virtual Connect Domain IP Address” setting was unchecked, HP SUM was again able to discover the VCs via the OA target.  Supposedly, a fix for this bug will be included in a future release of HP SUM.  Until then, the workaround is to temporarily disable the setting or to add the VCs separately as targets in HP SUM.
