Archives For HP

If you are importing the HP Smart Array (hpsa) controller driver into vSphere Update Manager (VUM), be aware of the following issue, especially if your VUM repository already contains a previous version of the HP hpsa driver.

Recently, I imported the latest HP Smart Array driver (hpsa version 5.0.0.60-1) into VUM, created a new baseline, and attached it to hosts. I noticed the compliance for this new baseline changed to green/compliant right away, even though I knew these hosts did not have the latest hpsa driver update. After scanning the hosts again, the baseline still showed as green and the details showed “Not Applicable”. When I took a closer look at the VUM repository, the old version of the hpsa driver, 5.0.0.44, which had been in the repository before I imported the newer version, was no longer there. Only one HP hpsa host extension was visible in the VUM repository GUI, and its release date matched the hpsa 5.0.0.60-1 driver. It was almost as if the two hpsa driver versions had merged in VUM.

The root cause? It appears that the HP hpsa patch name and patch ID in the metadata were the same across various hpsa driver releases and had not been made unique by the vendor (hpsa driver for ESX, patch ID hpsa-500-5.0.0 for multiple versions). In addition, VUM did not warn me that I was importing a patch whose name and ID already existed in the VUM repository.
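One way to catch this before importing (my own sketch, not an official VMware or HP tool): list the bulletin IDs inside the offline bundle and compare them with the patch IDs Update Manager already shows in its repository view. This assumes the common offline-bundle layout, with a metadata.zip inside the bundle zip and the bulletin XML files under bulletins/.

import io
import re
import sys
import zipfile

# Print every bulletin <id> found in an ESXi offline bundle so it can be
# compared against the patch IDs already listed in the VUM repository.
bundle = sys.argv[1]  # e.g. hpsa-500-5.0.0-offline_bundle-1874739.zip

with zipfile.ZipFile(bundle) as outer:
    with outer.open("metadata.zip") as raw:
        meta = zipfile.ZipFile(io.BytesIO(raw.read()))

ids = set()
for name in meta.namelist():
    if name.startswith("bulletins/") and name.endswith(".xml"):
        text = meta.read(name).decode("utf-8", errors="replace")
        ids.update(re.findall(r"<id>([^<]+)</id>", text))

print("Bulletin IDs in this bundle:")
for bulletin_id in sorted(ids):
    print(" ", bulletin_id)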


Since Update Manager 5.0 does not let you remove items from the repository, the solution that is often proposed is to reinstall Update Manager and start clean. However, I was provided with a workaround for the issue that did not require an immediate reinstall of VUM, though eventually a reinstall will be required to clean up the repository. Hopefully, HP and VMware engineers will make the improvement needed to prevent this type of issue (and make it easier to remove items from the repository).

Workaround – Before you import the latest hpsa driver bundle, extract the files from the bundle, tweak the two .xml files listed below, and zip the files back up.

For example, for the hpsa 5.0.0.60-1 bundle, modify the ID tag in the following two files (i.e., append the exact driver version number to the end of the ID); a scripted sketch of the edit follows the file list:

• hpsa-500-5.0.0-1874739\hpsa-500-5.0.0-offline_bundle-1874739\metadata\vmware.xml
• hpsa-500-5.0.0-1874739\hpsa-500-5.0.0-offline_bundle-1874739\metadata\bulletins\hpsa-500-5.0.0.xml
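Here is a rough scripted version of that edit (just a sketch of the manual steps above, using the paths from this bundle; the NEW_ID value is only my assumed naming, and any value that makes the ID unique will do):

import re
from pathlib import Path

OLD_ID = "hpsa-500-5.0.0"              # patch ID HP reuses across hpsa releases
NEW_ID = "hpsa-500-5.0.0-5.0.0.60-1"   # assumed naming: old ID plus the exact version

files = [
    Path("hpsa-500-5.0.0-1874739/hpsa-500-5.0.0-offline_bundle-1874739/metadata/vmware.xml"),
    Path("hpsa-500-5.0.0-1874739/hpsa-500-5.0.0-offline_bundle-1874739/metadata/bulletins/hpsa-500-5.0.0.xml"),
]

for xml_file in files:
    text = xml_file.read_text()
    # Only rewrite the <id> element itself, nothing else in the metadata.
    patched, count = re.subn(
        r"<id>{}</id>".format(re.escape(OLD_ID)),
        "<id>{}</id>".format(NEW_ID),
        text,
    )
    xml_file.write_text(patched)
    print(f"{xml_file.name}: {count} <id> element(s) updated")

After running it, zip the extracted files back up exactly as they were laid out and import the modified bundle as described below.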


When you import the modified bundle into VUM, the import wizard should show the new patch ID you assigned in the steps above, which confirms the edit worked. Try these steps in a lab first, use them at your own risk, and let me know if you find any issues with the workaround.

Speaking of the HP Smart Array controller ESXi driver, make sure to check out this advisory if you have not seen it already: http://bit.ly/1mVFHQK


If you run VMware on HP ProLiant servers, then you are probably familiar with http://vibsdepot.hp.com. In addition to HP customized VMware ESXi ISOs and software bundles, this site also has what HP refers to as VMware firmware and software “recipes”. The “recipes” list the drivers and firmware that HP recommends running along with a specified Service Pack for ProLiant (SPP) and certain ESXi versions. While applying newer firmware and drivers to HP blade enclosures can be a pain, it’s a good idea to perform these updates once or twice a year, since each SPP is only supported for one year.

Stacy’s Example:

In the following example, I used the September 2013 “recipe” to apply updates to HP C7000 Blade Enclosures that were already running ESXi 5.0 Update 2 hosts.  There is more than one way to apply these updates, but this is the method I found the easiest.

  • Each HP Blade Enclosure was updated one at a time.
  • For each enclosure, updates were applied to the Onboard Administrators, the Virtual Connect Flex-10 Ethernet modules, and the blades themselves. (FC switches in the enclosures were handled separately.)
  • Performed the steps detailed below for each enclosure.
  • Note: If your hosts have FC HBAs, check with your storage vendor as well to see if they support the new HBA firmware/drivers.
Blade Driver Updates – VUM
  • Created new VMware Update Manager (VUM) HP Extension/driver baselines based on the September 2013 HP “recipe” (vibsdepot.hp.com). Reviewed the host hardware for each cluster (i.e., looked at network adapters, RAID controllers, the latest offline bundle, etc.) to determine the appropriate drivers to include in the baselines.
  • Attached the appropriate baselines to the appropriate clusters (again based on the hardware in each cluster and the “recipe”), and scanned.
  • Placed all ESXi hosts in the enclosure being updated into maintenance mode; a rough scripted sketch of this step follows this list. (It’s great if you are able to shut down and update all blades in the enclosure at once, but not everyone will have this luxury.)
  • Suspended alerting for the hosts.
  • Remediated the hosts in the blade enclosure using the VUM baselines (Host Extensions).
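The maintenance-mode step above was done interactively in the vSphere client; purely as an illustration, here is a rough pyVmomi sketch of the same thing for every host in one enclosure. The vCenter name, credentials, and host names are placeholders.

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

# Hosts in the enclosure being updated (hypothetical names).
ENCLOSURE_HOSTS = {"esx01.example.com", "esx02.example.com"}

ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="********", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    for host in view.view:
        if host.name in ENCLOSURE_HOSTS and not host.runtime.inMaintenanceMode:
            print("Entering maintenance mode:", host.name)
            # timeout=0 means no timeout; running VMs must be evacuated or powered off first
            WaitForTask(host.EnterMaintenanceMode_Task(timeout=0))
finally:
    Disconnect(si)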
Blade Firmware Updates – EFM
  • Used the Enclosure Firmware Management (EFM) feature to update blade firmware. EFM can mount an SPP ISO via a URL; in this case the ISO is hosted on an internal server running IIS. Prior to updating blade firmware, updated the SPP ISO on the IIS server and re-mounted the ISO in EFM (a quick check of the URL is sketched after this list).
  • Shut down the hosts (which were still in maintenance mode) using the vSphere client.
  • Once the hosts were shut down, used the HP EFM feature to manually apply the firmware updates.
  • After the firmware updates completed (this could take an hour), clicked Rack Firmware in the OA and reviewed the current version against the Firmware ISO version.
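Not part of the original steps, but a small sanity check before re-mounting the refreshed ISO in EFM: confirm the internal IIS server actually serves the new SPP ISO at the URL EFM is pointed at (a common snag is IIS having no MIME type mapped for .iso files). The URL below is a placeholder.

import urllib.request

# Placeholder URL for the SPP ISO hosted on the internal IIS server.
SPP_ISO_URL = "http://iis01.internal.example.com/spp/spp-september-2013.iso"

# HEAD request: make sure the ISO answers and reports a plausible size
# before kicking off enclosure firmware updates in EFM.
req = urllib.request.Request(SPP_ISO_URL, method="HEAD")
with urllib.request.urlopen(req, timeout=15) as resp:
    size_gib = int(resp.headers.get("Content-Length", "0")) / (1024 ** 3)
    print(f"HTTP {resp.status}; reported size {size_gib:.1f} GiB")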
Virtual Connect (VC) and Onboard Administrator (OA) Updates – HPSUM from desktop
  • Temporarily disabled the Virtual Connect Domain IP Address (an optional setting) in Virtual Connect Manager so that HP SUM could discover the Virtual Connect modules when the Onboard Administrator was added as a target (yes, an HP bug workaround).
  • Ran HP SUM from the appropriate HP SPP on the desktop.
  • Added the active OA hostname or IP address as a target and chose Onboard Administrator as the type.
  • Blade iLO interfaces, Virtual Connect Manager, and FC Switches were all discovered as associated targets by adding the OA.  For associated targets, de-selected everything except for the Virtual Connect Manager and clicked OK (the iLO interfaces for the blades were updated along with the rest of their firmware using the EFM, and the FC Switch firmware is handled separately).
  • The Virtual Connect Manager may then show as unknown in HP SUM; edited that target, changed the target type to Virtual Connect, and entered the appropriate credentials.
  • After applying updates to the OAs and VCs, verified they updated to the correct firmware levels (a scripted check of the OA version is sketched after this list).
  • Re-enabled the Virtual Connect Domain IP Address setting.
  • Re-enabled alerting.
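As an aside, the OA firmware check can also be scripted over SSH; here is a rough paramiko sketch. The hostname and credentials are placeholders, and the exact output of SHOW OA INFO can vary between OA firmware releases, so treat it as a starting point.

import paramiko

# Placeholder hostname/credentials for the active Onboard Administrator.
ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect("oa-enclosure01.example.com", username="Administrator", password="********")

# "SHOW OA INFO" reports the OA firmware version among other details.
stdin, stdout, stderr = ssh.exec_command("SHOW OA INFO")
print(stdout.read().decode(errors="replace"))
ssh.close()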

If you happen to run at least some of your ESXi clusters on blades, and you have multiple chassis/enclosures, you may choose to distribute the hosts in these clusters across multiple enclosures. This is a good idea, in general, for many environments. After all, although a blade enclosure failure is rare, you don’t want a critical issue with one enclosure taking down an entire cluster.

Recently, I saw one of these rare events impact an enclosure, and it was not pretty. For Sys Admins – you know, that feeling you get when alerts about multiple hosts being “down” come streaming into your inbox. You think – which hosts, which cluster, perhaps even which site…any commonality? In this case, it was because the following bug had just hit an HP enclosure: http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c02623029&lang=en&cc=us&taskId=135&prodSeriesId=3794423

This enclosure was NOT extremely out-of-date in terms of firmware and drivers.  The firmware was at a February 2013 SPP level, and the hosts were built from the latest HP ESXi 5.0 U2 customized ISO.

Here is a summary of what was seen when troubleshooting the issue for the impacted enclosure:

  • Both the Onboard Administrators and the Virtual Connect Manager were still accessible – somewhat.  See next bullet.
  • Virtual Connect Manager could be logged into, but was slow to respond.
  • Virtual Connect Manager showed the “stacking link” in a critical state.
  • Virtual Connect Manager also showed the 10Gb aggregated (LAG) uplinks were in an active/passive state as opposed to active/active, which is how they were originally configured.
  • None of the hosts in the enclosure could be pinged.  That is, every single blade lost network connectivity.  They still had FC connectivity to the FC switches.
  • Some of the ESXi hosts were still running, and some had suffered PSODs as a result of the bug.
  • Hosts that were still up eventually saw themselves as “isolated”. Since the isolation response was set to “shutdown”, the impacted VMs (luckily, not that many) were shut down and restarted on non-isolated hosts.
  • Exported a log from the Virtual Connect Manager, and HP helped to identify the blade triggering the bug. That host was shut down, and the blade itself was also “reset”; however, this did not restore normal functionality.
  • Reset one of the Virtual Connect modules. This restored network connectivity for some of the blades, but not all.
  • Some of the blades had to be rebooted in order for network connectivity to be completely restored.

My plan for preventing this bug from recurring on any enclosure, based on the HP advisory (and, more generally, to bring everything up to the September 2013 HP “recipe” from vibsdepot.hp.com):

  • Using the Enclosure Firmware Management feature to apply the HP September 2013 Service Pack for Proliant (SPP) to each blade
  • Running HPSUM from the latest SPP to update the OA and VC firmware
  • Using Update Manager to apply the recommended ESXi NIC and HBA drivers, as well as the latest HP offline bundle.

As a side note, it appears so far that a different, minor HP bug still remains even with these latest updates – as described in Ben Loveday's blog post: http://bensjibberjabber.wordpress.com/2013/01/09/storage-alert-bug-running-vsphere-on-hp-bl465c-g7-blades/

Sigh….