vSAN 6.6.1: vSAN Build Recommendation Engine Health issue [RESOLVED]

In my previous post about the vSAN Build Recommendation Engine Health test, I concluded that a bug in vSAN 6.6.1 prevented the vSAN Health service from properly connecting to the Internet via a proxy.

With the vCenter Server Appliance 6.5 Update 1d release, I noticed that one of the two warning messages disappeared from the vSphere Web Client, leaving the test in the ‘Unexpected vSphere Update Manager (VUM) baseline creation failure’ state.

After checking the vSAN configuration one more time, I concluded the following:

  • Internet connectivity for automatic updates of the HCL database has been set up properly (vSAN_Cluster > Configure > vSAN > General):


  • The HCL database is up-to-date and CEIP is enabled (vSAN_Cluster > Configure > vSAN > Health and Performance):



  • Update Manager has proxy settings configured and working (vSAN_Cluster > Update Manager > Go to Admin View > Manage > Settings > Download Settings):



At the same time, the proxy server replaces SSL certificates with its own certificate signed by the corporate CA when establishing an HTTPS connection with the remote peer.

As a result, the vSAN Build Recommendation Engine Health task fails with an error message as follows (extract from vmware-vsan-health-service.log):

INFO vsan-health[Thread-49] [VsanVumConnection::RemediateVsanClusterInVum] build = {u'release': {u'baselineName': u'VMware ESXi 6.5.0 U1 (build 5969303)', u'isoDisplayName': u'VMware ESXi Release 6.5.0, Build 5969303', u'bldnum': 5969303, u'vcVersion': [u'6.5.0'], u'patchids': [u'ESXi650-Update01'], u'patchDisplayName': u'VMware ESXi 6.5.0 U1 (vSAN 6.6.1, build 5969303)'}}

INFO vsan-health[Thread-49] [VsanVumConnection::_LookupPatchBaseline] Looking up baseline for patch VMware ESXi 6.5.0 U1 (vSAN 6.6.1, build 5969303) (keys: [40], hash: None)…

INFO vsan-health[Thread-49] [VsanVumConnection::_LookupPatchBaseline] Looking up baseline for patch vSAN recommended patch to be applied on top of ESXi 6.5 U1: ESXi650-201712401-BG (keys: [], hash: None)…

ERROR vsan-health[Thread-49] [VsanVumConnection::RemediateAllClusters] Failed to remediate cluster 'vim.ClusterComputeResource:domain-c61'

Traceback (most recent call last):
  File "/usr/lib/vmware-vpx/vsan-health/pyMoVsan/VsanVumConnection.py", line 1061, in RemediateAllClusters
    performScan = performScan)
  File "/usr/lib/vmware-vpx/vsan-health/pyMoVsan/VsanVumConnection.py", line 876, in RemediateVsanClusterInVum
    patchName, patchMap[chosenRelease])
  File "/usr/lib/vmware-vpx/vsan-health/pyMoVsan/VsanVumConnection.py", line 373, in CreateBaselineFromOfficialPatches
    baseline = self._LookupPatchBaseline(name, keys)
  File "/usr/lib/vmware-vpx/vsan-health/pyMoVsan/VsanVumConnection.py", line 411, in _LookupPatchBaseline
    result = bm.QueryBaselinesForUpdate(update = updateKeys)
  File "/usr/lib/vmware-vpx/pyJack/pyVmomi/VmomiSupport.py", line 557, in <lambda>
    self.f(*(self.args + (obj,) + args), **kwargs)
  File "/usr/lib/vmware-vpx/pyJack/pyVmomi/VmomiSupport.py", line 362, in _InvokeMethod
    list(map(CheckField, info.params, args))
  File "/usr/lib/vmware-vpx/pyJack/pyVmomi/VmomiSupport.py", line 883, in CheckField
    raise TypeError('Required field "%s" not provided (not @optional)' % info.name)
TypeError: Required field "update" not provided (not @optional)

INFO vsan-health[Thread-49] [VsanVumSystemUtil::AddConfigIssue] Add config issue createBaselineFailed

INFO vsan-health[Thread-49] [VsanVumConnection::_DeleteUnusedBaselines] Deleting baseline VMware ESXi 6.5.0 U1 (vSAN 6.6.1, build 5969303) (id 424) because it is unused

INFO vsan-health[Thread-49] [VsanVumSystemUtil::VumRemediateAllClusters_DoWork] Complete VUM check for clusters ['vim.ClusterComputeResource:domain-c61']

ERROR vsan-health[Thread-49] [VsanVumConnection::RemediateAllClusters] Failed to remediate cluster

Following the community advice, I decided to add the Root CA and subordinate CA certificates (in *.pem format) to the local certificate store on the vCenter Server Appliance. After copying the certificates to /etc/ssl/certs and running the c_rehash command, I added the proxy servers to /etc/sysconfig/proxy and rebooted the server.
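For reference, the whole sequence looked similar to the commands below (run as root on the vCenter Server Appliance; the certificate file names and the proxy address are placeholders for the values in my environment):

# copy the Root CA and subordinate CA certificates (PEM format) into the OS certificate store
cp Root-CA.pem Sub-CA.pem /etc/ssl/certs/
# rebuild the hash symlinks so OpenSSL-based tools pick up the new certificates
c_rehash /etc/ssl/certs/
# define the proxy servers in /etc/sysconfig/proxy, for example:
#   PROXY_ENABLED="yes"
#   HTTP_PROXY="http://proxy.corp.example.com:3128"
#   HTTPS_PROXY="http://proxy.corp.example.com:3128"
# then reboot the appliance
reboot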


To test that the new configuration works, I used the wget command, and everything seemed to work smoothly.
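The check itself was nothing fancy, something along these lines (the test URL is just an example of an external HTTPS site, and it assumes the shell session has picked up the proxy variables from /etc/sysconfig/proxy after the reboot):

wget -O /dev/null https://www.vmware.com && echo "proxy connectivity OK"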


Despite all these changes, I still got error messages from the vSAN Build Recommendation Engine Health test, but this time they looked a bit different:

INFO vsan-health[Thread-11125] [VsanPyVmomiProfiler::InvokeAccessor] Invoke: mo=ServiceInstance, info=content

WARNING vsan-health[Thread-11125] [VsanPhoneHomeWrapperImpl::_try_connect] Cannot connect to VUM. Will retry connection

INFO vsan-health[Thread-11125] [VsanPyVmomiProfiler::InvokeAccessor] Invoke: mo=group-d1, info=name

INFO vsan-health[Thread-11125] [VsanPyVmomiProfiler::InvokeAccessor] Invoke: mo=group-d1, info=name

ERROR vsan-health[Thread-11125] [VsanCloudHealthDaemon::run] VsanCloudHealthSenderThread exception: Exception: HTTP Error 411: Length Required, Url: https://vcsa.vmware.com/ph/api/dataapp/send?_v=1.0&_c=VsanCloudHealth.6_5&_i=<Support_Tag>, Traceback: Traceback (most recent call last):
  File "/usr/lib/vmware-vpx/vsan-health/pyMoVsan/VsanCloudHealthUtil.py", line 511, in getResponse
    resp = proxyOpener.open(*args, **kwargs)
  File "/usr/lib/python2.7/urllib2.py", line 435, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 548, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 473, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 556, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 411: Length Required
Traceback (most recent call last):
  File "/usr/lib/vmware-vpx/vsan-health/pyMoVsan/VsanCloudHealthDaemon.py", line 353, in run
    self._sendCloudHealthData(clusterUuid, data=data)
  File "/usr/lib/vmware-vpx/vsan-health/pyMoVsan/VsanCloudHealthDaemon.py", line 321, in _sendCloudHealthData
    objectId=clusterUuid, additionalUrlParams=additionalUrlParams)
  File "/usr/lib/vmware-vpx/vsan-health/pyMoVsan/VsanCloudHealthConnector.py", line 156, in send
    dataType=dataType, pluginType=pluginType, url=postUrl)
  File "/usr/lib/vmware-vpx/vsan-health/pyMoVsan/VsanCloudHealthConnector.py", line 139, in sendRawData
    raise ex
VsanCloudHealthHTTPException: Exception: HTTP Error 411: Length Required, Url: https://vcsa.vmware.com/ph/api/dataapp/send?_v=1.0&_c=VsanCloudHealth.6_5&_i=<Support_Tag>, Traceback: Traceback (most recent call last):
  File "/usr/lib/vmware-vpx/vsan-health/pyMoVsan/VsanCloudHealthUtil.py", line 511, in getResponse
    resp = proxyOpener.open(*args, **kwargs)
  File "/usr/lib/python2.7/urllib2.py", line 435, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 548, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 473, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 556, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 411: Length Required

INFO vsan-health[Thread-11125] [VsanCloudHealthDaemon::run] VsanCloudHealthSenderThread done.

vsan-health[Thread-9] [VsanCloudHealthDaemon::_sendExceptionsToPhoneHome] Exceptions for collection/sending exceptions

I thought that the vSAN Health service might be trying to contact vSphere Update Manager directly, and that the proxy settings defined at the OS level redirected this request to the Internet proxy instead.

I added the local domain to the exception list in /etc/sysconfig/proxy and rebooted the server again.


After reading about ‘HTTP Error 411’, the only remaining idea was to add a domain service account and its password to the HTTP_PROXY and HTTPS_PROXY lines in /etc/sysconfig/proxy. If the password contains special characters, they have to be URL-encoded (for example, @ becomes %40) for the proxy authentication to work correctly.
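For illustration only (the account name, password, proxy host and domain below are made up), the resulting /etc/sysconfig/proxy ended up looking roughly like this, with the special character in the password URL-encoded:

PROXY_ENABLED="yes"
HTTP_PROXY="http://svc_vsan:P%40ssw0rd@proxy.corp.example.com:3128"
HTTPS_PROXY="http://svc_vsan:P%40ssw0rd@proxy.corp.example.com:3128"
NO_PROXY="localhost, 127.0.0.1, .corp.example.com"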

To my great surprise, all communication issues were resolved, and the vSAN Health service was able to synchronise data with vSphere Update Manager and the online services correctly.



A few minutes later vSAN system baselines and baseline groups appeared in vSphere Update Manager.


Of course, those modifications to the Photon OS configuration files are not supported by VMware and could be overwritten by future updates. Yet I hope engineers and developers are working on better integration between vSAN Health and vSphere Update Manager when vCenter resides behind a proxy.

23/02/2018 – Update 1: Per VMware documentation, a starting point for troubleshooting connectivity to the CEIP web service is to make sure its prerequisites are met, in particular that vCenter Server can reach the CEIP endpoint at vcsa.vmware.com over HTTPS (through the proxy, in this case).
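A quick, rough way to check that from the appliance shell (assuming curl is available and using a placeholder proxy address) is to request the phone-home endpoint seen in the log excerpts above; the HTTP status code matters less than seeing the TLS handshake complete through the proxy:

curl -v -x http://proxy.corp.example.com:3128 -o /dev/null https://vcsa.vmware.com/ph/api/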

vSAN 6.5: Virtual Machine with more than 64GB memory fails to Storage vMotion to vSAN cluster

VMware has just posted an article on the Virtual Blocks blog that describes this behaviour. It happens only when trying to Storage vMotion a virtual machine with a swap file larger than 64GB to a vSAN datastore.

The task fails and generates the following error messages:


There are two possible workarounds available: either increase the maximum swap file size on the destination ESXi host or set a memory reservation on the virtual machine. The former is preferable, as it does not require a host reboot.

VMware provides a KB 2150316 with “more log samples and specifics for identifying the issue as a cause of a migration failure”.

vSAN 6.5-6.6.1: An urgent hotfix ESXi650-201710401

VMware has just released a new hotfix for ESXi and vSAN (KB 2151081), urging customers running all-flash configurations with deduplication enabled to upgrade their environments as soon as possible. This patch resolves a data corruption issue that might appear in rare circumstances.


The affected versions of vSAN include 6.5, 6.6, and 6.6.1.

06-10-2017 – Update 1: As listed in KB 2151042, a similar issue has been fixed for ESXi 6.0.

vSAN 6.6.1: vSAN Build Recommendation Engine Health fails

As you might already know, vSAN 6.6.1 is the first release to provide automated build recommendations for vSAN clusters through vSphere Update Manager, which should help keep your hardware in a supported state by comparing information from the VMware Compatibility Guide and the vSAN Release Catalog with information about the installed ESXi releases.

Obviously, this feature requires vSAN to have Internet access to update release metadata, as well as valid My VMware credentials to download ISO images for upgrades.

To help customers enable vSAN build recommendations, VMware embedded health checks into vSAN 6.6.x that help resolve configuration issues. The build recommendation engine health check detects the following states:

  • Internet access is unavailable.
  • vSphere Update Manager (VUM) is disabled or is not installed.
  • VUM is not responsive.
  • vSAN release metadata is outdated.
  • My VMware login credentials are not set.
  • My VMware authentication failed.
  • Unexpected VUM baseline creation failure.

If the virtual environment sits behind a proxy, you should configure proxy settings in the Internet Connectivity option in vSAN_Cluster > Configure > vSAN > General.

vSAN Health Engine Issue - 02

Those parameters are kept in /etc/vmware-vsan-health/config.conf. Be careful with the user password, as it is written to this file without any encryption.
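You can quickly verify what the Web Client wrote there from the appliance shell (the exact key names may vary between builds, but the proxy host, user and password are plainly visible):

grep -i proxy /etc/vmware-vsan-health/config.conf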

To test access through the proxy, you can click the Get latest version online button in vSAN_Cluster > Configure > vSAN > Health and Performance to update the HCL database. If everything is set up correctly, it will generate the following lines in /var/log/vmware/vsan-health/vmware-vsan-health-service.log:

INFO vsan-health[ID] [<user_name> op=UpdateHclDbFromWeb obj=VsanHealthService] Update HCL database from Web
INFO vsan-health[ID] [VsanHclUtil::_getHttpResponse] Download via proxy
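An easy way to watch for these entries while clicking the button is to follow the log from the appliance shell, for example:

tail -f /var/log/vmware/vsan-health/vmware-vsan-health-service.log | grep -iE "hcl|proxy"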

However, even if the Internet connection works, the vSAN Build Recommendation Engine Health test will produce a warning message as follows:

vSAN Health Engine Issue - 01

In the log file you will see lines like these:

WARNING vsan-health[healthThread-c3ad57ea-a3f1-11e7] [VsanCloudHealthUtil::checkNetworkConnection] Internet is not connected.

  File "/usr/lib/vmware-vpx/vsan-health/pyMoVsan/VsanCloudHealthDaemon.py", line 337, in run
  File "/usr/lib/vmware-vpx/vsan-health/pyMoVsan/VsanCloudHealthDaemon.py", line 279, in collectedResults
  File "/usr/lib/vmware-vpx/vsan-health/pyMoVsan/VsanCloudHealthDaemon.py", line 230, in updateManifestWithPerCluster
  File "/usr/lib/vmware-vpx/vsan-health/pyMoVsan/VsanCloudHealthDaemon.py", line 190, in _updateManifest
    manifestVersion = cls._queryManifestVersion()
  File "/usr/lib/vmware-vpx/vsan-health/pyMoVsan/VsanCloudHealthDaemon.py", line 174, in _queryManifestVersion
    dataType='manifest_version', objectId=MANIFEST_VERSION_UUID)
  File "/usr/lib/vmware-vpx/vsan-health/pyMoVsan/VsanCloudHealthConnector.py", line 209, in getClusterHealth
    maxRetries=maxRetries, waitInSec=waitInSec)
  File "/usr/lib/vmware-vpx/vsan-health/pyMoVsan/VsanCloudHealthConnector.py", line 247, in getObject
    responseBody = self._getPhoneHomeResultsWithRetries(urlParams)
  File "/usr/lib/vmware-vpx/vsan-health/pyMoVsan/VsanCloudHealthConnector.py", line 279, in _getPhoneHomeResultsWithRetries
    raise e
VsanCloudHealthConnectionException: <urlopen error [Errno 110] Connection timed out>

Apparently, it is a bug in the current version of vSAN that is documented in VMware KB 2151692. Neither a fix nor a workaround is available at the time of writing this blog post.

07/02/2018 – Update 1: A workaround to resolve this issue has been found.

vSphere 6.x: The beauty and ugliness of the Content Library – Part 1

The title of this blog post seems to be a bit provocative, and this has been done for a reason.

I believe many VMware engineers, including myself, were really excited about the Content Library feature introduced in vSphere 6.0. The product itself is not completely new for VMware, as it merges code from the content management feature of vCloud Director.

In What’s New in the VMware vSphere 6.0 Platform whitepaper, VMware states the following:

“The Content Library… centrally manages virtual machine templates, ISO images, and scripts, and it performs the content delivery of associated data from the published catalog to the subscribed catalog at other sites.”

Sounds really cool! Now we can centralise all objects that previously resided on different datastores in one place, and manage them from the vSphere Web Client.

In vSphere 6.5, VMware continues improving and polishing this feature:

“Administrators can now mount an ISO directly from the Content Library, apply a guest OS customization specification during VM deployment, and update existing templates.”

However, this article is not only about embracing the tool provided. 🙂 I would like to share with you three specific examples of when it doesn’t work as expected, along with possible workarounds.

Issue #1 – Provisioning a virtual machine template with the advanced parameters

Affected platform: vSphere 6.0 prior to Update 3.

It was a great surprise to learn that provisioning a virtual machine from a VM template that has advanced parameters set can cause problems in vSphere 6.0. Although the provisioning operation starts as expected, it ends with the error message “Failed to deploy OVF package”.


Unfortunately, the Error Report in the vSphere Web Client does not clarify the root cause of this event.


After contacting VMware GSS about this issue (SR # 16255562909) in late 2016, I was advised that this bug would be addressed in vSphere 6.0 Update 3.

In March 2017, I updated my environment to this version and tested the feature; VM creation worked smoothly. So it took VMware almost two years after the Content Library feature became generally available to fix it.

Thankfully, vSphere 6.5 does not have this problem at all.

Resolution: Update your environment to vSphere 6.0 Update 3 or a newer version.

Issue #2 – Provisioning a virtual machine from the Content Library on the vSAN datastore

Affected platform: vSphere 6.5 Standard.

The issue is not related to the Content Library directly, but rather to OVA/OVF provisioning. For some reason, when you create a new VM from a template in vSphere 6.5, it triggers the “Call DRS for cross vMotion placement recommendations” task.

If you use vSphere 6.5 Standard, for which the DRS feature is not available, this task fails with the error message “The operation failed due to The operation is not allowed in the current state.”



The Error Report in the vSphere Web Client looks similar to the one in the picture below.


In the Known Issues in VMware vSAN 6.6 Release Notes, the vendor states the following:

VM OVF deploy fails if DRS is disabled
If you deploy an OVF template on the vSAN cluster, the operation fails if DRS is disabled on the vSAN cluster. You might see a message similar to the following: The operation is not allowed in the current state.

Workaround: Enable DRS on the vSAN cluster before you deploy an OVF template.

After some troubleshooting and trying different scenarios, the only relevant difference I was able to identify in the provisioning task was the VM storage policy. Regardless of how the VM creation was initiated (from an OVA/OVF file or a Content Library template), it was the Virtual SAN Default Storage Policy that caused DRS to be called for a cross vMotion placement check.

For example, if you set the VM storage policy in the Select storage dialogue box to “None”, the OVA/OVF file can be provisioned on the vSAN datastore.


The same happens for the VM template from the Subscribed Content Library when the VM storage policy is “None”.

Unfortunately, this trick doesn’t work with the templates in the Local Content Library.

So I decided to dig a bit deeper into the Content Library structure to see if anything could be done there.

The Content Library keeps its data in the contentlib-GUID folder. Each template has its own subfolder with a unique name. Inside the subfolder, there are a few files: a descriptor (*.ovf) and one or more data files (*.vmdk).

In vSphere 6.0, those files are named descriptor_GUID.ovf and disk-vdcs-Disk_Number_GUID.vmdk.

With vSphere 6.5, the file names are self-explanatory: Template_Name_GUID.ovf and Template_Name-Disk_Number_GUID.vmdk.


I compared the descriptor files for the VM templates in the Local and Subscribed Content Libraries and found they had different vmw:name values in the StorageGroupSection. For the Local Content Library it was “Virtual SAN Default Storage Policy”, while for the subscribed one it was a different policy name.


This led me to the idea of changing the descriptor for the VM template in the Local Content Library, so I could provision the VMs using one of the workarounds below.


  • When provisioning from an OVA/OVF file, set the VM storage policy in the Select storage dialogue box to “None”;
  • You can provision from the Subscribed Content Library if it holds VM templates with a VM storage policy different from the “Virtual SAN Default Storage Policy”. Set the VM storage policy in the Select storage dialogue box to “None”;
  • You can provision from the Local Content Library if you edit the descriptor file for the VM template and replace the “Virtual SAN Default Storage Policy” with something else (see the sketch right after this list). Set the VM storage policy in the Select storage dialogue box to “None”.
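For the last workaround, the descriptor edit itself can be done from a shell that has access to the library folder on the backing datastore; a rough sketch only (the datastore name, library GUID, template file name and the replacement policy name are placeholders, and keeping a backup of the descriptor is a good idea):

# locate the template descriptor inside the Content Library folder on the backing datastore
find /vmfs/volumes/<Datastore_Name>/contentlib-<GUID> -name "*.ovf"
# back up the descriptor, then replace the storage policy name referenced in the StorageGroupSection
cp Template_Name_GUID.ovf Template_Name_GUID.ovf.bak
sed -i 's/Virtual SAN Default Storage Policy/Some_Other_Policy/' Template_Name_GUID.ovf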

Resolution: A support case has been opened, and I am waiting for VMware to resolve this issue. The fix is expected in vSphere 6.5 Update 1 (please refer to SR # 17393663302 when contacting VMware GSS for future updates).

To be continued

Configuring PERC H730/730p cards for VMware vSAN 6.x

One of the necessary steps to create a new VMware vSAN cluster is to configure the RAID controller.

I have found Joe’s post about setting up Dell PERC H730 cards very informative and easy to follow. However, the latest generation of Dell PowerEdge servers has a slightly different configuration interface, so I decided to document the configuration process using the BIOS graphical interface.

You can get into it either by pressing the F2 key during server boot or by choosing the BIOS Setup option in the Next Boot drop-down menu in the iDRAC Virtual Console.


The next step is to click on Device Settings and select the RAID controller from the list of available devices.



There are two configuration pages we are interested in:

  • Controller Management > Advanced Controller Management
  • Controller Management > Advanced Controller Properties.

The former gives us the ability to switch from RAID mode to HBA mode.


The latter allows disabling the controller caching and setting the BIOS Boot mode.


Please note that a system reboot is required for the changes to take effect. It is always a good idea to double-check that the parameters above were set up correctly.
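After the reboot, a quick sanity check from the ESXi shell is to confirm that the local disks behind the controller are visible and eligible for vSAN, for example:

# list local devices and their vSAN eligibility status
vdq -q
# list the storage adapters to confirm the controller and its driver
esxcli storage core adapter list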