VMware Tools 10.3.0: Additional considerations when updating Windows machines

Last week VMware released VMware Tools version 10.3.0.

VMware-Tools-1030-00

Not only does it include new features and a security update to address an out-of-bounds read vulnerability, it also introduces a significant change in the way VMware Tools installs on Windows operating systems. In some circumstances, a VMware Tools 10.3.0 installation or upgrade can even fail, as documented in VMware KB 55798.

Please bear in mind that the installer size for Windows has almost doubled with this release (for example, the 64-bit version is 72.6 MB compared to 47.4 MB for version 10.2.5), as it now includes the following additional software:

  • Microsoft Visual C++ 2017 Redistributable packages (both 32- and 64-bit versions),

VMware-Tools-1030-02

  • VMware AppDefense (disabled by default).

VMware-Tools-1030-03

As a result, VMware Tools requires the operating system to be rebooted at least once before actually proceeding with the software install or upgrade.

VMware-Tools-1030-01

To reduce the maintenance window, the vendor’s recommendations are as follows:

  • Upgrade Windows with the latest service pack available from Microsoft, and install the Microsoft Visual C++ 2017 Redistributable manually before installing or upgrading to VMware Tools 10.3.x.

Note: If installing the Microsoft Visual C++ 2017 Redistributable is not possible, consider installing Windows update KB2999226 manually to reduce the need for a system restart on versions earlier than Windows 10.

  • When the VMware Tools installation or upgrade is invoked with the REBOOT=ReallySuppress argument and a system restart is required to complete the Microsoft Visual C++ 2017 Redistributable install, re-attempt the VMware Tools installation or upgrade after restarting the Windows system. A vSphere client can detect this situation by noticing no change in the VMware Tools version and guestinfo.toolsInstallErrCode=3010 in the guest variables or in the advanced configuration of the virtual machine.

Note: When the VMware Tools installation or upgrade is invoked without any arguments, a system restart may occur automatically to complete the Microsoft Visual C++ 2017 Redistributable install. After the Windows system restarts, re-attempt the VMware Tools installation or upgrade.
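The recommendation above boils down to a simple decision on the installer’s exit status. A minimal sketch of that logic (the helper name is mine; the 3010 value follows the standard Windows installer convention, ERROR_SUCCESS_REBOOT_REQUIRED, and matches the guestinfo.toolsInstallErrCode value mentioned earlier):

```python
# Sketch: what an automation wrapper could do with the VMware Tools
# installer exit status. 3010 is the standard Windows installer code
# ERROR_SUCCESS_REBOOT_REQUIRED, matching guestinfo.toolsInstallErrCode=3010.
def next_action(err_code: int) -> str:
    if err_code == 0:
        return "done"               # install/upgrade completed
    if err_code == 3010:
        return "reboot-then-retry"  # restart Windows, then rerun the installer
    return "investigate"            # any other code: inspect the installer logs
```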

Also, with this update VMware has enabled receive data ring support in the VMXNET3 driver for Windows. Thorough testing is required to understand how this change affects virtual NIC performance.

vCenter Converter 6.2: Error 1053: The service did not respond to the start or control request in a timely fashion

I couldn’t believe something as simple as a vCenter Converter Agent installation would cause me a headache. The service was properly registered on the remote machine; however, it was failing to start with the following error message:

Converter-Issue-01

For a moment I wondered whether the VMware vCenter Converter Standalone Agent service had unmet dependencies, but that assumption proved false.

Converter-Issue-02

Looking for any hint from the community, I thought it might be worth checking the release notes for VMware vCenter Converter Standalone. To my great surprise, this issue has been documented as a known issue:

Converter-Issue-03

Who would imagine that Internet access could be a requirement for this tool?

Thankfully, it can be fixed with a workaround provided by VMware:

  1. Open HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control in the Windows Registry Editor and create or modify a ServicesPipeTimeout DWORD32 value, setting it to a decimal value of at least 300000 (5 minutes).
  2. Restart the system for the changes to take effect.

vSphere 6.0: The operation failed due to The operation is not allowed in the current state.

While working on a PowerCLI script to automate VM provisioning in vSphere 6.0, I ran across the following error message:

New-VM-Issue-02

The script was simply creating a new virtual machine from the Content Library template.

New-VM -Name $vmName -VMHost $vmHost -Location $vmLocation -Datastore $vmDatastore -ContentLibraryItem $vmTemplate

Investigating this behaviour further, I found that it was related to a failing cross vMotion placement. In the vSphere Client, those errors looked like this:

New-VM-Issue-01

It was then that I concluded it might be related to DRS being disabled for that cluster. Setting DRS to Manual mode resolved the issue.

The next step was to check whether this problem could be reproduced for other templates. Surprisingly, the VM provisioning from some of them went smoothly, regardless of the DRS status.

Remembering my previous articles (here and here) related to the Content Library, I decided to look into the VM configuration files (*.ovf) for both templates and compare them. The only difference was the following storage settings, which exist only in the faulty templates:

<vmw:StorageGroupSection ovf:required="false" vmw:id="group1" vmw:name="EMC VNX VM Storage Policy">
<Info>Storage policy for group of disks</Info>
<vmw:Description>The EMC VNX VM Storage Policy storage policy group</vmw:Description>
</vmw:StorageGroupSection>

<vmw:StorageSection ovf:required="false" vmw:group="group1">
<Info>Storage policy group reference</Info>
</vmw:StorageSection>

As expected, removing those blocks of data from the OVF file and re-uploading it to the Content Library resolved the provisioning issue completely.
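With many templates to fix, the OVF clean-up can be scripted rather than edited by hand. A minimal sketch, assuming the vmw prefix maps to the standard http://www.vmware.com/schema/ovf namespace (the helper name is mine):

```python
# Sketch: remove vmw:StorageGroupSection and vmw:StorageSection elements
# from an OVF descriptor. Assumes the standard vmw schema URI.
import xml.etree.ElementTree as ET

VMW_NS = "http://www.vmware.com/schema/ovf"

def strip_storage_policy(ovf_xml: str) -> str:
    root = ET.fromstring(ovf_xml)
    doomed = {f"{{{VMW_NS}}}StorageGroupSection", f"{{{VMW_NS}}}StorageSection"}
    # ElementTree elements keep no parent pointers, so walk every element
    # and drop matching children in place.
    for parent in list(root.iter()):
        for child in [c for c in list(parent) if c.tag in doomed]:
            parent.remove(child)
    return ET.tostring(root, encoding="unicode")
```

Note that if the package ships with a manifest (*.mf), its checksum for the .ovf file will no longer match after this edit, so the manifest needs to be updated or removed as well.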

A couple of notes:

  • There is no way to identify any problems with the VM configuration using the Content Library configuration page or the ‘New VM from Library…’ dialogue in the vCenter Web Client.
  • Even if a modified OVF file has been uploaded to the local Content Library, the change will not propagate to the subscribed libraries. You need to repeat this process manually for each subscribed Content Library.

vSAN 6.6.1: Replacing a faulty NIC on Dell PowerEdge server

Not long ago I noticed a 10G network interface flapping on one of the vSAN nodes.

vSAN-NIC-issue-01

I immediately started investigating. Interestingly, this device was part of the server’s integrated NIC, and it alone was generating connection errors.

vSAN-NIC-issue-02

After consulting with the network team, we found the following:

  • The interface was up,
  • The switch port had no incoming traffic (it could send to the server but was not receiving from it),
  • No errors were recorded,
  • The SFP signals were good.

The plan of attack was to replace the SFPs – first on the server and, if that didn’t help, on the switch. During this operation we found that the SFP on the server side was unusually warm. Unfortunately, replacing the SFPs didn’t help, and after approximately 15 minutes of complete silence the disconnects continued.
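For the record, quantifying such flapping from exported host logs is easy to script. A rough sketch; the "link down" wording below is a simplified assumption, not the exact vmkernel.log message format:

```python
# Sketch: tally link-down events per NIC alias from exported log lines.
# The "vmnicN: link down" wording is an assumption for illustration,
# not the exact vmkernel.log format.
import re
from collections import Counter

LINK_DOWN = re.compile(r"(vmnic\d+): link down")

def count_flaps(log_lines):
    """Return a Counter of link-down events per NIC alias."""
    flaps = Counter()
    for line in log_lines:
        match = LINK_DOWN.search(line)
        if match:
            flaps[match.group(1)] += 1
    return flaps
```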

The next move was to contact a vendor. In our case, it was Dell EMC.

We lodged a support request and sent the SupportAssist collection to them. The response from Support was to replace the embedded NIC with a new one. Considering the server was in use, it all sounded tricky to me.

However, thanks to the algorithm that has assigned device names to I/O devices since ESXi 5.5, it all went smoothly. VMware states the following:

vSAN-NIC-issue-03

The number of ports on the embedded NIC hadn’t changed. As a result, the hypervisor assigned the same aliases to the onboard ports.

ESXi initialised the new ports, and the vSAN configuration was updated successfully without any human interaction.

As a bonus, when the server was booting after the card replacement, the Lifecycle Controller detected an older firmware version on the device and automatically initiated a firmware update.

vSAN-NIC-issue-04

All in all, I am impressed by how robust modern platforms from both Dell EMC and VMware are.

ESXi 6.5: Retrieve IPMI SEL request to host failed [FIXED BY VENDOR]

From time to time you might want to check the host hardware health manually in Monitor > Hardware Health (vSphere Client) or Monitor > Hardware Status (vSphere Web Client).

For many months this functionality was broken for ESXi 6.5 on Dell EMC servers.

vSphere Web Client - IPMI Error

When opening the Sensors page, vpxd.log shows the following message:

info vpxd[7FBE59924700] [Originator@6876 sub=vpxLro opID=dam-auto-generated: HardwareStatusViewMediator:dr-425:CimMonitorPropertyProvider:200359:14133-31991-ngc:70004153-e9] [VpxLRO] — BEGIN task-35318 — healthStatusSystem-34 — vim.host.HealthStatusSystem.FetchSystemEventLog

error vpxd[7FBE59924700] [Originator@6876 sub=MoHost opID=dam-auto-generated: HardwareStatusViewMediator:dr-425:CimMonitorPropertyProvider:200359:14133-31991-ngc:70004153-e9] No Content-Length header, WSMan IPMI SEL operation failed

info vpxd[7FBE59924700] [Originator@6876 sub=MoHost opID=dam-auto-generated: HardwareStatusViewMediator:dr-425:CimMonitorPropertyProvider:200359:14133-31991-ngc:70004153-e9] WSMan Msg size 59: part:401 Unauthorized
–> WWW-Authenticate: Basic realm="OPENWSMAN")l▒\x7f

warning vpxd[7FBE59924700] [Originator@6876 sub=Default opID=dam-auto-generated: HardwareStatusViewMediator:dr-425:CimMonitorPropertyProvider:200359:14133-31991-ngc:70004153-e9] Closing Response processing in unexpected state: 3

info vpxd[7FBE59924700] [Originator@6876 sub=vpxLro opID=dam-auto-generated: HardwareStatusViewMediator:dr-425:CimMonitorPropertyProvider:200359:14133-31991-ngc:70004153-e9] [VpxLRO] — FINISH task-35318

info vpxd[7FBE59924700] [Originator@6876 sub=Default opID=dam-auto-generated: HardwareStatusViewMediator:dr-425:CimMonitorPropertyProvider:200359:14133-31991-ngc:70004153-e9] [VpxLRO] — ERROR task-35318 — healthStatusSystem-34 — vim.host.HealthStatusSystem.FetchSystemEventLog: vmodl.fault.SystemError:
–> Result:
–> (vmodl.fault.SystemError) {
–> faultCause = (vmodl.MethodFault) null,
–> faultMessage = <unset>,
–> reason = "Retrieve IPMI SEL request to host failed"
–> msg = “”
–> }
–> Args:
–>
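When many hosts are affected, scanning collected vpxd.log files for this failure can be scripted. A minimal sketch (the function name is mine; the marker strings are taken verbatim from the excerpt above):

```python
import re

# Marker strings come from the vpxd.log excerpt above.
SEL_FAILURE = "Retrieve IPMI SEL request to host failed"
TASK_RE = re.compile(r"ERROR (task-\d+)")

def failed_sel_tasks(log_text: str) -> list:
    """Return vpxd task IDs from log blocks mentioning the IPMI SEL failure."""
    tasks = []
    for block in log_text.split("\n\n"):
        if SEL_FAILURE in block:
            tasks.extend(TASK_RE.findall(block))
    return tasks
```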

Many people pointed to vpxa.cfg (here and here) as the source of the error:

<log>
<level>verbose</level>
<maxFileNum>10</maxFileNum>
<maxFileSize>1048576</maxFileSize>
<memoryLevel>verbose</memoryLevel>
<outputToConsole>false</outputToConsole>
<outputToFiles>false</outputToFiles>
<outputToSyslog>true</outputToSyslog>
<syslog>
<facility>local4</facility>
<ident>Vpxa</ident>
<logHeaderFile>/var/run/vmware/vpxaLogHeader.txt</logHeaderFile>
</syslog>
</log>

It was not the end of the world, and I didn’t want to edit the default log levels manually, so the issue was ignored for a while.

To my great surprise, it all went back to normal after updating the hypervisor to the latest version using the Dell EMC customised VMware ESXi 6.5 U1 A10 image.

Now, we can see multiple events in vpxd.log generated by VpxLRO:

info vpxd[7FBE58B08700] [Originator@6876 sub=vpxLro opID=combined(dam-auto-generated: ObjectTabbedViewMediator:dr-519,dam-auto-generated: ObjectPropertyFilter:dr-521):01-e6] [VpxLRO] — BEGIN lro-490638 — ResourceModel — cis.data.provider.ResourceModel.query

info vpxd[7FBE58B08700] [Originator@6876 sub=vpxLro opID=combined(dam-auto-generated: ObjectTabbedViewMediator:dr-519,dam-auto-generated: ObjectPropertyFilter:dr-521):01-e6] [VpxLRO] — FINISH lro-490638

info vpxd[7FBE58B08700] [Originator@6876 sub=vpxLro opID=combined(dam-auto-generated: ObjectPropertyFilter:dr-529,dam-auto-generated: ObjectPropertyFilter:dr-533):01-86] [VpxLRO] — BEGIN lro-490639 — ResourceModel — cis.data.provider.ResourceModel.query

info vpxd[7FBE58B08700] [Originator@6876 sub=vpxLro opID=combined(dam-auto-generated: ObjectPropertyFilter:dr-529,dam-auto-generated: ObjectPropertyFilter:dr-533):01-86] [VpxLRO] — FINISH lro-490639

info vpxd[7FBE5B45A700] [Originator@6876 sub=vpxLro opID=dam-auto-generated: ObjectPropertyFilter:dr-529:AssociationHostSystemAdapter:200359:14388-32550-ngc:70004210-ce] [VpxLRO] — BEGIN lro-490640 — HostProfileManager — vim.profile.ProfileManager.findAssociatedProfile

info vpxd[7FBE5B45A700] [Originator@6876 sub=vpxLro opID=dam-auto-generated: ObjectPropertyFilter:dr-529:AssociationHostSystemAdapter:200359:14388-32550-ngc:70004210-ce] [VpxLRO] — FINISH lro-490640

info vpxd[7FBE5A236700] [Originator@6876 sub=vpxLro opID=dam-auto-generated: RelatedItemsManager:dr-535:01-78] [VpxLRO] — BEGIN lro-490641 — ResourceModel — cis.data.provider.ResourceModel.query

info vpxd[7FBE5A236700] [Originator@6876 sub=vpxLro opID=dam-auto-generated: RelatedItemsManager:dr-535:01-78] [VpxLRO] — FINISH lro-490641

info vpxd[7FBE5A236700] [Originator@6876 sub=vpxLro opID=dam-auto-generated: HardwareStatusViewMediator:dr-545:01-d9] [VpxLRO] — BEGIN lro-490642 — ResourceModel — cis.data.provider.ResourceModel.query

info vpxd[7FBE5A236700] [Originator@6876 sub=vpxLro opID=dam-auto-generated: HardwareStatusViewMediator:dr-545:01-d9] [VpxLRO] — FINISH lro-490642

info vpxd[7FBE5ACCB700] [Originator@6876 sub=vpxLro opID=urn:vmomi:HostSystem:host-28:9a78adfb-4c75-4b84-8d9a-65ab2cc71e51.properties:01-c1] [VpxLRO] — BEGIN lro-490643 — ResourceModel — cis.data.provider.ResourceModel.query

info vpxd[7FBE5ACCB700] [Originator@6876 sub=vpxLro opID=urn:vmomi:HostSystem:host-28:9a78adfb-4c75-4b84-8d9a-65ab2cc71e51.properties:01-c1] [VpxLRO] — FINISH lro-490643

info vpxd[7FBE5A53C700] [Originator@6876 sub=vpxLro opID=dam-auto-generated: HardwareStatusViewMediator:dr-545:CimMonitorPropertyProvider:200359:14395-32555-ngc:70004212-2b] [VpxLRO] — BEGIN task-35322 — healthStatusSystem-28 — vim.host.HealthStatusSystem.FetchSystemEventLog

info vpxd[7FBE5A53C700] [Originator@6876 sub=vpxLro opID=dam-auto-generated: HardwareStatusViewMediator:dr-545:CimMonitorPropertyProvider:200359:14395-32555-ngc:70004212-2b] [VpxLRO] — FINISH task-35322

As a result, the ‘Refresh hardware IPMI System Event Log’ task now completes successfully.

vSphere Web Client - IPMI Success