URGENT: VMware Tools 10.3.0 has been recalled

VMware has just announced that VMware Tools 10.3.0 has been recalled due to a functional issue affecting ESXi 6.5.

VMware-Tools-1030-Issue

As per KB 57796, the VMXNET3 driver released with VMware Tools 10.3.0 can result in a Purple Diagnostic Screen (PSOD) or guest network connectivity loss in certain configurations. Those configurations include:

  • VMware ESXi 6.5 hosts
  • VM Hardware version 13
  • Windows 8/Windows Server 2012 or higher guest operating system (OS).

As a workaround, VMware recommends uninstalling VMware Tools 10.3.0 and then reinstalling VMware Tools 10.2.5 for affected systems.
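To quickly identify which VMs report the recalled release, you can query the VMware Tools version with PowerCLI. The sketch below is my own illustration (the reported ToolsVersion format can vary by guest OS and vSphere version):

# List each VM with its reported VMware Tools version and status
Get-VM | Select-Object Name,
    @{N='ToolsVersion'; E={$_.Guest.ToolsVersion}},
    @{N='ToolsStatus'; E={$_.ExtensionData.Guest.ToolsVersionStatus}}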

The vendor will be releasing a revised version of the VMware Tools 10.3 family at some point in the future.

More information is available in VMSA-2018-0017.

vSphere 6.5: Switching to Native Drivers in ESXi 6.5

The Native Device Driver architecture is nothing new. Since its introduction more than five years ago, VMware has encouraged its hardware ecosystem partners to develop native drivers. The list of supported hardware grows with every major release of ESXi, and the company aims to deprecate the vmkLinux APIs and the associated driver ecosystem completely in future releases of vSphere.

The benefits of using the native drivers are as follows:

  • It removes the complexity of developing and maintaining Linux-derived drivers,
  • It improves system performance,
  • It frees ESXi from the functional limitations of Linux-derived drivers,
  • It increases the stability and reliability of the hypervisor, as native drivers are designed specifically for VMware ESXi.

That said, one of the steps when upgrading to a new version of vSphere is to check whether the hardware supports native drivers. By default, if ESXi identifies a native driver for a device, the native driver is loaded instead of the Linux-derived one. However, this is not always the case, and you need to check whether native drivers are in use after the system upgrade.

Following the steps in KB 1031534 and KB 1034674, you can pinpoint PCI devices and the corresponding drivers loaded for each of them:

  • To identify a storage HBA (such as a Fibre Channel card or RAID controller), run this command:

# esxcfg-scsidevs -a

  • To identify a network card, run this command:

# esxcfg-nics -l

  • To list device state and note the hardware IDs, run this command:

# vmkchdev -l
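The same information can be collected remotely via PowerCLI. Below is a minimal sketch (my own illustration; the host name is a placeholder, and it assumes an active Connect-VIServer session):

# Network adapters with the driver loaded for each of them
$esxcli = Get-VMHost -Name '<host_name>' | Get-EsxCli
$esxcli.network.nic.list() | Select-Object Name, Driver, Description

# Storage adapters with their drivers
$esxcli.storage.core.adapter.list() | Select-Object HBAName, Driver, Description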

The /etc/vmware/default.map.d/ folder on the ESXi host contains a full list of map files referring to the native drivers available for your system.

ESXi-Native-Driver-01

To quickly identify the driver version, you can run this command:

# esxcli software vib list | grep <native_driver_name>

In addition, information about installed vSphere Installation Bundles (VIBs) in vSphere 6.5 can be found via the web clients or a PowerCLI session:

  • To view all installed VIBs in the vSphere Client (HTML5), open the Configure > System > Packages tab in the host settings:

ESXi-Native-Driver-02

  • To view all installed VIBs in the VMware Host Client, open the Manage > Packages tab in the host settings:

ESXi-Native-Driver-03

  • To list all installed VIBs in PowerCLI, run this command:

# (Get-VMHost -Name '<host_name>' | Get-EsxCli).software.vib.list() | select Name,Vendor,Version | sort Name

By comparing the findings above with the information in the IO Devices section of the VMware Hardware Compatibility List, you can find out whether native drivers are available for your devices, as well as the recommended combination of driver and firmware tested and supported by VMware.

It is worth reading the release notes for the corresponding drivers and searching for any references to them on VMware's and the third-party vendors' websites, in case there are known issues or limitations that might affect how the device functions.

If everything looks good, it is time to enable the native driver following the steps in KB 2147565:

# esxcli system module set --enabled=true --module=<native_driver_name>
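Before rebooting, you can verify the module state. A hedged PowerCLI equivalent (my own illustration; names are placeholders):

# Confirm the native module is now enabled on the host
(Get-VMHost -Name '<host_name>' | Get-EsxCli).system.module.list() |
    Where-Object { $_.Name -eq '<native_driver_name>' }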

This change requires a host reboot and thorough testing afterwards. The following commands can be quite helpful when troubleshooting native drivers:

  • To get the module parameters supported by the driver, run this command:

# esxcfg-module -i <native_driver_name>

  • To get the driver info, run this command:

# esxcli network nic get -n <vmnic_name>

  • To get uplink statistics, run this command:

# esxcli network nic stats get -n <vmnic_name>
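If shell access to the host is limited, the same details can be pulled remotely. A minimal PowerCLI sketch (my own illustration, using the esxcli interface exposed by Get-EsxCli; names are placeholders):

$esxcli = Get-VMHost -Name '<host_name>' | Get-EsxCli
# Driver info for an uplink (equivalent to: esxcli network nic get)
$esxcli.network.nic.get('<vmnic_name>')
# Uplink statistics (equivalent to: esxcli network nic stats get)
$esxcli.network.nic.stats.get('<vmnic_name>')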

31/08/2018 – Update 1: After some feedback, I have decided to list the currently known issues with the native drivers. They are as follows:

  • The Mellanox ConnectX-4/ConnectX-5 native ESXi driver might exhibit performance degradation when its Default Queue Receive Side Scaling (DRSS) feature is turned on (Reference: vSphere 6.7 Release Notes),
  • Native software FCoE adapters configured on an ESXi host might disappear when the host is rebooted (Reference: vSphere 6.7 Release Notes),
  • An HP host with QFLE3 driver version 1.0.60.0 experienced a PSOD or became stuck at “Shutting down device drivers…” during shutdown or restart (Reference: KB 55088),
  • ESXi 6.5 Storage Performance Issues and Fix (Reference: Anthony Spiteri’s blog).

VMware ESXi 6.0-6.5: Low network receive throughput for VMXNET3 on Windows VM

VMware has just released a new KB 57358 named ‘Low receive throughput when receive checksum offload is disabled and Receive Side Coalescing is enabled on Windows VM’. It requires attention when configuring the VMXNET3 adapter on Windows operating systems (OS). However, it affects only virtual environments with VMware ESXi 6.0 and ESXi 6.5.

VMware states that it happens when the following conditions are met:

  • Guest OS is Windows 8 / Windows Server 2012 or later
  • VM hardware version 11 or later
  • Virtual network adapter is VMXNET3
  • Receive Side Coalescing (RSC) is enabled on the VMXNET3 driver on the guest OS
  • Some or all of the following receive checksum offloads are set to Disabled or Tx Enabled on the VMXNET3 driver in the guest operating system:
    • IPv4 Checksum Offload
    • TCP Checksum Offload (IPv4)
    • TCP Checksum Offload (IPv6)
    • UDP Checksum Offload (IPv4)
    • UDP Checksum Offload (IPv6).

This shouldn’t be a problem if the VMXNET3 driver has the default settings.

VMXNET3-RSC-Issue-00
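To check the current state from within the guest, you can use the PowerShell sketch below (my own illustration; the adapter name is a placeholder, and the display names may vary slightly between driver versions):

# Is RSC enabled on the virtual NIC?
Get-NetAdapterRsc -Name 'Ethernet0'

# Current values of the checksum offload properties
Get-NetAdapterAdvancedProperty -Name 'Ethernet0' -DisplayName '*Checksum Offload*'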

For example, copying 3.4 gigabytes of data to the test VM over a 1 Gbps link took me only seconds.

VMXNET3-RSC-Issue-01

With TCP Checksum Offload (IPv4) set to Tx Enabled on the VMXNET3 driver, the same data takes ages to transfer.

VMXNET3-RSC-Issue-02

VMware provides a workaround for this issue: either disable RSC if any of the receive checksum offloads is disabled, or manually enable the receive checksum offloads. The knowledge base article includes PowerShell commands that help to automate the latter.
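As an illustration only (these are not the exact commands from the KB; the adapter name and display values are placeholders that may differ in your environment), the two options could look like this:

# Option 1: disable RSC on the virtual NIC
Disable-NetAdapterRsc -Name 'Ethernet0'

# Option 2: re-enable the receive checksum offloads instead
'IPv4 Checksum Offload', 'TCP Checksum Offload (IPv4)', 'TCP Checksum Offload (IPv6)',
'UDP Checksum Offload (IPv4)', 'UDP Checksum Offload (IPv6)' | ForEach-Object {
    Set-NetAdapterAdvancedProperty -Name 'Ethernet0' -DisplayName $_ -DisplayValue 'Rx & Tx Enabled'
}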

vSphere 6.5: How to improve performance of the Content Library data transfers

I came across this subject by chance. However, it is worth considering KB 2112692 when planning the content library design in vSphere 6.5.

For all content library operations that involve data transfer and cannot be conducted by the ESXi hosts, VMware suggests the following:

  • Switch the traffic to run through a non-secure HTTP connection. This improves the transfer speed for synchronisation tasks.

This can be done by enforcing an HTTP connection for Library Sync operations either for all the libraries (global approach) or for particular libraries (individual approach) in your vCenter Server instance.

At the time of writing, there is no support for editing the Content Library Service settings via the vSphere Client (HTML5). All changes can be implemented using the vSphere Web Client, though.

To set this policy globally, you need to change the ‘Force HTTP for Library Sync’ option in Administration > Deployment > System Configuration > Services > Content Library Service to true. This applies immediately, and a restart is not required.

vSphere65-CL-Performance-01

With the individual approach, you need to edit the URL of the subscribed library to start with http instead of https (see the PowerCLI sketch after this list). Again, there is no need to restart the Content Library Service.

  • Bypass the rhttpproxy. This improves the transfer speed for synchronisation tasks, as well as for importing and exporting items.

This change applies globally. It requires editing the vCenter Server configuration file vdc.properties, changing the firewall settings to allow connections to TCP port 16666, and restarting the Content Library Service.
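For the individual approach above, the subscription URL can also be changed programmatically. The sketch below uses the vSphere Automation (CIS) API from PowerCLI; the service name follows the public com.vmware.content API, but the exact spec handling is an assumption on my part, so test it in a lab first:

Connect-CisServer -Server '<vcenter_fqdn>'
$svc = Get-CisService -Name 'com.vmware.content.subscribed_library'
$library = $svc.get('<library_id>')

# Copy the current subscription settings and rewrite the URL scheme to http
$spec = $svc.Help.update.update_spec.Create()
$spec.subscription_info = $library.subscription_info
$spec.subscription_info.subscription_url = $library.subscription_info.subscription_url -replace '^https:', 'http:'
$svc.update($library.id, $spec)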

vSphere 6.5: Failed to deploy OVF package from the Content Library

While automating virtual machine (VM) provisioning from an OVF template in a content library in vSphere 6.5 Update 2, I ran across an interesting behaviour. Some virtual machines were created without any issues, whereas others failed with the error message ‘Failed to deploy OVF package.’

OVF-Template-Issue-00

OVF-Template-Issue-01

In the vpxd.log on vCenter Server, this error message looks as follows:

info vpxd[7F83D22C5700] [Originator@6876 sub=Default opID=SelectResourcePageMediator-validate-335519-ngc:70031913-ae-71-7e-92-01] [VpxLRO] -- ERROR task-51355 -- TEST-VM-01 -- ResourcePool.ImportVAppLRO: vim.fault.OvfImportFailed:
--> Result:
--> (vim.fault.OvfImportFailed) {
-->    faultCause = (vim.fault.NotFound) {
-->       faultCause = (vmodl.MethodFault) null,
-->       faultMessage = (vmodl.LocalizableMessage) [
-->          (vmodl.LocalizableMessage) {
-->             key = "com.vmware.ovfs.ovfs-main.ovfs.object_not_found",
-->             arg = (vmodl.KeyAnyValue) [
-->                (vmodl.KeyAnyValue) {
-->                   key = "0",
-->                   value = "/TEST-VM-01/nvram"
-->                }
-->             ],
-->             message = "The specified object /TEST-VM-01/nvram could not be found."
-->          }
-->       ]
-->       msg = "The specified object /TEST-VM-01/nvram could not be found."
-->    },
-->    faultMessage = <unset>
-->    msg = ""
--> }
--> Args:
-->

The NVRAM file contains information such as BIOS settings. As VMware states here, a new NVRAM file is created ‘when a virtual machine is migrated, migrated with vMotion, cloned or deployed from a template.’

Looking through the knowledge base articles on the vendor's website, I found the following hint in KB 2108718:

Error 2: Could not find the file

This issue occurs if the NVRAM file mentioned in the virtual machine's configuration file (*.vmx) is not available anymore.

As you might know, templates are stored in OVF format in the content library. A template consists of two files: an OVF descriptor (.ovf) and a virtual disk file (.vmdk). During the VM provisioning phase, the settings listed in the OVF descriptor form the virtual machine's configuration file (.vmx). So my assumption was that one of those settings caused a conflict that prevented vSphere from creating those VMs.

When we clone a template to the content library, only the minimum required settings are stored in the OVF descriptor. However, there is an option to include extra configuration.

OVF-Template-Issue-02

I decided to compare a vanilla OVF descriptor with one containing the extra settings. While doing this comparison, one particular line in the file with extra configuration caught my eye:

<vmw:ExtraConfig ovf:required="false" vmw:key="firmware" vmw:value="efi"/>

For some reason, the provisioning task fails to create new virtual machines with the EFI firmware from such a template in vSphere 6.5. This issue happens regardless of the guest operating system version and the content library type (local or subscribed).

Surprisingly, I do not see any problem with deploying VMs with the EFI firmware from the content library in vSphere 6.0. I need some time to test this functionality in vSphere 6.7 before opening a support request with VMware.

Workaround: Feel free to use the PowerCLI script below to set the firmware type to EFI after provisioning a new VM.
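A minimal sketch of such a script (assuming the VM is powered off; the VM name is a placeholder):

# Switch the firmware type of a freshly provisioned VM to EFI
$vm = Get-VM -Name '<vm_name>'
$spec = New-Object VMware.Vim.VirtualMachineConfigSpec
$spec.Firmware = 'efi'
$vm.ExtensionData.ReconfigVM($spec)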

 

VMware Tools 10.3.0: Additional considerations when updating Windows machines

Last week VMware released VMware Tools version 10.3.0.

VMware-Tools-1030-00

Not only does it include new features and a security update to address an out-of-bounds read vulnerability, it also introduces a significant change in the way VMware Tools installs on Windows operating systems. In some circumstances, a VMware Tools 10.3.0 installation or upgrade can even fail, as documented in VMware KB 55798.

Please bear in mind that the installer size for Windows has grown significantly with this release (for example, the 64-bit version is 72.6 MB in size, compared to 47.4 MB for version 10.2.5), as it includes the following additional software:

  • Microsoft Visual C++ 2017 Redistributable packages (both 32- and 64-bit versions),

VMware-Tools-1030-02

  • VMware AppDefense (disabled by default).

VMware-Tools-1030-03

As a result, VMware Tools may require the operating system to be rebooted at least once before the software install or upgrade can actually proceed.

VMware-Tools-1030-01

To reduce the maintenance window, the vendor’s recommendations are as follows:

  • Upgrade Windows with the latest service pack available from Microsoft and install the Microsoft Visual C++ 2017 Redistributable manually before installing or upgrading to VMware Tools 10.3.x.

Note: If installing the Microsoft Visual C++ 2017 Redistributable is not possible, consider installing Windows Update KB2999226 manually to reduce the need for a system restart on versions earlier than Windows 10.

  • When a VMware Tools installation or upgrade is invoked with the REBOOT=ReallySuppress argument and a system restart is required to complete the Microsoft Visual C++ 2017 Redistributable install, re-attempt the VMware Tools installation or upgrade after restarting the Windows system. The vSphere Client can detect this situation by noticing no change in the VMware Tools version and guestinfo.toolsInstallErrCode=3010 in the guest variables or in the advanced configuration of the virtual machine (see the PowerCLI sketch after this list).

Note: When a VMware Tools installation or upgrade is invoked without any arguments, a system restart may occur automatically to complete the Microsoft Visual C++ 2017 Redistributable install. After the Windows system restarts, re-attempt the VMware Tools installation or upgrade.
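A hedged PowerCLI sketch for spotting VMs stuck in this state (my own illustration; the setting is present only where the condition has occurred):

# Report VMs whose guest variables contain toolsInstallErrCode=3010
Get-VM | ForEach-Object {
    $code = ($_ | Get-AdvancedSetting -Name 'guestinfo.toolsInstallErrCode' -ErrorAction SilentlyContinue).Value
    if ($code -eq '3010') {
        [pscustomobject]@{ VM = $_.Name; ToolsInstallErrCode = $code }
    }
}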

Also, VMware has enabled receive data ring support for the VMXNET3 driver in Windows with this update. Thorough testing is required to understand how this change influences virtual NIC performance.

07/09/2018 – Update 1: VMware has recalled VMware Tools 10.3.0 due to a functional issue affecting ESXi 6.5. More information is available here.