vSphere 6.x: SEsparse snapshot may cause guest OS file system corruption

Early this month, VMware published a KB 59216 named ‘Virtual Machines running on a SEsparse snapshot may report guest data inconsistencies’.

As per the vendor’s documentation, ‘SEsparse is a snapshot format introduced in vSphere 5.5 for large disks, and is the preferred format for all snapshots in vSphere 6.5 and above with VMFS-6‘. On VMFS-5 and NFS datastores, the SEsparse format is used for virtual disks that are 2 TB or larger; whereas on VMFS-6, SEsparse is the default format for all snapshots.

The knowledge base article states that the issue affects vSphere 5.5 and later versions. As of today, it has been fixed only in VMware ESXi 6.7 Update 1, with the Express Patches pending for VMware ESXi 6.0 and 6.5.

How is this related to your production environment? Well, it depends…

For example, when the backup software creates a system snapshot and it coexists with the operating system (OS) experiencing ‘a burst of non-contiguous write IO in a very short period of time‘, this can potentially trigger the data corruption. There might be other scenarios when a snapshot is used during the OS or software upgrades.

While waiting for a permanent solution, VMware provides a workaround that requires disabling SEsparse IO coalescing on each affected host. The advanced setting that controls IO Coalescing (COW.COWEnableIOCoalescing) is not available through the vSphere Client:

ESXi-SEspare-Issue-01

In spite of that, you can always determine and change its value via PowerCLI:

Get-VMHost | Get-AdvancedSetting -Name “COW.COWEnableIOCoalescing” | select Entity,Name,Value | Format-Table -AutoSize

Get-VMHost | Get-AdvancedSetting -Name “COW.COWEnableIOCoalescing” | ? {$_.Value -eq “1”} | Set-AdvancedSetting -Value “0”

Note: After disabling the IO coalescing, all virtual machines resided on that host ‘must be power-cycled or migrated (vMotion) to other hosts that have the config option set‘.

VMware states there will be a performance penalty when disabling IO coalescing and ‘the extent of degradation depends on the individual virtual machine workload‘.

Note: ‘After patches are released, the workaround needs to be rolled back to regain performance benefits of IO coalescing‘.

[IMPORTANT] VMware ESXi 6.x: Denial-of-service vulnerability in 3D-acceleration feature

This week VMware published a security advisory VMSA-2018-0025 about the denial-of-service vulnerability in the 3D-acceleration feature in VMware ESXi, Workstation, and Fusion.

VM3DSupport-Issue-01

It affects all versions of those products if 3D-acceleration feature is enabled for virtual machines (VMs). This is a default setting for all VMs on VMware Workstation and Fusion and might be an issue for the VMs managed by VMware Horizon.

More information about this issue can be found here.

At the moment of writing this article, there were no patches or updates provided by VMware to mitigate this problem. So a workaround would be to disable the 3D-acceleration feature for affected systems.

To identify the VMs that have the 3D-acceleration feature enabled, I wrote the following PowerCLI script:

As soon as the permanent solution provided by the vendor, I will update this blog post with more information.

URGENT: VMDKs residing on vSAN 6.6 and later that have been extended may encounter data inconsistencies [RESOLVED]

Last week VMware published a KB 58715 reporting virtual machine disks residing on vSAN 6.6 and later that have been extended may encounter data inconsistencies. For those who subscribed to VMware email communications, the following message has been sent recently.

vSAN66-Issue-01

As stated in the article, this issue might happen in a rare occurrence. Still, VMware encourages their clients to check the value of the advanced setting VSAN.ClomEnableInplaceExpansion on all ESXi hosts that are part of the vSAN cluster. If it is set to the default value of “1”, the vendor recommends changing it to “0” immediately. This can be done using the following PowerCLI command:

Foreach ($VMHost in (Get-Cluster -Name (Read-Host “Cluster Name”) | Get-VMHost)) {Get-AdvancedSetting -Entity $VMHost -Name VSAN.ClomEnableInplaceExpansion | Where-Object {$_.Value -ne ‘0’} | Set-AdvancedSetting -Value ‘0’ -Confirm:$false}

Fortunately, no reboot or service restart is required for this change to take effect, and it will become effective within 60 seconds.

It is good to see how much effort the vendor put into supporting vSAN and proactively inform users about any problems. Great service, VMware!

04/10/2018 – Update 1: VMware has realeased patches for both vSAN 6.6 and 6.7 that remediate this issue. Please read the resolution section in KB 58715 for more information.