vSphere 6.x: SEsparse snapshot may cause guest OS file system corruption

Early this month, VMware published a KB 59216 named ‘Virtual Machines running on a SEsparse snapshot may report guest data inconsistencies’.

As per the vendor’s documentation, ‘SEsparse is a snapshot format introduced in vSphere 5.5 for large disks, and is the preferred format for all snapshots in vSphere 6.5 and above with VMFS-6‘. On VMFS-5 and NFS datastores, the SEsparse format is used for virtual disks that are 2 TB or larger; whereas on VMFS-6, SEsparse is the default format for all snapshots.

The knowledge base article states that the issue affects vSphere 5.5 and later versions. As of today, it has been fixed only in VMware ESXi 6.7 Update 1, with the Express Patches pending for VMware ESXi 6.0 and 6.5.

How is this related to your production environment? Well, it depends…

For example, when the backup software creates a system snapshot and it coexists with the operating system (OS) experiencing ‘a burst of non-contiguous write IO in a very short period of time‘, this can potentially trigger the data corruption. There might be other scenarios when a snapshot is used during the OS or software upgrades.

While waiting for a permanent solution, VMware provides a workaround that requires disabling SEsparse IO coalescing on each affected host. The advanced setting that controls IO Coalescing (COW.COWEnableIOCoalescing) is not available through the vSphere Client:

ESXi-SEspare-Issue-01

In spite of that, you can always determine and change its value via PowerCLI:

Get-VMHost | Get-AdvancedSetting -Name “COW.COWEnableIOCoalescing” | select Entity,Name,Value | Format-Table -AutoSize

Get-VMHost | Get-AdvancedSetting -Name “COW.COWEnableIOCoalescing” | ? {$_.Value -eq “1”} | Set-AdvancedSetting -Value “0”

Note: After disabling the IO coalescing, all virtual machines resided on that host ‘must be power-cycled or migrated (vMotion) to other hosts that have the config option set‘.

VMware states there will be a performance penalty when disabling IO coalescing and ‘the extent of degradation depends on the individual virtual machine workload‘.

Note: ‘After patches are released, the workaround needs to be rolled back to regain performance benefits of IO coalescing‘.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s