[URGENT] vSAN 6.6.1: Potential data loss due to resynchronisation mixed with object expansion

Last week VMware released an urgent hotfix to remediate potential data loss in vSAN 6.6.1 due to resynchronisation mixed with object expansion.

This is a known issue affecting earlier versions of ESXi 6.5 Express Patch 9. The vendor states that a sequence of the following operations might cause it:

  1. vSAN initiates resynchronisation to maintain data availability.
  2. You expand a virtual machine disk (VMDK).
  3. vSAN initiates another resync after the VMDK expansion.

Detailed information about this problem is available in KB 60299.

If you are a vSAN customer, additional considerations are required before applying this hotfix:

  • If hosts have already been upgraded to ESXi650-201810001, you can proceed with this upgrade,
  • If hosts have not been upgraded to ESXi650-201810001, and if an expansion of a VMDK is likely, the in-place expansion should be disabled on all of them by setting the VSAN.ClomEnableInplaceExpansion advanced configuration option to ‘0‘.

The VSAN.ClomEnableInplaceExpansion advanced configuration option is not available in vSphere Client. I use the following one-liner scrips to determine and change its value via PowerCLI:

# To check the current status
Get-VMHost | Get-AdvancedSetting -Name “VSAN.ClomEnableInplaceExpansion” | select Entity, Name, Value | Format-Table -AutoSize

# To disable the in-place expansion
Get-VMHost | Get-AdvancedSetting -Name “VSAN.ClomEnableInplaceExpansion” | ? {$_.Value -eq “1”} | Set-AdvancedSetting -Value “0”

Note: No reboot is required after the change.

After hosts were upgraded to ESXi650-201810001 or ESXi650-201811002, you can set VSAN.ClomEnableInplaceExpansion back to ‘1‘ to enable the in-place expansion.

vSphere 6.5: Failed to deploy OVF package from the Content Library

Working on automating the virtual machine (VM) provisioning from an OVF template in a content library in vSphere 6.5 Update 2, I ran across an interesting behaviour. Some virtual machines were created without any issues, whereas others have been failing with the error message ‘Failed to deploy OVF package.’

OVF-Template-Issue-00

OVF-Template-Issue-01

In the vpxd.log on vCenter Server, this error message looks as follows:

info vpxd[7F83D22C5700] [Originator@6876 sub=Default opID=SelectResourcePageMediator-validate-335519-ngc:70031913-ae-71-7e-92-01] [VpxLRO] — ERROR task-51355 — TEST-VM-01 — ResourcePool.ImportVAppLRO: vim.fault.OvfImportFailed:
–> Result:
–> (vim.fault.OvfImportFailed) {
–> faultCause = (vim.fault.NotFound) {
–> faultCause = (vmodl.MethodFault) null,
–> faultMessage = (vmodl.LocalizableMessage) [
–> (vmodl.LocalizableMessage) {
–> key = “com.vmware.ovfs.ovfs-main.ovfs.object_not_found”,
–> arg = (vmodl.KeyAnyValue) [
–> (vmodl.KeyAnyValue) {
–> key = “0”,
–> value = “/TEST-VM-01/nvram”
–> }
–> ],
–> message = “The specified object /TEST-VM-01/nvram could not be found.”
–> }
–> ]
–> msg = “The specified object /TEST-VM-01/nvram could not be found.”
–> },
–> faultMessage = <unset>
–> msg = “”
–> }
–> Args:
–>

The NVRAM file contains information such as BIOS settings. As VMware states here, a new NVRAM file is created ‘when a virtual machine is migrated, migrated with vMotion, cloned or deployed from a template.’

Looking through the knowledge base articles on the vendor’s website, I have found the following hint in KB 2108718:

Error 2: Could not find the file

This issue occurs if the NVRAM file mentioned in the virtual machines configuration file(*.vmx) is not available anymore.

As you might know, templates are stored in OVF format in the content library. Two files compose a template: an OVF descriptor (.ovf) and a virtual disk file (.vmdk). During the VM provisioning phase, the settings listed in the OVF descriptor form the virtual machine’s configuration file (.vmx). So my assumption was that one of those settings caused a conflict that prevented vSphere from creating those VMs.

When we clone a template to the content library, only minimum required settings are stored in the OVF descriptor. However, there is an option available to include extra configuration.

OVF-Template-Issue-02

I decided to compare a vanilla OVF descriptor with the one with extra settings. While doing this comparison, one particular line in the file with extra configuration caught my eye:

<vmw:ExtraConfig ovf:required=”false” vmw:key=”firmware” vmw:value=”efi”/>

For some reason, the provisioning task is not able to succeed in creating new virtual machines with the EFI firmware from the template in vSphere 6.5. This issue happens regardless of the guest operating system version and content library type (local or subscribed).

Surprisingly, I do not see any problem with deploying VMs with the EFI firmware from the content library in vSphere 6.0. I need some time to test this functionality in vSphere 6.7, before opening a support request with VMware.

Workaround: Feel free to use a PowerCLI script below to set the firmware type to EFI after provisioning a new VM.