vCSA 6.x: WinSCP fails with the error ‘Received too large SFTP packet’

Back to basics… When you try connecting to vCenter Server Virtual Appliance 6.x (vCSA) using WinSCP, the error message ‘Received too large (1433299822 B) SFTP packet’ might appear.

[Image: vCSA6x-WinSCP-01]

This is due to the configuration of vCSA when the default shell for the root account is set to the Appliance Shell.

To fix this issue, VMware recommends switching the default shell of vCSA 6.x to the Bash Shell. This can be done in an SSH session with the following command:

chsh -s /bin/bash root

Note: You need to log out from the Appliance Shell and log back in for the changes to take effect.
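If you later want to revert to the Appliance Shell as the default, the same command can be used with the Appliance Shell binary instead (a sketch; on vCSA 6.x that binary is typically /bin/appliancesh, but verify the path on your appliance first):

```shell
# Revert the default shell for root back to the Appliance Shell
chsh -s /bin/appliancesh root
```

As with the original change, log out and back in for the new default shell to take effect.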

[URGENT] vSAN 6.6.1: Potential data loss due to resynchronisation mixed with object expansion

Last week VMware released an urgent hotfix to remediate potential data loss in vSAN 6.6.1 due to resynchronisation mixed with object expansion.

This is a known issue affecting versions of ESXi 6.5 earlier than Express Patch 9. The vendor states that the following sequence of operations might cause it:

  1. vSAN initiates resynchronisation to maintain data availability.
  2. You expand a virtual machine disk (VMDK).
  3. vSAN initiates another resync after the VMDK expansion.

Detailed information about this problem is available in KB 60299.

If you are a vSAN customer, additional considerations are required before applying this hotfix:

  • If hosts have already been upgraded to ESXi650-201810001, you can proceed with this upgrade;
  • If hosts have not been upgraded to ESXi650-201810001, and an expansion of a VMDK is likely, the in-place expansion should be disabled on all of them by setting the VSAN.ClomEnableInplaceExpansion advanced configuration option to ‘0’.

The VSAN.ClomEnableInplaceExpansion advanced configuration option is not available in vSphere Client. I use the following one-liner scripts to determine and change its value via PowerCLI:

# To check the current status
Get-VMHost | Get-AdvancedSetting -Name "VSAN.ClomEnableInplaceExpansion" | select Entity, Name, Value | Format-Table -AutoSize

# To disable the in-place expansion
Get-VMHost | Get-AdvancedSetting -Name "VSAN.ClomEnableInplaceExpansion" | ? {$_.Value -eq "1"} | Set-AdvancedSetting -Value "0"

Note: No reboot is required after the change.

After hosts have been upgraded to ESXi650-201810001 or ESXi650-201811002, you can set VSAN.ClomEnableInplaceExpansion back to ‘1’ to re-enable the in-place expansion.
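The change back can be done with a one-liner in the same style as the commands above (a sketch; assumes an active Connect-VIServer session and that all connected hosts are already on the patched builds):

```powershell
# Re-enable the in-place expansion on hosts where it was previously disabled
Get-VMHost | Get-AdvancedSetting -Name "VSAN.ClomEnableInplaceExpansion" | ? {$_.Value -eq "0"} | Set-AdvancedSetting -Value "1" -Confirm:$false
```

As before, no reboot is required after the change.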

Windows Installer: MSI installation fails with the error status 1603

During the process of distributing an MSI package to remote Windows Server 2012 R2 hosts via the Start-Process cmdlet, I ran across an interesting behaviour. In some cases, the MSI package was installed without any issues; in others, it was failing silently, generating an event ID 10837 in the Application log.

With verbose logging enabled, the following error message was observed in the MSI log file:

Installation success or error status: 1603.

The error status 1603 is documented on Microsoft TechNet. However, none of the scenarios listed in the article applied to my case. I was able to install the MSI package locally with no issues, and the error popped up seemingly at random when the installation was done via PowerShell.

With more testing, I realised the issue only popped up when the user account under which the script was running had never previously logged on to the target system.

I asked one of my colleagues, who has a better understanding of how Windows Installer works, to help with this case. After a thorough investigation, he pointed me to the following lines in the MSI log file:

MSI (s) (2C:C4) [02:22:15:584]: SECREPAIR: New Hash Database creation complete.
MSI (s) (2C:C4) [02:22:15:651]: SECREPAIR: CryptAcquireContext: Could not create the default key container
MSI (s) (2C:C4) [02:22:15:651]: SECREPAIR: Crypt Provider not initialized. Error:-2146892987

MSI (s) (2C:C4) [02:22:15:651]: SECUREREPAIR: Failed to CreateContentHash of the file: installer.msi: for computing its hash. Error: -2146892987
MSI (s) (2C:C4) [02:22:15:651]: SECREPAIR: Failed to create hash for the install source files
MSI (s) (2C:C4) [02:22:15:651]: Note: 1: 2262 2: SourceHash 3: -2147287038
MSI (s) (2C:C4) [02:22:15:651]: SECUREREPAIR: SecureRepair Failed. Error code: 8009034524E29A18
Action start 2:22:15: ProcessComponents.
The requested operation cannot be completed. The computer must be trusted for delegation and the current user account must be configured to allow delegation.

Apparently, in 2014 Microsoft released security bulletin MS14-049 containing a patch to fix a vulnerability in the Windows Installer service. However, after you install this security update, it can break MSI package installations. This is documented as ‘Known issue 1’ in the bulletin and explained in more detail here.

To resolve this issue, Microsoft recommends installing update 3000988.

Another option, which is documented in the same bulletin under the ‘Known issue 2’ section, is to opt out the affected programs by using registry settings. However, this workaround implies more manual work and removes the defence-in-depth security feature for those programs.
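For reference, that opt-out is driven by values under the Windows Installer policy key. A sketch of what it looks like, based on Microsoft's documentation of the SecureRepair feature (treat the exact value names and data as something to verify against the bulletin before applying, and note that the product code GUID below is a placeholder):

```
:: Enable the per-product whitelist mode for SecureRepair
reg add "HKLM\SOFTWARE\Policies\Microsoft\Windows\Installer" /v SecureRepairPolicy /t REG_DWORD /d 2 /f

:: List each opted-out product by its ProductCode GUID under the whitelist key
reg add "HKLM\SOFTWARE\Policies\Microsoft\Windows\Installer\SecureRepairWhitelist" /v "{PRODUCT-CODE-GUID}" /t REG_SZ /f
```

Since this has to be repeated per product and per host, the update from KB 3000988 remains the simpler fix.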

I have tested both options and can confirm they work. I hope this article saves you some time when troubleshooting a similar problem.

vSphere 6.7 Design: A new vSphere Upgrade section on vSphere Central

It is great to know that VMware continues to work on improving vSphere Central.

For those who are not familiar with this resource, vSphere Central provides information that complements the official documentation in terms of product features and best practices. It also helps with configuring and administering different components of vSphere 6.5 and 6.7.

The vSphere Central library is divided into the following categories:

  • vCenter Server,
  • Security,
  • ESXi Host and Virtual Machine,
  • Resource Management and Availability,
  • Operations Management,
  • Developer and Automation Interfaces,

and the list is growing.

Now it includes a new section dedicated to upgrading to vSphere 6.7.

[Image: vSphere-Central-01]

As per VMware, this document helps with the vSphere upgrade process, including pre-upgrade considerations, the upgrade itself, and post-upgrade tasks.

It is worth reading when planning for an upgrade, preparing for the official exams, or working towards the VCDX defence. As a bonus, all content can be exported to PDF with one click if needed!

vSphere 6.x: SEsparse snapshot may cause guest OS file system corruption

Earlier this month, VMware published KB 59216, named ‘Virtual Machines running on a SEsparse snapshot may report guest data inconsistencies’.

As per the vendor’s documentation, ‘SEsparse is a snapshot format introduced in vSphere 5.5 for large disks, and is the preferred format for all snapshots in vSphere 6.5 and above with VMFS-6’. On VMFS-5 and NFS datastores, the SEsparse format is used for virtual disks that are 2 TB or larger; whereas on VMFS-6, SEsparse is the default format for all snapshots.

The knowledge base article states that the issue affects vSphere 5.5 and later versions. As of today, it has been fixed only in VMware ESXi 6.7 Update 1, with the Express Patches pending for VMware ESXi 6.0 and 6.5.

How is this related to your production environment? Well, it depends…

For example, when the backup software creates a snapshot while the guest operating system (OS) experiences ‘a burst of non-contiguous write IO in a very short period of time’, this can potentially trigger the data corruption. There might be other scenarios, such as when a snapshot is used during OS or software upgrades.

While waiting for a permanent solution, VMware provides a workaround that requires disabling SEsparse IO coalescing on each affected host. The advanced setting that controls IO Coalescing (COW.COWEnableIOCoalescing) is not available through the vSphere Client:

[Image: ESXi-SEspare-Issue-01]

In spite of that, you can always determine and change its value via PowerCLI:

Get-VMHost | Get-AdvancedSetting -Name "COW.COWEnableIOCoalescing" | select Entity,Name,Value | Format-Table -AutoSize

Get-VMHost | Get-AdvancedSetting -Name "COW.COWEnableIOCoalescing" | ? {$_.Value -eq "1"} | Set-AdvancedSetting -Value "0"

Note: After disabling the IO coalescing, all virtual machines residing on that host ‘must be power-cycled or migrated (vMotion) to other hosts that have the config option set’.

VMware states there will be a performance penalty when disabling IO coalescing and ‘the extent of degradation depends on the individual virtual machine workload’.

Note: ‘After patches are released, the workaround needs to be rolled back to regain performance benefits of IO coalescing’.
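Rolling the workaround back can be done with a one-liner in the same style as the commands above (a sketch; assumes an active Connect-VIServer session and that the hosts are already running a patched build):

```powershell
# Re-enable SEsparse IO coalescing on hosts where the workaround was applied
Get-VMHost | Get-AdvancedSetting -Name "COW.COWEnableIOCoalescing" | ? {$_.Value -eq "0"} | Set-AdvancedSetting -Value "1" -Confirm:$false
```

Keep in mind the same note as above: the setting only takes effect for a virtual machine after it is power-cycled or migrated.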

vSphere 6.x: The datastore file browser converts VMDKs to thick-provisioned when copying/moving data to the target VMFS datastore

It came as a surprise to me that the datastore file browser in vSphere 6.5 (and all other versions) converts VMDK disks to thick-provisioned when you copy or move them from one VMFS datastore to another. This is the default behaviour, even if the initial VMDK file was thin-provisioned and the target datastore supports thin provisioning.

This can potentially cause an outage to the virtual machines residing on the Virtual Machine File System. Unfortunately, when you initiate a copy/move operation in the datastore file browser, the system doesn’t warn you about this change. So you need to keep it in mind and calculate the required disk space ahead of transferring data.

What is more interesting, I haven’t been able to find any reference to this behaviour in the official vSphere documentation. It actually states quite the opposite:

“Virtual disk files are moved or copied without format conversion.”

To illustrate the observed behaviour, I created a virtual machine named TEST-VM with one thin-provisioned disk of 10 GB.

[Image: VMFS-Thick-Issue-01]

After the VM was powered on, it reported the following data usage in the datastore file browser:

[Image: VMFS-Thick-Issue-02]

The Inflate button on the image above indicates that “you can convert the thin disk to a virtual disk in thick provision format.”

The PowerCLI commands below helped me to show the used and provisioned space for TEST-VM:

Get-VM -Name "TEST-VM" | Select Name,@{N='Used Space (GB)';E={[math]::Round($_.UsedSpaceGB,2)}},@{N='Prov. Space (GB)';E={[math]::Round($_.ProvisionedSpaceGB,2)}} | Format-List

The result was as expected:

[Image: VMFS-Thick-Issue-03]

I powered off the VM and copied the TEST-VM folder via the datastore file browser to another VMFS datastore.

[Image: VMFS-Thick-Issue-04]

After this task completed, the resulting VMDK looked different:

[Image: VMFS-Thick-Issue-05]

Please note that the Inflate button is now greyed out. This means the virtual disk is thick-provisioned.

I have looked on the Internet for more information and found a few community threads from 2009 discussing this issue – here and here. So the problem has existed for a while. According to those threads, during the copy/move operation initiated via the datastore file browser, the underlying vmkfstools utility executes with the default settings, creating a thick-provisioned disk.

The only workaround is to use the following command to convert the VMDK to thin again:

vmkfstools -i <source.vmdk> -d thin <destination.vmdk>

If the intent is to replace the source thick-provisioned VMDK with a new thin-provisioned one, make sure to use the vmkfstools utility for that operation. It will change the names of both the *.vmdk and *-flat.vmdk files, as well as the extent description value in *.vmdk.
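A replace-in-place sequence might look like this (a sketch run from the ESXi shell in the VM’s directory, with the VM powered off; TEST-VM.vmdk is the disk from the example above, and TEST-VM-thin.vmdk is a temporary name of my choosing):

```shell
# Clone the thick disk into a new thin-provisioned disk
vmkfstools -i TEST-VM.vmdk -d thin TEST-VM-thin.vmdk

# Delete the original thick disk (removes both the descriptor and the -flat file)
vmkfstools -U TEST-VM.vmdk

# Rename the thin clone back to the original name (updates the descriptor accordingly)
vmkfstools -E TEST-VM-thin.vmdk TEST-VM.vmdk
```

Using vmkfstools -E for the rename, rather than mv, is what keeps the descriptor and the extent reference consistent.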

[IMPORTANT] VMware ESXi 6.x: Denial-of-service vulnerability in 3D-acceleration feature

This week VMware published a security advisory, VMSA-2018-0025, about a denial-of-service vulnerability in the 3D-acceleration feature in VMware ESXi, Workstation, and Fusion.

[Image: VM3DSupport-Issue-01]

It affects all versions of those products if the 3D-acceleration feature is enabled for virtual machines (VMs). This is the default setting for all VMs on VMware Workstation and Fusion and might be an issue for VMs managed by VMware Horizon.

More information about this issue can be found here.

At the time of writing this article, there were no patches or updates provided by VMware to mitigate this problem. So the workaround is to disable the 3D-acceleration feature for the affected systems.

To identify the VMs that have the 3D-acceleration feature enabled, I wrote the following PowerCLI script:
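A sketch of such a script (assumes an active Connect-VIServer session; it inspects each VM’s video card device, whose Enable3DSupport flag reflects the 3D-acceleration setting in the vSphere API):

```powershell
# List the VMs whose video card has 3D acceleration enabled
Get-VM | Where-Object {
    ($_.ExtensionData.Config.Hardware.Device |
        Where-Object { $_ -is [VMware.Vim.VirtualMachineVideoCard] }).Enable3DSupport -eq $true
} | Select-Object Name | Format-Table -AutoSize
```

The same Enable3DSupport flag can then be flipped off per VM while the VM is powered off, which is the workaround suggested in the advisory.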

As soon as a permanent solution is provided by the vendor, I will update this blog post with more information.