vCenter 6.7 Update 2: Error in creating a backup schedule

One of the improvements in vCenter 6.7 Update 2 is Samba (SMB) protocol support for the built-in File-Based Backup and Restore. Excited about the news, I decided to test this functionality and back up data to a Windows share.

I filled in the backup schedule parameters in the vCenter Server Appliance Management Interface (VAMI) and pressed the Create button, at which point the following error message appeared: Error in method invocation module ‘util.Messages’ has no attribute ‘ScheduleLocationDoesNotExist’.

Puzzled by this message and not knowing which log file to inspect, I ran the following command in a local console session on the vCenter Server Appliance (VCSA):

grep -i 'ScheduleLocationDoesNotExist' $(find /var/log/vmware/ -type f -name '*.log')

The search results led me to /var/log/vmware/applmgmt/applmgmt.log where I found another clue:

2019-04-30T01:25:24.111 [2476]ERROR:vmware.appliance.backup_restore.schedule_impl:Failed to mount the cifs share //fileserver.company.local/Archive/VMware at /storage/remote/backup/cifs/fileserver.company.local/D4Ji3vNM/fmuCEc6m; Err: rc=32, stdOut:, stdErr: mount error(13): Permission denied
Refer to the mount.cifs(8) manual page (e.g. man mount.cifs)

At first, after some reading, I thought it was related to the SMB protocol version or the wrong security type for the server. So I decided to look for any security events on the file server.

In Windows Event Log, I saw the following:

After double-checking the NTFS and share permissions for the network share, I was confident that the user had permissions to access it and write data to it.

Having run out of ideas, I went back to the official documentation and some blog posts to see if I was missing something. What struck me was that there was no reference to the domain name, either in UPN format or in the form of a sAMAccountName, in the backup server credentials in the Create Backup Schedule wizard.

It was easy to test whether skipping the domain name would make any difference, and it did! The backup job worked like a charm and completed successfully.

vSphere 6.5: Additional considerations when migrating to VMFS-6 – Part 2

In Part 1 of this series, I wrote about the most common cases that might prevent a successful migration to VMFS-6. There is one more to cover.

For ESXi hosts that boot from flash storage or from memory, a diagnostic core dump file can also be placed on a shared datastore. You won’t be able to unmount this datastore without deleting the core dump first.

VMware recommends using the esxcli utility to view and edit the core dump settings. This can also be automated via PowerCLI.

To check if the core dump file exists and is active, the following code can be used.
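This is only a sketch: it assumes a hypothetical host name (esxi01) and relies on the esxcli v2 interface exposed by Get-EsxCli:

# Connect to vCenter first, e.g. Connect-VIServer -Server <vcenter_fqdn>
$esxcli = Get-EsxCli -VMHost (Get-VMHost -Name 'esxi01') -V2

# List all VMFS diagnostic core dump files present on the host
$esxcli.system.coredump.file.list.Invoke()

# Show which core dump file is configured and whether it is active
$esxcli.system.coredump.file.get.Invoke()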

To delete the old configuration that points to the VMFS-5 datastore, a similar script can help.
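In the sketch below, the dump file path is only an example (take the real one from the list output above); the configured core dump file is deactivated first and then removed:

$esxcli = Get-EsxCli -VMHost (Get-VMHost -Name 'esxi01') -V2

# Deactivate and unconfigure the currently set core dump file
$esxcli.system.coredump.file.set.Invoke(@{unconfigure = $true})

# Remove the stale dump file left on the VMFS-5 datastore (the path is an example)
$esxcli.system.coredump.file.remove.Invoke(@{file = '/vmfs/volumes/VMFS5-Datastore/vmkdump/esxi01.dumpfile'; force = $true})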

With this change made, you will be able to continue migrating to VMFS-6 without any issues.

If you have any suggestions or concerns, feel free to share them in the comments below.

vCSA 6.x: WinSCP fails with the error ‘Received too large SFTP packet’

Back to basics… When you try connecting to vCenter Server Appliance 6.x (vCSA) using WinSCP, the error message ‘Received too large (1433299822 B) SFTP packet’ might appear.

vCSA6x-WinSCP-01

This is due to the vCSA configuration, where the default shell for the root account is set to the Appliance Shell.

To fix this issue, VMware recommends switching the vCSA 6.x to the Bash Shell. This can be done in the SSH session with the following command:

chsh -s /bin/bash root

Note: You need to log out from the Appliance Shell and log back in for the change to take effect.

vSphere 6.x: The datastore file browser converts VMDKs to thick-provisioned when copying/moving data to the target VMFS datastore

It came as a surprise to me that the datastore file browser in vSphere 6.5 (and all other versions) converts VMDK disks to thick-provisioned when you copy or move them from one VMFS datastore to another. This is the default behaviour, even if the initial VMDK file was thin-provisioned and the target datastore supports thin provisioning.

This can potentially cause an outage for the virtual machines residing on the Virtual Machine File System. Unfortunately, when you initiate a copy/move operation in the datastore file browser, the system doesn’t warn you about this change, so you need to keep it in mind and calculate the required disk space before transferring the data.

What is more interesting, I haven’t been able to find any reference to this in the official vSphere documentation. It actually states quite the opposite:

“Virtual disk files are moved or copied without format conversion.”

To illustrate the observed behaviour, I created a virtual machine named TEST-VM with one thin-provisioned disk of 10 GB.

VMFS-Thick-Issue-01

After the VM was powered on, it reported the following data usage in the datastore file browser:

VMFS-Thick-Issue-02

The Inflate button on the image above indicates that “you can convert the thin disk to a virtual disk in thick provision format.”

The PowerCLI command below shows the used and provisioned space for TEST-VM:

Get-VM -Name "TEST-VM" | Select Name,@{N='Used Space (GB)';E={[math]::Round($_.UsedSpaceGB,2)}},@{N='Prov. Space (GB)';E={[math]::Round($_.ProvisionedSpaceGB,2)}} | Format-List

The result was as expected:

VMFS-Thick-Issue-03

I powered off the VM and copied the TEST-VM folder via the datastore file browser to another VMFS datastore.

VMFS-Thick-Issue-04

After the task completed, the resulting VMDK looked different:

VMFS-Thick-Issue-05

Please note that the Inflate button is now greyed out. This means the virtual disk is thick-provisioned.

I looked on the Internet for more information and found a few community threads from 2009 discussing this issue – here and here. So the problem has existed for a while. According to those threads, during a copy/move operation initiated via the datastore file browser, the underlying vmkfstools utility executes with its default settings, creating a thick-provisioned disk.

The only workaround is to use the following command to convert the VMDK to thin again:

vmkfstools -i <source.vmdk> <destination.vmdk> -d thin

If the intent is to replace the source thick-provisioned VMDK with a new thin-provisioned one, make sure to use the vmkfstools utility for that operation. It changes the names of both the *.vmdk and *-flat.vmdk files, as well as the extent description value inside the *.vmdk descriptor.
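
If you prefer to drive the copy from PowerCLI rather than the ESXi shell, a hedged sketch is to copy the disk with Copy-HardDisk and request the thin format explicitly; the target datastore and the destination path below are examples, and the VM is assumed to have a single disk:

# Copy the disk of the powered-off VM to another datastore, keeping it thin-provisioned
$hardDisk = Get-VM -Name 'TEST-VM' | Get-HardDisk
Copy-HardDisk -HardDisk $hardDisk -DestinationPath '[Target-Datastore] TEST-VM/TEST-VM.vmdk' -DestinationStorageFormat Thin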

vSphere 6.5: Additional considerations when migrating to VMFS-6 – Part 1

For those who use the Virtual Machine File System (VMFS) datastores, one of the steps when upgrading to vSphere 6.5 is to migrate them to VMFS-6.

VMFS6-01

VMware provides a detailed overview of VMFS-6 on the StorageHub, as well as an example of how the migration from VMFS-5 can be automated using PowerCLI.

However, there are three edge cases that require extra steps to continue with the migration. They are as follows:

  • the system swap location is set on the datastore,
  • the persistent scratch location of the host points to the datastore,
  • a diagnostic core dump file is placed on the datastore.

All those objects, if they exist, prevent the ESXi host from unmounting the datastore, and they need to be moved to a new location before the migration continues. The required steps to relocate them are reviewed in the paragraphs below.

Relocating the system swap

The system swap location can be checked and set via vSphere Client in Configure > System > System Swap settings of the ESXi host.

VMFS6-02

Alternatively, the system swap settings can be retrieved via PowerCLI:
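(This is only a sketch: it assumes a hypothetical host name, esxi01, and the esxcli v2 interface exposed by Get-EsxCli.)

$esxcli = Get-EsxCli -VMHost (Get-VMHost -Name 'esxi01') -V2

# Show the current system swap configuration (datastore, host cache and host local swap options)
$esxcli.sched.swap.system.get.Invoke()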

The script above can be modified to create the system swap files on a new datastore:
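(In the sketch below, the datastore name is an example, and the argument names are assumed to follow the options of ‘esxcli sched swap system set’ with the dashes removed, as is usual for the esxcli v2 interface.)

$esxcli = Get-EsxCli -VMHost (Get-VMHost -Name 'esxi01') -V2

# Enable datastore-backed system swap and point it at the new datastore
$esxcli.sched.swap.system.set.Invoke(@{datastoreenabled = $true; datastorename = 'New-VMFS6-Datastore'})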

Note: The host reboot is not required to apply this change.

Moving the persistent scratch location

A persistent scratch location helps when investigating host failures. It preserves the host log files on a shared datastore so that they remain reachable for troubleshooting, even if the host has experienced the Purple Screen of Death (PSOD) or gone down.

To identify the persistent scratch location, filter the Key column by the word ‘scratch’ in Settings > System > Advanced System Settings of the ESXi host in vSphere Client.

VMFS6-03

You only need to point the ScratchConfig.ConfiguredScratchLocation setting to a new location and reboot the host for this change to take effect.

Note: Before making any changes, make sure that the .locker folder (it should be unique for each configured host to avoid data mixing or overwrites) has been created on the desired datastore. Otherwise, the persistent scratch location remains the same.

To review and modify advanced host parameters including the persistent scratch location via PowerCLI, look for two cmdlets named Get-AdvancedSetting and Set-AdvancedSetting. This procedure is well-documented in KB 1033696.
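
As a rough sketch based on that KB, assuming a hypothetical host name (esxi01) and datastore name (New-VMFS6-Datastore), and using the PowerCLI datastore provider to create the .locker folder, the whole procedure could look like this:

$vmHost = Get-VMHost -Name 'esxi01'                      # hypothetical host name
$datastore = Get-Datastore -Name 'New-VMFS6-Datastore'   # hypothetical datastore name
$lockerFolder = '.locker-' + $vmHost.Name                # unique folder per host

# Create the unique .locker folder on the target datastore via the datastore provider
New-PSDrive -Name 'ds' -PSProvider VimDatastore -Root '\' -Datastore $datastore | Out-Null
New-Item -Path ('ds:\' + $lockerFolder) -ItemType Directory | Out-Null
Remove-PSDrive -Name 'ds'

# Point the persistent scratch location at the new folder (a host reboot is still required)
Get-AdvancedSetting -Entity $vmHost -Name 'ScratchConfig.ConfiguredScratchLocation' |
    Set-AdvancedSetting -Value ('/vmfs/volumes/' + $datastore.Name + '/' + $lockerFolder) -Confirm:$false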

Information on how to automate the relocation of the diagnostic core dump file will be covered in Part 2 of this series later on. I’ll keep you posted!

vSphere 6.5: Switching to Native Drivers in ESXi 6.5

The Native Device Driver architecture is not something new. Since its introduction more than five years ago, VMware has been encouraging its hardware ecosystem partners to develop native drivers. The list of supported hardware grows with every major release of ESXi, and the company aims to deprecate the vmkLinux APIs and the associated driver ecosystem completely in future releases of vSphere.

The benefits of using the native drivers are as follows:

  • It removes the complexity of developing and maintaining Linux-derived drivers,
  • It improves system performance,
  • It frees you from the functional limitations of Linux-derived drivers,
  • It increases the stability and reliability of the hypervisor, as native drivers are designed specifically for VMware ESXi.

That said, one of the steps when upgrading to a new version of vSphere is to check that the hardware supports native drivers. By default, if ESXi identifies a native driver for a device, it will be loaded instead of the Linux-derived driver. However, this is not always the case, and you need to check whether native drivers are in use after the system upgrade.

Following the steps in KB 1031534 and KB 1034674, you can pinpoint PCI devices and the corresponding drivers loaded for each of them:

  • To identify a storage HBA (such as a fibre card or RAID controller), run this command:

# esxcfg-scsidevs -a

  • To identify a network card, run this command:

# esxcfg-nics -l

  • To list device state and note the hardware IDs, run this command:

# vmkchdev -l

The /etc/vmware/default.map.d/ folder on the ESXi host contains a full list of map files referring to the native drivers available for your system.

ESXi-Native-Driver-01

To quickly identify the driver version, you can run this command:

# esxcli software vib list | grep <native_driver_name>

In addition, information about available vSphere Installation Bundles (VIBs) in vSphere 6.5 can be found via the web client or PowerCLI session:

  • To view all installed VIBs in vSphere Client (HTML5), open Configure > System > Packages tab in the host settings:
ESXi-Native-Driver-02
  • To view all installed VIBs in VMware Host Client, open Manage > Packages tab in the host settings:
ESXi-Native-Driver-03
  • To list all installed VIBs in PowerCLI, run this command:

# (Get-VMHost -Name '<host_name>' | Get-EsxCli).software.vib.list() | select Name,Vendor,Version | sort Name

By comparing the findings above with the information in the IO Devices section of the VMware Hardware Compatibility List, you will be able to find out whether native drivers are available for your devices, as well as the recommended combination of driver and firmware tested and supported by VMware.

It is worth reading the release notes for the corresponding drivers and searching for any references to them on VMware’s and third-party vendors’ websites, in case there are known issues or limitations that might affect how the device functions.

If everything seems good, it is time to enable the native driver following steps in KB 2147565:

# esxcli system module set --enabled=true --module=<native_driver_name>
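
If you manage multiple hosts from PowerCLI rather than the ESXi shell, the same toggle can be sketched via the esxcli v2 interface; the host name is hypothetical and the module name is a placeholder:

$esxcli = Get-EsxCli -VMHost (Get-VMHost -Name 'esxi01') -V2

# Enable the native driver module ('<native_driver_name>' is a placeholder)
$esxcli.system.module.set.Invoke(@{module = '<native_driver_name>'; enabled = $true})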

This change requires a host reboot and thorough testing afterwards. The following commands can be quite helpful when troubleshooting native drivers:

  • To get the driver supported module parameters, run this command:

# esxcfg-module -i <native_driver_name>

  • To get the driver info, run this command:

# esxcli network nic get -n <vmnic_name>

  • To get uplink stats, run this command:

# esxcli network nic stats -n <vmnic_name>

31/08/2018 – Update 1: After some feedback, I have decided to list the currently known issues with the native drivers. They are as follows:

  • The Mellanox ConnectX-4/ConnectX-5 native ESXi driver might exhibit performance degradation when its Default Queue Receive Side Scaling (DRSS) feature is turned on (Reference: vSphere 6.7 Release Notes),
  • Native software FCoE adapters configured on an ESXi host might disappear when the host is rebooted (Reference: vSphere 6.7 Release Notes),
  • An HP host with QFLE3 driver version 1.0.60.0 may experience a PSOD or get stuck at “Shutting down device drivers…” during shutdown or restart (Reference: KB 55088),
  • ESXi 6.5 Storage Performance Issues and Fix (Reference: Anthony Spiteri’s blog).

18/02/2019 – Update 2: There are further VMware articles that owners of Broadcom NICs should pay attention to:

vSphere 6.5: How to improve performance of the Content Library data transfers

I came across this subject by chance. However, it is worth considering KB 2112692 when planning the content library design in vSphere 6.5.

For all content library operations that involve data transfer and cannot be conducted by the ESXi hosts, VMware suggests the following:

  • Switch the traffic to run through a non-secure HTTP connection. This improves the transfer speed for synchronisation tasks.

This can be done by enforcing an HTTP connection for Library Sync operations either for all the libraries (global approach) or for particular libraries (individual approach) in your vCenter Server instance.

At the time of writing this, there is no support for editing the Content Library Service settings via vSphere Client (HTML5). All changes can be implemented using the vSphere Web Client though.

To set this policy globally, you need to set the ‘Force HTTP for Library Sync’ option in Administration > Deployment > System Configuration > Services > Content Library Service to true. This applies immediately, and a restart is not required.

vSphere65-CL-Performance-01

With the individual approach, you need to edit the URL of the subscribed library so that it starts with http instead of https. Again, there is no need to restart the Content Library Service.

  • Bypass the rhttpproxy. This improves the transfer speed for synchronisation tasks, as well as for importing and exporting items.

This change applies globally. It requires editing the vCenter Server configuration file vdc.properties, changing the firewall settings to allow connections to TCP port 16666, and restarting the Content Library Service.