Home Lab Basics: Part 1 – An automated ESXi installation

Over time, almost every virtualisation specialist asks themselves a simple question: ‘Do I need a home lab?’

In recent years, this topic has become more and more popular. Many top bloggers have written at least once about their experience with building a home lab; some contribute even more to the community, providing scripted installs, OVA templates for nested virtualisation, and even drivers for unsupported devices.

There is plenty of choice in terms of hardware platforms and networking devices to build your lab (including Raspberry Pi), and the sky’s the limit.

My preference is an all-in-one solution, with everything else running in a nested environment. It should be relatively compact and quiet, with a minimum of wired connections to the router – an ideal option for someone living in an apartment.

As a result of my research, a few years ago I bought a Lenovo ThinkStation P-series workstation with two Intel Xeon CPUs and 80 GB of RAM. Instead of using magnetic drives, I put in NVMe M.2 SSDs (used for the VMFS and vSAN datastores) and one USB flash drive for the ESXi boot partitions. The workstation has two onboard 1 Gbps network cards, and I added a quad-port 1 Gbps PCIe card to test different configurations (bonding, passthrough, etc.). All NICs are connected to the router, which provides access to the home network and to the Internet.

This platform is sufficient for setting up the vSphere, vSAN, and vRealize Automation labs.

In a series of articles, I am planning to show how to automate different parts of those labs. It all starts here with a scripted ESXi installation using a bootable USB flash drive.

To create the bootable media, we need to complete the following steps:

  1. Format a USB flash drive to boot the ESXi Installer.
  2. Copy files from the ESXi ISO image to the USB flash drive.
  3. Modify the configuration file SYSLINUX.CFG.
  4. Modify the configuration file BOOT.CFG.
  5. Create an answer file KS.CFG.

In the paragraphs below, I am going to discuss those steps in detail.

Step #1 – Format a USB flash drive

Depending on the operating system, the process can vary. In the official documentation, VMware details this task for Linux. To do it on a computer running macOS, I used the steps described in the blog posts here and here.

Firstly, we need to identify the USB disk using the diskutil list command. In my case, it is /dev/disk2.
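
For example, limiting the output to external disks makes the flash drive easier to spot; the device identifier will differ on your system.

# Show only the external disks to locate the USB flash drive
diskutil list external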

Then, we erase that disk using the diskutil eraseDisk command:

diskutil eraseDisk FAT32 ESXIBOOT MBRFormat /dev/disk2
Started erase on disk2
Unmounting disk
Creating the partition map
Waiting for partitions to activate
Formatting disk2s1 as MS-DOS (FAT32) with name ESXIBOOT
512 bytes per physical sector
/dev/rdisk2s1: 7846912 sectors in 980864 FAT32 clusters (4096 bytes/cluster)
bps=512 spc=8 res=32 nft=2 mid=0xf8 spt=32 hds=255 hid=2048 drv=0x80 bsec=7862272 bspf=7664 rdcl=2 infs=1 bkbs=6
Mounting disk
Finished erase on disk2

It is important to choose the MBR format for the disk (the MBRFormat option). Otherwise, when you boot from this USB drive, the ESXi installer won’t be able to read the kickstart file from that partition and will generate the following error message: ‘exception.HandledError: Error (see log for more info): cannot find kickstart file on usb with path — /KS.CFG.’

As a result, you will have one MS-DOS FAT32 partition /dev/disk2s1. The next step is to mark it as active and bootable:

diskutil unmount /dev/disk2s1
Volume ESXIBOOT on disk2s1 unmounted

sudo fdisk -e /dev/disk2
fdisk: 1> flag 1
Partition 1 marked active.
fdisk:*1> write
Writing MBR at offset 0.
fdisk: 1> quit

Before copying any data onto the USB drive, we need to remount /dev/disk2s1:

diskutil unmount /dev/disk2s1
Volume ESXIBOOT on disk2s1 unmounted
diskutil mount /dev/disk2s1
Volume ESXIBOOT on /dev/disk2s1 mounted

Step #2 – Copy the ESXi ISO image files to the USB flash drive

There are two simple steps – mount the ISO file and copy data to the USB drive. In the example below, I used the VMware ESXi 6.7 Update 2 image.

hdiutil mount VMware-VMvisor-Installer-6.7.0.update02-13006603.x86_64.iso

cp -R /Volumes/ESXI-6.7.0-20190402001-STANDARD/ /Volumes/ESXIBOOT/

Step #3 – Modify SYSLINUX.CFG

We need to rename ISOLINUX.CFG to SYSLINUX.CFG:

mv /Volumes/ESXIBOOT/ISOLINUX.CFG /Volumes/ESXIBOOT/SYSLINUX.CFG

Then we define the location of BOOT.CFG (here '-p 1' refers to the first partition, /dev/disk2s1):

sed -e '/-c boot.cfg/s/$/ -p 1/' -i _BACK /Volumes/ESXIBOOT/SYSLINUX.CFG

Step #4 – Modify BOOT.CFG

Now we need to add a path to the answer file (ks=usb:/KS.CFG) into the boot loader (BOOT.CFG).

However, there are two copies of BOOT.CFG in the image – one for the BIOS boot and another one for EFI.

find /Volumes/ESXIBOOT -type f -name 'BOOT.CFG'
/Volumes/ESXIBOOT/BOOT.CFG
/Volumes/ESXIBOOT/EFI/BOOT/BOOT.CFG

So it makes sense to edit both of them to eliminate any possible issues.

sed -e 's+cdromBoot+ks=usb:/KS.CFG+g' -i _BACK $(find /Volumes/ESXIBOOT -type f -name 'BOOT.CFG')

In the example above, I created a backup of the original BOOT.CFG files and replaced 'cdromBoot' with 'ks=usb:/KS.CFG' inside them.

Step #5 – Create KS.CFG

Finally, we can work on the answer file that will be used to automate the ESXi host installation.

In a basic scenario, the KS.CFG file should include the following:

  • Accept the VMware license agreement,
  • Set the root password,
  • Choose the installation path,
  • Configure the network settings,
  • Reboot the host after the installation is completed.

A best practice would be to provide the root password in encrypted (hashed) form rather than in plain text. The hash can be generated using OpenSSL:

openssl passwd -1

To identify the installation path, I normally boot ESXi with a dummy installation script and then use the local console to search for the device names in /vmfs/devices/disks. The MPX format is the preferable option for the disk device name.
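
For reference, one way to list the candidate device names from that local console is shown below; the grep filter simply hides the vml.* symlinks.

# List the disk device names visible to the installer, hiding the vml.* aliases
ls /vmfs/devices/disks/ | grep -v vml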

A sample installation script is shown below.
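
It is a minimal sketch that uses placeholder values for the password hash, the network settings, and the MPX device name – adjust them for your environment.

# KS.CFG – a minimal scripted ESXi installation (all values below are placeholders)
# Accept the VMware EULA
accepteula

# Root password hash generated earlier with 'openssl passwd -1'
rootpw --iscrypted <hash-from-openssl>

# Wipe the target disk and install ESXi on it (example MPX device name)
clearpart --drives=mpx.vmhba0:C0:T0:L0 --overwritevmfs
install --disk=mpx.vmhba0:C0:T0:L0 --overwritevmfs

# Static management network configuration (example values)
network --bootproto=static --device=vmnic0 --ip=192.168.1.50 --netmask=255.255.255.0 --gateway=192.168.1.1 --nameserver=192.168.1.1 --hostname=esxi01.lab.local

# Reboot the host once the installation is complete
reboot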

In the next post, I will show how to complete the initial server configuration using PowerCLI.

vSphere 6.5: Additional considerations when migrating to VMFS-6 – Part 2

In Part 1 of this series, I wrote about the most common cases that might prevent a successful migration to VMFS-6. There is one more to cover.

For ESXi hosts that boot from flash storage or from memory, a diagnostic core dump file can also be placed on a shared datastore. You won’t be able to unmount this datastore without deleting the core dump file first.

VMware recommends using the esxcli utility to view and edit the core dump settings. This can also be automated via PowerCLI.

To check if the core dump file exists and is active, please use the following code:
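
The snippet below is a sketch based on the Get-EsxCli cmdlet in V2 mode; the host name is a placeholder.

# Open the esxcli interface of the host (V2 argument style); the host name is a placeholder
$esxcli = Get-EsxCli -VMHost (Get-VMHost -Name "esxi01.lab.local") -V2

# List all core dump files and show which one is configured and active
$esxcli.system.coredump.file.list.Invoke() | Format-Table -AutoSize

# Show only the currently configured/active core dump file
$esxcli.system.coredump.file.get.Invoke()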

To delete an old configuration that points to the VMFS-5 datastore, the following script can help:
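
Again, this is only a sketch; the dump file path is a placeholder taken from the output of the previous command.

# Open the esxcli interface of the host (V2 argument style); the host name is a placeholder
$esxcli = Get-EsxCli -VMHost (Get-VMHost -Name "esxi01.lab.local") -V2

# Deactivate and unconfigure the current core dump file
$esxcli.system.coredump.file.set.Invoke(@{unconfigure = $true})

# Remove the old dump file from the VMFS-5 datastore (placeholder path)
$esxcli.system.coredump.file.remove.Invoke(@{file = "/vmfs/volumes/Datastore01/vmkdump/old.dumpfile"; force = $true})

# Optionally, create and activate a new core dump file on the VMFS-6 datastore afterwards (placeholder name)
$esxcli.system.coredump.file.add.Invoke(@{datastore = "Datastore02"; enable = $true; auto = $true})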

With this change made, you will be able to continue migrating to VMFS-6 without any issues.

If you have any suggestions or concerns, feel free to share them in the comments below.

[URGENT] vSAN 6.6.1: Potential data loss due to resynchronisation mixed with object expansion

Last week VMware released an urgent hotfix to remediate potential data loss in vSAN 6.6.1 due to resynchronisation mixed with object expansion.

This is a known issue affecting versions of ESXi 6.5 earlier than Express Patch 9. The vendor states that the following sequence of operations might cause it:

  1. vSAN initiates resynchronisation to maintain data availability.
  2. You expand a virtual machine disk (VMDK).
  3. vSAN initiates another resync after the VMDK expansion.

Detailed information about this problem is available in KB 60299.

If you are a vSAN customer, additional considerations are required before applying this hotfix:

  • If hosts have already been upgraded to ESXi650-201810001, you can proceed with this upgrade,
  • If hosts have not been upgraded to ESXi650-201810001, and if an expansion of a VMDK is likely, the in-place expansion should be disabled on all of them by setting the VSAN.ClomEnableInplaceExpansion advanced configuration option to ‘0‘.

The VSAN.ClomEnableInplaceExpansion advanced configuration option is not available in vSphere Client. I use the following one-liner scripts to determine and change its value via PowerCLI:

# To check the current status
Get-VMHost | Get-AdvancedSetting -Name "VSAN.ClomEnableInplaceExpansion" | select Entity, Name, Value | Format-Table -AutoSize

# To disable the in-place expansion
Get-VMHost | Get-AdvancedSetting -Name "VSAN.ClomEnableInplaceExpansion" | ? {$_.Value -eq "1"} | Set-AdvancedSetting -Value "0"

Note: No reboot is required after the change.

After the hosts have been upgraded to ESXi650-201810001 or ESXi650-201811002, you can set VSAN.ClomEnableInplaceExpansion back to '1' to re-enable the in-place expansion.

vSphere 6.x: SEsparse snapshot may cause guest OS file system corruption

Earlier this month, VMware published KB 59216, titled ‘Virtual Machines running on a SEsparse snapshot may report guest data inconsistencies’.

As per the vendor’s documentation, ‘SEsparse is a snapshot format introduced in vSphere 5.5 for large disks, and is the preferred format for all snapshots in vSphere 6.5 and above with VMFS-6‘. On VMFS-5 and NFS datastores, the SEsparse format is used for virtual disks that are 2 TB or larger; whereas on VMFS-6, SEsparse is the default format for all snapshots.

The knowledge base article states that the issue affects vSphere 5.5 and later versions. As of today, it has been fixed only in VMware ESXi 6.7 Update 1, with the Express Patches pending for VMware ESXi 6.0 and 6.5.

How is this related to your production environment? Well, it depends…

For example, when backup software creates a snapshot and the guest operating system (OS) experiences ‘a burst of non-contiguous write IO in a very short period of time’ while that snapshot exists, this can potentially trigger data corruption. There might be other scenarios where a snapshot is in use during OS or software upgrades.

While waiting for a permanent solution, VMware provides a workaround that requires disabling SEsparse IO coalescing on each affected host. The advanced setting that controls IO coalescing (COW.COWEnableIOCoalescing) is not available through the vSphere Client.

In spite of that, you can always determine and change its value via PowerCLI:

Get-VMHost | Get-AdvancedSetting -Name "COW.COWEnableIOCoalescing" | select Entity,Name,Value | Format-Table -AutoSize

Get-VMHost | Get-AdvancedSetting -Name "COW.COWEnableIOCoalescing" | ? {$_.Value -eq "1"} | Set-AdvancedSetting -Value "0"

Note: After disabling the IO coalescing, all virtual machines residing on that host ‘must be power-cycled or migrated (vMotion) to other hosts that have the config option set’.

VMware states there will be a performance penalty when disabling IO coalescing and ‘the extent of degradation depends on the individual virtual machine workload‘.

Note: ‘After patches are released, the workaround needs to be rolled back to regain performance benefits of IO coalescing‘.

24/01/2019 – Update 1: This issue has been resolved with the following set of patches:

[IMPORTANT] VMware ESXi 6.x: Denial-of-service vulnerability in 3D-acceleration feature

This week VMware published security advisory VMSA-2018-0025 about a denial-of-service vulnerability in the 3D-acceleration feature of VMware ESXi, Workstation, and Fusion.

It affects all versions of those products if the 3D-acceleration feature is enabled for virtual machines (VMs). This is the default setting for all VMs on VMware Workstation and Fusion, and it might also be an issue for VMs managed by VMware Horizon.

More information about this issue can be found here.

At the time of writing this article, there were no patches or updates provided by VMware to mitigate this problem, so the workaround is to disable the 3D-acceleration feature for the affected systems.

To identify the VMs that have the 3D-acceleration feature enabled, I wrote the following PowerCLI script:
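
A sketch along those lines checks each VM's virtual video card through the vSphere API.

# Report the VMs whose virtual video card has 3D support enabled
Get-VM | Where-Object {
    ($_.ExtensionData.Config.Hardware.Device |
        Where-Object { $_ -is [VMware.Vim.VirtualMachineVideoCard] }).Enable3DSupport -eq $true
} | Select-Object Name, PowerState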

As soon as a permanent solution is provided by the vendor, I will update this blog post with more information.

vSphere 6.5: Additional considerations when migrating to VMFS-6 – Part 1

For those who use the Virtual Machine File System (VMFS) datastores, one of the steps when upgrading to vSphere 6.5 is to migrate them to VMFS-6.

VMware provides a detailed overview of VMFS-6 on the StorageHub, as well as an example of how the migration from VMFS-5 can be automated using PowerCLI.

However, there are three edge cases that require extra steps to continue with the migration. They are as follows:

  • the system swap location is set to the datastore,
  • the persistent scratch location points to the datastore,
  • a diagnostic core dump file is stored on the datastore.

All those objects, if they exist, prevent the ESXi host from unmounting the datastore, and they need to be moved to a new location before the migration continues. The required steps to relocate them will be reviewed in the paragraphs below.

Relocating the system swap

The system swap location can be checked and set via vSphere Client in Configure > System > System Swap settings of the ESXi host.

Alternatively, the system swap settings can be retrieved via PowerCLI:
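
One way is through the esxcli layer exposed by Get-EsxCli; the host name below is a placeholder.

# Read the current system swap configuration of the host
$esxcli = Get-EsxCli -VMHost (Get-VMHost -Name "esxi01.lab.local") -V2
$esxcli.sched.swap.system.get.Invoke()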

The script above can be modified to create the system swap files on a new datastore:
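
For example, the sketch below enables datastore-backed system swap on a new datastore; the host and datastore names are placeholders.

# Point the system swap to the new datastore
$esxcli = Get-EsxCli -VMHost (Get-VMHost -Name "esxi01.lab.local") -V2
$esxcli.sched.swap.system.set.Invoke(@{datastoreenabled = $true; datastorename = "Datastore02"})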

Note: A host reboot is not required to apply this change.

Moving the persistent scratch location

A persistent scratch location helps when investigating host failures. It preserves the host log files on a shared datastore, so they remain reachable for troubleshooting even if the host experiences a Purple Screen of Death (PSOD) or goes down.

To identify the persistent scratch location, filter the Key column by the word ‘scratch’ in Settings > System > Advanced System Settings of the ESXi host in vSphere Client.

You only need to point the ScratchConfig.ConfiguredScratchLocation setting to a new location and reboot the host for this change to take effect.

Note: Before making any changes, make sure that the .locker folder (it should be unique for each configured host to avoid data mixing or overwrites) has been created on the desired datastore. Otherwise, the persistent scratch location will remain the same.

To review and modify advanced host parameters, including the persistent scratch location, via PowerCLI, look for the two cmdlets named Get-AdvancedSetting and Set-AdvancedSetting. This procedure is well documented in KB 1033696.
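
For example, a short sketch with a placeholder host name and datastore path could look like this:

# Check the current persistent scratch location
Get-VMHost -Name "esxi01.lab.local" | Get-AdvancedSetting -Name "ScratchConfig.ConfiguredScratchLocation" | select Entity, Name, Value

# Point it to a pre-created, host-unique folder on the new datastore; a host reboot is required afterwards
Get-VMHost -Name "esxi01.lab.local" | Get-AdvancedSetting -Name "ScratchConfig.ConfiguredScratchLocation" | Set-AdvancedSetting -Value "/vmfs/volumes/Datastore02/.locker-esxi01" -Confirm:$false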

Information about how to automate the diagnostic core dump file relocation will be covered in Part 2 of this series. I will keep you posted!

URGENT: VMware Tools 10.3.0 was recalled

VMware has just announced that VMware Tools 10.3.0 has been recalled due to a functional issue with this release on ESXi 6.5.

As per KB 57796, the VMXNET3 driver released with VMware Tools 10.3.0 can result in a Purple Diagnostic Screen (PSOD) or guest network connectivity loss in certain configurations. Those configurations include:

  • VMware ESXi 6.5 hosts
  • VM Hardware version 13
  • Windows 8/Windows Server 2012 or higher guest operating system (OS).

As a workaround, VMware recommends uninstalling VMware Tools 10.3.0 and then reinstalling VMware Tools 10.2.5 for affected systems.
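
To find the affected systems first, a simple PowerCLI check of the VMware Tools version reported by each guest can help; the snippet below is a general sketch.

# Report the VMware Tools version running in each VM
Get-VM | Select-Object Name, @{N = "ToolsVersion"; E = { $_.Guest.ToolsVersion }}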

The vendor will be releasing a revised version of the VMware Tools 10.3 family at some point in the future.

More information is available in VMSA-2018-0017.

25/09/2018 – Update 1: VMware Tools has been updated to version 10.3.2 to resolve the issue with the VMXNET3 driver. VMware recommends installing VMware Tools 10.3.2, or VMware Tools 10.2.5 or an earlier version.