vSphere 6.x: The beauty and ugliness of the Content Library – Part 1

The title of this blog post may seem a bit provocative, and that is intentional.

I believe many VMware engineers, including myself, were really excited about the Content Library feature introduced in vSphere 6.0. The product itself is not completely new to VMware, as it merges code from the content management feature of vCloud Director.

In What’s New in the VMware vSphere 6.0 Platform whitepaper, VMware states the following:

“The Content Library… centrally manages virtual machine templates, ISO images, and scripts, and it performs the content delivery of associated data from the published catalog to the subscribed catalog at other sites.”

Sounds really cool! Now we can centralise in one place all the objects that previously resided on different datastores, and manage them from the vSphere Web Client.

In vSphere 6.5, VMware continues improving and polishing this feature:

“Administrators can now mount an ISO directly from the Content Library, apply a guest OS customization specification during VM deployment, and update existing templates.”

However, this article is not only about praising the tool provided. 🙂 I would like to share three specific examples of when it doesn’t work as expected, along with possible workarounds.

Issue #1 – Provisioning a virtual machine from a template with advanced parameters

Affected platform: vSphere 6.0 prior to Update 3.

It was a great surprise to discover that provisioning a virtual machine from a VM template with advanced parameters set causes problems in vSphere 6.0. Although the provisioning operation starts as expected, it fails at the end with the error message “Failed to deploy OVF package”.

CL-Issue01-01

Unfortunately, the Error Report in the vSphere Web Client does not clarify the root cause of this event.

CL-Issue01-02

After contacting VMware GSS about this issue (SR # 16255562909) in late 2016, I was advised that the bug would be addressed in vSphere 6.0 Update 3.

In March 2017 I updated my environment to this version and tested the feature again; VM creation worked smoothly. So it took VMware almost two years after the Content Library feature became generally available to fix it.

Thankfully, vSphere 6.5 does not have this problem at all.

Resolution: Update your environment to vSphere 6.0 Update 3 or a newer version.

Issue #2 – Provisioning a virtual machine from the Content Library on the vSAN datastore

Affected platform: vSphere 6.5 Standard.

This issue is not related to the Content Library directly, but rather to OVA/OVF provisioning. For some reason, when you create a new VM from a template in vSphere 6.5, it triggers a “Call DRS for cross vMotion placement recommendations” task.

If you use vSphere 6.5 Standard, where the DRS feature is not available, this task fails with the error message “The operation failed due to The operation is not allowed in the current state.”

CL-Issue02-01

CL-Issue02-02

The Error Report in the vSphere Web Client looks similar to the one in the picture below.

CL-Issue02-03

In the Known Issues in VMware vSAN 6.6 Release Notes, the vendor states the following:

VM OVF deploy fails if DRS is disabled
If you deploy an OVF template on the vSAN cluster, the operation fails if DRS is disabled on the vSAN cluster. You might see a message similar to the following: The operation is not allowed in the current state.

Workaround: Enable DRS on the vSAN cluster before you deploy an OVF template.

After doing some troubleshooting and trying different scenarios, the only relevant difference between the provisioning tasks I was able to identify was the VM storage policy. Regardless of how the VM creation was initiated (from an OVA/OVF file or from a Content Library template), it was the Virtual SAN Default Storage Policy that triggered the call to DRS for the cross vMotion check.

For example, if you set the VM storage policy in the Select storage dialogue box to “None”, the OVA/OVF file can be provisioned on the vSAN datastore.

CL-Issue02-04

The same applies to a VM template from the Subscribed Content Library when the VM storage policy is set to “None”.

Unfortunately, this trick doesn’t work with the templates in the Local Content Library.

So I decided to dig a bit deeper into the Content Library structure to see if anything could be done there.

The Content Library keeps its data in a contentlib-GUID folder. Each template has its own subfolder with a unique name. Inside the subfolder, there are a few files: a descriptor (*.ovf) and one or more data files (*.vmdk).

In vSphere 6.0 these files are named descriptor_GUID.ovf and disk-vdcs-Disk_Number_GUID.vmdk.

In vSphere 6.5 the file names are self-explanatory: Template_Name_GUID.ovf and Template_Name-Disk_Number_GUID.vmdk.
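As a purely hypothetical illustration (the GUIDs are shortened and the template name is invented), a template folder in a vSphere 6.5 library might look like this:

contentlib-3f2a.../                        <- library folder on the datastore
  7c91.../                                 <- one subfolder per library item
    Win2016-Template_4d5e....ovf           <- descriptor file
    Win2016-Template-1_4d5e....vmdk        <- disk data file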

CL-Issue02-05

I compared the descriptor files for the VM templates in the Local and Subscribed Content Libraries, and found that they had different vmw:name values in the StorageGroupSection. For the Local Content Library it was “Virtual SAN Default Storage Policy”, while for the subscribed one it was a different policy name.

CL-Issue02-06

This led me to the idea of changing the descriptor for the VM template in the Local Content Library, so I could provision VMs using one of the workarounds below.

Workarounds:

  • When provisioning from an OVA/OVF file, set the VM storage policy in the Select storage dialogue box to “None”,
  • You can provision from the Subscribed Content Library if its VM templates use a VM storage policy other than “Virtual SAN Default Storage Policy”. Set the VM storage policy in the Select storage dialogue box to “None”,
  • You can provision from the Local Content Library if you edit the descriptor file for the VM template and replace “Virtual SAN Default Storage Policy” with something else (see the sketch after this list). Set the VM storage policy in the Select storage dialogue box to “None”.
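For the third workaround, a rough sketch of the descriptor edit is shown below. It assumes the library content is reachable from an ESXi shell; the datastore name, GUIDs and the replacement policy name are placeholders, so adjust them to your environment and keep a backup of the original descriptor.

# Illustrative only - the real contentlib GUID and template folder names will differ
cd /vmfs/volumes/datastore1/contentlib-<library_GUID>/<template_folder>/
cp Template_Name_<GUID>.ovf Template_Name_<GUID>.ovf.bak
# Replace the vSAN default policy name in the StorageGroupSection with another value
sed -i 's/Virtual SAN Default Storage Policy/Datastore Default/' Template_Name_<GUID>.ovf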

Resolution: A support case has been opened, and I am waiting for VMware to resolve this issue. The fix is expected in vSphere 6.5 Update 1 (please refer to SR # 17393663302 when contacting VMware GSS for further updates).

To be continued

Configuring static network in Photon OS

As more virtual appliances from VMware ship with Photon OS, I would like to share a few simple steps to assign a static IP address and other network parameters to virtual machines based on this operating system.

In Photon OS, the process systemd-networkd is responsible for the network configuration. You can check its status by executing the following command:

[ ~ ]# systemctl status systemd-networkd -l

It should give you an output similar to the one in the picture below.

PhotonOS-Net-01

 

By default, systemd-networkd receives its settings from the configuration file 10-dhcp-en.network located in the /etc/systemd/network/ folder. It has the following format:

[Match]
Name=e*

[Network]
DHCP=yes

I would recommend renaming this file to 10-static-en.network, so it will be easier to troubleshoot network issues in the future.
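A minimal sketch of that rename from the Photon OS shell (the path and file names are the defaults mentioned above):

cd /etc/systemd/network/
mv 10-dhcp-en.network 10-static-en.network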

The file syntax is similar to what is used in Arch Linux. With a few additional lines in the file, the network configuration can be set to match our requirements. They are as follows:

  • In section [Network]
    • Address – the IP address and subnet mask in the format of XXX.XXX.XXX.XXX/YY
    • Gateway – an IP address of the default gateway
    • DNS – IP addresses of one or more DNS servers (space-separated values)
    • Domains – domain name(s) in FQDN format (space-separated values)
    • NTP – IP addresses or FQDNs of NTP sources (space-separated values).

An example of the static network configuration is shown below.

[Match]
Name=e*

[Network]
DHCP=no
Address=192.168.1.101/24
Gateway=192.168.1.1
DNS=192.168.1.21 192.168.1.1
Domains=testorg.local
NTP=0.au.pool.ntp.org 1.au.pool.ntp.org

The hostname of the system can be added to the /etc/hostname file in FQDN format.
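For example (the hostname below is just a placeholder that follows the domain used in the sample configuration):

echo "photon01.testorg.local" > /etc/hostname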

All changes should apply after rebooting the virtual machine. To test the results, we can use the following commands:

  • ip a – shows the IP addresses of the network interfaces

PhotonOS-Net-02

  • ip route – shows the routing table,

PhotonOS-Net-03

  • systemctl status systemd-timesyncd -l – shows time synchronisation status.

PhotonOS-Net-04

VMware Log Insight 4.0 and slow logins with domain user credentials

Recently I was spinning up another instance of the VMware Log Insight 4.0 appliance in a branch office.

After enabling authentication against Active Directory, I noticed it was relatively slow to log on to the Log Insight web interface. Moreover, when I pointed the Authentication Configuration to the local domain controllers, the connection test always failed.

li-ad-integration-02

I did not have enough time to troubleshoot the issue at that point, so I decided to come back to it later.

A few days later the situation became even worse: domain users could not log on to the appliance at all, with a spinning wheel appearing after pressing the login button.

li-ad-integration-01

Fortunately, I am not the first customer to come across this issue. VMware has published an article, “Unable to Log In Using Active Directory Credentials”, which helps to locate the cause of this behaviour.

As suggested by the vendor, I looked through the records in the /storage/var/loginsight/runtime.log file and found the following:

[com.vmware.loginsight.aaa.krb5.KrbAuthenticator] [Attempting Kerberos login: [[ user=XXXXX ], [ domain=XXXXX ]]]

[com.vmware.loginsight.aaa.krb5.KrbAuthenticator] [Kerberos login in 270817ms]

jsonResult: {"result":"Cannot reach kerberos servers through TCP."}

suggestion Please verify that your firewall settings allow TCP ports for active directory and kerberos.

Here I should mention that our Active Directory has a hub-and-spoke topology, with the domain controllers in both the local and central sites available to the clients.

By default, Log Insight can be pointed to specific domain controllers, but not to specific Kerberos servers. As a result, the Kerberos client relies on auto-discovery and tries to contact any server listed under the _ldap._tcp.dc._msdcs.[domain_name] namespace, which delays it from reaching the ones that are actually available. To illustrate this, you can execute the following command from the Log Insight CLI:

~# netstat -A inet --program | egrep -i "kerberos"

It should show you all the active UDP sessions initiated by the Kerberos client.
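To see how many domain controllers auto-discovery would return in the first place, you can also query the SRV records directly from any machine that has the dig utility installed (the domain name below is a placeholder):

dig +short _ldap._tcp.dc._msdcs.testorg.local SRV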

The next step is to find a way to narrow down the list of domain controllers to those which are available to the client. VMware helps us with this task by providing “advanced options for Active Directory integration in Log Insight beyond what is available in the administrative user interface”.

The problem can be resolved with the following steps:

  1. Open the https://loginsight_hostname_or_ipaddress/internal/config web page.
  2. Add the krb-domain-servers option, with the appropriate values for the available domain controllers, to the advanced configuration and save the changes.
  3. Restart the Log Insight server.

After all those changes are completed, you should be able to log on to Log Insight quickly with a domain account:

[com.vmware.loginsight.aaa.krb5.KrbAuthenticator] [Attempting Kerberos login: [[ user=XXXXX ], [ domain=XXXXX ]]]

[com.vmware.loginsight.aaa.krb5.KrbAuthenticator] [Kerberos login in 22ms]

03/03/2017 – Update 1: With the release of vRealize Log Insight 4.3 the issue has been resolved. Please see the release notes for more details.

 

vCenter Support Assistant 6.5: This type of network adapter is not supported by {0}Other Linux (64-bit)

VMware has just released a new version of vCenter Support Assistant 6.5, which officially supports vSphere 6.5 and has a few noticeable improvements compared to the previous release.

In this appliance, SUSE Linux has been replaced with Photon OS. The shift looks quite logical, as VMware pushes its own Linux flavour into more and more new products. Not only does it help to maintain a consistent approach to distributing virtual appliances, but it also promises improved operating system performance, as VMware has invested heavily in making it lightweight and fast.

However, when I completed provisioning vSA 6.5 in my environment and checked the virtual machine settings, to my surprise I saw the warning message shown in the screenshot below.

vsa-issue-01

It is not difficult to understand the root cause of this issue and eliminate it completely. To keep backwards compatibility with earlier versions of vCenter Server, the VM hardware was set to version 8 (ESXi 5.0 and later).

vsa-issue-02

This choice of OS is quite unexpected, as ‘Other Linux (64-bit)‘ is classified as a Legacy operating system by the vendor.

vsa-issue-03

It is only from VM hardware version 10 onwards that the guest operating system can be changed to ‘Other 3.x or later Linux (64-bit)‘ to resolve the problem. So the workaround is to upgrade the VM to at least hardware version 10, and then choose the compatible OS type.

My suggestion to VMware would be to introduce a new Guest OS type called ‘Linux / Photon OS’ with a compatible hardware profile to prevent similar warnings in the future.

vSphere 6.0: Available storage for /storage/log reached warning threshold – less than 30 % available space

For those who have a vCenter Server Appliance with an External Platform Services Controller, you might notice a warning message in the Services Health area on the Administration -> System Configuration -> Summary tab.

The VMware Syslog Service reports a warning message as soon as /storage/log has less than 30 percent of free space, similar to the one in the picture below.

syslog-service-issue-01

syslog-service-issue-02

The problem appears to be with the VMDK disk backing the /storage/log mount point. On the PSC, it has a default size of 5 GB and quickly fills up with SSO log files.

syslog-service-issue-03

VMware offers two possible solutions to this issue: adjusting the logging settings in the log4j.properties file, or extending the /storage/log volume as described in VMware KB 2126276.

The second option sounds preferable, as it eliminates the need to monitor changes in the log4j.properties file after a system update. However, the commands in VMware KB 2126276 do not apply to the Platform Services Controller appliance, as it doesn’t have the vpxd_servicecfg script to automate the volume extension.

Fortunately, Florian Grehl has documented a workaround for the PSC, which requires us to extend VMDK 5 using the vSphere Web Client and then execute the following commands in an SSH session on the affected server:

1. Rescan the SCSI Bus to make Linux aware of the resized virtual disk

# rescan-scsi-bus.sh -w --forcerescan

2. Grow the Volume Group by resizing the Physical Volume on the underlying disk device (/dev/sde in this example)

# pvresize /dev/sde

3. Resize the Logical Volume, using its name (log_vg/log in this example)

# lvresize --resizefs -l +100%FREE /dev/log_vg/log

After completing the commands and verifying the volume size, we should restart the VMware Syslog Service to refresh its state. This can be done from the same SSH session or by using the vSphere Web Client.
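If you prefer to stay in the SSH session, a quick check and restart might look like the sketch below; the exact syslog service name can differ between builds, so confirm it with service-control --list first:

# Confirm the mount point now shows the extended size
df -h /storage/log

# Restart the syslog service (check the exact name with 'service-control --list')
service-control --stop vmware-syslog
service-control --start vmware-syslog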

syslog-service-issue-04

And this is how things are back to normal 🙂

vSphere 6.x: Force the datastore capability sets update

When a new datastore is provisioned in the vSphere environment, there might be a delay in updating the information about its capability sets, and the datastore can show up as incompatible with a storage policy.

storage-provider-01

vCenter Server periodically updates storage data in its database, although I couldn’t find the exact interval at which this occurs. Fortunately, it is possible to force an update of the datastore capability sets in the vSphere Web Client.

To complete this task, go to the vCenter Manage tab and choose the ‘Storage Providers’ option. A rescan button is available from the storage provider settings.

storage-provider-02

Clicking that icon initiates a rescan and updates the storage capabilities of the datastore.

storage-provider-03

Now it is possible to place virtual machines on the datastore.

Configuring PERC H730/730p cards for VMware vSAN 6.x

One of the necessary steps to create a new VMware vSAN cluster is to configure the RAID controller.

I have found Joe’s post about setting up Dell PERC H730 cards very informative and easy to follow. However, the latest generation of Dell PowerEdge servers has a slightly different configuration interface, so I decided to document the configuration process using the BIOS graphical interface.

You can get into it either by pressing the F2 key during server boot or by choosing the BIOS Setup option in the Next Boot drop-down menu of the iDRAC Virtual Console.

step-00

The next step is to click on Device Settings and select the RAID controller from the list of available devices.

step-01

Step-02.png

There are two configuration pages that we should be interested in, as follows:

  • Controller Management > Advanced Controller Management
  • Controller Management > Advanced Controller Properties.

The former gives us the ability to switch from RAID mode to HBA mode.

Step-03.png

The latter allows disabling the controller caching and setting the BIOS Boot mode.

Step-04.png

Please note that a system reboot is required for the changes to take effect. It is always a good idea to double-check that the parameters above were set up correctly.
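Once the host is back up, a quick sanity check can be done from an SSH session on the ESXi host. These are generic ESXi/vSAN commands rather than anything PERC-specific, so treat the output as a rough guide only:

# List the storage adapters detected by the host
esxcli storage core adapter list

# Show the local disks as vSAN sees them and whether they are eligible for use
vdq -q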