vSphere 6.0: Templates are shown as ‘Unknown’ in the local Content Library

Another day, another case… This time, I was surprised to see an empty list when provisioning a new virtual machine from a Content Library.

CL Issue - 01

I went to check the Content Library status and found that all templates were shown as ‘Unknown’ there.

CL Issue - 02

Funnily enough, this behaviour occurred only with the local Content Library. A subscribed one had no issues at all, and the synchronisation between the two was still working.

CL Issue - 03

More interestingly, objects of other types were not affected at all.

There is not much information available about how to troubleshoot the Content Library in vSphere 6.0. Some diagnostic files can be found in the /var/log/vmware/vdcs directory on the vCenter Server Appliance (VCSA). Unfortunately, they are not that informative.

So I opened a case with VMware GSS (SR # 17504701707), and the response was that “this issue is occurring as there is a corrupted or stale PID for the content library service which has not been cleared from the previous running state.”

VMware is working on a fix, but there is no ETA at the moment.

A workaround provided by VMware (summarised as a shell sketch after the steps):

  1. Connect to the vCenter Server Appliance using SSH and root credentials.
  2. Navigate to /var/log/vmware/vdcs.
  3. Create a new folder to move the PID file to.
  4. Move the vmware-vdcs.pid file to the folder created in step 3.
  5. Reboot the vCenter Server Appliance (in the case of an external PSC, reboot the PSC first and then the vCenter).
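
The same steps as a minimal shell sketch on the VCSA (the pid-backup folder name is my own choice, not part of VMware’s instructions):

# On the VCSA shell, move the stale PID file out of the way
cd /var/log/vmware/vdcs
mkdir pid-backup
mv vmware-vdcs.pid pid-backup/

# Reboot the appliance (with an external PSC, reboot the PSC first)
reboot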

I personally found that restarting VCSA resolves this issue. However, it reappears after some time.

The future of vSphere: No vCenter for Windows and no Flash client

vSphere Web Client

This is a logical move for VMware in its push to move away from legacy technologies and third-party dependencies in its core products, and a welcome one!

As more and more customers get used to the VMware vCenter Server Appliance and the vSphere Client (HTML5) in their environments, it was just a matter of time before we read news about VMware deprecating vCenter Server for Windows and the vSphere Web Client.

On 25 August 2017, it was officially confirmed that the next version of vSphere would be “the terminal release” for those products, as stated in the following articles:

I personally like this change, as it helps the vendor focus on cutting-edge technologies instead of fixing and patching what was good but not great.

vSphere 6.5 Update 1 has been released!

VMware has just released a major update to vCenter Server 6.5 with plenty of exciting features, including:

  • Ability to run the vCenter Server Appliance GUI and CLI installers on Microsoft Windows 2012 x64, Microsoft Windows 2012 R2 x64, Microsoft Windows 2016 x64, and macOS Sierra
  • vSAN software upgrades through integration with vSphere Update Manager
  • Support for Microsoft SQL Server 2016, Microsoft SQL Server 2016 SP1, and Microsoft SQL Server 2014 SP2 as external databases for vCenter Server
  • Improved HTML5-based vSphere Client
  • Increased configuration maximums for the Linked vCenter Server instances
  • vSphere Replication updates
  • Driver updates and heaps of resolved issues.

The following products have been updated:

Updated packages can be found here.

More information about the new features is available via these links:

I have a few open support requests with VMware GSS that should be resolved in this release of the product.

I will keep you posted once I have upgraded my environment and finished testing.

ESXi 6.5: Host fails with PSOD when IPv6 is disabled

I have a habit of reading all new KB articles published by VMware every week. Not only does this give visibility into the current issues with VMware products, but it also helps me learn about behaviours and workarounds proactively and be prepared to remediate them if required.

Therefore, after writing a few blog posts about vCenter 6.5 and IPv6 here and here, it caught my eye that ESXi 6.5 hosts can also fail with a Purple Screen of Death when IPv6 is disabled.

VMware has published KB 2150794, which explains this behaviour.

The only workaround at the moment is to re-enable IPv6 on all hosts in your environment.
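
For reference, IPv6 can be checked and re-enabled from the ESXi shell with esxcli; a minimal sketch of the commands (the change requires a host reboot to take effect):

# Check whether IPv6 is currently enabled on the host
esxcli network ip get

# Re-enable IPv6 (takes effect after a reboot)
esxcli network ip set --ipv6-enabled true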

ESXi65-IPv6-PSOD

VCSA 6.5: The mysterious dependency on the IPv6 protocol – Part 2

In Part 1 of this mini-series, I wrote about an issue with the Appliance Management User Interface. However, a dependency on the IPv6 protocol in VCSA 6.5 can cause unexpected behaviour with the vSphere ESXi Dump Collector service as well. Let’s look into this one now.

In an environment with many ESXi hosts, it is vital to have their diagnostic logs available for troubleshooting. By default, each host has a diagnostic coredump partition on its local storage. The hypervisor can preserve diagnostic information in one or more pre-configured locations: a local partition, a file on a VMFS datastore, or a network dump server on vCenter Server.

ESXi-dump-collection-06
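
The current coredump configuration of a host can be inspected from the ESXi shell; a quick sketch with esxcli:

# Show the active diagnostic coredump partition (local storage)
esxcli system coredump partition get

# Show the network coredump configuration (dump collector)
esxcli system coredump network get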

In the case of a critical host failure, when the system gets into the Purple Screen of Death (PSOD) state, the hypervisor generates a set of diagnostic data archived in a coredump. In my opinion, it is more efficient to have this information stored in a centralised location, and this is where the vSphere ESXi Dump Collector service can be useful.

Initially, the vSphere ESXi Dump Collector service is disabled on the vCenter Server Appliance.

ESXi-Dump-Collector-01

The setup process is straightforward: select a startup type for the service (by default, it is set to Manual) and click the Start button to enable it.

ESXi-Dump-Collector-02
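
Alternatively, the service can be checked and started from the appliance shell with service-control; a minimal sketch:

# Check the current state of the Dump Collector service
service-control --status netdumper

# Start it
service-control --start netdumper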

Depending on the network requirements and the number of ESXi hosts, you might need to change the coredump server UDP port (6500 by default) or increase the repository maximum size (2 GB by default). Both settings require a restart of the vSphere ESXi Dump Collector service.
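
Remember that each ESXi host must point at the collector with matching settings. A sketch of the host-side configuration with esxcli; the vmk0 interface and the 192.168.1.10 address are placeholders for your environment:

# Point the host at the dump collector (placeholder interface/address)
esxcli system coredump network set --interface-name vmk0 --server-ipv4 192.168.1.10 --server-port 6500

# Enable network coredumps on the host
esxcli system coredump network set --enable true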

This process becomes a little more complicated when IPv6 is disabled on the VCSA. An attempt to start the vSphere ESXi Dump Collector service generates the following error message in the vSphere Web Client:

ESXi-Dump-Collector-03

If we connect remotely to the virtual appliance and try to start the netdumper service from a console session, it shows us more information:

root@n-vcsa-01 [ ~ ]# service-control --start netdumper
Perform start operation. vmon_profile=None, svc_names=['netdumper'], include_coreossvcs=False, include_leafossvcs=False
2017-07-04T10:15:32.179Z Service netdumper state STOPPED
Error executing start on service netdumper. Details {
    "resolution": null,
    "detail": [
        {
            "args": [
                "netdumper"
            ],
            "id": "install.ciscommon.service.failstart",
            "localized": "An error occurred while starting service 'netdumper'",
            "translatable": "An error occurred while starting service '%(0)s'"
        }
    ],
    "componentKey": null,
    "problemId": null
}
Service-control failed. Error {
    "resolution": null,
    "detail": [
        {
            "args": [
                "netdumper"
            ],
            "id": "install.ciscommon.service.failstart",
            "localized": "An error occurred while starting service 'netdumper'",
            "translatable": "An error occurred while starting service '%(0)s'"
        }
    ],
    "componentKey": null,
    "problemId": null
}

The next step in troubleshooting this issue is to look into the vSphere ESXi Dump Collector service log file (/var/log/vmware/netdumper/netdumper.log), which reports that the address is already in use:

root@n-vcsa-01 [ ~ ]# cat /var/log/vmware/netdumper/netdumper.log
2017-07-04T10:19:32.121Z| netdumper| I125: Log for vmware-netdumper pid=8347 version=XXX build=build-5318154 option=Release
2017-07-04T10:19:32.121Z| netdumper| I125: The process is 64-bit.
2017-07-04T10:19:32.121Z| netdumper| I125: Host codepage=UTF-8 encoding=UTF-8
2017-07-04T10:19:32.121Z| netdumper| I125: Host is Linux 4.4.8 VMware Photon 1.0 Photon VMware Photon 1.0

2017-07-04T10:19:32.123Z| netdumper| I125: Configured to handle 1024 clients in parallel.
2017-07-04T10:19:32.123Z| netdumper| I125: Configuring /var/core/netdumps as the directory to store the cores
2017-07-04T10:19:32.123Z| netdumper| I125: Configured to use wildcard [::0/0.0.0.0]:6500 as IP address:port
2017-07-04T10:19:32.123Z| netdumper| I125: Using /var/log/vmware/netdumper/netdumper.log as the logfile.
2017-07-04T10:19:32.123Z| netdumper| I125: Nothing to post process
2017-07-04T10:19:32.123Z| netdumper| I125: Couldn't bind socket to port 6500: 98 Address already in use
2017-07-04T10:19:32.123Z| netdumper| I125:

Playing a bit with Linux commands gave me some clues:

root@n-vcsa-01 [ ~ ]# netstat -lup
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
udp 0 0 *:kerberos *:* 1489/vmdird
udp 0 0 *:sunrpc *:* 1062/rpcbind
udp 0 0 n-vcsa-01.testorg.l:ntp *:* 1249/ntpd
udp 0 0 photon-machine:ntp *:* 1249/ntpd
udp 0 0 *:ntp *:* 1249/ntpd
udp 0 0 *:epmap *:* 1388/dcerpcd
udp 0 0 *:syslog *:* 2229/rsyslogd
udp 0 0 *:794 *:* 1062/rpcbind
udp 0 0 *:ideafarm-door *:* 3905/vpxd
udp 0 0 *:llmnr *:* 1223/systemd-resolv
udp6 0 0 [::]:tftp [::]:* 1/systemd
udp6 0 0 [::]:sunrpc [::]:* 1062/rpcbind
udp6 0 0 [::]:ntp [::]:* 1249/ntpd
udp6 0 0 [::]:syslog [::]:* 2229/rsyslogd
udp6 0 0 [::]:794 [::]:* 1062/rpcbind
udp6 0 0 [::]:boks [::]:* 17377/vmware-netdum

root@n-vcsa-01 [ ~ ]# ps -p 17377
PID TTY TIME CMD
17377 ? 00:00:00 vmware-netdumpe

root@n-vcsa-01 [ ~ ]# cat /proc/17377/cmdline
/usr/sbin/vmware-netdumper-d/var/core/netdumps-o6500-l/var/log/vmware/netdumper/netdumper.log

Even though it reports an error at startup, the vSphere ESXi Dump Collector service is running (partially) on the virtual appliance.

Thanks to Michael, who shared a detailed guide, I was able to test this assumption quickly.
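
A sketch of such a test from the ESXi shell, assuming the host has already been pointed at the collector (see the esxcli sketch earlier). Note that the vsish command deliberately crashes the host, so it is strictly for a lab:

# Verify that the dump collector is reachable from the host
esxcli system coredump network check

# Lab only: force a PSOD so the host sends a coredump to the collector
vsish -e set /reliability/crashMe/Panic 1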

ESXi-Dump-Collector-04

ESXi-Dump-Collector-05

The coredump was successfully transferred from the ESXi host to the /var/core/netdumps/ folder on the VCSA. However, there were no records of this operation in netdumper.log.

This issue has been reported to VMware GSS (SR # 17385781602) and should be resolved in a future update to VCSA 6.5.