VCSA 6.5: The mysterious dependency on the IPv6 protocol – Part 2

In Part 1 of this mini-series, I wrote about an issue with the Appliance Management User Interface. However, a dependency on the IPv6 protocol in VCSA 6.5 can also cause unexpected behaviour with the vSphere ESXi Dump Collector service. Let’s look into that one now.

In an environment with many ESXi hosts, it is vital to have their diagnostic data available for troubleshooting. By default, each host has a diagnostic coredump partition on its local storage. The hypervisor can preserve diagnostic information in one or more pre-configured locations: the local partition, a file on a VMFS datastore, or a network dump server on vCenter Server.

ESXi-dump-collection-06

In the case of a critical host failure, when the system enters the Purple Screen of Death (PSOD) state, the hypervisor generates a set of diagnostic data archived in a coredump. In my opinion, it is more efficient to store this information in a centralised location, and this is where the vSphere ESXi Dump Collector service comes in useful.

Initially, the vSphere ESXi Dump Collector service is disabled on the vCenter Server Appliance.

ESXi-Dump-Collector-01

The setup process is straightforward: select a startup type for the service (by default, it is set to Manual) and click the Start button to enable it.

ESXi-Dump-Collector-02

Depending on your network requirements and the number of ESXi hosts, you might need to change the Coredump Server UDP Port (6500) and increase the Repository max size (2 GB). Both settings require a restart of the vSphere ESXi Dump Collector service.

This process becomes a little more complicated when IPv6 is disabled on the VCSA. An attempt to start the vSphere ESXi Dump Collector service generates the following error message in the vSphere Web Client:

ESXi-Dump-Collector-03

If we connect to the virtual appliance and start the netdumper service from a console session, it shows us more information:

root@n-vcsa-01 [ ~ ]# service-control --start netdumper
Perform start operation. vmon_profile=None, svc_names=['netdumper'], include_coreossvcs=False, include_leafossvcs=False
2017-07-04T10:15:32.179Z Service netdumper state STOPPED
Error executing start on service netdumper. Details {
    "resolution": null,
    "detail": [
        {
            "args": [
                "netdumper"
            ],
            "id": "install.ciscommon.service.failstart",
            "localized": "An error occurred while starting service 'netdumper'",
            "translatable": "An error occurred while starting service '%(0)s'"
        }
    ],
    "componentKey": null,
    "problemId": null
}
Service-control failed. Error {
    "resolution": null,
    "detail": [
        {
            "args": [
                "netdumper"
            ],
            "id": "install.ciscommon.service.failstart",
            "localized": "An error occurred while starting service 'netdumper'",
            "translatable": "An error occurred while starting service '%(0)s'"
        }
    ],
    "componentKey": null,
    "problemId": null
}

The next step in troubleshooting this issue is to look into the vSphere ESXi Dump Collector service log file (/var/log/vmware/netdumper/netdumper.log). It reports that the address is already in use:

root@n-vcsa-01 [ ~ ]# cat /var/log/vmware/netdumper/netdumper.log
2017-07-04T10:19:32.121Z| netdumper| I125: Log for vmware-netdumper pid=8347 version=XXX build=build-5318154 option=Release
2017-07-04T10:19:32.121Z| netdumper| I125: The process is 64-bit.
2017-07-04T10:19:32.121Z| netdumper| I125: Host codepage=UTF-8 encoding=UTF-8
2017-07-04T10:19:32.121Z| netdumper| I125: Host is Linux 4.4.8 VMware Photon 1.0 Photon VMware Photon 1.0

2017-07-04T10:19:32.123Z| netdumper| I125: Configured to handle 1024 clients in parallel.
2017-07-04T10:19:32.123Z| netdumper| I125: Configuring /var/core/netdumps as the directory to store the cores
2017-07-04T10:19:32.123Z| netdumper| I125: Configured to use wildcard [::0/0.0.0.0]:6500 as IP address:port
2017-07-04T10:19:32.123Z| netdumper| I125: Using /var/log/vmware/netdumper/netdumper.log as the logfile.
2017-07-04T10:19:32.123Z| netdumper| I125: Nothing to post process
2017-07-04T10:19:32.123Z| netdumper| I125: Couldn't bind socket to port 6500: 98 Address already in use
2017-07-04T10:19:32.123Z| netdumper| I125:

Playing around with a few Linux commands gave me some clues:

root@n-vcsa-01 [ ~ ]# netstat -lup
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
udp 0 0 *:kerberos *:* 1489/vmdird
udp 0 0 *:sunrpc *:* 1062/rpcbind
udp 0 0 n-vcsa-01.testorg.l:ntp *:* 1249/ntpd
udp 0 0 photon-machine:ntp *:* 1249/ntpd
udp 0 0 *:ntp *:* 1249/ntpd
udp 0 0 *:epmap *:* 1388/dcerpcd
udp 0 0 *:syslog *:* 2229/rsyslogd
udp 0 0 *:794 *:* 1062/rpcbind
udp 0 0 *:ideafarm-door *:* 3905/vpxd
udp 0 0 *:llmnr *:* 1223/systemd-resolv
udp6 0 0 [::]:tftp [::]:* 1/systemd
udp6 0 0 [::]:sunrpc [::]:* 1062/rpcbind
udp6 0 0 [::]:ntp [::]:* 1249/ntpd
udp6 0 0 [::]:syslog [::]:* 2229/rsyslogd
udp6 0 0 [::]:794 [::]:* 1062/rpcbind
udp6 0 0 [::]:boks [::]:* 17377/vmware-netdum

root@n-vcsa-01 [ ~ ]# ps -p 17377
PID TTY TIME CMD
17377 ? 00:00:00 vmware-netdumpe

root@n-vcsa-01 [ ~ ]# cat /proc/17377/cmdline
/usr/sbin/vmware-netdumper-d/var/core/netdumps-o6500-l/var/log/vmware/netdumper/netdumper.log
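
The arguments look glued together only because /proc/&lt;pid&gt;/cmdline separates them with NUL characters; translating the NULs into spaces makes the command line readable, for example:

root@n-vcsa-01 [ ~ ]# tr '\0' ' ' < /proc/17377/cmdline; echo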

Even though it reports an error at startup, the vSphere ESXi Dump Collector service is (at least partially) running on the virtual appliance.

Thanks to Michael for sharing a detailed guide, I was able to test this assumption quickly.
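
For context, an ESXi host is pointed at the collector with esxcli; a minimal sketch looks like this (the vmk0 interface, the 192.168.1.10 address of the VCSA, and the default port 6500 are assumptions used for illustration):

esxcli system coredump network set --interface-name vmk0 --server-ipv4 192.168.1.10 --server-port 6500
esxcli system coredump network set --enable true
esxcli system coredump network check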

ESXi-Dump-Collector-04

ESXi-Dump-Collector-05

The coredump was successfully transferred from the ESXi host to the /var/core/netdumps/ folder on the VCSA. However, there were no records of this operation in netdumper.log.

This issue has been reported to VMware GSS (SR # 17385781602) and should be resolved in future updates to VCSA 6.5.

VCSA 6.5: The mysterious dependency on the IPv6 protocol – Part 1

IPv6 support was introduced to the VMware vSphere platform starting with vSphere 4.1. It is enabled by default in the vCenter Server Appliance and can be controlled in VCSA 6.0 and 6.5 from the Direct Console User Interface (Customize System > Configure Management Network > IPv6 Configuration).

IPv6-Issue-01

To my surprise, disabling IPv6 can cause problems with VCSA updates. I will explain this statement and provide a workaround in the paragraphs below.

Imagine your security team requires IPv6 to be turned off on vCenter Server. Following this request, you proceed with the configuration change in the DCUI.

IPv6-Issue-02

After rebooting the virtual machine, everything appears to work fine. Now it is time to update the appliance to a newer version: you download a patch file, attach it to the VM, and start the update process from the VMware vSphere Appliance Management Interface.

When the server reboots, you notice that the Appliance Management User Interface is no longer accessible. To troubleshoot the issue further, we need to open an SSH session to the appliance and enable the Shell.
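
Assuming SSH access is already enabled, the Bash shell can be turned on from the appliancesh prompt roughly like this:

Command> shell.set --enabled true
Command> shell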

First, we use the netstat command to see whether any service is listening on TCP port 5480. The command output does not show anything.
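
For reference, the check itself is as simple as this (run from a root Shell session on the appliance):

root@n-vcsa-01 [ ~ ]# netstat -tlnp | grep 5480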

IPv6-Issue-03

The next step is to identify the service that provides the Appliance Management UI and check its current status. Fortunately, I noticed a related error message while the operating system was booting up.

IPv6-Issue-04

Querying the vami-lighttp.service status shows the following results.
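
On the Photon OS-based appliance this can be done with the standard systemd tooling, for example:

root@n-vcsa-01 [ ~ ]# systemctl status vami-lighttp.service
root@n-vcsa-01 [ ~ ]# journalctl -u vami-lighttp.service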

IPv6-Issue-05

So it is a duplicate server.use-ipv6 parameter in the configuration file that causes this behaviour. To find the file, I used a combination of the rpm and egrep commands to filter the output.
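
The exact query is shown in the screenshot below; a generic way to locate the file would be something along these lines (assuming the configuration is shipped as part of one of the installed RPM packages):

root@n-vcsa-01 [ ~ ]# rpm -qal | egrep 'lighttpd.conf'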

IPv6-Issue-06

A quick search in /opt/vmware/etc/lighttpd/lighttpd.conf shows that there are two identical lines with IPv6 settings as follows:

IPv6-Issue-07
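
A simple grep confirms the duplicate entry:

root@n-vcsa-01 [ ~ ]# grep -n 'server.use-ipv6' /opt/vmware/etc/lighttpd/lighttpd.conf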

To fix the issue, I removed one of the duplicate lines, started vami-lighttp.service, and checked that the service works as expected.
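
In practice that boils down to something like this (a sketch, assuming the duplicate line has already been removed from lighttpd.conf):

root@n-vcsa-01 [ ~ ]# systemctl start vami-lighttp.service
root@n-vcsa-01 [ ~ ]# netstat -tlnp | grep 5480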

IPv6-Issue-08

To be continued…

vSphere 6.0: Available storage for /storage/log reached warning threshold – less than 30 % available space

If you have a vCenter Server Appliance with an External Platform Services Controller, you might notice a warning message in the Services Health area under the Administration -> System Configuration -> Summary tab.

The VMware Syslog Service reports a warning message as soon as /storage/log has less than 30 percent of free space, similar to what is shown in the pictures below.

syslog-service-issue-01

syslog-service-issue-02

The problem appears to be with the VMDK backing the /storage/log mount point. On the PSC, it has a default size of 5 GB and quickly fills up with the SSO log files.
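
The state of the mount point is easy to confirm from an SSH session, for example:

# df -h /storage/log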

syslog-service-issue-03

VMware suggests two possible solutions to this issue: adjusting the logging settings in the log4j.properties file, or extending the volume that backs the /storage/log mount point (VMware KB 2126276).

The second option sounds preferable, as it eliminates the need to monitor changes in the log4j.properties file after a system update. However, the commands in VMware KB 2126276 do not apply to the Platform Services Controller appliance: it does not have the vpxd_servicecfg script to automate the volume extension.

Fortunately, Florian Grehl has documented a workaround for the PSC, which requires us to extend VMDK 5 using the vSphere Web Client and then execute the following commands in an SSH session on the affected server:

1. Rescan the SCSI Bus to make Linux aware of the resized virtual disk

# rescan-scsi-bus.sh -w --forcerescan

2. Resize the Physical Volume (which in turn extends the Volume Group) on the disk device identified above

# pvresize /dev/sde

3. Resize the Logical Volume, using the volume group and logical volume names identified above

# lvresize --resizefs -l +100%FREE /dev/log_vg/log

After completing the commands and verifying the new volume size, we should restart the VMware Syslog Service to refresh its state. This can be done from the same SSH session or from the vSphere Web Client.
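
From the same Shell session it would be something along these lines (assuming the service is registered under the name vmware-syslog; service-control --list shows the exact name on your build):

# service-control --stop vmware-syslog
# service-control --start vmware-syslog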

syslog-service-issue-04

And this is how things are back to normal 🙂

vSphere 6.5 GA: VMware-VMRC.exe – Failed to install hcmon driver.

After upgrading the vCenter Server Appliance to version 6.5, I needed to install a new version of VMware Remote Console 9.0 on my Windows 10 machine.

vmware-vmrc-install

VMware-VMRC.msi was downloaded from the vCenter Server, and I initiated its installation.

vmware-vmrc-download

To my surprise, this task ended with the error message below.

vmware-vmrc-error

I immediately searched the VMware Knowledge Base for an explanation and found KB 2130850. However, the workaround provided did not help, as I did not have the vSphere Client installed on this computer.

A quick check of the installed VMware products helped me identify the package causing the problem: the VMware Remote Console Plug-in 5.1, left over from a previous version of vSphere, was preventing the installer from doing its job. Removing this old piece of software completely resolved the issue in my environment. Easy-peasy!
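
For reference, the leftover packages can also be listed and removed from the Windows command line; a rough sketch (wmic only sees MSI-based installations, and the plug-in name used here is an assumption):

wmic product where "Name like '%VMware%'" get Name, Version
wmic product where "Name like '%VMware Remote Console Plug-in%'" call uninstall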