EMC VNX POOL LUNS + VMWARE VSPHERE + VAAI = STORAGE DEATH v2

Recently, a friend of mine was testing a greenfield vSphere environment and ran into slow Storage vMotion. It took almost an hour to copy a VM with a 510 GB VMDK (thick provision eager zeroed) between two LUNs on the same physical array.
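For a sense of scale, those numbers work out to an effective copy rate of roughly 145 MB/s. The short PowerShell sketch below does the arithmetic. It assumes that "almost an hour" means 60 minutes and that 1 GB = 1024 MB. Because the copy actually finished a little under an hour, the real rate was slightly higher.

```powershell
# Effective Storage vMotion throughput for the copy described above.
# Assumptions: 60 minutes of copy time, binary units (1 GB = 1024 MB).
$vmdkSizeGB  = 510
$durationSec = 60 * 60

$throughputMB = ($vmdkSizeGB * 1024) / $durationSec
'{0:N0} MB/s' -f $throughputMB   # prints "145 MB/s"
```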

In this case, the array was an EMC VNX5200 running the following versions:

  • OE for Block – 05.33.009.5.155
  • OE for File – 8.1.9-155

Multipathing policies were left at the defaults for this array: SATP VMW_SATP_ALUA_CX and PSP VMW_PSP_RR.
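To confirm which SATP and PSP each device is actually claimed by, you can collect the output of "esxcli storage nmp device list" from all hosts at once. The sketch below assumes a PowerCLI release that provides Get-EsxCli -V2 and an existing Connect-VIServer session. The property names Device, StorageArrayType and PathSelectionPolicy mirror the esxcli output fields and are worth double-checking on your PowerCLI version.

```powershell
# Assumes an existing Connect-VIServer session to the vCenter.
foreach ($vmhost in (Get-VMHost)) {
    # Get-EsxCli -V2 exposes esxcli namespaces as objects with an Invoke() method.
    $esxcli = Get-EsxCli -VMHost $vmhost -V2

    # Same data as "esxcli storage nmp device list": one entry per device,
    # including the SATP and PSP that NMP claimed it with.
    $esxcli.storage.nmp.device.list.Invoke() |
        Select-Object @{ Name = 'Host'; Expression = { $vmhost.Name } }, Device, StorageArrayType, PathSelectionPolicy
}
```

Every VNX device should report VMW_SATP_ALUA_CX and VMW_PSP_RR if the defaults are in place.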

According to EMC and the VMware HCL, this array should offload XCOPY operations through VAAI in ESXi 6.0.
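You can check whether ESXi actually considers the XCOPY primitive available on each device with "esxcli storage core device vaai status get". Its Clone Status field covers XCOPY. The PowerCLI sketch below makes the same assumptions as any Get-EsxCli -V2 script: an existing vCenter session, and property names (Device, VAAIPluginName, CloneStatus) that mirror the esxcli fields.

```powershell
# Assumes an existing Connect-VIServer session to the vCenter.
foreach ($vmhost in (Get-VMHost)) {
    $esxcli = Get-EsxCli -VMHost $vmhost -V2

    # Same data as "esxcli storage core device vaai status get" for all devices.
    # CloneStatus reports the full copy (XCOPY) primitive.
    $esxcli.storage.core.device.vaai.status.get.Invoke() |
        Select-Object @{ Name = 'Host'; Expression = { $vmhost.Name } }, Device, VAAIPluginName, CloneStatus
}
```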

The ESXi hosts were connected through an 8Gb SAN, and all of them ran firmware and driver versions supported by VMware.

More interestingly, he noticed that the hosts were logging warning messages like this one:

Device naa. performance has deteriorated. I/O latency increased from average value of XXXXX microseconds to XXXXXX microseconds.
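Rather than browsing each host, you can pull the same warnings from the vCenter event log. This sketch assumes that the latency warnings are recorded as host events carrying the message text shown above, and it simply matches on that text. The limit of 5000 events per host is arbitrary.

```powershell
# Assumes an existing Connect-VIServer session to the vCenter.
foreach ($vmhost in (Get-VMHost)) {
    # -MaxSamples caps how many recent events are retrieved for this host.
    Get-VIEvent -Entity $vmhost -MaxSamples 5000 |
        Where-Object { $_.FullFormattedMessage -like '*performance has deteriorated*' } |
        Select-Object CreatedTime, @{ Name = 'Host'; Expression = { $vmhost.Name } }, FullFormattedMessage
}
```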

VMware KB article 2007236 lists the possible causes of this behaviour: changes made on the target, disk or media failures, overload conditions on the device, and failover. The storage system had not reported any hardware failures, so the most likely cause was a misconfiguration or a software fault.

A quick search on the Internet led me to a 2014 blog post by Neal Dolson that described a similar problem. Using the same methodology as the author, we got the same results.

esxtop showed high device command latency and I/O constantly switching between vmhba3 and vmhba4.

[Figure: Unisphere Analyzer, LUN response time with VAAI enabled]

On the storage side, the response time in Unisphere Analyzer went up from a few milliseconds to 850-900 milliseconds. In the graph above, VMFS_05 is the LUN the data was migrated from.

Neal’s article suggested contacting the vendor and upgrading the storage firmware. EMC released a fix for this particular problem in version 05.32.000.5.217 of VNX OE for Block (page 17 of the release notes). However, the fix applies only to the first generation of VNX:

Platforms: VNX5100, VNX5150, VNX5300, VNX-VSS100, VNX5500, VNX5700, VNX7500
Severity: Medium
Frequency of occurrence: Always under a specific set of circumstances
Tracking number: 61525078/624886

Slow performance was seen on a storage system when running VMware ESX operations that use the VAAI (vStorage APIs for Array Integration) data move primitive (xcopy), such as cloning virtual machines or templates, migrating virtual machines with storage vmotion, and deploying virtual machines from template.

This software has multiple enhancements to improve latency, as well as new code efficiencies to greatly improve cloning and vmotion.

KnowledgeBase ID: None
Fixed in version: 05.32.000.5.217

I checked the latest release notes of VNX Operating Environment for Block for the VNX5200 and couldn’t find a corresponding fix listed there.

As a workaround, we disabled the DataMover.HardwareAcceleratedMove option in Advanced System Settings on all hosts. Setting it to 0 turns off the XCOPY offload, so ESXi falls back to its software data mover. We used this simple PowerCLI command:

Get-VMHost | Get-AdvancedSetting -Name DataMover.HardwareAcceleratedMove | Set-AdvancedSetting -Value 0 -Confirm:$false

This change is non-disruptive and can be made online, even while a Storage vMotion is running.
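You can confirm that the change took effect on every host, or undo it later, with something like the following sketch. It assumes an existing Connect-VIServer session. The value 1 is the ESXi default, which means hardware-accelerated moves are enabled.

```powershell
# Assumes an existing Connect-VIServer session to the vCenter.
# Read the current value on every host: 0 = XCOPY offload disabled, 1 = enabled (default).
Get-VMHost | Get-AdvancedSetting -Name DataMover.HardwareAcceleratedMove |
    Select-Object Entity, Name, Value

# Once the array-side issue is resolved, the offload can be switched back on:
# Get-VMHost | Get-AdvancedSetting -Name DataMover.HardwareAcceleratedMove |
#     Set-AdvancedSetting -Value 1 -Confirm:$false
```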

The next step is to log a case with VMware and wait for a resolution.

If you had a similar problem, feel free to share your experience in the comments.

I will keep updating this post when more information is available.

23/09/2016 – Update 1: VMware GSS confirmed that the system had been configured correctly and suggested contacting the storage vendor about the matter.

16/03/2017 – Update 2: The workaround for this issue is to follow EMC’s recommendation and increase the DataMover.MaxHwTransferSize parameter to 16384 (in KB; the default is 4096) on each host connected to the LUN.
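If you want to apply the Update 2 setting only to the hosts that see the affected LUN, the following sketch shows one way to do it. It assumes an existing Connect-VIServer session. VMFS_05 is the datastore name from the graph earlier and serves only as an example here. MaxHwTransferSize sets the maximum XCOPY transfer size in KB, so it only has an effect while DataMover.HardwareAcceleratedMove is enabled (1).

```powershell
# Assumes an existing Connect-VIServer session to the vCenter.
# 'VMFS_05' is an example datastore name; replace it with the affected one.
$vmhosts = Get-VMHost -Datastore (Get-Datastore -Name 'VMFS_05')

# Raise the maximum XCOPY transfer size (KB) from the default 4096 to 16384.
$vmhosts | Get-AdvancedSetting -Name DataMover.MaxHwTransferSize |
    Set-AdvancedSetting -Value 16384 -Confirm:$false

# Read the values back to confirm the change on each host.
$vmhosts | Get-AdvancedSetting -Name DataMover.MaxHwTransferSize |
    Select-Object Entity, Value
```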

VMware Host Client and Chrome SSH Console

Among the many recent improvements in the ESXi Embedded Host Client, one feature stands out and makes life easier for IT professionals: the integration between the Host Client (Fling 11) and the Chrome SSH Console (nassh).

After you update the Host Client to version 1.8.1, a new item, ‘Get SSH for Chrome’, appears in the context menu when you right-click the Host item.

If Internet access is available, this option starts the installation of the Chrome SSH Console. The plug-in is about 6 MB in size. It integrates into Chrome Apps and can be used to connect to remote SSH sessions.

The project is still in beta and, of course, has some limitations. For example, instead of proper output from the esxtop command, I got only a continuous stream of raw data. Still, for basic troubleshooting and configuration, this functionality can save you some time.

It’s not clear whether VMware plans to bring this feature to the official VMware Host Client. However, I welcome the developers’ approach of making the user experience a bit smoother. Well done, VMware!

12/09/2016 – Update 1: To get esxtop and the DCUI working, set the terminal type to vt102 with the command “export TERM=vt102”. Alternatively, change the default TERM value in the program defaults.

12/09/2016 – Update 2: There is a chance this functionality will be included in a future official release of the VMware Host Client. I will provide an update when more information is available.

And the story begins…

Sunday evening is a perfect time to start something new. I’m not in the middle of a busy week. All the chores are done. My bike is cleaned and lubed, ready for tomorrow morning’s ride to the office (my very recent and exciting habit). I even managed to watch the Manchester derby today. Not bad!

Let’s talk about business then.

This blog is about IT and the life around it. After almost two decades of working with different aspects of IT, this is my way to contribute, share, communicate, and learn.

As you may have already noticed, English is my second language. I apologise in advance for any mistakes. However, I keep improving this skill through writing.

I think that’s enough for an introduction. More posts will come shortly.

And the story begins…