We have configured new Linux hosts to run in a VMware environment. We have one VM running on a single ESX host. The HP Linux server has 4 sockets with 10 cores/socket, for a total of 40 CPUs.
The test is to connect remotely to a database running on the new server, using -db, -H, and -S, in a loop; each connection should complete in less than 50 milliseconds. All the test does is a native Progress ABL connect with -db (dbname), -H (IP address), and -S (port number), recording the elapsed time before and after the connection.
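For anyone who wants to reproduce this, a minimal sketch of that timing loop might look like the following. The database name, IP address, and port here are placeholders, not the actual values from our environment:

```abl
/* Sketch of the connect-timing test: connect remotely in a loop
   and report the elapsed milliseconds per connect.
   "mydb", 10.0.0.1, and 20000 are hypothetical placeholders. */
DEFINE VARIABLE iStart AS INTEGER NO-UNDO.
DEFINE VARIABLE i      AS INTEGER NO-UNDO.

DO i = 1 TO 100:
    iStart = ETIME.
    CONNECT VALUE("-db mydb -H 10.0.0.1 -S 20000") NO-ERROR.
    MESSAGE "Connect" i "took" ETIME - iStart "ms".
    IF CONNECTED("mydb") THEN
        DISCONNECT VALUE("mydb").
END.
```

With the problem configuration (4 sockets, vmxnet3), individual iterations would stall for anywhere from 8 seconds to over 2 minutes instead of the expected 25-35 ms.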
When we configure the host to run with 1 socket and 8 CPUs, for a total of 8 cores, the database behaves normally and most connections complete within 25-35 milliseconds.
When we change the configuration to 4 sockets with 8 CPUs each, for a total of 32, the exact same test causes the connection to pause for varying lengths of time, from 8 seconds to sometimes over 2 minutes.
Working with VMware and Progress on this issue, it was suggested that we try a different network driver. We had been using vmxnet3 and switched to e1000. The e1000 driver worked and eliminated the pause completely.
We were then contacted by Progress saying that we should be using vmxnet3, and VMware is stating that the e1000 driver is old and that we should be using vmxnet3 as well.
So my question: has anyone run into this issue? Is anyone running VMware with the vmxnet3 network driver over multiple sockets? Has anyone tested remote database connections to determine whether there are any pauses in connecting?
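One way to narrow down whether the pause lives in the network stack or in the Progress broker handshake would be to time a raw socket connect to the broker port, without a database login. This is a hypothetical diagnostic sketch, again with placeholder host and port values:

```abl
/* Time a bare TCP connect to the broker port. If this also stalls
   under vmxnet3, the delay is below Progress, in the network layer.
   10.0.0.1 and 20000 are hypothetical placeholders. */
DEFINE VARIABLE hSocket AS HANDLE  NO-UNDO.
DEFINE VARIABLE iStart  AS INTEGER NO-UNDO.
DEFINE VARIABLE lOk     AS LOGICAL NO-UNDO.

CREATE SOCKET hSocket.
iStart = ETIME.
lOk = hSocket:CONNECT("-H 10.0.0.1 -S 20000") NO-ERROR.
MESSAGE "TCP connect ok:" lOk "took" ETIME - iStart "ms".
IF lOk THEN hSocket:DISCONNECT().
DELETE OBJECT hSocket.
```

Comparing this raw connect time against the full ABL CONNECT time, under both drivers, would show which layer is responsible for the pause.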
We discussed 6.5 vs. 6.7 and decided to go with 6.5. We rarely go bleeding edge...
So we installed 6.5 and ran the test with vmxnet3, and it worked flawlessly... The issue is gone.
I forgot to ask this, but how does/did VMware explain the difference between vmxnet3 and e1000, and what did they say about the cause of vmxnet3 being 'slow on connect'?
hahahahahaha!!! Classic! The same application does exactly the same thing. The only diff is a VMware change. Of course they said it was an application problem.
> They said it was an application problem.
And that's IT ? And that's been accepted ??
Their mantra is that they have no control over, or exposure to, what happens inside a VM, and that we need to prove to them that we are not the problem.
I think the fact that switching the adapters makes a difference has proved it already, but that does not help Rob here. (I mean, if we were to start acting like them, them being VMware, pointing fingers instead of actually trying to figure something out.) I suppose the proof would be to install Linux on the same bare metal where the ESX host is and redo the same test. I kicked off the connect/disconnect code 2 hours ago and am leaving it overnight, but from my previous (albeit quick) tests using ESX 6.0 and 6.7, I saw no hiccups.
Thanks for the update, Rob. I have run 12 hours of your code against 6.7 - no dice seeing the problem. The same test will run against 6.0 overnight later today; it is only CPU time after all :)
I would be very interested to know what VMware does/would say about 6.5 correcting the issue :)
<BIG SMILE> !!
Great news Rob. Not just for you, but for all of us out here who support customers with these configurations.
The next thing to do is download ProTop and run some of the built-in benchmark tests to compare your new server to your old server. Better yet, install ProTop on the old server so that we can gather some historical metrics. That way when you go live, we can quickly and easily compare before and after.
Ping me offline if you're interested.
Sigh, Paul. And such a nice technical thread this was... :p :-D
Hey hey hey! The benchmark tests are free. And short-term historical metrics are free, too. You only have to pay if you want access to alerting and long-term historical metrics.
I am asking them that question... I will post the reply.
Hi Rob, are we now agreed that this wasn't a Progress issue?