We have configured new Linux hosts to run in a VMware environment, with one VM running on a single ESX host. The HP Linux server has 4 sockets with 10 cores per socket, for a total of 40 CPUs.
The test is to connect remotely to a database running on the new server (using -db, -H, -S) in a loop; each connection should complete in less than 50 milliseconds. All that is being done is a native Progress ABL connect with -db (dbname), -H (IP address), and -S (port number), recording the elapsed time before and after the connection.
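Conceptually the harness is just a timing loop like the sketch below. The `true` placeholder stands in for the actual ABL batch client invocation, which is not shown here, so nothing in this block is the real client command:

```shell
# A sketch of the timing loop. 'true' is a placeholder for the ABL batch
# client that does the remote CONNECT/DISCONNECT.
measure() {
    for i in 1 2 3 4 5; do
        start=$(date +%s%3N)    # milliseconds since epoch (GNU date)
        true                    # placeholder for the remote CONNECT
        end=$(date +%s%3N)
        echo "attempt $i: $((end - start)) ms"
    done
}
measure
```

With a real client in place of `true`, any attempt printing tens of thousands of milliseconds is one of the pauses described below.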
When we configure the VM with 1 socket and 8 CPUs, for a total of 8 cores, the database behaves normally and most connections complete in 25-35 milliseconds.
When we change the configuration to 4 sockets and 8 CPUs per socket, for a total of 32, the same exact test causes the connection to pause for varying times, from 8 seconds to sometimes over 2 minutes.
Working with VMware and Progress on this issue, it was determined that we should try a different network driver. We had been using vmxnet3 and switched to e1000. The e1000 driver worked and eliminated the pause completely.
We were then contacted by Progress saying we should be using vmxnet3, and VMware likewise states that the e1000 driver is old and that we should be on vmxnet3 as well.
So my question: has anyone run into this issue? Is anyone running VMware with the vmxnet3 network driver across multiple sockets? Has anyone tested remote database connections to see whether there are any pauses while connecting?
We discussed 6.5 vs. 6.7 and decided to go with 6.5; we rarely go bleeding edge...
So we installed 6.5 and ran the test with vmxnet3, and it worked flawlessly... The issue is gone.
This sounds suspiciously like a KB entry that was distributed via the PANS late last week.
I agree with VMware/PSC that you should not be using e1000 over vmxnet3. I suspect that if you do further benchmarking you'll see that e1000 is significantly slower than vmxnet3; it certainly has been in our testing.
Your comments regarding 1 socket vs. multiple sockets suggest that this is a NUMA issue within the hypervisor. Do simple ping tests show similar results? I.e., if you run ping for an hour or two on the 32-core VM, do you see any variation in the ping times?
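For example, something along these lines (a short run shown; on the 32-core VM, substitute its own IP for 127.0.0.1 and raise the count to run for an hour or two):

```shell
# Short ping-jitter probe; 127.0.0.1 is a stand-in so the sketch is
# self-contained. Watch for outliers in max vs. avg on a long run.
avg=$(ping -c 5 -i 0.2 127.0.0.1 | awk -F'/' '/min\/avg\/max/ {print $5}')
echo "avg rtt: ${avg} ms"
```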
Does the 4-socket/32-core configuration represent the entire physical capacity of the physical server?
Did you configure the hypervisor with vCPU = # of actual cores, or # of hyperthreads? I often see hypervisor configurations where the number of vCPUs = # of hyperthreads = 2 × # of cores.
Perhaps the better question is how many physical CPUs and cores, how many NUMA nodes, and then how many vCPUs configured in ESXi?
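To answer those questions from inside the guest, standard util-linux `lscpu` shows sockets, cores, threads, and NUMA nodes as the VM sees them:

```shell
# Summarize the guest's view of its topology: total CPUs, threads per core,
# cores per socket, sockets, and NUMA node layout.
lscpu | grep -E '^(CPU\(s\)|Thread\(s\) per core|Core\(s\) per socket|Socket\(s\)|NUMA node)'
```

If "Thread(s) per core" is 2, the vCPU count is being mapped to hyperthreads rather than physical cores.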
What I/O scheduler is being used? Example:
$ cat /sys/block/sda/queue/scheduler
noop [deadline] cfq
What is the output of numactl -H on the 32 core VM?
numactl also allows you to bind a process and its memory to a single core or set of cores. I would be curious to see whether the results differ if you tie the broker, its memory, and the single _mprosrv -m1 process to a single processor.
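Something like the sketch below. The numactl flags are real; the database name and port are illustrative, and the command is echoed rather than executed (drop the echo to actually launch the broker):

```shell
# Sketch: pin the broker and its memory to NUMA node 0 so the broker, its
# shared memory, and its servers all stay on one node.
NODE=0
cmd="numactl --cpunodebind=$NODE --membind=$NODE proserve -db test -S 25000"
echo "$cmd"
```

For binding to a single core rather than a node, `--physcpubind` takes a CPU list instead of a node number.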
How many _mprosrv -m1 processes are running and being used by your tests? I would think just one, but want to be certain.
Are you checking whether the broker and the (presumably single) _mprosrv -m1 process are bouncing around cores? The example below shows that my _mprosrv processes are bound to either CPU 0 or 1 (the PSR column).
$ for i in $(pgrep _mprosrv);do ps -mo pid,tid,fname,user,psr -p $i;done
PID TID COMMAND USER PSR
10244 - _mprosrv root -
- 10244 - root 1
10274 - _mprosrv root -
- 10274 - root 0
10278 - _mprosrv root -
- 10278 - root 1
10282 - _mprosrv root -
- 10282 - root 1
10285 - _mprosrv root -
- 10285 - root 0
10313 - _mprosrv root -
- 10313 - root 1
10336 - _mprosrv root -
- 10336 - root 0
10366 - _mprosrv root -
- 10366 - root 1
10369 - _mprosrv root -
- 10369 - root 1
10397 - _mprosrv root -
- 10397 - root 1
10424 - _mprosrv root -
- 10424 - root 0
10937 - _mprosrv root -
- 10937 - root 1
10994 - _mprosrv root -
- 10994 - root 1
16326 - _mprosrv root -
- 16326 - root 1
34287 - _mprosrv root -
- 34287 - root 1
You did not mention your vSphere/ESX version or the version of VM Tools; there have been recent vmxnet3 issues reported (in general). Pardon me for being blunt, but why would you need 32 CPUs on a single box? Aside from this setting 'engaging' NUMA for the VM, you might be running into kernel scheduling contention on the host. You might want to try assigning 4 NICs to this VM, binding each NIC to one NUMA node, and bonding them.
[noop] deadline cfq
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 0 size: 491519 MB
node 0 free: 397671 MB
The physical server has 4 sockets with 10 cores each and HT enabled. The VM, assigned 4 sockets with 8 cores each, shows 1 NUMA node.
We will be running 60 OE databases on this server, which is why we need 32 CPUs. That is not uncommon for us. We have 4 servers set up in this configuration to run 200 databases with about 5,000 user connections.
proserve -db test -S 25000 -minport 20000 -maxport 29999 -B 25000 -Mn 25 -Mpb 20 -Ma 20 -Mi 1
This server is very idle right now as this is the only functioning process on this server.
When I run the test, it always goes to the same PID, as I can see in the db.lg. This is a copy of the sports2000 database, so it's very isolated, but I can have up to 20 server processes.
When I just ran ps -mo pid,fname,user,psr -p (pid) against the server process, I can see the process assigned to different processors: 8, 9, 11, 12, 14, etc., looping every 3 seconds.
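A quick way to keep watching that (a sketch; `$$`, this shell's own PID, stands in for the _mprosrv PID pulled from the database .lg):

```shell
# Poll which logical CPU (the PSR column) a process is currently on.
# Substitute the real _mprosrv server PID for $$.
PID=$$
for i in 1 2 3; do
    ps -o pid=,psr= -p "$PID"
    sleep 1
done
```

If the PSR value keeps changing across samples, the scheduler is migrating the server between cores (and potentially between NUMA nodes).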
Also, I want to add: when I run an strace command against the process that my test connects to, the pause does not happen, and when I terminate the strace, the pause starts again. Very odd.
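For reference, the attach was along these lines (the PID is illustrative; the command is echoed rather than executed here since attaching requires ptrace rights on the real process):

```shell
# Sketch of attaching strace to the serving _mprosrv: -f follows forks,
# -tt adds microsecond timestamps, -o writes the trace to a file.
PID=12345
CMD="strace -f -tt -p $PID -o /tmp/mprosrv.trace"
echo "$CMD"
```

The fact that tracing masks the pause suggests a timing/wakeup interaction rather than a pure network fault: strace's ptrace stops change how and when the process gets scheduled.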
Can you please answer Libor's questions: ESX version and VMTools version? He mentioned some known vmxnet3 issues.
On top of that: what is the hardware brand in question? If that's confidential, then what has the BIOS NUMA node interleaving setting been set to?
The Linux view of a single NUMA node configuration is suspicious. Clearly you have more than one NUMA node.
Check out this article:
HP DL560 Gen 10 chip...
Libor & Paul,
Please note that during testing Rob did reduce the memory down to 128 GB and lower, such that numactl showed a single node. The performance problem was still present.
I found this via Google & wonder if it applies ... kb.vmware.com/.../2129176
I did a lot of testing on our VCloud Director environment using the following config:
- CentOS 7.5, memory varied between 64 and 480 GB, virtual sockets varied between 1 and 4, virtual cpus varied between 8 and 32, OpenEdge 11.5.1.
- Windows 10 64 bit client, 1 virtual cpu, 4 GB memory, OpenEdge 11.5.1 / 11.6 / 11.7.
- CentOS 7.5, 1 virtual cpu, 4 GB memory, OpenEdge 11.5.1.
I could see the problem very, very slightly when using Windows 10 as the client and the large CentOS VM as the server, but it was random and the delays were much smaller than what Rob sees.
Using CentOS 7.5 for both client and server did not show the problem.
Let me know if you want to know anything else we did during the testing.
One of the first things to try is ESXi 6.5 or later. I know, I know, easy for me to say over here in my office. But you are testing a vmxnet3 issue on a three-year-old version of ESXi, and Libor mentioned that there are known vmxnet3 issues that have been corrected since then.
We tried disabling the LSO/RSC and powered the VM off and on...
The issue was still present. Still seeing lengthy pauses:
Connect Time (ms)   Disconnect Time (ms)
           16                  0
           15                  0
           17                  0
       19,802                  0
       47,292                  0
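For reference, on the guest side the rough Linux equivalent of disabling LSO/RSC is turning off the segmentation and receive offloads via ethtool. The interface name ens192 is an assumption, and the command is echoed rather than executed since it needs root and the right NIC name (in our case the change was made on the vSphere side):

```shell
# Sketch: disable TCP segmentation offload (LSO) and large/generic receive
# offload (the Linux-side analogue of RSC) on a vmxnet3 interface.
IFACE=ens192
CMD="ethtool -K $IFACE tso off lro off gro off"
echo "$CMD"
```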
We are going to try the ESXi 6.5 version and we will test again...
> Please note that during testing Rob did reduce the memory down to 128 GB and lower such that numactl showed a
> single node
IMO, 32 vCPUs won't fit into a single NUMA node; they will span across 4 (or 3 if HyperThreading is on). It's possible that the node interleaving feature is enabled (and it should not be), which would explain the single-node numactl output, but that information is not available in this thread.
I am assuming that the client/server test is done from Windows and that Windows is also a VM, though that is not explicitly stated here. I would try the same client/server stress test (in Rob's environment) on Linux to further clarify where the hiccups reside, as well as getting the latest VM Tools (10.3.2) off the vmware.com site - my.vmware.com/.../details
> We are going to try the ESXi 6.5 version and we will test again.
Why not 6.7 ?