The premise of -spin is to let a process retry when a latch is not available, then nap (-nap, up to -napmax) before trying again.
For example: -spin 5000, -nap 1, -napmax 32.
Assume a latch is held for 100 nanoseconds on a 2.4 GHz CPU:
- -spin 5000 takes about 2083 ns of spin time (~21 times longer than the latch is held by another process)
- failing to get that latch, -nap 1 is a 1 ms nap (10,000 times longer than the latch is held)
- during this 1 ms nap the CPU is free to run about 480 full spin cycles of 5000, or roughly 2,400,000 operations, so the process is very likely to acquire the latch on the next cycle
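The arithmetic above can be reproduced with a short script. This is only a back-of-the-envelope sketch: it assumes one spin retry costs about one CPU clock cycle, and the variable names are illustrative, not database parameters.

```python
# Back-of-the-envelope timing for the -spin 5000 / -nap 1 example.
# Assumption: one spin retry costs roughly one CPU clock cycle.

CPU_HZ = 2_400_000_000   # 2.4 GHz CPU
LATCH_HOLD_NS = 100      # assumed time another process holds the latch
SPIN = 5000              # -spin value
NAP_MS = 1               # -nap value in milliseconds

# Time spent spinning before giving up, in nanoseconds.
spin_ns = SPIN * 1_000_000_000 / CPU_HZ

# How much longer the spin lasts than the latch is held.
spin_vs_hold = spin_ns / LATCH_HOLD_NS

# Length of the nap in nanoseconds, and relative to the latch hold time.
nap_ns = NAP_MS * 1_000_000
nap_vs_hold = nap_ns // LATCH_HOLD_NS

# CPU cycles that pass during one nap, and how many full spins fit in it.
cycles_in_nap = NAP_MS * CPU_HZ // 1000
spins_in_nap = cycles_in_nap // SPIN

print(f"spin time:        {spin_ns:.0f} ns ({spin_vs_hold:.0f}x latch hold)")
print(f"nap time:         {nap_ns} ns ({nap_vs_hold}x latch hold)")
print(f"cycles per nap:   {cycles_in_nap}")
print(f"full spins per nap: {spins_in_nap}")
```

Changing CPU_HZ, SPIN or NAP_MS shows how quickly the nap comes to dominate the spin time.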
However, each process is a separate operating system process, and it is penalized by slow shared-memory synchronization (cache coherency).
- Every database has a shared memory cache, and access to it must be coordinated with mutex locks (latches).
- Coordinating these locks requires the CPU caches to be synchronized.
- When a higher value for -spin is used, the process starts jumping around cores.
- Connecting client/server reduces the number of processes attached directly to shared memory, which reduces cache coherency overhead.
With more than 16 CPUs, NUMA comes into play: memory is spread across multiple cores, so memory and cache are either local or remote.
- NUMA quotient: the time it takes a CPU to read memory on a remote node compared to reading memory locally
- When the LPAR spans NUMA zones, the effect is worse
- This can be improved by binding a process and its memory to a smaller set of cores.
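The NUMA quotient described above is a simple ratio. The sketch below uses made-up latency figures purely for illustration; real values have to be measured on the machine in question, and numa_quotient is a hypothetical helper name.

```python
# Sketch: NUMA quotient = remote memory read time / local memory read time.
# The latencies below are hypothetical placeholders, not measurements.

def numa_quotient(remote_ns: float, local_ns: float) -> float:
    """Return how many times slower a remote read is than a local read."""
    if local_ns <= 0:
        raise ValueError("local latency must be positive")
    return remote_ns / local_ns

LOCAL_NS = 100.0    # hypothetical local read latency
REMOTE_NS = 150.0   # hypothetical remote read latency

quotient = numa_quotient(REMOTE_NS, LOCAL_NS)
print(f"NUMA quotient: {quotient:.2f}")
```

A quotient close to 1 means remote access costs little extra; the larger it gets, the more binding a process and its memory to local cores pays off.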
Review with system specialists:
- The number of physical CPUs and cores
- How many NUMA nodes there are
- How many vCPUs are configured (on Linux: numactl -H)
- On AIX: "lparstat -i" and "lssrad -va"