Understanding what ProTop is telling me

Posted by James Palmer on 16-Jun-2015 10:00

We've finally been able to get ProTop installed on our production server. Running 11.5 on Win 2012 R2 (without the Linux patch). 

Things are running so much more efficiently than they did on our old box, but most of our settings were done for that box so I want to revisit them over the next weeks and months to tweak and change to get the most out of what we've got. 

So first of all - the Configuration Viewer: http://gyazo.com/75db8010dbff285388ea7ed8a9fde9d6

And then a couple of snapshots of the main dashboard during a reasonably busy period:

http://gyazo.com/981e86be5d60abf5ea01c7b3aeeac888

http://gyazo.com/1d56bab479c08584eac627dc50ac1886

http://gyazo.com/7d70100d304cd096bfbc7bc63d519ff3

http://gyazo.com/2ab7ed2942c889dd4f9c5755bb019b69

These were captured at ProTop's default refresh rate.

Our most recent startup output from the DB logs:

[2015/06/13@08:18:37.713+0100] P-4952       T-4956  I BROKER  0: (333)   Multi-user session begin. 
[2015/06/13@08:18:37.715+0100] P-4952       T-4956  I BROKER  0: (10545) Connections to this database will not be allowed until all Database Services started have completed their startup and initialisation. 
[2015/06/13@08:18:37.762+0100] P-4952       T-4956  I BROKER  0: (15321) Before Image Log Initialisation at block 0  offset 1335. 
[2015/06/13@08:18:38.251+0100] P-4952       T-4956  I BROKER  0: (452)   Login by inencoadmin on batch. 
[2015/06/13@08:18:38.298+0100] P-4728       T-4700  I RPLS  162: (-----) Login by inencoadmin.
[2015/06/13@08:18:38.308+0100] P-4728       T-4700  I RPLS  162: (7129)  Usr 162 set name to inencoadmin. 
[2015/06/13@08:18:38.308+0100] P-4728       T-4700  I RPLS  162: (10819) The Fathom Replication property file is being processed. 
[2015/06/13@08:18:38.308+0100] P-4728       T-4700  I RPLS  162: (10326) Server Properties 
[2015/06/13@08:18:38.308+0100] P-4728       T-4700  I RPLS  162: (10327) Database Name (database): primary 
[2015/06/13@08:18:38.308+0100] P-4728       T-4700  I RPLS  162: (10330) Transition Method (transition): manual. 
[2015/06/13@08:18:38.308+0100] P-4728       T-4700  I RPLS  162: (10331) Transition Timeout (transition-timeout): 600. 
[2015/06/13@08:18:38.308+0100] P-4728       T-4700  I RPLS  162: (11715) Minimum Polling Delay (minimum-polling-delay): 5. 
[2015/06/13@08:18:38.308+0100] P-4728       T-4700  I RPLS  162: (11716) Maximum Polling Delay (maximum-polling-delay): 500. 
[2015/06/13@08:18:38.308+0100] P-4728       T-4700  I RPLS  162: (11718) Defer Agent Startup (defer-agent-startup): Not Active. 
[2015/06/13@08:18:38.308+0100] P-4728       T-4700  I RPLS  162: (12231) Replication Keep Alive (repl-keep-alive): 120. 
[2015/06/13@08:18:38.308+0100] P-4728       T-4700  I RPLS  162: (10326) Control Agent Properties 
[2015/06/13@08:18:38.310+0100] P-4728       T-4700  I RPLS  162: (10332) Control Agent (control-agent) : agent1. 
[2015/06/13@08:18:38.310+0100] P-4728       T-4700  I RPLS  162: (10333) Host Name (host): hobt2. 
[2015/06/13@08:18:38.310+0100] P-4728       T-4700  I RPLS  162: (10334) Port (port): 8000. 
[2015/06/13@08:18:38.310+0100] P-4728       T-4700  I RPLS  162: (14249) TCP/IP Version (ipver): ipv4. 
[2015/06/13@08:18:38.310+0100] P-4728       T-4700  I RPLS  162: (10335) Critical (critical): 0. 
[2015/06/13@08:18:38.310+0100] P-4728       T-4700  I RPLS  162: (10336) Replication Method (replication-method): async. 
[2015/06/13@08:18:38.310+0100] P-4728       T-4700  I RPLS  162: (10715) Connect Timeout (connect-timeout): 120. 
[2015/06/13@08:18:38.310+0100] P-4728       T-4700  I RPLS  162: (10337) Maximum Message Length (maximum-message): 8512 
[2015/06/13@08:18:38.310+0100] P-4728       T-4700  I RPLS  162: (12233) Schema Lock Action (schema-lock-action) : wait. 
[2015/06/13@08:18:38.310+0100] P-4728       T-4700  I RPLS  162: (12685) Agent Shutdown Action (agent-shutdown-action) : recovery. 
[2015/06/13@08:18:38.310+0100] P-4728       T-4700  I RPLS  162: (10326) Transition Properties 
[2015/06/13@08:18:38.310+0100] P-4728       T-4700  I RPLS  162: (13261) Database-role (database-role) : normal. 
[2015/06/13@08:18:38.310+0100] P-4728       T-4700  I RPLS  162: (13262) Responsibility (responsibility) : . 
[2015/06/13@08:18:38.310+0100] P-4728       T-4700  I RPLS  162: (13263) Restart database after transition (restart-after-transition) : No. 
[2015/06/13@08:18:38.310+0100] P-4728       T-4700  I RPLS  162: (16734) Start secondary broker after transition (start-secondary-broker) : No. 
[2015/06/13@08:18:38.310+0100] P-4728       T-4700  I RPLS  162: (13264) Source startup arguments (source-startup-arguments) : None. 
[2015/06/13@08:18:38.310+0100] P-4728       T-4700  I RPLS  162: (13264) Source Secondary Broker startup arguments (source-secondary-broker-startup-arguments) : None. 
[2015/06/13@08:18:38.310+0100] P-4728       T-4700  I RPLS  162: (13264) Target startup arguments (target-startup-arguments) : None. 
[2015/06/13@08:18:38.310+0100] P-4728       T-4700  I RPLS  162: (13264) Target Secondary Broker startup arguments (target-secondary-broker-startup-arguments) : None. 
[2015/06/13@08:18:38.310+0100] P-4728       T-4700  I RPLS  162: (13264) Normal startup arguments (normal-startup-arguments) : None. 
[2015/06/13@08:18:38.310+0100] P-4728       T-4700  I RPLS  162: (13264) Normal Secondary Broker startup arguments (normal-secondary-broker-startup-arguments) : None. 
[2015/06/13@08:18:38.310+0100] P-4728       T-4700  I RPLS  162: (13265) Automatically begin after-imaging during transition (auto-begin-ai) : Yes. 
[2015/06/13@08:18:38.310+0100] P-4728       T-4700  I RPLS  162: (13266) Automatically add after-image areas during transition (auto-add-ai-areas) : Yes. 
[2015/06/13@08:18:38.310+0100] P-4728       T-4700  I RPLS  162: (13267) Structure file that contains the after-image area definitions to automatically add (ai-structure-file) : f:\database\live\icmaslivai.st. 
[2015/06/13@08:18:38.311+0100] P-4728       T-4700  I RPLS  162: (13268) Database backup method to use during transition (backup-method) : mark. 
[2015/06/13@08:18:38.311+0100] P-4728       T-4700  I RPLS  162: (13269) Backup arguments (backup-arguments) : None. 
[2015/06/13@08:18:38.311+0100] P-4728       T-4700  I RPLS  162: (13269) Incremental Backup arguments (incremental-backup-arguments) : None. 
[2015/06/13@08:18:38.311+0100] P-4728       T-4700  I RPLS  162: (13269) Recovery Backup arguments (recovery-backup-arguments) : None. 
[2015/06/13@08:18:38.311+0100] P-4728       T-4700  I RPLS  162: (10500) The Fathom Replication Server successfully started as PID 4728. 
[2015/06/13@08:18:38.313+0100] P-4728       T-4700  I RPLS  162: (10842) Connecting to Fathom Replication Agent agent1. 
[2015/06/13@08:18:38.313+0100] P-4952       T-4956  I BROKER  0: (-----) The OpenEdge Replication Server is starting...
[2015/06/13@08:18:38.320+0100] P-4952       T-4956  I BROKER  0: (5644)  Started for hob-icmasliv using TCP IPV4 address 0.0.0.0, pid 4952. 
[2015/06/13@08:18:38.320+0100] P-4952       T-4956  I BROKER  0: (8836)  Connecting to Admin Server on port 7844. 
[2015/06/13@08:18:38.326+0100] P-4952       T-4956  I BROKER  0: (14262) Successfully connected to AdminServer on port 7844 using TCP/IP IPV4 address 192.168.125.1. 
[2015/06/13@08:18:41.950+0100] P-4728       T-4700  I RPLS  162: (10507) The Fathom Replication Server has successfully connected to the Fathom Replication Agent agent1 on host 192.168.125.45. 
[2015/06/13@08:18:41.950+0100] P-4728       T-4700  I RPLS  162: (11251) The Replication Server successfully connected to all of its configured Agents. 
[2015/06/13@08:18:41.951+0100] P-4728       T-4700  I RPLS  162: (10508) Beginning Fathom Replication synchronization for the Fathom Replication Agent agent1. 
[2015/06/13@08:18:41.954+0100] P-4728       T-4700  I RPLS  162: (11805) Unlocking after-image file 13 and locking ALL FULL after-image files beginning with file 14. 
[2015/06/13@08:18:41.954+0100] P-4728       T-4700  I RPLS  162: (10436) The source database icmasliv and the target database f:\database\live\icmasliv on host hobt2 are synchronized. 
[2015/06/13@08:18:51.065+0100] P-4952       T-4956  I BROKER  0: (9149)  Time out limit (10 seconds) exceeded waiting for ACK. 
[2015/06/13@08:18:51.066+0100] P-4952       T-4956  I BROKER  0: (8845)  Error registering with Admin Server. 
[2015/06/13@08:18:51.067+0100] P-4952       T-4956  I BROKER  0: (4234)  Progress OpenEdge Release 11.5 build 1114 on WINNT . 
[2015/06/13@08:18:51.067+0100] P-4952       T-4956  I BROKER  0: (4281)  Server started by inencoadmin on batch. 
[2015/06/13@08:18:51.068+0100] P-4952       T-4956  I BROKER  0: (6574)  Started using pid: 4952. 
[2015/06/13@08:18:51.069+0100] P-4952       T-4956  I BROKER  0: (9426)  Large database file access has been enabled. 
[2015/06/13@08:18:51.069+0100] P-4952       T-4956  I BROKER  0: (13871) After-image Management Archival Method: Timed. 
[2015/06/13@08:18:51.071+0100] P-4952       T-4956  I BROKER  0: (13875) This database is enabled for OpenEdge Replication as a Source database. 
[2015/06/13@08:18:51.071+0100] P-4952       T-4956  I BROKER  0: (15219) Encryption enabled: 0 
[2015/06/13@08:18:51.072+0100] P-4952       T-4956  I BROKER  0: (15824) Multi-tenancy enabled: 0 
[2015/06/13@08:18:51.072+0100] P-4952       T-4956  I BROKER  0: (15824) Table Partitioning enabled: 0 
[2015/06/13@08:18:51.073+0100] P-4952       T-4956  I BROKER  0: (-----) LRU mechanism enabled.
[2015/06/13@08:18:51.073+0100] P-4952       T-4956  I BROKER  0: (4282)  Parameter File: Not Enabled. 
[2015/06/13@08:18:51.073+0100] P-4952       T-4956  I BROKER  0: (9336)  Created shared memory with segment_id: 1 
[2015/06/13@08:18:51.074+0100] P-4952       T-4956  I BROKER  0: (4250)  Before-Image Cluster Size: 33554432. 
[2015/06/13@08:18:51.074+0100] P-4952       T-4956  I BROKER  0: (4251)  Before-Image Block Size: 8192. 
[2015/06/13@08:18:51.075+0100] P-4952       T-4956  I BROKER  0: (13873) After-image Management Archival Directory List (-aiarcdir): j:\live 
[2015/06/13@08:18:51.075+0100] P-4952       T-4956  I BROKER  0: (13874) Create After-image Management Archival Directory(s) (-aiarcdircreate): Not Enabled 
[2015/06/13@08:18:51.076+0100] P-4952       T-4956  I BROKER  0: (13872) After-image Management Archival Interval (-aiarcinterval): 3600 
[2015/06/13@08:18:51.076+0100] P-4952       T-4956  I BROKER  0: (4256)  Number of After-Image Buffers (-aibufs): 200 
[2015/06/13@08:18:51.077+0100] P-4952       T-4956  I BROKER  0: (4254)  After-Image Stall (-aistall): Enabled 
[2015/06/13@08:18:51.077+0100] P-4952       T-4956  I BROKER  0: (17555) Starting index number for statistics range (-baseindex): 1 
[2015/06/13@08:18:51.078+0100] P-4952       T-4956  I BROKER  0: (17554) Starting table number for statistics range (-basetable): 1 
[2015/06/13@08:18:51.078+0100] P-4952       T-4956  I BROKER  0: (4252)  Number of Before-Image Buffers (-bibufs): 200 
[2015/06/13@08:18:51.079+0100] P-4952       T-4956  I BROKER  0: (6552)  BI File Threshold Stall (-bistall): Disabled. 
[2015/06/13@08:18:51.079+0100] P-4952       T-4956  I BROKER  0: (9238)  BI File Threshold size (-bithold): 0.0   Bytes 
[2015/06/13@08:18:51.080+0100] P-4952       T-4956  I BROKER  0: (6573)  Database Blocksize (-blocksize): 8192 
[2015/06/13@08:18:51.080+0100] P-4952       T-4956  I BROKER  0: (12812) BIW writer delay (-bwdelay): 0 
[2015/06/13@08:18:51.081+0100] P-4952       T-4956  I BROKER  0: (12813) Allowed index cursors (-c): 4004. 
[2015/06/13@08:18:51.081+0100] P-4952       T-4956  I BROKER  0: (12265) SSL Certificate Store Path (-certstorepath): Not Enabled 
[2015/06/13@08:18:51.082+0100] P-4952       T-4956  I BROKER  0: (4264)  Character Set (-cpinternal): ISO8859-1 
[2015/06/13@08:18:51.082+0100] P-4952       T-4956  I BROKER  0: (4235)  Physical Database Name (-db): f:\database\live\icmasliv 
[2015/06/13@08:18:51.082+0100] P-4952       T-4956  I BROKER  0: (4238)  Direct I/O (-directio): Not Enabled 
[2015/06/13@08:18:51.083+0100] P-4952       T-4956  I BROKER  0: (4236)  Database Type (-dt): PROGRESS 
[2015/06/13@08:18:51.083+0100] P-4952       T-4956  I BROKER  0: (15218) Encryption cache size (-ecsize): 1000 
[2015/06/13@08:18:51.084+0100] P-4952       T-4956  I BROKER  0: (12814) Group delay (-groupdelay): 10 
[2015/06/13@08:18:51.084+0100] P-4952       T-4956  I BROKER  0: (4242)  Hash Table Entries (-hash): 1037347 
[2015/06/13@08:18:51.085+0100] P-4952       T-4956  I BROKER  0: (4244)  Crash Recovery (-i): Enabled 
[2015/06/13@08:18:51.085+0100] P-4952       T-4956  I BROKER  0: (17557) Number of indexes included in statistics collection (-indexrangesize): 2020 
[2015/06/13@08:18:51.086+0100] P-4952       T-4956  I BROKER  0: (14268) TCP/IP Version (-ipver): IPV4 
[2015/06/13@08:18:51.086+0100] P-4952       T-4956  I BROKER  0: (12263) SSL Key Alias Name (-keyalias): Not Enabled 
[2015/06/13@08:18:51.088+0100] P-4952       T-4956  I BROKER  0: (12815) Lock table hash table size (-lkhash): 6661 
[2015/06/13@08:18:51.088+0100] P-4952       T-4956  I BROKER  0: (17805) Original Lock Release Algorithm (-lkrela): Not Enabled 
[2015/06/13@08:18:51.089+0100] P-4952       T-4956  I BROKER  0: (17560) Number of LRU force skips (-lruskips): 50 
[2015/06/13@08:18:51.089+0100] P-4952       T-4956  I BROKER  0: (17561) Number of LRU2 force skips (-lru2skips): 0 
[2015/06/13@08:18:51.090+0100] P-4952       T-4956  I BROKER  0: (13953) Maximum Area Number (-maxArea): 32000 
[2015/06/13@08:18:51.090+0100] P-4952       T-4956  I BROKER  0: (12540) Size of JTA transaction table (-maxxids):  100 
[2015/06/13@08:18:51.091+0100] P-4952       T-4956  I BROKER  0: (5649)  Maximum Port for Auto Servers (-maxport): 5000 
[2015/06/13@08:18:51.091+0100] P-4952       T-4956  I BROKER  0: (5648)  Minimum Port for Auto Servers (-minport): 3000 
[2015/06/13@08:18:51.091+0100] P-4952       T-4956  I BROKER  0: (17564) Multi-tenancy partition cache size (-mtpmsize): 1024 
[2015/06/13@08:18:51.092+0100] P-4952       T-4956  I BROKER  0: (12821) Use muxlatches (-mux): 1 
[2015/06/13@08:18:51.092+0100] P-4952       T-4956  I BROKER  0: (4260)  Maximum Number of Users (-n): 1001 
[2015/06/13@08:18:51.093+0100] P-4952       T-4956  I BROKER  0: (17566) Minimum time to nap at first -spin exhaustion (-nap): 10 
[2015/06/13@08:18:51.093+0100] P-4952       T-4956  I BROKER  0: (17565) Maximum time to nap at -spin exhaustion (-napmax): 250 
[2015/06/13@08:18:51.094+0100] P-4952       T-4956  I BROKER  0: (12273) No SSL Session Cache (-nosessioncache): Not Enabled 
[2015/06/13@08:18:51.094+0100] P-4952       T-4956  I BROKER  0: (17807) Disable LRU mechanism (-nolru): Not Enabled 
[2015/06/13@08:18:51.095+0100] P-4952       T-4956  I BROKER  0: (16689) Login Governor (-nGovernor): 0 of 1001 
[2015/06/13@08:18:51.095+0100] P-4952       T-4956  I BROKER  0: (8527)  Storage object cache size (-omsize): 2660 
[2015/06/13@08:18:51.096+0100] P-4952       T-4956  I BROKER  0: (13870) Database Service Manager - IPC Queue Size (-pica): 8.0   MBytes 
[2015/06/13@08:18:51.096+0100] P-4952       T-4956  I BROKER  0: (17802) Shared memory segments locked (-pinshm): Not Enabled 
[2015/06/13@08:18:51.097+0100] P-4952       T-4956  I BROKER  0: (16953) Use pollset mechanism for client/server (-pollset): Not Enabled 
[2015/06/13@08:18:51.097+0100] P-4952       T-4956  I BROKER  0: (16955) Delay first prefetch message (-prefetchDelay): Not Enabled 
[2015/06/13@08:18:51.098+0100] P-4952       T-4956  I BROKER  0: (16956) Prefetch message fill percentage (-prefetchFactor): 0 
[2015/06/13@08:18:51.098+0100] P-4952       T-4956  I BROKER  0: (16957) Minimum records in prefetch msg (-prefetchNumRecs): 16 
[2015/06/13@08:18:51.098+0100] P-4952       T-4956  I BROKER  0: (16958) Suspension queue poll priority (-prefetchPriority): 0 
[2015/06/13@08:18:51.099+0100] P-4952       T-4956  I BROKER  0: (17568) APW queue scan cycle time in milliseconds (-pwqdelay): 100 
[2015/06/13@08:18:51.099+0100] P-4952       T-4956  I BROKER  0: (17569) APW minimum queue length before write (-pwqmin): 1 
[2015/06/13@08:18:51.100+0100] P-4952       T-4956  I BROKER  0: (17570) APW buffer scan cycle time in seconds (-pwsdelay): 1 
[2015/06/13@08:18:51.100+0100] P-4952       T-4956  I BROKER  0: (17571) APW maximun number of buffers to scan per cycle (-pwscan): 5001 
[2015/06/13@08:18:51.101+0100] P-4952       T-4956  I BROKER  0: (17572) APW maximum number of buffers to write per cycle (-pwwmax): 25 
[2015/06/13@08:18:51.101+0100] P-4952       T-4956  I BROKER  0: (4247)  Before-Image File I/O (-r -R): Reliable 
[2015/06/13@08:18:51.102+0100] P-4952       T-4956  I BROKER  0: (17563) Record free chain search depth factor (-recspacesearchdepth): 5 
[2015/06/13@08:18:51.102+0100] P-4952       T-4956  I BROKER  0: (6526)  Number of Semaphore Sets (-semsets): 3 
[2015/06/13@08:18:51.103+0100] P-4952       T-4956  I BROKER  0: (12264) SSL Session Timeout (-sessiontimeout): 0 
[2015/06/13@08:18:51.103+0100] P-4952       T-4956  I BROKER  0: (13924) Maximum Shared Memory Segment Size (-shmsegsize): 32768 Mb 
[2015/06/13@08:18:51.103+0100] P-4952       T-4956  I BROKER  0: (4243)  Current Spin Lock Tries (-spin): 50000 
[2015/06/13@08:18:51.104+0100] P-4952       T-4956  I BROKER  0: (17803) SSL Encryption for TCP/IP connections (-ssl): Not Enabled 
[2015/06/13@08:18:51.104+0100] P-4952       T-4956  I BROKER  0: (17556) Number of tables included in statistics collection (-tablerangesize): 620 
[2015/06/13@08:18:51.106+0100] P-4952       T-4956  I BROKER  0: (14017) Area block consistency check (-AreaCheck): Not Enabled 
[2015/06/13@08:18:51.106+0100] P-4952       T-4956  I BROKER  0: (4239)  Number of Database Buffers (-B): 3000000 
[2015/06/13@08:18:51.107+0100] P-4952       T-4956  I BROKER  0: (17562) Number of Alternate Database Buffers (-B2): 600 
[2015/06/13@08:18:51.107+0100] P-4952       T-4956  I BROKER  0: (9422)  Maximum private buffers per user (-Bpmax): 64 
[2015/06/13@08:18:51.108+0100] P-4952       T-4956  I BROKER  0: (14016) Database block consistency check (-DbCheck): Not Enabled 
[2015/06/13@08:18:51.108+0100] P-4952       T-4956  I BROKER  0: (13869) Database Service Manager - Service(s) to start (-DBService): replserv 
[2015/06/13@08:18:51.109+0100] P-4952       T-4956  I BROKER  0: (10535) Enhanced Read-Only mode (-ERO): Not Enabled 
[2015/06/13@08:18:51.109+0100] P-4952       T-4956  I BROKER  0: (4237)  Force Access (-F): Not Enabled 
[2015/06/13@08:18:51.110+0100] P-4952       T-4956  I BROKER  0: (4249)  Before-Image Truncate Interval (-G): 0 
[2015/06/13@08:18:51.110+0100] P-4952       T-4956  I BROKER  0: (4261)  Host Name (-H): HOB 
[2015/06/13@08:18:51.111+0100] P-4952       T-4956  I BROKER  0: (14018) Index block consistency check (-IndexCheck): Not Enabled 
[2015/06/13@08:18:51.111+0100] P-4952       T-4956  I BROKER  0: (4241)  Current Size of Lock Table (-L): 50016 
[2015/06/13@08:18:51.112+0100] P-4952       T-4956  I BROKER  0: (16688) Lock Governor (-LGovernor): 0% 
[2015/06/13@08:18:51.112+0100] P-4952       T-4956  I BROKER  0: (4257)  Maximum Number of Clients Per Server (-Ma): 6 
[2015/06/13@08:18:51.113+0100] P-4952       T-4956  I BROKER  0: (14016) Memory overwrite check (-MemCheck): Not Enabled 
[2015/06/13@08:18:51.113+0100] P-4952       T-4956  I BROKER  0: (4245)  Delay of Before-Image Flush (-Mf): 3 
[2015/06/13@08:18:51.114+0100] P-4952       T-4956  I BROKER  0: (4259)  Minimum Clients Per Server (-Mi): 3 
[2015/06/13@08:18:51.114+0100] P-4952       T-4956  I BROKER  0: (12818) Message Buffer Size (-Mm): 4096 
[2015/06/13@08:18:51.115+0100] P-4952       T-4956  I BROKER  0: (4258)  Maximum Number of Servers (-Mn): 162 
[2015/06/13@08:18:51.115+0100] P-4952       T-4956  I BROKER  0: (12819) Servers per Protocol (-Mp): 0 
[2015/06/13@08:18:51.116+0100] P-4952       T-4956  I BROKER  0: (5647)  Maximum Servers Per Broker (-Mpb): 155 
[2015/06/13@08:18:51.117+0100] P-4952       T-4956  I BROKER  0: (4240)  Excess Shared Memory Size (-Mxs): 356 
[2015/06/13@08:18:51.117+0100] P-4952       T-4956  I BROKER  0: (4263)  Network Type (-N): TCP 
[2015/06/13@08:18:51.118+0100] P-4952       T-4956  I BROKER  0: (16954) Server network message wait time (-Nmsgwait): 2 
[2015/06/13@08:18:51.118+0100] P-4952       T-4956  I BROKER  0: (10357) Pending client connection timeout (-PendConnTimeout): 0 
[2015/06/13@08:18:51.119+0100] P-4952       T-4956  I BROKER  0: (4262)  Service Name (-S): 8000 
[2015/06/13@08:18:51.119+0100] P-4952       T-4956  I BROKER  0: (17804) Broker server group support (-ServerType): ABL 
[2015/06/13@08:18:51.120+0100] P-4952       T-4956  I BROKER  0: (10028) SQL Server Max Open Cursors (-SQLCursors): 0 
[2015/06/13@08:18:51.120+0100] P-4952       T-4956  I BROKER  0: (10026) SQL Server Stack Size (-SQLStack): 0 
[2015/06/13@08:18:51.121+0100] P-4952       T-4956  I BROKER  0: (10027) SQL Server Statement Cache Size (-SQLStmtCache): 0 
[2015/06/13@08:18:51.121+0100] P-4952       T-4956  I BROKER  0: (10029) Size [1K byte units] of SQL Server temp table buffer (-SQLTempStoreBuff): 0 
[2015/06/13@08:18:51.121+0100] P-4952       T-4956  I BROKER  0: (10030) Size [1K byte units] of SQL Server temp table disk storage (-SQLTempStoreDisk): 0 
[2015/06/13@08:18:51.123+0100] P-4952       T-4956  I BROKER  0: (10031) Size [1K byte units] of SQL Server temp table data page (-SQLTempStorePageSize): 0 
[2015/06/13@08:18:51.123+0100] P-4952       T-4956  I BROKER  0: (14019) Record block consistency check (-TableCheck): Not Enabled 
[2015/06/13@08:18:51.124+0100] P-4952       T-4956  I BROKER  0: (17717) TXE Lock retry limit (-TXERetryLimit): 0 
[2015/06/13@08:18:51.124+0100] P-4952       T-4956  I BROKER  0: (13896) TXE Commit lock skip limit (-TXESkipLimit): 10000 
[2015/06/13@08:18:51.126+0100] P-4952       T-4956  I BROKER  0: (10836) Database connections are not allowed at this time. 
[2015/06/13@08:18:51.126+0100] P-4952       T-4956  I BROKER  0: (10471) Database connections have been enabled. 


Any help and pointers much appreciated. 
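When revisiting settings like these, it can help to pull the startup parameters out of the log into something easier to compare between restarts. The following is a minimal sketch, not an official tool: it assumes the log line layout shown above, where a parameter line ends in `description (-flag): value`, and the names `PARAM_LINE` and `parse_startup_params` are made up for illustration.

```python
import re

# Matches the tail of a startup line such as:
#   (4239)  Number of Database Buffers (-B): 3000000
# Lines without a "(-flag)" group (for example the replication
# property lines) are deliberately not matched.
PARAM_LINE = re.compile(
    r"\((?P<msgnum>\d+|-+)\)\s+"                          # message number
    r"(?P<label>.+?)\s+"                                  # description text
    r"\((?P<flag>-[A-Za-z0-9]+)(?:\s+-[A-Za-z0-9]+)*\)"   # (-B) or (-r -R)
    r"\s*:\s*(?P<value>.*?)\s*$"                          # value to end of line
)

def parse_startup_params(lines):
    """Return {flag: value} for every startup-parameter line found."""
    params = {}
    for line in lines:
        match = PARAM_LINE.search(line)
        if match:
            # Some values carry a trailing full stop in the log; drop it.
            params[match.group("flag")] = match.group("value").rstrip(". ")
    return params

if __name__ == "__main__":
    sample = [
        r"[2015/06/13@08:18:51.107+0100] P-4952       T-4956  I BROKER  0: (17562) Number of Alternate Database Buffers (-B2): 600 ",
        r"[2015/06/13@08:18:51.095+0100] P-4952       T-4956  I BROKER  0: (8527)  Storage object cache size (-omsize): 2660 ",
    ]
    for flag, value in parse_startup_params(sample).items():
        print(flag, value)
```

Feeding it the whole log file line by line gives a dictionary that can be diffed against the next restart to confirm which changes actually took effect.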

All Replies

Posted by ChUIMonster on 16-Jun-2015 10:12

On the config screen I see that:

1) You are not using the new "prefetch" parameters

2) You have replication set on "go slow" (-pica 8MB)

3) You have an awful lot of very small (<2GB) extents.  That must be a PITA to deal with.

Posted by James Palmer on 16-Jun-2015 10:17

1) How new are the prefetch params, and do I need to have done an UpdateVST to use them? Can't update VST at the moment as we're still on 11.2.1 in development so developers need to be able to restore from backup using 11.2.1. Hopefully not for much longer though. Is there a guide to their use somewhere please?

2) Replication on go slow isn't too much of an issue as the target server isn't very well configured - it's just a stop gap until we can configure up the new DR server over the next couple of weeks. That being said, is there a guide to choosing the right size for this?

3) Yes. Right PITA! It's a throwback to small DB files that has never been addressed. I'm planning on doing a D&L in the near future to right a number of woes such as LOBs in data areas, and bad RPB choices. So I can simplify the extent structure at that point.

Posted by ChUIMonster on 16-Jun-2015 10:18

On the Dashboard I see:

1) You have a long running transaction problem.

2) You have 72 after-image extents.  I hate to ask but... why?

3) There is a SQL connection.

4) You've got a fairly steady record read load going on.  The next step would be to look at the table and index stats and see what is driving that and determine if it is reasonable for your application.

Posted by ChUIMonster on 16-Jun-2015 10:24

The prefetch stuff came in 10.2B (service pack 6, if I recall). You do not need to update VSTs to use it. It is related to client/server network traffic. pugchallenge.org/.../2687_Whats_New_OEDB.pptx

Extents can be resized (and the number of them and/or locations thereof changed) via probkup & prorest.  No need to D&L.  Just whip up a new .st file.  You cannot change storage areas, block size, rows per block or blocks per cluster this way but you can shuffle extents around easily.

Posted by James Palmer on 16-Jun-2015 10:25

1) Well observed. We're slowly working these through. We used to often have them up to 45 minutes. BI was a nightmare to maintain.

2) The (terribly mistaken) thinking there (not mine I might add) is that it gives us 72 hours to fix replication if it breaks before the AI extents are all full. Again, something I can fix when doing a D&L. How many would you recommend?

3) Need to remind myself what that is. I think it's PDSOE.

4) Right :)
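The "72 hours" reasoning in point 2 can be checked with a line of arithmetic. This is only a sketch of that reasoning: it assumes one extent is consumed per archival interval (the log shows -aiarcinterval 3600) and that no extent fills up before its interval ends. The function name `hours_of_cover` is illustrative.

```python
def hours_of_cover(ai_extents, archive_interval_seconds):
    """Hours until every AI extent is used, at one extent per interval."""
    return ai_extents * archive_interval_seconds / 3600

# 72 extents switched once an hour
print(hours_of_cover(72, 3600))   # 72.0
```

A busy period that fills extents faster than once an hour would shorten that window, so the figure is an upper bound rather than a guarantee.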

Posted by James Palmer on 16-Jun-2015 10:26

Thanks for the link.

Posted by TheMadDBA on 16-Jun-2015 10:30

-B2 is way too small. Size it appropriately based on your mostly static tables and assign them to -B2.
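To put the -B2 comment in numbers, here is a rough sketch using the values from the startup log (-B 3000000, -B2 600, 8192-byte blocks). It ignores per-buffer overhead, and the 500 MiB figure for static tables is purely hypothetical; the real number would come from the table analysis of whichever tables get assigned to the alternate buffer pool.

```python
BLOCK_SIZE = 8192             # -blocksize from the startup log
PRIMARY_BUFFERS = 3_000_000   # -B
ALTERNATE_BUFFERS = 600       # -B2

def pool_bytes(buffers, block_size=BLOCK_SIZE):
    """Approximate data held by a buffer pool, ignoring overhead."""
    return buffers * block_size

def buffers_needed(data_bytes, block_size=BLOCK_SIZE):
    """Number of buffers needed to hold data_bytes, rounded up."""
    return -(-data_bytes // block_size)

print(f"-B  holds about {pool_bytes(PRIMARY_BUFFERS) / 2**30:.1f} GiB")    # 22.9 GiB
print(f"-B2 holds about {pool_bytes(ALTERNATE_BUFFERS) / 2**20:.1f} MiB")  # 4.7 MiB

# Hypothetical: 500 MiB of mostly static tables to pin in -B2
print(buffers_needed(500 * 2**20))   # 64000
```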

-spin seems very high (check out the latch timeouts and maybe consider backing that down a bit).

In addition to the prefetch parameters that Tom mentioned you need to increase the -Mm size for your client server connections.

The application might need a little love in the performance tuning side.

Why the mix of appserver and client server connections?

Posted by James Palmer on 16-Jun-2015 10:40

-B2 was sized very conservatively for a single table as we were on 32 bit OS and Progress before and were running out of resources. Agreed - it needs some TLC.

I'll look at -spin

Is there a rule of thumb for -Mm?

We've got loads of work happening on the AppServers. The primary use of them is for queries from the REMC users. They select their filters and then fire off their query to the asynch AppServer and can then work away on another screen while they await its return.

We've also got a web portal that uses the AppServers to return data to the users.

Posted by TheMadDBA on 16-Jun-2015 10:52

-Mm will vary based on your network and how your application is coded. 16384 is usually the "right" number in my experience but I have used the slightly less than 32k max a few times. Keep in mind that -Mm must be set on both the server and the clients or you will get a connection error.

Are all of the clients on the same local network or is there a WAN involved?

Are the appservers connected with -H -S or with shared memory? Performance will be better with shared memory connections.

I would look at the appserver code first. If those queries really take that long to process then they are probably your best bet for performance improvements.

Posted by James Palmer on 16-Jun-2015 10:55

I implemented Shared Memory connections for the AppServers a good while ago. Made a huge difference :)

Thanks for the other comments and tips. Loads to go on :)

Posted by ChUIMonster on 16-Jun-2015 11:07

Regarding -Mm...  I like 8192 or higher.

If you are going to set it large you really do need to be looking at -prefetchFactor 100 (try to fill the message 100%) or -prefetchNumRecs X (where X is the target number of records per message). Personally, I prefer shooting for 100%.

If you do not set these you will get a maximum of 16 records per message (that is the default).  So much of your potential with a large -Mm will not be used.
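A back-of-the-envelope sketch of why a large -Mm is wasted without the prefetch settings. It takes the 16-records-per-message default described above at face value, ignores message header overhead, and uses a hypothetical average record size of 200 bytes; `message_fill` is an illustrative name, not anything from OpenEdge.

```python
def message_fill(mm_bytes, avg_record_bytes, records_per_message=16):
    """Fraction of a -Mm sized message occupied by the records sent."""
    used = min(records_per_message * avg_record_bytes, mm_bytes)
    return used / mm_bytes

AVG_RECORD = 200  # hypothetical average record size in bytes

for mm in (4096, 8192, 16384):
    print(f"-Mm {mm}: about {message_fill(mm, AVG_RECORD):.0%} full")
```

With these assumed numbers a 4096-byte message is roughly three quarters full, while a 16384-byte message is only about a fifth full, which is the unused potential mentioned above.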

On a related note: I've also had good experiences with "jumbo frames" in conjunction with larger -Mm and -prefetch*. Jumbo frames allow TCP/IP to send bigger messages and reduce overhead on the network. It does mean that your admins have to get involved and it can be "trying" to explain to them why one bigger message is better than lots of smaller ones. Almost as much fun as explaining to the SAN guys why RAID5 is less than desirable.

Posted by Rob Fitzpatrick on 16-Jun-2015 11:21

10.2Bxx and 11.x were in development concurrently in 2012.  I believe the prefetch enhancements appeared in 10.2B06 and in 11.1.  

Posted by Rob Fitzpatrick on 16-Jun-2015 11:30

Your -omsize of 2660 is too low.  You probably have in the neighbourhood of 200 system storage objects, plus 2589 application storage objects.  

Set it to 3000 for now, and keep an eye on your _StorageObject record count as the schema changes (or as you upgrade to new OE versions).  This will prevent OM latch activity.
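The arithmetic behind that recommendation, as a small sketch. The 200 system objects is the estimate given above, not a measured value, and `omsize_shortfall` is an illustrative name.

```python
def omsize_shortfall(current_omsize, system_objects, application_objects):
    """How many storage objects would not fit in the OM cache."""
    total = system_objects + application_objects
    return max(0, total - current_omsize)

print(omsize_shortfall(2660, 200, 2589))   # 129 objects over the current -omsize
print(omsize_shortfall(3000, 200, 2589))   # 0 once -omsize is raised to 3000
```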

Posted by George Potemkin on 16-Jun-2015 11:41

> Your -omsize of 2660 is too low.  You probably have in the neighbourhood of 200 system storage objects, plus 2589 application storage objects

Do the applications use these 200 system storage objects at runtime?

Posted by Rob Fitzpatrick on 16-Jun-2015 11:46

> Do the applications use these 200 system storage objects at runtime?

Some amount of them will be accessed.  Given the small incremental shared memory cost of 200 more OM cache entries, is it worth quibbling about exactly how many?  ;)

Posted by ChUIMonster on 16-Jun-2015 13:15

Also...

There are not many samples but your BogoMIPS are 50-60% of their peak.  That suggests that you might be on a virtualized server that is perhaps competing for CPU cycles with other VMs on the same physical server.  Did you allow some sort of over-committed VM to be created?

Your IO response is pretty consistent and mostly in the 2 or 3 ms range.  That is not too bad if the disks are rotating rust.  If they are SSD I would expect better.

Posted by James Palmer on 17-Jun-2015 03:27

Is omsize one of the ones I can increaseto?

Posted by James Palmer on 17-Jun-2015 03:34

Yep it's a VM. Unfortunately I'm struggling to ascertain what other servers are conflicting with mine. I wouldn't be surprised if it's over committed though.

Disks are SSD and allegedly the fastest you can buy pretty much. But reads come out of RAID5 as I understand it. Writes are RAID10 so that's ok, right? Because writes are what slow us down, right? /sarcasm

I'll keep an eye on the IO response and see if it improves as the Compellent has a chance to put stuff in the right tiers.

Posted by Libor Laubacher on 17-Jun-2015 03:51

> Is omsize one of the ones I can increaseto?

Yes. BTW you can proserve the sports db and try :)
 

Posted by ChUIMonster on 17-Jun-2015 11:33

I doubt that reads go to one sort of RAID and writes to another.

It is more likely that you have a SAN that magically migrates data to different tiers (that may be different RAID levels) depending on how it thinks the data is being used.

Not everyone agrees with me but, personally, I think those things are devil-spawn.

Posted by Libor Laubacher on 17-Jun-2015 11:39

Dunno. James previously said they are using SSD. Nothing about SAS/SATA. But perhaps he hasn't been told the whole config.

Devil spawn lol, possibly, if not configured properly. I am aware of a few setups like that with no complaints about performance.


Posted by TheMadDBA on 17-Jun-2015 11:55

I have had excellent results with high end storage (EMC, Hitachi, etc.) and RAID 5. Very little if any difference performance wise even when doing outlandish sustained tests. Part of that is of course the tremendous amount of cache at every level of the SAN and the sheer number of spindles involved.

Auto migrating is the devil for sure. Especially if one or more of your systems has inconsistent workloads (day vs night) or periods of peak activity like month end processing. I still have to fight from time to time on that issue. Sometimes it "works" but I think that is either pure luck or the SAN admins have set a large enough sample time before the migration starts happening.

When you get into lower tier storage it is usually a lot easier to overload the cache and the disks and the horrors of RAID 5 (or "special" RAID) start showing up a lot quicker.

As far as the 2-3ms access time for SSD... that doesn't really amaze me for a VM environment. Especially a Windows based one. To get the most out of SSD disks you need to make sure all of the adapter and disk settings for number of concurrent commands and queue depth are increased drastically. You can get into a situation where the actual disks are sitting around waiting for work but the queue is building up on the host itself.
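
To put rough numbers on the queue depth point: by Little's law, the IOPS a host can sustain is bounded by the number of outstanding requests divided by the per-request latency. The sketch below is only an upper bound under the assumption that latency stays flat as queue depth grows (it will not in practice). The function name and the 2.5 ms figure are illustrative, taken from the middle of the 2-3ms range mentioned above.

```python
def max_iops(queue_depth: int, latency_ms: float) -> float:
    """Upper bound on IOPS from Little's law: concurrency / latency.

    Assumes latency does not rise with queue depth, which is optimistic.
    """
    return queue_depth * 1000.0 / latency_ms


# One outstanding request at 2.5 ms can never exceed 400 IOPS,
# no matter how fast the SSD behind it is.
for depth in (1, 8, 32):
    print(f"queue depth {depth:>2}: at most {max_iops(depth, 2.5):,.0f} IOPS")
```

This is why a shallow queue on the host or adapter can leave the disks idle: the ceiling is set by the queue, not by the storage.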

Posted by Paul Koufalis on 17-Jun-2015 11:57

Some of the auto tiering vendors set them up to always write to SSD then migrate the data down (or up) at night. I'm at a customer now with an EMC SAN and the sysadmin said it had one TB of cache. I said I didn't know Tom made caches...but I digress. It has a TB of flash drives at the top tier to receive the writes.

Posted by Tim Kuehn on 17-Jun-2015 11:59

> It has a TB of flash drives at the top tier to receive the writes.

Most likely to ensure no marketing fluff gets in. :) 

Posted by TheMadDBA on 17-Jun-2015 12:12

If you have some time this is a good read about the EMC VMAX series (amazing systems)

www.emc.com/.../h6544-vmax-w-enginuity-pdg.pdf

You can actually have 1 TB of RAM in a maxed out config that handles a lot of the caching before you get to the flash drive tier... before you get to the Fibre tier... before you get to the SATA tier.

Quite a bit of difference between these and a bunch of disks attached to a RAID controller :)

Posted by ChUIMonster on 17-Jun-2015 12:15

Usually there is some sort of SATA tier for slow stuff, fiber for busy stuff and SSD for high demand stuff.  When your db is well tuned and there isn't much load it all works great.  (Which could be said about a lot of things.)

The problem is that these things see a well tuned db and they think that, because a lot of the time there isn't much IO, it should all get migrated to the slower tiers.

Users, of course, expect the system to jump whenever they want it to and to ask "how high" on the way up.  They are not too keen on systems waiting a few minutes (or more than "a few") to notice that the db has become busy with an unusual query and that it might have been a good idea to use the fast disks.

If your users are happy to wait and won't complain to management about the application being slow then it isn't much of a problem.  If they do complain and the storage admins are happy to dedicate SSD to your database then it also won't be a problem.  (If either situation is true for you then you should immediately buy a lottery ticket!)

Posted by TheMadDBA on 17-Jun-2015 12:48

I guess I am going to have to buy a Powerball ticket tonight :)

I have managed to get the SAN admins to turn off storage migration on a couple of occasions (for Progress and Oracle databases). Mostly because I first convinced the higher-ups at the company that variable performance on their big money-making application was a bad idea. Also, the SAN guys wouldn't guarantee in writing that it would never happen.

I admit it isn't always easy... especially when infrastructure is outsourced to a different company. But it can happen. If your app is small in the corporate scheme of things you are usually out of luck though.

Posted by ChUIMonster on 17-Jun-2015 13:12

I'll accept a 15% gratuity on that :)

Posted by Paul Koufalis on 17-Jun-2015 13:24

@MadDBA: VMax refers to the price right?

Posted by TheMadDBA on 17-Jun-2015 13:47

Tom: When I win I will be sure to give you a call ;-)

Paul: lol... the price tag is not exactly small on any of the EMC storage, especially not the VMAX. Properly set up the performance and reliability is insane but it does come at a premium. The only storage I have worked on that was faster than the VMAX was Oracle Exadata... but that cheats a bit by having the storage integrated so tightly with the DB.

Posted by ChUIMonster on 17-Jun-2015 13:55

"Properly set up"... another reason to buy that lottery ticket ;)

Posted by TheMadDBA on 17-Jun-2015 14:11

lol... I have learned more about SAN setup than I should ever know without actually being a SAN administrator.

If you are lucky enough to have competent admins life is a breeze... if not then you spend a lot of time in meetings explaining things on whiteboards to the CIO and the SAN team :-(

But I will say most of the EMC techs that show up for those kinds of meetings are usually pretty straightforward about the trade-offs for performance and ease of admin/setup. A couple of times they even said that local disks would be faster for random IO than their SAN but went on to list all of the things EMC did that local disks didn't.

Posted by ChUIMonster on 17-Jun-2015 14:18

> A couple of times they even said that local disks would be faster for random IO than their SAN...

I got that out of them in writing once.  One of the happier days in my professional life :)

Posted by James Palmer on 17-Jun-2015 14:22

That sounds familiar already! I'll try and ascertain more concrete info on what we actually have tomorrow. The plan the database is on means that it will all reside in SSD permanently unless it isn't written to for 12 days. Which is giving our sys admin guy kittens, because that's a quarter of the SSD space gone just for the db, let alone the AI and BI.

Posted by TheMadDBA on 17-Jun-2015 14:37

I wouldn't have an issue with the DB taking up that much of the SSD space as long as the DB houses the mission critical app for the company. It is all a matter of priority really. What does the admin want to put there that he can't? His personal DVD collection doesn't count :-)

If it becomes an issue you can certainly figure out which DB extents have the most IO and put them on SSD and let the rest reside on Fibre attached (which is still really fast).
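
A minimal sketch of that triage, assuming you already have per-extent IO counts and sizes from whatever monitoring you use. The extent names, sizes, and counts below are made up for illustration, and `pick_for_ssd` is a hypothetical helper, not part of any Progress tool. It ranks extents by IO per GB and greedily fills a fixed SSD budget.

```python
def pick_for_ssd(extents, ssd_budget_gb):
    """Greedily choose the extents with the highest IO per GB that fit the budget."""
    ranked = sorted(extents, key=lambda e: e["io"] / e["size_gb"], reverse=True)
    chosen, used_gb = [], 0.0
    for ext in ranked:
        if used_gb + ext["size_gb"] <= ssd_budget_gb:
            chosen.append(ext["name"])
            used_gb += ext["size_gb"]
    return chosen


# Illustrative figures only, not taken from the thread.
sample = [
    {"name": "orders.d1",  "size_gb": 40,  "io": 8_000_000},
    {"name": "history.d1", "size_gb": 200, "io": 1_200_000},
    {"name": "index.d1",   "size_gb": 20,  "io": 6_000_000},
    {"name": "audit.d1",   "size_gb": 60,  "io": 300_000},
]

print(pick_for_ssd(sample, ssd_budget_gb=100))
```

Ranking by IO per GB rather than raw IO favours small, hot extents, which is usually what you want when SSD space is the scarce resource. A real decision would also weigh reads against writes and peak periods such as month end.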

Posted by James Palmer on 18-Jun-2015 01:55

Yeah completely with you there. The application is pretty much the business's main asset other than people.

My long term plan is to structure the DB in such a way that it is easy to put stuff in relevant plans as you say.

In theory, any tables that are high read, very low write should be in -B2, so if they're in their own storage area then surely that area can go down to a lower tier as it's resident in memory anyway, and I'm sure there are other such strategies I can come up with in time.

For now though I'm picking the low hanging fruit, and I will be for the foreseeable future I suspect!

Posted by James Palmer on 18-Jun-2015 03:27

So this is the SAN: www.dell.com/.../pd

As I understand it we have 2TB of flash, 22 of physical disk, but not sure what configuration.

The virtual hosts are: www.dell.com/.../pd

And yes, they haven't been set in any way to reserve processor operations for us at this time. This will be changing!

Posted by James Palmer on 18-Jun-2015 03:36

Found the SAN config. We have more than I was told.

38.528TB Usable

69,056 IOPS from both Tier1 and Tier2 (Peak)

3 x SC220 SAS 24 Bay 2.5 Inch Disk Enclosures

6 x 400GB SLC SSD

6 x 1.6TB MLC SSD

36 x 1.2TB SAS 10k Drives
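
As a quick sanity check on those figures, here is the raw capacity worked out per tier, assuming decimal units (1 TB = 1000 GB) and that the three drive lines above are the complete inventory. The gap between raw and usable is presumably RAID overhead and spares, but the config does not say, so treat the ratio as descriptive only.

```python
# (drive count, size in GB) per tier, from the config listed above.
drives = {
    "SLC SSD": (6, 400),
    "MLC SSD": (6, 1600),
    "10k SAS": (36, 1200),
}

raw_gb = {tier: count * size for tier, (count, size) in drives.items()}
total_raw_gb = sum(raw_gb.values())
flash_gb = raw_gb["SLC SSD"] + raw_gb["MLC SSD"]

usable_gb = 38528  # "38.528TB Usable"
usable_fraction = usable_gb / total_raw_gb

for tier, gb in raw_gb.items():
    print(f"{tier}: {gb / 1000:.1f} TB raw")
print(f"total raw: {total_raw_gb / 1000:.1f} TB, of which flash: {flash_gb / 1000:.1f} TB")
print(f"usable / raw: {usable_fraction:.1%}")
```

That puts raw flash at 12 TB, well above the 2TB figure I was originally given, with roughly 70% of total raw capacity ending up usable.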

Posted by TheMadDBA on 18-Jun-2015 08:29

Have not worked with that exact model but it seems fine based on a quick look at the specs and a few google searches for reviews. Based on the activity you posted and the fact that your DB is all on SSD I wouldn't worry about tuning the SAN right now.

The VM settings for CPU, memory and IO are more likely to provide measurable improvements.

Plus the other network parameters and -B2 changes you are already working on.

Posted by James Palmer on 18-Jun-2015 08:36

Okie doke. Thanks so much to everyone for comments. Writing my plan of attack.

This thread is closed