We've finally been able to get ProTop installed on our production server. Running 11.5 on Win 2012 R2 (without the Linux patch).
Things are running so much more efficiently than they did on our old box, but most of our settings were done for that box so I want to revisit them over the next weeks and months to tweak and change to get the most out of what we've got.
So first of all - the Configuration Viewer: http://gyazo.com/75db8010dbff285388ea7ed8a9fde9d6
And then a couple of snapshots of the main dashboard during a reasonably busy period:
http://gyazo.com/981e86be5d60abf5ea01c7b3aeeac888
http://gyazo.com/1d56bab479c08584eac627dc50ac1886
http://gyazo.com/7d70100d304cd096bfbc7bc63d519ff3
http://gyazo.com/2ab7ed2942c889dd4f9c5755bb019b69
This is ProTop's default refresh rate.
Our most recent startup output from the DB logs:
[2015/06/13@08:18:37.713+0100] P-4952 T-4956 I BROKER 0: (333) Multi-user session begin. [2015/06/13@08:18:37.715+0100] P-4952 T-4956 I BROKER 0: (10545) Connections to this database will not be allowed until all Database Services started have completed their startup and initialisation. [2015/06/13@08:18:37.762+0100] P-4952 T-4956 I BROKER 0: (15321) Before Image Log Initialisation at block 0 offset 1335. [2015/06/13@08:18:38.251+0100] P-4952 T-4956 I BROKER 0: (452) Login by inencoadmin on batch. [2015/06/13@08:18:38.298+0100] P-4728 T-4700 I RPLS 162: (-----) Login by inencoadmin. [2015/06/13@08:18:38.308+0100] P-4728 T-4700 I RPLS 162: (7129) Usr 162 set name to inencoadmin. [2015/06/13@08:18:38.308+0100] P-4728 T-4700 I RPLS 162: (10819) The Fathom Replication property file is being processed. [2015/06/13@08:18:38.308+0100] P-4728 T-4700 I RPLS 162: (10326) Server Properties [2015/06/13@08:18:38.308+0100] P-4728 T-4700 I RPLS 162: (10327) Database Name (database): primary [2015/06/13@08:18:38.308+0100] P-4728 T-4700 I RPLS 162: (10330) Transition Method (transition): manual. [2015/06/13@08:18:38.308+0100] P-4728 T-4700 I RPLS 162: (10331) Transition Timeout (transition-timeout): 600. [2015/06/13@08:18:38.308+0100] P-4728 T-4700 I RPLS 162: (11715) Minimum Polling Delay (minimum-polling-delay): 5. [2015/06/13@08:18:38.308+0100] P-4728 T-4700 I RPLS 162: (11716) Maximum Polling Delay (maximum-polling-delay): 500. [2015/06/13@08:18:38.308+0100] P-4728 T-4700 I RPLS 162: (11718) Defer Agent Startup (defer-agent-startup): Not Active. [2015/06/13@08:18:38.308+0100] P-4728 T-4700 I RPLS 162: (12231) Replication Keep Alive (repl-keep-alive): 120. [2015/06/13@08:18:38.308+0100] P-4728 T-4700 I RPLS 162: (10326) Control Agent Properties [2015/06/13@08:18:38.310+0100] P-4728 T-4700 I RPLS 162: (10332) Control Agent (control-agent) : agent1. [2015/06/13@08:18:38.310+0100] P-4728 T-4700 I RPLS 162: (10333) Host Name (host): hobt2. 
[2015/06/13@08:18:38.310+0100] P-4728 T-4700 I RPLS 162: (10334) Port (port): 8000. [2015/06/13@08:18:38.310+0100] P-4728 T-4700 I RPLS 162: (14249) TCP/IP Version (ipver): ipv4. [2015/06/13@08:18:38.310+0100] P-4728 T-4700 I RPLS 162: (10335) Critical (critical): 0. [2015/06/13@08:18:38.310+0100] P-4728 T-4700 I RPLS 162: (10336) Replication Method (replication-method): async. [2015/06/13@08:18:38.310+0100] P-4728 T-4700 I RPLS 162: (10715) Connect Timeout (connect-timeout): 120. [2015/06/13@08:18:38.310+0100] P-4728 T-4700 I RPLS 162: (10337) Maximum Message Length (maximum-message): 8512 [2015/06/13@08:18:38.310+0100] P-4728 T-4700 I RPLS 162: (12233) Schema Lock Action (schema-lock-action) : wait. [2015/06/13@08:18:38.310+0100] P-4728 T-4700 I RPLS 162: (12685) Agent Shutdown Action (agent-shutdown-action) : recovery. [2015/06/13@08:18:38.310+0100] P-4728 T-4700 I RPLS 162: (10326) Transition Properties [2015/06/13@08:18:38.310+0100] P-4728 T-4700 I RPLS 162: (13261) Database-role (database-role) : normal. [2015/06/13@08:18:38.310+0100] P-4728 T-4700 I RPLS 162: (13262) Responsibility (responsibility) : . [2015/06/13@08:18:38.310+0100] P-4728 T-4700 I RPLS 162: (13263) Restart database after transition (restart-after-transition) : No. [2015/06/13@08:18:38.310+0100] P-4728 T-4700 I RPLS 162: (16734) Start secondary broker after transition (start-secondary-broker) : No. [2015/06/13@08:18:38.310+0100] P-4728 T-4700 I RPLS 162: (13264) Source startup arguments (source-startup-arguments) : None. [2015/06/13@08:18:38.310+0100] P-4728 T-4700 I RPLS 162: (13264) Source Secondary Broker startup arguments (source-secondary-broker-startup-arguments) : None. [2015/06/13@08:18:38.310+0100] P-4728 T-4700 I RPLS 162: (13264) Target startup arguments (target-startup-arguments) : None. [2015/06/13@08:18:38.310+0100] P-4728 T-4700 I RPLS 162: (13264) Target Secondary Broker startup arguments (target-secondary-broker-startup-arguments) : None. 
[2015/06/13@08:18:38.310+0100] P-4728 T-4700 I RPLS 162: (13264) Normal startup arguments (normal-startup-arguments) : None. [2015/06/13@08:18:38.310+0100] P-4728 T-4700 I RPLS 162: (13264) Normal Secondary Broker startup arguments (normal-secondary-broker-startup-arguments) : None. [2015/06/13@08:18:38.310+0100] P-4728 T-4700 I RPLS 162: (13265) Automatically begin after-imaging during transition (auto-begin-ai) : Yes. [2015/06/13@08:18:38.310+0100] P-4728 T-4700 I RPLS 162: (13266) Automatically add after-image areas during transition (auto-add-ai-areas) : Yes. [2015/06/13@08:18:38.310+0100] P-4728 T-4700 I RPLS 162: (13267) Structure file that contains the after-image area definitions to automatically add (ai-structure-file) : f:\database\live\icmaslivai.st. [2015/06/13@08:18:38.311+0100] P-4728 T-4700 I RPLS 162: (13268) Database backup method to use during transition (backup-method) : mark. [2015/06/13@08:18:38.311+0100] P-4728 T-4700 I RPLS 162: (13269) Backup arguments (backup-arguments) : None. [2015/06/13@08:18:38.311+0100] P-4728 T-4700 I RPLS 162: (13269) Incremental Backup arguments (incremental-backup-arguments) : None. [2015/06/13@08:18:38.311+0100] P-4728 T-4700 I RPLS 162: (13269) Recovery Backup arguments (recovery-backup-arguments) : None. [2015/06/13@08:18:38.311+0100] P-4728 T-4700 I RPLS 162: (10500) The Fathom Replication Server successfully started as PID 4728. [2015/06/13@08:18:38.313+0100] P-4728 T-4700 I RPLS 162: (10842) Connecting to Fathom Replication Agent agent1. [2015/06/13@08:18:38.313+0100] P-4952 T-4956 I BROKER 0: (-----) The OpenEdge Replication Server is starting... [2015/06/13@08:18:38.320+0100] P-4952 T-4956 I BROKER 0: (5644) Started for hob-icmasliv using TCP IPV4 address 0.0.0.0, pid 4952. [2015/06/13@08:18:38.320+0100] P-4952 T-4956 I BROKER 0: (8836) Connecting to Admin Server on port 7844. 
[2015/06/13@08:18:38.326+0100] P-4952 T-4956 I BROKER 0: (14262) Successfully connected to AdminServer on port 7844 using TCP/IP IPV4 address 192.168.125.1. [2015/06/13@08:18:41.950+0100] P-4728 T-4700 I RPLS 162: (10507) The Fathom Replication Server has successfully connected to the Fathom Replication Agent agent1 on host 192.168.125.45. [2015/06/13@08:18:41.950+0100] P-4728 T-4700 I RPLS 162: (11251) The Replication Server successfully connected to all of its configured Agents. [2015/06/13@08:18:41.951+0100] P-4728 T-4700 I RPLS 162: (10508) Beginning Fathom Replication synchronization for the Fathom Replication Agent agent1. [2015/06/13@08:18:41.954+0100] P-4728 T-4700 I RPLS 162: (11805) Unlocking after-image file 13 and locking ALL FULL after-image files beginning with file 14. [2015/06/13@08:18:41.954+0100] P-4728 T-4700 I RPLS 162: (10436) The source database icmasliv and the target database f:\database\live\icmasliv on host hobt2 are synchronized. [2015/06/13@08:18:51.065+0100] P-4952 T-4956 I BROKER 0: (9149) Time out limit (10 seconds) exceeded waiting for ACK. [2015/06/13@08:18:51.066+0100] P-4952 T-4956 I BROKER 0: (8845) Error registering with Admin Server. [2015/06/13@08:18:51.067+0100] P-4952 T-4956 I BROKER 0: (4234) Progress OpenEdge Release 11.5 build 1114 on WINNT . [2015/06/13@08:18:51.067+0100] P-4952 T-4956 I BROKER 0: (4281) Server started by inencoadmin on batch. [2015/06/13@08:18:51.068+0100] P-4952 T-4956 I BROKER 0: (6574) Started using pid: 4952. [2015/06/13@08:18:51.069+0100] P-4952 T-4956 I BROKER 0: (9426) Large database file access has been enabled. [2015/06/13@08:18:51.069+0100] P-4952 T-4956 I BROKER 0: (13871) After-image Management Archival Method: Timed. [2015/06/13@08:18:51.071+0100] P-4952 T-4956 I BROKER 0: (13875) This database is enabled for OpenEdge Replication as a Source database. 
[2015/06/13@08:18:51.071+0100] P-4952 T-4956 I BROKER 0: (15219) Encryption enabled: 0 [2015/06/13@08:18:51.072+0100] P-4952 T-4956 I BROKER 0: (15824) Multi-tenancy enabled: 0 [2015/06/13@08:18:51.072+0100] P-4952 T-4956 I BROKER 0: (15824) Table Partitioning enabled: 0 [2015/06/13@08:18:51.073+0100] P-4952 T-4956 I BROKER 0: (-----) LRU mechanism enabled. [2015/06/13@08:18:51.073+0100] P-4952 T-4956 I BROKER 0: (4282) Parameter File: Not Enabled. [2015/06/13@08:18:51.073+0100] P-4952 T-4956 I BROKER 0: (9336) Created shared memory with segment_id: 1 [2015/06/13@08:18:51.074+0100] P-4952 T-4956 I BROKER 0: (4250) Before-Image Cluster Size: 33554432. [2015/06/13@08:18:51.074+0100] P-4952 T-4956 I BROKER 0: (4251) Before-Image Block Size: 8192. [2015/06/13@08:18:51.075+0100] P-4952 T-4956 I BROKER 0: (13873) After-image Management Archival Directory List (-aiarcdir): j:\live [2015/06/13@08:18:51.075+0100] P-4952 T-4956 I BROKER 0: (13874) Create After-image Management Archival Directory(s) (-aiarcdircreate): Not Enabled [2015/06/13@08:18:51.076+0100] P-4952 T-4956 I BROKER 0: (13872) After-image Management Archival Interval (-aiarcinterval): 3600 [2015/06/13@08:18:51.076+0100] P-4952 T-4956 I BROKER 0: (4256) Number of After-Image Buffers (-aibufs): 200 [2015/06/13@08:18:51.077+0100] P-4952 T-4956 I BROKER 0: (4254) After-Image Stall (-aistall): Enabled [2015/06/13@08:18:51.077+0100] P-4952 T-4956 I BROKER 0: (17555) Starting index number for statistics range (-baseindex): 1 [2015/06/13@08:18:51.078+0100] P-4952 T-4956 I BROKER 0: (17554) Starting table number for statistics range (-basetable): 1 [2015/06/13@08:18:51.078+0100] P-4952 T-4956 I BROKER 0: (4252) Number of Before-Image Buffers (-bibufs): 200 [2015/06/13@08:18:51.079+0100] P-4952 T-4956 I BROKER 0: (6552) BI File Threshold Stall (-bistall): Disabled. 
[2015/06/13@08:18:51.079+0100] P-4952 T-4956 I BROKER 0: (9238) BI File Threshold size (-bithold): 0.0 Bytes [2015/06/13@08:18:51.080+0100] P-4952 T-4956 I BROKER 0: (6573) Database Blocksize (-blocksize): 8192 [2015/06/13@08:18:51.080+0100] P-4952 T-4956 I BROKER 0: (12812) BIW writer delay (-bwdelay): 0 [2015/06/13@08:18:51.081+0100] P-4952 T-4956 I BROKER 0: (12813) Allowed index cursors (-c): 4004. [2015/06/13@08:18:51.081+0100] P-4952 T-4956 I BROKER 0: (12265) SSL Certificate Store Path (-certstorepath): Not Enabled [2015/06/13@08:18:51.082+0100] P-4952 T-4956 I BROKER 0: (4264) Character Set (-cpinternal): ISO8859-1 [2015/06/13@08:18:51.082+0100] P-4952 T-4956 I BROKER 0: (4235) Physical Database Name (-db): f:\database\live\icmasliv [2015/06/13@08:18:51.082+0100] P-4952 T-4956 I BROKER 0: (4238) Direct I/O (-directio): Not Enabled [2015/06/13@08:18:51.083+0100] P-4952 T-4956 I BROKER 0: (4236) Database Type (-dt): PROGRESS [2015/06/13@08:18:51.083+0100] P-4952 T-4956 I BROKER 0: (15218) Encryption cache size (-ecsize): 1000 [2015/06/13@08:18:51.084+0100] P-4952 T-4956 I BROKER 0: (12814) Group delay (-groupdelay): 10 [2015/06/13@08:18:51.084+0100] P-4952 T-4956 I BROKER 0: (4242) Hash Table Entries (-hash): 1037347 [2015/06/13@08:18:51.085+0100] P-4952 T-4956 I BROKER 0: (4244) Crash Recovery (-i): Enabled [2015/06/13@08:18:51.085+0100] P-4952 T-4956 I BROKER 0: (17557) Number of indexes included in statistics collection (-indexrangesize): 2020 [2015/06/13@08:18:51.086+0100] P-4952 T-4956 I BROKER 0: (14268) TCP/IP Version (-ipver): IPV4 [2015/06/13@08:18:51.086+0100] P-4952 T-4956 I BROKER 0: (12263) SSL Key Alias Name (-keyalias): Not Enabled [2015/06/13@08:18:51.088+0100] P-4952 T-4956 I BROKER 0: (12815) Lock table hash table size (-lkhash): 6661 [2015/06/13@08:18:51.088+0100] P-4952 T-4956 I BROKER 0: (17805) Original Lock Release Algorithm (-lkrela): Not Enabled [2015/06/13@08:18:51.089+0100] P-4952 T-4956 I BROKER 0: (17560) Number of LRU force skips 
(-lruskips): 50 [2015/06/13@08:18:51.089+0100] P-4952 T-4956 I BROKER 0: (17561) Number of LRU2 force skips (-lru2skips): 0 [2015/06/13@08:18:51.090+0100] P-4952 T-4956 I BROKER 0: (13953) Maximum Area Number (-maxArea): 32000 [2015/06/13@08:18:51.090+0100] P-4952 T-4956 I BROKER 0: (12540) Size of JTA transaction table (-maxxids): 100 [2015/06/13@08:18:51.091+0100] P-4952 T-4956 I BROKER 0: (5649) Maximum Port for Auto Servers (-maxport): 5000 [2015/06/13@08:18:51.091+0100] P-4952 T-4956 I BROKER 0: (5648) Minimum Port for Auto Servers (-minport): 3000 [2015/06/13@08:18:51.091+0100] P-4952 T-4956 I BROKER 0: (17564) Multi-tenancy partition cache size (-mtpmsize): 1024 [2015/06/13@08:18:51.092+0100] P-4952 T-4956 I BROKER 0: (12821) Use muxlatches (-mux): 1 [2015/06/13@08:18:51.092+0100] P-4952 T-4956 I BROKER 0: (4260) Maximum Number of Users (-n): 1001 [2015/06/13@08:18:51.093+0100] P-4952 T-4956 I BROKER 0: (17566) Minimum time to nap at first -spin exhaustion (-nap): 10 [2015/06/13@08:18:51.093+0100] P-4952 T-4956 I BROKER 0: (17565) Maximum time to nap at -spin exhaustion (-napmax): 250 [2015/06/13@08:18:51.094+0100] P-4952 T-4956 I BROKER 0: (12273) No SSL Session Cache (-nosessioncache): Not Enabled [2015/06/13@08:18:51.094+0100] P-4952 T-4956 I BROKER 0: (17807) Disable LRU mechanism (-nolru): Not Enabled [2015/06/13@08:18:51.095+0100] P-4952 T-4956 I BROKER 0: (16689) Login Governor (-nGovernor): 0 of 1001 [2015/06/13@08:18:51.095+0100] P-4952 T-4956 I BROKER 0: (8527) Storage object cache size (-omsize): 2660 [2015/06/13@08:18:51.096+0100] P-4952 T-4956 I BROKER 0: (13870) Database Service Manager - IPC Queue Size (-pica): 8.0 MBytes [2015/06/13@08:18:51.096+0100] P-4952 T-4956 I BROKER 0: (17802) Shared memory segments locked (-pinshm): Not Enabled [2015/06/13@08:18:51.097+0100] P-4952 T-4956 I BROKER 0: (16953) Use pollset mechanism for client/server (-pollset): Not Enabled [2015/06/13@08:18:51.097+0100] P-4952 T-4956 I BROKER 0: (16955) Delay first 
prefetch message (-prefetchDelay): Not Enabled [2015/06/13@08:18:51.098+0100] P-4952 T-4956 I BROKER 0: (16956) Prefetch message fill percentage (-prefetchFactor): 0 [2015/06/13@08:18:51.098+0100] P-4952 T-4956 I BROKER 0: (16957) Minimum records in prefetch msg (-prefetchNumRecs): 16 [2015/06/13@08:18:51.098+0100] P-4952 T-4956 I BROKER 0: (16958) Suspension queue poll priority (-prefetchPriority): 0 [2015/06/13@08:18:51.099+0100] P-4952 T-4956 I BROKER 0: (17568) APW queue scan cycle time in milliseconds (-pwqdelay): 100 [2015/06/13@08:18:51.099+0100] P-4952 T-4956 I BROKER 0: (17569) APW minimum queue length before write (-pwqmin): 1 [2015/06/13@08:18:51.100+0100] P-4952 T-4956 I BROKER 0: (17570) APW buffer scan cycle time in seconds (-pwsdelay): 1 [2015/06/13@08:18:51.100+0100] P-4952 T-4956 I BROKER 0: (17571) APW maximun number of buffers to scan per cycle (-pwscan): 5001 [2015/06/13@08:18:51.101+0100] P-4952 T-4956 I BROKER 0: (17572) APW maximum number of buffers to write per cycle (-pwwmax): 25 [2015/06/13@08:18:51.101+0100] P-4952 T-4956 I BROKER 0: (4247) Before-Image File I/O (-r -R): Reliable [2015/06/13@08:18:51.102+0100] P-4952 T-4956 I BROKER 0: (17563) Record free chain search depth factor (-recspacesearchdepth): 5 [2015/06/13@08:18:51.102+0100] P-4952 T-4956 I BROKER 0: (6526) Number of Semaphore Sets (-semsets): 3 [2015/06/13@08:18:51.103+0100] P-4952 T-4956 I BROKER 0: (12264) SSL Session Timeout (-sessiontimeout): 0 [2015/06/13@08:18:51.103+0100] P-4952 T-4956 I BROKER 0: (13924) Maximum Shared Memory Segment Size (-shmsegsize): 32768 Mb [2015/06/13@08:18:51.103+0100] P-4952 T-4956 I BROKER 0: (4243) Current Spin Lock Tries (-spin): 50000 [2015/06/13@08:18:51.104+0100] P-4952 T-4956 I BROKER 0: (17803) SSL Encryption for TCP/IP connections (-ssl): Not Enabled [2015/06/13@08:18:51.104+0100] P-4952 T-4956 I BROKER 0: (17556) Number of tables included in statistics collection (-tablerangesize): 620 [2015/06/13@08:18:51.106+0100] P-4952 T-4956 I 
BROKER 0: (14017) Area block consistency check (-AreaCheck): Not Enabled [2015/06/13@08:18:51.106+0100] P-4952 T-4956 I BROKER 0: (4239) Number of Database Buffers (-B): 3000000 [2015/06/13@08:18:51.107+0100] P-4952 T-4956 I BROKER 0: (17562) Number of Alternate Database Buffers (-B2): 600 [2015/06/13@08:18:51.107+0100] P-4952 T-4956 I BROKER 0: (9422) Maximum private buffers per user (-Bpmax): 64 [2015/06/13@08:18:51.108+0100] P-4952 T-4956 I BROKER 0: (14016) Database block consistency check (-DbCheck): Not Enabled [2015/06/13@08:18:51.108+0100] P-4952 T-4956 I BROKER 0: (13869) Database Service Manager - Service(s) to start (-DBService): replserv [2015/06/13@08:18:51.109+0100] P-4952 T-4956 I BROKER 0: (10535) Enhanced Read-Only mode (-ERO): Not Enabled [2015/06/13@08:18:51.109+0100] P-4952 T-4956 I BROKER 0: (4237) Force Access (-F): Not Enabled [2015/06/13@08:18:51.110+0100] P-4952 T-4956 I BROKER 0: (4249) Before-Image Truncate Interval (-G): 0 [2015/06/13@08:18:51.110+0100] P-4952 T-4956 I BROKER 0: (4261) Host Name (-H): HOB [2015/06/13@08:18:51.111+0100] P-4952 T-4956 I BROKER 0: (14018) Index block consistency check (-IndexCheck): Not Enabled [2015/06/13@08:18:51.111+0100] P-4952 T-4956 I BROKER 0: (4241) Current Size of Lock Table (-L): 50016 [2015/06/13@08:18:51.112+0100] P-4952 T-4956 I BROKER 0: (16688) Lock Governor (-LGovernor): 0% [2015/06/13@08:18:51.112+0100] P-4952 T-4956 I BROKER 0: (4257) Maximum Number of Clients Per Server (-Ma): 6 [2015/06/13@08:18:51.113+0100] P-4952 T-4956 I BROKER 0: (14016) Memory overwrite check (-MemCheck): Not Enabled [2015/06/13@08:18:51.113+0100] P-4952 T-4956 I BROKER 0: (4245) Delay of Before-Image Flush (-Mf): 3 [2015/06/13@08:18:51.114+0100] P-4952 T-4956 I BROKER 0: (4259) Minimum Clients Per Server (-Mi): 3 [2015/06/13@08:18:51.114+0100] P-4952 T-4956 I BROKER 0: (12818) Message Buffer Size (-Mm): 4096 [2015/06/13@08:18:51.115+0100] P-4952 T-4956 I BROKER 0: (4258) Maximum Number of Servers (-Mn): 162 
[2015/06/13@08:18:51.115+0100] P-4952 T-4956 I BROKER 0: (12819) Servers per Protocol (-Mp): 0 [2015/06/13@08:18:51.116+0100] P-4952 T-4956 I BROKER 0: (5647) Maximum Servers Per Broker (-Mpb): 155 [2015/06/13@08:18:51.117+0100] P-4952 T-4956 I BROKER 0: (4240) Excess Shared Memory Size (-Mxs): 356 [2015/06/13@08:18:51.117+0100] P-4952 T-4956 I BROKER 0: (4263) Network Type (-N): TCP [2015/06/13@08:18:51.118+0100] P-4952 T-4956 I BROKER 0: (16954) Server network message wait time (-Nmsgwait): 2 [2015/06/13@08:18:51.118+0100] P-4952 T-4956 I BROKER 0: (10357) Pending client connection timeout (-PendConnTimeout): 0 [2015/06/13@08:18:51.119+0100] P-4952 T-4956 I BROKER 0: (4262) Service Name (-S): 8000 [2015/06/13@08:18:51.119+0100] P-4952 T-4956 I BROKER 0: (17804) Broker server group support (-ServerType): ABL [2015/06/13@08:18:51.120+0100] P-4952 T-4956 I BROKER 0: (10028) SQL Server Max Open Cursors (-SQLCursors): 0 [2015/06/13@08:18:51.120+0100] P-4952 T-4956 I BROKER 0: (10026) SQL Server Stack Size (-SQLStack): 0 [2015/06/13@08:18:51.121+0100] P-4952 T-4956 I BROKER 0: (10027) SQL Server Statement Cache Size (-SQLStmtCache): 0 [2015/06/13@08:18:51.121+0100] P-4952 T-4956 I BROKER 0: (10029) Size [1K byte units] of SQL Server temp table buffer (-SQLTempStoreBuff): 0 [2015/06/13@08:18:51.121+0100] P-4952 T-4956 I BROKER 0: (10030) Size [1K byte units] of SQL Server temp table disk storage (-SQLTempStoreDisk): 0 [2015/06/13@08:18:51.123+0100] P-4952 T-4956 I BROKER 0: (10031) Size [1K byte units] of SQL Server temp table data page (-SQLTempStorePageSize): 0 [2015/06/13@08:18:51.123+0100] P-4952 T-4956 I BROKER 0: (14019) Record block consistency check (-TableCheck): Not Enabled [2015/06/13@08:18:51.124+0100] P-4952 T-4956 I BROKER 0: (17717) TXE Lock retry limit (-TXERetryLimit): 0 [2015/06/13@08:18:51.124+0100] P-4952 T-4956 I BROKER 0: (13896) TXE Commit lock skip limit (-TXESkipLimit): 10000 [2015/06/13@08:18:51.126+0100] P-4952 T-4956 I BROKER 0: (10836) 
Database connections are not allowed at this time. [2015/06/13@08:18:51.126+0100] P-4952 T-4956 I BROKER 0: (10471) Database connections have been enabled.
Any help and pointers much appreciated.
On the config screen I see that:
1) You are not using the new "prefetch" parameters
2) You have replication set on "go slow" (-pica 8MB)
3) You have an awful lot of very small (<2GB) extents. That must be a PITA to deal with.
1) How new are the prefetch params, and do I need to have done an UpdateVST to use them? Can't update VST at the moment as we're still on 11.2.1 in development so developers need to be able to restore from backup using 11.2.1. Hopefully not for much longer though. Is there a guide to their use somewhere please?
2) Replication on go slow isn't too much of an issue as the target server isn't very well configured - it's just a stop gap until we can configure up the new DR server over the next couple of weeks. That being said, is there a guide to choosing the right size for this?
3) Yes. Right PITA! It's a throwback to small DB files that has never been addressed. I'm planning on doing a D&L in the near future to right a number of woes such as LOBs in data areas, and bad RPB choices. So I can simplify the extent structure at that point.
On the Dashboard I see:
1) You have a long running transaction problem.
2) You have 72 after-image extents. I hate to ask but... why?
3) There is a SQL connection.
4) You've got a fairly steady record read load going on. The next step would be to look at the table and index stats and see what is driving that and determine if it is reasonable for your application.
The prefetch stuff came in 10.2B (service pack 6 if I recall). You do not need to update VSTs to use it. It is related to client/server network traffic. pugchallenge.org/.../2687_Whats_New_OEDB.pptx
Extents can be resized (and the number of them and/or locations thereof changed) via probkup & prorest. No need to D&L. Just whip up a new .st file. You cannot change storage areas, block size, rows per block or blocks per cluster this way but you can shuffle extents around easily.
1) Well observed. We're slowly working these through. We used to often have them up to 45 minutes. BI was a nightmare to maintain.
2) The (terribly mistaken) thinking there (not mine I might add) is that it gives us 72 hours to fix replication if it breaks before the AI extents are all full. Again, something I can fix when doing a D&L. How many would you recommend?
3) Need to remind myself what that is. I think it's PDSOE.
4) Right :)
Thanks for the link.
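For what it's worth, the "72 extents = 72 hours" reasoning above can be sketched as simple arithmetic. This is only a rough illustration: it assumes one AI extent is consumed per archival interval (the -aiarcinterval of 3600 seconds from the startup log) and that no extent fills up before the timed switch, which may not hold under heavy write load. The function names are made up for the example.

```python
import math


def ai_coverage_hours(num_extents: int, archive_interval_s: int) -> float:
    """Hours of AI notes that can be held if one extent is used per interval.

    Assumption: extents switch only on the timer, never because they are full.
    """
    return num_extents * archive_interval_s / 3600


def extents_for_window(window_hours: float, archive_interval_s: int) -> int:
    """Number of extents needed to cover a given outage window (same assumption)."""
    return math.ceil(window_hours * 3600 / archive_interval_s)


if __name__ == "__main__":
    # Values from this thread: 72 extents, -aiarcinterval 3600.
    print(ai_coverage_hours(72, 3600))
    # A hypothetical 24-hour window at the same interval.
    print(extents_for_window(24, 3600))
```

The same numbers also show the trade-off: a longer archival interval covers the same window with fewer extents, at the cost of more data at risk per extent.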
-B2 is way too small. Size it appropriately based on your mostly static tables and assign them to -B2.
-spin seems very high (check out the latch timeouts and maybe consider backing that down a bit).
In addition to the prefetch parameters that Tom mentioned you need to increase the -Mm size for your client server connections.
The application might need a little love on the performance tuning side.
Why the mix of appserver and client server connections?
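To put the -B2 comment above into numbers, here is a rough sketch using the values from the startup log (-B 3000000, -B2 600, 8192-byte blocks). The table names and block counts in the example are purely hypothetical; the real figures would come from your own table and index analysis, and the 10% headroom is just an assumption.

```python
BLOCK_SIZE = 8192  # -blocksize from the startup log


def pool_bytes(buffers: int, block_size: int = BLOCK_SIZE) -> int:
    """Approximate memory used by a buffer pool (ignores per-buffer overhead)."""
    return buffers * block_size


def b2_for_objects(object_blocks: dict, headroom_pct: int = 10) -> int:
    """Suggest a -B2 value: total blocks of the assigned objects plus headroom.

    Integer arithmetic is used so the result is rounded up exactly.
    """
    total = sum(object_blocks.values())
    return (total * (100 + headroom_pct) + 99) // 100


if __name__ == "__main__":
    print(f"-B  pool: {pool_bytes(3_000_000) / 2**30:.1f} GiB")
    print(f"-B2 pool: {pool_bytes(600) / 2**20:.2f} MiB")
    # Hypothetical mostly-static tables and their sizes in blocks.
    static_tables = {"tax-code": 300, "customer-type": 1200, "site": 4500}
    print("suggested -B2:", b2_for_objects(static_tables))
```

The point is only the scale: the current -B2 is a few megabytes next to a primary pool of roughly 23 GiB, so there is plenty of room to size it for everything that is mostly static.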
-B2 was sized very conservatively for a single table as we were on 32 bit OS and Progress before and were running out of resources. Agreed - it needs some TLC.
I'll look at -spin
Is there a rule of thumb for -Mm?
We've got loads of work happening on the AppServers. The primary use of them is for queries from the REMC users. They select their filters and then fire off their query to the asynch appserver and can then work away on another screen while they await its return.
We've also got a web portal that uses the AppServers to return data to the users.
-Mm will vary based on your network and how your application is coded. 16384 is usually the "right" number in my experience but I have used the slightly less than 32k max a few times. Keep in mind that -Mm must be set on both the server and the clients or you will get a connection error.
Are all of the clients on the same local network or is there a WAN involved?
Are the appservers connected with -H -S or with shared memory? Performance will be better with shared memory connections.
I would look at the appserver code first. If those queries really take that long to process then they are probably your best bet for performance improvements.
I implemented Shared Memory connections for the AppServers a good while ago. Made a huge difference :)
Thanks for the other comments and tips. Loads to go on :)
Regarding -Mm... I like 8192 or higher.
If you are going to set it large you really do need to be looking at -prefetchFactor 100 (try to fill the message 100%) or -prefetchNumRecs X (where X is the target number of records per message). Personally, I prefer shooting for 100%.
If you do not set these you will get a maximum of 16 records per message (that is the default). So much of your potential with a large -Mm will not be used.
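A back-of-the-envelope sketch of why the default of 16 records wastes a large -Mm. The average record size of 100 bytes is an assumption picked for illustration, and the calculation ignores message and record header overhead, so treat the output as rough proportions rather than exact figures.

```python
def default_fill_fraction(mm_bytes: int, avg_record_bytes: int,
                          num_recs: int = 16) -> float:
    """Fraction of a -Mm sized message used by num_recs records (overhead ignored)."""
    return min(1.0, num_recs * avg_record_bytes / mm_bytes)


def records_to_fill(mm_bytes: int, avg_record_bytes: int) -> int:
    """How many whole records of the given size would fit in one message."""
    return mm_bytes // avg_record_bytes


if __name__ == "__main__":
    avg_record = 100  # hypothetical average record size in bytes
    for mm in (4096, 8192, 16384):
        fill = default_fill_fraction(mm, avg_record)
        fit = records_to_fill(mm, avg_record)
        print(f"-Mm {mm}: 16 records fill {fill:.0%}, about {fit} records would fit")
```

With these assumed sizes, 16 records fill about 39% of the current 4096-byte message but under 10% of a 16384-byte one, which is the unused potential described above.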
On a related note: I've also had good experiences with "jumbo frames" in conjunction with larger -Mm and -prefetch*. Jumbo frames allow TCP/IP to send bigger messages and reduce overhead on the network. It does mean that your admins have to get involved and it can be "trying" to explain to them why one bigger message is better than lots of smaller ones. Almost as much fun as explaining to the SAN guys why RAID5 is less than desirable.
10.2Bxx and 11.x were in development concurrently in 2012. I believe the prefetch enhancements appeared in 10.2B06 and in 11.1.
Your -omsize of 2660 is too low. You probably have in the neighbourhood of 200 system storage objects, plus 2589 application storage objects.
Set it to 3000 for now, and keep an eye on your _StorageObject record count as the schema changes (or as you upgrade to new OE versions). This will prevent OM latch activity.
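The arithmetic behind that -omsize suggestion, as a small sketch. The object counts are the estimates quoted above (about 200 system plus 2589 application storage objects); rounding up to the next multiple of 500 is just one assumed way to leave headroom, and the function name is made up for the example.

```python
def recommend_omsize(system_objects: int, app_objects: int,
                     round_to: int = 500) -> int:
    """Round the total storage object count up to the next multiple of round_to."""
    needed = system_objects + app_objects
    return ((needed + round_to - 1) // round_to) * round_to


if __name__ == "__main__":
    current_omsize = 2660          # from the startup log
    needed = 200 + 2589            # estimated _StorageObject record count
    print("shortfall:", needed - current_omsize)
    print("suggested -omsize:", recommend_omsize(200, 2589))
```

Re-running the same check against the actual _StorageObject record count after schema changes or upgrades keeps the setting ahead of the object count.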
> Your -omsize of 2660 is too low. You probably have in the neighbourhood of 200 system storage objects, plus 2589 application storage objects
Do the applications use these 200 system storage objects at runtime?
> Do the applications use these 200 system storage objects at runtime?
Some amount of them will be accessed. Given the small incremental shared memory cost of 200 more OM cache entries, is it worth quibbling about exactly how many? ;)
Also...
There are not many samples but your BogoMIPS are 50-60% of their peak. That suggests that you might be on a virtualized server that is perhaps competing for CPU cycles with other VMs on the same physical server. Did you allow some sort of over-committed VM to be created?
Your IO response is pretty consistent and mostly in the 2 or 3 ms range. That is not too bad if the disks are rotating rust. If they are SSD I would expect better.
Is omsize one of the ones I can increaseto?
Yep it's a VM. Unfortunately I'm struggling to ascertain what other servers are conflicting with mine. I wouldn't be surprised if it's over committed though.
Disks are SSD and allegedly the fastest you can buy pretty much. But reads come out of RAID5 as I understand it. Writes are RAID10 so that's ok, right? Because writes are what slow us down, right? /sarcasm
I'll keep an eye on the IO response and see if it improves as the Compellent has a chance to put stuff in the right tiers.
I doubt that reads go to one sort of RAID and writes to another.
It is more likely that you have a SAN that magically migrates data to different tiers (that may be different RAID levels) depending on how it thinks the data is being used.
Not everyone agrees with me but, personally, I think those things are devil-spawn.
I have had excellent results with high end storage (EMC, Hitachi, etc.) and RAID 5. Very little if any difference performance wise even when doing outlandish sustained tests. Part of that is of course the tremendous amount of cache at every level of the SAN and the sheer number of spindles involved.
Auto migrating is the devil for sure. Especially if one or more of your systems has inconsistent workloads (day vs night) or periods of peak activity like month end processing. I still have to fight from time to time on that issue. Sometimes it "works" but I think that is either pure luck or the SAN admins have set a large enough sample time before the migration starts happening.
When you get into lower tier storage it is usually a lot easier to overload the cache and the disks and the horrors of RAID 5 (or "special" RAID) start showing up a lot quicker.
As far as the 2-3ms access time for SSD... that doesn't really amaze me for a VM environment. Especially a Windows based one. To get the most out of SSD disks you need to make sure all of the adapter and disk settings for number of concurrent commands and queue depth are increased drastically. You can get into a situation where the actual disks are sitting around waiting for work but the queue is building up on the host itself.
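To illustrate the queue depth point, a rough sketch based on Little's law (requests in flight = throughput x latency). It is an idealised upper bound: it assumes latency stays constant as the queue depth grows, which real storage will not do, and the queue depths shown are arbitrary example values rather than recommended settings.

```python
def max_iops(queue_depth: int, latency_ms: float) -> float:
    """Upper bound on IO operations per second for a given number of
    outstanding requests and a fixed per-request latency (Little's law)."""
    return queue_depth * 1000 / latency_ms


if __name__ == "__main__":
    latency_ms = 2.5  # middle of the 2-3 ms range seen on the dashboard
    for depth in (1, 8, 32):
        print(f"queue depth {depth}: at most {max_iops(depth, latency_ms):.0f} IOPS")
```

At 2.5 ms per request, a single outstanding IO tops out around 400 IOPS no matter how fast the SSDs are, which is why the host-side concurrency settings matter.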
Some of the auto tiering vendors set them up to always write to SSD then migrate the data down (or up) at night. I'm at a customer now with an EMC SAN and the sysadmin said it had one TB of cache. I said I didn't know Tom made caches...but I digress. It has a TB of flash drives at the top tier to receive the writes.
If you have some time this is a good read about the EMC VMAX series (amazing systems)
www.emc.com/.../h6544-vmax-w-enginuity-pdg.pdf
You can actually have 1 TB of RAM in a maxed out config that handles a lot of the caching before you get to the flash drive tier... before you get to the Fibre tier... before you get to the SATA tier.
Quite a bit of difference between these and a bunch of disks attached to a RAID controller :)
Usually there is some sort of SATA tier for slow stuff, fiber for busy stuff and SSD for high demand stuff. When your db is well tuned and there isn't much load it all works great. (Which could be said about a lot of things.)
The problem is that these things see a well tuned db and they think that, because a lot of the time there isn't much IO, it should all get migrated to the slower tiers.
Users, of course, expect the system to jump whenever they want it to and to ask "how high" on the way up. They are not too keen on systems waiting a few minutes (or more than "a few") to notice that the db has become busy with an unusual query and that it might have been a good idea to use the fast disks.
If your users are happy to wait and won't complain to management about the application being slow then it isn't much of a problem. If they do complain and the storage admins are happy to dedicate SSD to your database then it also won't be a problem. (If either situation is true for you then you should immediately buy a lottery ticket!)
I guess I am going to have to buy a Powerball ticket tonight :)
I have managed to get the SAN admins to turn off storage migration on a couple of occasions (for Progress and Oracle databases). Mostly because I first convinced the higher ups at the company that variable performance on their big money making application was a bad idea. Also the SAN guys wouldn't guarantee in writing that it would never happen.
I admit it isn't always easy... especially when infrastructure is outsourced to a different company. But it can happen. If your app is small in the corporate scheme of things you are usually out of luck though.
I'll accept a 15% gratuity on that :)
@MadDBA: VMax refers to the price right?
Tom: When I win I will be sure to give you a call ;-)
Paul: lol... the price tag is not exactly small on any of the EMC storage, especially not the VMAX. Properly set up, the performance and reliability are insane but they do come at a premium. The only storage I have worked on that was faster than the VMAX was Oracle Exadata... but that cheats a bit by having the storage integrated so tightly with the DB.
"Properly set up"... another reason to buy that lottery ticket ;)
lol... I have learned more about SAN setup than I should ever know without actually being a SAN administrator.
If you are lucky enough to have competent admins life is a breeze... if not then you spend a lot of time in meetings explaining things on whiteboards to the CIO and the SAN team :-(
But I will say most of the EMC techs that show up for those kinds of meetings are usually pretty straightforward about the trade offs for performance and ease of admin/setup. A couple of times they even said that local disks would be faster for random IO than their SAN but went on to list all of the things EMC did that local disks didn't.
> A couple of times they even said that local disks would be faster for random IO than their SAN...
I got that out of them in writing once. One of the happier days in my professional life :)
I wouldn't have an issue with the DB taking up that much of the SSD space as long as the DB houses the mission critical app for the company. It is all a matter of priority really. What does the admin want to put there that he can't? His personal DVD collection doesn't count :-)
If it becomes an issue you can certainly figure out which DB extents have the most IO and put them on SSD and let the rest reside on Fibre attached (which is still really fast).
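As a rough illustration of that idea, here is a minimal Python sketch that ranks extents by IO per GB and fills a fixed SSD budget with the busiest ones. The extent names, sizes and IO counts are made up for the example, and `Extent` and `pick_ssd_extents` are hypothetical helpers, not part of ProTop or OpenEdge; real numbers would have to come from whatever per-extent IO statistics you collect.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    name: str       # hypothetical extent file name
    size_gb: float  # size on disk, must be > 0
    io_ops: int     # reads + writes seen during the sampling window


def pick_ssd_extents(extents, ssd_budget_gb):
    """Greedily keep the extents with the most IO per GB that still fit the budget."""
    ranked = sorted(extents, key=lambda e: e.io_ops / e.size_gb, reverse=True)
    chosen, used_gb = [], 0.0
    for extent in ranked:
        if used_gb + extent.size_gb <= ssd_budget_gb:
            chosen.append(extent)
            used_gb += extent.size_gb
    return chosen


# Made-up sample figures, purely for illustration.
sample = [
    Extent("orders_7.d1", 20.0, 900_000),
    Extent("history_9.d1", 200.0, 50_000),
    Extent("index_8.d1", 10.0, 700_000),
    Extent("audit_10.d1", 50.0, 100_000),
]

ssd_extents = pick_ssd_extents(sample, ssd_budget_gb=40.0)
print([e.name for e in ssd_extents])
```

A greedy pick like this is only a starting point; it ignores things like peak versus average load, so treat the result as a shortlist to discuss with the SAN admin rather than a final layout.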
Yeah completely with you there. The application is pretty much the business's main asset other than people.
My long term plan is to structure the DB in such a way that it is easy to put stuff in relevant plans as you say.
In theory, any tables that are high read, very low write should be in -B2, so if they're in their own storage area then surely that area can go down to a lower tier as it's resident in memory anyway, and I'm sure there are other such strategies I can come up with in time.
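To make the "high read, very low write" test concrete, here is a small Python sketch that filters a list of per-table read and write counts. The table names, counts and thresholds are invented for the example, and `b2_candidates` is a hypothetical helper; the actual figures would come from your own table activity statistics, and the cut-offs are a judgment call.

```python
def b2_candidates(table_stats, min_reads, max_write_ratio):
    """Return tables that are read heavily and written rarely.

    table_stats: iterable of (table_name, reads, writes) tuples.
    min_reads: minimum reads in the sample for a table to count as "hot".
    max_write_ratio: highest writes/reads ratio still considered "very low write".
    """
    candidates = []
    for name, reads, writes in table_stats:
        if reads > 0 and reads >= min_reads and writes / reads <= max_write_ratio:
            candidates.append(name)
    return candidates


# Made-up sample figures, purely for illustration.
stats = [
    ("country", 5_000_000, 12),            # hot lookup table, almost never written
    ("order-line", 8_000_000, 2_500_000),  # hot but write-heavy
    ("archive", 300, 0),                   # never written, but barely read either
]

print(b2_candidates(stats, min_reads=1_000_000, max_write_ratio=0.001))
```

The candidate list still has to fit inside whatever -B2 is set to, so table size needs checking separately.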
For now though I'm picking the low hanging fruit, and I will be for the foreseeable future I suspect!
So this is the SAN: www.dell.com/.../pd
As I understand it we have 2TB of flash, 22 of physical disk, but not sure what configuration.
The virtual hosts are: www.dell.com/.../pd
And yes, they haven't been set in any way to reserve processor operations for us at this time. This will be changing!
Found the SAN config. We have more than I was told.
38.528TB Usable
69,056 IOPS from both Tier1 and Tier2 (Peak)
3 x SC220 SAS 24 Bay 2.5 Inch Disk Enclosures
6 x 400GB SLC SSD
6 x 1.6TB MLC SSD
36 x 1.2TB SAS 10k Drives
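For reference, the raw capacity implied by that drive list can be worked out with a few lines of Python. This only uses the drive counts and sizes listed above, assumes decimal units (1TB = 1000GB), and says nothing about the RAID layout, which is still unknown.

```python
# Drive counts and per-drive sizes in GB, taken from the config listed above.
drives = {
    "SLC SSD": (6, 400),
    "MLC SSD": (6, 1600),
    "SAS 10k": (36, 1200),
}

raw_gb = {tier: count * size for tier, (count, size) in drives.items()}
flash_raw_gb = raw_gb["SLC SSD"] + raw_gb["MLC SSD"]
total_raw_gb = sum(raw_gb.values())

usable_gb = 38_528                       # "38.528TB Usable"
usable_fraction = usable_gb / total_raw_gb

total_bays = 3 * 24                      # 3 x SC220 24-bay enclosures
drive_count = sum(count for count, _ in drives.values())

print(f"raw flash: {flash_raw_gb / 1000:.1f} TB")
print(f"raw total: {total_raw_gb / 1000:.1f} TB")
print(f"usable fraction of raw: {usable_fraction:.1%}")
print(f"bays used: {drive_count} of {total_bays}")
```

So there is about 12TB of raw flash rather than the 2TB originally mentioned, and roughly 70% of the raw capacity ends up usable.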
Have not worked with that exact model but it seems fine based on a quick look at the specs and a few google searches for reviews. Based on the activity you posted and the fact that your DB is all on SSD I wouldn't worry about tuning the SAN right now.
The VM settings for CPU, memory and IO are probably more likely to provide measurable improvements.
Plus the other network parameters and -B2 changes you are already working on.
Okie doke. Thanks so much to everyone for comments. Writing my plan of attack.