Hello,
I am looking for some advice on setting the -pica parameter for a group of databases with replication enabled.
From what I've read, all databases must have the same -pica value. Articles suggest the calculation is
-pica = (TAIW / BlockCount) * 1.25
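For what it's worth, the quoted formula is just arithmetic. Here it is as a literal sketch with made-up sample numbers; the precise meaning of TAIW and BlockCount is exactly what this thread is trying to pin down, so treat the inputs as placeholders:

```python
def pica_estimate(taiw, block_count, headroom=1.25):
    """Literal implementation of the quoted formula:
    -pica = (TAIW / BlockCount) * 1.25
    taiw and block_count are whatever the article defines them to be."""
    return (taiw / block_count) * headroom

# Hypothetical sample numbers, purely to show the arithmetic:
print(round(pica_estimate(6_000_000, 1_000)))   # -> 7500
```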
So should I take the Total AI writes from the most used database?
Is this the total AI writes within 1 hour?
Also, it says to use the AI writes from the busiest period, but I don't know when that is, so I'm not sure how best to estimate it. I averaged this figure when calculating the number of AI extents required; could that average be used here?
Any advice would be greatly appreciated.
Where did you read that -pica should be the same on all databases? Because that's news to me, and something I would be falling foul of.
Which version of Progress are we talking by the way?
This advice should apply to 99% of OE environments:
If you are on any version before 10.2B08, use the maximum allowable -pica value for your version. In 10.2B08 the maximum was raised from 8,192 to 1,000,000 (I think they were sick of raising it every couple of years). On 10.2B08 or later, use 32,768.
Larger -pica values are required if a) the total volume of AI writes is VERY high and/or b) the bandwidth to the target is relatively slow.
You cannot base your pica calculation only on AI writes unless you have a very fast (read: local) network. The minute you're going over a WAN, bandwidth varies wildly. Last week, one of our customers' replication was falling behind every day and almost catching up every night. We were tracking it in ProTop, watching AI files fill and get locked, and fill and get locked. Replication status was stuck in "synchronizing" mode and never quite got back to "Normal Processing". The network admins swore nothing was wrong and that only 1% of the WAN bandwidth was being used.
It must be a Progress bug!! Not.
Copying a 1 MB file from source server to target server took minutes. Copying it back from target to source server took one second. Yeah, must be a Progress issue.
The lesson here is to monitor pica usage and to stop the replication server process before the pica buffers fill, otherwise forward processing for ALL users will be stalled and your phone will start ringing off the hook. Monitor replication server AND agent status (both sides) as there are bugs in earlier versions whereby the source will report "Normal Processing" and the target will report "Listening". ProTop does all of this and more.
Regards,
Paul
To complete Paul's story...
... after pestering the network vendor they had an "aha moment" and suddenly found something to change. Normal bandwidth was restored and the replication caught up in a few hours and returned to its normal state of being always up to date.
When you say "watching AI files fill and get locked and fill and get locked": is it possible to fill up all available AI extents on the source DB, and if so, what happens in that case? I would assume that the source DB would then generate an error and pause, but does it give an AI write error or a replication sync error? Thanks,
If AI files are fixed size, once all of them fill your database will either crash or stall, depending on the use or not of the -aistall DB startup parameter.
If AI files are variable and are rotated every x minutes, (n-1) AI files will fill (status: locked) and the nth AI file (the busy one) will continue to grow until you hit some external limit: a 2 GB file size if large files are not enabled, or a full file system. When this happens, again the DB will crash or stall depending on the use of -aistall.
As this is happening, the replication server and agent processes would be generating error messages in the database log files.
To summarize, you should be monitoring pica usage, AI file status (free, full, locked), replication server and agent status and the database log file. You can write your own scripts to do this or you can use ProTop to generate alerts when configured threshold(s) are reached. For example, you could configure ProTop to send an email when 2 AI files are locked and to send an SMS when 4 AI files are locked. Please note that alert generation is a paid add-on to ProTop Free.
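The alerting rules described above can be sketched like this. The thresholds mirror the example in the text (email at 2 locked AI files, SMS at 4); the function name and structure are made up for illustration and are not ProTop's actual implementation:

```python
# Sketch of the escalation rule: email at 2 locked AI extents, SMS at 4.
# Thresholds are example values -- tune to your own AI extent count.
def ai_alert_level(locked_ai_files, email_at=2, sms_at=4):
    if locked_ai_files >= sms_at:
        return "sms"
    if locked_ai_files >= email_at:
        return "email"
    return "ok"

print(ai_alert_level(1))   # ok
print(ai_alert_level(2))   # email
print(ai_alert_level(5))   # sms
```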
Regards,
Thanks. One of the databases will have a high volume of AI writes, so I think I should start with that database to get the most accurate value for -pica.
I'm not sure how to check the network speed. Just copy a file and see what speed it reports?
I should have said - I am using version 11.6.
Your network admin can tell you the theoretical bandwidth and then you can copy some largish files (hundreds of MB) to estimate real bandwidth.
Another thing that I like to do is rotate AI files every 15 minutes or less and chart the file sizes. If the peaks are high relative to my network bandwidth, I might ask the network admin to spit out a chart of bandwidth utilization between the main datacentre and the DR site. You may find that your peak AI writes correspond to peak network usage, requiring some QoS configuration.
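One back-of-envelope check along these lines: compare the busiest 15-minute AI burst against what the link can realistically move in 15 minutes. All the numbers and the 70% efficiency factor below are assumptions for illustration, not measured values:

```python
# Can the link drain the busiest 15-minute AI burst before the next
# rotation? Efficiency factor accounts roughly for protocol overhead.
def burst_drain_seconds(ai_mb_per_15min, link_mbit_per_sec, efficiency=0.7):
    usable_mb_per_sec = link_mbit_per_sec / 8 * efficiency  # Mbit/s -> MB/s
    return ai_mb_per_15min / usable_mb_per_sec

# Hypothetical: 1200 MB of AI in the peak 15 minutes over a 20 Mbit/s WAN.
t = burst_drain_seconds(ai_mb_per_15min=1200, link_mbit_per_sec=20)
print(f"{t:.0f} s to replicate the peak burst")  # 686 s -- inside the 900 s window
```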
Paul
Is there a way to quantitatively monitor -pica usage with promon, a VST, or anything else?
Other than knowing it is Good ("Normal Processing") or Bad ("Synchronizing")?
Everything else you describe above sounds pretty straightforward:
Count how many AI files are Locked, and check that the server and agent statuses are 1101 and 3101. We can even rotate AI files faster (every 15 min) or slower (every 60 min or even 2 hr) based on how many AI files are locked out of the total number available.
If server status not 1101 dsrutil -C restart server
If agent status not 3101 dsrutil -C restart agent
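The check-and-restart logic above can be sketched as follows. The status codes (1101 for the replication server, 3101 for the agent) and the dsrutil commands are taken from this thread; the function name and the surrounding structure are hypothetical, and real status parsing would need adapting to your dsrutil output:

```python
# Sketch: decide whether a dsrutil restart is needed, given a status code.
import subprocess  # only needed if you uncomment the run() call below

EXPECTED = {"server": 1101, "agent": 3101}

def restart_command(db, component, status):
    """Return the dsrutil restart command list, or None if status is healthy."""
    if status == EXPECTED[component]:
        return None
    return ["dsrutil", db, "-C", "restart", component]

# Example: agent reporting something other than 3101 triggers a restart.
cmd = restart_command("sports", "agent", 3050)
if cmd is not None:
    print(" ".join(cmd))              # dsrutil sports -C restart agent
    # subprocess.run(cmd, check=True) # uncomment on a real system
```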
promon -> R&D -> 1 -> 16, I think.
Dmitri,
I think you're asking for promon - R&D - 1 - 16. That's the only way that I know to monitor pica usage and that's how ProTop does it. There is nothing in the VSTs.
FYI I use dsrutil db -C status -detail and check for 6021 or 3049.
FYI #2 (not for Dmitri but for other readers) dsrutil db -C restart agent is only available in 11.6+. Prior to that, you had to bounce the target DB to restart the agent.
You can get R&D 1|16 info from the _DbServiceManager VST in 11.6+. Example:
find _dbservicemanager no-lock.
display _dbservicemanager with 1 column.

ID                                    : 1
Size of Pica                          : 8193.08
Available Pica Message Entries        : 74908
Free Pica Message Entries             : 74908
Queued Pica Message Entries           : 0
Used Message High Water Mark          : 15288
Number of Times Pica Filled           : 0
User Number of Last Pica Latch Holder : -1
Number of Times Locks Taken Out       : 799392373
Number of Lock Collisions             : 2021
PID of Lock Holder                    : 0
User Type of Lock Holder              : Not available
User Number of Lock Holder            : 0
(I fudged the column labels for the sake of clarity.)
More info on the new and updated VSTs, and other OE Repl enhancements:
http://pugchallenge.org/downloads2017/372_Whats%20New%20in%20Replication.pptx
Thank you Libor, Paul and Rob
Communication Area Size : 16384.00 KB
Total Message Entries : 149796
Free Message Entries : 149796
Used Message Entries : 0
Used HighWater Mark : 216
Area Filled Count : 0
Service Latch Holder : -1
Access Count : 589755
Access Collisions : 114
Now I know I set -pica too high at 16384.
Used HighWater Mark shows that.
So the goal is that the HWM is somewhat close to Total Message Entries, like half or more, right?
Area Filled Count should be small (single digit) or zero?
Access Collisions ???
I see you also have Access Collisions; can somebody tell me more about this field?
I think there is no problem as long as it does not exceed a certain proportion of the Access Count, but I'm curious.
Communication Area Size : 65.00 KB
Total Message Entries : 595
Free Message Entries : 595
Used Message Entries : 0
Used HighWater Mark : 495
Area Filled Count : 0
Service Latch Holder : -1
Access Count : 213971543
Access Collisions : 65747
My bet: access count and access collisions are analogous to the locks and naps of the latches. The service latch is an administrative (a.k.a. "private") latch. The administrative latches are quite different from the spin latches, but all latches exist to resolve collisions.
Example of pica filled:
Communication Area Size: 20500.00 KB
Total Message Entries: 187428
Time      Free Msgs  Used Msgs  Used HWM  Filled Cnt  Access Cnt  Collisions  Locked by Replication Server
04:00:28     187428          0     39351           0   146653272       24813       0
04:30:29     176446      10982     39351           0   146787113       24823   10982
05:00:30     129880      57548     57554           0   146903954       24837   57548
05:30:31      82080     105348    105349           0   147012984       24864  105348
06:00:32      40849     146579    146579           0   147115650       24895  146579
06:30:33       1727     185701    185736           0   147259217       24934  185701
07:00:34         90     187338    187428        2757   147399645       24985  187338
07:30:35          9     187419    187428        6039   147546696       25036  187419
08:00:36       6094     181334    187428        6826   147699472       25088  181334
08:30:37        148     187280    187428        8676   147835357       25132  187280
09:00:38        282     187146    187428       11422   147990051       25175  187146
09:30:39         11     187417    187428       13920   148140550       25230  187417
10:00:40          8     187420    187428       16565   148301390       25271  187420
10:30:40          0     187428    187428       19071   148474473       25320  187428
11:00:41          6     187422    187428       20430   148667535       25351  187422
11:30:42          0     187428    187428       21451   148860809       25367  187428
12:00:43          0     187428    187428       22290   149052507       25387  187428
12:30:44          0     187428    187428       23280   149244708       25404  187428
13:00:45          6     187422    187428       24393   149439472       25427  187422
13:30:46          0     187428    187428       25901   149634877       25463  187428
14:00:47          0     187428    187428       27601   149829466       25516  187428
14:30:48          0     187428    187428       30126   150026275       25562  187428
Access Collisions are almost the same all the time.
Used Message Entries = Locked by OpenEdge Replication Server
Activity: AI Log
Time      Total Wrt  AIW Wrt  Notes Wrt  Bytes Wrt  BusyWaits  Partial writes
04:00:28    6410836  6134415    860955K  98412861K       3506          248953
04:30:29      30234    29677    4079689    476171K         17             429
05:00:30      76472    75515   10441933   1211905K         40             616
05:30:31      73177    72211   10123924   1159241K         45             614
06:00:32      66706    65792    9253486   1055882K         45             614
06:30:33      86124    85119   11848009   1365994K         54             622
07:00:34      50341    49417    6874101    794400K         35             615
07:30:35      47549    46614    6566173    749848K         37             612
08:00:36      64745    63774    8997106   1024461K         46             620
08:30:37      44369    43458    6426670    698868K         43             621
09:00:38      29663    28685    4459687    464162K         41             609
09:30:39      41087    40112    6251794    646457K         56             624
10:00:40      28148    27162    4220942    439805K         91             617
10:30:40      26172    25201    4019687    408615K         99             595
11:00:41      13070    12416    2005665    201711K        173             443
11:30:42      10044     9531    1543874    154703K        112             359
12:00:43       8280     7677    1261205    126705K         68             347
12:30:44       9675     9166    1479821    148782K        149             362
13:00:45      11091    10619    1723126    171474K        185             359
13:30:46      15510    14665    2414339    240059K        287             488
14:00:47      17176    16211    2681374    266200K        226             514
14:30:48      24562    23546    3876160    383844K        154             538
And, of course, the filled pica queue causes slowness on the source DB (Total AI Writes drop roughly tenfold).
> Access collisions is the number of times that a process tried to add to the queue or remove from the queue (repl server) and found that the queue was locked.
What is a typical duration of a queue lock and what are the factors that can affect it?
Thanks, Jeff.
The pica queue lock is the same as service latch lock, isn't it?
> So the lock is held for a very, very, short time in duration.
Is it less than a microsecond, as with the "spin" latches? Sorry, I don't know the correct term for the classical latches (like MTX, USR etc.) as opposed to the administrative latches (like the pica service latch).
Access Count: 150,000,000 per 30 min in my example. If access count = lock count, then we can roughly estimate the total duration of queue locks as less than 150 sec during a 30 min interval, i.e. the queue is locked at most 8% of the time. Is that correct?
The Access Collisions to Access Count ratio is only 0.017% in my example and 0.019% in Dmitri's example. Which processes are competing for the queue lock, and how actively? Why are the collisions so rare?
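The two percentages above can be reproduced with quick arithmetic; the 1 microsecond lock duration is an assumed upper bound, not a measured value:

```python
# Rough upper-bound estimate of how long the pica queue spends locked.
accesses   = 150_000_000   # Access Count delta over a 30 min interval
collisions = 25_000        # Access Collisions delta, same interval
lock_us    = 1             # assumed upper bound on lock hold time, microseconds

locked_seconds = accesses * lock_us / 1_000_000    # 150.0 s
locked_pct     = locked_seconds / (30 * 60) * 100  # ~8.3 %
collision_pct  = collisions / accesses * 100       # ~0.017 %
print(f"locked {locked_pct:.1f}% of the interval, collisions {collision_pct:.3f}%")
```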
UPD: The administrative latches seem to be used only in situations with few collisions (like the BIADD latch used by the prostrct add online utility). I guess they don't even use the spin-lock mechanism, so the low access collisions in the Service Manager seem to be expected.
I missed that Status: Database Service Manager reports the counters since DB startup, so I have slightly "re-worded" the same statistics I posted above.
Status: Database Service Manager
Time      Free Msgs  Used Msgs  Locked by RPLS  Filled       Access  Collisions  AI Blocks  Percent  Access/Block  Notes/Block
04:00:28     187428          0               0       0  146,653,272       24813  6,161,883      96%         23.80          140
04:30:29     176446      10982           10982       0      133,841          10     29,805      99%          4.49          137
05:00:30     129880      57548           57548       0      116,841          14     75,856      99%          1.54          138
05:30:31      82080     105348          105348       0      109,030          27     72,563      99%          1.50          140
06:00:32      40849     146579          146579       0      102,666          31     66,092      99%          1.55          140
06:30:33       1727     185701          185701       0      143,567          39     85,502      99%          1.68          139
07:00:34         90     187338          187338    2757      140,428          51     49,726      99%          2.82          138
07:30:35          9     187419          187419    3282      147,051          51     46,937      99%          3.13          140
08:00:36       6094     181334          181334     787      152,776          52     64,125      99%          2.38          140
08:30:37        148     187280          187280    1850      135,885          44     43,748      99%          3.11          147
09:00:38        282     187146          187146    2746      154,694          43     29,054      98%          5.32          153
09:30:39         11     187417          187417    2498      150,499          55     40,463      98%          3.72          155
10:00:40          8     187420          187420    2645      160,840          41     27,531      98%          5.84          153
10:30:40          0     187428          187428    2506      173,083          49     25,577      98%          6.77          157
11:00:41          6     187422          187422    1359      193,062          31     12,627      97%         15.29          159
11:30:42          0     187428          187428    1021      193,274          16      9,685      96%         19.96          159
12:00:43          0     187428          187428     839      191,698          20      7,933      96%         24.16          159
12:30:44          0     187428          187428     990      192,201          17      9,313      96%         20.64          159
13:00:45          6     187422          187422    1113      194,764          23     10,732      97%         18.15          161
13:30:46          0     187428          187428    1508      195,405          36     15,022      97%         13.01          161
14:00:47          0     187428          187428    1700      194,589          53     16,662      97%         11.68          161
14:30:48          0     187428          187428    2525      196,809          46     24,024      98%          8.19          161
15:00:49          8     187420          187420    2627      191,212          59     26,520      98%          7.21          160
15:30:51       2244     185184          185184    2285      175,022          70     28,485      98%          6.14          159
16:00:52      24712     162716          162716    1165      172,104          87     42,393      99%          4.06          161
16:30:53     103103      84325           84325       0      143,864          55     26,906      98%          5.35          152
17:00:54     123870      63558           63558       0       68,242          14     17,954      97%          3.80          145
17:30:55     141597      45831           45831       0       67,828           9     19,297      97%          3.51          146
18:00:56     187428          0               0       0      178,082          40     16,868      97%         10.56          149
18:30:57     187428          0               0       0      257,580          44      7,775      94%         33.13          143
19:00:58     187428          0               0       0      345,113          96     25,769      98%         13.39          149
19:30:59     187428          0               0       0      355,315          77     22,158      98%         16.04          137
20:00:59     187428          0               0       0      281,343          69     18,364      98%         15.32          137
20:31:00     187428          0               0       0       81,337          10        902      89%         90.17          142
21:01:01     187428          0               0       0       60,178           5        190      71%        316.73          138
21:31:02     187428          0               0       0       59,145           2        247      79%        239.45          140
22:01:03     187428          0               0       0       55,854           4        130      71%        429.65          132
22:31:04     187428          0               0       0       53,761           0         28      25%      1,920.04          125
23:01:05     187428          0               0       0       57,486           1        161      65%        357.06          137
23:31:06     187428          0               0       0       48,780           1         15      42%      3,252.00          132
00:01:07     187428          0               0       0       51,271           3        658      96%         77.92          147
00:31:08     187428          0               0       0       50,421           3      1,696      98%         29.73          143
01:01:09     187428          0               0       0       78,023           3        442      71%        176.52          140
01:31:10     187428          0               0       0      172,987           4      2,707      83%         63.90          143
02:01:11     187428          0               0       0       80,442           1        281      51%        286.27          142
02:31:12     187428          0               0       0       85,199           1        593      68%        143.67          134
03:01:13     187428          0               0       0      185,981          23      9,860      97%         18.86          132
where:
First interval: 410 hrs 14 min (db uptime).
The remaining intervals: 30 min each.
"Filled", "Access", "Collisions" are the increments per interval of the corresponding counters in Status: Database Service Manager ("Area Filled Count", "Access Count", "Access Collisions").
"AI Blocks" - the number of AI blocks generated per interval = "Total AI Writes" minus "Partial Writes" (from Activity: AI Log)
"Percent" = "AI Blocks" / "Total AI Writes" * 100% = an indirect indicator of how idle the DB is.
"Notes/Block" = "Records written" (from Activity: AI Log) / "AI Blocks". AI blocksize = 16K. Average size of recovery notes is 112 bytes in the given example.
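As a sanity check on the 112-byte average note size, a rough back-of-envelope (ignoring AI block headers, so it slightly overestimates):

```python
# A 16 KB AI block holding ~140 notes implies roughly 117 bytes per note,
# consistent with the stated 112-byte average once headers are accounted for.
ai_block_size   = 16 * 1024
notes_per_block = 140          # typical value from the table above
avg_note_bytes  = ai_block_size / notes_per_block
print(round(avg_note_bytes))   # -> 117
```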
Conclusions/Remarks:
An unknown process (or processes) on the source DB is creating the collisions while trying to access the pica queue, yet all 47 promon checks show that the message entries are locked only by the Replication Server. The collisions are extremely low even during the busiest time: 96 collisions per 30 minutes (one collision every 20 sec). In other words, almost 100% of the pica queue accesses are generated by the Replication Server. I guess ABL sessions do not access the pica queue; in my example there were about 2,500 sessions, and it's unlikely the Replication Server would be able to dominate their access requests. So the access collisions can only be created by the special processes that work with AI files. UPD: the documentation says "AIW and REPL attempting simultaneous access".
The Replication Server generates at least one access per incoming AI block (e.g. 102,666 accesses per 66,092 AI blocks). When transaction activity is low, the number of pica queue accesses per AI block can be huge: the Replication Server accesses the pica queue at least 27 times per second (e.g. 48,780 accesses per 30 min with only 15 incoming AI blocks). For comparison: at busy times the DB was able to generate 85,502 AI blocks per 30 min, or 48 AI blocks per sec.
The max value of the "Area Filled Count" delta per 30 min interval was 3,282, or almost twice per second.
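The rates quoted in these remarks can be re-derived with quick arithmetic (all figures taken from the table above):

```python
# Re-deriving the quoted figures from the 30-minute interval data.
interval_sec = 30 * 60

# Busy interval: 102,666 accesses for 66,092 incoming AI blocks.
print(f"{102_666 / 66_092:.2f} accesses per AI block (busy)")     # 1.55

# Idle interval: 48,780 accesses for only 15 AI blocks.
print(f"{48_780 / interval_sec:.1f} accesses per second (idle)")  # 27.1

# Max "Area Filled Count" delta: 3,282 per 30 min.
print(f"{3_282 / interval_sec:.2f} fills per second (peak)")      # 1.82
```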
> Now I know I set -pica too high at 16384.
> Used HighWater Mark shows that.
16 MB of RAM per db is not a big price for insurance. ;-)
> So the goal is that HWM is somehow close to Total Message Entries, like half or more, right?
Unfortunately, the HWM is almost useless without the date/time at which the message entries rose to that value. It could have happened a month ago due to particular conditions at the time (e.g. an abnormal peak of transaction activity, a network issue between source and target DBs, etc.). The rest of the time (until DB restart) you will see only a "reminder" of a past bad day that does not matter for the current situation.
> Area Filled Count should be small (single digit) or zero?
Only zero! Any other value means that performance on the source DB was considerably degraded, at least for some period of time, due to a too-small pica queue.
> Access Collisions ???
It's a field that can be totally ignored.
The knowledgebase article mentioned earlier, "Clarification of the -pica replication enabled database startup parameter"
knowledgebase.progress.com/.../P104638
says: "There is a PICA queue on both the source and target databases"
Should the -pica parameters be the same on the source and target databases, or should we tune them individually? These databases run on different hardware and under different loads.
I guess that in the customer's case I posted above, the performance degradation on the source DB was caused by slowness of the replication agent on the target DB. Theory: the replication server spent a lot of time waiting for acknowledgments from the replication agent, which in turn filled the pica queue on the source DB => the source's AIW was waiting for pica access => updates on the source DB were very slow. Is that a reasonable explanation?
Will we also get a negative impact on the source database if the pica queue fills on the target database for any reason? Will it cause the pica queue on the source DB to fill as well?
Comparing the pica queues on source and target dbs:
Source:
09/27/17                Status: Database Service Manager                16:30:53
Communication Area Size : 20500.00 KB
Total Message Entries   : 187428
Free Message Entries    : 103103
Used Message Entries    : 84325
Used HighWater Mark     : 187428
Area Filled Count       : 36203
Service Latch Holder    : -1
Access Count            : 150708477
Access Collisions       : 25833
Registered Database Service Objects
Name                          Rdy  Status  Messages Locked by
OpenEdge Replication Server    Y   RUN     84325
Target:
09/27/17                Status: Database Service Manager                16:30:49
Communication Area Size : 20500.00 KB
Total Message Entries   : 187428
Free Message Entries    : 187428
Used Message Entries    : 0
Used HighWater Mark     : 0
Area Filled Count       : 0
Service Latch Holder    : -1
Access Count            : 24717946
Access Collisions       : 141
Registered Database Service Objects
Name                          Rdy  Status  Messages Locked by
OpenEdge Replication Agent     Y   RUN     0
Access count on source db is 6 times larger than the one on target.
Large Area Filled Count on source db and zero HWM on target.
BIW is running on the target. AI on the target is not enabled.
The customer's case: rather high transaction activity (~0.7 MB/sec of AI/BI writes) caused a persistent increase of the used message entries in the pica queue on the source DB. When the pica queue was totally filled, we got 100% busy MTX and AIB latches, meaning almost every snapshot in promon showed an owner of these latches. The MTX latch was owned by ABL users (a different one each time). The AIB latch was always owned by the AIW process. Just for the record: competition for the AIB latch between ABL users is not possible, because they always lock the AIB (and BIB) latch under cover of the MTX latch: they lock the MTX latch first and only then request the AIB/BIB lock. I guess there are only a few special processes that can compete for the AIB latch directly, but mainly it's the AIW.
The number of AIB latch locks was rather low for a busy latch: 700-2000 locks per sec. Under normal conditions the lock duration of the AIB latch is less than 100 nanoseconds; hence, to make the latch 100% busy we would need 10 million locks per sec, and it's impossible to get such a high number of AIB locks in a real database.
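The "impossible" claim is easy to check numerically. The 100 ns hold time is the figure quoted above; everything else is straightforward arithmetic:

```python
# At ~100 ns per AIB lock, saturating the latch needs ~10 million locks/sec,
# while the observed rate was only 700-2000 locks/sec.
lock_ns = 100
locks_per_sec_for_saturation = 1_000_000_000 // lock_ns   # ns in a second / hold time
observed = 2_000                                          # upper end of observed rate

print(locks_per_sec_for_saturation)                # 10000000
print(observed / locks_per_sec_for_saturation)     # 0.0002 -> at most 0.02% busy
```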
> The duration of a lock of the pica queue is less than a millisecond.
That's a huge time compared to the duration of the "spin" latches like AIB. It also means we can create only about a thousand locks per sec (which matches the numbers above).
New explanation of the customer's case:
I guess the AIW process requests the pica queue lock while it holds the AIB latch. When the AIW is forced to wait for the pica queue lock, the hold time of the AIB latch increases ten thousandfold: from 100 ns to 1 ms. An ABL process that already owns the MTX latch is then forced to wait for the AIB latch; that is why the MTX latch became 100% busy at the same time as the AIB latch. The numbers of MTX/AIB/BIB latch locks are close to each other: the difference between MTX and AIB latch locks is typically within 1%, and between MTX and BIB within 10% (or more during idle time).
If this theory of mine is correct, then the only solution is a pica queue large enough to avoid being completely filled during the periods of highest transaction activity.
Thanks again, Jeff.
> The duration of a pica queue lock is less than a microsecond. The mechanism is the same as with “spin” latches.
How long will the AIW wait before a second try if it fails to get the pica queue lock? 10 ms? Does it use the -nap value?
>The customer's case: a rather high transaction activity (~ 0.7 MB/sec of AI/BI writes) caused the persistent increase
>of the used message entries in the pica queue on source db.
What was the -pica value? And if it was increased, what is it now?
We have pretty similar transaction activity.
>BIW is running on the target. AI on the target is not enabled.
Why would one enable AI on the target DB, and is it even possible?
>Why would one enable AI on Target DB and is it even possible?
Yes, it is possible. Some people use AI on the target database in order to archive AI extents from the target rather than the source, but to me this decision is doubtful, since there can be a time lag between the source and the target.
However, if you want to use a Replication Set in OE 11.7, you must enable after-imaging on your target databases.