RHEL 5.6, OE 10.2B07 64-bit
I am having difficulty with tuning -B2 in a customer database. In the course of investigating and addressing performance issues, I took a maintenance window to make some startup param changes, do some periodic maintenance (dbtool fix SQL width, update SQL stats), and assign objects to the ABP. In total I assigned 17 tables and their 42 indexes to the ABP (at the object level), along with the schema area (via proutil enableb2). All tables are in areas with appropriate RPB.
I did my initial sizing exercise mostly with dbanalys data. Indexes are straightforward; I took block counts from the report. For tables, I took record data size from the report. Knowing that this isn't an accurate measure of the size of the blocks that hold these records, I doubled the reported size to give a margin for error. And for the schema area, I simply took the size of dbname.d1 and divided by DB block size. I set -B2 to 10,888.
I restarted the DB and its batch clients and did some other maintenance work, like the SQL stats. I noticed, while in promon R&D 6 11, that there was heavy LRU2 latch activity. Sure enough, R&D 2 3 showed LRU2 replacement policy enabled. I thought I must have sized -B2 too small. So with proutil increaseto I bumped up -B2 from 10,888 to 20,000, thinking that that was *definitely* enough, and I used promon to disable the LRU2 mechanism and zero out the latch counts. A couple of hours later, the LRU2 mechanism was enabled again, so I doubled -B2 again to 40,000 and manually disabled LRU2 again and zeroed out the latch counts. Since then, even at 40,000, I have again seen the LRU2 mechanism enabled several times.
I can't understand why this is happening. Sure, I could be off somewhat on my rough estimate of space needed for RM blocks. But I didn't think I could be off by a factor of 8.
I decided I needed a better way of calculating table size in blocks. (As an aside, I would *really* like to have dbanalys/tabanalys report table sizes in blocks, for Type II areas.) I thought of two ways to measure blocks needed for tables. One is to do a FOR EACH on each ABP-assigned table and count the unique dbkeys. The second is to take a backup of the DB, restore it to a new location, start that DB with a large -B2 and do FOR EACH on each of those tables and look at the ABP OS reads.
The first approach (counting dbkeys) gave me a grand total of 3,964 blocks for all 17 tables. From dbanalys, I have 2,207 blocks of index data. And based on the 17 MB size of my schema area, if is 2096 blocks in size, for a total of 8,267 blocks. The second method (restore backup, set large -B2, look at OS reads) gave me a similar measure of 4,237 blocks for table data, meaning a total of 8,540 for -B2. That's a larger discrepancy than I can explain, but in either case the original -B2 setting of 10,888 should have been enough, let alone 20,000 or 40,000. For the latter method, I did run viewb2 reports against both DBs and diff them to ensure they had the same ABP assignments.
These tables aren't absolutely static. There is some CUD activity on some of them, but even the highest numbers of creates are low; about 20-30 records created per day, with record sizes of about 50-100 bytes. All of the assigned indexes belong to assigned tables; there are no indexes of large, fast-growing tables in the ABP.
So I'm open to suggestions. Am I missing something obvious? Is there a known ABP bug in 10.2B07? Also, for those who may question the version, I'm planning to install SP08 in the short term (a month or so), and the client has begun an upgrade project, to move to our latest application version on 11.4. That should go live in about October.
Hard to say without seeing all of the information myself but here a few things to consider...
- Are there any LOB columns for those tables?
- If you make the B2 huge on the restored copy and run a dbanalys how many blocks show up used in B2?
What do the promon screens show for the alternate buffer pool?
Thanks for the input.
> - Are there any LOB columns for those tables?
I should have said; there are no LOBs.
> - If you make the B2 huge on the restored copy and run a dbanalys how many blocks show up used in B2?
That's more or less what I did, except that I used ABL instead of dbanalysis to do the reads. I'll bounce the test DB and run dbanalys on it and report the results.
> What do the promon screens show for the alternate buffer pool?
Over about the last two days I'm showing about 1.2 billion logical reads, about 7800/second, in the ABP. In that time there are 32 OS reads in ABP. So that suggests I'm very close to having a large enough -B2. As I write this, LRU2 is enabled again. I will bump up -B2 by another 1,000 blocks and see whether LRU2 replacement remains disabled after that.
I started my test DB with -B2 100,000. It's a copy of prod.
After proserve, I have 75 ABP OS reads.
After running dbanalys I have 18,455 ABP OS reads.
After running a table scan on each table assigned to ABP (not including schema area), I have 18,516 ABP OS reads.
After two consecutive runs of a program that scan each assigned application table with each assigned index, I still have 18,516 ABP OS reads.
After another dbanalys, still 18,516.
I ran proutil viewb2 to confirm that all assignments in the test DB match the prod DB; they do.
- I have no idea why this new total is so much higher than my earlier calculation;
- I have no idea why, despite this total being much smaller than my current prod -B2 setting of 40,000, I'm still getting LRU2 latching in prod.
> If you do not use -Bp 100 (or some small value) when doing an online backup LRU2 will get enabled.
I believe you're referring to defect OE00208701; it was fixed in SP05. And my backup script uses -Bp 10 for online backups.
So there is something else going on. I agree that it could have been caused by some other utility, like dbtool.
Did you check the contribution of each individual table and index using _TableStat-OsRead/_IndexStat-OsRead?
BTW, dbanalys does not make friends with _TableStat/_IndexStat. Use 4GL queries for tests.
> Did you check the contribution of each individual table and index using _TableStat-OsRead/_IndexStat-OsRead?
I did check _TableStat-OsRead, in prod. The numbers didn't make sense to me. Each of the 17 tables showed far more OS reads than my recid calculations showed should be necessary.
My calculations show that my ABP-assigned tables vary in size from 1 block to 1,560 blocks. The OS reads varied from 993 blocks (for a table I calculate to have 74 blocks) to 6,352 blocks (for a table I calculate to have 1,311 blocks). Suspiciously, six of the tables had identical values for OS reads: 1,016 blocks; a seventh had 1,017 OS reads. My calculations show these seven tables vary in size from 2 to 258 blocks.
On the one hand, I would expect OS reads to be higher than table sizes if -B2 was actually too small, as evicted blocks could be re-read from disk several times. And that would be consistent with the heavy LRU2 latch activity I have seen.
Based on the current assignments, roughly half the logical I/O in the DB is now in the ABP. The piece I am missing is: if the size of -B2 is now 40,000, as it has been for a week and a half now, why have any evictions been required in recent days? The only thing that would make sense is if the proutil increaseto didn't actually increase the size of -B2, but reported that it did. It will be interesting to see if this problem persists after a DB restart.
I should also mention that when I ran increaseto I also changed the startup script for the DB to have the new value of 40,000 for -B2, and the DB hasn't been restarted since I did maintenance on 06/23, and promon R&D 1 12 shows 40,000 for -B2. Also, in case it matters, I haven't yet had evictions from -B. It is set to 3,155,360 and I currently have 29,112 empty buffers in total.
> BTW, dbanalys does not make friends with _TableStat/_IndexStat. Use 4GL queries for tests.
Sorry George, I'm not sure I follow; can you elaborate please? Are you referring to the fact that dbanalys uses C code rather than the 4GL query engine to gather its stats?
Also, is it safe to assume that dbanalys reads each DB block no more than once? If not then my test result is invalid.
The reason I suggested dbanalys is that it usually faster than reading the tables and it will read all of the table and index blocks (where a for each will just use one index per table).
I guess I should have been clearer before - what I was wondering about is how many blocks used are showing up in the B2 compared to the startup parameter. If you are running out of blocks that is one issue... if you have free blocks and LRU2 keeps coming back then it is a different issue.
OS reads can be spotty and SQL doesn't update the VSTs properly for Type II areas until you get to 10.2B08 or 11.4 (possibly some SP of 11.3).
What does dbanalys show for fragments on your tables? You can use RECID to find the parent block but that doesn't mean that your record isn't stored in more than one block.
You make a good point about fragmentation; I negected to consider that with my dbkey-counting approach. And that could account for the discrepancy between the results I got from that and from dbanalysis.
Of my 17 assigned tables, two have fragmentation. One has 6 extra fragments and the other has 606.
Regarding blocks used, are you referring to _BuffStatus._BfStatus-UsedBuffs? It is 3,166,250. As far as I am aware there isn't a separate promon/VST field for used buffers in -B versus -B2.
I meant that dbanalys does not update _TableStat-read/_IndexStat-read. It does not read the index keys (but it reads the index blocks). That is why it does not update _IndexStat-read. It does read all fragments of each records but dbanalys does it in a way that differs how 4GL does. For example, dbanalys can read the records with sizes that exceed 32K. It does not consolidate the fragments in one record. I guess it's a reason why it does not update _TableStat-read. I did not check if dbanalys updates _TableStat-OsRead/_IndexStat-OsRead or not. BTW, idxfix (with one exception) also does not update _TableStat/_IndexStat. I think it's just a bug.
Thanks for the explanation George. I think at this point my best path forward is to installed SP08, increase -B2 in the prod DB yet again, bounce the DB, and monitor ABP OS reads and individual table and index OS reads from application activity. I'll hold off on running utilities like idxcompact etc. until I am sure I have -B2 sized appropriately.
Total buffers: 3195362
Hash table size: 1037347
Used buffers: 3166250
Empty buffers: 29112
On lru chain: 3155361
On lru2 chain: 40000
On apw queue: 0
On ckp queue: 0
Modified buffers: 830
Marked for ckp: 0
Last checkpoint number: 153
LRU force skips: 100
LRU2 force skips: 100
-B = 3,155,360
-B2 = 40,000 (was 10,888 at DB startup; was increased online to 40,000)
Used buffers = 3,166,250 LRU + LRU2 = 3,195,361
So used buffers < (LRU + LRU2). And empty buffers > 0.