Can you replace that code with:
for each customer no-lock:
find address no-lock where address.addresskey = customer.addresskey.
If you can do that -- does it perform better?
If you get a runtime error on the FIND because the WHERE is not unique then that is why your ratio is off. Baically if you have an index on the address table whose leading component is "addresskey" and which if address key is unique then you do not need the FIRST and the query will be as efficient as possible.
If you do not have such an index then you have to do extra work to find the FIRST record. (And that FIRST record is also denormalized black magic which will probably cause a bug some day.)
What's a "bad" amount of fragmentation? Depends.
When a row is fragmented, that means that the row (almost always) occupies more than one data block. This results in additional disk I/O and additional database buffer pool accesses which in turn affects performance.
If all the rows in a table are in two fragments, this is probably bad. But if it is a table that doesn't get read very often, maybe it doesn't matter. So you have to consider the activity on the fragmented table in comparison to the overall workload. protop can show you that.
FOR EACH icmas.customer FIELDS (addresskey) NO-LOCK.
FIND icmas.address WHERE icmas.address.addresskey EQ icmas.customersite.addresskey NO-LOCK.
This has next to no difference in performance.
I'm trying the dump and load approach in my development environment. So I've dumped the two tables and loaded them. It suggested I might want to rebuild the indexes. On doing this I get:
Fix RECID 1650335137, CustomerSite already exists with Customer Site Key 554968.
72k of these errors.
Which suggests it's duplicated records. I'm guessing I missed a step. Any ideas what?
No that would have been sensible wouldn't it?! :)
Is it just a for each, delete, or is there a more simple way?
Thanks Julian. Decided to drop the table and recreate with an incremental df.
One question - when I dumped the tables I ended up with 2 files for one table and 4 files for the other. I've had to load them back in one at a time using
proutil icmasliv -C load c:\temp\customersite.bd
1 command for each file.
Is there a way of getting it to load them all in one?
Maybe I should start a new thread called James's high demand random question relating to DBA work thread. Feel like I've got loads of stupid questions!
"Next to no difference" measured how? The ratio of logical reads to record reads? Or some flaky timing comparison?
The difference in runtime going from from a 3.5: 1 to a 2:1 ratio when we are only talking about 10,000 record reads is going to be minuscule. Either way it is a small fraction of a second. If this is all about making a query like that go noticeably faster you're probably barking up the wrong tree anyway.
If you have some bit of code reading roughly 10,000 records and it is taking more than a fraction of a second the problem isn't that the ratio of logical reads to record reads is slightly higher than ideal -- the problem lies elsewhere.
Measured on logical vs record reads. The issue is that a colleague is running a similar query over an Appserver from a web interface and it's taking up to 20 times as long to run in live as soon as you add the address lookup compared to just having the customer. Buffer hits are pretty poor.
Obviously the long term solution is to improve the buffer hit % for example. But to do that we need to do a lot of upgrading - not least 64bit OS and DB executables so we have a decent amount of RAM for the -B.
I'm just trying to work out if there is anything we can do in the short term to improve the performance whilst we wait for the slow cogs of industry to turn and give us what we need to improve our hardware.
james, are we talking about your database or your colleague's ? i think we need to see metrics from your colleague's. and configuration parameter settings. problem could be caused by many different things.
You are reading twice as many records when you add "address" to the query so 2x makes sense. The fragmentation seems to be about 20% -- so that's another 20% -- still a far cry from 20x.
I'll be surprised if dumping and loading has much of an impact.
If "buffer hits" (or the lack thereof) is truly the problem then you would normally expect that running the same query twice in a row would show a big difference in response time (the 2nd run should be much faster because the data is in the buffers). Have you tested that? Is that your experience?
If that isn't the case then it is more likely to be related to the transfer of data between the app server and the client or something about the code that surrounds the query rather than the query itself. (Or some unrevealed aspect of the query being run on the app server that differs from what you are telling us.)
Your colleague is also querying roughly 5,000 x 2 records? How long does that take? Are we talking fractions of a second? A second or two? Minutes? Hours?
gus - overall we're talking about production. And I can provide the metrics from there. It's just harder to pinpoint what is my query, and what is noise. There's not a huge amount of point providing configuration settings at this time as we're (hopefully) moving to a more modern server in the next few weeks meaning we'd have to reconfigure anyway.
Thanks everyone for their input. It has been helpful.
ChUIMonster We tested the query using FIELDS( ) so that it would minimise what is brought across the wire.
Ultimately I think this discussion is now moot. We need to upgrade our servers and then spend time configuring them to our requirements. Until then anything else is a waste of time.
I disagree about the discussion being moot. There is much to be learned even if you cannot act on things immediately.
For instance -- using FIELDS reduces the size of the data but it doesn't really speak to how the marshaling and de-marhsalling of the data is managed. There are many ways to do that and some of them have performance implications. Saying that you are using FIELDS and that, therefore, the transfer between the app server and the client cannot be a problem is nonsense.
Likewise -- deciding to try a dump & load based on fragmentatiion in your test database rather than the metrics from production would be ill-advised.
BTW -- ProTop can easily show you CRUD stats per table and index by user. That would go a long ways toward eliminating the noise.
Don't take this wrong but, from a performance analysis POV, you seem to be consistently focusing on minutiae and silver bullets. That's not unusual but it is also not going to solve your problem (except accidentally).
Thanks Tom. Not taken wrongly. :) I'm trying to learn but without being given time to learn. Last DBA training I had was the introductory DBA course provided by Progress back in around 2000. Learning everything else on the job. This place is very helpful. As are you! ;)
Thanks for the plug of ProTop. :D