Hi, I was hoping someone might be able to give some guidance on a fairly major performance issue we are having at a customer site: most days we see a period of up to an hour where system performance pretty much grinds to a halt.
For a bit of background, the customer is running AIX 5.3 and OpenEdge 10.2A. The application runs on AppServer with around 70 users, plus a couple of batch processes running on the server itself. Our database is now fairly large, getting on for 90GB. Their server has 6 mirrored disks, which we currently have split as: operating system; database; bi and database indexes (I know this isn't good practice, but it's something we recently tried in order to split the main database across two disks, and it isn't the cause of the problem, as we saw the same thing before!); ai; interface files; and backups.
We generally see the system start to slow down around the same time - though not at exactly the same time so we can't tie it to one process - pretty much every day.
We have OpenEdge Management installed on the server and so have various monitors in place on disk I/O, APW, database reads and writes. During a period of system slowdown we can clearly see through Fathom that the system is slowing down because disk I/O on the database drive has gone up to 100%. Our problem is that we need to identify why - and in the first instance identify whether it is our application, Progress, the server, the O/S, or something else.
I've attached a few graphs from Fathom, but basically these show that when the system slows down, disk I/O on the database drive goes up to 100%. Disk I/O on the mirrored drive for the database also goes up to 100%. The disk I/O on the drive that has the indexes and bi stays roughly the same - there's some evidence that it actually drops - whereas in normal running we tend to see more activity on the index disk than on the database one (because we have too many indexes in our database!). We also see a drop in APW performance, and a drop in the number of database area writes according to Fathom.
To me, with my fairly limited knowledge of AIX, this means that something is writing to the database disk drive (as the mirror also peaks), but that it isn't the database itself being written to, because database writes are actually dropping. It isn't our server - it is the customer's - but we're as certain as we can be that there is nothing on this disk other than our database. The database log files are also on this disk, but they don't appear to be getting hit during this period.
Can anyone suggest, firstly, whether this is a Progress database issue, and secondly, how we can identify what is causing the slowdown? Assuming something genuinely is being written to disk, how do we identify what is writing during this period, and which file(s) are being written to? Is there anything else we can look to monitor in Fathom Management?
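To make the "which file(s) are being written to" part concrete, here is a rough sketch of the kind of check I have in mind: poll the filesystem on the busy disk and report any file whose size or modification time changes between polls. It is only an illustration, not something we have run on the customer's server. It assumes Python is available there, the function names and the directory passed on the command line are made up for the example, and it would not show which process is doing the writing or catch I/O that bypasses the filesystem.

```python
import os
import sys
import time


def snapshot(root):
    """Map each file path under root to its (size, mtime) pair."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file disappeared between listing and stat
            state[path] = (st.st_size, st.st_mtime)
    return state


def changed_files(before, after):
    """Return sorted paths that are new, or whose size or mtime differ."""
    return sorted(p for p, meta in after.items() if before.get(p) != meta)


def watch(root, interval=10, rounds=6):
    """Print the files that changed during each polling interval."""
    previous = snapshot(root)
    for _ in range(rounds):
        time.sleep(interval)
        current = snapshot(root)
        for path in changed_files(previous, current):
            old_size = previous.get(path, (0, 0))[0]
            delta = current[path][0] - old_size
            print(time.strftime("%H:%M:%S"), path, "size change:", delta)
        previous = current


if __name__ == "__main__" and len(sys.argv) > 1:
    # The directory argument is a placeholder for the mount point
    # of the disk that shows 100% I/O.
    watch(sys.argv[1])
```

Run during a slowdown against the database disk's mount point, this would at least show whether the busy files are database extents, the log file, or something unexpected. Files that are rewritten in place without growing would show up through the modification time only, with a size change of 0.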
Any help would be greatly appreciated!