Activity: Performance Indicators - Latch timeouts

Hello,

I'm trying to learn how to manage performance in an OpenEdge database. I've set up a monitoring system, and I decided to start by tuning the -spin parameter.

The database serves from 1.7K to 3.2K users, depending on the time of day. The database server has 128 CPUs, and several databases run on it. Currently -spin is set to 10000 for all databases. In the monitoring system I set the following thresholds for latch timeouts: less than 300 is OK (green zone); between 300 and 500 is Warning (yellow zone); more than 500 is Critical (red zone). I've noticed that most databases have latch timeout values in the green zone (less than 300).
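
As a minimal sketch, here is that zone classification expressed in code (Python purely for illustration; the function name and the assumption that latch timeouts arrive as a per-second rate are mine, not part of any monitoring product):

    # Classify a latch-timeouts-per-second sample into the zones described
    # above. The 300/500 thresholds come from the post; everything else is
    # an illustrative assumption.
    def latch_timeout_zone(timeouts_per_sec: float) -> str:
        if timeouts_per_sec < 300:
            return "OK (green)"
        if timeouts_per_sec < 500:
            return "Warning (yellow)"
        return "Critical (red)"

    print(latch_timeout_zone(809))   # -> Critical (red)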

But in one database the latch timeout values are almost always in the red zone: 809 on average, 1.5K at the peak.

The question is: should I increase or decrease the -spin value for this database? Or should I be paying attention to some other metrics?

Thanks in advance for any tips!

All Replies
  • BTW, what can cause the breaks on the graph of MTX naps in the test with -nosavepoint? Both breaks happened when the number of sessions (23 and 50) was a few times smaller than the number of CPUs (96). The breaks are rather sharp.

  • > On May 2, 2019, at 9:54 AM, George Potemkin wrote:

    >

    > I guess it’s not intended to use LKT to start or to commit a transaction.

    no, but there are circumstances wherein a lock table entry is added or removed while holding mtx.

    starting a transaction involves getting the next transaction id and allocating a transaction table entry.

    committing is a bit more complicated (e.g. 2pc) and, among other things, requires releasing or downgrading locks, which requires fiddling with the lock table.
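
    a rough schematic of the above in Python, not anything resembling the actual engine code (every name here is a stand-in for illustration, not an OpenEdge internal):

        # Schematic only: why begin/commit serialize on a single MTX-like
        # latch. Both paths mutate shared structures (transaction table,
        # lock table) while holding it.
        import threading

        mtx = threading.Lock()   # stand-in for the MTX latch
        next_txn_id = 0          # shared transaction-id counter
        txn_table = {}           # stand-in for the transaction table
        lock_table = {}          # stand-in for the lock table (LKT)

        def begin_transaction():
            global next_txn_id
            with mtx:            # next txn id + transaction table entry
                next_txn_id += 1
                txn_table[next_txn_id] = "active"
                return next_txn_id

        def commit_transaction(txn_id):
            with mtx:            # release locks, then retire the txn entry
                held = [k for k, owner in lock_table.items() if owner == txn_id]
                for k in held:
                    del lock_table[k]
                txn_table[txn_id] = "committed"

        tid = begin_transaction()
        commit_transaction(tid)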

  • > On May 2, 2019, at 10:09 AM, George Potemkin wrote:

    >

    > what can cause the breaks on the graph of MTX naps in the test with -nosavepoint?

    how many bi buffers did you have? i will guess, with /absolutely no evidence/ (aka a "WAG") to support it, that you reached some sort of equilibrium point between the spooling of before-image notes and the activity of the before-image writer.

    further experiments could shed light on this WAG

  • > how many bi buffers did you have?

    Default: -bibufs 20

    But the test with -nosavepoint does not use the BI file.

  • Hello,

    It was a very interesting discussion. Thank you!

    Regarding the MTX latch: I also see large per-second values for this database. Is that bad?

    About the 128 CPUs: there are 15 databases running on this server. These fifteen databases are part of one large application, and that is just one branch; a separate server is used for each branch. My colleagues told me that long ago it was a single database, but because of very high MTX contention they eventually had to split it into several databases in order to spread the load and resource utilization, including the MTX pressure.

    Also, each client connects to all fifteen databases at once when logging in to the application.

    This database is set to -Mn 900 / -Mi 1 / -Ma 30. However, for the other databases on this server the value of -Ma is 1, and I don't know why. That is, in this database one server process serves up to 30 clients, while in the other databases one server process serves only one client. Could -Ma 30 be the reason we see high MTX and BHT latch activity in this database?

    Also, if I understand correctly, this one physical server may be running up to -Mn 900 * 15 databases = 13,500 server processes for remote clients at the same time. Wouldn't we need far more CPUs for that to perform well?
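
    A back-of-the-envelope check of that arithmetic (a sketch only: the 900, 30, and 15 come from the settings above; the totals are worst-case multiplication, not measured values):

        # Worst-case process/connection counts implied by the settings above.
        Mn, Ma, databases = 900, 30, 15

        max_servers_total = Mn * databases    # server processes across all DBs
        max_remote_clients_per_db = Mn * Ma   # remote connections one DB allows

        print(max_servers_total)              # 13500
        print(max_remote_clients_per_db)      # 27000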