Performance degradation between / /Power8

  • It would be a good idea at this point to open a case with Progress TS (or have your vendor do it, if you're an indirect customer).  They'll likely ask much the same questions we have, but they may be able to offer some good insight as well.  

    And if this is an emerging platform-compatibility or behaviour-change issue, the techs may already be aware of it and have a workaround or fix.

    BTW you said you're in the process of migrating.  Does that mean the old box is still prod and the new one is test?  Or is the new one now prod?

  • I just ran a truss, and I'm seeing a very high number of statx calls looking for files that don't exist. I'll have to ask them what that's about. Topas shows high Namei calls, but low disk throughput.

    Thanks for the truss suggestion.

  • We will be opening a case with OpenEdge.

    These new systems are still in testing.

  • The high number of stat calls for non-existent files sounds like propath searching.  It could be normal for your application.  It would be helpful to compare those numbers against prod, or a prod-like environment.

    Can you see the file names in your trace?  If so, is it a lot of files with .p or .r extensions?

  • Yes, mostly .p's and .r's. 160,000 calls in approximately a minute.

  • Are they all local paths?

  • Yes, all local disk.

  • Compare the propaths between the batch clients on the old and new systems.  If they are different the client may be spending more time on the new box searching code directories instead of doing useful work.
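
    One quick way to do that comparison is from inside a client session itself. Below is a minimal ABL sketch (the program name "somehypothetical.r" and the output file name are placeholders, not anything from Russell's environment); run it on both boxes and diff the results:

        /* Write each PROPATH entry on its own line so the old and new systems
           can be diffed easily. */
        DEFINE VARIABLE i      AS INTEGER   NO-UNDO.
        DEFINE VARIABLE cFound AS CHARACTER NO-UNDO.

        OUTPUT TO "propath-check.txt".    /* placeholder output file */

        DO i = 1 TO NUM-ENTRIES(PROPATH):
            PUT UNFORMATTED ENTRY(i, PROPATH) SKIP.
        END.

        /* SEARCH() resolves a file against the PROPATH the same way RUN does;
           every directory it tries and misses shows up as a failed stat/statx
           call at the OS level.  It returns ? when the file is not found at all. */
        cFound = SEARCH("somehypothetical.r").
        IF cFound = ? THEN cFound = "<not found on PROPATH>".
        PUT UNFORMATTED "somehypothetical.r resolves to: " cFound SKIP.

        OUTPUT CLOSE.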

  • In this test environment are there other clients connected apart from the batch client you mentioned?  

    One possibility that is consistent with symptoms of a slow application and no apparent system-level bottlenecks is record lock contention.

    Example:

    Client A obtains an exclusive lock on record 1 in table X.  Client B (your batch client) attempts to obtain a lock on that same record and can't; its request is queued.  Depending on how the batch client's code was written (e.g. whether it specifies NO-WAIT on the query; there is a short ABL sketch of this at the end of this post), client B may block and do nothing until one of two things happens.  Either client A releases the lock and client B obtains it and continues processing, or client A retains the lock until client B's lock wait timeout expires (30 minutes by default).

    I think this is a pretty unlikely scenario.  If this were your issue you would expect to see similar contention in prod.  If anything, this problem would be worse in prod than in test due to (probably) greater user count and activity.  But it's a possibility.  A client in that state would show up under "blocked clients" in promon or ProTop.  You would also see record waits for that client in promon R&D 3 3 (lock requests by user).  If there is a lock wait timeout you would see an (8812) error in the client's log and in the database log.

    Another possibility is that the client is blocking on network I/O.  I have seen ABL client performance nosedive when it is attempting reads or writes on an unresponsive or unreliable NFS share (or disk).
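
    For what the NO-WAIT case looks like in code, here is a minimal ABL sketch; the "customer" table, its fields, and the message text are placeholders rather than anything from the actual application:

        /* Try to lock the record without blocking; NO-ERROR lets us test the
           outcome ourselves instead of raising an error. */
        FIND FIRST customer WHERE customer.custnum = 1
             EXCLUSIVE-LOCK NO-WAIT NO-ERROR.

        IF LOCKED customer THEN DO:
            /* Another session holds the lock.  Without NO-WAIT this FIND would
               block here until the lock is released or the lock wait timeout
               expires (the (8812) error mentioned above). */
            MESSAGE "Record is locked by another user; skipping for now.".
        END.
        ELSE IF AVAILABLE customer THEN DO:
            /* Lock obtained; safe to update. */
            ASSIGN customer.comments = "processed by batch".
        END.

    If the code does not use NO-WAIT, the FIND simply blocks, which is the "blocked clients" symptom described above.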

  • Great idea to test.

  • Hi Russell,
     
    I see you are still chasing this ‘old chestnut’.

    > Yes, mostly .p's and .r's. 160,000 calls in approximately a minute.

    Make sure you have the -q parameter for the client(s).

    Something to consider would be to put the .r files into .pl procedure libraries, possibly even memory-mapped .pl files.

    /LL

  • Is it possible that there is a difference in startup parameters between the 2 runs, most notably the -q parameter?  Or that the PROPATH is different between the 2 runs?
     
    With so many statx() system calls, I would think that the ABL's PVM is validating .p or .r locations more often.  (A quick ABL check for both is sketched below.)
     
     Oops - didn't see Libor's or Rob's posts.  I agree with them!
     
     
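
    One quick check for both, from inside a session on each box (a minimal sketch; it assumes SESSION:STARTUP-PARAMETERS is available in your OpenEdge release):

        /* SESSION:STARTUP-PARAMETERS lists the client's startup parameters,
           so -q will appear there if it is in effect; PROPATH shows the
           effective search path.  In a batch client this output goes to the
           client's output/log file. */
        MESSAGE "Startup parameters:" SESSION:STARTUP-PARAMETERS.
        MESSAGE "PROPATH:" PROPATH.

    Run it in the old and new environments and compare the output.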

  • > BTW, unrelated to this post. A moderator should place a notice somewhere that to reply or post you have to join a group first. It took a half hour of messing with adblock and noscript to realize it was actually a forum issue that I saw no reply buttons.

    Will look into it.

    -- peter

     
     
    Reply by RussellAdams:

    > Smells like NUMA.

    In what way? Progress ran fine on the prior system. There are more hardware threads on the POWER8 (like HyperThreading on Intel) but otherwise they are similar. Also our memory size and system size aren't very large.

    > How many CPUs in the physical box?  LPAR?  What exact model of p8?

    10 cores in the system. The LPAR in question has 4 dedicated cores. It's an 8284-22A.

    > What are your DB startup params?

    That I'd have to get from the DBA.


  • >>  check the queue_depth settings on all of your disks using lsattr -El <hdiskname>

    > Queue depths are not used. In fact, I'd argue we had a QD of 1 the entire time on the new systems. Does Progress do any concurrent read IO at all?

    When you say queue depths are not used, do you mean iostat doesn't show any queue waits? queue_depth controls how many IOs can be requested concurrently for that logical disk.

    Sounds like you have checked all of the obvious stuff though, which wasn't really apparent from the OP :)

  • There is no stress, unusual latency, high disk busy, or queue saturation on the IO subsystem at all.

    The high statx calls have likely been there all along, so they're not new, but addressing them is a potential refinement.

    I'm looking forward to what OpenEdge says.