Performance degradation between Power7/Power8

  • We are in the process of migrating from a POWER7 LPAR (AIX 7.1 SP1, OpenEdge 11.3 SP2) to a POWER8 LPAR (AIX 7.1 SP3, OpenEdge 11.3 SP3).

    We are struggling very much with overall performance. We have a nightly batch job that takes 4 hours on the old system but somewhere between 12 and 16 hours on the new one.

    The new system has a disk layout optimized for concurrency and has done 100k IOPS in multithreaded ndisk64 testing, blowing the old system out of the water. The main difference is multiple LUNs/paths/filesystems: the new system has more of all three.

    Single-threaded ndisk64 testing (single in-flight IOs) shows similar latency/IOPS counts between the platforms.

    The new system does not seem to be starved for CPU or memory (it has more of both than the old one).

    Are there any hidden OpenEdge best practices for AIX documented somewhere?

    Tips and tricks greatly appreciated!

  • Smells like NUMA.

    How many CPUs in the physical box?  LPAR?  What exact model of p8?

    What are your DB startup params?

    Paul Koufalis
    White Star Software

    pk@wss.com
    @oeDBA (https://twitter.com/oeDBA)

    ProTop: The #1 Free OpenEdge DB Monitoring Tool
    http://protop.wss.com
  • It is very easy to get misleading results with ndisk (and other tools like that). For example: reading small files over and over again will most likely read from the AIX or SAN buffer cache instead of actually hitting the disk. This makes the hardware seem much faster than it really is.

    There aren't really any magic switches to flip between POWER7 and POWER8.

    A few things to check....

    0) Does your new LPAR span NUMA zones? This can be a huge problem.

    1) Check your Progress startup parameters and DB settings to make sure they are the same.

    2) Compare the output of the following commands on both AIX boxes, looking for differences (a rough capture script is sketched at the end of this post):

    ioo -a     (I/O-related parameters)

    vmo -a     (memory-related parameters)

    lsps -a    (paging space)

    3) Use nmon and iostat to compare what is going on with memory, CPU and disk during your tests (on both systems).

    4) Use promon to compare what is happening at the database level (on both systems).

    5) Check the queue_depth settings on all of your disks using lsattr -El <hdiskname>.

    How big is your database and each AIX LPAR?

    Is this batch job a single Progress session or a collection of processes?
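
    A rough ksh sketch of how that comparison could be captured on each box for a side-by-side diff (the file names and the queue_depth loop are only illustrative, not a prescribed procedure):

        #!/bin/ksh
        # Collect the settings listed above into one file per host.
        host=$(hostname)
        out=/tmp/aix_settings.$host.txt
        {
          echo "== ioo -a (I/O tunables) ==";         ioo -a
          echo "== vmo -a (VMM/memory tunables) ==";  vmo -a
          echo "== lsps -a (paging space) ==";        lsps -a
          echo "== hdisk queue_depth =="
          for d in $(lsdev -Cc disk -F name); do
            echo "$d $(lsattr -El $d -a queue_depth -F value)"
          done
        } > $out
        # Then:  diff /tmp/aix_settings.oldbox.txt /tmp/aix_settings.newbox.txt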

  • Don't forget the all-time classic false performance problem when comparing "old" to "new".  Especially if the apparent problem is confined to a special something.  Like a particular batch job.  Sometimes what is happening is just that the new system has a cold cache.  The numbers that you have from the old system are almost certainly with a hot cache.  It seems silly but I've seen people ready to toss out some very expensive and quite capable hardware because they overlooked that.

    Of course it might be other issues (NUMA is a glaring possibility) but it never hurts to double check the obvious.



    --
    Tom Bascom
    tom@wss.com

  • > Smells like NUMA.

    In what way? Progress ran fine on the prior system. There are more hardware threads on the POWER8 (like HyperThreading on Intel) but otherwise they are similar. Also our memory size and system size aren't very large.

    > How many CPUs in the physical box?  LPAR?  What exact model of p8?

    10 cores in the system. The LPAR in question has 4 dedicated cores. It's an 8284-22A.

    > What are your DB startup params?

    That I'd have to get from the DBA.

    BTW, unrelated to this post: a moderator should place a notice somewhere saying that you have to join a group before you can reply or post. It took me half an hour of fiddling with AdBlock and NoScript to realize that the missing reply buttons were a forum issue.

  • > ndisk misleading results

    I agree completely. We understand that benchmarking tools aren't perfect; however, with a 50 GB file and no filesystem cache we get excellent performance using direct I/O and CIO. I would expect Progress to achieve the same.

    > There aren't really any magic switches to flip between POWER7 and POWER8.

    I agree there aren't any large changes between POWER7 and POWER8, with AIX7. None of my other customers with competing database software have any issues.

    > Does your new LPAR span NUMA zones? This can be a huge problem.

    I've worked with AIX since RS/6000 and I've never had to investigate NUMA. Where do you see this, and is it documented for the POWER platform?

    > Check your Progress startup parameters and DB settings to make sure they are the same.

    I understood them to be identical. We also tried with -directio and additional APWs. Still slow.
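
    For context, a hypothetical sketch of a broker startup along those lines (the parameter values below are invented for illustration and are not our actual settings):

        # Start the broker with direct I/O plus example tuning knobs,
        # then start extra page writers; values are placeholders only.
        proserve $DB -B 500000 -spin 10000 -directio
        proapw $DB     # run once per additional APW
        probiw $DB     # before-image writer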

    > Compare the output of the following commands on both AIX boxes.. looking for differences

    I'll go one better. Not only did we tune manually and then try the defaults, but we had IBM do a system trace down to the point where the I/O is dispatched by the physical HBA. We saw sub-millisecond latency at every layer, and IBM's level-3 kernel team found no problematic configuration in the I/O stack.

    The conclusion is the application just isn't trying.

    > use nmon and iostat to compare what is going on with memory,cpu and disk during your tests (on both systems)

    Absolutely, we did. Existing production does over 3000 IOPS during the job run; the new system does under 1000. CPU usage is modest on existing production and little to none on the new. We initially thought it was waiting on I/O; now I'm not sure it wasn't just sitting there idling inside Progress.

    > use promon to compare what is happening at a database level (on both systems)

    I'd love to. I'll ask the DBA.

    >  check the queue_depth settings on all of your disks using lsattr -El <hdiskname>

    Queue depths are not used. In fact, I'd argue we had a QD of 1 the entire time on the new systems. Does Progress do any concurrent read IO at all?

    > How big is your database and each AIX LPAR?

    The DB is approximately 500 GB; the AIX LPAR has 4 cores and 100 GB of RAM.

    The SAN is high-end EMC with 10 LUNs using inter-disk policy striping, and each database has a dedicated filesystem for data and logs. Similar configurations at other customers running different DB software excel at I/O.

    > Is this batch job a single Progress session or a collection of processes?

    I believe it is a series of batch jobs run one at a time: a nightly close and reporting run that executes sequentially.

  • > Sometimes what is happening is just that the new system has a cold cache

    For 4-12 hours on a system with 4 x 16Gb HBAs? I agree the cache can be cold, but I could read the whole 500 GB into RAM in a fraction of that time.

    > NUMA is a glaring possibility

    I'd love to know more about NUMA issues in relation to Progress, especially as it applies to the POWER platform. Can you elaborate?

  • NUMA and Progress are like oil and water. My understanding of the architecture and the reasons why is hazy, so I'll leave that to someone else to explain, but I've seen the evidence in practice.
    One thing I believe you can try is to disable some of the CPUs and rerun your benchmark to see if it improves; disabling processors forces NUMA to deactivate.

    James Palmer | Application Developer
    Tel: 01253 785103

  • NUMA is a good thought, but I don't believe that applies.

    IBM's description of process affinity and memory locality (local, near, far) is here:

    www.ibm.com/.../local_near_far_memory_part_2_virtual_machine_cpu_memory_lay_out3

    Our system shows only one memory and CPU domain, like so:

    lssrad

    REF1   SRAD        MEM      CPU
    0
              0   58409.62      0-15
    I think I can say NUMA isn't an issue.

  • If there are truly only 4 cores (out of a total of 10) and they are
    truly dedicated then it probably isn't a NUMA problem. Although it
    could still be a virtualization issue. What is the CPU entitlement?

    But 10 is a very strange total number to have.

    Usually these things come in powers of 2.

    LPAR configuration can make a big difference. The defaults for AIX are
    not friendly to databases. By default AIX makes everything dynamic and
    spreads your CPU over as many cores as it can. You have to go way out
    of your way to change that. Databases use large shared memory caches
    that must be coordinated. That is generally done with mutex locks (aka
    "latches") and the process of doing that requires CPU caches to be
    synchronized. That is *much* more efficient when the cores are
    dedicated and share the same silicon. Which is the exact opposite of
    the defaults on NUMA servers (and almost all large servers are NUMA
    these days) and in virtualized environments.

    A couple of simple commands that might shed some light: "lparstat -i"
    and "lssrad -va"

    --
    Tom Bascom
    tom@wss.com

  • - We know it's not NUMA (single-processor with 10 cores per www-01.ibm.com/.../ssialias).

    - We know it's not disk I/O (your disks are doing nothing)

    - If it was a memory/swapping issue I'm certain you would have seen it, so let's rule that out.

    - You say the server is not CPU starved but is the batch process single threaded?  Even at that, the new cores should be faster than the old cores.

    - What's left?  Kernel calls?  Weird nice levels?  New Progress issue with p8?

    I had a very similar issue going from p6 to p7 and it turned out to be a UNIX SILENT chmod that was run half a million times.  I'm not saying this is your issue, but it's definitely time to think outside the box.

    1. You said you think it's a whole string of jobs run one-after-the-other.  Find out how long EACH one takes on the old box vs. the new box.  That way we can see if it's a generalized issue or one particular job that is misbehaving.

    2. Get some DB stats.  Download protop (dbappraise.com/protop.html) and use it to see what's going on.  ProTop is much more information-dense than promon. Are these read-intensive or write-intensive batch jobs?

    3. Triple-check the DB startup parameters.  This could very well be an "oops!" moment.  Don't forget BI block size and cluster size.

    4. Truss the processes and see if they are doing anything interesting at the kernel level.  IBM has a post-truss cruncher that chews up the output and spits out a nice report.  That's how we saw the UNIX SILENT issue: abnormally high fork()s.  (A rough truss invocation is sketched after this list.)

    5. Are you running OpenEdge Replication too?

    6. Did you dump and load going from the old box to the new?  Or make any changes to the DB like storage area stuff?
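
    A rough idea of what that truss run could look like (the grep pattern and file names below are only examples, not IBM's procedure):

        # Attach to the running batch client, follow its children (-f)
        # and print per-syscall counts instead of a full trace (-c);
        # interrupt with Ctrl-C once the slow phase has run for a while.
        pid=$(ps -ef | grep '_progres' | grep 'batch' | grep -v grep | awk '{print $2}')
        truss -c -f -p $pid

        # Or capture a full trace of a fresh run for post-processing:
        truss -f -d -o /tmp/batch_truss.out _progres $DB -p batch.p -b
        egrep -c 'fork|chmod' /tmp/batch_truss.out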

    Paul Koufalis
    White Star Software

    pk@wss.com
    @oeDBA (https://twitter.com/oeDBA)

    ProTop: The #1 Free OpenEdge DB Monitoring Tool
    http://protop.wss.com
  • I'd also run this batch client with -yx.  At the end of the batch run, look at proc.mon to see which procedures register the most execution time.  An outlying value there could point you to a code-related issue like the one Paul described with the repeated chmods.

    Also, double-check the other client startup parameters and see if something obvious is missing.

    • Where does the batch client run?  Is it self-service or shared-memory?
    • Where does the code reside relative to the client?
    • Is it r-code or procedure libraries?  Or compile on-the-fly?
    • Is the client using -q?
    • Where do the client's temp files reside (-T), and what is their size and I/O during execution?
    • How many databases does the client connect to?

    It would be helpful to know all of the client and broker startup parameters, including those in parameter files.
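
    A hedged example of what that might look like, borrowing the invocation style quoted later in this thread (any other parameters the batch script already uses would stay as they are):

        # Add -yx so the client writes procedure timing statistics;
        # proc.mon should appear in the client's working directory
        # when the session ends.
        _progres $DB -p batch.p -b -yx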

  • ChUIMonster
    If there are truly only 4 cores (out of a total of 10) and they are
    truly dedicated then it probably isn't a NUMA problem. Although it
    could still be a virtualization issue. What is the CPU entitlement?

    But 10 is a very strange total number to have.

    Usually these things come in powers of 2.

    10 CPU, 256GB RAM.


    ChUIMonster
    LPAR configuration can make a big difference. The defaults for AIX are
    not friendly to databases. By default AIX makes everything dynamic and
    spreads your CPU over as many cores as it can. You have to go way out
    of your way to change that. Databases use large shared memory caches
    that must be coordinated. That is generally done with mutex locks (aka
    "latches") and the process of doing that requires CPU caches to be
    synchronized. That is *much* more efficient when the cores are
    dedicated and share the same silicon. Which is the exact opposite of
    the defaults on NUMA servers (and almost all large servers are NUMA
    these days) and in virtualized environments.

    A couple of simple commands that might shed some light: "lparstat -i"
    and "lssrad -va"

    I already posted the lssrad output, and it shows a single domain. The LPAR is a 4-core dedicated LPAR; we aren't using shared processors (yes, I know it's still shared under the hood).

    Processor folding was introduced in late AIX 5.3 to compensate for POWER's VCPU dilution problem. We have all cores folded until the first core exceeds a busy threshold, and that's set appropriately.

    We are seeing very low CPU utilization (i.e. <10%), and from reading the spin lock documentation I take it that if we were waiting on spins we would see higher CPU?

  • Paul Koufalis

    I had a very similar issue going from p6 to p7 and it turned out to be
    a UNIX SILENT chmod that was run half a million times.  I'm not saying
    this is your issue, but it's definitely time to think outside the box.



    Was this a cron job making changes? That would show up as high disk
    busy due to the IO in the inode table.

    Paul Koufalis

    1. You said you think it's a whole string of jobs run
       one-after-the-other.  Find out how long EACH one takes on the old
       box vs. the new box.  That way we can see if it's a generalized
       issue or one particular job that is misbehaving.

    2. Get some DB stats.  Download protop (dbappraise.com/protop.html)
       and use it to see what's going on.  ProTop is much more
       information-dense than promon. Are these read-intensive or
       write-intensive batch jobs?

    3. Triple-check the DB startup parameters.  This could very well be an
       "oops!" moment.  Don't forget BI block size and cluster size.


    I'll send that to our DBA to check.

    Paul Koufalis

    4. Truss the processes and see if they are doing anything interesting
       at the kernel level.  IBM has a post-truss cruncher that chews up
       the output and spits out a nice report.  That's how we saw the UNIX
       SILENT issue: abnormally high fork()'s .


    I intend to. The IBM team has trussed our ndisk, but not the main app.

    Paul Koufalis

    5. Are you running OpenEdge Replication too?


    No.

    Paul Koufalis

    6. Did you dump and load going from the old box to the new?  Or make
       any changes to the DB like storage area stuff?


    We reorganized to add many more disks. We have much more IO capacity
    now, and dedicated filesystems per DB where they were shared before.

  • Yes, it was a cron job running MRP via some _progres $DB -p batch.p etc etc... We trussed the _progres and saw the high fork() counts.  We did NOT see abnormally high disk I/O in nmon or iostat.  Or if you prefer, the disk I/O we saw seemed consistent with the job.

    The next step is really to get some DB and application stats as suggested by ChUI and Rob.

    Last point: you're likely one of the first to migrate to the P8 (it's only been out for a few months) so this could be a real Progress issue.  I'm having a hard time believing this even as I write it, but it's possible.

    Paul Koufalis
    White Star Software

    pk@wss.com
    @oeDBA (https://twitter.com/oeDBA)

    ProTop: The #1 Free OpenEdge DB Monitoring Tool
    http://protop.wss.com