Database backups crashing on OE 11.6.3 - Forum - OpenEdge RDBMS - Progress Community

Our database administrator tells me that our database crashes whenever we do backups.  This began within the very first week of installing OE 11.6.3 (we upgraded to 11.6.3 from 11.3). 

We run HP-UX IA64 in the environment that is crashing, and we also use an OE feature called the "Alternate Buffer Pool".

Has anyone else had these issues?  Apparently there is a KB as well: http://knowledgebase.progress.com/articles/Article/database-B2-crashes-with-1040-000078422

We didn't know about this problem in the planning phase when we were making preparations for the upgrade to 11.6.3.  Prior to the planning, I had already done some of my own preliminary testing of OE 11.6.3 on a personal database environment and had experienced no problems, but I was NOT using an "Alternate Buffer Pool" (B2).

Given this experience with 11.6.3, I think Progress should be warning all its customers to avoid this service pack and disallow access to it in ESD.  Has there been any communication along these lines?  If they do allow customers to move forward, it leads to a lot of trouble.   I am having a hard time making sense of it all;  but at this point I'm thinking that the B2 option is extremely uncommon among OE RDBMS customers... (or that OE customers may view backups as an unnecessary luxury.)

Either way we must get a better service pack (ie. 11.6.4.)  As it stands, this final service pack for 11.6 will leave customers in a position where they have to choose between using B2 and doing backups.  It doesn't make sense.  Nor does it make sense that every single customer that moves up to 11.6.3 should also have to request a private hotfix in addition to the service pack.  Maybe my experiences with the service packs and the hotfixes from other vendors do not compare to the way Progress creates these things.  

Is it unreasonable to think that 11.6.4 is needed?  After all, 11.6 is the most recent version of OE, and 11.6.3 is probably causing more trouble than it fixes.   How can Progress rule out a new service pack on the current version of OE?

 

All Replies
  • Have you tried it with the -Bp parameter as explained in the KB? In all honesty you should be using this parameter anyway for the backups as otherwise your -B is wiped out every time you back up.

  • It seems to me that if the "-B2" option is an uncommon thing for OE customers to use (or for Progress to test when releasing service packs) then this "-Bp" is probably even more uncommon.  We'd like to be using the product in a mainstream way so that we don't bump into problems before everyone else.

    It was very surprising that we ran into this, and ran into it as quickly as we did.  OE 11.6.3 has been available since September but perhaps there aren't many who have tried using it yet.  (And/or they aren't using the B2 option).

    Is "-Bp" something that most OE dba's should be familiar with?  It appears to be a client session parameter, and I'm guessing that most dba's overlook those when performing database administration tasks like serving databases, or performing backups.  (In order to use the product in a more mainstream fashion, I would just as soon stop using "B2" and just use "B" ... at least until the 11.6.4 service pack).

  • Not sure whether I'd say a dba should be aware of -Bp, but it's something I've become aware of in the last couple of years. The whole point of -B is that it's meant to be a cache of what people are using frequently. Anything that comes along and wipes that out for administration purposes is not a welcome addition as it means the buffer is no longer what people actually want. I generally only use it on the backups because they happen every night, or even more frequently, and this can have a negative impact on users.

    It is something that is probably considered 'Best practise' nowadays.

  • Reading about "-Bp", it seems to say that the buffers are stolen from the public (-B) buffers anyway, so either way the public buffers are being consumed by the backup.  

    Hopefully a backup isn't so greedy as to "wipe out" all available buffers.  It would seem foolish considering most of the data is used on a one-time basis and won't be accessed again.

    Since we are talking about this "BP" client parameter that is new to me, I have a related question.  In the past I had wondered if there was a way to flush clean buffers in order to test the performance of reads from the I/O disk system.  

    (see community.progress.com/.../86187)

    Would the -Bp option serve the purpose of effectively flushing my buffers?  In other words, whenever I restart an ABL client with private buffers, will all the initial reads go to the disk before it starts using the data cached in buffers?  Nobody had suggested this as an option when we were talking about flushing buffers, and the best ideas were either to restart the entire database or perform a long-running dbanalys to flush out the shared buffers (maybe in a similar way as what you say happens during a backup).

  • Yes I would agree wholeheartedly here - -Bp should be used when using any of the online tools.  We have implemented B2 and seen some tremendous performance benefits so this is a pretty serious bug in my opinion.

  • Re: -Bp "stealing" buffers... I think you are thinking about it backwards -- if a block is already in -B then it will not count against -Bp.  If a block referenced by a process using -Bp is NOT in -B then, rather than replacing a "public" buffer with that block, one of the -Bp buffers will be used.  Thus "public" buffers do not get flushed by processes doing sequential and non-repeating access (like backups).

    So, no, -Bp is not a sneaky way to flush -B.  It is quite the opposite: a way to *avoid* flushing -B.

    --
    Tom Bascom
    tom@wss.com
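
The behaviour described in this reply can be illustrated with a toy simulation. This is only a sketch of the idea, not how OpenEdge implements its buffer pools: `ToyBufferPool`, `scan`, and all the sizes are made up for illustration, and the private ring is modelled as a separate structure for simplicity (other replies in this thread note that the real -Bp buffers are taken from -B). It warms a shared LRU pool with a "hot" working set, runs a long sequential scan with and without a small private ring, and counts how many hot blocks survive.

```python
from collections import OrderedDict


class ToyBufferPool:
    """Toy model: one shared LRU pool; readers may bring their own private LRU ring."""

    def __init__(self, shared_size):
        self.shared_size = shared_size
        self.shared = OrderedDict()  # block id -> True, least recently used first

    def read(self, block, private=None, private_size=0):
        """Read one block; `private` is the caller's own ring (or None)."""
        if block in self.shared:
            self.shared.move_to_end(block)  # already shared: no private buffer used
            return "shared-hit"
        if private is not None:
            if block in private:
                private.move_to_end(block)
                return "private-hit"
            if len(private) >= private_size:
                private.popitem(last=False)  # evict from the private ring only
            private[block] = True
            return "private-miss"
        if len(self.shared) >= self.shared_size:
            self.shared.popitem(last=False)  # evict the least recently used shared block
        self.shared[block] = True
        return "shared-miss"


def scan(pool, blocks, private_size=0):
    """Simulate a sequential, non-repeating reader such as a backup."""
    ring = OrderedDict() if private_size > 0 else None
    for b in blocks:
        pool.read(b, private=ring, private_size=private_size)


def warm_pool():
    """Return a pool whose shared buffers hold the hot blocks 0..99."""
    pool = ToyBufferPool(shared_size=100)
    for b in range(100):
        pool.read(b)
    return pool


hot = set(range(100))

without_bp = warm_pool()
scan(without_bp, range(1000, 2000))                # scan without private buffers
survivors_without = len(hot & set(without_bp.shared))

with_bp = warm_pool()
scan(with_bp, range(1000, 2000), private_size=10)  # scan with 10 private buffers
survivors_with = len(hot & set(with_bp.shared))

print(survivors_without, survivors_with)  # prints: 0 100
```

In this toy model the scan without a private ring pushes every hot block out of the shared pool, while the scan with a 10-buffer ring leaves the shared pool untouched.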

  • An online probkup will copy each block into memory - thus it will fill your buffer pool - this is why -Bp is advised.  Note, this will keep your buffer pool relatively clear but the file system cache will be impacted.

    Sorry, I don't understand the second question so cannot comment.

  • Thanks Tom,

    But if my ABL client session is reading data on a testing environment that *nobody* else is interested in, then the private (-Bp) buffers will be used for storing that data, even in preference to using free "public" (-B) buffers. ... And the next time the ABL code is started in a brand new client process, it will go back to disk again... Right?

  • For Private Read-Only buffers, take a look at:

    000022329 - How Progress uses Buffers and when to use Private Buffers (-Bp) ?

    knowledgebase.progress.com/.../P95829

    And specifically for online probkups:

    000021080 - When should the -Bp parameter be used with a Progress Online Probkup?

    knowledgebase.progress.com/.../P49128

    The initial question raised:

    When your database crashes whenever online PROBKUP runs, is it specifically crashing with error (1040) SYSTEM ERROR: Not enough database buffers (-B)?

    000078422 - A database running with -B2 crashes with error (1040)

    knowledgebase.progress.com/.../database-B2-crashes-with-1040-000078422

    This does not affect only PROBKUPs.

    It is specifically only when -B2 has Object Level assignments (not at the area level)

    The workaround for PROBKUP and for example a dbanalys when the database is started with -B2 and objects have been assigned, is to use Private Buffers (-Bp) which avoids this problem.

    If your online PROBKUP is crashing for any other reason, you're not hitting this issue.
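
For anyone scripting their backups, the workaround above could be wired in roughly like this. This is a minimal sketch under assumptions: the argument order (`probkup online <db> <target> -Bp <n>`), the value 10, and both paths are illustrative and not taken from this thread, so check them against the KB articles and the PROBKUP documentation for your release. The helper `probkup_online_command` is a hypothetical name; it only builds the argument list and does not run anything.

```python
def probkup_online_command(db_name, backup_target, private_buffers=10):
    """Build an argument list for an online backup that requests private buffers.

    The ordering of arguments and the default of 10 are assumptions for
    illustration; verify them against the documentation for your release.
    """
    if private_buffers <= 0:
        raise ValueError("private_buffers must be a positive integer")
    return [
        "probkup", "online", db_name, backup_target,
        "-Bp", str(private_buffers),
    ]


# Hypothetical database and backup paths.
cmd = probkup_online_command("/db/prod/mydb", "/backup/mydb.bck")
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` from a scheduled job once the options have been confirmed for the installed version.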

  • If your first session accesses more data than fits into whatever you defined for -Bp then the excess will result in buffers being evicted from -Bp.  If nobody else referenced them then they are no longer in memory.

    If later on a 2nd session wants to reference some data that your first session referenced and that data is no longer in memory then it will need to be re-read from disk.

    It doesn't "go back to disk" -- it is simply removed from the list of blocks that are available in memory.  (Unless you modified some data -- but this discussion is all about reads....)

    Offhand I'm not entirely sure what happens if -B is very large and underutilized compared to -Bp.  I suppose that Progress *might* decide that if a block is being "evicted" from -Bp *and* there are unused -B blocks then it could be kept around in memory just in case.  It does not seem to me like there would be any harm in that -- except that it would have to be coded, and every bit of coding means potential for bugs and unanticipated overhead.  If it were me I probably wouldn't do it.  It is far too speculative and the whole point of -Bp is that you are saying "I don't think anyone else will care about this data".

    --
    Tom Bascom
    tom@wss.com

  • Buffers associated with a user connection's -Bp are indeed re-associated with the -B when the user disconnects from the database.  This will not cause paging of the -B since the buffers are already allocated.  The buffer is just removed from one LRU chain and added to another.

  • So I'm hearing that -Bp doesn't allow me to reset/flush my buffers after a client disconnects.  The buffers stay in memory.  (Nor does this help with the ultimate goal of testing the performance of reads when they are going all the way back to disk.)

    It might be that we have some pretty slow I/O hardware.  But I've often noticed that some ABL code, like certain reports, will execute *extremely* slowly if run once in a day, and will be fast on subsequent executions (every ten minutes). There should be a way to troubleshoot and optimize the *first* execution of the report, and in that way isolate the I/O hardware issues that lie beneath the database.  Today the only good way to troubleshoot the *first* execution is to stop and restart the entire OE database.  

    In the Windows world we use an I/O tool called "sqlio" for isolating and troubleshooting hardware. And insofar as SQL server itself is concerned, we can use DBCC DROPCLEANBUFFERS whenever we need our database reads to go all the way back to the disk.

    I am still absolutely convinced that there is a secret command for easily flushing out the OE buffer pools (ie. removing clean buffers from the "LRU chains" for -Bp and -B).  I *really* wish someone would tell us what it is.  We promise to use it in development and not in production.

    Insofar as my initial question goes, are there any thoughts on the likelihood of a new service pack (11.6.4) given the -B2 issues?  I'm hearing that -B2 is a fairly popular feature (or at least it was before it broke).  And I think we were specifically advised to use it with "object level assignments" that reference the specific tables that would benefit the most.  Can someone tell me how many severe bugs need to be found before an OE service pack is made available?  I'd rather stay on 11.6.3 for testing and wait to do a production upgrade after 11.6.4 is available.  

  • The secret command is proshut .... :)

    Consulting in Model-Based Development, Transformation, and Object-Oriented Best Practice  http://www.cintegrity.com

  • Flushing the buffer pool is a fairly common question in DBA forums for OpenEdge.  I have never seen a command that would do it; the normal response is to either restart the db or run a backup or table analysis without using the -Bp option.  DBCC DROPCLEANBUFFERS will flush the database memory blocks, but not the file system cache, similar to what happens when the database is stopped and restarted.

    If there is a report that runs slowly the first time and quickly after, then that usually indicates the subsequent runs are using data in the buffer pools.  You can track the physical vs logical reads a report makes.  You could also try removing the "-q" client side parameter so the code in the report will be loaded into memory for each run, otherwise the code stays in memory and that would also make future reports run faster.

    I can see the case for creating a 11.6.4.  Most companies don't bother requesting the latest hotfix when they upgrade.  They would upgrade to 11.6.3 and then be at risk for the db crashing during a backup or dbanalys.  
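
The physical-versus-logical read comparison mentioned above comes down to simple arithmetic. The following is a small sketch with made-up counts: `buffer_hit_ratio` is a hypothetical helper, and how the two counters are collected for a given report run is left out.

```python
def buffer_hit_ratio(logical_reads, physical_reads):
    """Fraction of logical reads that were satisfied without going to disk."""
    if logical_reads <= 0:
        raise ValueError("logical_reads must be positive")
    return (logical_reads - physical_reads) / logical_reads


# Made-up counters for the same report run twice.
first_run = buffer_hit_ratio(logical_reads=50_000, physical_reads=40_000)
second_run = buffer_hit_ratio(logical_reads=50_000, physical_reads=500)

print(f"first run:  {first_run:.1%}")   # prints: first run:  20.0%
print(f"second run: {second_run:.1%}")  # prints: second run: 99.0%
```

A large jump in the ratio between the first and second run, with a similar number of logical reads, is consistent with the second run being served from the buffer pool rather than from disk.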

  • > Today the only good way to troubleshoot the *first* execution is to stop and restart the entire OE database.

    You can start the database with low values for -B/-B2. That will imitate an empty buffer pool. But it's still not enough for fair tests. You need to empty the filesystem cache as well. The only accurate way to troubleshoot the first execution is to reboot the whole system.