DB corruption on SLES12 with XFS - Forum - OpenEdge RDBMS - Progress Community

DB corruption on SLES12 with XFS

This question is answered

Hello :)

A customer has had a DB on a VM (VMware) for a few months and is getting corrupted data. OE 10.2B08, SUSE Linux Enterprise Server 12 SP2, XFS file system for the data partition.

We did a binary dump and load (D/L), but before we could do that we had to rebuild the indexes on the meta schema (_File, _Index...).

A short time later new errors showed up in the DB log file. We enabled -DbCheck and -MemCheck, but we got more bad blocks.
A look into the Linux error logs showed nothing.

We see the following areas which may have a problem:
- XFS as the file system, although other end users run it too
- SAN
- Memory/CPU (hardware is about 5 years old)

We plan to do a text D/L to make sure everything is fresh. We will also go back to ext3, which was the file system of the previous installation.

Questions:
- Any concerns about XFS?
- Ever heard about damaged meta schema or damaged empty DB?
- Is it possible that a binary dump is defective? Like a backup with probkup, which may back up defective blocks.
- Any other ideas?

kind regards - Klaus

Note: The error message from log file will follow soon.

Verified Answer
  • The customer changed the hardware on which the VM runs, but errors still occurred.

    Then he instantiated the old hardware as a VM, with the old Linux version and OE version.

    That worked.

    So it must be a combination of the whole software stack...

  • The customer changed the hardware, but errors still occur.

    He is not doing any of the bad things mentioned above :)

    Then he made a copy of the old hardware as a VM. And this finally runs.

    So it must be a combination of the software stack of the new server...

    Thanks to all

    Klaus


All Replies
  • I am sorry, text is in German, but message numbers are given :)

    [2017/03/27@14:22:06.929+0200] P-5038       T--147446016 I ABL    26: (1422)  SYSTEM ERROR: Index po-nummer in artkusta für recid 41520842 konnte nicht gelöscht werden. 

    The following messages repeat for a few blocks (2 blocks: 107280832 and 48955840).
    5 or 6 elements are affected (such as 184 in this example).

    [2017/03/27@15:10:25.802+0200] P-2275 T--147462400 I ABL 17: (4430) SYSTEM ERROR: Index 49, Block 107280832, Element-Nr. 184: Falsche Informationsgröße in einem Leaf Block.
    [2017/03/27@15:10:25.811+0200] P-2275 T--147462400 I ABL 17: (2816) vorherige Größe = 18, cs = 6, ks = 1, is = 191, Schlüsselanzahl = 184.
    [2017/03/27@15:10:25.821+0200] P-2275 T--147462400 I ABL 17: (14037) Fehlerdaten der Blockvalidierung für Index 49: nment ist 455, nlength ist 4117, level ist 1, aktueller Schlüssel ist 184, Offset ist 1673, func ist cxDoInsert
    [2017/03/27@15:10:25.821+0200] P-2275 T--147462400 I ABL 17: (14031) Ungültiger Indexblock gefunden
    ...
    [2017/03/27@15:10:25.832+0200] P-2275       T--147462400 F ABL    17: (14036) SYSTEM ERROR: Ungültiger Indexblock FATAL 
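
To see at a glance which blocks and elements the (4430) messages touch, the log lines can be grouped by block number. The following is only a minimal sketch in Python: the regular expressions are derived from the few lines quoted in this thread, not from any official description of the log format, and the function name `summarize_4430` is made up for illustration.

```python
import re
from collections import defaultdict

# Prefix of a database log line as it appears in this thread (an assumption
# based on the quoted lines only):
# [timestamp] P-<pid> T-<tid> <severity> <source> <user no.>: (<msg no.>) <text>
LOG_LINE = re.compile(
    r"\[(?P<ts>[^\]]+)\]\s+P-(?P<pid>\d+)\s+T-(?P<tid>-?\d+)\s+"
    r"(?P<sev>[A-Z])\s+(?P<src>\S+)\s+(?P<usr>\d+):\s+"
    r"\((?P<num>\d+)\)\s+(?P<text>.*)"
)
# Case-insensitive so that both the German and the English wording match.
BLOCK = re.compile(r"block\s+(\d+)", re.IGNORECASE)
ELEMENT = re.compile(r"element\D*(\d+)", re.IGNORECASE)


def summarize_4430(lines):
    """Map each block number named in a (4430) message to its element numbers."""
    affected = defaultdict(set)
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None or match.group("num") != "4430":
            continue  # not a log line, or a different message number
        text = match.group("text")
        block = BLOCK.search(text)
        element = ELEMENT.search(text)
        if block and element:
            affected[int(block.group(1))].add(int(element.group(1)))
    return dict(affected)


if __name__ == "__main__":
    sample = [
        "[2017/03/27@15:10:25.802+0200] P-2275 T--147462400 I ABL 17: (4430) "
        "SYSTEM ERROR: Index 49, Block 107280832, Element-Nr. 184: "
        "Falsche Informationsgröße in einem Leaf Block.",
        "[2017/03/27@15:10:25.821+0200] P-2275 T--147462400 I ABL 17: (14031) "
        "Ungültiger Indexblock gefunden",
    ]
    for block, elements in summarize_4430(sample).items():
        print(block, sorted(elements))
```

Feeding the whole log file to `summarize_4430` (for example via `open(path, encoding="utf-8")`) would show whether the errors stay confined to the two blocks mentioned above or spread over time.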

  • What sort of SAN?  Does the customer do things with snapshots?

    Does the customer use VMotion on this VM?

    Doing any of the above without a quiet point properly enabled seems like the most likely source of corruption to me.

    --
    Tom Bascom
    tom@wss.com
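
The quiet point mentioned above can be wrapped around a snapshot so that it is always released again, even if the snapshot step fails. This is a rough sketch in Python, not a tested procedure: it assumes the `proquiet` utility is on the PATH and is called as `proquiet <db> enable` / `proquiet <db> disable`; the database path and the snapshot step are hypothetical placeholders.

```python
import subprocess
from contextlib import contextmanager


def proquiet_command(db_path, action):
    """Build the proquiet command line for enabling or disabling a quiet point."""
    if action not in ("enable", "disable"):
        raise ValueError("action must be 'enable' or 'disable'")
    return ["proquiet", db_path, action]


@contextmanager
def quiet_point(db_path, run=subprocess.run):
    """Hold a quiet point on the database for the duration of the with-block.

    `run` is injectable so the logic can be exercised without a real database.
    """
    run(proquiet_command(db_path, "enable"), check=True)
    try:
        yield
    finally:
        # Always release the quiet point, even if the snapshot step raised.
        run(proquiet_command(db_path, "disable"), check=True)


def take_snapshot():
    """Hypothetical placeholder for the SAN or VMware snapshot step."""
    print("snapshot would be taken here")


if __name__ == "__main__":
    # "/db/prod/sports" is a made-up database path for illustration.
    with quiet_point("/db/prod/sports"):
        take_snapshot()
```

The `try`/`finally` is the point of the design: a quiet point that is left enabled after a failed snapshot keeps the database from writing.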

  • > text is in German, but message numbers are given :)

    Translation:

    [2017/03/27@14:22:06.929+0200] P-5038 T--147446016 I ABL 26: (1422) SYSTEM ERROR: Index po-nummer in artkusta for recid 41520842 could not be deleted.
    
    [2017/03/27@15:10:25.802+0200] P-2275 T--147462400 I ABL 17: (4430) SYSTEM ERROR: Index 49, block 107280832, element no. 184: bad info size in a leaf block.
    [2017/03/27@15:10:25.811+0200] P-2275 T--147462400 I ABL 17: (2816) prev size = 18, cs = 6, ks = 1, is = 191, key count = 184.
    [2017/03/27@15:10:25.821+0200] P-2275 T--147462400 I ABL 17: (14037) Index 49 block validation error data: nment is 455, nlength is 4117, level is 1, current key is 184, offset is 1673, func is cxDoInsert
    [2017/03/27@15:10:25.821+0200] P-2275 T--147462400 I ABL 17: (14031) Invalid Index Block Detected
    ...
    [2017/03/27@15:10:25.832+0200] P-2275 T--147462400 F ABL 17: (14036) SYSTEM ERROR: Invalid Index Block FATAL
    

    Can you dump the index block with dbkey 107280832?

  • Also, doing

    - backups with third-party or system backup tools while the database is in use, or

    - skipping crash recovery with the -F option

    can cause these sorts of errors.

  • Klaus, did you get to the bottom of this problem?


  • Thanks Klaus. Appreciate the quick response.