Urgent: Database Crashes

Posted by oedev on 21-Feb-2020 19:12

The customer's database has crashed 3 times this afternoon. The following errors appear in the log file.

[2020/02/21@16:38:52.050+0000] P-5044 T-8168 I SRV 1: (7129) Usr 80 set name to ulverston.
[2020/02/21@16:44:05.755+0000] P-4060 T-8172 I SRV 6: (14676) bkioRead:Unknown O/S error during Read, errno 13, fd 1000, len 4096, offset 1997278, file E:\Customer\Database\data_7.d2 database e:\Customer\database\data. (9451)
[2020/02/21@16:44:05.755+0000] P-4060 T-8172 I SRV 6: (9446) SYSTEM ERROR: bkioRead: Bad file descriptor was used during Read, fd 1000, len 4096, offset 1997278, file E:\Customer\Database\data_7.d2.
[2020/02/21@16:44:05.756+0000] P-4060 T-8172 I SRV 6: (9445) SYSTEM ERROR: read wrong dbkey at offset 8180850688 in file E:\Customer\Database\data_7.d2
found 1524695, expected 145832864, retrying. area 7
[2020/02/21@16:44:06.768+0000] P-4060 T-8172 I SRV 6: (9446) SYSTEM ERROR: bkioRead: Bad file descriptor was used during Read, fd 1000, len 4096, offset 2386934, file E:\Customer\Database\data_7.d2.
[2020/02/21@16:44:06.768+0000] P-4060 T-8172 I SRV 6: (9445) SYSTEM ERROR: read wrong dbkey at offset 9776881664 in file E:\Customer\Database\data_7.d2
found 1269495, expected 158301856, retrying. area 7
[2020/02/21@16:44:15.669+0000] P-5340 T-3056 I BIW 11: (9450) bkioWrite:Insufficient disk space during write, fd 984, len 4096, offset 60928, file E:\Customer\Database\data.b1.

Followed by

[2020/02/21@17:32:31.039+0000] P-4812 T-3848 I APW 13: (9450) bkioWrite:Insufficient disk space during write, fd 1072, len 4096, offset 79280, file E:\Customer\Database\data_11.d2.
[2020/02/21@17:32:31.039+0000] P-4812 T-3848 I APW 13: (9450) bkioWrite:Insufficient disk space during write, fd 1072, len 4096, offset 79280, file E:\Customer\Database\data_11.d2.
[2020/02/21@17:32:31.040+0000] P-4812 T-3848 I APW 13: (9450) bkioWrite:Insufficient disk space during write, fd 1072, len 4096, offset 79280, file E:\Customer\Database\data_11.d2.
[2020/02/21@17:32:31.041+0000] P-4812 T-3848 I APW 13: (9450) bkioWrite:Insufficient disk space during write, fd 1072, len 4096, offset 79280, file E:\Customer\Database\data_11.d2.
[2020/02/21@17:32:31.041+0000] P-4812 T-3848 I APW 13: (9450) bkioWrite:Insufficient disk space during write, fd 1072, len 4096, offset 79280, file E:\Customer\Database\data_11.d2.
[2020/02/21@17:32:31.042+0000] P-4812 T-3848 I APW 13: (9450) bkioWrite:Insufficient disk space during write, fd 1072, len 4096, offset 79280, file E:\Customer\Database\data_11.d2.
[2020/02/21@17:32:31.042+0000] P-4812 T-3848 I APW 13: (9450) bkioWrite:Insufficient disk space during write, fd 1072, len 4096, offset 79280, file E:\Customer\Database\data_11.d2.
[2020/02/21@17:32:31.042+0000] P-4812 T-3848 I APW 13: (9450) bkioWrite:Insufficient disk space during write, fd 1072, len 4096, offset 79280, file E:\Customer\Database\data_11.d2.
[2020/02/21@17:32:31.043+0000] P-4812 T-3848 I APW 13: (9450) bkioWrite:Insufficient disk space during write, fd 1072, len 4096, offset 79280, file E:\Customer\Database\data_11.d2.
[2020/02/21@17:32:31.043+0000] P-4812 T-3848 I APW 13: (9450) bkioWrite:Insufficient disk space during write, fd 1072, len 4096, offset 79280, file E:\Customer\Database\data_11.d2.
[2020/02/21@17:32:31.044+0000] P-4812 T-3848 I APW 13: (9450) bkioWrite:Insufficient disk space during write, fd 1072, len 4096, offset 79280, file E:\Customer\Database\data_11.d2.
[2020/02/21@17:32:31.044+0000] P-4812 T-3848 I APW 13: (9450) bkioWrite:Insufficient disk space during write, fd 1072, len 4096, offset 79280, file E:\Customer\Database\data_11.d2.
[2020/02/21@17:32:31.045+0000] P-4812 T-3848 I APW 13: (9450) bkioWrite:Insufficient disk space during write, fd 1072, len 4096, offset 79280, file E:\Customer\Database\data_11.d2.
[2020/02/21@17:32:31.045+0000] P-4812 T-3848 I APW 13: (9450) bkioWrite:Insufficient disk space during write, fd 1072, len 4096, offset 79280, file E:\Customer\Database\data_11.d2.
[2020/02/21@17:32:31.046+0000] P-4812 T-3848 I APW 13: (9450) bkioWrite:Insufficient disk space during write, fd 1072, len 4096, offset 79280, file E:\Customer\Database\data_11.d2.
[2020/02/21@17:32:31.046+0000] P-4812 T-3848 I APW 13: (9450) bkioWrite:Insufficient disk space during write, fd 1072, len 4096, offset 79280, file E:\Customer\Database\data_11.d2.
[2020/02/21@17:32:31.046+0000] P-4812 T-3848 I APW 13: (9450) bkioWrite:Insufficient disk space during write, fd 1072, len 4096, offset 79280, file E:\Customer\Database\data_11.d2.
[2020/02/21@17:32:31.047+0000] P-4812 T-3848 I APW 13: (9450) bkioWrite:Insufficient disk space during write, fd 1072, len 4096, offset 79280, file E:\Customer\Database\data_11.d2.
[2020/02/21@17:32:31.047+0000] P-4812 T-3848 I APW 13: (9450) bkioWrite:Insufficient disk space during write, fd 1072, len 4096, offset 79280, file E:\Customer\Database\data_11.d2.
[2020/02/21@17:32:31.048+0000] P-4812 T-3848 I APW 13: (9450) bkioWrite:Insufficient disk space during write, fd 1072, len 4096, offset 79280, file E:\Customer\Database\data_11.d2.
[2020/02/21@17:32:31.048+0000] P-4812 T-3848 I APW 13: (9450) bkioWrite:Insufficient disk space during write, fd 1072, len 4096, offset 79280, file E:\Customer\Database\data_11.d2.
[2020/02/21@17:32:31.049+0000] P-4812 T-3848 I APW 13: (9450) bkioWrite:Insufficient disk space during write, fd 1072, len 4096, offset 79280, file E:\Customer\Database\data_11.d2.
[2020/02/21@17:32:31.049+0000] P-4812 T-3848 I APW 13: (9450) bkioWrite:Insufficient disk space during write, fd 1072, len 4096, offset 79280, file E:\Customer\Database\data_11.d2.
[2020/02/21@17:32:31.050+0000] P-4812 T-3848 I APW 13: (9450) bkioWrite:Insufficient disk space during write, fd 1072, len 4096, offset 79280, file E:\Customer\Database\data_11.d2.
[2020/02/21@17:32:31.050+0000] P-4812 T-3848 I APW 13: (9450) bkioWrite:Insufficient disk space during write, fd 1072, len 4096, offset 79280, file E:\Customer\Database\data_11.d2.
[2020/02/21@17:32:31.050+0000] P-4812 T-3848 F APW 13: (3645) bkwrite: write to disk failed errno 0.

This crashes the database and the admin service.

Restarting the database brings it back up without issue.

I will log this with Tech Support when back in the office, but I'm after any immediate input on possible causes.
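For anyone wanting to condense a log like the one above, here is a minimal sketch that groups entries by process type, severity and message number. It assumes every entry follows the layout seen in the excerpt (timestamp in brackets, P- and T- ids, severity letter, process name and number, message number in parentheses). The names LOG_LINE, summarize and sample are made up for illustration and are not part of any OpenEdge tooling.

```python
import re
from collections import Counter

# Pattern inferred from the excerpt above; an assumption, not an official grammar.
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+P-(?P<pid>\d+)\s+T-(?P<tid>\d+)\s+"
    r"(?P<sev>\w)\s+(?P<proc>\w+)\s+(?P<num>\d+):\s+"
    r"\((?P<msg>\d+)\)\s+(?P<text>.*)$"
)


def summarize(lines):
    """Count entries per (process type, severity, message number)."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m is None:
            # Continuation lines such as "found ..., expected ..." are skipped.
            continue
        counts[(m["proc"], m["sev"], m["msg"])] += 1
    return counts


# A few lines copied from the excerpt above.
sample = [
    r"[2020/02/21@16:38:52.050+0000] P-5044 T-8168 I SRV 1: (7129) Usr 80 set name to ulverston.",
    r"found 1524695, expected 145832864, retrying. area 7",
    r"[2020/02/21@16:44:15.669+0000] P-5340 T-3056 I BIW 11: (9450) bkioWrite:Insufficient disk space during write, fd 984, len 4096, offset 60928, file E:\Customer\Database\data.b1.",
    r"[2020/02/21@17:32:31.039+0000] P-4812 T-3848 I APW 13: (9450) bkioWrite:Insufficient disk space during write, fd 1072, len 4096, offset 79280, file E:\Customer\Database\data_11.d2.",
    r"[2020/02/21@17:32:31.040+0000] P-4812 T-3848 I APW 13: (9450) bkioWrite:Insufficient disk space during write, fd 1072, len 4096, offset 79280, file E:\Customer\Database\data_11.d2.",
    r"[2020/02/21@17:32:31.050+0000] P-4812 T-3848 F APW 13: (3645) bkwrite: write to disk failed errno 0.",
]

summary = summarize(sample)
for key, count in sorted(summary.items()):
    print(key, count)
```

Run against the full log file, this would show at a glance which process reported which message number and how often.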

All Replies

Posted by ChUIMonster on 21-Feb-2020 19:34

The error messages are explicitly saying that you are out of disk space.  Do you have any reason to think that is untrue?

Errno 13 is "permission denied".
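As a quick way to check an errno value, a small sketch using Python's standard errno module follows. It only looks up the C runtime's meaning of the number and makes no claim about how OpenEdge itself maps errno values to message text. The variable names are illustrative.

```python
import errno
import os

code = 13  # the errno value reported by bkioRead in the log

name = errno.errorcode[code]   # symbolic name for the number
message = os.strerror(code)    # message text from the platform's C runtime

print(code, name, message)
```

On the usual platforms 13 corresponds to EACCES.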

fd 1000 -- that is a pretty high number of file descriptors.  How many extents does this database have?

What version of OpenEdge?

This is apparently running on Windows?  If you have allowed a virus scanner to run, the file locking can result in spurious out of space and permission denied messages.

Posted by gus bjorklund on 21-Feb-2020 20:20

also, what, if anything, has been changed on this system just before the trouble started?

Posted by oedev on 21-Feb-2020 22:42

Thanks for the responses.

- plenty of disk space

- not aware of any recent changes on system

- will get customer to confirm re virus checking

- 11.6 on windows

- did see some low memory errors on server

St file as follows, apologies for poor formatting

# AI Extents
a . f 2048000
a . f 2048000
a . f 2048000
a . f 2048000
a . f 2048000
a . f 2048000
a . f 2048000
a . f 2048000
a . f 2048000
# BI Logfile
b . f 512000
b .
#
d "Schema Area":6,32;1 . f 512000
d "Schema Area":6,32;1 .
#
# Default Data Area 32 Records Per Block
#
d "Default32RPB":7,32;8 . f 10240000
d "Default32RPB":7,32;8 . f 10240000
d "Default32RPB":7,32;8 .
#
# Order and Order line 16 Records Per Block
#
d "Order":8,16;8 . f 4096000
d "Order":8,16;8 .
#
# Audit Detail and Process Log Detail Tables 64 Records Per Block
#
d "Logfiles":9,64;8 . f 4096000
d "Logfiles":9,64;8 .
#
# Large Record Area for generally large XML docs and as such this extent is 1 Record Per Block.  Trading Audit Data and RETAIL_RECEIPT_DETAIL
#
d "Large1RPB":10,1;8 . f 10240000
d "Large1RPB":10,1;8 . f 10240000
d "Large1RPB":10,1;8 .
#
# All indexes 1 Record Per Block as per Progress Best Practice
#
d "Index Area":11,1;8 . f 10240000
d "Index Area":11,1;8 .
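To answer the question about the number of extents, here is a rough sketch that counts the extent lines of the structure file above per area. The regular expression covers only the line shapes that appear in this post and is not a full parser for structure description files. The labels "AI" and "BI" and the names EXTENT_LINE and count_extents are made up for the example.

```python
import re
from collections import Counter

# Extent lines copied from the structure file above (comment lines left out).
ST_TEXT = '''
a . f 2048000
a . f 2048000
a . f 2048000
a . f 2048000
a . f 2048000
a . f 2048000
a . f 2048000
a . f 2048000
a . f 2048000
b . f 512000
b .
d "Schema Area":6,32;1 . f 512000
d "Schema Area":6,32;1 .
d "Default32RPB":7,32;8 . f 10240000
d "Default32RPB":7,32;8 . f 10240000
d "Default32RPB":7,32;8 .
d "Order":8,16;8 . f 4096000
d "Order":8,16;8 .
d "Logfiles":9,64;8 . f 4096000
d "Logfiles":9,64;8 .
d "Large1RPB":10,1;8 . f 10240000
d "Large1RPB":10,1;8 . f 10240000
d "Large1RPB":10,1;8 .
d "Index Area":11,1;8 . f 10240000
d "Index Area":11,1;8 .
'''

# Matches only the shapes seen above: type, optional quoted area name with
# its number and settings, a path, and an optional fixed size.
EXTENT_LINE = re.compile(
    r'^(?P<type>[abd])\s+'
    r'(?:"(?P<area>[^"]+)":(?P<num>\d+)\S*\s+)?'
    r'(?P<path>\S+)'
    r'(?:\s+f\s+(?P<size>\d+))?$'
)

# Illustrative labels for extent types that carry no area name.
TYPE_LABELS = {"a": "AI", "b": "BI"}


def count_extents(st_text):
    """Return a Counter of extents per area (or per AI/BI type)."""
    counts = Counter()
    for raw in st_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        m = EXTENT_LINE.match(line)
        if m is None:
            raise ValueError(f"unrecognised structure line: {line!r}")
        label = m["area"] or TYPE_LABELS[m["type"]]
        counts[label] += 1
    return counts


extents = count_extents(ST_TEXT)
total = sum(extents.values())
for label, n in extents.items():
    print(f"{label}: {n}")
print("total extents:", total)
```

The total can then be compared with the fd values reported in the log (984, 1000, 1072).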

Posted by oedev on 24-Feb-2020 09:14

Still investigating this issue. The system was fine over the weekend after a server reboot.

Just before the crash, memory use on the server was 14 GB of 16 GB; usually only 3 to 4 GB is in use.

We deployed a new software library to the server on Wednesday which is referenced by some Webspeed agents. However, the agents were not restarted. Could this be the cause? We've not restarted the Webspeed broker as a precaution.
