Just an FYI for people out there running 11.6.3: I found a bug that TS says is new in 11.6.3 and is also present in 11.7.0.
I found it when running a promon script, but I can also reproduce it interactively. It causes promon to crash with error 49 (memory violation). The promon session holds a lock on the MTX latch when it crashes, so the database shuts down abnormally.
I haven't yet narrowed down the minimum steps to recreate the crash, but I have a script that reproduces it reliably. The content of the script (with annotations) is below.
m Modify defaults
1 page size
9999
q
1 User control
1 all users
q
4 Record locking table
1 all users
q
5 Activity
q
6 Shared resources
q
7 Database status
q
R&D
debghb
5 Adjust monitor options
1 Display page length
9999
6 Number of auto repeats
3
t Main menu
1 Status displays
4 Processes/clients
2 Blocked clients
p
3 Active transactions
p
p
9 BI log
p
10 AI log
p
12 Startup parameters
p
13 Shared resources
p
14 Shared memory segments
p
17 Servers by broker
t Main menu
2 Activity displays
3 Buffer cache
p
5 BI log
p
6 AI log
p
10 Space allocation
p
13 Other
t Main menu
3 Other displays
1 Performance indicators
p
4 Checkpoints
t Main menu
6 Hidden menu
8 Resource queues
p
11 Latch counts
x Exit
You would have to remove the annotations to have a functioning script for promon stdin. I showed them here for clarity.
Workaround: if "debghb" is moved from its location above to the latest possible point, i.e. right after the last "t" (six lines from the bottom) and before "6", then promon does not crash and the DB does not shut down. I hope that makes sense.
For those who are interested, I will post an update here when the 11.6.3 hotfix is available.
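Since the annotations have to be removed before the script can be fed to promon's stdin, both that cleanup and the "debghb" workaround can be scripted. This is a minimal Python sketch, assuming each annotated line has the form `<keystroke> <annotation>` (or just `<keystroke>`), so the keystroke is the first whitespace-delimited token. The names `strip_annotations`, `move_debghb_late` and `SAMPLE` are hypothetical, not part of any Progress tool.

```python
from typing import List

# Hypothetical helpers for the annotated promon script in this thread.
# Assumption: each line is "<keystroke> <annotation>", or just "<keystroke>".


def strip_annotations(annotated: str) -> List[str]:
    """Keep only the keystroke (first token) of each non-blank line."""
    return [line.split()[0] for line in annotated.splitlines() if line.strip()]


def move_debghb_late(keys: List[str]) -> List[str]:
    """Workaround from the thread: move "debghb" to just after the last "t".

    In the full script that "t" is six lines from the bottom, right before
    "6" (Hidden menu).
    """
    if "debghb" not in keys:
        return list(keys)
    rest = [k for k in keys if k != "debghb"]
    if "t" not in rest:
        return list(keys)  # nothing to anchor on; leave the script alone
    last_t = len(rest) - 1 - rest[::-1].index("t")
    return rest[: last_t + 1] + ["debghb"] + rest[last_t + 1 :]


# Short excerpt of the annotated script (not the whole thing).
SAMPLE = """\
R&D
debghb
5 Adjust monitor options
t Main menu
6 Hidden menu
11 Latch counts
x Exit
"""

keys = move_debghb_late(strip_annotations(SAMPLE))
print("\n".join(keys))  # one keystroke per line, for promon's stdin
```

For the excerpt, the result is `R&D`, `5`, `t`, `debghb`, `6`, `11`, `x`. The full annotated script can be pasted in place of `SAMPLE`; the joined output is what would then be redirected into promon.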
Is it the same issue as defect PSC00356177?
community.progress.com/.../30418
George P reported something that sounds similar on Solaris 64-bit. What OS did you see this on?
George: it sounds similar, but TS said this issue is not present in 11.6.2 or earlier. I'll try to repro on Linux 11.6.3.
CJ: Sorry, I should have given the platform. I encountered this in 64-bit OE 11.6.3 on Linux x64, but I have also reproduced it in 64-bit 11.6.3 on Windows 7.
Update:
I can reproduce the error 49 by following George's steps from the other thread, so this might be related.
I'll try my steps again without R&D 1 14 and see if that changes anything.
> The promon session holds a lock on the MTX latch when it crashes
Are you getting the error 5028 for latchId 1 (MTX) or 2 (USR)?
Update 2:
My script above runs without crashing when the R&D 1 14 step is removed, so it does indeed look like the same bug George reported, though it is cross-platform.
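Removing the R&D 1 14 step from the annotated script means dropping both the "14 Shared memory segments" line and the "p" that follows it: that "p" only returns from the display to the Status displays menu, so leaving it behind would back out one menu too far and derail the remaining keystrokes. A minimal Python sketch, assuming one annotated keystroke per line and matching the step by its annotation text; `drop_shared_memory_segments` and `STATUS_DISPLAYS` are hypothetical names.

```python
from typing import List

# Hypothetical filter; assumes one "<keystroke> <annotation>" entry per line.


def drop_shared_memory_segments(annotated: str) -> List[str]:
    """Remove the R&D 1 14 display and the "p" that returns from it."""
    out: List[str] = []
    skip_next_p = False
    for line in annotated.splitlines():
        if not line.strip():
            continue
        tokens = line.split()
        if tokens[0] == "14" and "Shared memory segments" in line:
            skip_next_p = True  # also drop the "p" that closes this display
            continue
        if skip_next_p and tokens == ["p"]:
            skip_next_p = False
            continue
        skip_next_p = False
        out.append(line)
    return out


# Excerpt from the Status displays part of the script.
STATUS_DISPLAYS = """\
12 Startup parameters
p
13 Shared resources
p
14 Shared memory segments
p
17 Servers by broker
"""

for line in drop_shared_memory_segments(STATUS_DISPLAYS):
    print(line)
```

The annotations are kept so the step can be matched by name; they still have to be stripped before the result is fed to promon.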
> Are you getting the error 5028 for latchId 1 (MTX) or 2 (USR)?
The (5028) error was: SYSTEM ERROR: Releasing regular latch. latchId: 2
I'm confused. I thought MTX was 2 and USR was 3.
for each _latch no-lock:
display _latch-id _latch-name.
end.
1 0
2 MTL_MTX
3 MTL_USR
4 MTL_OM
5 MTL_BIB
...
_Latch._Latch-Id = real LatchID + 1
Common Progress rule: "plus or minus one" does not matter. ;-)
The MTX latch was the first and only latch in the V5 Progress db, and it was called the MT lock.
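The `_Latch._Latch-Id = real LatchID + 1` offset can be applied mechanically when reading a (5028) message from the db log. The Python sketch below hard-codes only the `_Latch` rows shown in the query output above and assumes the (5028) text format quoted in this thread. The (5029) example quoted later carries a latchId that is clearly not a small index, so the sketch deliberately ignores 5029. `VST_LATCH_NAMES`, `vst_latch_id` and `latch_name_from_5028` are hypothetical names.

```python
import re
from typing import Optional

# _Latch-Id -> _Latch-Name, copied from the "for each _latch" output above.
# _Latch-Id 1 (the nameless latch 0) is left out.
VST_LATCH_NAMES = {2: "MTL_MTX", 3: "MTL_USR", 4: "MTL_OM", 5: "MTL_BIB"}

# Assumed message format, taken from the (5028) text quoted in this thread.
MSG_5028 = re.compile(r"\(5028\) SYSTEM ERROR: Releasing regular latch\. latchId: (\d+)")


def vst_latch_id(real_latch_id: int) -> int:
    """_Latch._Latch-Id = real LatchID + 1."""
    return real_latch_id + 1


def latch_name_from_5028(log_line: str) -> Optional[str]:
    """Return the _Latch-Name for the latchId in a (5028) message, if known."""
    match = MSG_5028.search(log_line)
    if match is None:
        return None
    return VST_LATCH_NAMES.get(vst_latch_id(int(match.group(1))))


print(latch_name_from_5028("(5028) SYSTEM ERROR: Releasing regular latch. latchId: 2"))
```

With the offset, the "latchId: 2" reported above maps to _Latch-Id 3, i.e. MTL_USR, which matches George's "2 (USR)".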
> Common Progress rule: "plus or minus one" does not matter. ;-)
Well, I learned something new so today's a good day. :)
> _Latch._Latch-Id = real LatchID + 1
I believe you, but this is non-obvious. When the 5028 says "latchid" I expect it to mean "_latch-id".
I'm aware of such cases in other tables, like _connect-id = _connect-usr + 1. It seems like _Latch is missing a field like "_latch-num" to hold the "real" number that shows up in the db log. And the 5028 message should be reworded.
> And the 5028 message should be reworded.
And what about the 5029? ;-)
(5029) SYSTEM ERROR: Releasing multiplexed latch. latchId: 1489504328
It's the BHT latch, by the way. :-)
> 1 0
> 2 MTL_MTX
BTW, Progress does use memory for the nameless latchId 0, even though it's not a real latch.
The issue is that when the super secret "debghb" setting is on in promon, examining "14. Shared memory segments" has the adverse side effect of zeroing a pointer it should not be zeroing.
The next reference to this pointer will cause promon to crash.
Depending on when the pointer is accessed, promon may be holding a resource at that moment, and dying while holding it can crash the database. This is seen at disconnection, but it could happen sooner depending on the activity performed.
And yes George, I believe it is the same issue, and the fix is available in HotFix 11.6.3.017.
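Why a promon crash can take the whole database down is easier to see in a toy model. The Python sketch below is a simplified illustration, not Progress internals: every name is hypothetical, a latch is reduced to an owner field, and the cleanup rule (a process that dies while still owning a latch forces an abnormal shutdown, presumably because whatever that latch protects may have been left half-updated) is an assumption consistent with the behavior described in this thread.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Toy model only; none of these names exist in Progress.


@dataclass
class SharedMemory:
    # latch name -> pid of the current holder (None when free)
    latches: Dict[str, Optional[int]] = field(default_factory=dict)

    def acquire(self, name: str, pid: int) -> None:
        if self.latches.get(name) is not None:
            raise RuntimeError(f"{name} already held")
        self.latches[name] = pid

    def release(self, name: str, pid: int) -> None:
        if self.latches.get(name) != pid:
            raise RuntimeError(f"{pid} does not hold {name}")
        self.latches[name] = None


def cleanup_dead_process(shm: SharedMemory, pid: int) -> str:
    """Decide what to do after process `pid` dies unexpectedly."""
    held = [name for name, owner in shm.latches.items() if owner == pid]
    if held:
        # Structures guarded by these latches may be inconsistent.
        return "abnormal shutdown (died holding " + ", ".join(held) + ")"
    return "clean up user slot, database stays up"


shm = SharedMemory()
shm.acquire("MTL_USR", pid=1234)  # the session takes a latch...
# ...then dereferences the zeroed pointer and dies before releasing it.
print(cleanup_dead_process(shm, 1234))
```

If the same process had died after `release`, the model would only clean up its slot, which is why the timing of the bad pointer reference matters.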
Thanks Rich and George. That confirms how I should edit my script for safety until I have the fix.
> On Apr 19, 2017, at 4:12 PM, George Potemkin wrote:
>
> MTX latch was a first latch in Progress db and it was called db lock.
wrong. sorry george. you get a red card. :)
the first release to have shared memory was 5.2A. In that release, there were two memory locks: the DB lock and the MTX lock.
The MTX lock served a purpose similar to what it does today although the implementation was quite different. For various reasons, /all/ database writes were performed while holding the MTX lock.
The DB lock was used to lock the entire shared memory region when any shared data structure was accessed or modified.
Some other fun facts about the ancient version 5:
* max segment size was 8 MB
* max -B was 32,000
* db and bi block size was 1 kb
* lock table size was limited to 32 kb
* bi cluster size was fixed at 16 kb
* TP1 benchmark performance was about 10 tps
* no data servers
* 4GL could connect to only one database at a time
* there were no internal procedures in 4GL
-gus