Make the PASOE logs less noisy in the event of (fully-handle

Posted by dbeavon on 28-Mar-2019 21:46

We are migrating to PASOE and the logs are quite noisy since STOP conditions happen more frequently than the did on classic appserver.

We haven't quite upgraded to the new S.E.H. model of handling lock conflicts (-catchStop 1).  The ABL code still has "ON STOP" qualifiers around blocks of code that are likely to fail because of lock conflicts.  eg.

/* Possible problem                    */

DEFINE VARIABLE m_PossibleProblem AS CHARACTER NO-UNDO.
m_PossibleProblem = "Possible lock conflict on table.".


/* Error is unexpected        */

DEFINE VARIABLE m_ErrorUnex AS CHARACTER NO-UNDO.
m_ErrorUnex = "Unexpected error in " + PROGRAM-NAME(1) + " at line {&LINE-NUMBER}. " + m_PossibleProblem. 


/* BlockForTransaction  */

BlockForTransaction:
DO TRANSACTION
   ON STOP   UNDO BlockForTransaction,
      RETURN ERROR m_ErrorUnex
   ON ERROR  UNDO BlockForTransaction,
      RETURN ERROR m_ErrorUnex
   ON ENDKEY UNDO BlockForTransaction,
      RETURN ERROR m_ErrorUnex:

... whatever code results in lock conflicts...

END.

When we encounter lock conflicts, they raise the STOP condition that causes the unexpected error to be returned (RETURN ERROR m_ErrorUnex).  The end result is that the client/calling program receives the necessary details.  In some cases this is a remote openclient.

Notice that the exceptions are fully handled (not "swallowed").  They are properly captured and sent back to the client/caller.  Given that this is the case, it doesn't seem helpful that the PASOE prints this event as a critical message in the agent logs.  It is printed without having any log entry types selected.  Here is the message, formatted as shown based on the options -errorstack and -debugalert.  Note the scary messages (8812) and (1750)

[19/03/28@10:41:21.287-0400] P-002404 T-006936 1 AS-492 -- Lock wait timeout of 10 seconds expired (8812)
[19/03/28@10:41:21.287-0400] P-002404 T-006936 1 AS-492 -- ** ABL Debug-Alert Stack Trace **
[19/03/28@10:41:21.287-0400] P-002404 T-006936 1 AS-492 -- --> FetchAsyncDataTransaction app/p/app0479.p at line 594  (\\grnetappm03\oe_prod\OpenEdge\USW\LumberTrack\app\p\app0479.r)
[19/03/28@10:41:21.287-0400] P-002404 T-006936 1 AS-492 --     FetchAsyncData app/p/app0479.p at line 501  (\\grnetappm03\oe_prod\OpenEdge\USW\LumberTrack\app\p\app0479.r)
[19/03/28@10:41:21.287-0400] P-002404 T-006936 1 AS-492 -- Could not read table 685 for undoing DBKEY 131307145 partitionId 0.
[19/03/28@10:41:21.287-0400] P-002404 T-006936 1 AS-492 -- ** ABL Debug-Alert Stack Trace **
[19/03/28@10:41:21.287-0400] P-002404 T-006936 1 AS-492 -- --> FetchAsyncDataTransaction app/p/app0479.p at line 843  (\\grnetappm03\oe_prod\OpenEdge\USW\LumberTrack\app\p\app0479.r)
[19/03/28@10:41:21.287-0400] P-002404 T-006936 1 AS-492 --     FetchAsyncData app/p/app0479.p at line 501  (\\grnetappm03\oe_prod\OpenEdge\USW\LumberTrack\app\p\app0479.r)


Is there a common mechanism by which we can avoid printing this (STOP stuff)  to the agent logs when the problems are already being handled in code?

In case it helps, our logging level and log entry types are as follows.

Logging level: 1

Log entry types: (blank)

In the general case we DO want critical messages in the logs. (***see PS)

But for modern programs that handle errors more consistently (using BLOCK-LEVEL ON ERROR UNDO, THROW) we don't need the extra noise in the logs.  It is redundant since we know that any problems will be sent back to client/callers via the callstack.

The only option I can think of is to change LOG-MANAGER:LOGGING-LEVEL to zero and then try to ensure that it gets changed back to 1 again at the end (which is an approach that is made more challenging based on the common occurrences of lock conflicts and other types of errors).

Any ideas would be appreciated.  It is unfortunate that the logs are noisy, considering that the errors are being fully handled.  It just means that the people responsible for reading them every day have a lot of unnecessary noise in the logs (and on top of that, it is very scary-looking noise ).

Thanks, David

*** In general we DO want messages in the logs because some of our legacy ABL code is good at ignoring errors and proceeding without any error handling whatsoever (based on the default ON ERROR behavior in ABL).  We have legacy programmers who are responsible for reviewing the logs after the fact (the next day) to see how scary the errors were, and how likely it is that the legacy ABL has corrupted data in the OE database.

All Replies

Posted by Laura Stern on 31-Mar-2019 12:43

I don’t understand why you’re saying that “STOP conditions happen more frequently” in the PASOE environment vs the classic AppServer.  That is absolutely not the case.  We do not handle error or stop conditions differently in the ABL in the 2 environments.  And you are not using the new Stop object capability so nothing at all should have changed.  

Maybe you are just getting lock conflicts more often.  Are there more concurrent users in your PASOE configuration?

Posted by Stefan Drissen on 01-Apr-2019 08:06

[quote user="Laura Stern"]Maybe you are just getting lock conflicts more often.  Are there more concurrent users in your PASOE configuration?[/quote]

Or it is the difference in default lock wait time out.

PASOE has 10 seconds (like classic WebSpeed) if I'm not mistaken and classic AppServer has 30 minutes. From a "user" point of view this results in PASOE having STOP conditions happening more frequently.

Posted by dbeavon on 01-Apr-2019 13:05

Yes, the default lock-wait-timeout (10 second vs 30 min) is what causes STOP's to happen more frequently in PASOE.

Now that STOP's are so much more common, we have started to introduce error-handling that is designed to consistently address the problem - or provide a better message to the user.   Given the explicit error-handling, these "STOP" conditions don't need to be so noisy in the agent logs. (They don't need to be any more noisy than if we were to raise and then handle our own custom errors - via UNDO, THROW NEW Progress.Lang.AppError ).

I was just wondering if it was possible to turn down the volume on these STOP's in the agent logs.  The only approach I can think of is a bit risky (LOG-MANAGER:LOGGING-LEVEL to zero and then try to ensure that it gets changed back to 1 again in the future).  It may have unintended consequences that extend beyond the "STOP" conditions.  Hope this makes sense.

Posted by Laura Stern on 01-Apr-2019 15:43

Maybe you should  just increase the value of -lkwtmo?  

When you say you've "handled" this condition, that is irrelevant.  If the error message is output, it will be put into the log.  Using DO ON STOP, or DO ON ERROR does not keep error messages from being output, therefore, they will go into the log.  Now, if you had caught the condition, and it still got output to the log (which might be the case for stop conditions), then you definitely have a case!  

Posted by dbeavon on 01-Apr-2019 16:34

I'm trying to wrap my head around that.  So my legacy handling of the STOP condition is "DO ON STOP UNDO, RETURN ERROR v_UnexpectedError".  You are saying that this type of STOP handling always sends the related error messages to the logs.

But if I use "-catchStop 1" and put a CATCH statement in there (for Progress.Lang.LockConflict) then you would expect that messages would no longer be sent to the logs. Am I getting that right?  

Conceptually the two don't seem that different to me.  They are both handling the STOP condition that is arising from a lock conflict.  I suppose that in the case of the "ON STOP" phrase, the ABL programmer doesn't have any of the related details (and there is no way to get at them).  But in the case of the CATCH statement the ABL programmer has all of the necessary details and they can determine for themselves whether or not they want to write them to a log file.

I was under the impression that the "traditional error handling model" would also suppress messages from the logs if you did certain tricks like using "NO-ERROR" after various types of statements (and proceeded to inspected the "error-status system handle" after the fact).  Maybe the NO-ERROR trick is a slightly different topic.  It appears that the NO-ERROR trick might have more of an impact on the agent logs than using "ON STOP" or "ON ERROR" phrases on a DO block.  

Posted by Laura Stern on 01-Apr-2019 17:22

It is really very simple.  If the error message is still "displayed" to the default output device, which in this case is log file.  End of story.  Whether it is "handled" or not has nothing to do with it.  The reason NO-ERROR would keep the error from going to the log is because in that case the message is not displayed to the default output device.  Simple.  It has always been this way.

If you use a CATCH block - again the message is no longer written to the output device.  There is a possibility that if it's a STOP condition we still write it out to the log even if it's caught.  I can't remember.  But what I meant in my last post was that if this happens, I can totally understand why you would consider that a bug and we would entertain the idea of fixing it.  I would be interested in other's opinions on this.

Posted by dbeavon on 01-Apr-2019 18:04

>> If you use a CATCH block - again the message is no longer written to the output device.  There is a possibility that if it's a STOP condition we still write it out to the log even if it's caught.  I can't remember.

I'll do some testing.  If it *IS* written to the log, then there is probably no easy way to suppress that message, right?

If it is *NOT* written to the log by default, then a developer could optionally write it themselves, via LOG-MANAGER.  The nice thing about the new LockConflict exception is that we should theoretically have all the details needed to write the message ourselves.  Whereas with the "traditional" error handling model's ON STOP phrase, you wouldn't have easy access to those lock-conflict details.

Our legacy ABL programs generate quite a lot of noise in the agent log.  I don't think that any error which is handled (or escapes to the client) should be printed to a log by default.  Or if they are printed then that should be done based on our own custom requirements (specified via error event handlers for first-chance exception & unhandled exceptions).   Of course if ABL is swallowing our errors by way of the default ON ERROR mechanism, then those types of things should continue to be printed to the logs.

Posted by Laura Stern on 01-Apr-2019 18:46

Re: If it *IS* written to the log, then there is probably no easy way to suppress that message, right?

If setting logginglevel to 0 doesn't work, then I believe that is correct - no way to suppress it.

Re: Our legacy ABL programs generate quite a lot of noise in the agent log.  I don't think that any error which is handled (or escapes to the client) should be printed to a log by default.

Sorry, I'm confused. The AVM cannot read your mind.  It has no idea that this error will be sent back to the client.  And in fact it is NOT going back to the client.  Once the error message is written to the output device, the error message is gone.  The application doesn't know what it was and even if an error condition is sent back to the client, the error message is not.  So how else are you going to know what went wrong if it is not in the log file?  

Posted by Laura Stern on 01-Apr-2019 18:46

Re: If it *IS* written to the log, then there is probably no easy way to suppress that message, right?

If setting logginglevel to 0 doesn't work, then I believe that is correct - no way to suppress it.

Re: Our legacy ABL programs generate quite a lot of noise in the agent log.  I don't think that any error which is handled (or escapes to the client) should be printed to a log by default.

Sorry, I'm confused. The AVM cannot read your mind.  It has no idea that this error will be sent back to the client.  And in fact it is NOT going back to the client.  Once the error message is written to the output device, the error message is gone.  The application doesn't know what it was and even if an error condition is sent back to the client, the error message is not.  So how else are you going to know what went wrong if it is not in the log file?  

Posted by dbeavon on 01-Apr-2019 19:43

>> It has no idea that this error will be sent back to the client.  

I guess I was just agreeing with you that it would be considered a bug if an error (LockConflict for example) is both *thrown* to the clients/callers *and* it is written to the agent log at the same time.  It seems redundant to do both.  No mind-reading required.  

If a client/caller has caught the error, then it is their right to swallow it without logging it (either while still on the ABL side, or on the .Net openclient side).

>>   And in fact it is NOT going back to the client.  Once the error message is written to the output device, the error message is gone.

The message details may be gone, but the stop condition itself has always able to be sent back to the openclient as Progress.Open4GL.Exceptions.RunTime4GLStopException.  As you say, there is very little detail in that exception to make sense of it.  It is basically just another generic "something went wrong" exception.  But if I catch a LockConflict and re-throw that as an AppError then I have captured all the underlying details I need.  There is no annoying noise needed in the log (or I can put it there myself if I really want that) .

Posted by Laura Stern on 01-Apr-2019 20:50

Actually, I was not thinking of this case.  What I meant before was that if a stop condition is caught and NOT re-thrown  then I could see why you would not want to see the error in the log.  It was completely handled.

However, in this case, even if the condition was originally caught, you are throwing it back to the client.  So at some level it is not being caught on the server side. Our original intent was to log any message associated with an uncaught Stop condition even if it was sent back to the client.  Interestingly, for the lock conflict case, we originally were NOT writing the message to the server log.  But we got push-back internally from various parties that this was a bug and I fixed it!  So now you definitely WILL get the lock-wait timeout message in the server log even if the stop object is sent back to the client.  

Personally, I think I agree with you on this!  You could log a bug and we can hash it out again.

But it would be REALLY helpful if others out there would weigh in on this.

Posted by dbeavon on 09-Apr-2019 15:24

Crickets. ;)

If I use catchStop -1, and catch it (Progress.Lang.LockConflict)  then my work is fully accomplished. I am fully in control of this type of ABL error condition, and my own code will fully handle both of the possible outcomes - the outcome where my data was available and the outcome where my data was not available.  What is the purpose of the logging in that case?  To take a step back , what is the intended purpose of the appserver logging in general?  From my perspective we should only be using the logging (after-the-fact) to discover where errors in ABL are being "swallowed" / "suppressed" / "ignored" by the ABL runtime (as a result of the default behavior of error conditions in certain ABL looping constructs).

In a certain sense the DO/CATCH(LockConflict) can be regarded as "syntactic sugar".  A similar thing could be easily be accomplished with manual programming.  For example, we could also write our own loop of ABL code which, for ten seconds, tries to perform a EXCLUSIVE-LOCK NO-WAIT and, after the timeout has expired, throws a custom "AppError" saying that there was a conflict.  In that scenario we would also be able to fully handle both of the possible outcomes.

The difference is that, in the case of the "manual" approach, it would NOT result in any scary messages in the appserver logs.  There would be no REASON for messages, since all outcomes are accounted for.  The messages would only serve to raise unnecessary alarms from a human operator.

Perhaps the first thing Progress needs to do is allow us a mechanism for first-chance error-handling.  It would be nice if we could attach a custom handler that is executed when errors are first raised, and perform a custom action (like spit messages via "LOG-MANAGER").  Then it would be totally *our* choice to print certain types of errors and not others.  In theory, a first-chance LockConflict is about as interesting as any other type of error you can think of (whether runtime errors or custom errors).  Assuming we were given a mechanism to capture first-chance errors, the current noisy STOP conditions could then be eliminated from the appservers logs by default.  Right?

I guess I still have some confusion about the noisy STOP's.  The confusion may originate in not understanding the purpose of appserver logging in general.  Is it the intention of Progress that there is a human who is monitoring these logs moment-by-moment and correcting problems after-the-fact?  That type of approach should be unnecessary and should be discouraged.  The ABL code should be made more capable, so that it can work in a well-defined way whether there is a lock conflict or not.  There should be no need for a human to be monitoring a log file and doing work that could otherwise be done by software.

Posted by Laura Stern on 15-Apr-2019 14:52

First of all, I just checked.  If you catch the lock conflict condition, we will not output the message to the log.  So a lot of what you are saying is moot.

But I believe the case that started this whole discussion was not one where the condition was caught.  It was one where the condition was thrown out from the top procedure running on an AppServer.  Here the condition is not caught on the server, but indeed, it is thrown back to the client where it presumably can get caught.  I believe I already said, from my prospective, you are preaching to the choir.  It doesn't seem necessary to me to put the information into the appserver log if all the information on what went wrong is going back to the client. If all the information is not going back to the client (as with .NET clients, and prior to v12 (or 11.7.x?), java clients, or before we serialized error objects back to ABL clients), then the information DOES need to be in the AppServer log or no one will know what went wrong - only that something has gone wrong.

But I wanted someone ELSE to chime in on this last point.

Posted by dbeavon on 17-Apr-2019 14:33

It would have been *really* nice if those new -catchStop 1 exceptions would auto-magically be marshalled back to the .Net openclients with a meaningful message (just like the occasional AppErrors that escape out of an ABL session in PASOE.)  

There is so *much* impedance going on here.  First of all the error condition (Progress.Lang.LockConflict) doesn't derive from normal error classes, or even implement the regular Progress.Lang.Error interface.  So any error-handling I've already written (unwillingly-written logic that translates from CATCH statements to RETURN ERROR statements) is totally useless.

Secondly, a lock conflict comes up so infrequently that NOBODY wants to write special logic for it.  It should be handled in the most generic possible way - one that takes advantage of the automagic "S.E.H" functionality which is built into the framework.

Thirdly our users on the .Net side get far too many indecipherable messages from our applications.  They don't want to see another unusual message which simply says that a “4GL STOP condition” occurred.  Even an ABL programmer doesn't really know what that means - and it might not even be a lock conflict in many cases.  If the error is happening because of a lock conflict, then in the very least it should say as much, and it should also give the name of the database record (and possibly the user that owns it).

I can understand that some of the work needed to improve things has to be done by the Progress team responsible for the .Net openclient.  And that can take time. But in the meantime we are in a fairly awkward state of limbo, and we're forced into doing a ton of manual effort for ABL developers to watch for "LockConflict" .... and translate it ...  so it can be consumed by our client applications.  As I said earlier, the manual effort will involve doubling-up all the generic error-handling that we've already written - so that we don't ignore another indecipherable set of generic errors (derived from the new Progress.Lang.Stop).

As far as the openclient goes, in the very least it would be nice if Progress could give us a roadmap, and an ETA for the changes that are coming.  Will the client only be available for OE 12?  Will it be fully backwards-compatible with PASOE running on 11.7?  Will the openclient for .Net match the functionality of the Java openclient?  Will there be any breaking changes requiring us to do work on our side?

By giving us some kind of a roadmap, it might help us determine if we really must invest in all the custom programming effort that is related to "LockConflict/Progress.Lang.Stop".

I suppose that we can put lock-wait timeouts back to 30 minutes (the way they were in classic appserver) and that "solves" the problem to some extent.  But now instead of an indecipherable message, the users get an experience where their applications become unresponsive for extremely long periods of time).

I agree that it would be nice for others to chime in and have some opinion about this.  But it might be that the people with the Java openclient don't care, because things are already working better for them on that side.  The only people who care about this might be the .Net openclient customers (however many there are).

Posted by Laura Stern on 17-Apr-2019 15:33

Re:  It would have been *really* nice if those new -catchStop 1 exceptions would auto-magically be marshalled back to the .Net openclients.

Yes, we agree.  I cannot argue with anything you're saying on this point. I think it is long overdue to be able to send back real error information to the client.  It is a pretty basic capability!  As you seem to know, that capability is already (as of 11.7.x) working that way for Java Open Clients.  It is just a matter of priorities and it is a product management decision.  

And we don't need others to chime in on this particular point.  The "chime in" request was about whether we should also put error messages in the AppServer log for errors that are sent back to the client WITH ALL of their information.  

Re: Secondly, a lock conflict comes up so infrequently that NOBODY wants to write special logic for it.

Well apparently that is no longer true (that it comes up so infrequently) or you wouldn't be complaining about noise in your error logs!  :-) I would say there's probably a happy medium between 30 minutes and 10 seconds!

This thread is closed