Stopping the PASOE service (sometimes this is taking extreme

Posted by dbeavon on 20-May-2019 13:34

Can someone tell me how long it takes for PASOE to stop on Windows?

We have PASOE registered as a windows service (not sure if the platform matters one way or the other).  Registering as a service is accomplished with a tcman command: (tcman service oepas1 register).  Once it is registered as a service in windows, it is managed by service control commands (net stop oepas1).  Or at least it is supposed to be.

Under normal circumstances I can use "net stop" commands and the service stops in less than three seconds. But when the service is under load, it refuses to stop, even after thirty seconds.  The tomcat service seems to be allowing new, overlapping client requests to arrive and be processed.  The tomcat server doesn't seem to interrupt any requests, nor disallow new requests.

Can someone please tell me if this is expected behavior?  How long until the shutdown process will forcefully disallow new connections?  I've already tried to take drastic measures myself to facilitate the shutting down of the service, for example during a shutdown request I've tried to kill msagent processes from the OS.  But that doesn't seem to help matters.  PASOE will not only *ignore* my shutdown request, but will launch *new* msagent processes to replace the missing ones.

How long until PASOE/tomcat actually "gets serious" about shutting down, and it stops taking incoming requests from new clients?  What is the protocol for this? Why is the shutdown operation delayed by new client requests?  Is the problem related to the integration with the service control manager on windows?  Is it necessary to introduce some custom steps to force a shutdown (like using oemanager to stop some PASOE web applications first)?  Or maybe I should also be using the tomcat manager API to expire all active HTTP sessions?

It would be nice if there was some configuration that says the tomcat process will kill itself off after a predetermined number of seconds, even when clients are connected, and while new clients are attempting to connect.

Perhaps the problem is not related to PASOE in particular, but affects *all* tomcat-hosted applications?  Google seems to say that other applications which use tomcat can be affected by this issue:

https://www.google.com/search?q=tomcat+service+taking+too+long+to+stop

If anyone else has worked on this problem in the past, I would greatly appreciate some tips on how to forcefully stop the PASOE service in under 30 seconds.

Posted by dbeavon on 07-Jun-2019 14:54

We still continue to experience long delays when trying to stop the tomcat (oepas1) service.  It doesn't want to stop in a reasonable amount of time like 10 or 20 seconds.  (To make the requests to stop the service, we are using the service control manager in windows).

I did more research and discovered that PASOE configures tomcat without any shutdown timeout (see image below).  You can open the tomcat service configuration via a utility:

C:\Progress\OpenEdge\servers\pasoe\bin\tomcat8w.exe //MS//oepas1

... so I suppose I can spend some time playing with the shutdown timeout.  But this may not lead anywhere, or at least it won't get to the root of the problem.  I think that tomcat would shut itself down gracefully if it could (and shouldn't need me to tell it that anything more than 30 seconds is unreasonable).  But there is some internal integration issue with pasoe functionality that is causing a delay.  I suspect tomcat is designed so that when it reaches the shutdown timeout, it will probably kill itself, and whatever pasoe was trying to do at that moment will be aborted.  The abort is what worries me.  It would be nice to know what is at stake in using this approach, and why it is necessary to take such a drastic approach in the first place.

Is there a particular log file that I should be paying attention to, when it comes to the shutdown sequence?  The file named catalina.2019-06-07.log seems promising.... 

Here is what that file looks like when tomcat is able to shut itself down in a normal amount of time.

07-Jun-2019 10:09:15.186 INFO [Thread-14] com.progress.appserv.services.lifecycle.OeLifecycleListener.runStoppingScripts Running stopping scripts
07-Jun-2019 10:09:15.679 INFO [Thread-14] org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler ["http-nio-8810"]
07-Jun-2019 10:09:15.744 INFO [Thread-14] org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler ["https-openssl-nio-8811"]
07-Jun-2019 10:09:15.807 INFO [Thread-14] org.apache.catalina.core.StandardService.stopInternal Stopping service [Catalina]
... weird, but common, warnings about webapp threads...
07-Jun-2019 10:09:16.070 INFO [Thread-14] org.apache.coyote.AbstractProtocol.stop Stopping ProtocolHandler ["http-nio-8810"]
07-Jun-2019 10:09:16.070 INFO [Thread-14] org.apache.coyote.AbstractProtocol.stop Stopping ProtocolHandler ["https-openssl-nio-8811"]
07-Jun-2019 10:09:16.071 INFO [Thread-14] com.progress.appserv.services.lifecycle.OeLifecycleListener.runShutdownScripts Running shutdown scripts

Notice that things are shut down in under a second.  Notice that we are seeing some OE-specific messages (Running stopping scripts, Running shutdown scripts) but it is unfortunate that they aren't paired with a corresponding message that says when those OE-specific operations are complete.  In any case, immediately after that final message above, the tomcat service seems to have stopped.  The time spent shutting down the service was very reasonable since it is only about one second.

Most of the time (at least 95% or more) the shutdown logs look like the example above.  But when the shutdown is stalled, I see something different.   Here is an example

07-Jun-2019 05:00:00.044 INFO [Thread-14] com.progress.appserv.services.lifecycle.OeLifecycleListener.runStoppingScripts Running stopping scripts
07-Jun-2019 05:01:45.828 INFO [Thread-14] org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler ["http-nio-8810"]
07-Jun-2019 05:01:45.892 INFO [Thread-14] org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler ["https-openssl-nio-8811"]
07-Jun-2019 05:01:45.956 INFO [Thread-14] org.apache.catalina.core.StandardService.stopInternal Stopping service [Catalina]
07-Jun-2019 05:01:45.966 INFO [localhost-startStop-2] org.apache.catalina.core.StandardWrapper.unload Waiting for [2] instance(s) to be deallocated for Servlet [apsv]
.. weird thread warnings ...
07-Jun-2019 05:01:47.247 INFO [Thread-14] org.apache.coyote.AbstractProtocol.stop Stopping ProtocolHandler ["http-nio-8810"]
07-Jun-2019 05:01:47.248 INFO [Thread-14] org.apache.coyote.AbstractProtocol.stop Stopping ProtocolHandler ["https-openssl-nio-8811"]
07-Jun-2019 05:01:47.251 INFO [Thread-14] com.progress.appserv.services.lifecycle.OeLifecycleListener.runShutdownScripts Running shutdown scripts

Notice that after the first pasoe message (Running stopping scripts), it hung for over a minute!  That is what we are trying to fix. 
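A gap like this can be picked out of a catalina log mechanically. The following is only a minimal Python sketch: it assumes every relevant line starts with a timestamp in the layout shown in the excerpts above and that month abbreviations are in English. The function names and the 10-second threshold are made up for illustration.

```python
from datetime import datetime

# Timestamp layout as it appears in the log excerpts above,
# e.g. "07-Jun-2019 05:00:00.044" (assumes English month abbreviations).
TS_FORMAT = "%d-%b-%Y %H:%M:%S.%f"
TS_LENGTH = len("07-Jun-2019 05:00:00.044")


def parse_timestamp(line):
    """Return the leading timestamp of a log line, or None if there is none."""
    try:
        return datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
    except ValueError:
        return None


def find_stalls(lines, threshold_seconds=10.0):
    """Yield (gap_seconds, earlier_line, later_line) for consecutive
    timestamped lines that are further apart than threshold_seconds."""
    previous = None  # (timestamp, line) of the last timestamped line seen
    for line in lines:
        stamp = parse_timestamp(line)
        if stamp is None:
            continue  # skip lines without a timestamp (warnings, traces)
        if previous is not None:
            gap = (stamp - previous[0]).total_seconds()
            if gap > threshold_seconds:
                yield gap, previous[1], line
        previous = (stamp, line)


if __name__ == "__main__":
    sample = [
        "07-Jun-2019 05:00:00.044 INFO [Thread-14] Running stopping scripts",
        "07-Jun-2019 05:01:45.828 INFO [Thread-14] Pausing ProtocolHandler",
    ]
    for gap, before, after in find_stalls(sample):
        print(f"{gap:.3f}s between:\n  {before}\n  {after}")
```

Run against the stalled excerpt above, this should flag the interval that starts at "Running stopping scripts". It only shows where time passed between messages, not what the server was doing during that time.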

Can someone tell me what this is doing?  How do I introduce additional logging for this?  It would be nice to at least know when the operation (Running stopping scripts) is complete, so I can be sure that this is indeed what is consuming so much time.

Also, is there documentation about these lifetime scripts yet?  I've heard them mentioned before but don't know anything about them  ( see https://community.progress.com/community_groups/openedge_development/f/19/t/57264 ) .  Can we configure a minimal version of the "stopping scripts" that doesn't do anything special, or at least nothing so complex that it should take a minute of work?  Am I on the right track to suspect that it is the "stopping scripts" which are the culprits for the inability to shut down tomcat?

Posted by dbeavon on 17-Aug-2019 16:24

Does anyone have PASOE running within the service controller on windows?  We continue to have intermittent problems with our shutdown process. 

Sometimes the service will stop within 5 seconds, sometimes it will never stop.   When it doesn't stop,  we find that it is normally hung up after the part where it says "Running stopping scripts".

I would guess that anyone who has run PASOE on Windows for a while will eventually encounter this issue.  The biggest problem is that if PASOE doesn't stop then it will be left in a state of limbo and will not behave correctly until we manually kill tomcat8.exe and all its children.  Last night this happened and below is an image showing that the service is still in the stopping state:

Posted by Roel de Wildt on 19-Aug-2019 06:53

Yes, sometimes it doesn't stop right away. Don't know why.

Posted by dbeavon on 26-Aug-2019 17:25

I haven't found a KB for the inability to stop PASOE as a service (on Windows).

It seems that I've found the repro!  The repro is to simply start a long-running operation within a PASOE agent session (eg SLEEP 300).  If you try to stop the PASOE service while one of these requests is underway, then the service will remain at the "stopping" status forever.  This seems to put it in a state of limbo.

I believe I could forcefully auto-stop the service using a series of brutal steps (initiate the stopping of the related PASOE web app, kill all _mproapsv processes from the OS, then finally attempt to use service controller to stop the PASOE windows service).  I'm a bit hesitant to work on this, since it should be part of the product ... and it may be in the future. I think it is better for Progress to implement this the "right way" (or at least in a way that they will fully support).

I noticed that the tomcat8w.exe service controller application for windows has a "timeout" configuration for shutdown purposes. I'm assuming that timeout is supposed to specify the "graceful" shutdown period , after which the service is supposed to be forcefully terminated.  I don't think this is currently implemented in the Progress PASOE because if I change the default "timeout" from 0 to 20 seconds, it doesn't have any impact on the behavior.

The registry key for the shutdown timeout in tomcat8w is found here:

HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Apache Software Foundation\Procrun 2.0\oepas1\Parameters\Stop

(... The DWORD value for the timeout is called "Timeout".)
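For anyone who wants to check that value without opening tomcat8w, here is a minimal Python sketch that reads it with the standard winreg module. It assumes the key layout quoted above (including the Wow6432Node branch) and a service named like the instance. The function names are invented for illustration, and reading the value says nothing about whether the timeout is honored.

```python
import sys

# Registry location quoted in the post above; only the instance name varies.
PROCRUN_ROOT = "SOFTWARE\\Wow6432Node\\Apache Software Foundation\\Procrun 2.0"


def stop_key_path(instance):
    """Build the Procrun 'Stop' parameters key path for a service name."""
    return "\\".join([PROCRUN_ROOT, instance, "Parameters", "Stop"])


def read_stop_timeout(instance):
    """Return the 'Timeout' DWORD for the instance, or None if it is absent.

    Windows only: winreg is imported lazily so the module loads elsewhere.
    """
    if sys.platform != "win32":
        raise OSError("the Procrun registry keys exist only on Windows")
    import winreg

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, stop_key_path(instance)) as key:
            value, _value_type = winreg.QueryValueEx(key, "Timeout")
            return value
    except FileNotFoundError:
        return None  # key or value not present


if __name__ == "__main__":
    print(stop_key_path("oepas1"))
    if sys.platform == "win32":
        print("Timeout =", read_stop_timeout("oepas1"))
```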

If anyone has suggestions about tweaking the PASOE shutdown process (as a windows service), I'd be happy to hear.  I'm also interested to hear if tcman works any better for stopping PASOE on Linux.  I see that there is separate documentation, depending on if you are stopping PASOE as a windows service, or stopping it when managed outside the context of a windows service:

documentation.progress.com/.../index.html

documentation.progress.com/.../index.html

The second link seems to give options to forcefully stop after a timeout but the first link doesn't have any option like that.

Posted by Michael Jacobs on 27-Aug-2019 10:37

Yes, a perpetual loop in an ABL application that blocks the STOP condition will keep the Tomcat web app active, and therefore hang the server shutdown.   Using the standard Windows Service options (like SC or the Service plug-in... ) there is no forced exit.

If you use tcman.bat it adds a wait & force-stop to the Windows Service stop operation.  Tcman.bat will first attempt to stop the Windows Service normally.  If the Windows Service did not stop, it will wait up to 30 seconds to allow your ABL application code to complete before giving up.   You can try adding the command line options -w <seconds> -F to adjust the wait-time and do a Windows process stop on Tomcat.

   tcman.bat service -w 100 -F myinst stop

Check with 'tcman.bat help service' to determine if the -w & -F options are listed and available.   Later releases of OE have improved the Windows Service management via tcman.bat.
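For a scheduled task, the command above could be wrapped in a small script. This is a hedged Python sketch rather than a supported tool: it assumes tcman.bat is reachable from the working directory (for example the instance's bin folder) and that the options go in the same order as in the example. The meaning of tcman's exit code is not documented in this thread, so the sketch simply returns it.

```python
import subprocess


def build_stop_command(instance, wait_seconds=None, force=False, tcman="tcman.bat"):
    """Assemble 'tcman.bat service [-w N] [-F] <instance> stop' as an argv list,
    following the option order of the example above."""
    command = [tcman, "service"]
    if wait_seconds is not None:
        command += ["-w", str(int(wait_seconds))]
    if force:
        command.append("-F")
    command += [instance, "stop"]
    return command


def stop_service(instance, wait_seconds=100, force=True, tcman="tcman.bat"):
    """Run the stop command and return tcman's exit code."""
    command = build_stop_command(instance, wait_seconds, force, tcman)
    # Extra margin so the Python-side timeout does not fire before tcman's own wait.
    completed = subprocess.run(command, timeout=wait_seconds + 60)
    return completed.returncode


if __name__ == "__main__":
    # Prints the command only; call stop_service("myinst") to actually run it.
    print(" ".join(build_stop_command("myinst", wait_seconds=100, force=True)))
```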

Hope that helps.

Posted by dbeavon on 27-Aug-2019 13:28

Michael, that is very helpful.  A few follow-up questions...

1. Can you tell me how much I should be relying on the tomcat documentation?  IE. should we generally expect that tomcat features and behaviors (like graceful stop timeouts) should also be supported by PASOE?  See : commons.apache.org/.../procrun.html

Is this something that may start working in the future?  I have been happy to use some other tomcat features that work out-of-the-box, like the tomcat manager webapp.

2. >> "Check with 'tcman.bat help service' to determine if the -w & -F options are listed and available"

... the tcman service documentation above (the first link in my prior post) did not reference the -F (and -w) parameters in 11.7.5.  "tcman help service" doesn't list them either.  I'm assuming they are not available from the "tcman service" command any more than they are available from the standard service controller.  Let me know if I am missing something....

3. I'm assuming that when it's configured as a windows service, you *must* use the "tcman service" variation of the start/stop commands, and *not* the regular "tcman stop", right?

4.  Is the "shutdown port" for tomcat fully supported in PASOE?  I had noticed that the shutdown port is not prominently featured on the OEE configuration U/I.

But based on my research, the shutdown port should work on a windows service.  See the discussion at  tomcat.10.x6.nabble.com/Shutdown-port-on-Windows-Service-installation-td5013583.html

5. It is unfortunate that poorly written code running in a *single* ABL session can subvert the *entire* shutdown operation.  There are times when it isn't even our own custom code that is the problem - but it may be some dependency/component such as the "JMS adapter" which can get itself hung at times.  Another unfortunate thing is that the shutdown operation doesn't even prevent *new* appserver requests... IE. after the shutdown operation is initiated, not only are the old sessions still alive-and-well, but new requests are arriving as well.  Given that the PASOE never seems to get very serious about stopping, should I avoid using the windows service controller for stopping the windows service?  It is easy to find another more reliable approach on Windows, like using "pskill -t tomcat8.exe". (see docs.microsoft.com/.../pskill ) .  We are using client-server connections to an OE database and I'm not extremely worried about the effect on the database.  But I would be worried if there was any chance that tomcat would not restart again after a pskill shutdown (possibly because of something that it leaves behind on disk, like a weird PID file or some such thing).  Can you tell me if Progress would support shutting down the tomcat service with pskill?

I appreciate anything you can do to help us find the <off/> button for PASOE !

Thanks, David

Posted by Michael Jacobs on 27-Aug-2019 16:33

Wow... when you do follow up questions, you really mean it.   I'll be happy to share what I know.

1)  As a rule, OE does not alter the shipped behavior of tomcat.  We extend and supplement it through tomcat published APIs, but we don't change source code and recompile.  The same applies to procrun as well - we use it in tcman to make service setup easier, but we don't alter its functionality.

2) Aha!  They've updated the help recently.   I physically looked at the Windows tcmanager in 11.7.4, and it contains the support for -w & -F.   Please try the options and verify that I am not leading you on a wild chase.

3) Use the normal start & stop for running private versions of a PASOE instance, and use tcman service start/stop/... for Windows Services.   The former uses the Windows process management APIs and the latter uses the Windows libraries for working with Services.

4) Yes, the shutdown port is supported.   It's just not as secure or reliable.   The shutdown port has always been more of a 'suggestion' to stop, while the tcman commands take direct control of the processes.  I've seen the shutdown port hang, silently fail, or be used to do a shutdown by other people - so it may be me but I don't trust it.  However, I don't see why the shutdown port could not be used as long as it satisfies your requirements and you take proper precautions for changing the passphrase.   I create instances via tcman, which provides me the option to set a specific port if I want.

5) Don't think I'll speculate on the 'supported' question.  OE should continue to work on making 'stop' more reliable, especially in the case of Windows Services.  The -F option to tcman service stop will do a Windows process stop if you have the privileges, which should be roughly equivalent to pskill.   If -F does not work - there may be something else happening and we'd like to know.  When all the right answers fail - you do what you must, as we all do.   (and yes, there is a task to look at other options for forcing Windows Service stop)

Not sure if you got everything you were looking for, but I hope the information is useful.

Mike J.

Posted by dbeavon on 28-Aug-2019 00:15

>> Wow... when you do follow up questions, you really mean it. 

Yes, because stopping is important.  If my car had a brake pedal that didn't work then it would be a much bigger problem than a broken gas pedal.

Assuming we didn't get anywhere with those questions, then my next question would have been related to another option - restarting the entire VM.  When in doubt, restart the machine.  Isn't that what people used to say about Windows servers?  Certainly Progress wouldn't withhold tech support from customers who experience "unexpected" power loss on their PASOE servers...

Thanks for all your tips.  I'm especially happy about tip #2 which seems to be working fine.  This should be added to the documentation!  Without the -F option, there is really no point.  I can't imagine that any PASOE administrator would be happy with the final outcome of a stop operation that leaves the PASOE windows service in a state of limbo.

Below are the two different stop operations.  The first one doesn't stop the windows service, and leaves it at a permanent status of "stopping" (in which it can no longer be managed).

... and the second one stops everything after five seconds no matter what.  If an active ABL transaction does not complete in five seconds, then it dies!

Alternatively we might customize this with a wrapper.  We could build on the "tcman service stop" with additional functionality... ie. we could use REST to interrogate PASOE for long-running, active sessions and ask the administrator to confirm they are OK with killing those sessions.
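The confirmation step of such a wrapper might look roughly like the Python sketch below. How the session list is obtained is deliberately left out, and the dictionary keys 'id' and 'elapsed_seconds' are hypothetical placeholders, not field names from the oemanager REST API.

```python
def long_running(sessions, limit_seconds):
    """Return the sessions whose current request has run longer than limit_seconds.

    Each session is a dict with the hypothetical keys 'id' and 'elapsed_seconds'.
    """
    return [s for s in sessions if s["elapsed_seconds"] > limit_seconds]


def confirm_stop(sessions, limit_seconds=30, ask=input):
    """Return True when it is acceptable to force-stop the instance.

    With no long-running sessions the answer is yes without prompting;
    otherwise the administrator must type 'yes'.
    """
    busy = long_running(sessions, limit_seconds)
    if not busy:
        return True
    for session in busy:
        print(f"session {session['id']} busy for {session['elapsed_seconds']}s")
    answer = ask(f"Force-stop and kill {len(busy)} session(s)? [yes/no] ")
    return answer.strip().lower() == "yes"


if __name__ == "__main__":
    # Made-up data, only to show the shape of the call.
    demo = [{"id": "A1", "elapsed_seconds": 300}, {"id": "A2", "elapsed_seconds": 2}]
    print(confirm_stop(demo, ask=lambda prompt: "no"))
```

Passing the prompt function in as a parameter keeps the decision logic testable and lets an unattended job substitute a fixed policy for the interactive question.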

Thanks again for the tips!

Posted by Michael Jacobs on 28-Aug-2019 09:37

I love the analogy.  It is very appropriate, and we are in agreement.

We have a more comprehensive start and restart process in the 'pasoestart' script, which is a higher level utility.   Where tcman adds to tomcat's low-level CLI utilities, pasoestart adds to tcman.   At this time, pasoestart is primarily targeted at developers' private instances and does not include Windows Services.  

   tcman pasoestart -timeout 120   # allow 120 seconds for startup of a stopped server, and stop the server if the ABL applications do not start.

   tcman pasoestart -timeout 120 -restart   # allows 120 seconds to stop, then restarts a stopped server and allows 120 seconds for startup of the ABL applications

I suspect that something similar would be useful for Windows Services?

I can say the -w & -F were missing from the tcman help text.  We missed that, and it's on us.   You should find them added in the next service pack.

The engineering folks have been adding to the list of admin operations the means of stopping individual ABL application MS-Agent processes.   The tcman enable & disable could be used to turn on/off client access to individual ABL web applications before stopping the server.  We've observed that the use of these lower level admin tools can be very useful, when taken into context of the ABL application run-time.  I've often found in the past that making my own wrappers always gives me a level of control that fits my operations better - it sounds like that might be a good investment for you.

I'll go back and look at the Windows pasoestart command.   Maybe handling Windows services would work, but maybe not.  No commitments and no promises other than I see your point and will look at the possibility.  

Go ahead and ask followup questions if it helps with your operations.  

Posted by dbeavon on 28-Aug-2019 13:57

Thanks for pointing me to "pasoestart".  I don't believe I would need that level of abstraction over the top of the service controller (and over top of tcman).  I suspect that the standard Windows service control would have been sufficient for our purposes if it had included the -F functionality after a "reasonable" number of seconds (30, 60 or whatever).

I had also investigated stopping something smaller than the tomcat instance. I noticed tomcat will allow me to stop the web apps as part of a custom shutdown.  What I liked is that it prevents new connection requests, while the PASOE instance is attempting to shut down.  Another thing I liked is the fact that this didn't have a permanent effect.... when the instance comes back online again the web apps are running right away without any additional management.  The only thing it didn't seem to help with is the interruption of misbehaving ABL sessions.  In that regard it didn't really address the root problem.
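For reference, stopping a web app through Tomcat's manager can be scripted against the manager's text interface. The Python sketch below is built on assumptions about the installation: that the manager web app is deployed at /manager on the instance, that the account has the manager-script role, and that the host, port, and context path shown are placeholders.

```python
import base64
import urllib.parse
import urllib.request


def manager_stop_url(base_url, context_path):
    """Build the Tomcat manager text-interface URL that stops one web app."""
    path = urllib.parse.quote(context_path, safe="/")
    return f"{base_url.rstrip('/')}/manager/text/stop?path={path}"


def stop_webapp(base_url, context_path, user, password, timeout=10):
    """Ask the manager to stop the web app; return True if the reply starts with 'OK'."""
    request = urllib.request.Request(manager_stop_url(base_url, context_path))
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
    return body.startswith("OK")


if __name__ == "__main__":
    # Placeholder host, port and context path; prints the URL without calling it.
    print(manager_stop_url("http://localhost:8810", "/myapp"))
```

As noted above, this only keeps new requests away from the web app. It does not interrupt an ABL session that is already running.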

I haven't had a need to manage ABL applications independently yet.  (Aside from using pskill on the mproapsv's as needed) ...  But I suppose that might come in the future (ie. it would be nice to reconfigure an individual ABL application in openedge.properties on-the-fly, and then restart that application independently of the others).  Today it seems that you need to cycle the whole PASOE instance before configuration takes effect... but that normally seems appropriate anyway.  The only thing I really miss is the "dynamic" changes in OE logging - we lost that when upgrading from 11.7.4 to 11.7.5.  But that is another story...

Posted by Michael Jacobs on 28-Aug-2019 17:53

Thank you for the input on 'pasoestart' and using standard Service tools.   Appreciate it.

Yes, there was a slip in 11.7.5 that had a ripple effect of disabling dynamic agent logging level changes.   Fixed.

Interesting...  we have been seeing that Tomcat has shut-off client connection requests when a shutdown starts.   It sounds like you have observed that client connection requests still are honored.   Am I interpreting that correctly?  

If you are exploring the server and agent management abilities - have you tried the Swagger feature of the oemanager web app?  It's a service that helps me keep up with new/changed agent management abilities, plus lets me test the REST calls easily and see if it fits my automation use-case.

Posted by dbeavon on 28-Aug-2019 19:54

>> ripple effect of disabling dynamic agent logging level changes.   Fixed.

"Fixed"? Do you mean we can get dynamic logging back again ( in a hotfix on 11.7.5 )?  I was told to wait for 11.7.6 which may be at least six months out.  (I doubt they are even giving out dates yet.)

>> we have been seeing that Tomcat has shut-off client connection requests when a shutdown starts.   It sounds like you have observed that client connection requests still are honored.   Am I interpreting that correctly?  

Well, I noticed that remote connections are initially broken.  You must be doing something during shutdown to try and interrupt the active ABL sessions.  But after the shutdown sequence has "given up", then the service gets stuck in the "stopping" state.  And my recollection is that a server at that state will begin accepting new connections again after the shutdown sequence failed.  IE. we try to auto-stop the PASOE service as a scheduled task in the early hours on a Sunday morning. But if it fails to stop, then new client applications will resume working once again.  And there will be no evidence that anything went wrong except for the fact that the windows service has a suspicious status of "stopping" on Monday morning.  (and the catalina log can be interpreted to say that the service had tried to stop but wasn't able to)

This is easy to repro if you care to see it for yourself.  Just start with any random ABL that is called via the APSV transport and add SLEEP 300 to it.  Then connect to PASOE and start running that code.  Then try (and fail) to stop the windows service the normal way.  Then wait and use the instance again at a later time (after the shutdown sequence has given up).  I recall that it will happily let you connect to PASOE again, despite the "stopping" status of the windows service.  Hope this is clear.
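The last step of that repro (checking whether the instance is listening again while the service still shows "stopping") can be automated with a plain TCP probe. This is a minimal Python sketch. Port 8810 is taken from the catalina log earlier in the thread and the host is a placeholder. A successful connect only shows that the connector is accepting sockets, not that ABL requests are actually being served.

```python
import socket


def accepts_connections(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    state = "open" if accepts_connections("localhost", 8810) else "closed"
    print(f"http connector on 8810 is {state}")
```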

>> have you tried the Swagger feature of the oemanager web app.

I'm eager to try it.  I agree that the ability to "explore" the REST API is extremely valuable.  Back when I first started using REST, I had to learn it the hard way with lots of trial-and-error and lots of questions (for Irfan) in these community forums.

As far as REST is concerned, I don't hate it, but I don't love it either.  I certainly prefer REST with swagger than without swagger.  REST needs a meta-layer just like SOAP did.  IMHO after we've added all necessary meta-layer on top of REST (OpenAPI, Swagger, WADL, JSON-schema, client-side API generation, etc)  then it seems that we've just gone around in a circle and we're back to having a "poor-man's" substitute for SOAP.   If I had to guess, there will come a new generation of programmer that will rediscover SOAP and think it's the "new-and-improved" version of REST.

Posted by Michael Jacobs on 29-Aug-2019 09:59

My apologies for not being clear.  Fixed in 11.7.6.  

Thank you for the deeper explanation.   So to summarize: PASOE was stopping clients on shutdown - but through some unknown internal magic started accepting client connections again.   I have a test case I run to -hang- an ms-agent ABL Session.  I can give that a try.  Should I still not get it right - I'll be back and we can see what is different.  

Do try out the Swagger interface.   Feedback would be appreciated and we'll roll that up with comments from others.  Thank you.

REST... the 'blush is off the rose', so to speak.  Something else will appear eventually to accomplish the same thing - only differently.   Which is not a bad thing in all cases.

Thank you for the feedback and the time you've spent on the topics.
