PASOE error 18318 and 18320 in log file - connection timeout

Posted by Gareth Vincent on 14-Apr-2020 08:19

Due to underlying network connection issues from various clients, we are often seeing the following errors in our log files:

08:38:36.739/440257346 [thd-3] ERROR c.p.appserv.adapters.apsv.Request - APSV(QwEFe4nNRSCL_A-21KrlkQ) : IOException while processing request : org.apache.catalina.connector.ClientAbortException: java.net.SocketTimeoutException. (18318)
08:38:56.741/440277348 [thd-3] ERROR c.p.appserv.adapters.apsv.Request - APSV(QwEFe4nNRSCL_A-21KrlkQ) : An error occurred processing the POST request : Unexpected error : org.apache.catalina.connector.ClientAbortException: java.net.SocketTimeoutException. (18320)

Unfortunately, this is another issue that is hindering us from moving all our clients over to PASOE.  When these errors occur I can see that the client request is stuck in "Reading State" and is still latched to one of the Agents even though the Agents' sessions all show as IDLE.  Given enough time this can eventually lock all Sessions preventing any new requests from being serviced by the agent(s).

The only workaround we have for now is to trim the agents periodically or revert to the Classic AppServer.

I have logged a call with Progress Support but was just wondering if anyone else has experienced the same problem.

We are currently on OE 11.7.5 running on CentOS 7.

All Replies

Posted by dbeavon on 14-Apr-2020 13:02

>>I can see that the client request is stuck in "Reading State" and is still latched to one of the Agents

How are you monitoring?  Is it done from the OEE console?

>> ... can eventually lock all Sessions preventing any new requests from being serviced by the agent(s).  

I'm assuming you mean "ABL sessions"?  How do you know they are locked if they say IDLE?  I've also noticed in the past that certain ABL sessions in the agent are not candidates to be trimmed via the oemanager, and I've wondered if they were internally locked in some way.  I'd be curious to know how you determine when things are locked.

FYI, there is a less drastic approach to trimming msagent resources.  Using the oemanager REST interface you can trim the underlying ABL sessions, rather than the agent as a whole.  I think there is a REST method named "trim idle sessions" or something like that.  We call it very frequently (hourly).   The msagent process itself will remain intact, and this operation has no impact at all on client applications.
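A rough Python sketch of the per-session trim idea follows. Everything specific in it is an assumption: the path template is a placeholder rather than a confirmed oemanager route, the use of HTTP DELETE is a guess, and the "id" and "state" keys are a hypothetical shape for the session list. Check the oemanager documentation for your release before relying on any of it. The code only builds the requests; it does not send them.

```python
import urllib.request

# ASSUMPTION: placeholder path, not a confirmed oemanager endpoint. Take the
# real route (and HTTP verb) from the Progress documentation for your release.
TRIM_PATH_TEMPLATE = "/oemanager/applications/{app}/agents/{agent}/sessions/{session}"


def idle_session_ids(sessions):
    """Pick the IDs of sessions whose state is IDLE.

    `sessions` is a list of dicts with hypothetical keys "id" and "state".
    """
    return [s["id"] for s in sessions if s.get("state") == "IDLE"]


def build_trim_requests(base_url, app, agent, sessions,
                        path_template=TRIM_PATH_TEMPLATE):
    """Build (but do not send) one DELETE request per idle session."""
    requests = []
    for session_id in idle_session_ids(sessions):
        path = path_template.format(app=app, agent=agent, session=session_id)
        requests.append(urllib.request.Request(base_url + path, method="DELETE"))
    return requests


if __name__ == "__main__":
    # Hypothetical application name, agent ID and session list.
    sample = [{"id": 4, "state": "IDLE"}, {"id": 5, "state": "BUSY"}]
    for req in build_trim_requests("http://localhost:8810", "myapp", "agent1", sample):
        print(req.get_method(), req.full_url)
```

Sending the requests (for example with urllib.request.urlopen plus whatever authentication the oemanager webapp requires) is left out on purpose, so the selection logic can be checked without touching a live instance.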

What is the impact on the client side?  (APSV open client)  You haven't mentioned it, so I'm assuming that is not a concern.  Perhaps they are short-lived clients.   Or perhaps it was the client that had aborted and died in the first place (maybe the user lost network connection or killed the client process).

>> Given enough time this can eventually lock all Sessions preventing any new requests from being serviced by the agent(s).  

There may be a mechanism that would fix the problem automatically in time.  Have you deployed the Tomcat manager app? (localhost:8810/.../html)  It will show you the HTTP sessions related to APSV.  Can you try to expire/detach the sessions from there and see if it clears up your issue?

If so, there is a web.xml config file in the conf directory with a "session-timeout" value that is set to 30 mins by default.  It will essentially do the same thing as expiring sessions from the manager.  The only thing to be careful of is your "session-free"/"state-free" clients if you use them.  Oddly enough, those are the ones that are particularly dependent on long-lived HTTP sessions.  And if sessions expire after 30 mins of inactivity, then the OpenClient will misbehave the next time the user tries to interact with appserver.  It will crash.  For this reason we had to increase the session-timeout to 3000 mins or something crazy like that.
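To check what the timeout is currently set to, a small Python sketch like the one below can read it. It assumes the file uses the standard servlet deployment descriptor element, <session-config><session-timeout>, with the value in minutes. The sample XML is illustrative only and is not copied from a PASOE instance.

```python
import xml.etree.ElementTree as ET


def read_session_timeout(web_xml_text):
    """Return the <session-timeout> value in minutes, or None if absent."""
    root = ET.fromstring(web_xml_text)
    for element in root.iter():
        # Strip any XML namespace prefix such as "{http://...}session-timeout".
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "session-timeout" and element.text:
            return int(element.text.strip())
    return None


# Illustrative descriptor, not a real PASOE web.xml.
SAMPLE_WEB_XML = """<web-app>
  <session-config>
    <session-timeout>30</session-timeout>
  </session-config>
</web-app>"""

if __name__ == "__main__":
    print(read_session_timeout(SAMPLE_WEB_XML))
```

Matching on the local tag name keeps the lookup working whether or not the descriptor declares an XML namespace.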

Is there anything else in the logs?  You only showed the "session manager" logs.  Many times an error in there is paired with an error in the related "*agent*" log file.

Would you please keep us in the loop on the progression of your case?  I'm considering opening a similar case as well, but I'm a bit hesitant since I suspect Progress is not that interested in working on 11.7.x anymore.  They might just tell you to upgrade to 12.x.  Is that an option?  For my part, we are probably not going to be ready to upgrade to 12.x for a couple years.

Posted by Dileep Dasa on 14-Apr-2020 13:26

Errors 18318 and 18320 can be ignored unless you see a lot of them. These errors appear when the client that made the request has stopped listening for the response. PASOE has no problem generating a response in this case, but there is no client to consume it. There is a KB article here:  knowledgebase.progress.com/.../errors-18318-and-18320-in-pasoe-logs
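One way to judge whether there are "a lot of them" is to count the errors per code and per APSV session token. A minimal Python sketch, assuming the log lines look like the two quoted in the original post (the regular expression is fitted to those samples only and may need adjusting for other lines):

```python
import re
from collections import Counter

# Captures the APSV session token in parentheses and the trailing error
# number, e.g. "(18318)", from lines shaped like the samples in this thread.
LINE_RE = re.compile(
    r"ERROR .*? - APSV\((?P<session>[^)]+)\).*\((?P<code>18318|18320)\)\s*$"
)


def count_client_aborts(lines):
    """Return (per-code counts, per-session counts) for errors 18318/18320."""
    by_code = Counter()
    by_session = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            by_code[match.group("code")] += 1
            by_session[match.group("session")] += 1
    return by_code, by_session


# The two log lines from the original post.
SAMPLE = [
    "08:38:36.739/440257346 [thd-3] ERROR c.p.appserv.adapters.apsv.Request - APSV(QwEFe4nNRSCL_A-21KrlkQ) : IOException while processing request : org.apache.catalina.connector.ClientAbortException: java.net.SocketTimeoutException. (18318)",
    "08:38:56.741/440277348 [thd-3] ERROR c.p.appserv.adapters.apsv.Request - APSV(QwEFe4nNRSCL_A-21KrlkQ) : An error occurred processing the POST request : Unexpected error : org.apache.catalina.connector.ClientAbortException: java.net.SocketTimeoutException. (18320)",
]

if __name__ == "__main__":
    codes, sessions = count_client_aborts(SAMPLE)
    print(dict(codes))
    print(dict(sessions))
```

For a real log, pass the open file object instead of SAMPLE; the function only needs an iterable of lines.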

Posted by Gareth Vincent on 15-Apr-2020 04:50

Monitoring is via the OEE console, and yes, these are ABL sessions.

>> I'd be curious to know how you determine when things are locked.

When monitoring the Sessions, these particular users still show a connection to the "Agent ID". When viewing the activity on the Agents screen, one can observe that only some of the session IDs are still processing user requests, and that number narrows as more users get stuck in a reading/writing state. When all sessions are exhausted, the user ends up with the error "Unable to connect to the Appserver....."

In this particular example, tracing the user's session ID through agent.log showed that the user was simply logging into the application, with no errors in the log file.

Our clients connect via Progress WebClient, and by design our application makes 3 HTTP connections to the AppServer per user. There are some asynchronous calls to the AppServer to load dashboard widgets on login and to juggle between two frameworks. Simply trimming idle sessions is not always an option, as some of these HTTP connections can remain idle for some time while the other 1 or 2 HTTP sessions are used more often throughout the application. Trimming just one of the 3 HTTP sessions can crash the application.
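The constraint described here (never trim one of a user's connections while another is still active) can be sketched as a grouping step. This is only an illustration in Python: the (user, connection_id, idle_seconds) tuple shape is hypothetical, and how those values would be obtained from PASOE is not covered.

```python
from collections import defaultdict


def users_safe_to_trim(connections, min_idle_seconds):
    """Return users whose every connection has been idle long enough.

    `connections` is an iterable of (user, connection_id, idle_seconds)
    tuples; this shape is hypothetical.
    """
    idle_by_user = defaultdict(list)
    for user, _connection_id, idle_seconds in connections:
        idle_by_user[user].append(idle_seconds)
    # A user qualifies only if even their least idle connection is past
    # the threshold.
    return sorted(
        user for user, idles in idle_by_user.items()
        if min(idles) >= min_idle_seconds
    )


if __name__ == "__main__":
    sample = [
        ("alice", 1, 4000), ("alice", 2, 3700), ("alice", 3, 5000),
        ("bob", 4, 4000), ("bob", 5, 10), ("bob", 6, 3900),
    ]
    print(users_safe_to_trim(sample, 3600))
```

Grouping by user before deciding means a single busy connection protects the other two from being trimmed.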

We have over 400 client databases in our DC, so you can imagine that this can become difficult to manage even with the provided REST API.  There are always going to be connectivity issues to some degree, and I would expect PASOE to handle these connections without any human intervention.

I will play around with the "session-timeout" parameter as you mentioned to see if that helps.

I keep finding myself comparing the Classic appserver to PASOE. If we are going to migrate, I would expect nothing less than for it to be as stable as the Classic appserver and to provide the same if not better performance. Up to this point neither has been the case. To be fair, the performance issue we had, where PASOE was 10 times slower, is being addressed and a patch will be released.

Moving to OE 12 is definitely not an option at this point and will not be for quite some time.

I will keep you posted.

Posted by Gareth Vincent on 15-Apr-2020 04:55

I've already reviewed this article on the KB but unfortunately I cannot ignore these entries if it keeps "locking" up the ABL sessions.

Posted by dbeavon on 15-Apr-2020 13:37

>> I keep finding myself comparing the Classic appserver to PASOE. If we are going to migrate, I would expect nothing less than for it to be as stable as the Classic appserver and to provide the same if not better performance. Up to this point neither has been the case

Yes, I agree that there are troubles upgrading from classic to PASOE.  Overall I think it is better, and continuing to improve as it is rolled out to increasing numbers of customers.  The deprecation of classic will only force PASOE to improve, since there won't be any alternative that can be used to jump ship.

You may have to plan on creating a bit of your own PASOE babysitting software.  We run PASOE on Windows and have a dedicated service that performs health checks and regular maintenance operations, like trimming the idle ABL sessions out of msagent processes on an hourly basis.  We also stop and restart the entire instance on a weekly basis to flush out any cruft that remains within memory (in the session-manager or msagent processes as a whole).  There are also peripheral tools you could use to monitor stability, like the "Health Scanner" docs.progress.com/.../HealthScanner.html   I haven't tried it yet.  Third-party tools like ProTop can also help you monitor PASOE and ensure stability.
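The scheduling half of such a babysitting service can be sketched in a few lines of Python. The two intervals come from the description above (hourly trim, weekly restart); the task names are hypothetical labels, and what each task actually does against PASOE is not shown.

```python
from datetime import datetime, timedelta

# Intervals taken from the post: hourly trim, weekly restart.
# The task names are hypothetical labels for this sketch.
INTERVALS = {
    "trim_idle_sessions": timedelta(hours=1),
    "restart_instance": timedelta(weeks=1),
}


def due_tasks(now, last_run):
    """Return the names of tasks whose interval has elapsed since last_run.

    A task missing from `last_run` has never run and is considered due.
    """
    due = []
    for name, interval in INTERVALS.items():
        previous = last_run.get(name)
        if previous is None or now - previous >= interval:
            due.append(name)
    return due


if __name__ == "__main__":
    now = datetime(2020, 4, 15, 12, 0)
    history = {
        "trim_idle_sessions": now - timedelta(minutes=61),
        "restart_instance": now - timedelta(days=2),
    }
    print(due_tasks(now, history))
```

Keeping the "what is due" decision separate from the code that performs the maintenance makes the schedule easy to test without a running instance.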

Overall I think the underlying tech/platform of PASOE is better (ie. because it is based on tomcat/HTTP).  And it allows you to deploy standardized HTTP load-balancing, assuming your custom ABL solution is not written in a way that requires the sharing of OS memory with the OpenEdge database itself.  

By using load-balancing, and scaling out to multiple independent hosts in the middle tier, you have the potential of vastly improving stability and capacity and even performance as compared to classic.  For the sake of performance we often just break apart a long-running loop across 5 or 10 concurrent threads by using .NET's TPL (i.e. Parallel.ForEach).   That allows everything to run all at once against the load-balanced PASOE.  It is a "sledgehammer" approach, but it easily makes up for performance degradations related to migration from classic to PASOE (and also for the degradations related to migrating from shared memory DB to client/server DB).
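The same fan-out pattern can be sketched in Python, with concurrent.futures.ThreadPoolExecutor standing in for .NET's TPL. This is not the code described above: call_appserver is a hypothetical placeholder for one request to the load-balanced PASOE, and the worker count of 5 is just the lower figure mentioned in the post.

```python
from concurrent.futures import ThreadPoolExecutor


def call_appserver(item):
    """Hypothetical stand-in for one request to the load-balanced PASOE."""
    return item * 2


def run_in_parallel(items, worker, max_workers=5):
    """Run worker(item) for every item on a thread pool, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of the inputs, even though
        # the calls themselves run concurrently.
        return list(pool.map(worker, items))


if __name__ == "__main__":
    print(run_in_parallel(range(10), call_appserver))
```

Threads suit this case because each call mostly waits on the network, so the load balancer can spread the concurrent requests across hosts.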
