PASOE/REST - How to Handle Database Disconnects

Posted by jts-law on 16-Feb-2018 13:40

Hello,

I have a REST service that uses a Data Object Handler.

I'm using a DB connection string on the "Agent startup parameter" to connect to the DB, and code within the agent startup procedures to create/set my initial client principal.  The session startup procedure creates a new instance of my DOH event handler, which then runs persistent.  As requests 

My issue is that if the database gets restarted and PASOE does not (they are currently on different servers), the DOH code errors because the database has been shut down.  The following messages are written to the oepas1.agent.log file:

[18/02/16@13:29:12.918-0600] P-016920 T-018504 1 AS-8 LogMgrWrtr     [OE.W.InternalWebRouter DEBUG] Request for path "/JTS/web/pdo/UI/ProfitGrp" using template "/pdo/" and handler "OpenEdge.Web.DataObject.DataObjectHandler"
[18/02/16@13:29:12.922-0600] P-016920 T-018504 1 AS-8 LogMgrWrtr     [OE.W.InternalWebRouter DEBUG] Debug mode: ON 
[18/02/16@13:29:12.931-0600] P-016920 T-018504 1 AS-8 LogMgrWrtr     [OE.W.InternalWebRouter DEBUG] Handler instance OpenEdge.Web.DataObject.DataObjectHandler_4609 used for handler OpenEdge.Web.DataObject.DataObjectHandler
[18/02/16@13:29:12.975-0600] P-016920 T-018504 1 AS-8 LogMgrWrtr     INFO: Current session client type MULTI-SESSION-AGENT does not support named logs
[18/02/16@13:29:12.995-0600] P-016920 T-018504 1 AS-8 LogMgrWrtr     [OE.W.DO.DataObjectHandler DEBUG] Using mapped operation for GET service path ProfitGrp: OpenEdge.Web.DataObject.MappedOperation_3280: GET svc:UI v1.0.0 uri:ProfitGrp: type:Class ; name:Progress.Lang.Object, type-of:Progress.Lang.Object, fn:, numargs:2, num-schemas:1, name:
[18/02/16@13:29:13.062-0600] P-016920 T-018504 1 AS-8 LogMgrWrtr     INFO: Current session client type MULTI-SESSION-AGENT does not support named logs
[18/02/16@13:29:13.074-0600] P-016920 T-018504 1 AS-8 LogMgrWrtr     [OE.W.DO.DataObjectHandler INFO] Service UI logging performed by OpenEdge.Web.DataObject.DataObjectHandler.UI at DEBUG
[18/02/16@13:29:13.132-0600] P-016920 T-018504 1 AS-8 LogMgrWrtr     [OE.W.DO.DOH.UI DEBUG] "Accept" value: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
[18/02/16@13:29:13.143-0600] P-016920 T-018504 1 AS-8 LogMgrWrtr     [OE.W.DO.DOH.UI DEBUG] Operation ContentType: application/json
[18/02/16@13:29:13.159-0600] P-016920 T-018504 1 AS-8 LogMgrWrtr     [OE.W.DO.DOH.UI DEBUG] Business entity ProfitGrpBE (type-of Progress.Lang.Object) invoked by OpenEdge.Web.DataObject.DataObjectHandler_4609 as target type Class
[18/02/16@13:29:13.228-0600] P-016920 T-018504 1 AS-8 -- (Procedure: 'LoadEntityHandler OpenEdge.Web.DataObject.DataObjectHandler' Line:1389) Error reading socket, ret=10053, errno=2. (778)
[18/02/16@13:29:13.228-0600] P-016920 T-018504 1 AS-8 -- (Procedure: 'LoadEntityHandler OpenEdge.Web.DataObject.DataObjectHandler' Line:1389) ** Incomplete write when writing to the server. (735)
[18/02/16@13:29:13.228-0600] P-016920 T-018504 1 AS-8 -- (Procedure: 'LoadEntityHandler OpenEdge.Web.DataObject.DataObjectHandler' Line:1389) Failed to acquire requisite lock to read schema (12536)

I've tried adding code in a few different spots either to reconnect or to make the agent quit so it can restart and reconnect to the database on the next request, but nothing I have tried so far has worked.

The only way so far to resolve the issue is to kill the agents, or fully restart oepas1.

Has anybody had a similar issue and come up with a solution?

TIA

Louis Winter

All Replies

Posted by dbeavon on 16-Feb-2018 18:52

I just asked this same question a little while ago.  It was in the context of the APSV transport but I imagine the same issues apply to REST since it is the underlying ABL client sessions that are having database connectivity problems.

Here is the link.

https://community.progress.com/community_groups/openedge_development/f/19/p/36682/113903#113903

Note that the problem was unexpected for me because we've always used shared memory database connections, and those simply get recycled when the connection to the database is broken.  So for those types of clients, the appserver problems resolve themselves because once the database comes back online, new ABL client sessions are started.  However, when you switch to using OE client/server connectivity, the ABL sessions start generating errors when the database is disconnected.  And the errors don't resolve themselves anymore, even if the database comes back online.

As far as I know, you definitely need to write your own custom solution to this problem.  Here is a KB saying as much:

https://knowledgebase.progress.com/articles/Article/P130271  

It is unfortunate since every PASOE customer is probably rolling their own solution in their own way.  It seemed to me that if the database connection parameters are specified on the agent at startup, then the connection is a necessary prerequisite for application code, and it should be PASOE's responsibility to make sure that the agent is cycled when the database is shut down.

Here is my solution (very different than the one in the KB).  I wanted the problem to be solved in a generic way that was external to the custom ABL code running within PASOE.  So I created two health-monitoring methods, one that has no database access and returns a response of TRUE, (HealthCheckupWithoutDataAccess) and the other that reads a single record from the database (HealthCheckupWithDataAccess).  Make sure they are both fast and not resource-intensive (< 50 ms end-to-end).
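As a rough illustration of the "fast" requirement, an external monitor could time each health call and count a failing one as unhealthy.  This is only a sketch in Python, not the code I actually run: the `probe` helper and its `budget_ms` parameter are hypothetical names, and the callable passed in stands for whatever client call reaches HealthCheckupWithoutDataAccess or HealthCheckupWithDataAccess in your setup.

```python
import time
from typing import Callable, Tuple


def probe(call: Callable[[], bool], budget_ms: float = 50.0) -> Tuple[bool, float, bool]:
    """Run one health check and time it end to end.

    Returns (succeeded, elapsed_ms, within_budget). Any exception raised by
    the call counts as a failed check instead of crashing the monitor.
    `call` is a hypothetical stand-in for the real request to PASOE.
    """
    start = time.perf_counter()
    try:
        succeeded = bool(call())
    except Exception:
        succeeded = False
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return succeeded, elapsed_ms, elapsed_ms <= budget_ms
```

The third value only reports whether the call stayed inside the budget; what to do with a slow but successful check is left to the caller.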

The goal is to bring PASOE back online as soon as possible after the database comes back online (for me I wanted it to be under one minute).  So each minute I do as follows in my own cron/service/job/whatever.

STEP 0 : Enumerate all agent processes and sessions for the ABL Application using the REST API (OE Manager).  If there are no sessions at all then do nothing. (see STEP 2 for more on the OE Manager)  

STEP 1 : Get the result from both HealthCheckup methods that I described above.  If the one with *no* data access works but the one *with* data access generates an error, then I know that my agents are being affected by a database outage.

STEP 2 : If I'm affected by an outage, then enumerate all agent processes for the ABL application and delete the sessions of each (see https://community.progress.com/community_groups/openedge_development/f/19/t/36461?pi20882=2 for details).  Make sure to use the PowerShell script that deletes sessions for a given agent:

I.e. Invoke-RestMethod -Method Delete -Uri ('http://localhost:' + $portnumber.ToString() + '/oemanager/applications/' + $ablapp + '/' + $agent.agentId + '/sessions')
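The decision logic in STEP 0 through STEP 2 can be sketched roughly as follows.  This is a minimal Python sketch under stated assumptions: the function names are hypothetical, the health results and agent IDs are passed in rather than fetched, and the URL layout simply mirrors the PowerShell line above, so check it against the OE Manager documentation for your version before relying on it.

```python
from typing import List
from urllib.parse import quote


def affected_by_db_outage(no_data_ok: bool, with_data_ok: bool) -> bool:
    """STEP 1: an outage is assumed only when the check without data access
    succeeds and the check with data access fails."""
    return no_data_ok and not with_data_ok


def session_delete_urls(port: int, abl_app: str, agent_ids: List[str]) -> List[str]:
    """STEP 2: build one sessions URL per agent, mirroring the PowerShell line.
    (Path layout copied from the post, not verified against OE Manager docs.)"""
    base = f"http://localhost:{port}/oemanager/applications/{quote(abl_app, safe='')}"
    return [f"{base}/{quote(agent_id, safe='')}/sessions" for agent_id in agent_ids]


def plan_cycle(agent_ids: List[str], no_data_ok: bool, with_data_ok: bool,
               port: int, abl_app: str) -> List[str]:
    """Return the URLs that should receive an HTTP DELETE in this cycle."""
    if not agent_ids:
        # STEP 0: no agents or sessions at all, so do nothing.
        return []
    if not affected_by_db_outage(no_data_ok, with_data_ok):
        # Either everything is healthy, or PASOE itself is unreachable;
        # deleting sessions would not help in either case.
        return []
    return session_delete_urls(port, abl_app, agent_ids)
```

Keeping the planning separate from the HTTP calls makes the once-a-minute job easy to dry-run: log the returned URLs first, and only then send the DELETE requests with whatever HTTP client and OE Manager credentials you use.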

Yes, this is a lot more effort than it should be.  The nice thing is that I don't have to clutter my ABL with any reconnection logic.  Hopefully one day PASOE will do some of this work itself.

Posted by Stefan Drissen on 19-Feb-2018 02:06
Posted by jts-law on 20-Feb-2018 09:57

Thanks for the input, at least I know I'm not the only one fighting this situation, and this gives me a direction to go for now.  I also voted for the idea referenced.

This thread is closed