PASOE - intermittent issue ("Error loading the .NET runtime. (14081)") - Forum - OpenEdge Development - Progress Community

This question is not answered

We are running PASOE in production.  I'm seeing intermittent issues with some .Net interoperation (via clr bridge).  The interop is being used within our ABL code in order to quickly access a remote REST interface.  An error comes up intermittently while running some simple .Net code used by "sessionConnectProc" for session-managed connections.

We do not currently preload any assemblies for use by the CLR.

The Progress error message that is reported within the AVM, via Progress.Lang.Error, is not much help: Error loading the .NET runtime. (14081)

There are only a few lines of code that reference .Net, and they run during the "sessionConnectProc".  I'm pretty sure I've isolated the problem to this code.

   /* ********************************************************************* */
   /* Call rest                                                             */
   /* ********************************************************************* */
   DEFINE VARIABLE v_HttpClient AS CLASS System.Net.WebClient. 
   v_HttpClient = NEW System.Net.WebClient(). 
   

 

Is there any way to get to the real underlying source of the problem?  I would guess that there are exceptions being thrown internally that are suppressed and replaced with the unhelpful and generic message, "Error loading the .NET runtime. (14081)".  It would help if there were a way to get the LOG-MANAGER to show more information, or to hook into the first-chance details about the underlying errors/exceptions before they are superseded by the very generic one.

We are running PASOE on 11.7.4.  The KB articles that I've found about this are for problems that occurred in prior versions, and/or they are for problems that occur a lot more consistently.  In my case, I would guess I only see the error for 1 out of 1000 times that the WebClient is used.

Any help would be greatly appreciated.

All Replies
  • David,
     
    Use the MS fuslogvw.exe program (must be run as admin) and tell it to log all failures.
     
    Brian Maher
    Principal Engineer, Technical Support
    Progress

  • David,

    You might also try the Windows Sysinternals program procmon. I have had an instance where fuslogvw.exe did not show me what assembly component was not being loaded but procmon did.

    docs.microsoft.com/.../procmon

    Jeffrey

  • I wonder:

    Does this always happen on the first request for the session-managed connection?

    Or does the first request for the connection succeed, and is it a later request that fails?

    If it's a later request that fails, did this particular session move over different threads on the agent?

    That should be visible in the agent logs by looking at the P-nnn (agent PID) T-nnn (Thread number)  AS-nnn (session number) portions of the entries. Filter those where P- and AS- portions match a session that ran into the error, see if there are variations in the T- portion.

    COM automation breaks when sessions switch threads (knowledgebase.progress.com/.../COM-objects-on-PASOE-raise-CoInitialize-has-not-been-called-error), so it'd be worth probing if .NET objects are affected by something similar.
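
    A rough way to run that check without reading the log by eye is sketched below in ABL. It assumes that each agent log line carries space-separated tokens of the form P-nnn, T-nnn and AS-nnn, as described above. The log file name is hypothetical and the token matching is deliberately naive, so it may need adjusting to the actual layout of the agent log.

    ```abl
    /* Sketch: list sessions that appear on more than one thread of an agent.
       Assumption: every relevant log line contains tokens such as
       P-012345 T-006789 AS-4 separated by spaces. */
    DEFINE TEMP-TABLE ttSeen NO-UNDO
        FIELD AgentPid  AS CHARACTER
        FIELD SessionId AS CHARACTER
        FIELD ThreadId  AS CHARACTER
        INDEX ixSeen IS PRIMARY UNIQUE AgentPid SessionId ThreadId.

    DEFINE VARIABLE v_Line    AS CHARACTER NO-UNDO.
    DEFINE VARIABLE v_Token   AS CHARACTER NO-UNDO.
    DEFINE VARIABLE v_Pid     AS CHARACTER NO-UNDO.
    DEFINE VARIABLE v_Thread  AS CHARACTER NO-UNDO.
    DEFINE VARIABLE v_Session AS CHARACTER NO-UNDO.
    DEFINE VARIABLE v_Threads AS CHARACTER NO-UNDO.
    DEFINE VARIABLE v_Idx     AS INTEGER   NO-UNDO.

    /* Hypothetical file name; point this at the real agent log. */
    INPUT FROM VALUE("oepas1.agent.log").
    REPEAT:
        IMPORT UNFORMATTED v_Line.
        ASSIGN v_Pid = "" v_Thread = "" v_Session = "".

        /* Keep the first P-, T- and AS- token found on the line. */
        DO v_Idx = 1 TO NUM-ENTRIES(v_Line, " "):
            v_Token = ENTRY(v_Idx, v_Line, " ").
            IF v_Token BEGINS "P-" AND v_Pid = "" THEN v_Pid = v_Token.
            ELSE IF v_Token BEGINS "T-" AND v_Thread = "" THEN v_Thread = v_Token.
            ELSE IF v_Token BEGINS "AS-" AND v_Session = "" THEN v_Session = v_Token.
        END.
        IF v_Pid = "" OR v_Thread = "" OR v_Session = "" THEN NEXT.

        /* Record each distinct (agent, session, thread) combination once. */
        IF NOT CAN-FIND(ttSeen WHERE ttSeen.AgentPid  = v_Pid
                                 AND ttSeen.SessionId = v_Session
                                 AND ttSeen.ThreadId  = v_Thread) THEN DO:
            CREATE ttSeen.
            ASSIGN ttSeen.AgentPid  = v_Pid
                   ttSeen.SessionId = v_Session
                   ttSeen.ThreadId  = v_Thread.
        END.
    END.
    INPUT CLOSE.

    /* Report only the sessions that were seen on two or more threads. */
    FOR EACH ttSeen BREAK BY ttSeen.AgentPid BY ttSeen.SessionId:
        IF FIRST-OF(ttSeen.SessionId) THEN v_Threads = "".
        v_Threads = v_Threads + (IF v_Threads = "" THEN "" ELSE ",") + ttSeen.ThreadId.
        IF LAST-OF(ttSeen.SessionId) AND NUM-ENTRIES(v_Threads) > 1 THEN
            MESSAGE ttSeen.AgentPid ttSeen.SessionId "ran on threads:" v_Threads.
    END.
    ```

    Any session that is reported ran on more than one thread within the same agent process, which is the pattern to compare against the sessions that logged error 14081.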

    So it sounds like the problem is most likely related to loading the .NET runtime, and related assemblies, right?  I'm hoping it isn't a catch-all error message that could arise for any arbitrary .Net exception (e.g., including something in the implementation of the constructor of this class: System.Net.WebClient).

    Why isn't there a way to get the .Net exception details?  This error (14081) is not something from the .Net runtime itself.  Troubleshooting is so much more difficult when Progress hides all the inner exception details.  I've noticed that the .NET Open Client for AppServer does the same thing.  It hides all the root-cause information like inner exceptions and stack traces, and replaces them with some generic message that will take many hours to troubleshoot.  (You almost need to have a debugger attached before you will find the real root cause for any of the unexpected errors.)
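
    A minimal sketch of logging everything the ABL side can see on the error object is shown below. It reuses the WebClient call from the original post; the "CLRDIAG" subsystem label and the variable names are made up for illustration. It may well show nothing beyond the generic 14081 text if the bridge has already discarded the underlying cause.

    ```abl
    /* Sketch: log every message and the call stack for a failing .NET call. */
    DEFINE VARIABLE v_HttpClient AS CLASS System.Net.WebClient NO-UNDO.
    DEFINE VARIABLE v_Idx        AS INTEGER                    NO-UNDO.

    /* Ask the AVM to populate CallStack on error objects. */
    SESSION:ERROR-STACK-TRACE = TRUE.

    DO ON ERROR UNDO, THROW:
        v_HttpClient = NEW System.Net.WebClient().

        /* .NET exceptions that reach ABL: ToString() includes the type,
           message, inner exceptions and .NET stack trace. */
        CATCH v_NetEx AS System.Exception:
            LOG-MANAGER:WRITE-MESSAGE(v_NetEx:ToString(), "CLRDIAG").
            UNDO, THROW v_NetEx.
        END CATCH.

        /* AVM errors such as 14081: log each numbered message. */
        CATCH v_Err AS Progress.Lang.Error:
            DO v_Idx = 1 TO v_Err:NumMessages:
                LOG-MANAGER:WRITE-MESSAGE(
                    SUBSTITUTE("(&1) &2",
                               v_Err:GetMessageNum(v_Idx),
                               v_Err:GetMessage(v_Idx)),
                    "CLRDIAG").
            END.
            IF v_Err:CallStack <> ? THEN
                LOG-MANAGER:WRITE-MESSAGE(v_Err:CallStack, "CLRDIAG").
            UNDO, THROW v_Err.
        END CATCH.
    END.
    ```

    Both CATCH blocks rethrow, so the caller still sees the original failure; the only change is the extra detail written to the agent log.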

  • I got a similar error when trying to run an uninstalled prowc.exe (which normally works as well), calling a CrystalReport dll. I had no clue why it failed, but I did an install of Progress, and after that it worked. I found some discussion about CAS and security, but I'm not sure...

    (In reply to dbeavon's post of 11 Dec 2018, 15:49.)

  • Unfortunately, when there is a problem during the initialization of the .NET bridge (part of the AVM), you get this generic message, "Error loading the .NET runtime. (14081)", which, I believe, is very often not the actual problem!  There are a few different reasons why this can fail (I would have to investigate to know which), but we always give just this one generic message, which is extremely misleading.  It is also never about a failure to load one of the assemblies in the assemblies.xml file; no errors are generated when any of those fail to load.  And it may have nothing to do with .NET whatsoever.  There is an outstanding bug saying that we should modify the code to generate a more specific error message.  It is OCTA-3502, if that helps.  I've been wanting to do this for a while, but it currently does not have priority.

  • @Laura 

    Thanks for the reply.  I was afraid you might say something like that.  It sounds like the error could mean a number of things.

    Perhaps you can prioritize OCTA-3502 based on the fact that this message is happening in PASOE, and there are times when the outer _mproapsv agent becomes very unreliable in the context of .Net interoperation.

    Today I happened to be doing some debugging with the Visual Studio debugger attached to the _mproapsv.exe process (for unrelated reasons), and I noticed a few of these "Error loading the .Net runtime" messages come up in the PASOE agent log while I was debugging.  I had originally suspected that the message was triggered by a first-chance exception on the .Net side of things, but I never encountered any .Net exception for the whole duration of my debug session.  I now suspect that some of the reasons for the message are entirely on the Progress side of the fence, and not related to anything going wrong in .Net.

    It seems to me that the message "Error Loading the .Net runtime" doesn't pass the sniff test.  For example, there only seems to be a single appdomain in the entire _mproapsv.exe process.  I use .Net for the most minimal purposes (primarily just to use the WebClient in order to call a few REST methods).  The initial usage of the WebClient takes place in the very first moments of the life of the agent process.  We can see that the appdomain is loaded with the relevant assemblies right away (image).

    Given that the appdomain is loaded and initialized many hours (days?) ahead of time, it doesn't make sense for us to be getting the message  "Error loading the .Net runtime".  I suspect the message is trying to describe a different problem where a new ABL session is unable to be hooked up to the pre-existing .Net runtime.

    Is there some way to get to the root cause of these error messages?  I no longer believe that the cause is on the .Net side.  It seems to be a problem within the ABL session.  Alternatively, is it likely that this message has a timing-related component to it, and will go away after some number of iterations?  Finally, is there documentation about how to use the CLR bridge in the context of PASOE?  It seems a bit scary that all ABL sessions are being re-directed to use the same appdomain.  An ABL programmer might expect that the CLR calls from one session are isolated from the CLR calls of another session, but that doesn't appear to be the case.

    Any additional tips would be greatly appreciated.

  • Yes, I agree with everything you said and getting to the root cause of this error is what OCTA-3502 is about.  We are working on prioritizing this to be higher in the queue.  We occasionally get that error message in our test environment, and personally, I find it VERY annoying!

    Regarding the AppDomain, yes, we use the default AppDomain and there is only one.  However, most other things internal to the CLR bridge are NOT shared between the sessions.  We DID need to make these changes when we incorporated the CLR Bridge into PASOE.  So I think we are OK with that.  We don't modify the AppDomain or rely on it for anything that would differ between PASOE sessions. And hopefully PASOE ABL code is not mucking with it either!

  • >> We are working on prioritizing this to be higher in the queue.

    So I'm hearing there isn't a way to troubleshoot on my end?  Alternatively, how should I bundle up this issue to send it over to tech support?  It happens so infrequently in production that I will have a very hard time coming up with a consistent reproducible.  I may be able to refactor our use of the CLR bridge so it doesn't happen in the PASOE connect procedures (where it causes users to get error messages that appear almost identical to connectivity or authentication failures).  But before I start refactoring this code and moving it around, I'd like to know where to move it so that it is less likely to raise the error messages.

    >> Regarding the AppDomain ...

    Any static members in the CLR that an ABL programmer interacts with will also affect the other ABL sessions as well.  It is worth documenting this at the very least, since it might be unexpected and unintuitive to a programmer.  In contrast, on the ABL side of things all the static members of ABL classes are isolated within the context of the individual ABL session.
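
    A small illustration of that point about statics, assuming the single shared AppDomain described above. System.Net.ServicePointManager:DefaultConnectionLimit is a real static .NET property; the value 20 is arbitrary and the split into "session A" and "session B" is only for illustration.

    ```abl
    /* Static .NET members belong to the AppDomain, not to an ABL session. */

    /* Session A, anywhere in its ABL code: */
    System.Net.ServicePointManager:DefaultConnectionLimit = 20.

    /* Session B, later, in the same agent process: under the shared-AppDomain
       assumption this is expected to report the value set by session A
       rather than the .NET default. */
    MESSAGE "DefaultConnectionLimit:"
            System.Net.ServicePointManager:DefaultConnectionLimit.
    ```

    A static property on an ABL class would behave differently, because its value is kept per ABL session.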

  • I'm wondering if these error messages from the CLR bridge are somehow correlated to the amount of load that is placed on the PASOE agent.  I really don't remember any occurrences of this issue prior to the recent deployment of a new application.  

    Perhaps there is a way to synchronize the calls to the CLR bridge so it becomes less busy.  With some synchronization, the bridge will think there is only one ABL session running in the process at a time.  The REST methods (called via the WebClient) normally take only about five milliseconds but, under heavy load, they might take a bit longer.  I suspect that whenever the PASOE agent is under heavy load, there is a greater possibility of overlapping calls to the CLR bridge.

    Hopefully I will be able to find a workaround.  If synchronization is the key, then I should only need to synchronize the ABL sessions within a given agent.  Perhaps I can create a table (CLR_IS_BUSY_RESTING) that has a primary key based on the process ID of the agent.  Prior to using the CLR bridge, I will create/lock a record associated with the process ID.  Then I'll call the REST methods using the CLR bridge.  Then I'll release the record again at the end.

    It certainly isn't a "pretty" solution but I don't have any other ideas at the moment.  The REST method calls are pretty critical to our PASOE connection procedure.  Hopefully it won't add more than one additional millisecond to do the synchronization. It should still be faster than using OpenEdge.Net.HTTP for these REST methods.  (I use that API in other places but our PASOE connection procedures require really fast performance that I can't seem to get without using the .Net WebClient).

  • Regarding reporting this to tech support, you really don't need to provide a reproducible to tech support in order to improve the error message.  As I said, there is already a bug and we also have a fairly high-priority "feature" to improve some of our error messages in order to avoid tech support calls.  This fits right into that category!  So the call to TS would really be to argue for increasing the priority.  But of course, if you did have a way to reproduce, that would be very helpful in diagnosing your actual problem!

    In regards to trying to synchronize things on your own, I think you are going down a difficult and dangerous path!  I would not recommend it.  I will reiterate: the CLR Bridge really does not share any data between the session threads.  There is a separate AssemblyStore for each thread and, as far as we know, anything that stores data has its own instance for each thread.  The shared AppDomain should not affect anything.  We do not interact with it or modify anything in it.

    Regarding the load factor: The error that started this whole discussion (Error loading the .NET runtime. (14081)) happens when we first need to initialize the CLR Bridge and that happens on the first call to anything in .NET.  Are you surmising that it could fail when one session is first initializing the Bridge and another one has already initialized but is now trying to make some .NET call?  I really can't comment on that.  I can't offhand see why this would be a problem.  But I can't say with any certainty that it isn't related.

    Have you tried using -preloadCLR?  Maybe that would help.  That will cause the initialization to occur when the session starts up.

  • >> The error that started this whole discussion (Error loading the .NET runtime. (14081)) happens when we first need to initialize the CLR Bridge and that happens on the first call to anything in .NET

    When you say the "first call to anything in .NET", are you referring to the first call in the entire life of the agent process?  That doesn't seem relevant.  That happened many days ago and there have been many tens of thousands of calls to .Net methods since then.  But it appears that we are still encountering the same intermittent message in the logs, "Error loading the .NET runtime", in association with that same MS-agent process ID.  We churn thru the individual ABL sessions daily, because they are restarted on a regular basis (e.g. when they become idle, are trimmed, and then are started again when needed).  But the outer agent process has been running over the course of several days.

    Given that the appdomain was loaded and initialized many days ago, it doesn't make sense for us to be getting the message  "Error loading the .Net runtime".  I suspect the message is trying to describe a different problem which is ABL-session-specific.  I still think that the new ABL sessions might be having trouble hooking into the pre-existing .Net runtime.

    As far as the load factor goes, I'm wondering if there is contention between ABL sessions as they try to hook into the pre-existing runtime.  The more rapidly the ABL sessions are started, the more likely we might encounter errors?

    Will the parameter which you referenced, -preloadCLR, affect the behavior of all the individual ABL sessions, or does it only impact the outer agent process (by initializing the CLR app domain on a one-time basis)?  That parameter was referenced in the forums earlier, and the developer said it didn't change matters and they continued to see this same message ( see community.progress.com/.../34331 )

    I have opened a tech support case on this, given that I'm a long way from finding the root cause on my own.  I'd like to at least have a work-around that prevents these errors as much as possible, since I think they are disruptive when a few of our users encounter them each day.  My tech support engineer wants me to supply a consistent reproducible and I still don't have one.  Do you have any clues about how I might create a reproducible, even an artificial one?  I was going to focus on my theory that this is related to a synchronization problem, but you don't seem convinced.  Based on your experiences of this message, did you have any theories about how to recreate it on demand?

    As an aside, I also opened another tech support case (00470446) about a substantial memory leak in the ms-agent process that seems to be related to the CLR bridge.  It is pretty clear that there is a memory problem, given that the CLR managed memory dump can be opened in the VS debugger and we can see hundreds of rooted references (rooted via Progress.ClrBridge.ProMarshal).  Currently we are killing agent processes only once a week.  But as we migrate more of our applications from "classic" to PASOE, we are probably going to need to do that daily.

  • > Do you have any clues about how I might create a reproducible, even an artificial one?

    See if you can provoke the error to occur more often, by any means imaginable?

    Maybe add some procedure that makes very many calls to the CLR bridge.

  • >>>When you say the "first call to anything in .NET", then are you referring to the first call in the entire life of the agent process?  

    I mean the first call from a particular session, not the whole agent process.

    >>>Given that the appdomain was loaded and initialized many days ago, it doesn't make sense for us to be getting the message  "Error loading the .Net runtime".  I suspect the message is trying to describe a different problem which is ABL-session-specific.

    Yes, that's what I already surmised - that it is probably not about the .NET runtime at all.

    >>>As far as the load factor goes, I'm wondering if there is contention between ABL sessions as they try to hook into the pre-existing runtime.  The more rapidly the ABL sessions are started, the more likely we might encounter errors?

    We really can't answer this until we know what the problem is.  If we could get a better error message, we would at least have some clue!

    >>>Will the parameter which you referenced, -preloadCLR, affect the behavior of all the individual ABL sessions, or does it only impact the outer agent process (by initializing the CLR app domain on a one-time basis)?  That parameter was referenced in the forums earlier, and the developer said it didn't change matters and they continued to see this same message ( see community.progress.com/.../34331 )

    This would affect the behavior of each ABL session, i.e., when a session starts up, it would do this initialization.  However, after the first session, the .NET framework would already be loaded into the process, so it would only be other internal initialization that would occur.  The loading of the framework happens kind of automagically!  We don't actually have code that does it.  We just call into the CLR bridge, and voila, there it is (or not!).

    >>> My tech support engineer wants me to supply a consistent reproducible and I still don't have one.  Do you have any clues about how I might create a reproducible, even an artificial one?  

    I'm sorry, but I really don't.  So I think the first step is to prioritize getting that message fixed.  If the TSE balks, you can tell him/her to talk to me!  

    >>>As an aside, I also opened another tech support case (00470446) about a substantial memory leak in the ms-agent process that seems to be related to the CLR bridge.  ...

    What version are you on?  We just fixed 3 different .NET-related memory leak issues in 11.7.4 and an 11.7.3 hot fix.  Could be one of these.  Or something else?? :-(

  • We're running 11.7.4 already.  Thankfully the memory issue will be an easy one to reproduce and submit, unlike the message "Error loading the .NET runtime".