Multi-threading -- It's time - Forum - OpenEdge Development - Progress Community

Multi-threading -- It's time

  • I do think we need to recognize that multithreading addresses more than one need.  From a quick glance, the purpose of the TPL offering is true parallelism, i.e., taking a big compute problem and breaking it down into multiple parts so that the computation can complete faster, since each part runs on a separate processor or core.  While there are some problems that fit in that category, I don't know that they are particularly typical, especially if one separates out the class of problems where the data to be processed reside in the DB, so that retrieving the data is the actual primary bottleneck.

    But, these problems are somewhat different than the problems which I am thinking about in needing multi-threading.  They don't so much deal with compute intensive requirements as they do the need for asynchronous execution.  E.g., the problem with the lack of multi-threading in socket handling isn't the huge amount of processing that is needed to deal with a received message, it is the fact that one needs to keep listening while that processing goes on, however long and intensive it might be.  Yes, there are times where that processing might be a big deal, but even when it isn't, we have a problem because a single-threaded process can't be both listening and doing at the same time.  Likewise with UI requirements.  They may not be compute intensive at all ... in fact, a common requirement is for a thread to be getting the next batch of data in the background while the user continues to interact with the screen in the foreground.  That background task actually spends most of its time doing nothing because it is waiting for the disk IO on the remote system and the network transfer of the data.  But, a single threaded session can't be simultaneously waiting for the network response and paying attention to the user.
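
    To make the listening-versus-doing point concrete, here is a minimal sketch in Java, used only because ABL has no thread construct to illustrate with. It assumes a fixed pool of four worker threads and a simulated 200 ms handler; `ListenerSketch`, `handle`, and `acceptAll` are hypothetical names, not part of any OpenEdge API. The listener hands each message to a worker and is immediately free to accept the next one.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical names throughout; this illustrates the pattern, not an ABL feature.
final class ListenerSketch {

    // Simulates slow processing of one received message.
    static String handle(String message) {
        try {
            Thread.sleep(200); // stand-in for long, variable work
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "handled:" + message;
    }

    // Accepts every message at once and leaves the slow part to worker threads.
    // 'results' must be a thread-safe list. Returns the milliseconds the
    // listener itself spent accepting all messages.
    static long acceptAll(List<String> incoming, List<String> results) {
        ExecutorService workers = Executors.newFixedThreadPool(4); // assumed pool size
        long start = System.nanoTime();
        for (String message : incoming) {
            workers.execute(() -> results.add(handle(message))); // does not wait for the handler
        }
        long acceptMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        workers.shutdown();
        try {
            // Waiting here exists only so the demo can report the results.
            workers.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return acceptMillis;
    }

    public static void main(String[] args) {
        List<String> results = new CopyOnWriteArrayList<>();
        long acceptMillis = acceptAll(List.of("a", "b", "c", "d"), results);
        System.out.println("listener busy for " + acceptMillis + " ms; handled " + results.size());
    }
}
```

    The same shape covers the UI case: replace the listener loop with the foreground event loop and the handler with the background fetch of the next batch.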

    It is conceivable that we could accomplish some goals if we had real interrupts ... by which I mean that the program could be working on a task and there was the ability to interrupt that task, save the current state, handle some other event which had occurred, and then resume the original task.  But, that requires queueing of events in order to wait for appropriate save points or to be able to nest interrupts multiple levels deep.  This actually sounds harder than the proposed "simple" version of multi-threading and less flexible.

    So, what would we need to implement this proposal?  We need a new "glue" component which goes between the threads (or sessions) such that one thread can run a program or internal procedure in another thread and pass parameters.  If we are dealing with separate sessions, then we have the problem of passing things like temp-tables across a thread boundary ... which we can already do with AppServer, but one would like to avoid that much overhead.  If we are dealing with threads, not processes, it seems like it would be reasonable to have some common data area.  While that sounds like a good way to shoot oneself in the foot, I don't know that it needs to be a problem.  If thread 1 defines a variable X, then the value of X could live in a shared data area without conflict as long as there is some kind of lock or broker so that only one process is modifying the data area at one time.  Make them separate data areas and you don't need that, but it does seem like it would be desirable to have a keyword which would allow sharing of data structures like temp-tables across threads.  That is definitely going to require a bit of coordinating, but doesn't seem that hard.
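
    As a rough illustration of the "shared data area with a lock" idea, the Java sketch below lets several threads update one shared value, with a lock ensuring that only one of them modifies it at a time. `SharedAreaSketch` and its methods are hypothetical names invented for this example; how an ABL keyword for shared temp-tables would actually behave is not specified anywhere in this thread.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: one shared value ("variable X") guarded by a lock.
final class SharedAreaSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private long x; // lives in the shared data area

    // Only one thread at a time may modify x.
    void add(long delta) {
        lock.lock();
        try {
            x += delta;
        } finally {
            lock.unlock();
        }
    }

    long read() {
        lock.lock();
        try {
            return x;
        } finally {
            lock.unlock();
        }
    }

    // Starts several threads that all update the same shared value.
    static long runDemo(int threads, int incrementsPerThread) {
        SharedAreaSketch area = new SharedAreaSketch();
        Thread[] pool = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            pool[i] = new Thread(() -> {
                for (int n = 0; n < incrementsPerThread; n++) {
                    area.add(1);
                }
            });
            pool[i].start();
        }
        for (Thread t : pool) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return area.read();
    }

    public static void main(String[] args) {
        System.out.println(runDemo(4, 10_000)); // prints 40000
    }
}
```

    Without the lock, the concurrent `x += delta` updates could be lost, which is the foot-shooting risk mentioned above.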

    Then, I suppose we need a queuing feature so that multiple threads can push events to other threads and there is a way for the receiving thread to process those in order.
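
    One way to picture such a queuing feature is a blocking queue that acts as the receiving thread's inbox. The Java sketch below is an assumption about the shape of the idea, not a proposal for ABL syntax; `EventQueueSketch` and `Event` are hypothetical names. Several sender threads push events, and the receiver takes them in arrival order, blocking rather than polling when the queue is empty.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class EventQueueSketch {

    // Hypothetical event: who sent it and that sender's sequence number.
    static final class Event {
        final String sender;
        final int sequence;

        Event(String sender, int sequence) {
            this.sender = sender;
            this.sequence = sequence;
        }
    }

    // Runs 'senders' producer threads and returns the events in the order
    // the receiving (calling) thread processed them.
    static List<Event> runDemo(int senders, int eventsPerSender) {
        BlockingQueue<Event> inbox = new LinkedBlockingQueue<>();
        List<Thread> producers = new ArrayList<>();
        for (int s = 0; s < senders; s++) {
            String name = "thread-" + s;
            Thread t = new Thread(() -> {
                for (int i = 0; i < eventsPerSender; i++) {
                    inbox.add(new Event(name, i)); // push without waiting for the receiver
                }
            });
            producers.add(t);
            t.start();
        }
        List<Event> processed = new ArrayList<>();
        try {
            for (int n = 0; n < senders * eventsPerSender; n++) {
                processed.add(inbox.take()); // blocks until an event arrives; no polling
            }
            for (Thread t : producers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println("processed " + runDemo(3, 5).size() + " events");
    }
}
```

    Events from different senders interleave, but each sender's own events arrive in the order they were pushed.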

    What else?

    Consulting in Model-Based Development, Transformation, and Object-Oriented Best Practice  http://www.cintegrity.com

  • >>  each "thread" could, in truth, be a process

    Well, Tom, we can run a separate process from ABL code right now. We can even set up interprocess communication using either the file system or sockets; however, that's not multithreading. Anyway, I agree that I'd like some syntactic sugar to make that easier than it is now. Moreover, I'm starting to think that if new language constructs let me easily run a well-encapsulated procedure without rewriting it, I'll be happy.

  • I am wondering whether it would be safer to switch to another environment for new development. I would not embrace all the object stuff from .NET, because it is very difficult to successfully design large-scale, data-oriented software with it. But I think dataset-oriented .NET software is becoming worthwhile, along with the speed of SQL Server.

  • Spawning a process is not really acceptable because the startup performance is punishing.  Spawning a thread is a very lightweight action.  Communication via the DB or the filesystem is not acceptable either: it is just too slow and requires polling rather than being event driven.  I would accept separate processes if we could spawn them to create a pool and then have some high quality communication between them, although I have concerns about the load on the processor if we suddenly have four or five times as many processes on the system.  But, we need that high quality interprocess communication for any scheme.  And, that communication has to provide the event queueing and such that we can't do for ourselves.


  • What if the process is just a "worker" process spawned once (not a hundred times) and used across the primary session when needed?

    E.g.

    DEFINE VARIABLE hWorkerSession AS HANDLE NO-UNDO.
    CREATE SESSION hWorkerSession CLONE THIS-SESSION. /* CLONE = use all the same parameters as the current session; there would be a way to specify completely different session settings */

    RUN MyPath\MyProcedure.p ON hWorkerSession ASYNCHRONOUS (someParameter).
    /* or in some procedure far from the spawned session creation: */
    RUN MyPath\MyProcedure.p ON SESSION:FIRST-SPAWNED-SESSION ASYNCHRONOUS (someParameter).

    /* or even */
    RUN MyPath\MyProcedure.p ON SESSION:FIRST-AVAILABLE-SPAWNED-SESSION ASYNCHRONOUS (someParameter).

    The CREATE SESSION could be wise enough to spawn the session on a different core (if one exists) than the primary session.
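
    For comparison, the "spawn one worker once, then run things on it asynchronously" idea can be sketched in Java with a single long-lived worker thread. This is only an analogy to the proposed syntax above: `WorkerSessionSketch`, `runAsync`, and `myProcedure` are hypothetical names, and a Java thread is far lighter than a cloned session would be.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class WorkerSessionSketch {
    // One long-lived worker, created once; loosely analogous to hWorkerSession.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    // Loosely analogous to: RUN ... ON hWorkerSession ASYNCHRONOUS (someParameter).
    CompletableFuture<String> runAsync(String someParameter) {
        return CompletableFuture.supplyAsync(() -> myProcedure(someParameter), worker);
    }

    // Hypothetical stand-in for MyProcedure.p; reports which thread ran it.
    private static String myProcedure(String someParameter) {
        return someParameter.toUpperCase() + " on " + Thread.currentThread().getName();
    }

    void close() {
        worker.shutdown();
    }

    // Submits two calls without waiting, then collects both results.
    static String[] runDemo() {
        WorkerSessionSketch session = new WorkerSessionSketch();
        CompletableFuture<String> first = session.runAsync("alpha");
        CompletableFuture<String> second = session.runAsync("beta");
        String[] results = { first.join(), second.join() };
        session.close();
        return results;
    }

    public static void main(String[] args) {
        for (String line : runDemo()) {
            System.out.println(line);
        }
    }
}
```

    Both calls run on the same reused worker, so the startup cost is paid once rather than per request.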

  • By the way, for potential voters, the enhancement request number for threading is #3332.

  • One of the things that I have heard repeatedly is that there is a lack of real world use cases where threading is absolutely essential. It's often very hard to articulate those use cases without good examples, but I am working on something right now where multi-threading is really the only viable solution to the problem.

    In a posting this week on my blog at http://www.thesoftwaregorilla.com/2010/04/exchange-web-services-subscriptions-and-notifications/, I described the way that Microsoft Exchange subscriptions work. The diagram in that posting (http://www.thesoftwaregorilla.com/wp-content/uploads/2010/04/Exchange-Web-Service-Subscriptions.jpg) shows the flow that I am working on. There are two places where multi-threading is not just nice to have, but essential for scalability.

    Both of the examples I am using are illustrative of a common pattern of enterprise application integration.

    The First Example - the Exchange Acknowledgment

    The first is in the way that my Java HTTP Servlet deals with the messages from Exchange. Exchange sends a notification to the servlet (2.1 in the diagram) that indicates that an item in the Exchange store has changed. The servlet has to respond for Exchange to maintain the subscription, and the response tells Exchange to go on to the next message in the queue. So I need to send the ACK back to Exchange in the HTTP response (2.2 in the diagram).

    The notification that I received from Exchange only tells me that the item has changed; there are no details about the item, so the Servlet calls Exchange (2.3) to ask for the information about the item. There is no need for Exchange to wait for 2.3 to happen before 2.2 happens. Exchange should send me the next notification as soon as possible. But I do need to make sure that 2.3 happens.

    So the way it works is that 2.3 runs on separate background threads. When I get the notification in 2.1, I put it into a queue and ACK it (2.2) so that Exchange is free to send me the next message.  A background thread picks up items in the queue and persists them so they are not lost (I'm probably going to do this on the same thread as the ACK so that the ACK is only returned once the notification is persisted, which gives guaranteed message delivery). It then notifies another background thread (the item retriever) that reads the persistent store, retrieves the item from Exchange, and persists that result with the original notification. Exchange can, in the meantime, have sent more messages because I already responded. As a result, my listener is not a bottleneck.

    Another thread is responsible for listening to messages from the item retriever. When it receives a notification that we have all the information related to a message, this thread makes a call to the OpenEdge AppServer (2.4) with the information about the item.
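
    The flow in steps 2.1 to 2.4 can be sketched as a small two-stage pipeline. This is a simplified illustration under stated assumptions, not the actual servlet: the persistent store is an in-memory list, the Exchange lookup and the AppServer call are simulated, and every name (`NotificationPipelineSketch`, `onNotification`, the `STOP` sentinel) is hypothetical.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

final class NotificationPipelineSketch {
    private static final String STOP = "__stop__"; // sentinel that shuts each stage down

    private final List<String> persisted = new CopyOnWriteArrayList<>(); // stand-in for the persistent store
    private final BlockingQueue<String> toRetrieve = new LinkedBlockingQueue<>();
    private final BlockingQueue<String> toAppServer = new LinkedBlockingQueue<>();
    final List<String> deliveredToAppServer = new CopyOnWriteArrayList<>();

    // Steps 2.1 and 2.2: persist the notification, queue it, and ACK right away.
    String onNotification(String itemId) {
        persisted.add(itemId);
        toRetrieve.add(itemId);
        return "ACK " + itemId;
    }

    // Step 2.3: the item retriever fetches details in the background (simulated).
    private void retrieveLoop() {
        try {
            for (String id = toRetrieve.take(); !id.equals(STOP); id = toRetrieve.take()) {
                toAppServer.add(id + ":details");
            }
            toAppServer.add(STOP); // pass the shutdown signal downstream
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Step 2.4: another thread forwards complete items (the AppServer call is simulated).
    private void appServerLoop() {
        try {
            for (String item = toAppServer.take(); !item.equals(STOP); item = toAppServer.take()) {
                deliveredToAppServer.add(item);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Feeds the given notifications through both stages and returns what reached step 2.4.
    static List<String> runDemo(List<String> itemIds) {
        NotificationPipelineSketch pipeline = new NotificationPipelineSketch();
        Thread retriever = new Thread(pipeline::retrieveLoop);
        Thread caller = new Thread(pipeline::appServerLoop);
        retriever.start();
        caller.start();
        for (String id : itemIds) {
            pipeline.onNotification(id); // returns at once; later stages run behind it
        }
        pipeline.toRetrieve.add(STOP);
        try {
            retriever.join();
            caller.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return pipeline.deliveredToAppServer;
    }

    public static void main(String[] args) {
        System.out.println(runDemo(List.of("1", "2", "3"))); // prints [1:details, 2:details, 3:details]
    }
}
```

    Because `onNotification` only persists and enqueues, the time to ACK does not depend on how long the later stages take.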

    The question is, "Could this have been written synchronously?" The answer is yes, but the performance would suck and you would make the Exchange Server wait until the message has gone all the way to the ABL before you can respond to Exchange. In the meantime, Exchange thinks your listener has died and retries.

    This code is so critical, that there is no way that I can write it in the ABL. I have to write this code in Java because the ABL just can't do this.

    The Second Example - the ABL call

    Once the call in 2.4 takes place, we are inside an ABL AppServer agent. There is a bunch of stuff that has to happen now:

    1) The item that I have received needs to be validated to determine if I need to store it.

    2) I need to store it to the database.

    3) I need to notify the application that there is an item change that it cares about.

    4) The application needs to deal with the item change as it sees fit.

    Again, storing the information to the database is the only thing that needs to happen before I should release the Java thread in the servlet that is waiting for the call to happen. But the OpenEdge AVM session is the only place where I can notify the application that this information has come in, and I have no way of knowing how long the application is going to take to process this item. So instead, I hold on to the Java thread that called me until that processing is complete.

    Ideally I should be able to accept the item, persist it, notify a background thread, and acknowledge receipt. The background thread should then notify the application and its processing should be something that the Java service does not have to wait for.

    Lest you think this example is contrived

    I was working at a Progress customer who shall remain anonymous (we'll call it ABC) a few years ago where this exact issue caused the customer to lose support from one of its vendors (XYZ). XYZ was providing ABC with regular updates on things that were going on with ABC's customers. ABC needed to react to these events in real time so that the customer satisfaction level could be high on a new product stream. ABC also needed to do quite a lot of processing when it received the event from XYZ. Several different systems were involved in the process.

    ABC decided to process the entire event within one OpenEdge AppServer call and the result was that XYZ was waiting as long as 3 minutes for the ACK to come back. Ultimately, XYZ downgraded ABC's usage of their services and refused to meet a 30 second SLA because ABC could not release its resources in time.

    Now there are definitely things that were wrong with ABC's architecture. ABC could have built this architecture with multiple processes, and polled database tables and all kinds of stuff to work around the lack of multi-threading. But right there is the issue. It's a work-around.

  • Fine, but we need the "glue" to enable those threads to talk to each other ... queuing of events and the like.


  • Rather than thinking your example contrived, what I can't figure out is why the use case here isn't abundantly obvious.  Any time there is a service in which some time is required to fulfill a request, one would like the service to receive a request, spawn a thread to fulfill the request (or pass the request to an existing thread), and go back to being ready to receive the next request.  Especially if the time required for a response is highly variable.  Especially if the requestor needs an acknowledgment that the request has been received.  Examples of wanting this are abundant.


  • tamhas wrote:

    Rather than thinking your example contrived, what I can't figure out is why the use case here isn't abundantly obvious.  Any time there is a service in which some time is required to fulfill a request, one would like the service to receive a request, spawn a thread to fulfill the request (or pass the request to an existing thread), and go back to being ready to receive the next request.  Especially if the time required for a response is highly variable.  Especially if the requestor needs an acknowledgment that the request has been received.  Examples of wanting this are abundant.

    Excuse my stupidity. Either I'm not getting your point or there's something missing in that description: But isn't what you've described here exactly the way the AppServer is working?

  • > Rather than thinking your example contrived, what I can't figure out is why the use case here isn't abundantly obvious.
    > Any time there is a service in which some time is required to fulfill a request, one would like the service to receive a request, spawn a thread to fulfill the request (or pass the request to an existing thread), and go back to being ready to receive the next request.

    The contrived part of the argument is when product management repeatedly asserts the need for a use case.

    --
    Tom Bascom
    tom@wss.com

  • Let me just clarify something about my last post. The reason I said "Lest you think this example is contrived" is that I am very aware that there are architectural ways to work around the Exchange issues.

    Certainly, having a good MOM in the mix with an ESB on top of it can significantly mitigate the problem and probably do away with several of the issues. Of course, you are likely to get into a religious war around the issue of using middleware for business logic, but if you accept that is not a problem, you could orchestrate the service calls properly in the middleware.

    The issue in the ABC case was that middleware really would not have solved anything. The compensating transaction issue would have been a nightmare. Even in the ABL Acknowledgement case, this is a messy issue to resolve.

  • Actually, AppServer is another place where we really need multi-threading for exactly the same reasons.  It just isn't efficient to try to package everything into a single .p with the life of a single call.  Works great for some things, but not for all by any means and there are lots of things we can't do with it at all.


  • Actually, AppServer is another place where we really need multi-threading for exactly the same reasons.  It just isn't efficient to try to package everything into a single .p with the life of a single call.  Works great for some things, but not for all by any means and there are lots of things we can't do with it at all.

    I'm pretty sure that product management will need more details than that. The previous reply (the one I did comment on) did sound like a perfect fit for AppServer to me - and still does.

  • No, Mike. I don't think what we are talking about is the same as today. Today, when you call the AppServer, no matter who the client is, you have to wait for it to finish processing before you receive an acknowledgement. We're talking about being able to spawn a thread and then return.