I've been having some conversations recently in which, over and over again, the limitations we face from not having multi-threading keep coming to the fore.
Consider that, without multi-threading, one can't reasonably write a socket server that will handle multiple clients. If one tries, one is certain to be busy handling the message from one client when a message from another client arrives. The only workaround is something like polling each client for work in rotation ... a really horrible compromise.
And, without being able to write a multi-client socket server, one also can't run multiple single-threaded sessions and have them act like a multi-threaded process, because there is no mechanism for having them communicate with each other.
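For contrast, here is a minimal Java sketch of the thread-per-client approach. Java is used only because it is the other platform discussed in this thread; `ThreadedEchoServer` and its echo protocol are made up for illustration. One thread does nothing but accept connections and hand each client to a pool thread, so an idle client never stalls anyone else and no rotation polling is needed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical echo server: one thread accepts connections and each client is
// handed to a pool thread, so a slow or idle client never blocks the others.
class ThreadedEchoServer implements AutoCloseable {
    private final ServerSocket server;
    private final ExecutorService workers = Executors.newCachedThreadPool();

    ThreadedEchoServer() throws IOException {
        // Port 0 asks the OS for any free port on the loopback interface.
        server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        Thread acceptLoop = new Thread(this::acceptClients, "accept-loop");
        acceptLoop.setDaemon(true);
        acceptLoop.start();
    }

    int port() { return server.getLocalPort(); }

    private void acceptClients() {
        while (!server.isClosed()) {
            try {
                Socket client = server.accept();
                workers.submit(() -> handle(client)); // hand off, then go straight back to accept()
            } catch (IOException e) {
                return; // server socket was closed
            }
        }
    }

    private void handle(Socket client) {
        try (client; BufferedReader in = reader(client); PrintWriter out = writer(client)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.println("echo: " + line);
            }
        } catch (IOException ignored) {
            // client went away; try-with-resources already closed everything
        }
    }

    static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    static PrintWriter writer(Socket s) throws IOException {
        return new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    // Demo: one client connects and stays silent; a second client is still answered.
    static String demo() {
        InetAddress lo = InetAddress.getLoopbackAddress();
        try (ThreadedEchoServer srv = new ThreadedEchoServer();
             Socket idle = new Socket(lo, srv.port());
             Socket active = new Socket(lo, srv.port())) {
            writer(active).println("hello");
            return reader(active).readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() throws IOException {
        server.close();
        workers.shutdown();
    }
}
```

A single-threaded server that blocked reading from the first (idle) client would never get to the second one; here `demo()` gets its reply regardless.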
Some would say that one can connect them with an ESB, but consider how much overhead one is adding in encoding and decoding messages, routing, guaranteed delivery, and the like for what should really be a very intimate connection.
Think what one could do with a multi-threaded AppServer! An agent could accept a request for a long running piece of work, spin it off into a thread, and then go back to accepting new requests! That could allow us to significantly alter the use of AppServer agents because one or a pool could be dedicated to a particular type of request and it would no longer be evil to bind to a specific agent for a series of related requests because you wouldn't be blocking anyone.
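A rough Java sketch of that agent pattern, assuming a request can be answered later by ticket (the `AsyncAgent` class, the ticket scheme and the 200 ms stand-in job are all hypothetical): `submit()` hands the work to a pool thread and returns immediately, and only a caller that asks for a specific result ever waits.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical agent: long-running work goes to a pool thread and the caller gets
// a ticket back at once, so the agent can go back to accepting requests.
class AsyncAgent {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    private final Map<String, Future<String>> jobs = new ConcurrentHashMap<>();

    String submit(String request) {
        String ticket = UUID.randomUUID().toString();
        jobs.put(ticket, pool.submit(() -> longRunningWork(request)));
        return ticket;
    }

    boolean isDone(String ticket) {
        return jobs.get(ticket).isDone();
    }

    // Blocks only the caller that asks for this particular result.
    String result(String ticket) {
        try {
            return jobs.remove(ticket).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    void shutdown() {
        pool.shutdown();
    }

    // Stand-in for the expensive part of a request (report, batch update, ...).
    private static String longRunningWork(String request) throws InterruptedException {
        Thread.sleep(200);
        return request.toUpperCase();
    }
}
```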
Think what one could do with a multi-threaded service on an ESB. Again, it could accept a request, spawn a thread to deal with that request, and go back to listening for new requests. One service could do the job of a whole pool of services.
And, of course, there are numerous applications associated with the UI, not the least of which is being able to articulate with the multithreaded .NET architecture in ABL GUI for .NET, plus thousands of variations on doing some kind of background process, like fetching the next batch of data, while the main thread continues to pay attention to the UI itself.
It just seems to me that this is an area in which ABL is increasingly at a disadvantage compared with other languages and an area which is going to drive people to have to write 3GL handlers to have any hope of tackling many types of jobs.
And, given the need for many of these kind of behaviors to provide scalability, isn't V11 with its cloud computing theme the right time to do this?
Consulting in Model-Based Development, Transformation, and Object-Oriented Best Practice http://www.cintegrity.com
Completely agree! Moreover, I think the right time came a long time ago. In addition to your cases, I can mention that one can't call .NET code from ABL if it uses multithreading.
On the other hand, I'm afraid it's not a piece of cake for PSC to rework their AVM for multithreading; that's why they are delaying its implementation.
Frankly, I think it's too late. Multi-threading is not just a nice to have anymore, and it hasn't been for a very long time. The fact that we do not have multithreading in the AppServer makes it completely unscalable.
When I start up an AppServer, I have to build caches of information that are available for subsequent calls so that the AppServer does not need to go to the database for every call. This is not optional - it's essential. In Glassfish or Tomcat (Java AppServers), I spin off a thread at startup to go and build those caches. There is no notion of an agent process with Glassfish or Tomcat - the agent is a thread. This means that each agent has access to a common cache and there is no processing associated with gaining access - it is all in-process. With OpenEdge, I have to build that cache **for each agent** which means that it significantly increases the amount of resources used by the AppServer.
Moreover, with Glassfish or Tomcat, an update to the cache is available to all agents. With OpenEdge I have to jump through hoops to make that possible.
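To make the comparison concrete, this is roughly what the in-process cache looks like in Java. It is a sketch under assumptions: `SharedCache` and the loader are hypothetical, and a real loader would read the database. One background thread warms the cache at startup, every request thread reads the same map, and a single `put()` is immediately visible to all of them.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.function.Supplier;

// One cache per process, shared by every request thread (hypothetical class).
class SharedCache {
    private final ConcurrentHashMap<String, String> entries = new ConcurrentHashMap<>();
    private final CountDownLatch ready = new CountDownLatch(1);

    // Warm the cache on a background thread at startup; 'loader' stands in for the DB reads.
    void warmUp(Supplier<Map<String, String>> loader) {
        Thread t = new Thread(() -> {
            try {
                entries.putAll(loader.get());
            } finally {
                ready.countDown(); // release waiting readers even if loading failed
            }
        }, "cache-warmup");
        t.setDaemon(true);
        t.start();
    }

    String get(String key) {
        awaitReady();
        return entries.get(key);
    }

    // One update, visible to every thread at once; there are no per-agent copies to refresh.
    void put(String key, String value) {
        awaitReady(); // don't let the startup load overwrite a newer value
        entries.put(key, value);
    }

    private void awaitReady() {
        try {
            ready.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for cache warm-up", e);
        }
    }
}
```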
I could go on and on. Collection implementations without multi-threading mean that the user has to wait while a collection's underlying structures are cleaned up after an item is deleted from the collection. Of course, we would not need to implement collections ourselves if they had been implemented in the language for us in the first place.
I have a specific use case right now where I really need multi-threading. After a record is written to the database, I need to call a WebService. There is no need for this to happen inside the transaction, but it has to happen as a result of it. If I had multi-threading, I would create an object with the information that needs to go to the WebService, and put it in a queue to be processed, and then return control to the UI. In the background, a worker thread could pick up the object from the queue and make the WebService call for me and write the updates that come from it to the database. I can't do that. I now have to wait for the call to execute before control can return to the client.
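The queue-and-worker arrangement described above, sketched in Java. The `Notification` record and `PostCommitQueue` class are hypothetical, and the consumer passed to the constructor stands in for the WebService call plus the follow-up database write. `enqueue()` returns as soon as the item is queued; one worker thread takes items in order and makes the slow call.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical payload for the post-commit WebService call.
record Notification(String orderId, String payload) {}

// Callers enqueue after the commit and return at once; one worker thread
// drains the queue and makes the slow calls in the background.
class PostCommitQueue implements AutoCloseable {
    private static final Notification STOP = new Notification("", ""); // shutdown marker, compared by identity
    private final BlockingQueue<Notification> queue = new LinkedBlockingQueue<>();
    private final Thread worker;

    PostCommitQueue(Consumer<Notification> webServiceCall) {
        worker = new Thread(() -> {
            try {
                while (true) {
                    Notification n = queue.take(); // sleeps until work arrives
                    if (n == STOP) return;
                    try {
                        webServiceCall.accept(n);
                    } catch (RuntimeException e) {
                        // a real version would retry or park the item instead of just logging
                        System.err.println("call failed for " + n.orderId() + ": " + e);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "webservice-worker");
        worker.start();
    }

    void enqueue(Notification n) {
        queue.add(n);
    }

    // Lets the worker finish everything queued so far, then stops it.
    @Override
    public void close() {
        queue.add(STOP);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

One design question this leaves open is what happens to queued items if the process stops before the worker drains them; persisting them in the same transaction as the original record is one common answer.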
Unfortunately I don't hold out much hope of this happening.
1) Retrofitting multi-threading to the AVM is going to be a big job;
2) The problems associated with multi-core make this job even more complicated than it used to be, and Progress would have to shield OpenEdge developers from them;
3) The return on investment for Progress would not be significantly measurable.
The problem is that without it, I am better off writing large chunks of my code in Java/.NET and treating Progress as the back-end system responsible for maintaining my database and referential integrity.
On the plus side, presentations I've seen have detailed plans to implement a shared cache across multiple appservers as part of the "multi-tenant" db upgrade scheduled for release in v11.
On the minus side - v11 is still a ways away.
timk519 wrote:
When I start up an AppServer, I have to build caches of information that are available for subsequent calls so that the AppServer does not need to go to the database for every call. This is not optional - it's essential. In Glassfish or Tomcat (Java AppServers), I spin off a thread at startup to go and build those caches. There is no notion of an agent process with Glassfish or Tomcat - the agent is a thread. This means that each agent has access to a common cache and there is no processing associated with gaining access - it is all in-process. With OpenEdge, I have to build that cache **for each agent** which means that it significantly increases the amount of resources used by the AppServer.
On the plus side, presentations I've seen have detailed plans to implement a shared cache across multiple appservers as part of the "multi-tenant" db upgrade scheduled for release in v11.
On the minus side - v11 is still a ways away.
If this is implemented the way that it was being spoken about 6 years ago, when we had all the problems with Dynamics' AppServer performance, it's a kludge. The idea was that the broker would hold a cache of data that it would share with each of the agents and that it would be made available to agents as they were activated. No matter how you cut it, this results in inter-process communication, and it is going to be slow: much slower than a cache that lives inside the same process, with thread synchronization in place to make concurrent access safe.
Multi-threading the AppServer is the right solution.
Of course, it's dangerous to discuss features that are not yet released, and I'm therefore not at liberty to say much other than: no, the current thinking is not to do an implementation along the lines you surmise.
OK, so now we know one thing you are probably not going to do .... I don't suppose that you have any hints on what you might do?
Or, at least, do you recognize the problem and how important it is?
Actually, I doubt that it is all that hard. It certainly doesn't require a major rewrite of anything -- just the addition of a little syntactic sugar to the 4GL to allow coordination of threads. Then just instantiate multiple copies of the current AVM as threads within a session. The first one could be the UI thread and all others would be "batch" threads (run as if -b were specified). Technically it's not such a big deal.
I don't agree with you on this one. This is definitely very, very hard for OpenEdge for a number of reasons.
1) Most of the core language is still in C and that means that the object nature really doesn't exist. Threading is hard enough to do in Java, where everything is an object and all your garbage collection is taken care of for you. In C, threading is rocket science.
2) Progress does its own memory allocation, which is both a good thing and a bad thing. The problem is that it was not written to be multi-threaded, so any work you do on multi-threading the AVM requires that the memory allocation be made thread-safe and carefully tested in a multi-threaded configuration. That's a very big job.
3) Point 2 applies to several other key resources, including database buffers, queries, etc. They all have to be made thread-safe if you plan to multi-thread the AVM.
4) Threading is one thing. Parallelism is another. Now that we have multi-core machines, the problems associated with threading are far more difficult to handle: context switching is complex enough on a single core, but moving threads across cores is expensive (each core's caches have to be refilled) and can actually degrade performance. Processor affinity is a key concept to understand, and it does not come for free in the OO languages. How much worse is that in C?
So multi-threading is not just syntactic sugar. This is a huge change to the way that the AVM works. I think it is probably a 3-year development effort on its own, and there are language constructs that would need to be deprecated to support it. Don't underestimate how big this is. It's probably bigger than replacing the database engine.
I get why Progress has not done this. It's a much bigger development effort than the GUI stuff. The problem is that if they had set up a task force to do it back in the late 90's, when the idea was originally gaining traction after the deployment of AppServer, it would be well-baked by now. The problem Progress faces now is just how much life does OpenEdge have left in it? 20 years ago I would have chosen Progress automatically if I was building a new application. There was no question about it. 10 years ago, Java was starting to show up, but you could always argue that database and AppServer work with Java was much harder than with Progress. Today, business logic to database is the only place that OpenEdge really has an advantage, and then the problem you face is scalability of the AppServer.
Now if Progress would go back to viewing OpenEdge the way that it has always been successful - that is, as a data manipulation language and a great platform for business logic - the money would be invested in the right technologies that would make building the world's best business applications possible. It starts with the AppServer and database.
I wonder if there isn't a middle ground here. Tom's concept is certainly simple-minded multi-threading, in that the threads might as well be in different sessions, but that alone would be a major leap forward. I don't know that we need to share things like database buffers and such. Each process could really be in its own memory space and the only communication would be through parameters and events. Yes, there would need to be some glue in the middle, because each thread is so aggressively single-threaded, so we would need structures for making sure that an event or return hung around until the session it was targeted at was in a proper state to receive the action, and that implies queuing, but I think this could be simpler than you outline. It would have limitations, of course, but would still be a major leap forward.
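One way to picture that middle ground, as a Java sketch (the `Session` class is hypothetical and says nothing about how the AVM would actually do it): each session is one thread with a FIFO mailbox and no shared state. A call runs on the target's thread, and its result is queued back onto the caller's mailbox, so the reply waits until the caller is in a state to receive it, much like an asynchronous AppServer run whose event procedure fires later.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical "session": one thread plus a FIFO mailbox. Sessions share no state;
// they interact only by posting messages (parameters in, results out).
class Session implements AutoCloseable {
    private final ExecutorService mailbox;

    Session(String name) {
        mailbox = Executors.newSingleThreadExecutor(r -> new Thread(r, name));
    }

    // Run something on this session's own thread.
    void post(Runnable message) {
        mailbox.execute(message);
    }

    // Like a local asynchronous RUN: 'target' executes the request on its thread, and the
    // result is queued back onto THIS session's mailbox, so the reply is handled only
    // after this session has finished whatever it is doing now.
    <A, R> void call(Session target, Function<A, R> request, A arg, Consumer<R> onReply) {
        CompletableFuture
                .supplyAsync(() -> request.apply(arg), target.mailbox)
                .thenAcceptAsync(onReply, mailbox);
    }

    @Override
    public void close() {
        mailbox.shutdown();
        try {
            mailbox.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```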
I think you're right. I think there is a middle ground.
The other place this would need to be carefully thought out is in supporting multi-threading in the .NET GUI. Background threads in the CLR will need to communicate with background threads in the AVM. That is going to create some interesting language questions.
Of course, the new Task Parallel Library (TPL) in .NET 4.0 is going to add a whole new wrinkle when .NET 4.0 is eventually supported sometime in the future. This is one area where .NET has taken a big jump ahead of Java. TPL significantly reduces the complexity of multi-core parallel programming.
It starts with the 4GL. The 4GL has always been the core reason why people use Progress. Letting the 4GL stagnate has always been the root cause of the various ills suffered from time to time. All else flows from the 4GL. If the 4GL is healthy and robust, then the rest of the ecosystem thrives. If the 4GL is swept into a corner, angst and unhappiness ensue.
As you and Thomas have recognized I am advocating a compromise approach to multi-threading. I think that it is viable -- today's hardware is more than capable of supporting such an approach. From the 4GL programmer's perspective it should not manifest as a major change -- it really should just be a little syntactic sugar.
While I'm at it... each "thread" could, in truth, be a process. It only needs to look like a thread, it doesn't have to actually be one. That might even have some advantages (think of Chrome's solution of running browser tabs in their own process). That would make it even easier for PSC.
The problem with treating it as a separate process is the same problem we face with AppServer - which is why I raised it. Sharing a memory store that is out of process is difficult and a performance penalty.
One of the things I want to be able to do is have parallel FOR EACH statements running that update a common temp-table for data extracts. Ideally, I would like these to run on separate cores so that I can have true parallel processing, not just multi-threading.
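A Java sketch of that kind of extract, under assumptions (the `Row` record, the region totals and the partitioning scheme are invented for illustration): the rows are split into partitions, each partition is scanned by its own pool thread, and all scans merge into one thread-safe map that plays the role of the shared temp-table. The JVM decides which cores the threads run on; nothing here pins them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Split an extract into partitions, scan each on its own pool thread, and merge
// into one thread-safe table (standing in for the shared temp-table).
class ParallelExtract {
    record Row(int id, String region, long amount) {}

    static Map<String, Long> totalsByRegion(List<Row> rows, int partitions) {
        ConcurrentHashMap<String, Long> totals = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        try {
            List<Future<?>> scans = new ArrayList<>();
            int chunk = (rows.size() + partitions - 1) / partitions; // ceiling division
            for (int p = 0; p < partitions; p++) {
                List<Row> slice = rows.subList(Math.min(rows.size(), p * chunk),
                                               Math.min(rows.size(), (p + 1) * chunk));
                scans.add(pool.submit(() -> {
                    for (Row r : slice) {                                // one "FOR EACH" per partition
                        totals.merge(r.region(), r.amount(), Long::sum); // atomic per key
                    }
                }));
            }
            for (Future<?> scan : scans) {
                scan.get(); // wait for every partition and surface any failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return totals;
    }
}
```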
Microsoft has messed up a lot of stuff, but they have come up with an idea in .NET 4.0 that I think we could really benefit from.
The following is directly lifted from a paper that MSDN has published that is available at the following blog: http://blogs.msdn.com/pfxteam/.
The Parallel class’s ForEach method is a multi-threaded implementation of a common loop construct in C#, the foreach loop. Recall that a foreach loop allows you to iterate over an enumerable data set represented using an IEnumerable. Parallel.ForEach is similar to a foreach loop in that it iterates over an enumerable data set, but unlike foreach, Parallel.ForEach uses multiple threads to evaluate different invocations of the loop body. As it turns out, these characteristics make Parallel.ForEach a broadly useful mechanism for data-parallel programming.
This kind of thing would need to run in separate threads.
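For comparison, Java's nearest everyday equivalent of `Parallel.ForEach` is a parallel stream. A minimal sketch (`ParallelForEach` is a hypothetical wrapper, not a library class): the library splits the collection and runs the loop body on `ForkJoinPool` worker threads as well as on the calling thread.

```java
import java.util.Collection;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;
import java.util.stream.LongStream;

// A rough Java counterpart to Parallel.ForEach built on parallel streams.
class ParallelForEach {
    // Invokes 'body' once per item, possibly concurrently, and reports which threads ran it.
    static <T> Set<String> forEachParallel(Collection<T> items, Consumer<T> body) {
        Set<String> threads = ConcurrentHashMap.newKeySet();
        items.parallelStream().forEach(item -> {
            threads.add(Thread.currentThread().getName());
            body.accept(item); // iterations may overlap, so 'body' must be thread-safe
        });
        return threads;
    }

    // Often better: express the loop as a parallel reduction, so no shared mutable state is needed.
    static long sumOfSquares(long n) {
        return LongStream.rangeClosed(1, n).parallel().map(x -> x * x).sum();
    }
}
```

As with `Parallel.ForEach`, the loop body runs concurrently, so it must not touch unsynchronized shared state; the `sumOfSquares` form sidesteps that by letting the library combine partial results.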
The funny thing is that the AVM actually does spawn more than one thread. How else could it do asynchronous AppServer? So exposing functionality of that kind is possible. I think it should be possible to find more use cases for specific things like this. Sockets, for example, need to have limited multi-threading. So do queries, streams, webservice requests, and so on.
Matching a .NET thread to an ABL thread, when it was .NET that spawned the thread, not ABL, could be one of those non-trivial things you were alluding to!
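A Java sketch of exposing that kind of asynchronous call to application code (the `AsyncRequest` class and `slowRemoteCall` are hypothetical stand-ins): `send()` returns at once, and the callback, playing the role of the event procedure in an asynchronous AppServer run, fires when the answer arrives, on the I/O thread that completed the request (or on the caller if the result was already available).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

// The caller fires off a request and keeps going; the callback runs when the answer arrives.
class AsyncRequest {
    // Hypothetical stand-in for a slow call: AppServer, web service, socket read, query...
    static String slowRemoteCall(String query) {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "rows for " + query;
    }

    static CompletableFuture<Void> send(String query, Executor io, Consumer<String> onResponse) {
        return CompletableFuture.supplyAsync(() -> slowRemoteCall(query), io).thenAccept(onResponse);
    }
}
```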
While I believe in planning with a long term vision, this is one area where I would be happy for them to do *anything* to get the ball rolling. I don't think we can solve the problem with truly separate sessions alone since they would then need to communicate and, without the communications glue, one would be stuck again with multi-node communication without multithreading, which I think doesn't work.
But the idea of gluing together what amount to independent sessions, each running on a thread within a single process, and provided with some communications glue, is something that I think would be a meaningful step in the right direction without being all that complicated. One does need to handle queueing of events and returns in that glue, since the individual sessions can't, but I would think that was pretty standard stuff and something that the engine crew knows inside out. And it doesn't take a major language change, since it is really just a local version of an async AppServer run. In fact, I think I could handle the idea that each session was sufficiently independent as to have its own DB connection.