I'm not sure that my tests are directly related to your comparison of PABLO vs M-S-E.
They are really basic samples trying to illustrate some performance overhead in very specific OO areas, versus a traditional procedural/relational approach. They also compare some results with C#, since our OO experience comes from C#.
Some of these tests, along with help from PSC consulting, were used two years ago to guide us in building an OO architecture. The result, however, was more of a PDS-centric model that replaced .p files with .cls files, without making much use of all those great OO concepts. M-S-E seems light years ahead of what we have. It is quite a shame that we didn't hear that M-S-E was in the works back then, as we will probably not pay the price of a whole re-architecture again.
Phil, when you encountered the 50X performance difference, did it occur to you to report this to tech support or the development team? If there is a genuine language performance problem here, rather than just an issue with the specific implementation, it seems like a report would have been warranted.
In particular, I am wondering if there is a problem that might be impacting other parts of the application. My understanding is that M-S-E delivers BEs and ESs which look and behave very much like regular BEs and collections to the client and that the other parts of the application are presumably good old objects of the ordinary sort. Might they not be vulnerable too? Certainly, they get properties assigned to and read from them.
Consulting in Model-Based Development, Transformation, and Object-Oriented Best Practice http://www.cintegrity.com
FWIW, Guillaume, moving to M-S-E *might* be less expensive than you think, since the M-S-E components are produced by MDA from UML models. So, to the extent you could reverse engineer your existing code into UML models, you might be able to get M-S-E output. Unfortunately, I think you are also on the wrong side of the pond for this to be simple. People on your side are doing the iMo thing, which shares that UML foundation but does not appear to be as highly evolved as M-S-E.
It does seem that if you are going to educate people about M-S-E, you are eventually going to have to get around to releasing some sample code.
I may be repeating myself. But I see a much higher demand for PABLO sample code.
So far, my testing doesn't seem to be finding the problem and seems to suggest that the problem doesn't exist unless it is a problem in the specific implementation.
You seem to have some "PABLO"-like code to run relevant tests. Can't you share that with us? You know I'm pretty skeptical about PABLO for a number of reasons (performance is only one of them). But I'm open to being convinced otherwise, by working code, not by additional megabytes of posts on this forum.
I may be repeating myself. But I see a much higher demand for PABLO sample code.
Why? The total released code base of M-S-E is 0 lines. Yes, we know that it is actually in use somewhere at PPS and their customers, but we can't see any of it. The total publicly released documentation is one UML diagram and scattered descriptions in various posts, many of which are by me.
You seem to have some "PABLO" like code to run relevant tests. Can't you share that with us?
There are a couple of significant points here.
First of all, a PABLO object is not some complex, mysterious thing that we need a ton of documentation to get a handle on. Define a class, add whatever properties it needs, include a few methods if it has behavior, and presto, you have a PABLO object. Now, there are a bunch of other things one needs to figure out to have a complete PABLO-based architecture: where the factory is, who is going to handle persistence, what the DL is going to look like, how the DL and BL will communicate, what one is going to do about before-image handling, etc. Each of these has 5 or 10 different possibilities that might appeal to certain people. A PABLO solution is never going to be a single, one-size-fits-all monolithic pattern.
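For anyone more at home in another OO language, here is a rough Python analogue of what "define a class, add properties, include a few methods" amounts to. It is not ABL and not anyone's production BE; the field names are only loosely modeled on the Sports OrderLine table and are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class OrderLine:
    """A plain business-logic object: its data is held directly as properties."""
    order_num: int
    line_num: int
    item_num: int
    qty: int
    price: float
    discount: int = 0  # percent off; illustrative field, not taken from the PPS code

    def extended_price(self) -> float:
        """Behavior lives on the object itself, next to the data it uses."""
        return self.qty * self.price * (100 - self.discount) / 100


line = OrderLine(order_num=1, line_num=1, item_num=42, qty=3, price=10.0, discount=10)
print(line.extended_price())
```

Everything else in the list above (factory, persistence, DL/BL communication, before-image handling) sits outside this class, which is why it can vary so much between PABLO designs.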
Second, the context of this thread is an exploration of possible performance problems with OO in general, not just with PABLO approaches. We have had a claim that PABLO is 50X slower than M-S-E. If that is true and can't be altered by different PABLO approaches, then there is no point in going to the effort of building any complete PABLO solution, because it will inherently fail. So, it is critically important to discover why it is true, if it is true. If it is true, then we need to ask whether a different approach might eliminate the problem area. If it is not true, then we can proceed to explore various options and keep testing.
Thus far, we have not identified any source of the problem that can be measured by a test outside the context of the test PPS ran. We can't verify any theories. That is critical.
I can publish the tests I have done and reported here, but they're trivial. They don't pretend to be tests of entire PABLO systems; they are diagnostics looking for the source of this supposed 50X performance problem. I am working on a more complete test which will be closer to a stripped-down version of the PPS test, but it isn't going to pretend to be a model for a complete production approach to PABLO, just a way to verify that one can accomplish all of the required operations with adequate performance. If that works, then all one has to do to get to a production system is to build on it to provide the desired features without adding anything that degrades performance unacceptably. That's not a 5-minute job, of course, but it is the way one tests the feasibility of the idea.
I would like to create a new test or tests which we can use to get a better idea what does and doesn't work. Let me say up front that any performance testing we do should be capable of very precise interpretation. Among other things, this means to me that if a complex test produces a surprising result, one needs to break down the complex test into simple tests to find the source of the surprise. If the source of the surprise is not found, then one needs to explain why the combination performs poorly and test that. In the end, one should have a fairly simple test which illustrates the problem and then one can decide whether that is fundamental or whether a design change can avoid the problem.
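One way to keep results interpretable in the way described above is to time each phase of a test separately rather than only the total, so that a surprising total can be traced to the step that causes it. A minimal sketch in Python, assuming each phase can be wrapped as a no-argument callable; the function name and the choice to keep the best of several runs are illustrative, not part of anyone's published harness.

```python
import time
from typing import Callable, Dict, List, Tuple


def time_phases(phases: List[Tuple[str, Callable[[], None]]],
                repeat: int = 3) -> Dict[str, float]:
    """Run each named phase `repeat` times and keep its best wall-clock time.

    Reporting per-phase times, not just the overall total, is what lets a
    surprising result be broken down into simpler tests.
    """
    results: Dict[str, float] = {}
    for name, fn in phases:
        best = float("inf")
        for _ in range(repeat):
            start = time.perf_counter()
            fn()
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results


timings = time_phases([
    ("create 10k objects", lambda: [object() for _ in range(10_000)]),
    ("sum 10k ints", lambda: sum(range(10_000))),
])
for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f} s")
```

Keeping the minimum of several runs filters out some scheduling noise; whether that or an average is the better figure depends on what one is trying to show.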
I have some objections to the PPS test design since it doesn't seem to me to be representative of common tasks, but I am willing to work with it for now just because it gives us a starting point and I think it is important to either verify or refute the finding. I would be very interested in hearing people's ideas for other tests which they think would be meaningful stress tests.
PPS were starting with a complete production model in M-S-E which they then adapted to make a PABLO version. For comparison testing purposes, I don't think this is ideal since production code is necessarily complex and this obscures our ability to see the specific sources of performance variation. Therefore, the tests I would like to do would be to create stripped down versions that do all of the same basic operations, but without all of the specific mechanics. There are, however, some aspects of the test which are not yet clear to me so first I need some clarification.
My understanding of the test is as follows:
1. FILL a PDS with all customers, all orders, and all lines(1)
2. Instantiate a collection for customers and instantiate all customer objects(2)
3. For each Customer in the collection, create an order collection and instantiate all orders.(3)
4. For each Order in the collection:
a. Add a new line. Implies assign of all values.
b. Delete an existing line.
c. Modify a value in an existing line.
This appears to imply instantiating two existing lines and one new one.(4)
5. After completion of #4, persist all changes to the database(5)
6. Clean up.
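As a cross-check of the steps listed above, here is a single-process Python sketch of the PABLO flow. It is not ABL and not the PPS code: a plain dict stands in for the FILLed PDS (step 1), the fields are cut down to a few, and modifying the first existing line, deleting the second, and skipping the delete for one-line orders are assumptions, since note (4) is still an open question.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Line:
    order_num: int
    line_num: int
    qty: int


@dataclass
class Order:
    order_num: int
    cust_num: int
    lines: List[Line] = field(default_factory=list)


@dataclass
class Customer:
    cust_num: int
    orders: List[Order] = field(default_factory=list)


def run_test(pds: Dict[str, List[dict]]) -> Dict[str, int]:
    """Sketch of steps 2-6; `pds` is a plain dict standing in for the FILLed PDS."""
    counts = {"added": 0, "deleted": 0, "modified": 0}
    # Index child rows once so each collection is built without rescanning tables.
    orders_by_cust = defaultdict(list)
    for row in pds["order"]:
        orders_by_cust[row["cust_num"]].append(row)
    lines_by_order = defaultdict(list)
    for row in pds["line"]:
        lines_by_order[row["order_num"]].append(row)

    # Step 2: customer collection containing all customer objects.
    customers = [Customer(row["cust_num"]) for row in pds["customer"]]
    for cust in customers:
        # Step 3: order collection for this customer; CustNum is copied object to object.
        cust.orders = [Order(r["order_num"], cust.cust_num)
                       for r in orders_by_cust[cust.cust_num]]
        for order in cust.orders:
            order.lines = [Line(order.order_num, r["line_num"], r["qty"])
                           for r in lines_by_order[order.order_num]]
            existing = len(order.lines)
            # Step 4a: add a new line (appended last), assigning all of its values.
            next_num = max((ln.line_num for ln in order.lines), default=0) + 1
            order.lines.append(Line(order.order_num, next_num, 1))
            counts["added"] += 1
            # Step 4b: delete the second existing line (assumed: skip if only one exists).
            if existing >= 2:
                del order.lines[1]
                counts["deleted"] += 1
            # Step 4c: modify a value in the first existing line.
            if existing >= 1:
                order.lines[0].qty += 1
                counts["modified"] += 1

    # Step 5: write object state back into the dataset (stand-in for persistence).
    pds["line"] = [{"order_num": ln.order_num, "line_num": ln.line_num, "qty": ln.qty}
                   for c in customers for o in c.orders for ln in o.lines]
    # Step 6: release the object graph.
    customers.clear()
    return counts
```

Instantiating per customer, as notes (2) and (3) suggest, would only move the loop boundaries; the number of object creations and property assignments stays the same.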
To keep my test simple, I am going to leave out any effort to create a strict DL/BL separation and won't use the M-S-E "decorators" for the DB access. Likewise, there will be no subclasses or their M-S-E equivalents as that is better tested separately.
For the PABLO version, I will create PABLO objects for each object indicated above.(6) I will probably do the persistence by putting the new values back into the original PDS. This is only one of many possible PABLO models, but it is directly comparable to what M-S-E is doing so it seems like the appropriate strategy to use for this test. We can compare persistence strategies in a separate test. Collections will be done in temp-tables.
For the M-S-E-like version, I will create both a BE and a delegate object in the fashion of the current M-S-E(7) and all data read and write will be through the delegate into the PDS. I'll do something to imitate an ES, but that might take a little experimentation.
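Since the M-S-E internals have not been published, the following is only a guess at the general shape of the two designs, written in Python with hypothetical class and method names. It shows the structural difference being compared: in the delegate style every property get/set is a call through to the dataset row, while in the PABLO style the value lives in the object and the row is touched only when persisting. Timing this in Python would say nothing about ABL costs; it only illustrates where the extra calls occur.

```python
from typing import Any, Dict


class LineRowDelegate:
    """Hypothetical delegate: every read and write goes through to a dataset row."""

    def __init__(self, row: Dict[str, Any]) -> None:
        self._row = row

    def get(self, name: str) -> Any:
        return self._row[name]

    def set(self, name: str, value: Any) -> None:
        self._row[name] = value


class DelegatedLine:
    """M-S-E-like BE (as assumed here): properties are pass-throughs to the delegate."""

    def __init__(self, delegate: LineRowDelegate) -> None:
        self._delegate = delegate

    @property
    def qty(self) -> int:
        return self._delegate.get("qty")

    @qty.setter
    def qty(self, value: int) -> None:
        self._delegate.set("qty", value)


class PabloLine:
    """PABLO BE: the value is held in the object; the row is written only on persist."""

    def __init__(self, row: Dict[str, Any]) -> None:
        self.qty = row["qty"]

    def write_back(self, row: Dict[str, Any]) -> None:
        row["qty"] = self.qty


row = {"qty": 5}
delegated = DelegatedLine(LineRowDelegate(row))
delegated.qty += 1      # one get and one set through the delegate; row now shows 6
pablo = PabloLine(row)
pablo.qty += 1          # plain attribute access; row still shows 6
pablo.write_back(row)   # row now shows 7
```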
So, what do I have wrong and/or what am I missing?
Notes and Questions
(1) Not all lines are actually used, but accessing two lines per order is nearly 8000 lines and there are only 13000 lines total, so I'm inclined to read them all. This is an element which could be tested separately for different strategies.
(2) If the transaction scope is the Customer, one could instantiate, process, and delete each customer individually, but the same number of operations is going to be required either way. It appears that nothing in the Customer is modified when the test runs against Sports, which may not be typical of production systems.
(3) Orders also could be created one at a time, but again, the same number of operations is required. There is also nothing modified in the order, again not likely to be true in production systems.
(4) Do all existing orders have two lines? If not, what happens?
(5) Could also be done at the end for all customers instead.
(6) Even though the Customer and Order objects are not modified, they need to exist if one is attempting to model OO interaction since the assignment of Customer.CustNum to Order.CustNum and Order.Ordernum to Orderline.Ordernum is an assignment between objects.
(7) I don't know if PPS' test was done on the latest release or on the earlier accessor method approach.
I think you are missing an important piece. The collections need to be traversed and the properties need to be read to give the full picture of the architecture's performance. This is, after all, how the classes are used in both business logic and presentation.
Isn't looping through the customers, orders, and lines the OO equivalent of traversing collections?
Doesn't creating a new BE imply setting all of its properties?
Doesn't modifying a BE imply setting that property?
And, of course, the PABLO version is going to have to set all the BE properties as a part of creating the BE and do something equivalent to reading them all in order to persist the results.
Seems like this is covered ... what am I missing?
Mind you, I agree that a more real-world test would involve a lot more execution of logic. It might be interesting to find out whether simply accessing the properties which are "right there" in a PABLO BE, versus having to fetch and set them through the delegate, has performance implications. But I think that is something which could be easily and appropriately tested in its own test.
I actually thought this was more than one test, or at least more than one measurement point. Yes, you are already traversing collections and setting the properties, but if the intention is to get an idea of the overall performance of an architecture, I would also include read performance and getting of properties in a single test (and not via the implied PABLO read, which is implementation-specific), probably before testing delete.
I have done and previously reported a simple test of just creating a bunch of objects, assigning properties, and reading the properties. It was very fast. As things move along, I should categorize and create a reporting page for these things on OE Hive or something and include the code.
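A test of that kind (create a batch of objects, assign their properties, read them back) might be structured like the Python sketch below. The class, property names, and object count are hypothetical, and the Python timing is not a stand-in for ABL numbers; the sketch only shows the shape of the measurement.

```python
import timeit


class Item:
    """Plain object with a few properties (hypothetical stand-in for an ABL class)."""
    __slots__ = ("num", "name", "price")

    def __init__(self) -> None:
        self.num = 0
        self.name = ""
        self.price = 0.0


def create_assign_read(n: int) -> float:
    """Create n objects, assign every property, then read every property back."""
    items = []
    for i in range(n):
        it = Item()
        it.num, it.name, it.price = i, f"item{i}", i * 1.5
        items.append(it)
    # Reading all three properties keeps the read phase from being optimized away.
    return sum(it.num + it.price + len(it.name) for it in items)


seconds = min(timeit.repeat(lambda: create_assign_read(10_000), number=1, repeat=3))
print(f"10,000 objects created, assigned, and read: {seconds:.4f} s")
```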
In the current context, though, the reason I am proposing this specific test is that Phil has reported a 50X advantage of M-S-E over PABLO in a test run by PPS. I haven't been able to identify any single component of that test where I can find a performance issue which would explain that difference. The only remaining areas that I can think of which are not tested are the specifics of the before-image handling and some architectural problem in the implementation, neither of which I can test or diagnose without their code. Therefore, I am proposing the current test as a simplified version of their tests with all of the same basic elements, except a different handling of the before image issue. If this test shows a major performance issue, it should be straightforward to instrument it and discover the specific source(s) of the problem and to devise a really simple test to illustrate the problem. Then, we can consider whether it is avoidable or intrinsic. If the test does not show a major performance issue, then we will have to conclude that the result of the PPS test was due to an architectural issue with their PABLO implementation (BI or otherwise) and that it is not an indicator of a necessary problem with all PABLO implementations. One can then move forward to figure out a production PABLO architecture, testing alternatives as one goes on.
I'm working on my own benchmark code, both versions simplified to essential operations so that we can more easily see and measure the source of any performance issues. This should either make the source of the problem clear or illustrate that there is no problem in the simplest case. If the latter, then it will be a question of adding things in gradually to approach a more full production implementation to see where the problem arises.
I do have a couple questions about the test itself in order to try to do the same thing.
Modify a line and delete a line seems to require two existing lines. A bit under 700 of the orders have only one line. So, what happens to those?
For each customer one needs a collection of orders. For each order one needs a collection of lines. Does M-S-E clear and re-use entity sets or delete and create for each? I am going to assume the latter since it seems cleaner. Whichever it is, both could do the same.
Is there any basis for selecting what lines to modify or delete? I'm going to assume modify the first and delete the second.
Based on that assumption, for one line orders I will modify the one that is there and skip the delete.
I'm going to do the initial PABLO implementation using a PDS factory in the DL in order to start with as comparable a foundation to M-S-E as possible. This means that I will use the PDS TRACKING-CHANGES to handle the updates, just like M-S-E presumably does.
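For readers less familiar with TRACKING-CHANGES, the idea behind it can be sketched outside ABL: the first time a row is touched, its before-image is kept, so persistence can work from the set of changed rows (created, modified, deleted) along with their original values. The Python below is a hypothetical illustration of that concept only, assuming keyed rows and one before-image per touched row; in ABL the real mechanism is the temp-table TRACKING-CHANGES attribute together with the ProDataSet before-table.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]


class TrackedTable:
    """Hypothetical analogue of a temp-table with change tracking switched on."""

    def __init__(self, rows: Dict[Any, Row]) -> None:
        self.rows = {key: dict(row) for key, row in rows.items()}
        # key -> before-image captured on first touch (None means the row was created).
        self.before: Dict[Any, Optional[Row]] = {}

    def _remember(self, key: Any) -> None:
        if key not in self.before:
            original = self.rows.get(key)
            self.before[key] = dict(original) if original is not None else None

    def create(self, key: Any, row: Row) -> None:
        self._remember(key)
        self.rows[key] = dict(row)

    def modify(self, key: Any, **changes: Any) -> None:
        self._remember(key)
        self.rows[key].update(changes)

    def delete(self, key: Any) -> None:
        self._remember(key)
        del self.rows[key]

    def get_changes(self) -> List[Tuple[str, Any, Optional[Row], Optional[Row]]]:
        """Return (state, key, before_image, after_image) for every touched row."""
        changes = []
        for key, before in self.before.items():
            after = self.rows.get(key)
            if before is None and after is None:
                continue  # created and then deleted: nothing to persist
            state = "created" if before is None else "deleted" if after is None else "modified"
            changes.append((state, key, before, after))
        return changes


lines = TrackedTable({(10, 1): {"qty": 5}, (10, 2): {"qty": 7}})
lines.create((10, 3), {"qty": 1})
lines.modify((10, 1), qty=6)
lines.delete((10, 2))
for change in lines.get_changes():
    print(change)
```

Keeping the before-image from the first touch, rather than the latest one, is what allows the save step to check whether someone else changed the database row in the meantime.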
If that performs acceptably, I will then explore some other techniques for the DL and measure them against the same baseline.
While I won't be trying to imitate the BI within the BE that you used in your test, I would be interested in hearing something about how that was handled. Might we see a copy of one of your PABLO BEs? Since these are not a part of CloudPoint, I wouldn't think the same issues apply.
Also, can you say something about the process you used for persisting the data given that you have a BE with both BI and current data?