Hi Bryce,
I realized I didn't quite fully address a couple of issues:
The bytecode performance is the most interesting to me. Exupery does not yet do dynamic method inlining which explains Strongtalk's strong send performance. Message inlining is not necessary for a 1.0. That the bytecode numbers are so close, and I know Exupery's weaknesses, is interesting. Exupery uses a colouring coalescing register allocator but also lives with Squeak's object memory and could do with a bit more tuning. I'm guessing Strongtalk's object memory is much cleaner and better designed for speed based on reading the Self papers. Did the Strongtalk team stop tuning for bytecode performance after they passed VisualWorks?
We still have to figure out exactly what is meant by 'bytecode', and for what benchmark, but I'll try to guess a definition for what you are basically talking about: the performance of the generated code for primitive operations, independent of the effect of sends and any inlining.
Although I haven't yet seen a benchmark I would trust, in that respect Exupery probably has a better code generator. The Strongtalk one is virtually untuned and does just a few basic optimizations. You have to realize that Strongtalk had only just been gotten running: we got it fairly stable, tuned it for a few benchmarks, and it was frozen at that point. Robert Griesemer, who wrote the code generator, was already working on a better one to replace it. That work was frozen mostly done, but it still needs to be finished and put in place (the new compiler was running, and I believe it can actually be turned on, but it was only just starting to work on anything bigger than snippets). So the interesting thing is that Strongtalk is getting its performance in spite of a very simple compiler. Even the new compiler wouldn't be doing anything as fancy as you are.
If you want to take full advantage of a better code generator like yours, it really helps to have inlining. Sends are so much more frequent in Smalltalk than in C++ that there isn't much to do between sends, on average. So you really should want something like type-feedback; it would magnify the benefits of your nice optimizations.
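To make the type-feedback idea a bit more concrete, here is a rough sketch in Python rather than Smalltalk. It is not how Strongtalk or Exupery implement it; `SendSite`, `dominant_class`, the 0.9 threshold and the `Point` class are all made up for illustration. The idea is just that each send site records which receiver classes it actually sees, and a compiler could later use that record to decide which method is worth inlining there.

```python
from collections import Counter


class SendSite:
    """Hypothetical send site that records receiver classes (type feedback)."""

    def __init__(self, selector):
        self.selector = selector
        self.seen = Counter()  # receiver class -> number of sends observed

    def send(self, receiver, *args):
        # Ordinary dynamic send: record the receiver class, then look up
        # the method on the receiver and call it.
        self.seen[type(receiver)] += 1
        return getattr(receiver, self.selector)(*args)

    def dominant_class(self, threshold=0.9):
        """Return the class worth inlining for, or None if the site has
        seen no sends or is too polymorphic (threshold is an assumption)."""
        total = sum(self.seen.values())
        if total == 0:
            return None
        cls, count = self.seen.most_common(1)[0]
        return cls if count / total >= threshold else None


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def get_x(self):
        return self.x


site = SendSite("get_x")
for i in range(100):
    site.send(Point(i, 0))

# Every send at this site saw a Point, so Point>>get_x is the inlining candidate.
inline_target = site.dominant_class()
```

A site that sees an even mix of receiver classes would return None here, and a compiler following this sketch would leave it as a normal send.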
- I want to qualify something I said: I said "An inlined send takes 0 time". That is often true, but not always. The call itself obviously takes 0 time, but the class check can't always be removed. Often it can be, though, and then both the class check and the call are eliminated (the class check only has to be done once per receiver(s) per inlined nmethod).
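Roughly what I mean by one class check per receiver, again as a Python sketch and not actual Strongtalk output (the class and function names are invented for the example, and Python obviously can't show the real cost, only the shape of the code):

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def get_x(self):
        return self.x

    def get_y(self):
        return self.y


class ScaledPoint(Point):
    """A second receiver class, used only to exercise the fallback path."""

    def get_x(self):
        return 2 * self.x


def sum_coords_generic(p):
    # Two real sends, each with its own method lookup.
    return p.get_x() + p.get_y()


def sum_coords_inlined(p):
    # A single class check guards both inlined sends on the same receiver.
    if type(p) is Point:
        # Bodies of Point.get_x and Point.get_y inlined: no calls at all.
        return p.x + p.y
    # Uncommon case: the receiver is some other class, so do real sends.
    return sum_coords_generic(p)
```

Here `sum_coords_inlined(Point(3, 4))` takes the guarded path, while a `ScaledPoint` fails the check and falls back to the ordinary sends, so both still get the right answer. If the compiler can prove the class of `p` some other way, the guard itself goes away too, which is the "0 time" case.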
-Dave
-----Original Message-----
From: vm-dev-bounces@lists.squeakfoundation.org [mailto:vm-dev-bounces@lists.squeakfoundation.org] On Behalf Of Bryce Kampjes
Sent: Wednesday, September 20, 2006 2:42 PM
To: vm-dev@lists.squeakfoundation.org; exupery@lists.squeakfoundation.org