Hi Bryce,
-----Original Message----- From: vm-dev-bounces@lists.squeakfoundation.org
Hi David, The bytecode benchmark is a prime number sieve; it uses #at: and #at:put:. The send benchmark is a simple recursive Fibonacci function. Both just measure how quickly the code executes; neither counts the actual bytecodes or sends performed. They are the old tinyBenchmarks. I'd guess everyone ran the same code for these benchmarks.
That's fine; it's just that we need to actually run these benchmarks properly, with different architectures, clock speeds, etc. I don't think we know the relative performance yet.
I 100% agree that inlining is the right way to optimise common sends and block execution. [...]
Ok, I was just trying to say that in Smalltalk, a mediocre compiler with optimistic inlining is better than a great compiler without inlining. As long as you are headed in the direction of optimistic inlining, we are in agreement.
I just want to re-emphasize the importance of "optimistic", which implies the ability to deoptimize, not just the ability to inline. Inlining the common case non-optimistically (i.e. with an 'else' clause containing the uncommon case) is not nearly as good: after those two cases merge, you can't assume anything, whereas with optimism the rest of the code can assume the common case was taken, providing much more information for optimization. For example, if the common case returns a SmallInteger, that is known in subsequent code; without deoptimization, the subsequent code can't assume anything about the return value, regardless of inlining. Sorry if you already understood this, I couldn't tell from your post.
The reason I am pointing this out is that the machinery for deoptimization is the hard part. That is really the big advantage of the Strongtalk VM- that it provides all that infrastructure. I just want to make sure you are taking that into consideration.
I'd also not be surprised if Strongtalk is faster than Exupery for bytecode performance. I'm guessing that Strongtalk's integer arithmetic and #at: performance are better. Squeak uses 1 for its integer tag, so in general it takes 3 instructions to detag then retag, with 2 clocks of latency (this can often be optimised to 1 instruction and 1 clock of latency). I'm guessing Strongtalk uses 0 for its integer tag.
Yes.
Squeak uses a remembered set for its write barrier, which requires checking whether the object is in the remembered set, and checking whether the object is in new-space, before adding it. Strongtalk might be using a card-marking table, requiring just a single store.
Yes, Strongtalk uses card marking; I think it is two instructions. It is Urs Hölzle's write barrier, so it is probably the same as in Self.
Squeak stores the size of an object in one of two places, so to get the size for a range check you first need to figure out where it is stored. I'm guessing that the size of an array is stored at a fixed location in Strongtalk.
Yes.
My assumptions about Strongtalk's object memory are based on reading the papers from the Self project.
None of these things really matters to Squeak while it's running as an interpreter, because most of the time is spent recovering from branch mispredicts or waiting for memory, which leaves plenty of time to hide the inefficiencies above.
One way to get around a slow compiler would be to save the code cache beside the image. All relocation is done in Smalltalk, so doing this shouldn't be too hard. But figuring out how to get around a slow compiler can wait until after the compiler has become useful. There are several answers, including writing a faster register allocator (2) or being the third compiler.
Yes, I have always wanted to be able to save the code. We only have the inlining DB right now, which doesn't avoid the compilation overhead on each run.
-Dave