Hi Bryce,
-----Original Message----- From: vm-dev-bounces@lists.squeakfoundation.org
Hi David, The bytecode benchmark is a prime number sieve; it uses #at: and #at:put:. The send benchmark is a simple recursive Fibonacci function. Both just measure how quickly the code executes; neither counts the actual bytecodes or sends performed. They are the old tinyBenchmarks. I'd guess everyone ran the same code for these benchmarks.
That's fine; it's just that we need to actually run these benchmarks properly, with different architectures, clock speeds, etc. I don't think we know the relative performance yet.
I 100% agree that inlining is the right way to optimise common sends and block execution. [...]
Ok, I was just trying to say that in Smalltalk, a mediocre compiler with optimistic inlining is better than a great compiler without inlining. As long as you are headed in the direction of optimistic inlining, we are in agreement.
I just want to re-emphasize the importance of "optimistic", which implies the ability to deoptimize, not just the ability to inline. Inlining the common case non-optimistically (i.e. with an 'else' clause containing the uncommon case) is not nearly as good: after those two cases merge, you can't assume anything, whereas with optimism the rest of the code can assume the common case was taken, providing much more information for optimization. For example, if the common case returns a SmallInteger, that is known in subsequent code; without deoptimization, the subsequent code can't assume anything about the return value, regardless of inlining. Sorry if you already understood this, I couldn't tell from your post.
The reason I am pointing this out is that the machinery for deoptimization is the hard part. That is really the big advantage of the Strongtalk VM- that it provides all that infrastructure. I just want to make sure you are taking that into consideration.
I'd also not be surprised if Strongtalk is faster than Exupery for bytecode performance. I'm guessing that Strongtalk's integer arithmetic and #at: performance are better. Squeak uses 1 for its integer tag, so in general it takes 3 instructions to detag then retag, with 2 clocks of latency (this can often be optimised to 1 instruction and 1 clock of latency). I'm guessing Strongtalk uses 0 for its integer tag.
Yes.
Squeak uses a remembered set for its write barrier, which requires checking whether the object is in the remembered set, and checking whether the object is in new-space, before adding it. Strongtalk might be using a card-marking table, requiring just a single store.
Yes, Strongtalk uses card marking; I think it is two instructions. It is Urs Hölzle's write barrier, so it is probably the same as in Self.
Squeak stores the size of an object in one of two places, so to get the size for a range check you first need to figure out where it is stored. I'm guessing that the size of an array is stored at a fixed location in Strongtalk.
Yes.
My assumptions about Strongtalk's object memory are based on reading the papers from the Self project.
None of these things really matters to Squeak while it's running as an interpreter, because most of the time is spent recovering from branch mispredicts or waiting for memory, which leaves plenty of time to hide the inefficiencies above.
One way to get around a slow compiler would be to save the code cache beside the image. All relocation is done in Smalltalk, so doing this shouldn't be too hard. But figuring out how to get around a slow compiler can wait until after the compiler has become useful. There are several answers, including writing a faster register allocator (2) or being the third compiler.
Yes, I have always wanted to be able to save the code. We only have the inlining DB right now, which doesn't avoid the compilation overhead on each run.
-Dave