Ian Piumarta wrote:
I just *know* I'm about to say something stupid. (I haven't been following this thread, so I'm completely lost for context.) But thanks for `CC'ing me anyway: I was wondering what to do with a spare 15 minutes while the washer finishes washing...
I think the machine likes me ;-)
But in spite of your criticism of evolving the classical ST80 design towards block closure semantics, I'm still resisting giving up my design for the moment. For that case you have given a neat cookbook for implementing block closure semantics, which I could follow then... ;-)
AFAICMO, Allen's "cookbook" corresponds pretty closely to the way closures are (and have been) done in the vast majority of functional languages (including various Smalltalk implementations). Apart from anything else, you're arguing against an awful lot of experience gained by an awful lot of people over an awfully long time spent implementing (and fine-tuning) an awfully large number of dynamic languages. (By "dynamic language" I guess I mean any language that implements closures as fully-reentrant first-class fully-upward anonymous functions with static [lexical] scoping and free variables.)
I don't want to argue against
an awful lot of experience gained by an awful lot of people over an awfully long time spent implementing (and fine-tuning) an awfully large number of dynamic languages.
On the other hand, I do believe that many great breakthroughs have been made simply because nobody bothered to tell the people involved that what they were trying to do was "impossible".
So my position, generally speaking, would be: don't listen to any of us bumble-headed know-nothings -- just go right ahead and try it. If it works, great! If it doesn't (or if it's abysmally slow) throw it away and move on to something entirely different (and hopefully more promising, in the light of experience gained). A negative result can yield information that is every bit as useful as a positive result.
I've already learned a lot by:
- making the design;
- starting to implement it while improving it;
- posting the design, thereby:
  - provoking this thread, and
  - getting other reactions.
(And the time and effort spent trying to convince us that it's The Right Way is probably comparable to the time and effort required to implement it, measure it, and know quantitatively whether or not it's right.)
My intent wasn't to convince you of my design, but to get insights into where the possible problems with my design are. And now I've got much more, namely a rough draft of a more consistent design by a very experienced man (thanks to Allen).
Agreed. But the special logic isn't very complicated here and we save some other logic needed by another design.
Most of the logic needed to efficiently implement closures in the "traditional manner" (split allocation between captured and non-captured variables, etc...) happens in the compiler, not in the runtime. One needs to look very closely at what gets figured out where, and by whom. Complex logic in the compiler is preferable, if it can eliminate even a little "trivial" logic in the runtime.
You can probably get by with static links that chain together activation records (MethodContexts) but this has the reification problem that ultimately is a performance killer.
This is a key point: Is this really a performance killer?
In VisualWorks: I know that big savings were made by redesigning the way closures (and free variables in particular) work, along lines similar to those that Allen is suggesting.
J4?
In J3: not at the moment, but only because other parts of the system are sooooo hopelessly inefficient. (Some of the inefficiency is caused by the runtime logic required to cope with blocks that aren't real closures, and to deal with block bodies that are inline in their defining methods -- when truth and beauty are both crying out for them to be compiled as separate "anonymous methods". But that's another story, for another bedtime.)
Named CompiledBlocks in VW, I think.
I don't know Jitter technology, so I'm very interested in a rationale for this.
Converting between different representations of the same thing (machine-friendly frames on the stack, closure-friendly contexts in the heap) is just wasted cycles. Not to mention additional memory traffic, which is to be avoided as often as possible (in any context).
Without putting contexts onto the stack at all, we don't have these reifying conversions, only heap accesses (hopefully well cached; see below) instead.
Just having the possibility for reification is costly too, since potentially each and every return has to check whether the exiting frame needs to be reified in the heap. (Activation might even be more complex too, if it has to plan ahead for possible reification.)
Plus every time you reify something, you create an object. Even though it's probably quite a small contribution to the big picture, the GC has more work to do.
Same as previous point.
Apart from that, the closer you can get to the execution model that the architects of your processor had in mind while they were designing it, the faster your programs will run. Since most processors are designed to run C (or maybe Fortran), activation records are meant to be managed (in the vast majority of architectures) by specific hardware features implementing strict LIFO behaviour. Anything (such as non-LIFO behaviour) that causes traffic between the stack (where the processor's designers decreed that all activations should live) and the heap (where reified contexts have to live) is pure overhead (and in some cases completely invalidates lots of information that the processor has carefully cached on your behalf, yet will throw away mercilessly the instant it suspects that you've violated any of its implicit LIFO laws).
That seems to be the critical performance issue (also see below).
Let's take a simple example, which I guess occurs most often in this or a similar form when running an ST application:
[:x :y | x >= y]
Let's assume a collection consisting of SmallIntegers (immediate objects): in this case, is it faster for a Jitter to create and use an activation record on the stack than to reuse an activation record in the heap for every evaluation?
No, there's probably only a few instructions of difference between them. (Assuming the way that contexts are handled inside the VM is changed a little -- see "Managing Stack Frames in Smalltalk-80" [Moss et al] for one possibility [but watch out for the typo in the example code, if you decide to benchmark their algorithm].) (But the required number of memory references is another matter entirely!!)
Memory or just *heap* memory references?
<snipped>
Principal question: How intelligent are the caching mechanisms of typical processors (for me it's an Intel)
For Intel (and clones), depends on the manufacturer. (For the most part, fairly dumb.)
regarding non-localized memory addresses? Is this a problem or not?
For most caches, dynamic locality is what matters, not static locality. Your program can generate access patterns that span gigabytes without any adverse effects, provided that only a reasonably small number of addresses are in the "working set" at any time (where "reasonably small" is relative to some function of the number of distinct cache lines that you have).
As a consequence, using only activation records residing in *heap*-allocated contexts shouldn't be a performance problem, provided there aren't too many in use at once ('reasonably small number of addresses').
Is this interpretation correct?
<snipped>
Greetings,
Stephan
squeak-dev@lists.squeakfoundation.org