Ian Piumarta wrote:
I just *know* I'm about to say something stupid. (I haven't been following this thread, so I'm completely lost for context.) But thanks for `CC'ing me anyway: I was wondering what to do with a spare 15 minutes while the washer finishes washing...
I think the machine likes me ;-)
But in spite of your criticism of evolving the classical ST80 design towards block closure semantics, I'm still resisting giving up my design for the moment. For that case you have given a neat cookbook for implementing block closure semantics, which I could follow then... ;-)
AFAICMO, Allen's "cookbook" corresponds pretty closely to the way closures are (and have been) done in the vast majority of functional languages (including various Smalltalk implementations). Apart from anything else, you're arguing against an awful lot of experience gained by an awful lot of people over an awfully long time spent implementing (and fine-tuning) an awfully large number of dynamic languages. (By "dynamic language" I guess I mean any language that implements closures as fully-reentrant first-class fully-upward anonymous functions with static [lexical] scoping and free variables.)
I don't want to argue against
an awful lot of experience gained by an awful lot of people over an awfully long time spent implementing (and fine-tuning) an awfully large number of dynamic languages.
On the other hand, I do believe that many great breakthroughs have been made simply because nobody bothered to tell the people involved that what they were trying to do was "impossible".
So my position, generally speaking, would be: don't listen to any of us bumble-headed know-nothings -- just go right ahead and try it. If it works, great! If it doesn't (or if it's abysmally slow) throw it away and move on to something entirely different (and hopefully more promising, in the light of experience gained). A negative result can yield information that is every bit as useful as a positive result.
I've already learned a lot by:
- making the design;
- starting to implement it while improving it;
- posting the design, thereby:
  - provoking this thread, and
  - getting other reactions.
(And the time and effort spent trying to convince us that it's The Right Way is probably comparable to the time and effort required to implement it, measure it, and know quantitatively whether or not it's right.)
My intent wasn't to convince you of my design, but to get insights into where the possible problems with my design are. And now I've got much more, namely a rough draft of a more consistent design by a very experienced man (thanks to Allen).
Agreed. But the special logic isn't very complicated here and we save some other logic needed by another design.
Most of the logic needed to efficiently implement closures in the "traditional manner" (split allocation between captured and non-captured variables, etc...) happens in the compiler, not in the runtime. One needs to look very closely at what gets figured out where, and by whom. Complex logic in the compiler is preferable, if it can eliminate even a little "trivial" logic in the runtime.
You can probably get by with static links that chain together activation records (MethodContexts) but this has the reification problem that ultimately is a performance killer.
This is a key point: Is this really a performance killer?
In VisualWorks: I know that big savings were made by redesigning the way closures (and free variables in particular) work, along lines similar to those that Allen is suggesting.
J4?
In J3: not at the moment, but only because other parts of the system are sooooo hopelessly inefficient. (Some of the inefficiency is caused by the runtime logic required to cope with blocks that aren't real closures, and to deal with block bodies that are inline in their defining methods -- when truth and beauty are both crying out for them to be compiled as separate "anonymous methods". But that's another story, for another bedtime.)
Named CompiledBlocks in VW, I think.
I don't know Jitter technology, so I'm very interested in a rationale for this.
Converting between different representations of the same thing (machine-friendly frames on the stack, closure-friendly contexts in the heap) is just wasted cycles. Not to mention additional memory traffic, which is to be avoided as often as possible (in any context).
Without putting contexts onto the stack at all, we don't have these reifying conversions, only heap accesses (hopefully well cached; see below) instead.
Just having the possibility for reification is costly too, since potentially each and every return has to check whether the exiting frame needs to be reified in the heap. (Activation might even be more complex too, if it has to plan ahead for possible reification.)
Plus every time you reify something, you create an object. Even though it's probably quite a small contribution to the big picture, the GC has more work to do.
Same as previous point.
Apart from that, the closer you can get to the execution model that the architects of your processor had in mind while they were designing it, the faster your programs will run. Since most processors are designed to run C (or maybe Fortran), activation records are meant to be managed (in the vast majority of architectures) by specific hardware features implementing strict LIFO behaviour. Anything (such as non-LIFO behaviour) that causes traffic between the stack (where the processor's designers decreed that all activations should live) and the heap (where reified contexts have to live) is pure overhead (and in some cases completely invalidates lots of information that the processor has carefully cached on your behalf, yet will throw away mercilessly the instant it suspects that you've violated any of its implicit LIFO laws).
That seems to be the critical performance issue (also see below).
Let's take a simple example, which I guess occurs most often in this or a similar form when running an ST application:
[:x :y | x >= y]
Let's assume a collection consisting of SmallIntegers (immediate objects): in this case, is it faster for a Jitter to create and use an activation record on the stack than to reuse an activation record in the heap for every evaluation?
No, there's probably only a few instructions of difference between them. (Assuming the way that contexts are handled inside the VM is changed a little -- see "Managing Stack Frames in Smalltalk-80" [Moss et al] for one possibility [but watch out for the typo in the example code, if you decide to benchmark their algorithm].) (But the required number of memory references is another matter entirely!!)
Memory or just *heap* memory references?
<snipped>
Principal question: How intelligent are the caching mechanisms of typical processors (for me it's an Intel)
For Intel (and clones), depends on the manufacturer. (For the most part, fairly dumb.)
regarding non-localized memory addresses? Is this a problem or not?
For most caches, dynamic locality is what matters, not static locality. Your program can generate access patterns that span gigabytes without any adverse effects, provided that only a reasonably small number of addresses are in the "working set" at any time (where "reasonably small" is relative to some function of the number of distinct cache lines that you have).
As a consequence, using only activation records residing in *heap*-allocated contexts shouldn't be a performance problem, provided there aren't too many in use at once ('reasonably small number of addresses').
Is this interpretation correct?
<snipped>
Greetings,
Stephan
squeak-dev@lists.squeakfoundation.org