Hi Jecel,
On Wed, May 13, 2009 at 3:46 PM, Jecel Assumpcao Jr jecel@merlintec.com wrote:
I have read the description of the stack vm again:
http://www.mirandabanda.org/cogblog/2009/01/14/under-cover-contexts-and-the-...
It seems to me that it would be possible to keep the arguments and the remaining temporaries together. This would require keeping the numTemps in the flags word instead of the numArgs. It would also mean moving the code to nil out the temps towards the beginning of #internalActivateNewMethod and #activateNewClosureMethod:numArgs: (before pushing the IP). The idea is that then #temporary:in: would become simpler, which is important since it is a very popular operation.
But the split arguments/temporaries organization is determined by frame build order, and the anticipation of a JIT. The caller pushes receiver and arguments, and in a JIT then pushes the return pc (as part of the call instruction that makes up the send). Then a frame is built at the end of which temporaries are initialized. In the JIT there will always be at least the return pc/caller's saved instruction pointer between the arguments and the temporaries, and since the offsets to access each can be determined at compile time the cost of separating them is low.
So while one could, I don't see that it's worthwhile. Even if one did keep the arguments and temporaries together, one would still have the stack contents separate from the arguments and temporaries, and the temporary access bytecodes can still access those, so arguably one would still have to check the index against the temp count.
In practice the cost is not that high in the stack vm, or at least the stack vm is still substantially faster than the context vm with arguments and temporaries split. And in the JIT it really isn't a performance issue at all.
This change would make it a little harder to fix the bug in #marryFrame:
but I don't see any other changes that would be needed. Is there some important design issue with keeping the arguments and temps separate that I am missing? I can imagine a compiler that avoids initially filling out the temps with nils by creating them lazily on first assignment and that wouldn't work with my change. But I don't know if this is a planned feature (it complicates reflection a bit since you have to fake the uninitialized temps in order not to confuse the debugger).
I found the use of a one byte flag to indicate that the context pointer is valid interesting since it seems to me that the same information is available by looking at the pointer itself (nil or not). Is this just a performance issue or are there situations where the flag can be zero but the pointer is not nil?
It was my original intent to avoid having to write the context pointer field. It used to be important to avoid unnecessary writes, but on current processors with good cache interfaces I don't think avoiding the write makes any difference, and I write it anyway in the current VM (which has moved on a little from the blog posts). But it is quicker to test the flag than to test against nil, simply because the reference to nil is an arbitrary 32-bit value, not simply 0 or 1.
In the JIT the flag is a bit in the method reference's LSBs and is set for free on frame build.
Some small details that seem like errors to me but could be a lack of understanding on my part:
- the drawing above the definition of "activateNewClosureMethod: blockClosure numArgs: numArgs" shows the flag word right after fp, but the code indicates that it should be a method pointer instead.
You're right. I missed out the methods by accident in the left-hand stack page.
- callersFPOrNull is used in #commonCallerReturn but not assigned to.
Ah - I see that the first definition near the top of the text is incomplete. The real definition in the discussion about return does have the assignment.
Cheers, -- Jecel
Thanks!
Eliot,
[JIT code uses call which pushes PC first]
Ok, so this can't be helped.
So while one could, I don't see that it's worthwhile. Even if one did keep the arguments and temporaries together, one would still have the stack contents separate from the arguments and temporaries, and the temporary access bytecodes can still access those, so arguably one would still have to check the index against the temp count.
Really? I wouldn't expect the compiler to ever generate such bytecodes and so wasn't too worried if the VM did the wrong thing in this situation.
In the JIT the flag is a bit in the method reference's LSBs and is set for free on frame build.
That sounds like a neat trick. Are the stack frame formats for the interpreted stack vm and the jit a little different?
Thanks for the explanations. I haven't figured out how to do this in hardware in a reasonable way and so might have to go with a different design.
-- Jecel
vm-dev@lists.squeakfoundation.org