On Fri, May 25, 2012 at 12:57 AM, Fabrizio Perin fabrizio.perin@gmail.comwrote:
Hi, I'm using last Moose build on top of CogVM 2550 and each time I click on a method an error pops up (see the attachment). The same problem I have it with older builds of Moose but I don't have any problem if I open the same image with Cog VM 2540 or CogVM from the Pharo web site.
I changed the definition of at: for CompiledMethod to prevent it accessing bytes outside of the bytecodes part of a compiled method. What's happening here is that a hash is being derived form the compiled method by looking at the pointers part of the compiled method (i.e. bytes < aMethod initialPC). But it is wrong to try and derive a hash from these bytes. Since the GC can move objects these bytes can change as the GC moves the literals a method references (unlike the bytes between: aMethod initialPC and: aMethod size).
Two conclusions:
The solution is to define CompiledMethod>hash such that it does not access bytes < initialPC. The hash it is inheriting from ByteArray is fine for ByteArray but not at all OK for CompiledMethod.
The VM is revealing a latent bug, i.e. that on older VMs the system could be computing a variable hash, and hence result in not-found errors in certain circumstances, hence the VM is right to impose this stricter bounds checking.
I'm on Mac os 10.7.4 and the image is:
Pharo1.4 Latest update: #14438
Cheers, Fabrizio
On 25 May 2012 20:39, Eliot Miranda eliot.miranda@gmail.com wrote:
On Fri, May 25, 2012 at 12:57 AM, Fabrizio Perin fabrizio.perin@gmail.com wrote:
Hi, I'm using last Moose build on top of CogVM 2550 and each time I click on a method an error pops up (see the attachment). The same problem I have it with older builds of Moose but I don't have any problem if I open the same image with Cog VM 2540 or CogVM from the Pharo web site.
I changed the definition of at: for CompiledMethod to prevent it accessing bytes outside of the bytecodes part of a compiled method. What's happening here is that a hash is being derived form the compiled method by looking at the pointers part of the compiled method (i.e. bytes < aMethod initialPC).
Yes, this is reasonable. But the more cases like this we find, the more i feel that we need better format for compiled methods, to keep its bytecode in a separate plain bytearray, to be able to reshape/modify the method's state and access it in convenient manner. A bytecode could be held in a first ivar of compiled method. Yes, we will have 4-8 more bytes per method in image (a ByteArray instance header).. but at the same time, we can save our brain cells as well as save bytes of implementation in both image and vm sides: - no need to deduce initialPc/end pc , because it is always 1...bytearray size - simpler method header (no need to count method's literals) - no need for a method trailer crap, any additional state can be held in own ivar. - we can simplify implementation - free the placeholders in object format for something else, now 12..15 (4 values) is for CompiledMethod, we may need only 1. - i would even make an extra slot for JIT code pointer, instead of insane hacks which Eliot has to introduce in Cog, just because of need to keep things compatible.
And more about space.
My image is 43Mb now.
CompiledMethod allInstances size 63563
so, if we have 4 more bytes per method .. this is 254252 more bytes. which is 0.5 % of total space. 8 more bytes per method.. 1% so my image will be 1% bigger.. we dooommeeed..
But it is wrong to try and derive a hash from these bytes. Since the GC can move objects these bytes can change as the GC moves the literals a method references (unlike the bytes between: aMethod initialPC and: aMethod size).
Yes. Strange that nobody noticed this before, perhaps because nobody places methods in dicts/setc :) I never found why one would want to put methods in dictionary (they are naturally found in dictionary already). But in case of Moose, this is of course possible.
More than that, since CompiledMethod redefines #=, it is strange that it's not redefines #hash as well, because it is one of requirements to make sure they are always in sync.
Two conclusions:
The solution is to define CompiledMethod>hash such that it does not access bytes < initialPC. The hash it is inheriting from ByteArray is fine for ByteArray but not at all OK for CompiledMethod.
Yes, i would just do
self methodClass hash bitXor: self selector hash
The VM is revealing a latent bug, i.e. that on older VMs the system could be computing a variable hash, and hence result in not-found errors in certain circumstances, hence the VM is right to impose this stricter bounds checking.
I'm on Mac os 10.7.4 and the image is:
Pharo1.4 Latest update: #14438
Cheers, Fabrizio
-- best, Eliot
More than that, since CompiledMethod redefines #=, it is strange that it's not redefines #hash as well, because it is one of requirements to make sure they are always in sync.
indeed!
Two conclusions:
The solution is to define CompiledMethod>hash such that it does not
access bytes < initialPC. The hash it is inheriting from ByteArray is fine for ByteArray but not at all OK for CompiledMethod.
Yes, i would just do
self methodClass hash bitXor: self selector hash
that's one possibility. If we want to continue with the previous kind of hash we also could do something like:
CompiledMethod >> hash "#hash is implemented, because #= is implemented"
^self class hashBytes: self bytecodes startingWith: self species hash
CompiledMethod >> bytecodes | initialPC endPC bytecodes | initialPC := self initialPC. endPC := self endPC. bytecodes := ByteArray new: (endPC - initialPC + 1). initialPC to: endPC do: [:index | bytecodes at: (index - initialPC + 1) put: (self at: index) ]. ^ bytecodes
I am not a hashing expert. What do you think? which impl should we do?
On Sun, May 27, 2012 at 9:15 AM, Mariano Martinez Peck < marianopeck@gmail.com> wrote:
More than that, since CompiledMethod redefines #=, it is strange that
it's not redefines #hash as well, because it is one of requirements to make sure they are always in sync.
indeed!
Two conclusions:
The solution is to define CompiledMethod>hash such that it does not
access bytes < initialPC. The hash it is inheriting from ByteArray is fine for ByteArray but not at all OK for CompiledMethod.
Yes, i would just do
self methodClass hash bitXor: self selector hash
that's one possibility. If we want to continue with the previous kind of hash we also could do something like:
CompiledMethod >> hash "#hash is implemented, because #= is implemented"
^self class hashBytes: self bytecodes startingWith: self species hash
CompiledMethod >> bytecodes | initialPC endPC bytecodes | initialPC := self initialPC. endPC := self endPC. bytecodes := ByteArray new: (endPC - initialPC + 1). initialPC to: endPC do: [:index | bytecodes at: (index - initialPC + 1) put: (self at: index) ]. ^ bytecodes
That's really slow! I would do something simple like
CompiledMethod>>hash | initialPC size hash | initialPC := self initialPC. size := self size. hash := self species hash + self header + initialPC + size + self selector hash + self methodClass hash bitAnd: 16rFFFFFFF. "sample approximately 20 bytes" initialPC to: size by: (size - initialPC // 20 max: 1) do: [:i| hash := hash + (self at: i)]. ^hash
"(CompiledMethod>>#hash) hash"
| ai | ai := CompiledMethod allInstances. { ai size. (Set withAll: ai) size. (ai collect: [:i| i hash]) asSet size } #(46635 46572 46565)
I am not a hashing expert. What do you think? which impl should we do?
-- Mariano http://marianopeck.wordpress.com
Here's a better version. The current definition of #= (at least in Squeak) doesn't compare selectors (since CompiledMethod>>#= is trying to compare code). Any hash method for CompiledMethod should pass the following:
| ai | ai := CompiledMethod allInstances. ai do: [:a| ai do: [:b| a = b ifTrue: [self assert: a hash = b hash]]]
find attached this:
hash "CompiledMethod>>#= compares code, i.e. same literals and same bytecode. So we look at the header, methodClass and some bytes between initialPC and endPC, but /not/ the selector because the equal method does not compare selectors." | initialPC endPC hash | initialPC := self initialPC. endPC := self endPC. hash := self species hash + self header + initialPC + endPC + self methodClass hash bitAnd: 16rFFFFFFF. "sample approximately 20 bytes" initialPC to: endPC by: (endPC - initialPC // 20 max: 1) do: [:i| hash := hash + (self at: i)]. ^hash
"(CompiledMethod>>#hash) hash"
On Sun, May 27, 2012 at 2:15 PM, Eliot Miranda eliot.miranda@gmail.comwrote:
On Sun, May 27, 2012 at 9:15 AM, Mariano Martinez Peck < marianopeck@gmail.com> wrote:
More than that, since CompiledMethod redefines #=, it is strange that
it's not redefines #hash as well, because it is one of requirements to make sure they are always in sync.
indeed!
Two conclusions:
The solution is to define CompiledMethod>hash such that it does not
access bytes < initialPC. The hash it is inheriting from ByteArray is fine for ByteArray but not at all OK for CompiledMethod.
Yes, i would just do
self methodClass hash bitXor: self selector hash
that's one possibility. If we want to continue with the previous kind of hash we also could do something like:
CompiledMethod >> hash "#hash is implemented, because #= is implemented"
^self class hashBytes: self bytecodes startingWith: self species hash
CompiledMethod >> bytecodes | initialPC endPC bytecodes | initialPC := self initialPC. endPC := self endPC. bytecodes := ByteArray new: (endPC - initialPC + 1). initialPC to: endPC do: [:index | bytecodes at: (index - initialPC + 1) put: (self at: index) ]. ^ bytecodes
That's really slow! I would do something simple like
CompiledMethod>>hash | initialPC size hash | initialPC := self initialPC. size := self size. hash := self species hash + self header + initialPC + size + self selector hash + self methodClass hash bitAnd: 16rFFFFFFF. "sample approximately 20 bytes" initialPC to: size by: (size - initialPC // 20 max: 1) do: [:i| hash := hash + (self at: i)]. ^hash
"(CompiledMethod>>#hash) hash"
| ai | ai := CompiledMethod allInstances. { ai size. (Set withAll: ai) size. (ai collect: [:i| i hash]) asSet size } #(46635 46572 46565)
I am not a hashing expert. What do you think? which impl should we do?
-- Mariano http://marianopeck.wordpress.com
-- best, Eliot
On 27 May 2012 23:34, Eliot Miranda eliot.miranda@gmail.com wrote:
Here's a better version. The current definition of #= (at least in Squeak) doesn't compare selectors (since CompiledMethod>>#= is trying to compare code). Any hash method for CompiledMethod should pass the following:
| ai | ai := CompiledMethod allInstances. ai do: [:a| ai do: [:b| a = b ifTrue: [self assert: a hash = b hash]]]
find attached this:
hash "CompiledMethod>>#= compares code, i.e. same literals and same bytecode. So we look at the header, methodClass and some bytes between initialPC and endPC, but /not/ the selector because the equal method does not compare selectors." | initialPC endPC hash | initialPC := self initialPC. endPC := self endPC. hash := self species hash + self header + initialPC + endPC + self methodClass hash bitAnd: 16rFFFFFFF. "sample approximately 20 bytes" initialPC to: endPC by: (endPC - initialPC // 20 max: 1) do: [:i| hash := hash + (self at: i)]. ^hash
"(CompiledMethod>>#hash) hash"
this one still too slow.. it can serve as a punishment for putting CM as keys in Dictionaries or in Sets :)
vm-dev@lists.squeakfoundation.org