Hi All,
I had stalled in recent days on the new compactor. The compactor appears to be working perfectly for me running on Mac OS, but when my colleague Bob Westergaard built a Newspeak VM for linux it crashed immediately (our Newspeak images invoke GC as soon as the low space watcher is spawned, because we're based on an older Squeak release).
Linux crashes whereas for me Mac OS doesn't crash for the combination of two reasons. Linux places the heap much higher in the address space, typically creating heaps that extend into the up[per half of the address space. The default type for oops in the VM is saint, which is signed.
So unless unsigned comparisons are used, oops in the upper half of the address space, which consequently have their sign bit set, will confuse enumerations, and hence bring the compactor crashing down.
The problem is how to find these unsigned comparisons. I had taken the approach of simply finding test cases and running them in the debugger, but alas the compactor works perfectly in the simulator, because there's not enough address space to create a simulated heap > 2Gb, so the simulator works and the generated C doesn't work on Linux. Running the two side by side trying to look for where they diverge was not productive given the difficulty of looking at a 47Mb image containing 875k objects.
And then the subconscious kicked in and I realized I could use Slang to analyze the parse tree and look for signed comparisons. The first cut looked like this. Yes, the open coding of building the code enrapture and running the type inference is ugly, but it's not that complicated. The important bit at the bottom is
node isSend and: [(#(< > <= >=) includes: node selector) and: [({node receiver. node args first } anySatisfy: [:o| (cg typeFor: o in: m) ifNil: [false] ifNotNil: [:t| (cg isIntegralCType: t) and: [t first ~= $u]]])]]
SpurPlanningCompactor class>>#identifySignedComparisons "self identifySignedComparisons" | vmm cg | vmm := (VMMaker forPlatform: 'Cross') interpreterClass: StackInterpreter; options: #(ObjectMemory Spur32BitMemoryManager). cg := [vmm buildCodeGeneratorForInterpreter] on: Notification do: [:ex| ex tag == #getVMMaker ifTrue: [ex resume: vmm] ifFalse: [ex pass]]. cg vmClass preGenerationHook: cg. cg inferTypesForImplicitlyTypedVariablesAndMethods. cg retainMethods: self selectors. cg prepareMethods. cg doInlining: true. self selectors do: [:sel| (cg methodNamed: sel) ifNotNil: [:m| m parseTree nodesDo: [:node| (node isSend and: [(#(< > <= >=) includes: node selector) and: [{node receiver. node args first } anySatisfy: [:o| (cg typeFor: o in: m) ifNil: [false] ifNotNil: [:t| (cg isIntegralCType: t) and: [t first ~= $u]]]]]) ifTrue: [Transcript ensureCr; nextPutAll: sel; space; print: node; flush]]]]
Using the above was a doit to collect the results in a set allowed me to filter out the noise, and the final method looks like
identifySignedComparisons "self identifySignedComparisons" | vmm cg noise | noise := #('(manager bytesInObject: largestFreeChunk) >= spaceEstimate' '(self classIndexOf: o*) > self isForwardedObjectClassIndexPun' 'GCModeFull > 0' 'ReceiverIndex + (objectMemory integerValueOf: sp*) < (objectMemory lengthOf: o*)' 'fmt* < manager firstCompiledMethodFormat' 'fmt* < self firstCompiledMethodFormat' 'fmt* <= 5' 'gcPhaseInProgress > 0' 'i <= finishIndex' 'i >= 0' 'numPointerSlots > 0' 'scavenger rememberedSetSize > 0'). vmm := (VMMaker forPlatform: 'Cross') interpreterClass: StackInterpreter; options: #(ObjectMemory Spur32BitMemoryManager). cg := [vmm buildCodeGeneratorForInterpreter] on: Notification do: [:ex| ex tag == #getVMMaker ifTrue: [ex resume: vmm] ifFalse: [ex pass]]. cg vmClass preGenerationHook: cg. cg inferTypesForImplicitlyTypedVariablesAndMethods. cg retainMethods: self selectors. cg prepareMethods. cg doInlining: true. self selectors do: [:sel| (cg methodNamed: sel) ifNotNil: [:m| m parseTree nodesDo: [:node| (node isSend and: [(#(< > <= >=) includes: node selector) and: [({node receiver. node args first } anySatisfy: [:o| (cg typeFor: o in: m) ifNil: [false] ifNotNil: [:t| (cg isIntegralCType: t) and: [t first ~= $u]]]) and: [noise noneSatisfy: [:n| n match: node printString]]]]) ifTrue: [Transcript ensureCr; nextPutAll: sel; space; print: node; flush]]]]
and my output is
savedFirstFieldsSpaceInFreeChunk savedFirstFieldsSpace start >= nilObj unmarkPinnedObjectsAndFindFirstUnpinnedOrFreeEntityFollowing: (segments at: sweepIndex) segSize + (segments at: sweepIndex) segStart < nextObj unmarkPinnedObjectsAndFindFirstUnpinnedOrFreeEntityFollowing: nextObj >= manager endOfMemory findNextMarkedPinnedAfter: nextObj >= manager endOfMemory planCompactSavingForwarders toFinger <= (manager startOfObject: o) planCompactSavingForwarders toFinger <= (manager startOfObject: previousPin) updatePointers o2 >= firstFreeObject updatePointers toFinger <= (manager startOfObject: o3) updatePointers toFinger <= (manager startOfObject: previousPin) savedFirstFieldsSpaceWasAllocated savedFirstFieldsSpace start >= nilObj freeFrom:upTo:previousPin: toFinger >= (segments at: i) segStart freeFrom:upTo:previousPin: seg segSize + seg segStart < limit freeFrom:upTo:previousPin: start := (self byteAt: pin + 7) = self numSlotsMask ifTrue: [pin - self baseHeaderSize] ifFalse: [pin] > toFinger freeFrom:upTo:previousPin: (segments at: sweepIndex) segSize + (segments at: sweepIndex) segStart < nextObj1 freeFrom:upTo:previousPin: nextObj1 >= manager endOfMemory freeFrom:upTo:previousPin: nextUnpinned >= limit freeFrom:upTo:previousPin: nextObj >= manager endOfMemory unmarkPinned: (segments at: sweepIndex) segSize + (segments at: sweepIndex) segStart < pinnedObj compact savedFirstFieldsSpace start >= nilObj compact savedFirstFieldsSpace start >= nilObj updateSavedFirstFieldsSpaceIfNecessary savedFirstFieldsSpace start >= nilObj endCompaction savedFirstFieldsSpace start >= nilObj updatePointersInInitialImmobileObjects o >= firstFreeObject copyAndUnmarkMobileObjects toFinger <= (manager startOfObject: o) copyAndUnmarkMobileObjects toFinger <= (manager startOfObject: previousPin) copyAndUnmarkMobileObjects bytes + 2 * 8 > availableSpace copyAndUnmarkMobileObjects availableSpace > 0 releaseSavedFirstFieldsSpace savedFirstFieldsSpace start >= nilObj updatePointersInMobileObjects toFinger <= (manager startOfObject: o) updatePointersInMobileObjects toFinger <= (manager startOfObject: previousPin)
Less than half an hour's work. Now I can squash those bugs :-)
_,,,^..^,,,_ best, Eliot
Ah great! I had the same idea of instrumenting slang rather than analyzing the compiler warnings for the same purpose of filtering the noise out, but never took the time to do it...
2017-01-25 18:22 GMT+01:00 Eliot Miranda eliot.miranda@gmail.com:
Hi All,
I had stalled in recent days on the new compactor. The compactor
appears to be working perfectly for me running on Mac OS, but when my colleague Bob Westergaard built a Newspeak VM for linux it crashed immediately (our Newspeak images invoke GC as soon as the low space watcher is spawned, because we're based on an older Squeak release).
Linux crashes whereas for me Mac OS doesn't crash for the combination of two reasons. Linux places the heap much higher in the address space, typically creating heaps that extend into the up[per half of the address space. The default type for oops in the VM is saint, which is signed.
So unless unsigned comparisons are used, oops in the upper half of the address space, which consequently have their sign bit set, will confuse enumerations, and hence bring the compactor crashing down.
The problem is how to find these unsigned comparisons. I had taken the approach of simply finding test cases and running them in the debugger, but alas the compactor works perfectly in the simulator, because there's not enough address space to create a simulated heap > 2Gb, so the simulator works and the generated C doesn't work on Linux. Running the two side by side trying to look for where they diverge was not productive given the difficulty of looking at a 47Mb image containing 875k objects.
And then the subconscious kicked in and I realized I could use Slang to analyze the parse tree and look for signed comparisons. The first cut looked like this. Yes, the open coding of building the code enrapture and running the type inference is ugly, but it's not that complicated. The important bit at the bottom is
node isSend and: [(#(< > <= >=) includes: node selector) and: [({node receiver. node args first } anySatisfy: [:o| (cg typeFor: o in: m) ifNil: [false] ifNotNil: [:t| (cg isIntegralCType: t) and: [t first ~= $u]]])]]
SpurPlanningCompactor class>>#identifySignedComparisons "self identifySignedComparisons" | vmm cg | vmm := (VMMaker forPlatform: 'Cross') interpreterClass: StackInterpreter; options: #(ObjectMemory Spur32BitMemoryManager). cg := [vmm buildCodeGeneratorForInterpreter] on: Notification do: [:ex| ex tag == #getVMMaker ifTrue: [ex resume: vmm] ifFalse: [ex pass]]. cg vmClass preGenerationHook: cg. cg inferTypesForImplicitlyTypedVariablesAndMethods. cg retainMethods: self selectors. cg prepareMethods. cg doInlining: true. self selectors do: [:sel| (cg methodNamed: sel) ifNotNil: [:m| m parseTree nodesDo: [:node| (node isSend and: [(#(< > <= >=) includes: node selector) and: [{node receiver. node args first } anySatisfy: [:o| (cg typeFor: o in: m) ifNil: [false] ifNotNil: [:t| (cg isIntegralCType: t) and: [t first ~= $u]]]]]) ifTrue: [Transcript ensureCr; nextPutAll: sel; space; print: node; flush]]]]
Using the above was a doit to collect the results in a set allowed me to filter out the noise, and the final method looks like
identifySignedComparisons "self identifySignedComparisons" | vmm cg noise | noise := #('(manager bytesInObject: largestFreeChunk) >= spaceEstimate' '(self classIndexOf: o*) > self isForwardedObjectClassIndexPun' 'GCModeFull > 0' 'ReceiverIndex + (objectMemory integerValueOf: sp*) < (objectMemory lengthOf: o*)' 'fmt* < manager firstCompiledMethodFormat' 'fmt* < self firstCompiledMethodFormat' 'fmt* <= 5' 'gcPhaseInProgress > 0' 'i <= finishIndex' 'i >= 0' 'numPointerSlots > 0' 'scavenger rememberedSetSize > 0'). vmm := (VMMaker forPlatform: 'Cross') interpreterClass: StackInterpreter; options: #(ObjectMemory Spur32BitMemoryManager). cg := [vmm buildCodeGeneratorForInterpreter] on: Notification do: [:ex| ex tag == #getVMMaker ifTrue: [ex resume: vmm] ifFalse: [ex pass]]. cg vmClass preGenerationHook: cg. cg inferTypesForImplicitlyTypedVariablesAndMethods. cg retainMethods: self selectors. cg prepareMethods. cg doInlining: true. self selectors do: [:sel| (cg methodNamed: sel) ifNotNil: [:m| m parseTree nodesDo: [:node| (node isSend and: [(#(< > <= >=) includes: node selector) and: [({node receiver. node args first } anySatisfy: [:o| (cg typeFor: o in: m) ifNil: [false] ifNotNil: [:t| (cg isIntegralCType: t) and: [t first ~= $u]]]) and: [noise noneSatisfy: [:n| n match: node printString]]]]) ifTrue: [Transcript ensureCr; nextPutAll: sel; space; print: node; flush]]]]
and my output is
savedFirstFieldsSpaceInFreeChunk savedFirstFieldsSpace start >= nilObj unmarkPinnedObjectsAndFindFirstUnpinnedOrFreeEntityFollowing: (segments at: sweepIndex) segSize + (segments at: sweepIndex) segStart < nextObj unmarkPinnedObjectsAndFindFirstUnpinnedOrFreeEntityFollowing: nextObj >= manager endOfMemory findNextMarkedPinnedAfter: nextObj >= manager endOfMemory planCompactSavingForwarders toFinger <= (manager startOfObject: o) planCompactSavingForwarders toFinger <= (manager startOfObject: previousPin) updatePointers o2 >= firstFreeObject updatePointers toFinger <= (manager startOfObject: o3) updatePointers toFinger <= (manager startOfObject: previousPin) savedFirstFieldsSpaceWasAllocated savedFirstFieldsSpace start >= nilObj freeFrom:upTo:previousPin: toFinger >= (segments at: i) segStart freeFrom:upTo:previousPin: seg segSize + seg segStart < limit freeFrom:upTo:previousPin: start := (self byteAt: pin + 7) = self numSlotsMask ifTrue: [pin - self baseHeaderSize] ifFalse: [pin] > toFinger freeFrom:upTo:previousPin: (segments at: sweepIndex) segSize + (segments at: sweepIndex) segStart < nextObj1 freeFrom:upTo:previousPin: nextObj1 >= manager endOfMemory freeFrom:upTo:previousPin: nextUnpinned >= limit freeFrom:upTo:previousPin: nextObj >= manager endOfMemory unmarkPinned: (segments at: sweepIndex) segSize + (segments at: sweepIndex) segStart < pinnedObj compact savedFirstFieldsSpace start >= nilObj compact savedFirstFieldsSpace start >= nilObj updateSavedFirstFieldsSpaceIfNecessary savedFirstFieldsSpace start >= nilObj endCompaction savedFirstFieldsSpace start >= nilObj updatePointersInInitialImmobileObjects o >= firstFreeObject copyAndUnmarkMobileObjects toFinger <= (manager startOfObject: o) copyAndUnmarkMobileObjects toFinger <= (manager startOfObject: previousPin) copyAndUnmarkMobileObjects bytes + 2 * 8 > availableSpace copyAndUnmarkMobileObjects availableSpace > 0 releaseSavedFirstFieldsSpace savedFirstFieldsSpace start >= nilObj updatePointersInMobileObjects toFinger <= (manager startOfObject: o) updatePointersInMobileObjects toFinger <= (manager startOfObject: previousPin)
Less than half an hour's work. Now I can squash those bugs :-)
_,,,^..^,,,_ best, Eliot
I recall that the issue we had with making the 32bit version clean was a for/while loop using a signed integer as pointer compare. If the start/end of the object was either negative or positive it worked, but if the object spanned the 2GB boundary then boom.
Using the start address in mmap was helpful to fiddle with where the image started in memory.
On Wed, Jan 25, 2017 at 10:45 AM, Nicolas Cellier < nicolas.cellier.aka.nice@gmail.com> wrote:
Ah great! I had the same idea of instrumenting slang rather than analyzing the compiler warnings for the same purpose of filtering the noise out, but never took the time to do it...
2017-01-25 18:22 GMT+01:00 Eliot Miranda eliot.miranda@gmail.com:
Hi All,
I had stalled in recent days on the new compactor. The compactor
appears to be working perfectly for me running on Mac OS, but when my colleague Bob Westergaard built a Newspeak VM for linux it crashed immediately (our Newspeak images invoke GC as soon as the low space watcher is spawned, because we're based on an older Squeak release).
Linux crashes whereas for me Mac OS doesn't crash for the combination of two reasons. Linux places the heap much higher in the address space, typically creating heaps that extend into the up[per half of the address space. The default type for oops in the VM is saint, which is signed.
So unless unsigned comparisons are used, oops in the upper half of the address space, which consequently have their sign bit set, will confuse enumerations, and hence bring the compactor crashing down.
The problem is how to find these unsigned comparisons. I had taken the approach of simply finding test cases and running them in the debugger, but alas the compactor works perfectly in the simulator, because there's not enough address space to create a simulated heap > 2Gb, so the simulator works and the generated C doesn't work on Linux. Running the two side by side trying to look for where they diverge was not productive given the difficulty of looking at a 47Mb image containing 875k objects.
And then the subconscious kicked in and I realized I could use Slang to analyze the parse tree and look for signed comparisons. The first cut looked like this. Yes, the open coding of building the code enrapture and running the type inference is ugly, but it's not that complicated. The important bit at the bottom is
node isSend and: [(#(< > <= >=) includes: node selector) and: [({node receiver. node args first } anySatisfy: [:o| (cg typeFor: o in: m) ifNil: [false] ifNotNil: [:t| (cg isIntegralCType: t) and: [t first ~= $u]]])]]
SpurPlanningCompactor class>>#identifySignedComparisons "self identifySignedComparisons" | vmm cg | vmm := (VMMaker forPlatform: 'Cross') interpreterClass: StackInterpreter; options: #(ObjectMemory Spur32BitMemoryManager). cg := [vmm buildCodeGeneratorForInterpreter] on: Notification do: [:ex| ex tag == #getVMMaker ifTrue: [ex resume: vmm] ifFalse: [ex pass]]. cg vmClass preGenerationHook: cg. cg inferTypesForImplicitlyTypedVariablesAndMethods. cg retainMethods: self selectors. cg prepareMethods. cg doInlining: true. self selectors do: [:sel| (cg methodNamed: sel) ifNotNil: [:m| m parseTree nodesDo: [:node| (node isSend and: [(#(< > <= >=) includes: node selector) and: [{node receiver. node args first } anySatisfy: [:o| (cg typeFor: o in: m) ifNil: [false] ifNotNil: [:t| (cg isIntegralCType: t) and: [t first ~= $u]]]]]) ifTrue: [Transcript ensureCr; nextPutAll: sel; space; print: node; flush]]]]
Using the above was a doit to collect the results in a set allowed me to filter out the noise, and the final method looks like
identifySignedComparisons "self identifySignedComparisons" | vmm cg noise | noise := #('(manager bytesInObject: largestFreeChunk) >= spaceEstimate' '(self classIndexOf: o*) > self isForwardedObjectClassIndexPun' 'GCModeFull > 0' 'ReceiverIndex + (objectMemory integerValueOf: sp*) < (objectMemory lengthOf: o*)' 'fmt* < manager firstCompiledMethodFormat' 'fmt* < self firstCompiledMethodFormat' 'fmt* <= 5' 'gcPhaseInProgress > 0' 'i <= finishIndex' 'i >= 0' 'numPointerSlots > 0' 'scavenger rememberedSetSize > 0'). vmm := (VMMaker forPlatform: 'Cross') interpreterClass: StackInterpreter; options: #(ObjectMemory Spur32BitMemoryManager). cg := [vmm buildCodeGeneratorForInterpreter] on: Notification do: [:ex| ex tag == #getVMMaker ifTrue: [ex resume: vmm] ifFalse: [ex pass]]. cg vmClass preGenerationHook: cg. cg inferTypesForImplicitlyTypedVariablesAndMethods. cg retainMethods: self selectors. cg prepareMethods. cg doInlining: true. self selectors do: [:sel| (cg methodNamed: sel) ifNotNil: [:m| m parseTree nodesDo: [:node| (node isSend and: [(#(< > <= >=) includes: node selector) and: [({node receiver. node args first } anySatisfy: [:o| (cg typeFor: o in: m) ifNil: [false] ifNotNil: [:t| (cg isIntegralCType: t) and: [t first ~= $u]]]) and: [noise noneSatisfy: [:n| n match: node printString]]]]) ifTrue: [Transcript ensureCr; nextPutAll: sel; space; print: node; flush]]]]
and my output is
savedFirstFieldsSpaceInFreeChunk savedFirstFieldsSpace start >= nilObj unmarkPinnedObjectsAndFindFirstUnpinnedOrFreeEntityFollowing: (segments at: sweepIndex) segSize + (segments at: sweepIndex) segStart < nextObj unmarkPinnedObjectsAndFindFirstUnpinnedOrFreeEntityFollowing: nextObj >= manager endOfMemory findNextMarkedPinnedAfter: nextObj >= manager endOfMemory planCompactSavingForwarders toFinger <= (manager startOfObject: o) planCompactSavingForwarders toFinger <= (manager startOfObject: previousPin) updatePointers o2 >= firstFreeObject updatePointers toFinger <= (manager startOfObject: o3) updatePointers toFinger <= (manager startOfObject: previousPin) savedFirstFieldsSpaceWasAllocated savedFirstFieldsSpace start >= nilObj freeFrom:upTo:previousPin: toFinger >= (segments at: i) segStart freeFrom:upTo:previousPin: seg segSize + seg segStart < limit freeFrom:upTo:previousPin: start := (self byteAt: pin + 7) = self numSlotsMask ifTrue: [pin - self baseHeaderSize] ifFalse: [pin] > toFinger freeFrom:upTo:previousPin: (segments at: sweepIndex) segSize + (segments at: sweepIndex) segStart < nextObj1 freeFrom:upTo:previousPin: nextObj1 >= manager endOfMemory freeFrom:upTo:previousPin: nextUnpinned >= limit freeFrom:upTo:previousPin: nextObj >= manager endOfMemory unmarkPinned: (segments at: sweepIndex) segSize + (segments at: sweepIndex) segStart < pinnedObj compact savedFirstFieldsSpace start >= nilObj compact savedFirstFieldsSpace start >= nilObj updateSavedFirstFieldsSpaceIfNecessary savedFirstFieldsSpace start >= nilObj endCompaction savedFirstFieldsSpace start >= nilObj updatePointersInInitialImmobileObjects o >= firstFreeObject copyAndUnmarkMobileObjects toFinger <= (manager startOfObject: o) copyAndUnmarkMobileObjects toFinger <= (manager startOfObject: previousPin) copyAndUnmarkMobileObjects bytes + 2 * 8 > availableSpace copyAndUnmarkMobileObjects availableSpace > 0 releaseSavedFirstFieldsSpace savedFirstFieldsSpace start >= nilObj updatePointersInMobileObjects toFinger <= (manager startOfObject: o) updatePointersInMobileObjects toFinger <= (manager startOfObject: previousPin)
Less than half an hour's work. Now I can squash those bugs :-)
_,,,^..^,,,_ best, Eliot
Awesome! However ...
On Wed, Jan 25, 2017 at 6:22 PM, Eliot Miranda eliot.miranda@gmail.com wrote:
The default type for oops in the VM is sqInt (*), which is signed.
... since this keeps biting us time and time again, how hard would it be to change? Defaulting to unsigned seems much more reasonable, no? Unless we're dealing with SmallIntegers, which have to be special-cased anyways, we never want unsigned behavior, right?
- Bert - (*) fixed the typo
On 26.01.2017, at 11:42, Bert Freudenberg bert@freudenbergs.de wrote:
Awesome! However ...
On Wed, Jan 25, 2017 at 6:22 PM, Eliot Miranda eliot.miranda@gmail.com wrote: The default type for oops in the VM is sqInt (*), which is signed.
... since this keeps biting us time and time again, how hard would it be to change? Defaulting to unsigned seems much more reasonable, no? Unless we're dealing with SmallIntegers, which have to be special-cased anyways, we never want unsigned behavior, right?
*signed behavior?
- Bert -
(*) fixed the typo
On Thu, Jan 26, 2017 at 1:32 PM, Tobias Pape Das.Linux@gmx.de wrote:
On 26.01.2017, at 11:42, Bert Freudenberg bert@freudenbergs.de wrote:
Awesome! However ...
On Wed, Jan 25, 2017 at 6:22 PM, Eliot Miranda eliot.miranda@gmail.com
wrote:
The default type for oops in the VM is sqInt (*), which is signed.
... since this keeps biting us time and time again, how hard would it be
to change? Defaulting to unsigned seems much more reasonable, no? Unless we're dealing with SmallIntegers, which have to be special-cased anyways, we never want unsigned behavior, right?
*signed behavior?
Err, yes, we never want *signed* behavior unless comparing / shifting small ints.
- Bert -
- Bert -
(*) fixed the typo
Hi Bert,
On Jan 26, 2017, at 2:42 AM, Bert Freudenberg bert@freudenbergs.de wrote:
Awesome! However ...
On Wed, Jan 25, 2017 at 6:22 PM, Eliot Miranda eliot.miranda@gmail.com wrote: The default type for oops in the VM is sqInt (*), which is signed.
... since this keeps biting us time and time again, how hard would it be to change? Defaulting to unsigned seems much more reasonable, no? Unless we're dealing with SmallIntegers, which have to be special-cased anyways, we never want unsigned behavior, right?
well, I've wanted to change this for some time but potentially it's a lot of work, as it affects the plugin and platform support code as much as the VM source. But let's at least discuss it.
First, what exactly do we do? Right now we have sqInt & usqInt as oop-sized signed and unsigned types. My list of pros and cons below is off the top of my head, preliminary. Most of the below share the issue that we have to identify places where signed SmallInteger arithmetic is done and re-type them appropriately. Do we
- a) change the code generator to use usqInt as the default type in place of sqInt? pros presumably minimal changes except to code generator and removing pragmas that specify usqInt? cons rewriting platform support code to use usqInt for at least parameters
- b) add a new default type, e.g. oop_t, which is the same as usqInt? pros, see above cons figuring out which usqInt types apply to oops and which to unsigned values
- c) change sqInt to be unsigned and add e.g. ssqInt as a signed type pros don't have to change platform source
- d) add a new default type, e.g. oop_t, which is the same as char *? pros oops are pointers cons C allows arithmetic on char *, so this won't fix inappropriate arithmetic on oops It's undefined in C whether pointers to different structures can be compared (but I do not know if an implement ration that refuses to compare pointers as simply addresses)
- e) use void * instead of char * (too horrible bow to contemplate cuz one can't compare void *)
- f..z) ?
Bert, is there a similar analysis for SqueakJS or are the C types irrelevant and ignored?
Anyway, looks to me like a) or c) are the main contenders. What d'all think?
- Bert -
(*) fixed the typo
Surely since we’re concerned about (ab)use of signed values for pointers to words we ought to define a ‘proper’ sqOop thing and use that instead of a slightly cleaned up derivative of how the Slang default was left as ‘int’ twenty years ago?
Yes, any change means a bunch of work to do but isn’t that always the price to improve things?
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Strange OpCodes: LB: connect Line-voltage to BITNET
+1
Dave
Surely since weâre concerned about (ab)use of signed values for pointers to words we ought to define a âproperâ sqOop thing and use that instead of a slightly cleaned up derivative of how the Slang default was left as âintâ twenty years ago?
Yes, any change means a bunch of work to do but isnât that always the price to improve things?
tim
tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Strange OpCodes: LB: connect Line-voltage to BITNET
+1
Sent from my iPhone
On Jan 26, 2017, at 10:58, David T. Lewis lewis@mail.msen.com wrote:
+1
Dave
Surely since weâre concerned about (ab)use of signed values for pointers to words we ought to define a âproperâ sqOop thing and use that instead of a slightly cleaned up derivative of how the Slang default was left as âintâ twenty years ago?
Yes, any change means a bunch of work to do but isnât that always the price to improve things?
tim
tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Strange OpCodes: LB: connect Line-voltage to BITNET
+1
with C i was always wondering, why C standard does not provides an unsigned integer type which size strictly corresponds to size of pointer type, e.g. sizeof(some standard uint) == sizeof(void*) that would solve so many problems and make thing so much easier..
instead they had this: http://www.gnu.org/software/libc/manual/html_node/Important-Data-Types.html Data Type: *ptrdiff_t*
This is the signed integer type of the result of subtracting two pointers. For example, with the declaration char *p1, *p2;, the expression p2 - p1 is of type ptrdiff_t. This will probably be one of the standard signed integer types (short int, int or long int), but might be a nonstandard type that exists only for this purpose.
it is easy to see, that sizeof(ptrdiff_t) should be _at least_ the sizeof(void*), else it won't be able to cover all possible values when taking difference between two pointers.. alas, it is signed.. and alas, nothing says that it cannot be _more_ than size of pointer in memory.. simply because you can always fit values that take 32 bits into 64-bit value holder..
One could say, that there's size_t that could be used as a unsigned alternative to ptrdiff_t, unfortunately, ISO gives no guarantees that its size corresponds to pointer size.
Btw,look what i found :) https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html#Integer-Ov...
that could be useful for implementing small-int arithmetics in plain C.. unfortunately this is extension, and don't shoot for proposing to use non-standard C herecy :)
On 26 January 2017 at 21:37, John McIntosh johnmci@smalltalkconsulting.com wrote:
+1
Sent from my iPhone
On Jan 26, 2017, at 10:58, David T. Lewis lewis@mail.msen.com wrote:
+1
Dave
Surely since we’re concerned about (ab)use of signed values for
pointers
to words we ought to define a ‘proper’ sqOop thing and use that instead of a slightly cleaned up derivative of how the Slang default was left as ‘int’ twenty years ago?
Yes, any change means a bunch of work to do but isn’t that always the price to improve things?
tim
tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Strange OpCodes: LB: connect Line-voltage to BITNET
2017-01-26 21:20 GMT+01:00 Igor Stasenko siguctua@gmail.com:
+1
with C i was always wondering, why C standard does not provides an unsigned integer type which size strictly corresponds to size of pointer type, e.g. sizeof(some standard uint) == sizeof(void*) that would solve so many problems and make thing so much easier..
In C99, there is intptr_t (and all the accompanying MACROS for printf/scanf formats, maximum value etc...)
instead they had this: http://www.gnu.org/software/libc/manual/html_node/ Important-Data-Types.html Data Type: *ptrdiff_t*
This is the signed integer type of the result of subtracting two pointers. For example, with the declaration char *p1, *p2;, the expression p2 - p1 is of type ptrdiff_t. This will probably be one of the standard signed integer types (short int, int or long int), but might be a nonstandard type that exists only for this purpose.
it is easy to see, that sizeof(ptrdiff_t) should be _at least_ the sizeof(void*), else it won't be able to cover all possible values when taking difference between two pointers.. alas, it is signed.. and alas, nothing says that it cannot be _more_ than size of pointer in memory.. simply because you can always fit values that take 32 bits into 64-bit value holder..
Technically you should only compare pointers that points to parts of a same structure or array (or byte just after the end of an array), otherwise it's UB. I guess this should apply to ptrdiff_t, in which case ptrdiff_t should be as long as a size_t, except it is signed.
One could say, that there's size_t that could be used as a unsigned alternative to ptrdiff_t, unfortunately, ISO gives no guarantees that its size corresponds to pointer size.
size_t does not necessarily fit intptr_t indeed, use the right type.
Btw,look what i found :) https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html# Integer-Overflow-Builtins
that could be useful for implementing small-int arithmetics in plain C.. unfortunately this is extension, and don't shoot for proposing to use non-standard C herecy :)
On 26 January 2017 at 21:37, John McIntosh <johnmci@smalltalkconsulting. com> wrote:
+1
Sent from my iPhone
On Jan 26, 2017, at 10:58, David T. Lewis lewis@mail.msen.com wrote:
+1
Dave
Surely since we’re concerned about (ab)use of signed values for
pointers
to words we ought to define a ‘proper’ sqOop thing and use that instead of a slightly cleaned up derivative of how the Slang default
was
left as ‘int’ twenty years ago?
Yes, any change means a bunch of work to do but isn’t that always the price to improve things?
tim
tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Strange OpCodes: LB: connect Line-voltage to BITNET
-- Best regards, Igor Stasenko.
On 26 January 2017 at 22:32, Nicolas Cellier < nicolas.cellier.aka.nice@gmail.com> wrote:
2017-01-26 21:20 GMT+01:00 Igor Stasenko siguctua@gmail.com:
+1
with C i was always wondering, why C standard does not provides an unsigned integer type which size strictly corresponds to size of pointer type, e.g. sizeof(some standard uint) == sizeof(void*) that would solve so many problems and make thing so much easier..
In C99, there is intptr_t (and all the accompanying MACROS for printf/scanf formats, maximum value etc...)
i guess, you meant *uintptr_t* then
but it was introduced only in C99.. and compilers started supporting C99 years later.. but the problem is that AFAIK, first lines of Squeak were written around '96 and people had to deal with what they had..
And then there's a people, whose main argument for using C is 'its portable'.. Hell yes, except it wasn't when portable things, like our VM were written first :)
2017-01-26 21:46 GMT+01:00 Igor Stasenko siguctua@gmail.com:
On 26 January 2017 at 22:32, Nicolas Cellier <nicolas.cellier.aka.nice@ gmail.com> wrote:
2017-01-26 21:20 GMT+01:00 Igor Stasenko siguctua@gmail.com:
+1
with C i was always wondering, why C standard does not provides an unsigned integer type which size strictly corresponds to size of pointer type, e.g. sizeof(some standard uint) == sizeof(void*) that would solve so many problems and make thing so much easier..
In C99, there is intptr_t (and all the accompanying MACROS for printf/scanf formats, maximum value etc...)
i guess, you meant *uintptr_t* then
but it was introduced only in C99.. and compilers started supporting C99 years later.. but the problem is that AFAIK, first lines of Squeak were written around '96 and people had to deal with what they had..
And then there's a people, whose main argument for using C is 'its portable'.. Hell yes, except it wasn't when portable things, like our VM were written first :)
If not all compilers support C99, most support at least uintptr_t But because we are really caring of obsolete configurations, we have introduced a usqIntptr_t...
However, an oop is not necessarily a usqIntptr_t, for example if we have 32bits image on 64bits VM, which might still be an interesting case.
-- Best regards, Igor Stasenko.
On 26 January 2017 at 22:58, Nicolas Cellier < nicolas.cellier.aka.nice@gmail.com> wrote:
2017-01-26 21:46 GMT+01:00 Igor Stasenko siguctua@gmail.com:
On 26 January 2017 at 22:32, Nicolas Cellier < nicolas.cellier.aka.nice@gmail.com> wrote:
2017-01-26 21:20 GMT+01:00 Igor Stasenko siguctua@gmail.com:
+1
with C i was always wondering, why C standard does not provides an unsigned integer type which size strictly corresponds to size of pointer type, e.g. sizeof(some standard uint) == sizeof(void*) that would solve so many problems and make thing so much easier..
In C99, there is intptr_t (and all the accompanying MACROS for printf/scanf formats, maximum value etc...)
i guess, you meant *uintptr_t* then
but it was introduced only in C99.. and compilers started supporting C99 years later.. but the problem is that AFAIK, first lines of Squeak were written around '96 and people had to deal with what they had..
And then there's a people, whose main argument for using C is 'its portable'.. Hell yes, except it wasn't when portable things, like our VM were written first :)
If not all compilers support C99, most support at least uintptr_t But because we are really caring of obsolete configurations, we have introduced a usqIntptr_t...
However, an oop is not necessarily a usqIntptr_t, for example if we have 32bits image on 64bits VM, which might still be an interesting case.
you mean interesting in terms, like implementing a 70's era processor or DoS simulator(s) in javascript, so you can open it on web page and have fun? :)
-- Best regards, Igor Stasenko.
On 26.01.2017, at 21:20, Igor Stasenko siguctua@gmail.com wrote:
+1
with C i was always wondering, why C standard does not provides an unsigned integer type which size strictly corresponds to size of pointer type, e.g. sizeof(some standard uint) == sizeof(void*)
You mean uintptr_t?
https://en.wikipedia.org/wiki/C_data_types#Fixed-width_integer_types
C11 says:
7.20.1.4 Integer types capable of holding object pointers
1 The following type designates a signed integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer:
intptr_t
The following type designates an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer:
uintptr_t
These types are optional.
that would solve so many problems and make thing so much easier..
instead they had this: http://www.gnu.org/software/libc/manual/html_node/Important-Data-Types.html Data Type: ptrdiff_t This is the signed integer type of the result of subtracting two pointers. For example, with the declaration char *p1, *p2;, the expression p2 - p1 is of type ptrdiff_t. This will probably be one of the standard signed integer types (short int, int or long int), but might be a nonstandard type that exists only for this purpose.
it is easy to see, that sizeof(ptrdiff_t) should be _at least_ the sizeof(void*), else it won't be able to cover all possible values when taking difference between two pointers.. alas, it is signed.. and alas, nothing says that it cannot be _more_ than size of pointer in memory.. simply because you can always fit values that take 32 bits into 64-bit value holder..
One could say, that there's size_t that could be used as a unsigned alternative to ptrdiff_t, unfortunately, ISO gives no guarantees that its size corresponds to pointer size.
Btw,look what i found :) https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html#Integer-Ov...
that could be useful for implementing small-int arithmetics in plain C.. unfortunately this is extension, and don't shoot for proposing to use non-standard C herecy :)
On 26 January 2017 at 21:37, John McIntosh johnmci@smalltalkconsulting.com wrote:
+1
Sent from my iPhone
On Jan 26, 2017, at 10:58, David T. Lewis lewis@mail.msen.com wrote:
+1
Dave
Surely since we’re concerned about (ab)use of signed values for pointers to words we ought to define a ‘proper’ sqOop thing and use that instead of a slightly cleaned up derivative of how the Slang default was left as ‘int’ twenty years ago?
Yes, any change means a bunch of work to do but isn’t that always the price to improve things?
tim
tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Strange OpCodes: LB: connect Line-voltage to BITNET
-- Best regards, Igor Stasenko.
Some requirements:
1) Slang must be able to code boolean and arithmetic expressions without the need of type hint.
2) If we still want to support 32bits image on 64bits VM, then it would be good to avoid implicit conversions between machine pointer (char *) and oop. We should allways use char *pointerForOop(sqOop oop) and sqOop oopForPointer(char *).
3) we must avoid as much as possible to invoke UB, or the compiled code will start to crash or behave wrongly as soon as we use an optimization flag, and this will get worse at each C compiler version.
So far, sqInt is defined as default type and works well for 1). We have to cast to unsigned here and there though to avoid UB (like left shifting a signed value, or testing for overflow in post condition). But also to avoid unwanted sign extension when we manipulate chnuks of bit (like high and low part) - we had several bugs with that.
I much prefer unsigned, because it's almost free from UB. With signed, we just don't have right to overflow, that's crazy... But if we switch to unsigned, my fear is to have to add many type hints... Because telling automatically which arithmetic expressions requires signed, and which would work with unsigned is not easy. for example, for( i=size-1;i >= 0; i--) isn't going to do what we want. Nor c = a - b; if ( c < 0 ) ...
Declaring sqOop as a union could help 2) but would hardly fit 1). Maybe type inference can solve it entirely, and it's not a problem?
Declaring sqOop as (char *) would not support 2) And we could not compare oops any more because it's UB! We could eventually use slang type inference and force (usqInt) oop1 < (usqInt) oop2... I don't like this option and much prefer explicit char *pointerForOop(sqOop oop) and sqOop oopForPointer(char *).
2017-01-26 20:37 GMT+01:00 John McIntosh johnmci@smalltalkconsulting.com:
+1
Sent from my iPhone
On Jan 26, 2017, at 10:58, David T. Lewis lewis@mail.msen.com wrote:
+1
Dave
Surely since we’re concerned about (ab)use of signed values for
pointers
to words we ought to define a ‘proper’ sqOop thing and use that instead of a slightly cleaned up derivative of how the Slang default was left as ‘int’ twenty years ago?
Yes, any change means a bunch of work to do but isn’t that always the price to improve things?
tim
tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Strange OpCodes: LB: connect Line-voltage to BITNET
And structure the changes to cause a net code loss, so the next set of changes is easier to make. Repeat.
On 1/26/17 10:54 , tim Rowledge wrote:
Surely since we’re concerned about (ab)use of signed values for pointers to words we ought to define a ‘proper’ sqOop thing and use that instead of a slightly cleaned up derivative of how the Slang default was left as ‘int’ twenty years ago?
Yes, any change means a bunch of work to do but isn’t that always the price to improve things?
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Strange OpCodes: LB: connect Line-voltage to BITNET
On Fri, Jan 27, 2017 at 12:52 AM, Eliot Miranda eliot.miranda@gmail.com wrote:
Hi Bert,
On Jan 26, 2017, at 2:42 AM, Bert Freudenberg bert@freudenbergs.de wrote:
Awesome! However ...
On Wed, Jan 25, 2017 at 6:22 PM, Eliot Miranda eliot.miranda@gmail.com wrote:
The default type for oops in the VM is sqInt (*), which is signed.
... since this keeps biting us time and time again, how hard would it be to change? Defaulting to unsigned seems much more reasonable, no? Unless we're dealing with SmallIntegers, which have to be special-cased anyways, we never want unsigned behavior, right?
well, I've wanted to change this for some time but potentially it's a lot of work, as it affects the plugin and platform support code as much as the VM source. But let's at least discuss it.
First, what exactly do we do? Right now we have sqInt & usqInt as oop-sized signed and unsigned types. My list of pros and cons below is off the top of my head, preliminary. Most of the below share the issue that we have to identify places where signed SmallInteger arithmetic is done and re-type them appropriately. Do we
a) change the code generator to use usqInt as the default type in place of sqInt? pros presumably minimal changes except to code generator and removing pragmas that specify usqInt? cons rewriting platform support code to use usqInt for at least parameters
b) add a new default type, e.g. oop_t, which is the same as usqInt? pros, see above cons figuring out which usqInt types apply to oops and which to unsigned values
c) change sqInt to be unsigned and add e.g. ssqInt as a signed type pros don't have to change platform source
d) add a new default type, e.g. oop_t, which is the same as char *? pros oops are pointers cons C allows arithmetic on char *, so this won't fix inappropriate arithmetic on oops It's undefined in C whether pointers to different structures can be compared (but I do not know if an implement ration that refuses to compare pointers as simply addresses)
e) use void * instead of char * (too horrible bow to contemplate cuz one can't compare void *)
f..z) ?
Bert, is there a similar analysis for SqueakJS or are the C types irrelevant and ignored?
Anyway, looks to me like a) or c) are the main contenders. What d'all think?
What is the overlap in the cons between a) and b) ? Is one a super-set of the other or are the changes orthogonal?
For stability during the progressive work on a) would it help to anyway replace sqInt by oop_t, and have an experimental branch where "typdef usqInt oop_t" while the main Cog branch remains "typedef sqInt oop_t" ?
cheers -ben
On Thu, Jan 26, 2017 at 5:52 PM, Eliot Miranda eliot.miranda@gmail.com wrote:
- a) change the code generator to use usqInt as the default type in place
of sqInt?
- b) add a new default type, e.g. oop_t, which is the same as usqInt?
- c) change sqInt to be unsigned and add e.g. ssqInt as a signed type
- d) add a new default type, e.g. oop_t, which is the same as char *?
- e) use void * instead of char * (too horrible bow to contemplate cuz one
can't compare void *)
I'd think b should work fine. sqOop as suggested by Tim sounds good.
Bert, is there a similar analysis for SqueakJS or are the C types irrelevant and ignored?
Well, most of SqueakJS is written by hand and not generated. That is the interpreter, jit, most plugins. Non-immediate oops are direct object references, there is no way to access their numeric value.
Where it is relevant is in the Slang sources for a couple of plugins that I translated to JS. My VMMakerJS subclasses the regular code generator and I had to add a crude type inference mechanism and some special-case hacks to make sized pointer arithmetic and signed/unsigned stuff work.
I guess if I had to touch it again (still using the generated code from 2 years ago) I may be better off using the oscog version instead of baking my own type inferencer. In any case, a clear distinction between pointers and arithmetic types would be a Good Thing.
- Bert -
I recently tackled this problem with the scheme below:
1. qOop is a pointer sized unsigned integer.
2. oop is typedef'ed as a pointer to a qOop.
3. sqOop is a pointer sized signed integer.
4. If *oop is tagged as a small integer, one has the right to cast *oop as an sqOop. This new sqOop is the small integer value encoded in the oop.
5. If *oop is not tagged, one has the right to cast *oop as an oop, and dereference this new oop.
This naming scheme subsumes the "int" part within the "q". In practice, qOop makes a lot of sense because it reads as "q the size of an oop", just like qN reads as "q of N bits" --- although I'd explore letting N's unit be bytes instead. Overall, the meaning is very close to the underlying assembly / compiler output / jitted code.
The sqOop type for small integers is close enough, I doubt we will have so many tag bits as to justify a small integer value type significantly smaller than that of a signed integer the size of a pointer.
I'd rather not key in uintptr_t and friends every time, so:
typedef uintptr_t qOop;
I replaced the previously existing type scheme with the above, and 1000 LOC went away in the process. New code feels easy to write, too. I'd say this approach turned out very well.
Nothing prevents the following, if needed:
#if PRE_C99 typedef PLATFORM_POINTER_SIZED_UNSIGNED_INTEGER uintptr_t; #endif
Andres.
On 1/26/17 8:52 , Eliot Miranda wrote:
Hi Bert,
On Jan 26, 2017, at 2:42 AM, Bert Freudenberg <bert@freudenbergs.de mailto:bert@freudenbergs.de> wrote:
Awesome! However ...
On Wed, Jan 25, 2017 at 6:22 PM, Eliot Miranda <eliot.miranda@gmail.com mailto:eliot.miranda@gmail.com> wrote:
The default type for oops in the VM is sqInt (*), which is signed.
... since this keeps biting us time and time again, how hard would it be to change? Defaulting to unsigned seems much more reasonable, no? Unless we're dealing with SmallIntegers, which have to be special-cased anyways, we never want unsigned behavior, right?
well, I've wanted to change this for some time but potentially it's a lot of work, as it affects the plugin and platform support code as much as the VM source. But let's at least discuss it.
First, what exactly do we do? Right now we have sqInt & usqInt as oop-sized signed and unsigned types. My list of pros and cons below is off the top of my head, preliminary. Most of the below share the issue that we have to identify places where signed SmallInteger arithmetic is done and re-type them appropriately. Do we
- a) change the code generator to use usqInt as the default type in
place of sqInt? pros presumably minimal changes except to code generator and removing pragmas that specify usqInt? cons rewriting platform support code to use usqInt for at least parameters
- b) add a new default type, e.g. oop_t, which is the same as usqInt? pros, see above cons figuring out which usqInt types apply to oops and which to
unsigned values
c) change sqInt to be unsigned and add e.g. ssqInt as a signed type pros don't have to change platform source
d) add a new default type, e.g. oop_t, which is the same as char *? pros oops are pointers cons C allows arithmetic on char *, so this won't fix inappropriate
arithmetic on oops It's undefined in C whether pointers to different structures can be compared (but I do not know if an implement ration that refuses to compare pointers as simply addresses)
- e) use void * instead of char * (too horrible bow to contemplate cuz
one can't compare void *)
- f..z) ?
Bert, is there a similar analysis for SqueakJS or are the C types irrelevant and ignored?
Anyway, looks to me like a) or c) are the main contenders. What d'all think?
- Bert -
(*) fixed the typo
vm-dev@lists.squeakfoundation.org