Nicolas Cellier uploaded a new version of VMMaker to project VM Maker Inbox: http://source.squeak.org/VMMakerInbox/VMMaker.oscog-nice.3174.mcz
==================== Summary ====================
Name: VMMaker.oscog-nice.3174 Author: nice Time: 9 March 2022, 1:04:51.911686 pm UUID: af4f8704-18fa-564c-ac59-27c6f6364c59 Ancestors: VMMaker.oscog-eem.3173
Workaround : literal characters should have unsigned values
Currently in SistaV1, characters in range 16r100 16rFFFF are encoded as push literal character (bytecode 233) with signed extended B.
This lead to negative character values in the range 16r8000 to 16rFFFF.
The workaround consist in bit-oring extB with 16rFF so as to re-interpret it as unsigned char. BEWARE: this assumes that a single extB will be used rather than multiple consecutive extB. It works by now, because it's currently the case that image side only use 1 extB for encoding literal character - see EncoderForSistaV1>>#genPushCharacter: - above 16rFFFF, a literal index is consumed.
An alternative would be to use extA rather than extB. But it would require a careful recompilation of CompiledMethod at image side.
Similar fix is required at image side in InstructionStream>>#interpretNext2ByteSistaV1Instruction:for:extA:extB:startPC:
=============== Diff against VMMaker.oscog-eem.3173 ===============
Item was changed: ----- Method: SimpleStackBasedCogit>>genExtPushCharacterBytecode (in category 'bytecode generators') ----- genExtPushCharacterBytecode "SistaV1: 233 11101001 iiiiiiii Push Character #iiiiiiii (+ Extend B * 256)" | value | + value := byte1 + ((extB bitAnd: 16rFF) << 8). - value := byte1 + (extB << 8). extB := 0. numExtB := 0. ^self genPushLiteral: (objectMemory characterObjectOf: value)!
Item was changed: ----- Method: StackInterpreter>>extPushCharacterBytecode (in category 'stack bytecodes') ----- extPushCharacterBytecode "SistaV1: * 233 11101001 iiiiiiii Push Character #iiiiiiii (+ Extend B * 256)" | value | + value := self fetchByte + ((extB bitAnd: 16rFF) << 8). - value := self fetchByte + (extB << 8). self fetchNextBytecode. self internalPush: (objectMemory characterObjectOf: value). numExtB := extB := 0!
Hi Nicolas --
Thanks for finding this workaround!
Given some later point in time, how bad could it bite us back considering extA vs extB? Would one notice this very quickly? What does "careful recompilation of CompiledMethod at image side" mean here? What can go wrong?
Best, Marcel Am 09.03.2022 13:05:56 schrieb commits@source.squeak.org commits@source.squeak.org:
Nicolas Cellier uploaded a new version of VMMaker to project VM Maker Inbox: http://source.squeak.org/VMMakerInbox/VMMaker.oscog-nice.3174.mcz
==================== Summary ====================
Name: VMMaker.oscog-nice.3174 Author: nice Time: 9 March 2022, 1:04:51.911686 pm UUID: af4f8704-18fa-564c-ac59-27c6f6364c59 Ancestors: VMMaker.oscog-eem.3173
Workaround : literal characters should have unsigned values
Currently in SistaV1, characters in range 16r100 16rFFFF are encoded as push literal character (bytecode 233) with signed extended B.
This lead to negative character values in the range 16r8000 to 16rFFFF.
The workaround consist in bit-oring extB with 16rFF so as to re-interpret it as unsigned char. BEWARE: this assumes that a single extB will be used rather than multiple consecutive extB. It works by now, because it's currently the case that image side only use 1 extB for encoding literal character - see EncoderForSistaV1>>#genPushCharacter: - above 16rFFFF, a literal index is consumed.
An alternative would be to use extA rather than extB. But it would require a careful recompilation of CompiledMethod at image side.
Similar fix is required at image side in InstructionStream>>#interpretNext2ByteSistaV1Instruction:for:extA:extB:startPC:
=============== Diff against VMMaker.oscog-eem.3173 ===============
Item was changed: ----- Method: SimpleStackBasedCogit>>genExtPushCharacterBytecode (in category 'bytecode generators') ----- genExtPushCharacterBytecode "SistaV1: 233 11101001 iiiiiiii Push Character #iiiiiiii (+ Extend B * 256)" | value | + value := byte1 + ((extB bitAnd: 16rFF) - value := byte1 + (extB extB := 0. numExtB := 0. ^self genPushLiteral: (objectMemory characterObjectOf: value)!
Item was changed: ----- Method: StackInterpreter>>extPushCharacterBytecode (in category 'stack bytecodes') ----- extPushCharacterBytecode "SistaV1: * 233 11101001 iiiiiiii Push Character #iiiiiiii (+ Extend B * 256)" | value | + value := self fetchByte + ((extB bitAnd: 16rFF) - value := self fetchByte + (extB self fetchNextBytecode. self internalPush: (objectMemory characterObjectOf: value). numExtB := extB := 0!
Hi Marcel, using extA instead of extB need synchronization between - VM to use extA in SistaV1 interpreter (and jit) - image side to use extA in compiled code generation (EncoderForSistaV1)
Being careful is maybe not necessary if we have all source code reachable. I was thinking of the case when we have to decompile, that would mean decompile with extB, recompile with extA. One careful solution would be to - restrict optimized Character value in range 0-16rFF - recompile all - switch to extA in VM and image side - restore optimized Character value in range 0-16rFFFF - recompile all
But mixing old extB VM and new extA image or vice versa would lead to bad things (loose upper bits of literal character value). Having a VM supporting both extB + extA would come at an extra cost (maybe not that high, it could be a simple extB+extA bitAnd: 16rFF) So I'd better ask for Eliot advice. The main interest of extA is that it could be extended above 16rFFFF simply by chaining several extended A (224) bytecodes. This is not as easy with extB, we have to undo the signed transform taking numExtB into account...
Alternatively, we could try to workaround at image side: test the VM at image startup. Recompiling at startup is too heavy, better operate at a lower level, that is detect and fix incompatible compiled method (find which is using extB (225)+pushLiteralChar (233) and replace with extA(224)+puchLiteralChar or vice versa). Seems a bit overkill to me, and does not fix the case old image (without workaround) + new VM.
However, all this might as well be a false problem. How many literal characters > 16rFF do we have in trunk ? Since we had poor support for extended (not to say unicode) fonts until recently, probably not that many ! A $? in source code is not helpful !
Le mer. 9 mars 2022 à 13:16, Marcel Taeumel marcel.taeumel@hpi.de a écrit :
Hi Nicolas --
Thanks for finding this workaround!
Given some later point in time, how bad could it bite us back considering extA vs extB? Would one notice this very quickly? What does "careful recompilation of CompiledMethod at image side" mean here? What can go wrong?
Best, Marcel
Am 09.03.2022 13:05:56 schrieb commits@source.squeak.org < commits@source.squeak.org>:
Nicolas Cellier uploaded a new version of VMMaker to project VM Maker Inbox: http://source.squeak.org/VMMakerInbox/VMMaker.oscog-nice.3174.mcz
==================== Summary ====================
Name: VMMaker.oscog-nice.3174 Author: nice Time: 9 March 2022, 1:04:51.911686 pm UUID: af4f8704-18fa-564c-ac59-27c6f6364c59 Ancestors: VMMaker.oscog-eem.3173
Workaround : literal characters should have unsigned values
Currently in SistaV1, characters in range 16r100 16rFFFF are encoded as push literal character (bytecode 233) with signed extended B.
This lead to negative character values in the range 16r8000 to 16rFFFF.
The workaround consist in bit-oring extB with 16rFF so as to re-interpret it as unsigned char. BEWARE: this assumes that a single extB will be used rather than multiple consecutive extB. It works by now, because it's currently the case that image side only use 1 extB for encoding literal character - see EncoderForSistaV1>>#genPushCharacter: - above 16rFFFF, a literal index is consumed.
An alternative would be to use extA rather than extB. But it would require a careful recompilation of CompiledMethod at image side.
Similar fix is required at image side in InstructionStream>>#interpretNext2ByteSistaV1Instruction:for:extA:extB:startPC:
=============== Diff against VMMaker.oscog-eem.3173 ===============
Item was changed: ----- Method: SimpleStackBasedCogit>>genExtPushCharacterBytecode (in category 'bytecode generators') ----- genExtPushCharacterBytecode "SistaV1: 233 11101001 iiiiiiii Push Character #iiiiiiii (+ Extend B * 256)" | value |
- value := byte1 + ((extB bitAnd: 16rFF) << 8).
- value := byte1 + (extB << 8).
extB := 0. numExtB := 0. ^self genPushLiteral: (objectMemory characterObjectOf: value)!
Item was changed: ----- Method: StackInterpreter>>extPushCharacterBytecode (in category 'stack bytecodes') ----- extPushCharacterBytecode "SistaV1: * 233 11101001 iiiiiiii Push Character #iiiiiiii (+ Extend B * 256)" | value |
- value := self fetchByte + ((extB bitAnd: 16rFF) << 8).
- value := self fetchByte + (extB << 8).
self fetchNextBytecode. self internalPush: (objectMemory characterObjectOf: value). numExtB := extB := 0!
Hi Nicolas --
Thanks for this explanation.
However, all this might as well be a false problem. How many literal characters > 16rFF do we have in trunk ? Since we had poor support for extended (not to say unicode) fonts until recently, probably not that many ! A $? in source code is not helpful !
Yes, the source code in Trunk is fine. I stumbled upon this issue while debugging things around Character, #leadingChar, glyph browsing, and TextConverter. I got used to putting a $ in front of some visible thing in a text field to then send #charCode, #asInteger, or #leadingChar to it. Being able to do-it/inspect-it/print-it in any text field is very convenient. I could have done this via '...' first as well. An exception would have been nice, though.
For the time being, it makes sense to just signal an error when the user tries to create literal characters between 16r8000 and 16rFFFF.
If you happen to know the exact place in the image, would you mind adding this check in Trunk to EncoderForSistaV1? :-)
Best, Marcel Am 09.03.2022 23:16:25 schrieb Nicolas Cellier nicolas.cellier.aka.nice@gmail.com: Hi Marcel, using extA instead of extB need synchronization between - VM to use extA in SistaV1 interpreter (and jit) - image side to use extA in compiled code generation (EncoderForSistaV1)
Being careful is maybe not necessary if we have all source code reachable. I was thinking of the case when we have to decompile, that would mean decompile with extB, recompile with extA. One careful solution would be to - restrict optimized Character value in range 0-16rFF - recompile all - switch to extA in VM and image side - restore optimized Character value in range 0-16rFFFF - recompile all
But mixing old extB VM and new extA image or vice versa would lead to bad things (loose upper bits of literal character value). Having a VM supporting both extB + extA would come at an extra cost (maybe not that high, it could be a simple extB+extA bitAnd: 16rFF) So I'd better ask for Eliot advice. The main interest of extA is that it could be extended above 16rFFFF simply by chaining several extended A (224) bytecodes. This is not as easy with extB, we have to undo the signed transform taking numExtB into account...
Alternatively, we could try to workaround at image side: test the VM at image startup. Recompiling at startup is too heavy, better operate at a lower level, that is detect and fix incompatible compiled method (find which is using extB (225)+pushLiteralChar (233) and replace with extA(224)+puchLiteralChar or vice versa). Seems a bit overkill to me, and does not fix the case old image (without workaround) + new VM.
However, all this might as well be a false problem. How many literal characters > 16rFF do we have in trunk ? Since we had poor support for extended (not to say unicode) fonts until recently, probably not that many ! A $? in source code is not helpful !
Le mer. 9 mars 2022 à 13:16, Marcel Taeumel a écrit :
Hi Nicolas --
Thanks for finding this workaround!
Given some later point in time, how bad could it bite us back considering extA vs extB? Would one notice this very quickly? What does "careful recompilation of CompiledMethod at image side" mean here? What can go wrong?
Best, Marcel
Am 09.03.2022 13:05:56 schrieb commits@source.squeak.org < commits@source.squeak.org>:
Nicolas Cellier uploaded a new version of VMMaker to project VM Maker Inbox: http://source.squeak.org/VMMakerInbox/VMMaker.oscog-nice.3174.mcz
==================== Summary ====================
Name: VMMaker.oscog-nice.3174 Author: nice Time: 9 March 2022, 1:04:51.911686 pm UUID: af4f8704-18fa-564c-ac59-27c6f6364c59 Ancestors: VMMaker.oscog-eem.3173
Workaround : literal characters should have unsigned values
Currently in SistaV1, characters in range 16r100 16rFFFF are encoded as push literal character (bytecode 233) with signed extended B.
This lead to negative character values in the range 16r8000 to 16rFFFF.
The workaround consist in bit-oring extB with 16rFF so as to re-interpret it as unsigned char. BEWARE: this assumes that a single extB will be used rather than multiple consecutive extB. It works by now, because it's currently the case that image side only use 1 extB for encoding literal character - see EncoderForSistaV1>>#genPushCharacter: - above 16rFFFF, a literal index is consumed.
An alternative would be to use extA rather than extB. But it would require a careful recompilation of CompiledMethod at image side.
Similar fix is required at image side in InstructionStream>>#interpretNext2ByteSistaV1Instruction:for:extA:extB:startPC:
=============== Diff against VMMaker.oscog-eem.3173 ===============
Item was changed: ----- Method: SimpleStackBasedCogit>>genExtPushCharacterBytecode (in category 'bytecode generators') ----- genExtPushCharacterBytecode "SistaV1: 233 11101001 iiiiiiii Push Character #iiiiiiii (+ Extend B * 256)" | value |
- value := byte1 + ((extB bitAnd: 16rFF) << 8).
- value := byte1 + (extB << 8).
extB := 0. numExtB := 0. ^self genPushLiteral: (objectMemory characterObjectOf: value)!
Item was changed: ----- Method: StackInterpreter>>extPushCharacterBytecode (in category 'stack bytecodes') ----- extPushCharacterBytecode "SistaV1: * 233 11101001 iiiiiiii Push Character #iiiiiiii (+ Extend B * 256)" | value |
- value := self fetchByte + ((extB bitAnd: 16rFF) << 8).
- value := self fetchByte + (extB << 8).
self fetchNextBytecode. self internalPush: (objectMemory characterObjectOf: value). numExtB := extB := 0!
Hi Marcel, using extA instead of extB need synchronization between - VM to use extA in SistaV1 interpreter (and jit)
- image side to use extA in compiled code generation (EncoderForSistaV1)
Being careful is maybe not necessary if we have all source code reachable. I was thinking of the case when we have to decompile, that would mean decompile with extB, recompile with extA. One careful solution would be to - restrict optimized Character value in range 0-16rFF - recompile all - switch to extA in VM and image side
- restore optimized Character value in range 0-16rFFFF - recompile all
But mixing old extB VM and new extA image or vice versa would lead to bad things (loose upper bits of literal character value). Having a VM supporting both extB + extA would come at an extra cost (maybe not that high, it could be a simple extB+extA bitAnd: 16rFF) So I'd better ask for Eliot advice. The main interest of extA is that it could be extended above 16rFFFF simply by chaining several extended A (224) bytecodes. This is not as easy with extB, we have to undo the signed transform taking numExtB into account...
Alternatively, we could try to workaround at image side: test the VM at image startup. Recompiling at startup is too heavy, better operate at a lower level, that is detect and fix incompatible compiled method (find which is using extB (225)+pushLiteralChar (233) and replace with extA(224)+puchLiteralChar or vice versa). Seems a bit overkill to me, and does not fix the case old image (without workaround) + new VM.
However, all this might as well be a false problem. How many literal characters > 16rFF do we have in trunk ? Since we had poor support for extended (not to say unicode) fonts until recently, probably not that many ! A $? in source code is not helpful !
Le mer. 9 mars 2022 à 13:16, Marcel Taeumel <marcel.taeumel@hpi.de [mailto:marcel.taeumel@hpi.de]> a écrit :
Hi Nicolas --
Thanks for finding this workaround!
Given some later point in time, how bad could it bite us back considering extA vs extB? Would one notice this very quickly? What does "careful recompilation of CompiledMethod at image side" mean here? What can go wrong?
Best, Marcel
Am 09.03.2022 13:05:56 schrieb commits@source.squeak.org [mailto:commits@source.squeak.org] <commits@source.squeak.org [mailto:commits@source.squeak.org]>:
Nicolas Cellier uploaded a new version of VMMaker to project VM Maker Inbox: http://source.squeak.org/VMMakerInbox/VMMaker.oscog-nice.3174.mcz [http://source.squeak.org/VMMakerInbox/VMMaker.oscog-nice.3174.mcz]
==================== Summary ====================
Name: VMMaker.oscog-nice.3174 Author: nice Time: 9 March 2022, 1:04:51.911686 pm UUID: af4f8704-18fa-564c-ac59-27c6f6364c59 Ancestors: VMMaker.oscog-eem.3173
Workaround : literal characters should have unsigned values
Currently in SistaV1, characters in range 16r100 16rFFFF are encoded as push literal character (bytecode 233) with signed extended B.
This lead to negative character values in the range 16r8000 to 16rFFFF.
The workaround consist in bit-oring extB with 16rFF so as to re-interpret it as unsigned char. BEWARE: this assumes that a single extB will be used rather than multiple consecutive extB. It works by now, because it's currently the case that image side only use 1 extB for encoding literal character - see EncoderForSistaV1>>#genPushCharacter: - above 16rFFFF, a literal index is consumed.
An alternative would be to use extA rather than extB. But it would require a careful recompilation of CompiledMethod at image side.
Similar fix is required at image side in InstructionStream>>#interpretNext2ByteSistaV1Instruction:for:extA:extB:startPC:
=============== Diff against VMMaker.oscog-eem.3173 ===============
Item was changed: ----- Method: SimpleStackBasedCogit>>genExtPushCharacterBytecode (in category 'bytecode generators') ----- genExtPushCharacterBytecode "SistaV1: 233 11101001 iiiiiiii Push Character #iiiiiiii (+ Extend B * 256)" | value | + value := byte1 + ((extB bitAnd: 16rFF) << 8). - value := byte1 + (extB << 8). extB := 0. numExtB := 0. ^self genPushLiteral: (objectMemory characterObjectOf: value)!
Item was changed: ----- Method: StackInterpreter>>extPushCharacterBytecode (in category 'stack bytecodes') ----- extPushCharacterBytecode "SistaV1: * 233 11101001 iiiiiiii Push Character #iiiiiiii (+ Extend B * 256)" | value | + value := self fetchByte + ((extB bitAnd: 16rFF) << 8). - value := self fetchByte + (extB << 8). self fetchNextBytecode. self internalPush: (objectMemory characterObjectOf: value). numExtB := extB := 0!
Le jeu. 10 mars 2022 à 07:53, Marcel Taeumel marcel.taeumel@hpi.de a écrit :
Hi Nicolas --
Thanks for this explanation.
However, all this might as well be a false problem. How many literal characters > 16rFF do we have in trunk ? Since we had poor support for extended (not to say unicode) fonts until recently, probably not that many ! A $? in source code is not helpful !
Yes, the source code in Trunk is fine. I stumbled upon this issue while debugging things around Character, #leadingChar, glyph browsing, and TextConverter. I got used to putting a $ in front of some visible thing in a text field to then send #charCode, #asInteger, or #leadingChar to it. Being able to do-it/inspect-it/print-it in any text field is very convenient. I could have done this via '...' first as well. An exception would have been nice, though.
For the time being, it makes sense to just signal an error when the user tries to create literal characters between 16r8000 and 16rFFFF.
If you happen to know the exact place in the image, would you mind adding this check in Trunk to EncoderForSistaV1? :-)
Best, Marcel
OK, https://source.squeak.org/trunk/Compiler-nice.471.diff https://source.squeak.org/trunk/Kernel-nice.1447.diff
vm-dev@lists.squeakfoundation.org