Stephane Rollandin reported this:
I have had a report that the code for my Balloon3D-based game (at http://www.zogotounga.net/comp/guardians.htm) does not work at all on 64-bit images.
The report was about a Linux system, and I could confirm that the same failure also happens on Windows.
I have uploaded a ready-to-crash trunk image here, with instructions: http://www.zogotounga.net/swap/crash-64.zip
Note that no crash dump is produced.
Christian Kellerman reported this:
I don't have much time right now, but I have attached the trace, and here is the stack trace from the core dump I get when running the Guardians.image on a 64-bit OpenBSD machine:
```
(gdb) bt
#0  thrkill () at -:3
#1  0x00000c1db668d7de in _libc_abort () at /usr/src/lib/libc/stdlib/abort.c:51
#2  0x00000c1b4d876c72 in sigsegv (sig=-451888, info=Variable "info" is not available. ) at /usr/local/smalltalk/opensmalltalk-vm/platforms/unix/vm/sqUnixMain.c:1129
#3  <signal handler called>
#4  copyBitsLockedAndClipped () at /usr/local/smalltalk/opensmalltalk-vm/src/plugins/BitBltPlugin/BitBltPlugin.c:820
#5  0x00000c1b4d9486f9 in copyBitsFromtoat (startX=Variable "startX" is not available. ) at /usr/local/smalltalk/opensmalltalk-vm/src/plugins/BitBltPlugin/BitBltPlugin.c:1257
#6  0x00000c1df5b0ac9e in b3dMainLoop (state=Variable "state" is not available. ) at /usr/local/smalltalk/opensmalltalk-vm/platforms/Cross/plugins/Squeak3D/b3dMain.c:1146
#7  0x00000c1df5b020c8 in b3dStartRasterizer () at /usr/local/smalltalk/opensmalltalk-vm/src/plugins/Squeak3D/Squeak3D.c:1704
#8  0x00000c1b4d8d4838 in primitiveExternalCall () at /usr/local/smalltalk/opensmalltalk-vm/spur64src/vm/gcc3x-cointerp.c:76948
```
Here is the lldb debug report on OS X:
```
Process 95068 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x321089be150)
    frame #0: 0x00000001003c58fd Squeak`alphaSourceBlendBits32 at BitBltPlugin.c:820:5
   817 	sourceWord = long32At(srcIndex);
   818 	srcAlpha = ((usqInt) sourceWord) >> 24;
   819 	if (srcAlpha == 0xFF) {
-> 820 	long32Atput(dstIndex, sourceWord);
   821 	srcIndex += 4;
```
We have:
```
(lldb) p/x dstIndex
(sqInt) $9 = 0x00000321089be150
(lldb) p/x destBits
(sqInt) $14 = 0x00000001089be470
```
Wow, `dstIndex` is very far from `destBits`... because:
```
(lldb) p/x dy
(sqInt) $13 = 0x00000000ffffffff
(lldb) p/x dstY
(sqInt) $15 = 0x00000000ffffffff
```
Hmm, is that intended to be so large? Or just -1? Is it a missing sign extension? Is -1 a valid value?
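The debugger values above can be checked with a little arithmetic. The sketch below is only an illustration: the helper names are hypothetical, and the row pitch of 800 bytes is an assumption inferred from the numbers (it is consistent with a 200-pixel span at 4 bytes per pixel), not something read from the plugin.

```c
#include <stdint.h>

/* Zero-extension: what the callee sees when a 32-bit -1 arrives in a
   64-bit register whose upper half is zero. */
int64_t zeroExtend32(int32_t v) { return (int64_t)(uint32_t)v; }

/* Sign-extension: what a correctly typed call would have produced. */
int64_t signExtend32(int32_t v) { return (int64_t)v; }

/* Byte address of the start of row y, assuming destX == 0.
   pitchBytes is an assumed value (800) inferred from the lldb output. */
uint64_t rowAddress(uint64_t destBits, int64_t y, int64_t pitchBytes)
{
    /* Unsigned arithmetic wraps, so a negative y simply lands below destBits. */
    return destBits + (uint64_t)(y * pitchBytes);
}
```

With `destBits = 0x1089be470`, `rowAddress(destBits, zeroExtend32(-1), 800)` gives `0x321089be150`, exactly the faulting address, while `rowAddress(destBits, signExtend32(-1), 800)` gives `0x1089be150`, which is 800 bytes before the start of the buffer.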
It comes from `destY`, which is set either by BitBlt inst var access:
`destY = fetchIntOrFloatofObjectifNil(BBDestYIndex, bitBltOop, 0);`
or through the Balloon support:
```
EXPORT(sqInt) copyBitsFromtoat(sqInt startX, sqInt stopX, sqInt yValue)
{
	destX = startX;
	destY = yValue;
```
We are in the latter case:
```
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x321089be150)
  * frame #0: 0x00000001003c58fd Squeak`alphaSourceBlendBits32 at BitBltPlugin.c:820:5
    frame #1: 0x00000001003bf07d Squeak`copyBitsLockedAndClipped at BitBltPlugin.c:1438:3
    frame #2: 0x00000001003b97e6 Squeak`copyBits at BitBltPlugin.c:1257:2
    frame #3: 0x00000001003b9893 Squeak`copyBitsFromtoat(startX=0, stopX=199, yValue=4294967295) at BitBltPlugin.c:1357:2
    frame #4: 0x000000011036eeb4 Squeak3D`b3dDrawSpanBuffer(aet=0x0000000108a313c8, yValue=-1) at b3dMain.c:1146:3
    frame #5: 0x0000000110373d5b Squeak3D`b3dMainLoop(state=0x00000001103aa990, stopReason=0) at b3dMain.c:1448:4
    frame #6: 0x000000011030e1e1 Squeak3D`b3dStartRasterizer at Squeak3D.c:1704:12
    frame #7: 0x00000001000b9cc4 Squeak`primitiveExternalCall at gcc3x-cointerp.c:76948:3
```
Up the stack, we have:
```
/* INLINE b3dDrawSpanBuffer(aet, yValue) */
void b3dDrawSpanBuffer(B3DActiveEdgeTable *aet, int yValue)
{
	int leftX, rightX;
	if(aet->size && currentState->spanDrawer) {
		leftX = aet->data[0]->xValue >> B3D_FixedToIntShift;
		rightX = aet->data[aet->size-1]->xValue >> B3D_FixedToIntShift;
		if(leftX < 0) leftX = 0;
		if(rightX > currentState->spanSize) rightX = currentState->spanSize;
		currentState->spanDrawer(leftX, rightX, yValue);
	}
}
```
In b3d.h, we have:
```
/* Function to call on drawing the output buffer */
b3dDrawBufferFunction spanDrawer;
```
And what is a `b3dDrawBufferFunction`?
`typedef int (*b3dDrawBufferFunction) (int leftX, int rightX, int yValue);`
OK, so we get a type mismatch on x64... We pretend that the function expects a 32-bit `int`, when it actually expects a 64-bit `sqInt`... Bad, because this type mismatch prevents the sign extension: the caller passes `yValue` as a 32-bit `int` in a 64-bit register, the high bits remain at 0, and the callee reads all 64 bits, so -1 arrives as 4294967295.
So it's a bit more subtle than a pointer stored into an int. Frankly, these type mismatches are a plague. Levente corrected a bunch of them recently, but when there is such an indirection through a function pointer, they become far from obvious... Especially when we force things with a cast:
`../src/plugins//Squeak3D/Squeak3D.c: state.spanDrawer = (b3dDrawBufferFunction) copyBitsFn;`
```
../src/plugins/Squeak3D/Squeak3D.c:static sqInt copyBitsFn;
../src/plugins/Squeak3D/Squeak3D.c: copyBitsFn = ioLoadFunctionFrom("copyBitsFromtoat", bbPluginName);
```
Bah, with all those casts, we leave the compiler ZERO chance to tell us about the awful type mismatch!
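For illustration, here is a minimal sketch of what a consistently typed function pointer looks like. It is not the actual patch: `spanDrawerFn`, `stubCopyBits` and `drawSpan` are hypothetical names, and `sqInt` is stood in for by `intptr_t`, which is an assumption about a 64-bit build.

```c
#include <stdint.h>

/* Stand-in for the VM's sqInt on a 64-bit build (assumption for this sketch). */
typedef intptr_t sqInt;

/* Hypothetical typedef whose parameter types match the callee exactly. */
typedef sqInt (*spanDrawerFn)(sqInt leftX, sqInt rightX, sqInt yValue);

/* Stub standing in for copyBitsFromtoat: it only reports the y it received. */
sqInt stubCopyBits(sqInt startX, sqInt stopX, sqInt yValue)
{
    (void)startX;
    (void)stopX;
    return yValue;
}

/* The caller still computes with int, as b3dDrawSpanBuffer does. Because the
   pointer type carries the real prototype, the compiler converts each int
   argument to sqInt, which sign-extends negative values. */
sqInt drawSpan(spanDrawerFn drawer, int leftX, int rightX, int yValue)
{
    return drawer(leftX, rightX, yValue);
}
```

Here `drawSpan(stubCopyBits, 0, 199, -1)` returns -1 rather than 4294967295. Assigning a function with a different signature to a `spanDrawerFn` variable without a cast would also make the compiler complain about incompatible pointer types, which is exactly the diagnostic the casts suppress.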
I broke it in https://github.com/OpenSmalltalk/opensmalltalk-vm/pull/448. The function signature was already BADLY inconsistent! But luckily, the erroneous 64-bit arguments got copied into the 32-bit `dstY` before my fix... I sort of broke the spell and the magic vanished ;) Brittle code = hazardous life.
What I'm very unsure of now is whether writing at a negative offset is a good idea (buffer underflow?), or whether this uncovers yet another bug at a higher level... If I just correct the function signature, the game seems to work though...
On Fri, Jan 10, 2020 at 1:57 PM Nicolas Cellier notifications@github.com wrote:
> What I'm very unsure of now, is if writing at negative offset is a good idea (buffer underflow?), or if this uncovers yet another bug at upper level... If I just correct function signature, the game seems to work though...
That's the right thing to do, no? Just fix the signature.
_,,,^..^,,,_ best, Eliot
Question: where is the BitBlt plugin's `destBits` set? Next question: is writing into `destBits` plus a negative offset a good idea? This rings an alarm bell in my mind, unless `destBits` is itself offset... So "appears to work" ≠ "is the right thing"?
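One way to make the negative-offset question moot is to reject out-of-range rows before any address is computed. The following is only a sketch under stated assumptions: `fillSpan` and its parameters are hypothetical, it works on a plain 32-bit pixel array, and it is not the change that went into BitBltPlugin.

```c
#include <stdint.h>

/* Fill pixels [leftX, rightX] of row y with color.
   Returns 0 on success, -1 when the row lies outside the buffer
   (hypothetical helper, not the plugin's API). */
int fillSpan(uint32_t *bits, int64_t width, int64_t height,
             int64_t leftX, int64_t rightX, int64_t y, uint32_t color)
{
    /* Reject rows that would address memory before or after the buffer. */
    if (y < 0 || y >= height) return -1;

    /* Clip the span horizontally, as b3dDrawSpanBuffer does for leftX. */
    if (leftX < 0) leftX = 0;
    if (rightX >= width) rightX = width - 1;

    for (int64_t x = leftX; x <= rightX; x++)
        bits[y * width + x] = color;
    return 0;
}
```

With such a guard, a span at `y == -1` is dropped instead of being written 800 bytes before `destBits`; whether dropping it is the right rendering behavior is a separate question for the rasterizer.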
We also have a few instances of undefined behavior (UB) associated with those negative values. I compiled the VM with `-fsanitize=undefined` added to CFLAGS, LDFLAGS, BFLAGS and DYFLAGS.
Then I got:
```
../../platforms/Cross/plugins/Squeak3D/b3dMain.c:1249:29: runtime error: left shift of negative value -8
../../platforms/Cross/plugins/Squeak3D/b3dMain.c:1251:25: runtime error: left shift of negative value -8
../../platforms/Cross/plugins/Squeak3D/b3dDraw.c:316:33: runtime error: left shift of negative value -18279
../../platforms/Cross/plugins/Squeak3D/b3dDraw.c:317:33: runtime error: left shift of negative value -18279
../../platforms/Cross/plugins/Squeak3D/b3dDraw.c:318:33: runtime error: left shift of negative value -18279
../../platforms/Cross/plugins/Squeak3D/b3dMain.c:775:40: runtime error: -4.36761e+10 is outside the range of representable values of type 'int'
```
I do not think that this is the source of the crash, but it is yet another potential problem in the future if a C compiler optimizes aggressively...
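Both kinds of sanitizer report have well-defined alternatives in C. The sketch below shows one possible shape for them; the function names are hypothetical and this is not the code that was committed to the Squeak3D plugin.

```c
#include <limits.h>

/* Scale a signed value by 2^shift without left-shifting a negative number.
   Multiplication is defined for negative operands as long as the result
   fits in int; shift must be smaller than the bit width of int minus one. */
int scaleByPow2(int value, unsigned shift)
{
    return value * (1 << shift);
}

/* Convert a double to int, saturating instead of invoking undefined
   behavior when the value is out of range. NaN maps to 0 (a choice made
   for this sketch). */
int doubleToIntSaturated(double d)
{
    if (d != d) return 0;                      /* NaN */
    if (d >= (double)INT_MAX) return INT_MAX;
    if (d <= (double)INT_MIN) return INT_MIN;
    return (int)d;                             /* in range: truncates toward zero */
}
```

For example, `scaleByPow2(-8, 12)` yields -32768 (the shift count 12 is an arbitrary illustration, not the value of `B3D_FixedToIntShift`), and `doubleToIntSaturated(-4.36761e+10)` yields `INT_MIN` instead of an unspecified result.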
Closed #468.
I have removed all of this specific UB. I have also forbidden writing at a negative offset (buffer underrun?). I think that we can close this one (better than leaving it open forever if we are unsure).
I have not had a single crash since those last fixes of yours (using VM from Feb 9).
Stef
Great, shall we use that VM for the upcoming Squeak release?
Fabio
On Mon, 17 Feb 2020 at 8:54 pm, Stéphane Rollandin lecteur@zogotounga.net wrote:
> I have not had a single crash since those last fixes of yours (using VM from Feb 9).
Hmm, not yet, I'm currently chasing a bug on 32-bit Spur (related to highBit, it seems).
On Mon, Feb 17, 2020 at 21:07, Fabio Niephaus lists@fniephaus.com wrote:
> Great, shall we use that VM for the upcoming Squeak release?
> Great, shall we use that VM for the upcoming Squeak release?
Hmm, sorry, I just realized the topic here is about the 64-bit VM.
So far I have not been able to load my game on a 64-bit VM (tested with the one from Feb 9). It crashes right away.
Stef
The primitiveHighBit problem should be fixed in VMMaker.oscog-nice.2712. It remains to generate all the Spur 32 variants...
On Mon, Feb 17, 2020 at 21:35, Stéphane Rollandin lecteur@zogotounga.net wrote:
> Hmm, sorry, I just realized the topic here is about the 64-bit VM.
> So far I have not been able to load my game on a 64-bit VM (tested with the one from Feb 9). It crashes right away.
Hem, I did not fix anything, just the simulation of BSR (bit scan reverse), because Bochs seems to initialize an ooold CPU? Most CPUs have CLZ (count leading zeros), and Spur32 still crashes when primitiveHighBit (575) is activated on the image side. We must fix it, I'm pretty sure that it used to work...
On Mon, Feb 17, 2020 at 21:58, Nicolas Cellier nicolas.cellier.aka.nice@gmail.com wrote:
> primitiveHighBit problem should be fixed in VMMaker.oscog-nice.2712. It remains to generate all the Spur 32 variants...
Hi Nicolas,
On Feb 17, 2020, at 11:31 PM, Nicolas Cellier nicolas.cellier.aka.nice@gmail.com wrote:
> Hem, I did not fix anything, just the simulation of BSR (bit scan reverse) because Bochs seems to initialize an ooold cpu?
Alas the Bochs sources I’m using date from 2008. They made significant changes to the memory interface so that when I looked at upgrading about four or five years ago I took the lazy approach. I can try and upgrade, and I can also take a look at the gdb source and see if it would work. But I can’t make promises time wise.
> Most CPUs have CLZ (count leading zeros), and Spur32 still crashes when primitiveHighBit (575) is activated on the image side. We must fix it, I'm pretty sure that it used to work...
I will take a look at this asap.
vm-dev@lists.squeakfoundation.org