On Fri, Mar 10, 2017 at 09:45:07AM +0100, Nicolas Cellier wrote:
2017-03-10 3:47 GMT+01:00 David T. Lewis lewis@mail.msen.com:
On Thu, Mar 09, 2017 at 12:41:44PM -0800, Eliot Miranda wrote:
Hi David,
On Mon, Mar 6, 2017 at 5:50 PM, David T. Lewis lewis@mail.msen.com
wrote:
In the VM, the millisecond clock wraps within the 32 bit integer range:
#define MillisecondClockMask 0x1FFFFFFF
In the Cuis image, Delay class>>handleTimerEvent does this:
nextTick := nextTick min: SmallInteger maxVal.
On a 64-bit Spur image, SmallInteger maxVal is 16rFFFFFFFFFFFFFFF, but
on
a 32-bit image it is 16r3FFFFFFF.
Could that be it?
I don't really know how to test in Squeak. As you say, Squeak is now using the microsecond clock in #handleTimerEvent. I do not see anything in primitiveSignalAtMilliseconds that would behave any differently on a 64 bit versus 32 bit image or VM, but I do not know how to test to be sure.
I bet that the following code from primitiveSignalAtMilliseconds ends up wrapping when given Delay primSignal: s atMilliseconds: Time primMillisecondClock - 10.
deltaMsecs := msecs - (self ioMSecs bitAnd: MillisecondClockMask). deltaMsecs < 0 ifTrue: [deltaMsecs := deltaMsecs + MillisecondClockMask + 1]. nextWakeupUsecs := self ioUTCMicroseconds + (deltaMsecs * 1000).
and I bet you'll find that the VM will wake up in about 6 days and 5
hours
;-)
No. The failure is specific to the 64 bit VM. Source code for the primitive is the same in either case.
I suppose we could fix this, but I'm *much* happier to simply not use primitiveSignalAtMilliseconds and stay with the simpler and
wrapping-immune
primitiveSignalAtUTCMicroseconds
Fair enough, but given that there is a demonstrated bug that affects only the 64-bit VM, and given that it expresses itself intermittently and in ways that affect only someone who is attempting to migrate their existing V3 image to Spur, then I would say that it makes very good sense to take the time to fix it if we are able to do so. After all, there may be other people who will want to migrate V3 images to Spur, and there is no point in making the process needlessly difficult.
I do not have the solution, but maybe someone else can help. So I am asking for help here. Can someone with a working 64-bit build environment please check and see if what I said in an earlier email might make a difference:
I see that ioMSecs() is declared as signed long (32 bits), but it is used in expression with a 64 bit usqInt. So maybe it needs a cast, or maybe the variables like msecs and deltaMsecs in primitiveSignalAtMilliseconds should be declared as 32 bit long and unsigned long to match the actual usage.
Thanks,
Dave
Hi David I just reviewed the generated code for primitiveSignalAtMilliseconds andhere is what i found:
The primitive will simply fail if fed with a negative integer, thanks to:
msecs = positive32BitValueOf(msecsObj);
msecs is allways positive, since declared as usqInt msecs;
So far, so good.
Then deltaMsecs = msecs - ((ioMSecs()) & MillisecondClockMask); if (deltaMsecs < 0) { deltaMsecs = (deltaMsecs + MillisecondClockMask) + 1; } GIV(nextWakeupUsecs) = (ioUTCMicroseconds()) + (deltaMsecs * 1000);
Due to the 29 bits Mask, the bitAnd: (&) operation will result into a positive int. 0 <= ((ioMSecs()) & MillisecondClockMask) <= 0x1FFFFFFF
However, when we do the Roll-over thing: deltaMsecs = (deltaMsecs + MillisecondClockMask) + 1; Then we create a giant delay (the 6 days mentionned by Eliot) which is HIGHLY QUESTIONNABLE whatever 32 or 64 bits!
The difference between 32 and 64 bits is in the next line: GIV(nextWakeupUsecs) = (ioUTCMicroseconds()) + (deltaMsecs * 1000); Because sqInt deltaMsecs;
is 32 bits for a 32bits spur, 64bits for 64bits spur, then (deltaMsecs * 1000) will do two different things:
- in 32 bits it will overflow leading to a negative delay and a
nextWakeupUsecs in the past (but that's invoking Undefined Behavior)
- in 64 bits it won't overflow leading to he 6 days delay
Does that explain?
if (deltaMsecs < 0) I would either fail the primitive or set the deltaMsecs to zero. What do you think?
Hi Nicolas,
Thanks very much for looking into this and for the explanation. I am away right now so I cannot look at it now, but it does seem likely to be a sign extension issue at least in part.
If you have a chance to try recompiling with changes to the generated code, Juan's test condition is just this:
s := Semaphore new. Delay primSignal: s atMilliseconds: Time primMillisecondClock - 10. s wait.
This locks the image on 64-bits, and not on 32-bits.
Thanks a lot! Dave