Re: [Vm-dev] We need help from VM experts. Re: Freeze after Morph& nbsp; & nbsp; & nbsp; & nbsp; Activity

13 Mar 2017


      Thanks David!
On Sun, Mar 12, 2017 at 3:52 PM, David T. Lewis lewis@mail.msen.com wrote:
...
On Fri, Mar 10, 2017 at 11:49:43PM -0500, David T. Lewis wrote:
...
On Fri, Mar 10, 2017 at 10:19:42AM +0100, Nicolas Cellier wrote:
...
2017-03-10 9:45 GMT+01:00 Nicolas Cellier <
nicolas.cellier.aka.nice@gmail.com>:
...
2017-03-10 3:47 GMT+01:00 David T. Lewis lewis@mail.msen.com:
...
On Thu, Mar 09, 2017 at 12:41:44PM -0800, Eliot Miranda wrote:
...
Hi David,
On Mon, Mar 6, 2017 at 5:50 PM, David T. Lewis <
lewis@mail.msen.com>
...
...
...
...
wrote:
...
>
> In the VM, the millisecond clock wraps within the 32 bit integer
range:
...
>
>   #define MillisecondClockMask 0x1FFFFFFF
>
> In the Cuis image, Delay class>>handleTimerEvent does this:
>
>   nextTick := nextTick min: SmallInteger maxVal.
>
> On a 64-bit Spur image, SmallInteger maxVal is
16rFFFFFFFFFFFFFFF,
...
...
...
...
but on
...
> a 32-bit image it is 16r3FFFFFFF.
>
> Could that be it?
>
> I don't really know how to test in Squeak. As you say, Squeak
is now
...
...
...
...
...
> using the microsecond clock in #handleTimerEvent. I do not see
anything
...
> in primitiveSignalAtMilliseconds that would behave any
differently on
...
...
...
...
...
> a 64 bit versus 32 bit image or VM, but I do not know how to
test to
...
...
...
...
...
> be sure.
>
I bet that the following code from primitiveSignalAtMilliseconds
ends up
...
...
...
...
...
wrapping when given
    Delay primSignal: s atMilliseconds: Time primMillisecondClock


...
...
...
...
...
deltaMsecs := msecs - (self ioMSecs bitAnd: MillisecondClockMask).
deltaMsecs < 0 ifTrue:
[deltaMsecs := deltaMsecs + MillisecondClockMask + 1].
nextWakeupUsecs := self ioUTCMicroseconds + (deltaMsecs * 1000).
and I bet you'll find that the VM will wake up in about 6 days
and 5
...
...
...
...
hours
...
;-)
No. The failure is specific to the 64 bit VM. Source code for the
primitive
is the same in either case.
...
I suppose we could fix this, but I'm *much* happier to simply not
use
...
...
...
...
...
primitiveSignalAtMilliseconds and stay with the simpler and
wrapping-immune
...
primitiveSignalAtUTCMicroseconds
Fair enough, but given that there is a demonstrated bug that
affects only
...
...
...
...
the 64-bit VM, and given that it expresses itself intermittently
and in
...
...
...
...
ways
that affect only someone who is attempting to migrate their
existing V3
...
...
...
...
image
to Spur, then I would say that it makes very good sense to take the
time
...
...
...
...
to fix it if we are able to do so. After all, there may be other
people
...
...
...
...
who
will want to migrate V3 images to Spur, and there is no point in
making
...
...
...
...
the
process needlessly difficult.
I do not have the solution, but maybe someone else can help. So I am
asking
for help here. Can someone with a working 64-bit build environment
please
...
...
...
...
check and see if what I said in an earlier email might make a
difference:
...
...
...
...
I see that ioMSecs() is declared as signed long (32 bits), but it
is
...
...
...
...
used
  in expression with a 64 bit usqInt. So maybe it needs a cast, or
maybe
...
...
...
...
the variables like msecs and deltaMsecs in
primitiveSignalAtMilliseconds
...
...
...
...
should be declared as 32 bit long and unsigned long to match the
actual
...
...
...
...
usage.
Thanks,
Dave
Hi David
I just reviewed the generated code for primitiveSignalAtMilliseconds
andhere is what i found:
The primitive will simply fail if fed with a negative integer,
thanks to:
...
...
...
    msecs = positive32BitValueOf(msecsObj);


msecs is allways positive, since declared as
    usqInt msecs;
So far, so good.
Then
                        deltaMsecs = msecs - ((ioMSecs()) &
MillisecondClockMask);
                        if (deltaMsecs < 0) {
                                deltaMsecs = (deltaMsecs +
MillisecondClockMask) + 1;
                        }
                        GIV(nextWakeupUsecs) = (ioUTCMicroseconds())


...
...
...
(deltaMsecs * 1000);
Due to the 29 bits Mask, the bitAnd: (&) operation will result into a
positive int.
0 <= ((ioMSecs()) & MillisecondClockMask) <= 0x1FFFFFFF
However, when we do the Roll-over thing:
                                deltaMsecs = (deltaMsecs +
MillisecondClockMask) + 1;
Then we create a giant delay (the 6 days mentionned by Eliot) which
is
...
...
...
HIGHLY QUESTIONNABLE whatever 32 or 64 bits!
The difference between 32 and 64 bits is in the next line:
                        GIV(nextWakeupUsecs) = (ioUTCMicroseconds())


...
...
...
(deltaMsecs * 1000);
Because
    sqInt deltaMsecs;
is 32 bits for a 32bits spur, 64bits for 64bits spur, then
(deltaMsecs *
...
...
...

will do two different things:


in 32 bits it will overflow leading to a negative delay and a

nextWakeupUsecs in the past (but that's invoking Undefined Behavior)

in 64 bits it won't overflow leading to he 6 days delay

Does that explain?
if (deltaMsecs < 0) I would either fail the primitive or set the
deltaMsecs to zero.
What do you think?
Nicolas
Anyway, the RollOver protection sounds strange: if clock was high, and
if
...
...
we don't apply the MillisecondClockMask to (clock+delay),
then (clock+delay) won't overflow (thanks to largeIntegers).
Eventually,
...
...
for delay > 2^32-2^29 msecs, the primitive will fail thru
(positive32BitValueOf).
That's something like 43 days delay max, more than enough...
Either the code should rather be something like this with roll-over
protection
                        deltaMsecs = (msecs - ioMSecs()) &
MillisecondClockMask;
or we should leave deltaMsecs as is and reject negative delays
Also note this in 32 bits:
Since *1000 consumes 10bits, then delays above 21 bits msecs will
overflow
...
...
and be signalled immediately.
Tha's about 2e6 msecs, 2e3 secs. Thus the maximum delay supported by
this
...
...
primitive is a bit less than 36 minutes (16r80000000/1000/1000/60).
So maybe we should also reject large positive "unreasonable" delays,
that
...
...
isanthing above a few minutes and document it.
I think I understand what you are saying now. Let me see if I can restate
the problem:
Before the change to microsecond clock, the interpreter VM did the
rollover check only in #checkForInterrupts, and the arithmetic did not
require a 64 bit microsecond range. Now we have rollover logic in
#primitiveSignalAtMilliseconds, which leads to two issues, one of which
masks the other on 32 bits.
Issue #1: If we pass to the primitive a millisecond count that is
slightly
...
in the past (possibly due to clock skew or other timing issues), then the
primitive will test for deltaMsecs less than zero, and will incorrectly
assume that a millisecond clock rollover has happened.
Issue #2: When adjusting for clock rollover on 32 bits, deltaMsecs is
declared as 32 bit sqInt, which results in overflow when converting to
microseconds. For 64 bit sqInt, there is no overflow. This error masks
the first problem of Issue #1, which makes the primiitive appear to work
correctly on 32 bits, even though the clock rollover logic is wrong.
I do not think that we can simply fail the primitive, or set
deltaMSeconds
...
to zero, in the case of deltaMSeconds being less than zero. We still need
to be able to schedule delays over a time period when the millisecond
clock rolls over, so that needs to work.
We do need to fix the arithmetic overflow condition, which probably just
requires declaring deltaMsecs as usqLong. This matches the declaration
for nextWakeupUsecs and should prevent arithmetic overflow. But fixing
this for issue #2 will unmask issue #1.
For issue #1, the solution probably requires a heuristic. If deltaMsecs
is a big negative value, then the clock has rolled over. If it is a small
negative value (maybe less than a second, just to pick a number), then
it is probably clock skew and the MillisecondClockMask + 1 adjustment
should not be applied.
Does this sound right?
I now have a working Cog/Spur build on my Ubuntu box, so I was able to
try a few things. It turns out that the arithmetic overflow that Nicolas
spotted seems to fully account for the problem that Juan originally
reported on the 64 but Cuis image. The overflow happened here:
nextWakeupUsecs := self ioUTCMicroseconds + (deltaMsecs * 1000).
Declaring deltaMsecs as a 64 bit unsigned to match the declaration of
nextWakeupUsecs resolved the image lockup problem that Juan identified.
I committed the fix in VMMaker.oscog-dtl.2147.
Dave
-- 
_,,,^..^,,,_
best, Eliot

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

Re: [Vm-dev] We need help from VM experts. Re: Freeze after Morph& nbsp; & nbsp; & nbsp; & nbsp; Activity

Re: [Vm-dev] We need help from VM experts. Re: Freeze after Morph&amp; nbsp; &amp; nbsp; &amp; nbsp; &amp; nbsp; Activity

Re: [Vm-dev] We need help from VM experts. Re: Freeze after Morph& nbsp; & nbsp; & nbsp; & nbsp; Activity