Hi Ben, On Mon, Nov 12, 2018 at 8:51 AM Ben Coman btc@openinworld.com wrote:
On Mon, 12 Nov 2018 at 21:37, Sven Van Caekenberghe sven@stfx.eu wrote:
Hi,
I run Pharo 7 64-bit on a macOS laptop, where the images are kept running across sleep/wake cycles.
Just to confirm, you mean your laptop sleep/wake cycle. On my Window 10 laptop I do the same quite often with no problems.
For many weeks, it often happens that an image crashes before/after such a sleep/wakeup (not all the time, just regularly).
Here is a crash dump from today (fresh image/vm from WE, nothing special loaded).
Related to scheduling ? Event handling ?
In trying to understand the last few moments, is the recent primitives list strongly ordered in time?... Most recent primitives signal nowTick primSignal:atUTCMicroseconds: wait millisecondClockValue @ actualScreenSize
Yes, it's a log of the most recent 256 named primitives. Note that they're irrelevant in this case. The crash is always rooted in vmIOProcessEvents. i.e. the VM is responding to some input event, and it calls pumpRunLoop to do (Objective-C) [NSRunLoop mainRunLoop] runMode:NSDefaultRunLoopMode beforeDate:[NSDate distantPast], which in Smalltalk would be written NSRunLoop mainRunLoop runMode: NSDefaultRunLoopMode beforeDate: NSDate distantPast. And somewhere within this a display update occurs which crashes, presumably because we're using stale data that should have been invalidated oil sleep and refreshed on wake.
Starting at #actualScreenSize, instrumenting the code by adding... Transcript crShow: thisContext sender printString. indicates the following call chain... DisplayScreen-class>>actualScreenSize MorphicUIManager>>checkForNewDisplaySize DisplayScreen class>>checkForNewScreenSize WorldState>>doOneCycleNowFor: WorldState>>doOneCycleFor: WorldMorph>>doOneCycle WorldMorph class>>doOneCycle MorphicUIManager>>spawnNewProcess
In detail from a static analysis of the code... > WorldState>>doOneCycleNowFor: > DisplayScreen class>>checkForNewScreenSize > MorphicUIManager>>checkForNewDisplaySize > DisplayScreen-class>>actualScreenSize PRIMITIVE < MorphicUIManager>>checkForNewDisplaySize < DisplayScreen class>>checkForNewScreenSize < WorldState>>doOneCycleNowFor < WorldState>>doOneCycleFor: < WorldMorph>>doOneCycle < WorldMorph class>>doOneCycle < MorphicUIManager>>spawnNewProcess
WorldMorph class>>doOneCycle
> WorldMorph>>doOneCycle > WorldState>>doOneCycleNowFor: > WorldState>>interCyclePause: > Time>>millisecondClockValue PRIMITIVE < WorldState>>interCyclePause: > Delay>>schedule > DelaySemaphoreScheduler>>schedule: > Semaphore>>wait
PRIMITIVE "readyToSchedule variable" > Semaphore>>signal PRIMITIVE "timingSemaphore variable" NOT RECORDED
maybe context change causes this to be recorded later in "Most
recent primitives" > DelayMicrosecondTicker>>waitForUserSignalled:orExpired: waking up from Semaphore>>wait the primSignal:atUTCMicroseconds: PRIMITIVE immediately before the #wait seems to have been logged now in "Most recent primitives" > DelaySemaphoreScheduler>>scheduleAtTimingPriority > Semaphore>>signal PRIMITIVE "readyToSchedule variable" seems to have not been recorded in "Most recent primitives" > DelayBasicScheduler>>scheduleAtTimingPriority < DelaySemaphoreScheduler>>scheduleAtTimingPriority < DelayBasicScheduler>>runBackendLoopAtTimingPriority > DelayMicrosecondTicker>>nowTick PRIMITIVE > Delay>>timingPrioritySignalExpired > Semaphore>>signal PRIMITIVE "delaySemaphore variable" CRASH
Smalltalk stack dump: 0x7ffee21d7138 M Delay>timingPrioritySignalExpired 0x113e3b138: a(n) Delay 0x7ffee21d7170 M [] in DelaySemaphoreScheduler(DelayBasicScheduler)>runBackendLoopAtTimingPriority 0x11453b8c0: a(n) DelaySemaphoreScheduler 0x1169f51c8 s BlockClosure>ensure: 0x1169f5828 s DelaySemaphoreScheduler(DelayBasicScheduler)>runBackendLoopAtTimingPriority 0x1169f5c38 s [] in DelaySemaphoreScheduler(DelayBasicScheduler)>startTimerEventLoopPriority: 0x1169fae10 s [] in BlockClosure>newProcess
The only reason I can think that last semaphore call would be a problem is if variable delaySemaphore was not a Semaphore, but the C stack backtrace doesn't look like it crashes in the Semaphore handling. Its perhaps co-incidental with delay scheduling code is invoked fairly often, that its the current process when this crash occurs.
To hep isolate the failure point, I would considering copying #nowTIck to a multiple series like... #nowTick1 #nowTick2 #nowTick3 etc so that all of the sender sites can be made distinct. And maybe the same for #signal senders within the delay scheduling code.
One side-curiousity is I can't see where the primitive #@ is being called from. Does it indicate some loop as its trying to determine
To me the "vmIOProcessEvents" in the C backtrace with a fairly deep stack into the AppKit framework indicates its related to event handling.
cheers -ben
vm-dev@lists.squeakfoundation.org