Hi Ben,
On Mon, Nov 12, 2018 at 8:51 AM Ben Coman <btc@openinworld.com> wrote:


On Mon, 12 Nov 2018 at 21:37, Sven Van Caekenberghe <sven@stfx.eu> wrote:
Hi,

I run Pharo 7 64-bit on a macOS laptop, where the images are kept running across sleep/wake cycles.

Just to confirm, you mean your laptop sleep/wake cycle.
On my Window 10 laptop I do the same quite often with no problems.
 
For many weeks, it often happens that an image crashes before/after such a sleep/wakeup (not all the time, just regularly).

Here is a crash dump from today (fresh image/vm from WE, nothing special loaded).

Related to scheduling ? Event handling ?

In trying to understand the last few moments, is the recent primitives list strongly ordered in time?...
  Most recent primitives
  signal
  nowTick
  primSignal:atUTCMicroseconds:
  wait
  millisecondClockValue
  @
  actualScreenSize

Yes, it's a log of the most recent 256 named primitives.  Note that they're irrelevant in this case.  The crash is always rooted in vmIOProcessEvents. i.e. the VM is responding to some input event, and it calls pumpRunLoop to do (Objective-C) [NSRunLoop mainRunLoop] runMode:NSDefaultRunLoopMode beforeDate:[NSDate distantPast], which in Smalltalk would be written NSRunLoop mainRunLoop runMode: NSDefaultRunLoopMode beforeDate: NSDate distantPast.  And somewhere within this a display update occurs which crashes, presumably because we're using stale data that should have been invalidated oil sleep and refreshed on wake.
 


Starting at #actualScreenSize, instrumenting the code by adding...
     Transcript crShow: thisContext sender printString.
indicates the following call chain...
  DisplayScreen-class>>actualScreenSize
  MorphicUIManager>>checkForNewDisplaySize
  DisplayScreen class>>checkForNewScreenSize
  WorldState>>doOneCycleNowFor:
  WorldState>>doOneCycleFor:
  WorldMorph>>doOneCycle
  WorldMorph class>>doOneCycle
  MorphicUIManager>>spawnNewProcess

In detail from a static analysis of the code...
           > WorldState>>doOneCycleNowFor:  
              > DisplayScreen class>>checkForNewScreenSize
                  > MorphicUIManager>>checkForNewDisplaySize 
                      > DisplayScreen-class>>actualScreenSize          PRIMITIVE 
                  < MorphicUIManager>>checkForNewDisplaySize   
               < DisplayScreen class>>checkForNewScreenSize 
            < WorldState>>doOneCycleNowFor
         < WorldState>>doOneCycleFor:  
      < WorldMorph>>doOneCycle  
   < WorldMorph class>>doOneCycle 
< MorphicUIManager>>spawnNewProcess  
   > WorldMorph class>>doOneCycle  
      > WorldMorph>>doOneCycle
         > WorldState>>doOneCycleNowFor:  
            > WorldState>>interCyclePause:
               > Time>>millisecondClockValue       PRIMITIVE
            < WorldState>>interCyclePause:
               > Delay>>schedule
                  > DelaySemaphoreScheduler>>schedule:
                      > Semaphore>>wait                            PRIMITIVE "readyToSchedule variable"
                      > Semaphore>>signal                         PRIMITIVE "timingSemaphore variable"   NOT RECORDED
                                                                                  maybe context change causes this to be recorded later in "Most recent primitives"
                             > DelayMicrosecondTicker>>waitForUserSignalled:orExpired:      waking up from Semaphore>>wait  
                                    the primSignal:atUTCMicroseconds:  PRIMITIVE  immediately before the #wait seems to have been logged now in "Most recent primitives"
                             > DelaySemaphoreScheduler>>scheduleAtTimingPriority
                                > Semaphore>>signal               PRIMITIVE  "readyToSchedule variable"  seems to have not been recorded in "Most recent primitives"
                                > DelayBasicScheduler>>scheduleAtTimingPriority
                             < DelaySemaphoreScheduler>>scheduleAtTimingPriority
                          < DelayBasicScheduler>>runBackendLoopAtTimingPriority
                             > DelayMicrosecondTicker>>nowTick         PRIMITIVE 
                             > Delay>>timingPrioritySignalExpired         
                                > Semaphore>>signal                PRIMITIVE "delaySemaphore variable"      CRASH

 Smalltalk stack dump:
    0x7ffee21d7138 M Delay>timingPrioritySignalExpired 0x113e3b138: a(n) Delay
    0x7ffee21d7170 M [] in DelaySemaphoreScheduler(DelayBasicScheduler)>runBackendLoopAtTimingPriority 0x11453b8c0: a(n) DelaySemaphoreScheduler
       0x1169f51c8 s BlockClosure>ensure:
       0x1169f5828 s DelaySemaphoreScheduler(DelayBasicScheduler)>runBackendLoopAtTimingPriority
       0x1169f5c38 s [] in DelaySemaphoreScheduler(DelayBasicScheduler)>startTimerEventLoopPriority:
       0x1169fae10 s [] in BlockClosure>newProcess

The only reason I can think that last semaphore call would be a problem is if variable delaySemaphore was not a Semaphore,
but the C stack backtrace doesn't look like it crashes in the Semaphore handling.
Its perhaps co-incidental with delay scheduling code is invoked fairly often, that its the current process when this crash occurs.


To hep isolate the failure point, I would considering copying #nowTIck to a multiple series like...
#nowTick1
#nowTick2
#nowTick3
etc so that all of the sender sites can be made distinct.
And maybe the same for #signal senders within the delay scheduling code.

One side-curiousity is I can't see where the primitive #@ is being called from.
Does it indicate some loop as its trying to determine 

To me the "vmIOProcessEvents" in the C backtrace with a fairly deep stack into the AppKit framework 
indicates its related to event handling.

cheers -ben


--
_,,,^..^,,,_
best, Eliot