Hi Ben,
On Fri, Jul 25, 2014 at 7:56 AM, Ben Coman btc@openinworld.com wrote:
Over the last few days I have been looking deeper into the image locking when suspending a process. It is an interesting rabbit hole [1] that leads to pondering the Delay machinery, that leads to some VM questions.
When pressing the interrupt key it seems to always opens the debugger with the following call stack. Semaphore>>critical: 'self wait' BlockClosure>>ensure: 'self valueNoContextSwitch' Semaphore>>critical: 'ensure: [ caught ifTrue: [self signal]] Delay>>schedule 'AccessProtect critical: [' Delay>>wait 'self schedule' WorldState>>interCyclePause:
I notice... Delay class >> initialize TimingSemaphore := (Smalltalk specialObjectsArray at: 30). and... Delay class >> startTimerEventLoop TimingSemaphore := Semaphore new. which seems incongruous that TimingSemaphore is set in differently. So while I presume this critical stuff all works fine, just in an exotic way, my entropy-guarding-neuron would just like confirm this is so.
The TimingSemaphore gets installed in the specialObjectsArray via
primSignal: aSemaphore atMilliseconds: aSmallInteger "Signal the semaphore when the millisecond clock reaches the value of the second argument. Fail if the first argument is neither a Semaphore nor nil. Essential. See Object documentation whatIsAPrimitive." <primitive: 136> ^self primitiveFailed
and from that the VM sets the nextWakeupUsecs:
primitiveSignalAtMilliseconds "Cause the time semaphore, if one has been registered, to be signalled when the microsecond clock is greater than or equal to the given tick value. A tick value of zero turns off timer interrupts." | msecsObj msecs deltaMsecs sema | <var: #msecs type: #usqInt> msecsObj := self stackTop. sema := self stackValue: 1. msecs := self positive32BitValueOf: msecsObj. self successful ifTrue: [(objectMemory isSemaphoreOop: sema) ifTrue: [*objectMemory splObj: TheTimerSemaphore put: sema*. deltaMsecs := msecs - (self ioMSecs bitAnd: MillisecondClockMask). deltaMsecs < 0 ifTrue: [deltaMsecs := deltaMsecs + MillisecondClockMask + 1]. *nextWakeupUsecs := self ioUTCMicroseconds + (deltaMsecs * 1000)*. ^self pop: 2]. sema = objectMemory nilObject ifTrue: [objectMemory storePointer: TheTimerSemaphore ofObject: objectMemory specialObjectsOop withValue: objectMemory nilObject. *nextWakeupUsecs := 0*. ^self pop: 2]]. self primitiveFailFor: PrimErrBadArgument
In Delay class >> handleTimerEvent the comment says... "Handle a timer event.... -a timer signal (not explicitly specified)" ...is that event perhaps a 'tick' generated periodically by the VM via that item from specialObjectArray ? Or is there some other mechanism ?
Every time the VM checks for interrupts, which, in the Cog.Stack VMs is controlled by the heartbeat frequency, which defaults to 2 milliseconds, the VM checks if the current time has progressed to or beyond nextWakeupUsecs and signals the timer semaphore if so.
--------------
[1] http://www.urbandictionary.com/define.php?term=Rabbit+Hole
and the problem here is not in the VM. So climb out and breath some fresh air ;-)
cheers -ben
P.S. I've left the following for some initial context as I change the subject. btw Nicolai, I confirm that my proposed fixes only work on Windows, not Mavericks (and I haven't checked Linux).
Nicolai Hess wrote:
Hi ben, thank you for looking at this.
2014-07-22 20:17 GMT+02:00 btc@openinworld.com:
I thought this might be interesting to learn, so I've gave it a go. I had some success at the end, but I'll give a progressive report.
First I thought I'd try moving the update of StringMorph outside the worker-process using a Morph's #step method as follows...
Morph subclass: #BackgroundWorkDisplayMorph instanceVariableNames: 'interProcessString stringMorph' classVariableNames: '' category: 'BenPlay' "---------"
BackgroundWorkDisplayMorph>>initializeMorph self color: Color red. stringMorph := StringMorph new. self addMorphBack: stringMorph. self extent:(300@50). "---------"
BackgroundWorkDisplayMorph>>newWorkerProcess ^[ | work | work := 0. [ 20 milliSeconds asDelay wait. work := work + 1. interProcessString := work asString. ] repeat. ] newProcess. "---------"
BackgroundWorkDisplayMorph>>step stringMorph contents: interProcessString. "---------"
BackgroundWorkDisplayMorph>>stepTime ^50 "---------"
BackgroundWorkDisplayMorph>>initialize | workerProcess running | super initialize. self initializeMorph.
workerProcess := self newWorkerProcess. running := false. self on: #mouseUp send: #value to: [ (running := running not) ifTrue: [ workerProcess resume. self color: Color green. ] ifFalse: [ workerProcess suspend. self color: Color red. ] ] "---------"
But evaluating "BackgroundWorkDisplayMorph new openInWorld" found this exhibited the same problematic behavior you reported... Clicking on the morph worked a few times and then froze the UI until Cmd-. pressed a few times.
However I found the following never locked the GUI.
BackgroundWorkDisplayMorph>>initialize "BackgroundWorkDisplayMorph new openInWorld" | workerProcess running | super initialize. self initializeMorph.
workerProcess := self newWorkerProcess. running := false. [ [ (running := running not) ifTrue: [ workerProcess resume. self color: Color green ] ifFalse: [ workerProcess suspend. self color: Color red ]. 10 milliSeconds asDelay wait. ] repeat ] fork. "---------"
This locks the UI as well. Not every timet hough. I did this 5 times, every time in a freshly loaded image and it happens two times.
So the problem seemed to not be with #suspend/#resume or with the shared variable /interProcessString/. Indeed, since in the worker thread /interProcessString/ is atomically assigned a copy via #asString, and the String never updated, I think there is no need to surround use of it with a critical section.
The solution then was to move the "#resume/#suspend" away from the "#on: #mouseUp send: #value to:" as follows...
BackgroundWorkDisplayMorph>>initialize "BackgroundWorkDisplayMorph new openInWorld" | workerProcess running lastRunning | super initialize. self initializeMorph.
workerProcess := self newWorkerProcess. lastRunning := running := false. [ [ lastRunning = running ifFalse: [ running ifTrue: [ workerProcess resume ] ifFalse: [ workerProcess suspend ]. lastRunning := running. ]. 10 milliSeconds asDelay wait. ] repeat ] fork. self on: #mouseUp send: #value to: [ (running := running not) ifTrue: [ self color: Color green. ] ifFalse: [ self color: Color red. ] ] "---------"
And this too :(
And finally remove the busy loop.
BackgroundWorkDisplayMorph>>initialize "BackgroundWorkDisplayMorph new openInWorld" | workerProcess running lastRunning semaphore | super initialize. self initializeMorph.
workerProcess := self newWorkerProcess. lastRunning := running := false. semaphore := Semaphore new. [ [ semaphore wait. running ifTrue: [ workerProcess resume ] ifFalse: [ workerProcess suspend ]. ] repeat ] fork. self on: #mouseUp send: #value to: [ (running := running not) ifTrue: [ self color: Color green. ] ifFalse: [ self color: Color red. ]. semaphore signal. ] "---------"
And this locks the UI too. (Loaded the code 20 times, every time after a fresh image start up. Two times I got a locked ui after the first two clicks). And I don't understand this code :)
Now I can't say how close that is to how it "should" be done. Its the first time I used sempahores and just what I discovered hacking around. But hey! it works :)
cheers -ben
Nicolai Hess wrote:
I am still struggling with it.
Any ideas?
2014-07-09 11:19 GMT+02:00 Nicolai Hess nicolaihess@web.de:
2014-07-09 2:07 GMT+02:00 Eliot Miranda eliot.miranda@gmail.com:
Hi Nicolai,
On Tue, Jul 8, 2014 at 7:19 AM, Nicolai Hess nicolaihess@web.de wrote:
I want to create a process doing some work and call #changed on a Morph. I want to start/suspend/resume or stop this process. But sometimes, suspending the process locks the UI-Process, and I don't know why. Did I miss something or do I have to care when to call suspend?
Wrapping the "morph changed" call in UIManager default defer:[ morph changed]. Does not change anything.
Here is an example to reproduce it. Create the process, call resume, call supsend. It works, most of the time, but sometimes, calling suspend locks the ui.
p:=[[true] whileTrue:[ Transcript crShow: (DateAndTime now asString). 30 milliSeconds asDelay wait]] newProcess.
p resume.
p suspend.
If you simply suspend this process at random form a user-priority process you'll never be able to damage the Delay machinery you're using, but chances are you'll suspend the process inside the critical section that Transcript uses to make itself thread-safe, and that'll lock up the Transcript.
Thank you Eliot yes I guessed it locks up the critical section, but I hoped with would not happen if I the use UIManager defer call.
ThreadSafeTranscript>>nextPutAll: value accessSemaphore critical: [stream nextPutAll: value]. ^value
So instead you need to use a semaphore. e.g.
| p s wait | s := Semaphore new. p:=[[true] whileTrue:[wait ifTrue: [s wait]. Transcript crShow: (DateAndTime now asString). 30 milliSeconds asDelay wait]] newProcess. wait := true. 30 milliSeconds asDelay wait. wait := false. s signal
etc...
Is this a common pattern I can find in pharos classes. Or I need some help understanding this. The semaphore wait/signal is used instead of process resume/suspend?
What I want is a process doing repeatly some computation, calls or triggers an update on a morph, and I want to suspend and resume this process.
I would stop this discussion if someone tells me, "No your are doing it wrong, go this way ..", BUT what strikes me: in this example, that reproduces my problem more closely:
|p m s running| running:=false. m:=Morph new color:Color red. s:= StringMorph new. m addMorphBack:s. p:=[[true]whileTrue:[20 milliSeconds asDelay wait. s contents:(DateAndTime now asString). m changed]] newProcess. m on:#mouseUp send:#value to:[ running ifTrue:[p suspend. m color:Color red.] ifFalse:[p resume.m color:Color green.]. running := running not]. m extent:(300@50). m openInWorld
clicking on the morph will stop or resume the process, if it locks up I can still press alt+dot ->
- a Debugger opens but the UI is still not responsive. I can click
with the mouse on the debuggers close icon.
- nothing happens, as the UI is still blocked.
- pressing alt+Dot again, the mouse click on the close icon is
processed and the first debugger window closes
- maybe other debuggers open.
Repeating this steps, at some time the system is *fully* responsive again! And miraculously, it works after that without further blockages. What's happening here?
Nicolai
HTH
regards
Nicolai
-- best, Eliot
--
Aloha, Eliot
On Sat, Jul 26, 2014 at 4:36 PM, Ben Coman btc@openinworld.com wrote:
Eliot Miranda wrote:
Hi Ben,
On Fri, Jul 25, 2014 at 7:56 AM, Ben Coman btc@openinworld.com wrote:
Over the last few days I have been looking deeper into the image locking when suspending a process. It is an interesting rabbit hole [1] that leads to pondering the Delay machinery, that leads to some VM questions.
When pressing the interrupt key it seems to always opens the debugger with the following call stack. Semaphore>>critical: 'self wait' BlockClosure>>ensure: 'self valueNoContextSwitch' Semaphore>>critical: 'ensure: [ caught ifTrue: [self signal]] Delay>>schedule 'AccessProtect critical: [' Delay>>wait 'self schedule' WorldState>>interCyclePause:
I notice... Delay class >> initialize TimingSemaphore := (Smalltalk specialObjectsArray at: 30). and... Delay class >> startTimerEventLoop TimingSemaphore := Semaphore new. which seems incongruous that TimingSemaphore is set in differently. So while I presume this critical stuff all works fine, just in an exotic way, my entropy-guarding-neuron would just like confirm this is so.
The TimingSemaphore gets installed in the specialObjectsArray via
primSignal: aSemaphore atMilliseconds: aSmallInteger "Signal the semaphore when the millisecond clock reaches the value of the second argument. Fail if the first argument is neither a Semaphore nor nil. Essential. See Object documentation whatIsAPrimitive." <primitive: 136> ^self primitiveFailed
and from that the VM sets the nextWakeupUsecs:
primitiveSignalAtMilliseconds "Cause the time semaphore, if one has been registered, to be signalled when the microsecond clock is greater than or equal to the given tick value. A tick value of zero turns off timer interrupts." | msecsObj msecs deltaMsecs sema | <var: #msecs type: #usqInt> msecsObj := self stackTop. sema := self stackValue: 1. msecs := self positive32BitValueOf: msecsObj. self successful ifTrue: [(objectMemory isSemaphoreOop: sema) ifTrue: [*objectMemory splObj: TheTimerSemaphore put: sema*. deltaMsecs := msecs - (self ioMSecs bitAnd: MillisecondClockMask). deltaMsecs < 0 ifTrue: [deltaMsecs := deltaMsecs + MillisecondClockMask + 1]. *nextWakeupUsecs := self ioUTCMicroseconds + (deltaMsecs * 1000)*. ^self pop: 2]. sema = objectMemory nilObject ifTrue: [objectMemory storePointer: TheTimerSemaphore ofObject: objectMemory specialObjectsOop withValue: objectMemory nilObject. *nextWakeupUsecs := 0*. ^self pop: 2]]. self primitiveFailFor: PrimErrBadArgument
In Delay class >> handleTimerEvent the comment says... "Handle a timer event.... -a timer signal (not explicitly specified)" ...is that event perhaps a 'tick' generated periodically by the VM via that item from specialObjectArray ? Or is there some other mechanism ?
Every time the VM checks for interrupts, which, in the Cog.Stack VMs is controlled by the heartbeat frequency, which defaults to 2 milliseconds, the VM checks if the current time has progressed to or beyond nextWakeupUsecs and signals the timer semaphore if so.
Thanks Eliot. Just so I'm clear... the signals to the TimingSemaphore from the VM depend entirely on the Delays scheduled by the Image? The VM never signals the TimingSemaphore independently?
Right, yes and yes.
--------------
[1] http://www.urbandictionary.com/define.php?term=Rabbit+Hole
and the problem here is not in the VM.
Yep. Just looking to understand the interaction between VM and image.
So climb out and breath some fresh air ;-)
Soon :) but for the moment its a puzzle thats got hold of me, like a dog on a bone. This is a "hard" problem for me, and I like hard problems. It provides an opportunity to hold my attention to dig deeper and learn stuff that I otherwise might not. cheers -ben
Hi Ben,
I am on Windows too :( So, the fixes does not work (not always) on winddows too. But at least they make it less probable to occure, but it still happens. The most distracting thing is, after the first ui lock, pressing alt+dot, closing the debuggers, pressing alt+dot .... and trying to close the very first debugger, after that, it all works. The UI is responsive again and suspending the process does not block the ui anymore. It "looks like" supsending the process reactivates another process that blocks the UI. And as soon as I terminate this process (alt+dot, close debugger ...) all works. But I really don't know.
Nicolai
2014-07-25 16:56 GMT+02:00 Ben Coman btc@openinworld.com:
Over the last few days I have been looking deeper into the image locking when suspending a process. It is an interesting rabbit hole [1] that leads to pondering the Delay machinery, that leads to some VM questions.
When pressing the interrupt key it seems to always opens the debugger with the following call stack. Semaphore>>critical: 'self wait' BlockClosure>>ensure: 'self valueNoContextSwitch' Semaphore>>critical: 'ensure: [ caught ifTrue: [self signal]] Delay>>schedule 'AccessProtect critical: [' Delay>>wait 'self schedule' WorldState>>interCyclePause:
I notice... Delay class >> initialize TimingSemaphore := (Smalltalk specialObjectsArray at: 30). and... Delay class >> startTimerEventLoop TimingSemaphore := Semaphore new. which seems incongruous that TimingSemaphore is set in differently. So while I presume this critical stuff all works fine, just in an exotic way, my entropy-guarding-neuron would just like confirm this is so.
In Delay class >> handleTimerEvent the comment says... "Handle a timer event.... -a timer signal (not explicitly specified)" ...is that event perhaps a 'tick' generated periodically by the VM via that item from specialObjectArray ? Or is there some other mechanism ?
[1] http://www.urbandictionary.com/define.php?term=Rabbit+Hole cheers -ben
P.S. I've left the following for some initial context as I change the subject. btw Nicolai, I confirm that my proposed fixes only work on Windows, not Mavericks (and I haven't checked Linux).
Nicolai Hess wrote:
Hi ben, thank you for looking at this.
2014-07-22 20:17 GMT+02:00 btc@openinworld.com:
I thought this might be interesting to learn, so I've gave it a go. I had some success at the end, but I'll give a progressive report.
First I thought I'd try moving the update of StringMorph outside the worker-process using a Morph's #step method as follows...
Morph subclass: #BackgroundWorkDisplayMorph instanceVariableNames: 'interProcessString stringMorph' classVariableNames: '' category: 'BenPlay' "---------"
BackgroundWorkDisplayMorph>>initializeMorph self color: Color red. stringMorph := StringMorph new. self addMorphBack: stringMorph. self extent:(300@50). "---------"
BackgroundWorkDisplayMorph>>newWorkerProcess ^[ | work | work := 0. [ 20 milliSeconds asDelay wait. work := work + 1. interProcessString := work asString. ] repeat. ] newProcess. "---------"
BackgroundWorkDisplayMorph>>step stringMorph contents: interProcessString. "---------"
BackgroundWorkDisplayMorph>>stepTime ^50 "---------"
BackgroundWorkDisplayMorph>>initialize | workerProcess running | super initialize. self initializeMorph.
workerProcess := self newWorkerProcess. running := false. self on: #mouseUp send: #value to: [ (running := running not) ifTrue: [ workerProcess resume. self color: Color green. ] ifFalse: [ workerProcess suspend. self color: Color red. ] ] "---------"
But evaluating "BackgroundWorkDisplayMorph new openInWorld" found this exhibited the same problematic behavior you reported... Clicking on the morph worked a few times and then froze the UI until Cmd-. pressed a few times.
However I found the following never locked the GUI.
BackgroundWorkDisplayMorph>>initialize "BackgroundWorkDisplayMorph new openInWorld" | workerProcess running | super initialize. self initializeMorph.
workerProcess := self newWorkerProcess. running := false. [ [ (running := running not) ifTrue: [ workerProcess resume. self color: Color green ] ifFalse: [ workerProcess suspend. self color: Color red ]. 10 milliSeconds asDelay wait. ] repeat ] fork. "---------"
This locks the UI as well. Not every timet hough. I did this 5 times, every time in a freshly loaded image and it happens two times.
So the problem seemed to not be with #suspend/#resume or with the shared variable /interProcessString/. Indeed, since in the worker thread /interProcessString/ is atomically assigned a copy via #asString, and the String never updated, I think there is no need to surround use of it with a critical section.
The solution then was to move the "#resume/#suspend" away from the "#on: #mouseUp send: #value to:" as follows...
BackgroundWorkDisplayMorph>>initialize "BackgroundWorkDisplayMorph new openInWorld" | workerProcess running lastRunning | super initialize. self initializeMorph.
workerProcess := self newWorkerProcess. lastRunning := running := false. [ [ lastRunning = running ifFalse: [ running ifTrue: [ workerProcess resume ] ifFalse: [ workerProcess suspend ]. lastRunning := running. ]. 10 milliSeconds asDelay wait. ] repeat ] fork. self on: #mouseUp send: #value to: [ (running := running not) ifTrue: [ self color: Color green. ] ifFalse: [ self color: Color red. ] ] "---------"
And this too :(
And finally remove the busy loop.
BackgroundWorkDisplayMorph>>initialize "BackgroundWorkDisplayMorph new openInWorld" | workerProcess running lastRunning semaphore | super initialize. self initializeMorph.
workerProcess := self newWorkerProcess. lastRunning := running := false. semaphore := Semaphore new. [ [ semaphore wait. running ifTrue: [ workerProcess resume ] ifFalse: [ workerProcess suspend ]. ] repeat ] fork. self on: #mouseUp send: #value to: [ (running := running not) ifTrue: [ self color: Color green. ] ifFalse: [ self color: Color red. ]. semaphore signal. ] "---------"
And this locks the UI too. (Loaded the code 20 times, every time after a fresh image start up. Two times I got a locked ui after the first two clicks). And I don't understand this code :)
Now I can't say how close that is to how it "should" be done. Its the first time I used sempahores and just what I discovered hacking around. But hey! it works :)
cheers -ben
Nicolai Hess wrote:
I am still struggling with it.
Any ideas?
2014-07-09 11:19 GMT+02:00 Nicolai Hess nicolaihess@web.de:
2014-07-09 2:07 GMT+02:00 Eliot Miranda eliot.miranda@gmail.com:
Hi Nicolai,
On Tue, Jul 8, 2014 at 7:19 AM, Nicolai Hess nicolaihess@web.de wrote:
I want to create a process doing some work and call #changed on a Morph. I want to start/suspend/resume or stop this process. But sometimes, suspending the process locks the UI-Process, and I don't know why. Did I miss something or do I have to care when to call suspend?
Wrapping the "morph changed" call in UIManager default defer:[ morph changed]. Does not change anything.
Here is an example to reproduce it. Create the process, call resume, call supsend. It works, most of the time, but sometimes, calling suspend locks the ui.
p:=[[true] whileTrue:[ Transcript crShow: (DateAndTime now asString). 30 milliSeconds asDelay wait]] newProcess.
p resume.
p suspend.
If you simply suspend this process at random form a user-priority process you'll never be able to damage the Delay machinery you're using, but chances are you'll suspend the process inside the critical section that Transcript uses to make itself thread-safe, and that'll lock up the Transcript.
Thank you Eliot yes I guessed it locks up the critical section, but I hoped with would not happen if I the use UIManager defer call.
ThreadSafeTranscript>>nextPutAll: value accessSemaphore critical: [stream nextPutAll: value]. ^value
So instead you need to use a semaphore. e.g.
| p s wait | s := Semaphore new. p:=[[true] whileTrue:[wait ifTrue: [s wait]. Transcript crShow: (DateAndTime now asString). 30 milliSeconds asDelay wait]] newProcess. wait := true. 30 milliSeconds asDelay wait. wait := false. s signal
etc...
Is this a common pattern I can find in pharos classes. Or I need some help understanding this. The semaphore wait/signal is used instead of process resume/suspend?
What I want is a process doing repeatly some computation, calls or triggers an update on a morph, and I want to suspend and resume this process.
I would stop this discussion if someone tells me, "No your are doing it wrong, go this way ..", BUT what strikes me: in this example, that reproduces my problem more closely:
|p m s running| running:=false. m:=Morph new color:Color red. s:= StringMorph new. m addMorphBack:s. p:=[[true]whileTrue:[20 milliSeconds asDelay wait. s contents:(DateAndTime now asString). m changed]] newProcess. m on:#mouseUp send:#value to:[ running ifTrue:[p suspend. m color:Color red.] ifFalse:[p resume.m color:Color green.]. running := running not]. m extent:(300@50). m openInWorld
clicking on the morph will stop or resume the process, if it locks up I can still press alt+dot ->
- a Debugger opens but the UI is still not responsive. I can click
with the mouse on the debuggers close icon.
- nothing happens, as the UI is still blocked.
- pressing alt+Dot again, the mouse click on the close icon is
processed and the first debugger window closes
- maybe other debuggers open.
Repeating this steps, at some time the system is *fully* responsive again! And miraculously, it works after that without further blockages. What's happening here?
Nicolai
HTH
regards
Nicolai
-- best, Eliot
Hi Nicolai, Hi Ben,
On Fri, Jul 25, 2014 at 10:55 AM, Nicolai Hess nicolaihess@web.de wrote:
Hi Ben,
I am on Windows too :( So, the fixes does not work (not always) on winddows too. But at least they make it less probable to occure, but it still happens. The most distracting thing is, after the first ui lock, pressing alt+dot, closing the debuggers, pressing alt+dot .... and trying to close the very first debugger, after that, it all works. The UI is responsive again and suspending the process does not block the ui anymore. It "looks like" supsending the process reactivates another process that blocks the UI. And as soon as I terminate this process (alt+dot, close debugger ...) all works. But I really don't know.
if you can run a unix machine (in a VM?) then remember that kill -USR1 pid will cause the VM to print out a stack backtrace of all processes in the image. That can be very useful in debuggng lockups like this.
HTH
Nicolai
2014-07-25 16:56 GMT+02:00 Ben Coman btc@openinworld.com:
Over the last few days I have been looking deeper into the image locking when suspending a process. It is an interesting rabbit hole [1] that leads to pondering the Delay machinery, that leads to some VM questions.
When pressing the interrupt key it seems to always opens the debugger with the following call stack. Semaphore>>critical: 'self wait' BlockClosure>>ensure: 'self valueNoContextSwitch' Semaphore>>critical: 'ensure: [ caught ifTrue: [self signal]] Delay>>schedule 'AccessProtect critical: [' Delay>>wait 'self schedule' WorldState>>interCyclePause:
I notice... Delay class >> initialize TimingSemaphore := (Smalltalk specialObjectsArray at: 30). and... Delay class >> startTimerEventLoop TimingSemaphore := Semaphore new. which seems incongruous that TimingSemaphore is set in differently. So while I presume this critical stuff all works fine, just in an exotic way, my entropy-guarding-neuron would just like confirm this is so.
In Delay class >> handleTimerEvent the comment says... "Handle a timer event.... -a timer signal (not explicitly specified)" ...is that event perhaps a 'tick' generated periodically by the VM via that item from specialObjectArray ? Or is there some other mechanism ?
[1] http://www.urbandictionary.com/define.php?term=Rabbit+Hole cheers -ben
P.S. I've left the following for some initial context as I change the subject. btw Nicolai, I confirm that my proposed fixes only work on Windows, not Mavericks (and I haven't checked Linux).
Nicolai Hess wrote:
Hi ben, thank you for looking at this.
2014-07-22 20:17 GMT+02:00 btc@openinworld.com:
I thought this might be interesting to learn, so I've gave it a go. I had some success at the end, but I'll give a progressive report.
First I thought I'd try moving the update of StringMorph outside the worker-process using a Morph's #step method as follows...
Morph subclass: #BackgroundWorkDisplayMorph instanceVariableNames: 'interProcessString stringMorph' classVariableNames: '' category: 'BenPlay' "---------"
BackgroundWorkDisplayMorph>>initializeMorph self color: Color red. stringMorph := StringMorph new. self addMorphBack: stringMorph. self extent:(300@50). "---------"
BackgroundWorkDisplayMorph>>newWorkerProcess ^[ | work | work := 0. [ 20 milliSeconds asDelay wait. work := work + 1. interProcessString := work asString. ] repeat. ] newProcess. "---------"
BackgroundWorkDisplayMorph>>step stringMorph contents: interProcessString. "---------"
BackgroundWorkDisplayMorph>>stepTime ^50 "---------"
BackgroundWorkDisplayMorph>>initialize | workerProcess running | super initialize. self initializeMorph.
workerProcess := self newWorkerProcess. running := false. self on: #mouseUp send: #value to: [ (running := running not) ifTrue: [ workerProcess resume. self color: Color green. ] ifFalse: [ workerProcess suspend. self color: Color red. ] ] "---------"
But evaluating "BackgroundWorkDisplayMorph new openInWorld" found this exhibited the same problematic behavior you reported... Clicking on the morph worked a few times and then froze the UI until Cmd-. pressed a few times.
However I found the following never locked the GUI.
BackgroundWorkDisplayMorph>>initialize "BackgroundWorkDisplayMorph new openInWorld" | workerProcess running | super initialize. self initializeMorph.
workerProcess := self newWorkerProcess. running := false. [ [ (running := running not) ifTrue: [ workerProcess resume. self color: Color green ] ifFalse: [ workerProcess suspend. self color: Color red ]. 10 milliSeconds asDelay wait. ] repeat ] fork. "---------"
This locks the UI as well. Not every timet hough. I did this 5 times, every time in a freshly loaded image and it happens two times.
So the problem seemed to not be with #suspend/#resume or with the shared variable /interProcessString/. Indeed, since in the worker thread /interProcessString/ is atomically assigned a copy via #asString, and the String never updated, I think there is no need to surround use of it with a critical section.
The solution then was to move the "#resume/#suspend" away from the "#on: #mouseUp send: #value to:" as follows...
BackgroundWorkDisplayMorph>>initialize "BackgroundWorkDisplayMorph new openInWorld" | workerProcess running lastRunning | super initialize. self initializeMorph.
workerProcess := self newWorkerProcess. lastRunning := running := false. [ [ lastRunning = running ifFalse: [ running ifTrue: [ workerProcess resume ] ifFalse: [ workerProcess suspend ]. lastRunning := running. ]. 10 milliSeconds asDelay wait. ] repeat ] fork. self on: #mouseUp send: #value to: [ (running := running not) ifTrue: [ self color: Color green. ] ifFalse: [ self color: Color red. ] ] "---------"
And this too :(
And finally remove the busy loop.
BackgroundWorkDisplayMorph>>initialize "BackgroundWorkDisplayMorph new openInWorld" | workerProcess running lastRunning semaphore | super initialize. self initializeMorph.
workerProcess := self newWorkerProcess. lastRunning := running := false. semaphore := Semaphore new. [ [ semaphore wait. running ifTrue: [ workerProcess resume ] ifFalse: [ workerProcess suspend ]. ] repeat ] fork. self on: #mouseUp send: #value to: [ (running := running not) ifTrue: [ self color: Color green. ] ifFalse: [ self color: Color red. ]. semaphore signal. ] "---------"
And this locks the UI too. (Loaded the code 20 times, every time after a fresh image start up. Two times I got a locked ui after the first two clicks). And I don't understand this code :)
Now I can't say how close that is to how it "should" be done. Its the first time I used sempahores and just what I discovered hacking around. But hey! it works :)
cheers -ben
Nicolai Hess wrote:
I am still struggling with it.
Any ideas?
2014-07-09 11:19 GMT+02:00 Nicolai Hess nicolaihess@web.de:
2014-07-09 2:07 GMT+02:00 Eliot Miranda eliot.miranda@gmail.com:
Hi Nicolai,
On Tue, Jul 8, 2014 at 7:19 AM, Nicolai Hess nicolaihess@web.de wrote:
I want to create a process doing some work and call #changed on a Morph. I want to start/suspend/resume or stop this process. But sometimes, suspending the process locks the UI-Process, and I don't know why. Did I miss something or do I have to care when to call suspend?
Wrapping the "morph changed" call in UIManager default defer:[ morph changed]. Does not change anything.
Here is an example to reproduce it. Create the process, call resume, call supsend. It works, most of the time, but sometimes, calling suspend locks the ui.
p:=[[true] whileTrue:[ Transcript crShow: (DateAndTime now asString). 30 milliSeconds asDelay wait]] newProcess.
p resume.
p suspend.
If you simply suspend this process at random form a user-priority process you'll never be able to damage the Delay machinery you're using, but chances are you'll suspend the process inside the critical section that Transcript uses to make itself thread-safe, and that'll lock up the Transcript.
Thank you Eliot yes I guessed it locks up the critical section, but I hoped with would not happen if I the use UIManager defer call.
ThreadSafeTranscript>>nextPutAll: value accessSemaphore critical: [stream nextPutAll: value]. ^value
So instead you need to use a semaphore. e.g.
| p s wait | s := Semaphore new. p:=[[true] whileTrue:[wait ifTrue: [s wait]. Transcript crShow: (DateAndTime now asString). 30 milliSeconds asDelay wait]] newProcess. wait := true. 30 milliSeconds asDelay wait. wait := false. s signal
etc...
Is this a common pattern I can find in pharos classes. Or I need some help understanding this. The semaphore wait/signal is used instead of process resume/suspend?
What I want is a process doing repeatly some computation, calls or triggers an update on a morph, and I want to suspend and resume this process.
I would stop this discussion if someone tells me, "No your are doing it wrong, go this way ..", BUT what strikes me: in this example, that reproduces my problem more closely:
|p m s running| running:=false. m:=Morph new color:Color red. s:= StringMorph new. m addMorphBack:s. p:=[[true]whileTrue:[20 milliSeconds asDelay wait. s contents:(DateAndTime now asString). m changed]] newProcess. m on:#mouseUp send:#value to:[ running ifTrue:[p suspend. m color:Color red.] ifFalse:[p resume.m color:Color green.]. running := running not]. m extent:(300@50). m openInWorld
clicking on the morph will stop or resume the process, if it locks up I can still press alt+dot ->
- a Debugger opens but the UI is still not responsive. I can click
with the mouse on the debuggers close icon.
- nothing happens, as the UI is still blocked.
- pressing alt+Dot again, the mouse click on the close icon is
processed and the first debugger window closes
- maybe other debuggers open.
Repeating this steps, at some time the system is *fully* responsive again! And miraculously, it works after that without further blockages. What's happening here?
Nicolai
HTH
regards
Nicolai
-- best, Eliot
--
Aloha, Eliot
2014-07-25 20:05 GMT+02:00 Eliot Miranda eliot.miranda@gmail.com:
Hi Nicolai, Hi Ben,
On Fri, Jul 25, 2014 at 10:55 AM, Nicolai Hess nicolaihess@web.de wrote:
Hi Ben,
I am on Windows too :( So, the fixes does not work (not always) on winddows too. But at least they make it less probable to occure, but it still happens. The most distracting thing is, after the first ui lock, pressing alt+dot, closing the debuggers, pressing alt+dot .... and trying to close the very first debugger, after that, it all works. The UI is responsive again and suspending the process does not block the ui anymore. It "looks like" supsending the process reactivates another process that blocks the UI. And as soon as I terminate this process (alt+dot, close debugger ...) all works. But I really don't know.
if you can run a unix machine (in a VM?) then remember that kill -USR1 pid will cause the VM to print out a stack backtrace of all processes in the image. That can be very useful in debuggng lockups like this.
HTH
Ok, but I don't know if this helps, at least it does not look very helpful to me:)
SIGUSR1 Mon Jul 28 01:06:16 2014
pharo VM version: 3.9-7 #1 Tue May 6 08:30:23 UTC 2014 gcc 4.8.2 [Production ITHB VM] Built from: NBCoInterpreter NativeBoost-CogPlugin-GuillermoPolito.19 uuid: acc98e51-2fba-4841-a965-2975997bba66 May 6 2014 With: NBCogit NativeBoost-CogPlugin-GuillermoPolito.19 uuid: acc98e51-2fba-4841-a965-2975997bba66 May 6 2014 Revision: https://github.com/pharo-project/pharo-vm.git Commit: ef5832e6f70e5b24e8b9b1f4b8509a62b6c88040 Date: 2014-01-26 15:34:28 +0100 By: Esteban Lorenzano estebanlm@gmail.com Jenkins build #14794 Build host: Linux chindi08 2.6.24-32-xen #1 SMP Mon Dec 3 16:12:25 UTC 2012 i686 i686 i686 GNU/Linux plugin path: /usr/lib/pharo-vm/ [default: /usr/lib/pharo-vm/]
C stack backtrace: /usr/lib/pharo-vm/pharo-vm[0x809ad23] /usr/lib/pharo-vm/pharo-vm[0x809af6e] [0xf7784410] [0xf7784425] /lib/i386-linux-gnu/libc.so.6(__select+0x2d)[0xf763691d] /usr/lib/pharo-vm/pharo-vm(aioPoll+0x13d)[0x809748d] /usr/lib/pharo-vm/vm-display-X11.so(+0xdc85)[0xf71d2c85] /usr/lib/pharo-vm/pharo-vm(ioRelinquishProcessorForMicroseconds+0x17)[0x8099b57] /usr/lib/pharo-vm/pharo-vm[0x8070685] [0xb6f8dbc3] [0xb6f89700] [0xb7a2650e] [0xb6f895c0]
All Smalltalk process stacks (active first): Process 0xb88376fc priority 10 0xff76c830 M ProcessorScheduler class>idleProcess 0xb7306b08: a(n) ProcessorScheduler class 0xff76c850 I [] in ProcessorScheduler class>startUp 0xb7306b08: a(n) ProcessorScheduler class 0xff76c870 I [] in BlockClosure>newProcess 0xb8837620: a(n) BlockClosure
Process 0xb8838c78 priority 50 0xff768830 I WeakArray class>finalizationProcess 0xb7306cd8: a(n) WeakArray class 0xff768850 I [] in WeakArray class>restartFinalizationProcess 0xb7306cd8: a(n) WeakArray class 0xff768870 I [] in BlockClosure>newProcess 0xb8838b9c: a(n) BlockClosure
Process 0xb9148c20 priority 40 0xff7907b8 M [] in Semaphore>critical: 0xb82f8ef4: a(n) Semaphore 0xff7907d8 M BlockClosure>ensure: 0xb91502e0: a(n) BlockClosure 0xff7907f8 M Semaphore>critical: 0xb82f8ef4: a(n) Semaphore 0xff790814 M Delay>schedule 0xb91501e4: a(n) Delay 0xff79082c M Delay>wait 0xb91501e4: a(n) Delay 0xff790850 I [] in BackgroundWorkDisplayMorph>initialize 0xb91488b0: a(n) BackgroundWorkDisplayMorph 0xff790870 I [] in BlockClosure>newProcess 0xb9148b40: a(n) BlockClosure
Process 0xb7902630 priority 40 0xff764784 M [] in Semaphore>critical: 0xb82f8ef4: a(n) Semaphore 0xff7647a4 M BlockClosure>ensure: 0xb916b7a4: a(n) BlockClosure 0xff7647c4 M Semaphore>critical: 0xb82f8ef4: a(n) Semaphore 0xff7647e0 M Delay>schedule 0xb916b6a8: a(n) Delay 0xff7647f8 M Delay>wait 0xb916b6a8: a(n) Delay 0xff764818 M WorldState>interCyclePause: 0xb75e8fd8: a(n) WorldState 0xff764834 M WorldState>doOneCycleFor: 0xb75e8fd8: a(n) WorldState 0xff764850 M WorldMorph>doOneCycle 0xb75e8fa4: a(n) WorldMorph 0xff764870 I [] in MorphicUIManager()>? 0xb770ac38: a(n) MorphicUIManager 0xb78cb554 s [] in BlockClosure()>?
Process 0xb82f9078 priority 80 0xff765858 M Delay class>handleTimerEvent 0xb8684d08: a(n) Delay class 0xff765870 M Delay class()>? 0xb8684d08: a(n) Delay class 0xb8623474 s [] in Delay class()>? 0xb82f9018 s [] in BlockClosure>newProcess
Process 0xb883735c priority 60 0xff76680c M InputEventFetcher>waitForInput 0xb72f059c: a(n) InputEventFetcher 0xff766830 M InputEventFetcher>eventLoop 0xb72f059c: a(n) InputEventFetcher 0xff766850 I [] in InputEventFetcher>installEventLoop 0xb72f059c: a(n) InputEventFetcher 0xff766870 I [] in BlockClosure>newProcess 0xb8837280: a(n) BlockClosure
Process 0xb8837534 priority 60 0xb8837568 s SmalltalkImage>lowSpaceWatcher 0xb9127478 s [] in SmalltalkImage>installLowSpaceWatcher 0xb88374d4 s [] in BlockClosure>newProcess
Most recent primitives relinquishProcessorForMicroseconds: relinquishProcessorForMicroseconds: relinquishProcessorForMicroseconds: relinquishProcessorForMicroseconds: ~ 200 times
Nicolai
2014-07-25 16:56 GMT+02:00 Ben Coman btc@openinworld.com:
Over the last few days I have been looking deeper into the image locking when suspending a process. It is an interesting rabbit hole [1] that leads to pondering the Delay machinery, that leads to some VM questions.
When pressing the interrupt key it seems to always opens the debugger with the following call stack. Semaphore>>critical: 'self wait' BlockClosure>>ensure: 'self valueNoContextSwitch' Semaphore>>critical: 'ensure: [ caught ifTrue: [self signal]] Delay>>schedule 'AccessProtect critical: [' Delay>>wait 'self schedule' WorldState>>interCyclePause:
I notice... Delay class >> initialize TimingSemaphore := (Smalltalk specialObjectsArray at: 30). and... Delay class >> startTimerEventLoop TimingSemaphore := Semaphore new. which seems incongruous that TimingSemaphore is set in differently. So while I presume this critical stuff all works fine, just in an exotic way, my entropy-guarding-neuron would just like confirm this is so.
In Delay class >> handleTimerEvent the comment says... "Handle a timer event.... -a timer signal (not explicitly specified)" ...is that event perhaps a 'tick' generated periodically by the VM via that item from specialObjectArray ? Or is there some other mechanism ?
[1] http://www.urbandictionary.com/define.php?term=Rabbit+Hole cheers -ben
P.S. I've left the following for some initial context as I change the subject. btw Nicolai, I confirm that my proposed fixes only work on Windows, not Mavericks (and I haven't checked Linux).
Nicolai Hess wrote:
Hi ben, thank you for looking at this.
2014-07-22 20:17 GMT+02:00 btc@openinworld.com:
I thought this might be interesting to learn, so I've gave it a go. I had some success at the end, but I'll give a progressive report.
First I thought I'd try moving the update of StringMorph outside the worker-process using a Morph's #step method as follows...
Morph subclass: #BackgroundWorkDisplayMorph instanceVariableNames: 'interProcessString stringMorph' classVariableNames: '' category: 'BenPlay' "---------"
BackgroundWorkDisplayMorph>>initializeMorph self color: Color red. stringMorph := StringMorph new. self addMorphBack: stringMorph. self extent:(300@50). "---------"
BackgroundWorkDisplayMorph>>newWorkerProcess ^[ | work | work := 0. [ 20 milliSeconds asDelay wait. work := work + 1. interProcessString := work asString. ] repeat. ] newProcess. "---------"
BackgroundWorkDisplayMorph>>step stringMorph contents: interProcessString. "---------"
BackgroundWorkDisplayMorph>>stepTime ^50 "---------"
BackgroundWorkDisplayMorph>>initialize | workerProcess running | super initialize. self initializeMorph.
workerProcess := self newWorkerProcess. running := false. self on: #mouseUp send: #value to: [ (running := running not) ifTrue: [ workerProcess resume. self color: Color green. ] ifFalse: [ workerProcess suspend. self color: Color red. ] ] "---------"
But evaluating "BackgroundWorkDisplayMorph new openInWorld" found this exhibited the same problematic behavior you reported... Clicking on the morph worked a few times and then froze the UI until Cmd-. pressed a few times.
However I found the following never locked the GUI.
BackgroundWorkDisplayMorph>>initialize "BackgroundWorkDisplayMorph new openInWorld" | workerProcess running | super initialize. self initializeMorph.
workerProcess := self newWorkerProcess. running := false. [ [ (running := running not) ifTrue: [ workerProcess resume. self color: Color green ] ifFalse: [ workerProcess suspend. self color: Color red ]. 10 milliSeconds asDelay wait. ] repeat ] fork. "---------"
This locks the UI as well. Not every timet hough. I did this 5 times, every time in a freshly loaded image and it happens two times.
So the problem seemed to not be with #suspend/#resume or with the shared variable /interProcessString/. Indeed, since in the worker thread /interProcessString/ is atomically assigned a copy via #asString, and the String never updated, I think there is no need to surround use of it with a critical section.
The solution then was to move the "#resume/#suspend" away from the "#on: #mouseUp send: #value to:" as follows...
BackgroundWorkDisplayMorph>>initialize "BackgroundWorkDisplayMorph new openInWorld" | workerProcess running lastRunning | super initialize. self initializeMorph.
workerProcess := self newWorkerProcess. lastRunning := running := false. [ [ lastRunning = running ifFalse: [ running ifTrue: [ workerProcess resume ] ifFalse: [ workerProcess suspend ]. lastRunning := running. ]. 10 milliSeconds asDelay wait. ] repeat ] fork. self on: #mouseUp send: #value to: [ (running := running not) ifTrue: [ self color: Color green. ] ifFalse: [ self color: Color red. ] ] "---------"
And this too :(
And finally remove the busy loop.
BackgroundWorkDisplayMorph>>initialize "BackgroundWorkDisplayMorph new openInWorld" | workerProcess running lastRunning semaphore | super initialize. self initializeMorph.
workerProcess := self newWorkerProcess. lastRunning := running := false. semaphore := Semaphore new. [ [ semaphore wait. running ifTrue: [ workerProcess resume ] ifFalse: [ workerProcess suspend ]. ] repeat ] fork. self on: #mouseUp send: #value to: [ (running := running not) ifTrue: [ self color: Color green. ] ifFalse: [ self color: Color red. ]. semaphore signal. ] "---------"
And this locks the UI too. (Loaded the code 20 times, every time after a fresh image start up. Two times I got a locked ui after the first two clicks). And I don't understand this code :)
Now I can't say how close that is to how it "should" be done. Its the first time I used sempahores and just what I discovered hacking around. But hey! it works :)
cheers -ben
Nicolai Hess wrote:
I am still struggling with it.
Any ideas?
2014-07-09 11:19 GMT+02:00 Nicolai Hess nicolaihess@web.de:
2014-07-09 2:07 GMT+02:00 Eliot Miranda eliot.miranda@gmail.com:
Hi Nicolai,
On Tue, Jul 8, 2014 at 7:19 AM, Nicolai Hess nicolaihess@web.de wrote:
> I want to create a process doing some work and call #changed on a > Morph. > I want to start/suspend/resume or stop this process. > But sometimes, suspending the process locks the UI-Process, > and I don't know why. Did I miss something or do I have to care when > to call suspend? > > Wrapping the "morph changed" call in > UIManager default defer:[ morph changed]. > Does not change anything. > > Here is an example to reproduce it. > Create the process, > call resume, call supsend. It works, most of the time, > but sometimes, calling suspend locks the ui. > > p:=[[true] whileTrue:[ Transcript crShow: (DateAndTime now > asString). 30 milliSeconds asDelay wait]] newProcess. > p resume. > p suspend. >
If you simply suspend this process at random form a user-priority process you'll never be able to damage the Delay machinery you're using, but chances are you'll suspend the process inside the critical section that Transcript uses to make itself thread-safe, and that'll lock up the Transcript.
Thank you Eliot yes I guessed it locks up the critical section, but I hoped with would not happen if I the use UIManager defer call.
ThreadSafeTranscript>>nextPutAll: value accessSemaphore critical: [stream nextPutAll: value]. ^value
So instead you need to use a semaphore. e.g.
| p s wait | s := Semaphore new. p:=[[true] whileTrue:[wait ifTrue: [s wait]. Transcript crShow: (DateAndTime now asString). 30 milliSeconds asDelay wait]] newProcess. wait := true. 30 milliSeconds asDelay wait. wait := false. s signal
etc...
Is this a common pattern I can find in pharos classes. Or I need some help understanding this. The semaphore wait/signal is used instead of process resume/suspend?
What I want is a process doing repeatly some computation, calls or triggers an update on a morph, and I want to suspend and resume this process.
I would stop this discussion if someone tells me, "No your are doing it wrong, go this way ..", BUT what strikes me: in this example, that reproduces my problem more closely:
|p m s running| running:=false. m:=Morph new color:Color red. s:= StringMorph new. m addMorphBack:s. p:=[[true]whileTrue:[20 milliSeconds asDelay wait. s contents:(DateAndTime now asString). m changed]] newProcess. m on:#mouseUp send:#value to:[ running ifTrue:[p suspend. m color:Color red.] ifFalse:[p resume.m color:Color green.]. running := running not]. m extent:(300@50). m openInWorld
clicking on the morph will stop or resume the process, if it locks up I can still press alt+dot ->
- a Debugger opens but the UI is still not responsive. I can click
with the mouse on the debuggers close icon.
- nothing happens, as the UI is still blocked.
- pressing alt+Dot again, the mouse click on the close icon is
processed and the first debugger window closes
- maybe other debuggers open.
Repeating this steps, at some time the system is *fully* responsive again! And miraculously, it works after that without further blockages. What's happening here?
Nicolai
HTH
regards > Nicolai >
-- best, Eliot
--
Aloha, Eliot
On Sun, Jul 27, 2014 at 1:14 PM, Nicolai Hess nicolaihess@web.de wrote:
2014-07-25 20:05 GMT+02:00 Eliot Miranda eliot.miranda@gmail.com:
Hi Nicolai, Hi Ben,
On Fri, Jul 25, 2014 at 10:55 AM, Nicolai Hess nicolaihess@web.de wrote:
Hi Ben,
I am on Windows too :( So, the fixes does not work (not always) on winddows too. But at least they make it less probable to occure, but it still happens. The most distracting thing is, after the first ui lock, pressing alt+dot, closing the debuggers, pressing alt+dot .... and trying to close the very first debugger, after that, it all works. The UI is responsive again and suspending the process does not block the ui anymore. It "looks like" supsending the process reactivates another process that blocks the UI. And as soon as I terminate this process (alt+dot, close debugger ...) all works. But I really don't know.
if you can run a unix machine (in a VM?) then remember that kill -USR1 pid will cause the VM to print out a stack backtrace of all processes in the image. That can be very useful in debuggng lockups like this.
HTH
Ok, but I don't know if this helps, at least it does not look very helpful to me:)
Then you're not reading it properly. It clearly shows you have a deadlock:
Process 0xb9148c20 priority 40 0xff7907b8 M [] in Semaphore>critical: 0xb82f8ef4: a(n) Semaphore 0xff7907d8 M BlockClosure>ensure: 0xb91502e0: a(n) BlockClosure 0xff7907f8 M Semaphore>critical: 0xb82f8ef4: a(n) Semaphore 0xff790814 M Delay>schedule 0xb91501e4: a(n) Delay 0xff79082c M Delay>wait 0xb91501e4: a(n) Delay 0xff790850 I [] in BackgroundWorkDisplayMorph>initialize 0xb91488b0: a(n) BackgroundWorkDisplayMorph 0xff790870 I [] in BlockClosure>newProcess 0xb9148b40: a(n) BlockClosure
Process 0xb7902630 priority 40 0xff764784 M [] in Semaphore>critical: 0xb82f8ef4: a(n) Semaphore 0xff7647a4 M BlockClosure>ensure: 0xb916b7a4: a(n) BlockClosure 0xff7647c4 M Semaphore>critical: 0xb82f8ef4: a(n) Semaphore 0xff7647e0 M Delay>schedule 0xb916b6a8: a(n) Delay 0xff7647f8 M Delay>wait 0xb916b6a8: a(n) Delay 0xff764818 M WorldState>interCyclePause: 0xb75e8fd8: a(n) WorldState 0xff764834 M WorldState>doOneCycleFor: 0xb75e8fd8: a(n) WorldState 0xff764850 M WorldMorph>doOneCycle 0xb75e8fa4: a(n) WorldMorph 0xff764870 I [] in MorphicUIManager()>? 0xb770ac38: a(n) MorphicUIManager 0xb78cb554 s [] in BlockClosure()>?
SIGUSR1 Mon Jul 28 01:06:16 2014
pharo VM version: 3.9-7 #1 Tue May 6 08:30:23 UTC 2014 gcc 4.8.2 [Production ITHB VM] Built from: NBCoInterpreter NativeBoost-CogPlugin-GuillermoPolito.19 uuid: acc98e51-2fba-4841-a965-2975997bba66 May 6 2014 With: NBCogit NativeBoost-CogPlugin-GuillermoPolito.19 uuid: acc98e51-2fba-4841-a965-2975997bba66 May 6 2014 Revision: https://github.com/pharo-project/pharo-vm.git Commit: ef5832e6f70e5b24e8b9b1f4b8509a62b6c88040 Date: 2014-01-26 15:34:28 +0100 By: Esteban Lorenzano estebanlm@gmail.com Jenkins build #14794 Build host: Linux chindi08 2.6.24-32-xen #1 SMP Mon Dec 3 16:12:25 UTC 2012 i686 i686 i686 GNU/Linux plugin path: /usr/lib/pharo-vm/ [default: /usr/lib/pharo-vm/]
C stack backtrace: /usr/lib/pharo-vm/pharo-vm[0x809ad23] /usr/lib/pharo-vm/pharo-vm[0x809af6e] [0xf7784410] [0xf7784425] /lib/i386-linux-gnu/libc.so.6(__select+0x2d)[0xf763691d] /usr/lib/pharo-vm/pharo-vm(aioPoll+0x13d)[0x809748d] /usr/lib/pharo-vm/vm-display-X11.so(+0xdc85)[0xf71d2c85]
/usr/lib/pharo-vm/pharo-vm(ioRelinquishProcessorForMicroseconds+0x17)[0x8099b57] /usr/lib/pharo-vm/pharo-vm[0x8070685] [0xb6f8dbc3] [0xb6f89700] [0xb7a2650e] [0xb6f895c0]
All Smalltalk process stacks (active first): Process 0xb88376fc priority 10 0xff76c830 M ProcessorScheduler class>idleProcess 0xb7306b08: a(n) ProcessorScheduler class 0xff76c850 I [] in ProcessorScheduler class>startUp 0xb7306b08: a(n) ProcessorScheduler class 0xff76c870 I [] in BlockClosure>newProcess 0xb8837620: a(n) BlockClosure
Process 0xb8838c78 priority 50 0xff768830 I WeakArray class>finalizationProcess 0xb7306cd8: a(n) WeakArray class 0xff768850 I [] in WeakArray class>restartFinalizationProcess 0xb7306cd8: a(n) WeakArray class 0xff768870 I [] in BlockClosure>newProcess 0xb8838b9c: a(n) BlockClosure
Process 0xb9148c20 priority 40 0xff7907b8 M [] in Semaphore>critical: 0xb82f8ef4: a(n) Semaphore 0xff7907d8 M BlockClosure>ensure: 0xb91502e0: a(n) BlockClosure 0xff7907f8 M Semaphore>critical: 0xb82f8ef4: a(n) Semaphore 0xff790814 M Delay>schedule 0xb91501e4: a(n) Delay 0xff79082c M Delay>wait 0xb91501e4: a(n) Delay 0xff790850 I [] in BackgroundWorkDisplayMorph>initialize 0xb91488b0: a(n) BackgroundWorkDisplayMorph 0xff790870 I [] in BlockClosure>newProcess 0xb9148b40: a(n) BlockClosure
Process 0xb7902630 priority 40 0xff764784 M [] in Semaphore>critical: 0xb82f8ef4: a(n) Semaphore 0xff7647a4 M BlockClosure>ensure: 0xb916b7a4: a(n) BlockClosure 0xff7647c4 M Semaphore>critical: 0xb82f8ef4: a(n) Semaphore 0xff7647e0 M Delay>schedule 0xb916b6a8: a(n) Delay 0xff7647f8 M Delay>wait 0xb916b6a8: a(n) Delay 0xff764818 M WorldState>interCyclePause: 0xb75e8fd8: a(n) WorldState 0xff764834 M WorldState>doOneCycleFor: 0xb75e8fd8: a(n) WorldState 0xff764850 M WorldMorph>doOneCycle 0xb75e8fa4: a(n) WorldMorph 0xff764870 I [] in MorphicUIManager()>? 0xb770ac38: a(n) MorphicUIManager 0xb78cb554 s [] in BlockClosure()>?
Process 0xb82f9078 priority 80 0xff765858 M Delay class>handleTimerEvent 0xb8684d08: a(n) Delay class 0xff765870 M Delay class()>? 0xb8684d08: a(n) Delay class 0xb8623474 s [] in Delay class()>? 0xb82f9018 s [] in BlockClosure>newProcess
Process 0xb883735c priority 60 0xff76680c M InputEventFetcher>waitForInput 0xb72f059c: a(n) InputEventFetcher 0xff766830 M InputEventFetcher>eventLoop 0xb72f059c: a(n) InputEventFetcher 0xff766850 I [] in InputEventFetcher>installEventLoop 0xb72f059c: a(n) InputEventFetcher 0xff766870 I [] in BlockClosure>newProcess 0xb8837280: a(n) BlockClosure
Process 0xb8837534 priority 60 0xb8837568 s SmalltalkImage>lowSpaceWatcher 0xb9127478 s [] in SmalltalkImage>installLowSpaceWatcher 0xb88374d4 s [] in BlockClosure>newProcess
Most recent primitives relinquishProcessorForMicroseconds: relinquishProcessorForMicroseconds: relinquishProcessorForMicroseconds: relinquishProcessorForMicroseconds: ~ 200 times
Nicolai
2014-07-25 16:56 GMT+02:00 Ben Coman btc@openinworld.com:
Over the last few days I have been looking deeper into the image locking when suspending a process. It is an interesting rabbit hole [1] that leads to pondering the Delay machinery, that leads to some VM questions.
When pressing the interrupt key it seems to always opens the debugger with the following call stack. Semaphore>>critical: 'self wait' BlockClosure>>ensure: 'self valueNoContextSwitch' Semaphore>>critical: 'ensure: [ caught ifTrue: [self signal]] Delay>>schedule 'AccessProtect critical: [' Delay>>wait 'self schedule' WorldState>>interCyclePause:
I notice... Delay class >> initialize TimingSemaphore := (Smalltalk specialObjectsArray at: 30). and... Delay class >> startTimerEventLoop TimingSemaphore := Semaphore new. which seems incongruous that TimingSemaphore is set in differently. So while I presume this critical stuff all works fine, just in an exotic way, my entropy-guarding-neuron would just like confirm this is so.
In Delay class >> handleTimerEvent the comment says... "Handle a timer event.... -a timer signal (not explicitly specified)" ...is that event perhaps a 'tick' generated periodically by the VM via that item from specialObjectArray ? Or is there some other mechanism ?
[1] http://www.urbandictionary.com/define.php?term=Rabbit+Hole cheers -ben
P.S. I've left the following for some initial context as I change the subject. btw Nicolai, I confirm that my proposed fixes only work on Windows, not Mavericks (and I haven't checked Linux).
Nicolai Hess wrote:
Hi ben, thank you for looking at this.
2014-07-22 20:17 GMT+02:00 btc@openinworld.com:
I thought this might be interesting to learn, so I've gave it a go. I had some success at the end, but I'll give a progressive report.
First I thought I'd try moving the update of StringMorph outside the worker-process using a Morph's #step method as follows...
Morph subclass: #BackgroundWorkDisplayMorph instanceVariableNames: 'interProcessString stringMorph' classVariableNames: '' category: 'BenPlay' "---------"
BackgroundWorkDisplayMorph>>initializeMorph self color: Color red. stringMorph := StringMorph new. self addMorphBack: stringMorph. self extent:(300@50). "---------"
BackgroundWorkDisplayMorph>>newWorkerProcess ^[ | work | work := 0. [ 20 milliSeconds asDelay wait. work := work + 1. interProcessString := work asString. ] repeat. ] newProcess. "---------"
BackgroundWorkDisplayMorph>>step stringMorph contents: interProcessString. "---------"
BackgroundWorkDisplayMorph>>stepTime ^50 "---------"
BackgroundWorkDisplayMorph>>initialize | workerProcess running | super initialize. self initializeMorph.
workerProcess := self newWorkerProcess. running := false. self on: #mouseUp send: #value to: [ (running := running not) ifTrue: [ workerProcess resume. self color: Color green.
] ifFalse: [ workerProcess suspend. self color: Color red. ] ] "---------"
But evaluating "BackgroundWorkDisplayMorph new openInWorld" found this exhibited the same problematic behavior you reported... Clicking on the morph worked a few times and then froze the UI until Cmd-. pressed a few times.
However I found the following never locked the GUI.
BackgroundWorkDisplayMorph>>initialize "BackgroundWorkDisplayMorph new openInWorld" | workerProcess running | super initialize. self initializeMorph.
workerProcess := self newWorkerProcess. running := false. [ [ (running := running not) ifTrue: [ workerProcess resume. self color: Color green ] ifFalse: [ workerProcess suspend. self color: Color red ]. 10 milliSeconds asDelay wait. ] repeat ] fork. "---------"
This locks the UI as well. Not every timet hough. I did this 5 times, every time in a freshly loaded image and it happens two times.
So the problem seemed to not be with #suspend/#resume or with the shared variable /interProcessString/. Indeed, since in the worker thread /interProcessString/ is atomically assigned a copy via #asString, and the String never updated, I think there is no need to surround use of it with a critical section.
The solution then was to move the "#resume/#suspend" away from the "#on: #mouseUp send: #value to:" as follows...
BackgroundWorkDisplayMorph>>initialize "BackgroundWorkDisplayMorph new openInWorld" | workerProcess running lastRunning | super initialize. self initializeMorph.
workerProcess := self newWorkerProcess. lastRunning := running := false. [ [ lastRunning = running ifFalse: [ running ifTrue: [ workerProcess resume ] ifFalse: [ workerProcess suspend ]. lastRunning := running. ]. 10 milliSeconds asDelay wait. ] repeat ] fork. self on: #mouseUp send: #value to: [ (running := running not) ifTrue: [ self color: Color green. ] ifFalse: [ self color: Color red. ] ] "---------"
And this too :(
And finally remove the busy loop.
BackgroundWorkDisplayMorph>>initialize "BackgroundWorkDisplayMorph new openInWorld" | workerProcess running lastRunning semaphore | super initialize. self initializeMorph.
workerProcess := self newWorkerProcess. lastRunning := running := false. semaphore := Semaphore new. [ [ semaphore wait. running ifTrue: [ workerProcess resume ] ifFalse: [ workerProcess suspend ]. ] repeat ] fork. self on: #mouseUp send: #value to: [ (running := running not) ifTrue: [ self color: Color green. ] ifFalse: [ self color: Color red. ]. semaphore signal. ] "---------"
And this locks the UI too. (Loaded the code 20 times, every time after a fresh image start up. Two times I got a locked ui after the first two clicks). And I don't understand this code :)
Now I can't say how close that is to how it "should" be done. Its the first time I used sempahores and just what I discovered hacking around. But hey! it works :)
cheers -ben
Nicolai Hess wrote:
I am still struggling with it.
Any ideas?
2014-07-09 11:19 GMT+02:00 Nicolai Hess nicolaihess@web.de:
2014-07-09 2:07 GMT+02:00 Eliot Miranda eliot.miranda@gmail.com:
Hi Nicolai, > > > On Tue, Jul 8, 2014 at 7:19 AM, Nicolai Hess nicolaihess@web.de > wrote: > >> I want to create a process doing some work and call #changed on a >> Morph. >> I want to start/suspend/resume or stop this process. >> But sometimes, suspending the process locks the UI-Process, >> and I don't know why. Did I miss something or do I have to care >> when to call suspend? >> >> Wrapping the "morph changed" call in >> UIManager default defer:[ morph changed]. >> Does not change anything. >> >> Here is an example to reproduce it. >> Create the process, >> call resume, call supsend. It works, most of the time, >> but sometimes, calling suspend locks the ui. >> >> p:=[[true] whileTrue:[ Transcript crShow: (DateAndTime now >> asString). 30 milliSeconds asDelay wait]] newProcess. >> > p resume. >> p suspend. >> > > If you simply suspend this process at random form a user-priority > process you'll never be able to damage the Delay machinery you're using, > but chances are you'll suspend the process inside the critical section that > Transcript uses to make itself thread-safe, and that'll lock up the > Transcript. >
Thank you Eliot yes I guessed it locks up the critical section, but I hoped with would not happen if I the use UIManager defer call.
> > ThreadSafeTranscript>>nextPutAll: value > accessSemaphore > critical: [stream nextPutAll: value]. > ^value > > So instead you need to use a semaphore. e.g. > > | p s wait | > s := Semaphore new. > p:=[[true] whileTrue:[wait ifTrue: [s wait]. Transcript crShow: > (DateAndTime now asString). 30 milliSeconds asDelay wait]] newProcess. > wait := true. > 30 milliSeconds asDelay wait. > wait := false. > s signal > > etc... >
Is this a common pattern I can find in pharos classes. Or I need some help understanding this. The semaphore wait/signal is used instead of process resume/suspend?
What I want is a process doing repeatly some computation, calls or triggers an update on a morph, and I want to suspend and resume this process.
I would stop this discussion if someone tells me, "No your are doing it wrong, go this way ..", BUT what strikes me: in this example, that reproduces my problem more closely:
|p m s running| running:=false. m:=Morph new color:Color red. s:= StringMorph new. m addMorphBack:s. p:=[[true]whileTrue:[20 milliSeconds asDelay wait. s contents:(DateAndTime now asString). m changed]] newProcess. m on:#mouseUp send:#value to:[ running ifTrue:[p suspend. m color:Color red.] ifFalse:[p resume.m color:Color green.]. running := running not]. m extent:(300@50). m openInWorld
clicking on the morph will stop or resume the process, if it locks up I can still press alt+dot ->
- a Debugger opens but the UI is still not responsive. I can click
with the mouse on the debuggers close icon.
- nothing happens, as the UI is still blocked.
- pressing alt+Dot again, the mouse click on the close icon is
processed and the first debugger window closes
- maybe other debuggers open.
Repeating this steps, at some time the system is *fully* responsive again! And miraculously, it works after that without further blockages. What's happening here?
Nicolai
> > HTH > > regards >> Nicolai >> > > > > -- > best, > Eliot
>
Aloha, Eliot
Then you're not reading it properly. It clearly shows you have a deadlock:
Process 0xb9148c20 priority 40 0xff7907b8 M [] in Semaphore>critical: 0xb82f8ef4: a(n) Semaphore 0xff7907d8 M BlockClosure>ensure: 0xb91502e0: a(n) BlockClosure 0xff7907f8 M Semaphore>critical: 0xb82f8ef4: a(n) Semaphore 0xff790814 M Delay>schedule 0xb91501e4: a(n) Delay 0xff79082c M Delay>wait 0xb91501e4: a(n) Delay 0xff790850 I [] in BackgroundWorkDisplayMorph>initialize 0xb91488b0: a(n) BackgroundWorkDisplayMorph 0xff790870 I [] in BlockClosure>newProcess 0xb9148b40: a(n) BlockClosure
Process 0xb7902630 priority 40 0xff764784 M [] in Semaphore>critical: 0xb82f8ef4: a(n) Semaphore 0xff7647a4 M BlockClosure>ensure: 0xb916b7a4: a(n) BlockClosure 0xff7647c4 M Semaphore>critical: 0xb82f8ef4: a(n) Semaphore 0xff7647e0 M Delay>schedule 0xb916b6a8: a(n) Delay 0xff7647f8 M Delay>wait 0xb916b6a8: a(n) Delay 0xff764818 M WorldState>interCyclePause: 0xb75e8fd8: a(n) WorldState 0xff764834 M WorldState>doOneCycleFor: 0xb75e8fd8: a(n) WorldState 0xff764850 M WorldMorph>doOneCycle 0xb75e8fa4: a(n) WorldMorph 0xff764870 I [] in MorphicUIManager()>? 0xb770ac38: a(n) MorphicUIManager 0xb78cb554 s [] in BlockClosure()>?
Ah, Ok. So, it is not my "misuse" of delays but a bug in Delay>>#schedule, like ben already guessed? Two processes in the same critical section should not happen, right?
Nicolai
Hi Nicolai,
On Jul 27, 2014, at 10:59 PM, Nicolai Hess nicolaihess@web.de wrote:
Then you're not reading it properly. It clearly shows you have a deadlock:
Process 0xb9148c20 priority 40 0xff7907b8 M [] in Semaphore>critical: 0xb82f8ef4: a(n) Semaphore 0xff7907d8 M BlockClosure>ensure: 0xb91502e0: a(n) BlockClosure 0xff7907f8 M Semaphore>critical: 0xb82f8ef4: a(n) Semaphore 0xff790814 M Delay>schedule 0xb91501e4: a(n) Delay 0xff79082c M Delay>wait 0xb91501e4: a(n) Delay 0xff790850 I [] in BackgroundWorkDisplayMorph>initialize 0xb91488b0: a(n) BackgroundWorkDisplayMorph 0xff790870 I [] in BlockClosure>newProcess 0xb9148b40: a(n) BlockClosure
Process 0xb7902630 priority 40 0xff764784 M [] in Semaphore>critical: 0xb82f8ef4: a(n) Semaphore 0xff7647a4 M BlockClosure>ensure: 0xb916b7a4: a(n) BlockClosure 0xff7647c4 M Semaphore>critical: 0xb82f8ef4: a(n) Semaphore 0xff7647e0 M Delay>schedule 0xb916b6a8: a(n) Delay 0xff7647f8 M Delay>wait 0xb916b6a8: a(n) Delay 0xff764818 M WorldState>interCyclePause: 0xb75e8fd8: a(n) WorldState 0xff764834 M WorldState>doOneCycleFor: 0xb75e8fd8: a(n) WorldState 0xff764850 M WorldMorph>doOneCycle 0xb75e8fa4: a(n) WorldMorph 0xff764870 I [] in MorphicUIManager()>? 0xb770ac38: a(n) MorphicUIManager 0xb78cb554 s [] in BlockClosure()>?
Ah, Ok. So, it is not my "misuse" of delays but a bug in Delay>>#schedule, like ben already guessed? Two processes in the same critical section should not happen, right?
Right, and fortunately Ben already provided a solution.
Nicolai
Aloha, Eliot (phone)
vm-dev@lists.squeakfoundation.org