On 1/3/2011 10:09 AM, Adrian Lienhard wrote:
I'll try that...
But I still don't understand why the change in Pharo that makes LinkedList>>remove:ifAbsent: non-thread-safe can cause the problem, since this code is executed by the timerEvent process, which runs at the highest priority. This process should never be suspended during the execution of remove:ifAbsent:. What am I missing?
The problem isn't thread-safety, at least in the classical definition. What happens is that if you're removing processes by using LinkedList>>remove: you are subject to a race condition where the semaphore gets signaled *while* you are removing the process. Obviously, hilarity ensues at this point, which is why I made primitive suspend do the Right Thing (i.e., remove the process primitively). There are two parameters which affect whether you're likely to see the effect or not: One is the number of suspension points (real sends) in the method. The more you have, the more likely you are to be affected. The second one is whether the method can tolerate having the process removed "underneath its feet". Both are far worse in Pharo.
Cheers, - Andreas
Cheers, Adrian
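The interleaving Andreas describes can be illustrated with a small workspace sketch. This is only a single-process simulation, assuming a Squeak or Pharo 1.x image: a plain LinkedList stands in for the semaphore's queue of waiting processes, a plain Link stands in for the process, and an explicit removeFirst stands in for the signal arriving between the two steps of a lookup-then-unlink removal. The variable names (waiters, proc, foundBeforeSignal, outcome) are made up for the illustration and do not appear in the image.

```smalltalk
"Simulation only: no real Semaphore, Process, or scheduler is involved."
| waiters proc foundBeforeSignal outcome |
waiters := LinkedList new.   "stands in for the semaphore's list of waiters"
proc := Link new.            "stands in for the waiting process"
waiters add: proc.

"Step 1 of a two-step removal: look the link up. It is still queued."
foundBeforeSignal := waiters includes: proc.

"Simulated signal: the first waiter is dequeued behind the remover's back."
waiters removeFirst.

"Step 2: unlink what step 1 found. The link is gone, so the absent block runs."
outcome := waiters remove: proc ifAbsent: [#alreadyRemoved].
```

Evaluated in a workspace, foundBeforeSignal should be true while outcome ends up as #alreadyRemoved: the lookup and the unlink disagree about the state of the list. In the stack trace further down, the corresponding absent block is the one in removeLink: that raises 'no such method!'.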
On Dec 31, 2010, at 10:08 , Andreas Raab wrote:
Revert LinkedList>>remove:ifAbsent: back to the version in Squeak and your problems will go away.
Cheers,
- Andreas
On 12/30/2010 11:50 PM, Adrian Lienhard wrote:
Thanks Andreas and David for the responses!
In the meantime I've gathered more information. From the mail of Andreas I assumed that the most likely reason for the freeze is that the timer event loop throws an unhandled exception and therefore gets suspended.
So I added a guard to catch any error in handleTimerEvent, restart the loop, and then pass the exception to open a debugger:
runTimerEventLoop
    [RunTimerEventLoop] whileTrue: [
        [ self handleTimerEvent ]
            on: Error
            do: [ :e |
                self startTimerEventLoop.
                ...write a warning to stdout...
                e pass ] ]
And voila, after 10 days or so I got the stack trace below.
I haven't had time to dive into it, but from the stack it looks like a concurrency issue in LinkedList (although I wonder whether that's possible, since the timer event loop runs at the highest priority...).
Maybe something catches somebody's eye.
Cheers, Adrian
THERE_BE_DRAGONS_HERE
Error: no such method!
30 December 2010 10:32:28 pm

VM: unix - i686 - linux - Squeak3.10.2 of '5 June 2008' [latest update: #7179]
Image: Pharo1.1 [Latest update: #11410]

Semaphore(Object)>>error:
    Receiver: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
    Arguments and temporary variables:
        t1: 'no such method!'
    Receiver's instance variables:
        firstLink: a Process in [] in DelayWaitTimeout>>wait
        lastLink: a Process in [] in DelayWaitTimeout>>wait
        excessSignals: 0

[] in Semaphore(LinkedList)>>removeLink:
    Receiver: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
    Arguments and temporary variables:

    Receiver's instance variables:
        firstLink: a Process in [] in DelayWaitTimeout>>wait
        lastLink: a Process in [] in DelayWaitTimeout>>wait
        excessSignals: 0

Semaphore(LinkedList)>>removeLink:ifAbsent:
    Receiver: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
    Arguments and temporary variables:
        aLink: a Process in [] in DelayWaitTimeout>>wait
        aBlock: [self error: 'no such method!']
        tempLink: nil
    Receiver's instance variables:
        firstLink: a Process in [] in DelayWaitTimeout>>wait
        lastLink: a Process in [] in DelayWaitTimeout>>wait
        excessSignals: 0

Semaphore(LinkedList)>>removeLink:
    Receiver: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
    Arguments and temporary variables:
        aLink: a Process in [] in DelayWaitTimeout>>wait
    Receiver's instance variables:
        firstLink: a Process in [] in DelayWaitTimeout>>wait
        lastLink: a Process in [] in DelayWaitTimeout>>wait
        excessSignals: 0

Semaphore(LinkedList)>>remove:ifAbsent:
    Receiver: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
    Arguments and temporary variables:
        aLinkOrObject: a Process in [] in DelayWaitTimeout>>wait
        aBlock: []
        link: a Process in [] in DelayWaitTimeout>>wait
    Receiver's instance variables:
        firstLink: a Process in [] in DelayWaitTimeout>>wait
        lastLink: a Process in [] in DelayWaitTimeout>>wait
        excessSignals: 0

Process>>suspend
    Receiver: a Process in [] in DelayWaitTimeout>>wait
    Arguments and temporary variables:
        t1: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
    Receiver's instance variables:
        nextLink: nil
        suspendedContext: [] in DelayWaitTimeout>>wait
        priority: 30
        myList: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
        errorHandler: nil
        name: 'seaside'
        env: nil

DelayWaitTimeout>>signalWaitingProcess
    Receiver: a DelayWaitTimeout(10000 msecs)
    Arguments and temporary variables:

    Receiver's instance variables:
        delayDuration: 10000
        resumptionTime: 217048389
        delaySemaphore: a Semaphore(a Process in [] in DelayWaitTimeout>>wait)
        beingWaitedOn: false
        process: a Process in [] in DelayWaitTimeout>>wait
        expired: true

Delay class>>handleTimerEvent
    Receiver: Delay
    Arguments and temporary variables:
        t1: 217128602
        t2: nil
    Receiver's instance variables:
        superclass: Object
        methodDict: a MethodDictionary(#adjustResumptionTimeOldBase:newBase:->(Delay>>#...etc...
        format: 138
        instanceVariables: #('delayDuration' 'resumptionTime' 'delaySemaphore' 'beingWa...etc...
        organization: ('as yet unclassified' adjustResumptionTimeOldBase:newBase: being...etc...
        subclasses: {MonitorDelay. DelayWaitTimeout}
        name: #Delay
        classPool: a Dictionary(#AccessProtect->a Semaphore() #ActiveDelay->a Delay(10 ...etc...
        sharedPools: nil
        environment: a SystemDictionary(lots of globals)
        category: #'Kernel-Processes'
        traitComposition: {}
        localSelectors: nil

[] in Delay class>>runTimerEventLoop
    Receiver: Delay
    Arguments and temporary variables:

    Receiver's instance variables:
        superclass: Object
        methodDict: a MethodDictionary(#adjustResumptionTimeOldBase:newBase:->(Delay>>#...etc...
        format: 138
        instanceVariables: #('delayDuration' 'resumptionTime' 'delaySemaphore' 'beingWa...etc...
        organization: ('as yet unclassified' adjustResumptionTimeOldBase:newBase: being...etc...
        subclasses: {MonitorDelay. DelayWaitTimeout}
        name: #Delay
        classPool: a Dictionary(#AccessProtect->a Semaphore() #ActiveDelay->a Delay(10 ...etc...
        sharedPools: nil
        environment: a SystemDictionary(lots of globals)
        category: #'Kernel-Processes'
        traitComposition: {}
        localSelectors: nil

BlockClosure>>on:do:
    Receiver: [self handleTimerEvent]
    Arguments and temporary variables:
        exception: Error
        handlerAction: [:e | self startTimerEventLoop. FileStream fileNamed: '/dev/...etc...
        handlerActive: false
    Receiver's instance variables:
        outerContext: Delay class>>runTimerEventLoop
        startpc: 108
        numArgs: 0

Delay class>>runTimerEventLoop
    Receiver: Delay
    Arguments and temporary variables:

    Receiver's instance variables:
        superclass: Object
        methodDict: a MethodDictionary(#adjustResumptionTimeOldBase:newBase:->(Delay>>#...etc...
        format: 138
        instanceVariables: #('delayDuration' 'resumptionTime' 'delaySemaphore' 'beingWa...etc...
        organization: ('as yet unclassified' adjustResumptionTimeOldBase:newBase: being...etc...
        subclasses: {MonitorDelay. DelayWaitTimeout}
        name: #Delay
        classPool: a Dictionary(#AccessProtect->a Semaphore() #ActiveDelay->a Delay(10 ...etc...
        sharedPools: nil
        environment: a SystemDictionary(lots of globals)
        category: #'Kernel-Processes'
        traitComposition: {}
        localSelectors: nil

[] in Delay class>>startTimerEventLoop
    Receiver: Delay
    Arguments and temporary variables:

    Receiver's instance variables:
        superclass: Object
        methodDict: a MethodDictionary(#adjustResumptionTimeOldBase:newBase:->(Delay>>#...etc...
        format: 138
        instanceVariables: #('delayDuration' 'resumptionTime' 'delaySemaphore' 'beingWa...etc...
        organization: ('as yet unclassified' adjustResumptionTimeOldBase:newBase: being...etc...
        subclasses: {MonitorDelay. DelayWaitTimeout}
        name: #Delay
        classPool: a Dictionary(#AccessProtect->a Semaphore() #ActiveDelay->a Delay(10 ...etc...
        sharedPools: nil
        environment: a SystemDictionary(lots of globals)
        category: #'Kernel-Processes'
        traitComposition: {}
        localSelectors: nil

[] in BlockClosure>>newProcess
    Receiver: [self runTimerEventLoop]
    Arguments and temporary variables:

    Receiver's instance variables:
        outerContext: Delay class>>startTimerEventLoop
        startpc: 144
        numArgs: 0

--- The full stack ---
Semaphore(Object)>>error:
[] in Semaphore(LinkedList)>>removeLink:
Semaphore(LinkedList)>>removeLink:ifAbsent:
Semaphore(LinkedList)>>removeLink:
Semaphore(LinkedList)>>remove:ifAbsent:
Process>>suspend
DelayWaitTimeout>>signalWaitingProcess
Delay class>>handleTimerEvent
[] in Delay class>>runTimerEventLoop
BlockClosure>>on:do:
Delay class>>runTimerEventLoop
[] in Delay class>>startTimerEventLoop
[] in BlockClosure>>newProcess