My general understanding of process scheduling is that it is FIFO within same priority. Indeed Pharo has a test for this ProcessTest>>testSchedulingIsFirstComeFirstServed, which however I thought was a bit fragile and I was looking to revise.
But now I'm curious whether same-priority-process FIFO scheduling is an expected guarantee? On both Pharo 5 build 50560 & Squeak 5 build 15113 the following test fails...
[ 1 to: 100000 do: [ :n | | ranFirst ranSecond | [ [ ranFirst ifNil: [ ranFirst := 1 ] ifNotNil: [ ranSecond := 1] ] forkAt: 78. [ ranFirst ifNil: [ ranFirst := 2 ] ifNotNil: [ ranSecond := 2] ] forkAt: 78. ] forkAt: 79. self assert: ranFirst=1. self assert: ranSecond=2. ] ] fork.
... but only after 20k iterations, which seems an "almost" guarantee. Interestingly, it doesn't fail when run without the outer fork. Also if I take an alternate approach with less message calls...
faults := OrderedCollection new. done := Semaphore new. [ 1 to: 10000000 do: [ :n | | ranFirst ranSecond | [ [ ranFirst ifNil: [ ranFirst := 1 ] ifNotNil: [ ranSecond := 1] ] forkAt: 78. [ ranFirst ifNil: [ ranFirst := 2 ] ifNotNil: [ ranSecond := 2] ] forkAt: 78. ] forkAt: 79. (ranFirst=1 and: ranSecond=2) ifFalse: [ faults add: { n . ranFirst . ranSecond } ]. ]. done signal. ] fork. done wait. faults inspect.
I only get seven failures in 10 million iterations. "an OrderedCollection( #( 258535 2 1) #(1148605 2 1) #(3502820 2 1) #(4010935 2 1) #(4533713 2 1) #(6301878 2 1) #(8497001 2 1))"
So what I'm wondering is if there is a bug to hunt down, or if FIFO scheduling is simply not an expected guarantee?
cheers -ben
On Sun, Jan 31, 2016 at 6:16 PM, Ben Coman btc@openinworld.com wrote:
My general understanding of process scheduling is that it is FIFO within same priority. Indeed Pharo has a test for this ProcessTest>>testSchedulingIsFirstComeFirstServed, which however I thought was a bit fragile and I was looking to revise.
But now I'm curious whether same-priority-process FIFO scheduling is an expected guarantee? On both Pharo 5 build 50560 & Squeak 5 build 15113 the following test fails...
[ 1 to: 100000 do: [ :n | | ranFirst ranSecond | [ [ ranFirst ifNil: [ ranFirst := 1 ] ifNotNil: [ ranSecond := 1] ] forkAt: 78. [ ranFirst ifNil: [ ranFirst := 2 ] ifNotNil: [ ranSecond := 2] ] forkAt: 78. ] forkAt: 79. self assert: ranFirst=1. self assert: ranSecond=2. ] ] fork.
... but only after 20k iterations, which seems an "almost" guarantee. Interestingly, it doesn't fail when run without the outer fork. Also if I take an alternate approach with less message calls...
faults := OrderedCollection new. done := Semaphore new. [ 1 to: 10000000 do: [ :n | | ranFirst ranSecond | [ [ ranFirst ifNil: [ ranFirst := 1 ] ifNotNil: [ ranSecond := 1] ] forkAt: 78. [ ranFirst ifNil: [ ranFirst := 2 ] ifNotNil: [ ranSecond := 2] ] forkAt: 78. ] forkAt: 79. (ranFirst=1 and: ranSecond=2) ifFalse: [ faults add: { n . ranFirst . ranSecond } ]. ]. done signal. ] fork. done wait. faults inspect.
I only get seven failures in 10 million iterations. "an OrderedCollection( #( 258535 2 1) #(1148605 2 1) #(3502820 2 1) #(4010935 2 1) #(4533713 2 1) #(6301878 2 1) #(8497001 2 1))"
So what I'm wondering is if there is a bug to hunt down, or if FIFO scheduling is simply not an expected guarantee?
cheers -ben
A while ago in Pharo I added DelayNullScheduler, which stops the timingPriority(=80) process that schedules delays. After selecting it via World > System > Settings > System > Delay Scheduler I cannot reproduce any of the faults(?) above. (Aside: I just found performance a bit better if DelayNullScheduler>>schedule: first does a "Processor yield")
With the usual delay scheduler operating, the fault(?) can be bypassed by signalling the timingSemaphore...
timingSemaphore := Delay testCaseSupportTimingSemaphore. [ 1 to: 100000 do: [ :n | | ranFirst ranSecond | [ [ ranFirst ifNil: [ ranFirst := 1 ] ifNotNil: [ ranSecond := 1] ] forkAt: 78. [ ranFirst ifNil: [ ranFirst := 2 ] ifNotNil: [ ranSecond := 2] ] forkAt: 78. ] forkAt: 79. self assert: ranFirst=1. self assert: ranSecond=2. ] timingSemaphore signal. ] fork.
The fault(?) can also be bypassed by commenting the call to #intercyclePause: out of WorldState>>doOneCycleFor:
I'd be glad for any insights.
cheers -ben
vm-dev@lists.squeakfoundation.org