Writing a good scheduler is hard. I didn’t mention the Scheduler Wars for nothing. 

Luckily, we have a much simpler problem than a multi-core machine with gazillions of processes. But we also have to admit that Squeak’s default scheduling has atrocious behavior, preempting yield or not. 

That’s because we have no actual scheduler that gives a fair share of time to all runnable processes. Instead, the VM simply gives 100% of CPU time to the runnable process with the highest priority. It will starve any other runnable process unless the high-priority process voluntarily gives up time. That’s “cooperative” scheduling, which was a bad idea in the 70s and unfortunately we never moved on (you may remember that the major reason Macs switched to Unix was scheduling). 

An actual scheduler would give fair time to all runnable processes in the system. It would give more time to higher-priority processes because that’s why they were given a higher priority. But it would still make sure that within a time slice (Linux defaults to 100ms, at least in the old simple scheduler) every runnable process gets at least some time. A runaway process would not freeze the whole system. 

Now, I haven’t thought of the full implications of that for Squeak yet. There is beauty in simplicity, and the cooperative scheduling we have is simple for sure. But it breaks down in cases where you need preemptive scheduling because the processes do not cooperate. 

Vanessa

PS: here’s an article describing the simple scheduler Linux used for its first decade (before its O(n) runtime became a problem):
https://dev.to/satorutakeuchi/a-brief-history-of-the-linux-kernel-s-process-scheduler-the-very-first-scheduler-v0-01-9e4

On Tue, Mar 26, 2024 at 21:29 Chris Muller <ma.chris.m@gmail.com> wrote:
Hi everyone,  :)

First, interesting discussion, thanks for your passionate and stimulating engagement.  I'm reminded of our past debates where I learned a lot over the years challenging your positions, including when you challenged mine.

Given how dug in you seem to be, I don't really see an economic path to consensus this time.  However, I do hope you'll oblige me with another round of exploring the question to see if we can glean any final tidbits.  I perceive your motivation to be to provide Squeak with the most flexibility out-of-the-box with regard to supporting multiple, alternative concurrency disciplines like cooperative.  I respect that, even though I don't think preemptProcessYields (PPY) false achieves that, and my view being motivated by "utility visible from the outside looking in at the Scheduler".  That's why, I tried to read Igors blog, my eyes glazed over, because it's written from the view inside the Scheduler looking out.

My goal now is just to alleviate _new_ concerns your last points have just raised in me.  It was mentioned how the developer can / should write their own Scheduler to handle the same-priority Process selection however they want.  My first question I hope you will help me with is, "If I wrote such a custom scheduler, but made it behave exactly like the current one does when PPY is set true by choosing the longest-suspended Process, would there be something fundamentally wrong or unsafe about that?"

Your answer to this should help me also figure out if you're simply insisting that "the *default setting* MUST be false."?  OR, are you saying, "a Scheduler that chooses the longest-waiting Process within the priority group should never be made, because to do that is fundamentally a bug"?  

Finally, with respect to the idea:

> For example, the server could introduce a high-priority process that wakes up periodically (looping on a delay). On 
> waking up it inspects the current set of running request serving processes, and if the current one has taken more than 
> its time-slice the supervisor process can send yield to it, moving it to the back of the queue.

My final question is:  "What are the possible goals behind the design of the Process-selection algorithm?"  Your suggestion of, if "the current one has taken more than its time-slice" suggests this may be where you only glossed over the details in your mind.  It has to be ready to face the possibility that all the Processes within the priority-group are "overdue", so it's a criteria of order, not explicit selection.  And what you're doing by pretending there is such a thing as a "current" within each priority group is attributing additional artificial "priority" to a single Process within each group even though their stated #priority is no higher than any of the others in their group.  Show me one place in Squeak that mentions the notion of a "current" Process within a priority group.  I would love to be proven wrong, but it doesn't exist.

So if the most generic responsibility of the algorithm is, "what is the BEST Process instance, generically, within a priority-group to run next, relative to each other," then at the top of my list are:

    selection algorithm:  resume the longest-suspended Process within the priority-group
    rationale:  optimize for smoother response-time curve amongst multiple clients

    selection algorithm:  resume the shortest-suspended Process within the priority group
    rationale:   optimize for ???  (<--- anyone?)

    selection algorithm:  resume a Process within the priority group at random
    rationale:   optimize for ???

So what other possible orders are there, and which one is the best choice for generic behavior?

I know time is tight.  For me too.  I hope these questions provoked some useful thoughts.  Thanks for any clarifications.

Best Regards,
  Chris