Hi Eliot and Chris,

On 2024-03-28 15:25, Eliot Miranda wrote:

Hi Dave, Hi All,

On Mar 27, 2024, at 6:28 PM, lewis@mail.msen.com wrote:

On 2024-03-27 23:50, Tim Rowledge wrote:

Just an observation that might, maybe, trigger some useful thoughts - I use Seaside quite a lot and I can't really think of any scheduling issues that have ever come up. I can be doing development work in a Seaside image that is actively serving pages without any problems getting annoying. I don't recall ever digging into Seaside scheduling or process fiddling, but perhaps a Clever Thing is being done?

 
 
My possibly useful thought is that your observation is right on target. Process scheduling is not going to be a noticeable problem unless the Process being scheduled is doing something unreasonably expensive.
 
In the case of the squeaksource.com issue, and possibly also the source.squeak.org image, the unreasonably expensive process may be the save-repository-to-serialized-object-file process. I think it is triggered each time something changes in the repository, such as somebody uploading a new MCZ package. The repository object is very large (20 years ago it was very small), and it now takes at least a minute to serialize to disk. Regardless of the scheduling priority in Squeak, that is going to take some time.
 
I noticed this on squeaksource.com because I was watching the load on dan.box.squeak.org, and right after I pushed an update to one of my projects the system got quite busy. I could see that the image was saving a data.obj copy of the repository after I did the update, and the CPU finally went back to normal a minute or two later when the save was complete.

Although the source.squeak.org image works differently, I noticed in a previously saved copy of that image that had gone sluggish that the processes in the process browser seemed to be forked blocks waiting to do repository saves. I don't know if I interpreted this correctly, but I can't help thinking that the event-driven repository saves might be the problem.

 
So introduce a queue for save requests and service them in a lower-priority process than the processes serving user requests. Interaction is with the in-image model. The lower-priority process doing the saves can elide intervening saves if it gets behind, so the system saves as often as necessary when lightly loaded, and as often as possible, while prioritizing user responsiveness, when heavily loaded.
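Roughly this shape, sketched in Python for clarity (everything here is illustrative, not SqueakSource code; in the image it would presumably be a block forked at something like Processor userBackgroundPriority, with a Semaphore or Mutex in place of the condition variable):

```python
import threading

class CoalescingSaver:
    """Run saves on a background thread, collapsing bursts of requests.

    save_fn is the expensive serialization action. Any requests that
    arrive while a save is in progress are folded into one follow-up
    save, so the saver never falls further behind than one pending save.
    """

    def __init__(self, save_fn):
        self.save_fn = save_fn
        self.dirty = False
        self.cond = threading.Condition()
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def request_save(self):
        # Called by the request-serving threads: cheap, no I/O, no waiting
        # on the save itself.
        with self.cond:
            self.dirty = True
            self.cond.notify()

    def _run(self):
        while True:
            with self.cond:
                while not self.dirty:
                    self.cond.wait()
                # Clear the flag before saving; requests arriving during
                # the save re-set it, and all of them collapse into the
                # next single save.
                self.dirty = False
            self.save_fn()  # long-running serialization, outside the lock
```

The point is that ten uploads in quick succession cost one or two serializations, not ten, and the serving threads never block on disk.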
 
If the model needs to be locked while serialising, then take a copy (which will be shallow at the leaves, deep in the branches, because only changeable data needs to be copied) and serialise the copy. The copy operation should be much faster than the serialisation.
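The copy discipline I mean, again as an illustrative Python sketch (in the image this would be a specialised copy method on the repository model; the names are made up):

```python
def snapshot(node):
    """Copy the branch (container) structure, share the leaf objects.

    Only the containers can change while a save is pending; the leaves
    (e.g. already-uploaded package versions) are effectively immutable,
    so sharing them keeps the copy far cheaper than the serialisation
    it shields.
    """
    if isinstance(node, dict):
        return {key: snapshot(value) for key, value in node.items()}
    if isinstance(node, list):
        return [snapshot(value) for value in node]
    return node  # leaf: shared, not copied
```

Once the snapshot exists the lock can be released, and the slow serialisation runs against a structure nothing else is mutating.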
 
I'd happily collaborate on this but I need pointers to the code and instructions on how to interact with the running server.
 


Perhaps Chris can comment; I'm not actually familiar with that part of the SqueakSource code. In particular, I don't really know how and when the repository save events happen.

I'm definitely happy to help and to provide access to the squeaksource.com image (requires box-admin access to a login account though). I also have access to the source.squeak.org service, but would want to work through/with Chris on anything that gets done there.

Note that my observations were based on watching files being slowly written to disc while also watching /usr/bin/top. The activity also correlates with log messages written to the ss.log file, which is what made me suspect the repository save mechanism.

Dave