Hi Eliot,

I don't know the answer, but I can say that the changes between saves are always going to be very small, while the serialized squeaksource.com repository occupies 146MB of disc space, so it is quite large.

It's worth mentioning that I have been looking after the old squeaksource.com service for many years now, and I cannot recall a single case in which I made use of one of the automatically saved data.obj files. For cases in which I did major image updates, I'm pretty sure I just serialized it myself right before completing the update. So, as a backup and recovery mechanism it's not terribly important, and simply turning off the automatic repository saves would hurt nothing.

However, the situation may be quite different on source.squeak.org, where I believe that loading the repository from a data.obj file may be part of the image startup process (I am not sure of this though).
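
For anyone not familiar with the mechanism: a data.obj file is just a serialized object graph, so loading one amounts to something like the following (a minimal workspace sketch using the standard ReferenceStream API; the actual SSFilesystem code may do this differently):

    | stream repository |
    "read the repository graph back from disc; 'data.obj' is the conventional file name"
    stream := ReferenceStream fileNamed: 'data.obj'.
    [repository := stream next] ensure: [stream close].

Saving is the mirror image, with nextPut: in place of next.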

Dave


On 2024-03-29 20:29, Eliot Miranda wrote:

Hi Both,
 
    How long would it take, and how big would the result be, to copy the graph while shallow-copying only those objects that reference only objects that don't change?

_,,,^..^,,,_ (phone)

On Mar 29, 2024, at 12:56 PM, lewis@mail.msen.com wrote:

Hi Chris,

The method update that you sent adds a critical section for synchronization, but I see no evidence of synchronization problems in the ss.log files. Each 'BEGIN SAVING' is always followed by a 'DONE SAVING', so it does not look like we ever have two processes running this method at the same time.
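
To illustrate what I mean (a generic sketch only, with made-up selector and variable names, not your actual code): a critical section around the save would look roughly like

    saveRepository
        "Wrap the whole save in a critical section so two saves can never
         overlap; saveMutex would be a Semaphore forMutualExclusion."
        saveMutex critical:
            [self log: 'BEGIN SAVING'.
             [self serializeRepositoryToDisk]
                 ensure: [self log: 'DONE SAVING']]

Nothing in ss.log suggests we have ever needed that protection, though.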

But the method itself is interesting.  There is a retry loop in it that tries up to 10 times:

            "Because we're serializing a big object while its changing, there is a possibility the serialization will fail."
            triesRemaining := 10. 

On the squeaksource.com server, a repository save takes an average of about 5 1/2 minutes. If the retry loop is handling the case of the repository changing during those 5 1/2 minutes, it seems quite likely that this would happen when people are actively using the system. A 10-times retry policy might well lead to a total processing time on the order of an hour (up to 10 tries at roughly 5 1/2 minutes each is about 55 minutes) if the system were busy and you fell into the retry loop.
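
For reference, the general shape of such a retry pattern (just a sketch; apart from triesRemaining, the names here are made up and not taken from the actual method):

    | triesRemaining |
    triesRemaining := 10.
    [self serializeRepository]
        on: Error
        do: [:e |
            "serializing a large, changing graph can fail; retry a limited number of times"
            (triesRemaining := triesRemaining - 1) > 0
                ifTrue: [e retry]
                ifFalse: [e pass]]

With each attempt taking several minutes, repeatedly landing in that handler would be enough to account for the hour-long runs below.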

Noticing this, I pulled the log messages out of the ss.log for all repository saves since January 1. I looked at total processing duration from 'BEGIN SAVING' until 'DONE SAVING' and pulled out the top 24 processing times sorted by duration. There are three cases of the repository save running for well over an hour as well as a couple of 45 minute runs. All of these happened between Mar 20 and Mar 24, which is exactly the time frame in which I got worried about slow response time and high CPU load:

2024-03-20T15:19:05.335842+00:00 SSFilesystem DONE SAVING => 0:01:16:51.227981
2024-03-23T05:00:30.784172+00:00 SSFilesystem DONE SAVING => 0:01:14:31.366004
2024-03-20T16:29:24.942607+00:00 SSFilesystem DONE SAVING => 0:01:10:19.604764
2024-03-24T06:24:27.642523+00:00 SSFilesystem DONE SAVING => 0:00:45:40.476352
2024-03-24T05:38:47.164168+00:00 SSFilesystem DONE SAVING => 0:00:45:35.499997
2024-03-09T08:29:10.761091+00:00 SSFilesystem DONE SAVING => 0:00:17:12.286009
2024-03-22T22:32:23.862167+00:00 SSFilesystem DONE SAVING => 0:00:14:33.731995
2024-03-24T21:29:08.624166+00:00 SSFilesystem DONE SAVING => 0:00:12:22.215981
2024-03-09T07:44:09.783081+00:00 SSFilesystem DONE SAVING => 0:00:09:45.765823
2024-03-08T21:35:23.817078+00:00 SSFilesystem DONE SAVING => 0:00:09:44.209998
2024-03-24T14:59:00.770168+00:00 SSFilesystem DONE SAVING => 0:00:08:48.055976
2024-03-08T21:22:01.479093+00:00 SSFilesystem DONE SAVING => 0:00:08:47.844014
2024-03-01T15:08:03.471507+00:00 SSFilesystem DONE SAVING => 0:00:08:37.066421
2024-03-06T00:04:33.281516+00:00 SSFilesystem DONE SAVING => 0:00:08:16.202437
2024-02-01T02:19:37.511613+00:00 SSFilesystem DONE SAVING => 0:00:08:11.774005
2024-03-01T14:50:12.219079+00:00 SSFilesystem DONE SAVING => 0:00:08:05.161995
2024-03-24T15:06:40.470254+00:00 SSFilesystem DONE SAVING => 0:00:07:39.698083
2024-03-01T15:15:41.72317+00:00 SSFilesystem DONE SAVING => 0:00:07:38.246062
2024-03-09T15:10:55.945114+00:00 SSFilesystem DONE SAVING => 0:00:07:30.252033
2024-03-06T00:14:41.921079+00:00 SSFilesystem DONE SAVING => 0:00:07:29.582
2024-03-01T21:17:34.817093+00:00 SSFilesystem DONE SAVING => 0:00:07:21.842016
2024-02-28T00:08:45.99831+00:00 SSFilesystem DONE SAVING => 0:00:06:51.106038
2024-02-05T11:08:41.027609+00:00 SSFilesystem DONE SAVING => 0:00:06:42.185927
2024-02-28T00:18:09.073611+00:00 SSFilesystem DONE SAVING => 0:00:06:41.339976
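
The durations above are just the DONE SAVING timestamp minus the matching BEGIN SAVING timestamp. In a workspace that is a plain DateAndTime subtraction; for example, with made-up timestamps (not taken from the log):

    | begin done |
    begin := DateAndTime fromString: '2024-03-20T14:00:00+00:00'.    "made-up BEGIN SAVING time"
    done := DateAndTime fromString: '2024-03-20T15:16:51+00:00'.    "made-up DONE SAVING time"
    done - begin    "a Duration, printed in the 0:01:16:51 style used above"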

Dave


On 2024-03-29 05:02, Chris Muller wrote:

Hi Dave,
 
I just downloaded squeaksource.8.image from dan and took a look.  I see you abandoned the PersonalSqueakSource codebase back in Nov-2022.  That's too bad.  Part of what I'd hoped to accomplish with the renovation was not only a more responsive and resilient server, but also to have the relocation to /ss on source.squeak.org encourage collaboration from you and the community, so that we would eventually get to a point where questions like this:
 
> I'd happily collaborate on this but I need pointers to the code and instructions on how to interact 
> with the running server.
 
would be as universally known and natural as the Inbox process (although maybe that isn't saying much anyway).  Your comment in the unmerge version (SqueakSource.sscom-dtl.1147) mentions merge issues and startup problems.  I would've tried to help if you'd reached out.  Perhaps we can learn and gain just as much by remaining forked and cherry-picking from each other whatever we deem most appropriate.  I just noticed the performance improvement from Levente last September.  Before, I had dreamt that something like that would simply be committed to /ss by him, and that maybe it would send out an email, as commits to /trunk and /inbox do.  Then we admins could merge fixes into the servers whenever it was worthwhile to do so.

Note that my observations were based on watching files being slowly written to disc while also watching /usr/bin/top. The activity also correlates with log messages written to the ss.log log file, so that's what made me suspect issues with the repository save mechanism.

I don't think saving data.obj was, or is, related to the client slowness issues.  Why?  Because you're still using SSFilesystem from PersonalSqueakSource (which is good!), and it essentially does what Eliot described.  It forks the save at Processor userBackgroundPriority-1 (29), which is lower than the priority of client operations (30).  And although there appears to be a bug that will cause other client save operations to be blocked during the long serialization process (see the attached fix for that, if you wish), *read* operations don't wait on any mutex, so they should remain completely unblocked.  You'd still see 100% CPU during serialization, yes, but client responsiveness should still be fine because the clients' priority-30 processes preempt the serialization process.
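
Concretely, the fork looks something like this (sketch only, with a made-up selector; the priority arithmetic is the relevant part):

    "run the long serialization one priority level below the processes
     serving client requests, so that they preempt it whenever runnable"
    [self saveRepositoryToDisk]
        forkAt: Processor userBackgroundPriority - 1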
 
 - Chris