Hi,

Thanks for your example, I could reproduce it.

It seems that when the VM tries to grow the remembered set while there is not enough free space in old space to do so, it raises that error.

In your case, the remembered set grows in the middle of a GC, and at that point there is not enough free space in old space to allocate a larger remembered set. The full GC includes a scavenge; the scavenge tenures objects, which makes the remembered set grow, and since old space has not been reclaimed yet (that happens later in the full GC), there is not enough free space for it. I don't think we can run a scavenge at this point to shrink the remembered set (we're already in the middle of a scavenge, which is itself part of the full GC). Hence I think the best bet is to allocate a new old space memory segment; even though that operation can fail, it is still better than crashing. There are other solutions I can think of, but I don't like any of them.
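
For context, the shape of workload that produces this situation looks roughly like the sketch below (hypothetical illustration only, not Phil's actual repro; whether it actually triggers the crash depends on heap sizing and VM parameters): continuous transient allocation with the occasional retained result, so scavenges keep tenuring survivors and adding to the remembered set while old space is under pressure.

| retained |
retained := OrderedCollection new.
1 to: 1000000 do: [:i |
	| garbage |
	"churn: allocate plenty of short-lived objects"
	garbage := (1 to: 100) collect: [:j | Array new: 32].
	"occasionally keep a result so some survivors get tenured
	 and old objects end up referencing young ones"
	i \\ 1000 = 0 ifTrue: [retained add: garbage]].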

In SpurGenerationScavenger>>growRememberedSet, we have:

...
newObj := manager allocatePinnedSlots: numSlots * 2.
newObj ifNil:
[newObj := manager allocatePinnedSlots: numSlots + 1024.
newObj ifNil:
[self error: 'could not grow remembered set']].
...

If I replace:

self error: 'could not grow remembered set'

by:

(manager growOldSpaceByAtLeast: numSlots + 1024) ifNil: [self error: 'could not grow remembered set'].
newObj := manager allocatePinnedSlots: numSlots + 1024. "cannot fail"

Then your example works (in 5 min 45 sec on my machine).
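
For clarity, the allocation part of SpurGenerationScavenger>>growRememberedSet would then read roughly as below. This is only a sketch of the patch, assuming growOldSpaceByAtLeast: answers nil on failure, as in the ifNil: check above:

...
newObj := manager allocatePinnedSlots: numSlots * 2.
newObj ifNil:
	[newObj := manager allocatePinnedSlots: numSlots + 1024.
	 newObj ifNil:
		["Last resort: map a new old space segment instead of failing outright."
		 (manager growOldSpaceByAtLeast: numSlots + 1024) ifNil:
			[self error: 'could not grow remembered set'].
		 newObj := manager allocatePinnedSlots: numSlots + 1024 "cannot fail"]].
...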

I would like to have Eliot's opinion before integrating this, as I am not sure whether growing old space in the middle of a scavenge performed during a full GC is a good idea; there might be some strange, uncommon interactions with the rest of the GC logic that I don't see right now.

Eliot, what do you think?


On Thu, Oct 19, 2017 at 9:52 PM, Phil B <pbpublist@gmail.com> wrote:
 
Clément,

I was curious as to whether you or Eliot were able to get anything useful from this or not.

Thanks,
Phil


On Oct 13, 2017 2:52 AM, "Clément Bera" <bera.clement@gmail.com> wrote:
 


On Thu, Oct 12, 2017 at 9:41 PM, Phil B <pbpublist@gmail.com> wrote:
 
Clément,

On Oct 11, 2017 4:09 AM, "Clément Bera" <bera.clement@gmail.com> wrote:
 
Hi,

Without a way to reproduce, it is difficult to deal with the problem.

Hopefully, this will allow you to do so: https://github.com/pbella/VmIssueCouldNotGrow

This turned out to be tricky to provide a repro case for, since I'm not sure exactly what is triggering it, so I reproduced the type of work I'm throwing at the VM (it's a bulk parser/loader): lots of continuous allocation with the occasional saving of a result, which generates lots of garbage.  This should run in 5-10 minutes depending on the speed of your system.

The main caveat is that I'm only able to get this example to reproduce reliably with the included VM and the commented VM parameters applied.  So I'm not sure if this is an issue only with this particular VM/parameter combination or if it's just a generally difficult-to-reproduce issue.

Ok.

Today I am very busy.

I will try to have a look tomorrow; otherwise Eliot said he could have a look next week. A 5-10 min run means that if I want to simulate it, I will most likely need to start the simulation tonight and debug tomorrow morning.
 

Thanks,
Phil




--
Clément Béra
Pharo consortium engineer
Bâtiment B 40, avenue Halley 59650 Villeneuve d'Ascq






--
Clément Béra
Pharo consortium engineer
Bâtiment B 40, avenue Halley 59650 Villeneuve d'Ascq