On 28 February 2012 13:13, Igor Stasenko siguctua@gmail.com wrote:
On 27 February 2012 22:15, Eliot Miranda eliot.miranda@gmail.com wrote:
let me retry *with* the attachment :(
On Mon, Feb 27, 2012 at 12:03 PM, Igor Stasenko siguctua@gmail.com wrote:
On 27 February 2012 10:53, Mariano Martinez Peck marianopeck@gmail.com wrote:
On Mon, Feb 27, 2012 at 5:20 AM, Eliot Miranda eliot.miranda@gmail.com wrote:
Hi Mariano,
On Sun, Feb 26, 2012 at 8:58 AM, Mariano Martinez Peck marianopeck@gmail.com wrote:
Hi. I have faced a VM crash while using Nautilus browser. It took me a while, but I finally could make a reproducible crash from image startup. You can find the image here:
https://gforge.inria.fr/frs/download.php/30280/Marea.104-Crash.1.image.zip
What the image is running at startup that causes the crash is:
| nautilus model ui| Nautilus instVarNamed: 'groups' put: nil. model := Nautilus open. ui := model ui. ui groupsButtonAction.
If you need more about the "domain", we can ask Ben, Nautilus developer. From what I can see in GDB, it crashes in #mapStackPages because it does a remap to an OOP that is 0 (zero)
while (theSP <= frameRcvrOffset) { oop = longAt(theSP); if (!((oop & 1))) { longAtput(theSP, remap(oop)); } theSP += BytesPerWord; }
Any ideas?
The image overflows the weakRoots table in scanning stack pages. The weakRoots table registers weak objects for scanning at the end of a GC. It is, unfortunately, fixed size (~2600 entries), and there are lots of WeakMessageSends and WeakAnnouncementSubscriptions on the stack.
I found this using aDebug VM with assert enabled (i.e. compiled with NDEBUG /not/ defined). I increased the table size to 3000 then 6000 before finding it no longer crashed with a weakRoots table size of 12000.
wow, I never imagine about that.
a) Looks like weakRoots' size should be configurable either via a start-up flag or an image header constant (with e.g. vmParameter accessors).
yes, with vmParameter would be nice, like the external semaphore table.
b) overflowing the weakRoots table (and possibly other tables) should probably cause the VM to abort with a useful error message.
please! :)
I have check in the image, before reproducing the bug, and it is not that bad:
WeakMessageSend instanceCount 755. WeakAnnouncementSubscription instanceCount 538
So...maybe when I do the stuff that reproduces the crash there is ANOTHER bug (say a loop for example), that cause to have much more instances of those weak stuff?
hmm.. i hardly believe that UI needs such amount of weak messages to wire the stuff.. but it is hard to tell, since i'm not an author.
Take a look at the attached. It is taken form the image at a point where an incrementalGC is performed when the weakRootTable has 6000 or more elements. It shows a very deep call stack full of WeakAnnouncementSubscriptions.
Also, answering Stephane's question: AFAIK, a weak roots table size is
not liearly depending on the total number of all weak containers in your image.
The number of weak containers does define an upper bound on the size of the table. It doesn't necessarily correlate to how many containers are encountered in an incremental GC.
But i might be wrong. Eliot, can you please explain how this weak roots table populated and what triggers addition of new element(s) to it, and freeing the entry.
So when an incremental GC is performed, any weak collections encountered must be scanned later, after the mark phase of non-objects have completed, so that the GC can discover which elements of weak collections are unmarked and nil these collections. So in markAndTrace any encountered weak objects get added as "roots" to the weakRootsTable. Later (either in incrementalGC or fullGC) the weak table is processed and unmarked referents in the weak arrays in the weak table are nilled. Hence the weak table fills during the mark phase and is emptied in the nilling phase.
But in reading the code more carefully I notice that the weak roots table is not used during a full GC. Instead, during a fullGC nilling is done as each weak container is encountered. I don't understand how this works yet. Anyone care to explain?
yes, i noticed that too.
Weak roots are used only during incremental GC but not full GC, which makes sense:
incrementalGC ... weakRootCount := 0. statSweepCount := statMarkCount := statMkFwdCount := statCompMoveCount := 0. self markPhase: false. self assert: weakRootCount <= WeakRootTableSize. 1 to: weakRootCount do: [:i| self finalizeReference: (weakRoots at: i)]. ...
fullGC ... statSweepCount := statMarkCount := statMkFwdCount := statCompMoveCount := 0. self clearRootsTable. youngStart := self startOfMemory. "process all of memory" self markPhase: true. "Sweep phase returns the number of survivors. Use the up-to-date version instead the one from startup." totalObjectCount := self sweepPhaseForFullGC. ...
The bad thing is that , #markAndTrace: is used in both #incrementalGC and #fullGC and calls: self lastPointerOf: oop recordWeakRoot: true.
and that last 'true' is evil, because it will populate weak roots into table even during full GC which means that a total number of weak containers in a whole image should not be more than (err.. how much you said?) otherwise *!crash!*. Also, note that #fullGC method even doesn't cares to reset weakRootCount to zero, which means that it can be any value, which was the last number of weak containers found during last incremental GC , so its enough to have (1+WeakRootTableSize/2) weak containers discovered by incremental GC, for full GC to start corrupting a memory not saying about higher numbers.
To fix that , a #fullGC should either use different #markAndTrace: method, which won't records weak roots, or actually anything, which will lead to following change:
self lastPointerOf: oop recordWeakRoot: true. to be: self lastPointerOf: oop recordWeakRoot: (fullGC not).
i think it is easy to imagine an image with lots of weak containers, while only few of them (a reasonable number ;) can be discovered during single incremental GC.
Another thing, which i would do is to trigger a full GC, if during incremental GC a weak roots table overflows. But its hard to tell, how easy to implement abortion of marking phase due to table overflow..
And is the weak roots table size limit reasonably good? Needless to say, that nobody likes when system hits the wall of hardcoded limits.
Hmmm... In VisualWorks, which has a two-space copying generational GC there is no weak root table during incremental GC. Instead the list of weak objects is threaded through the corpses left behind in from space. So at least for some GC designs a weak roots table isn't even needed. I will copy this scheme in my new object representation/GC. But for now I think just providing a parameter to determine the maximum size is sufficient.
i actually wonder do we really need this structure at all. an incremental GC works similarly to full one, except that it operates on smaller heap size. so, this piece of code: 1 to: weakRootCount do: [:i| self finalizeReference: (weakRoots at: i)].
can be replaced by heap-walking procedure from:
youngStart to: memoryEnd
finding all weak containers and checking them. (we can even leave the counter, so if count = 0, you don't do heap walk, and if it >0 , you decrement it by 1 when discovering weak container object, so heap walk will terminate once counter will reach 0).
A dirty fix to this would be directly in :
lastPointerOf: oop recordWeakRoot: recordWeakRoot "<Boolean>" .... [recordWeakRoot ifTrue: ["And remember as weak root" weakRootCount := weakRootCount + 1.
( weakRootCount <= WeakRootTableSize) ifTrue: [ weakRoots at: weakRootCount put: oop ] ifFalse: [ self isFullGC ifFalse: [ self handleWeakRootsTableOverflow ] "and just ignore if full gc" ] ].
now if we put:
weakRootCount := WeakRootTableSize + 1.
in #fullGC method , this will result in ignoring weak roots during full gc mark phase.
ohh ... well, since i'm using #isFullGC it actually makes sense to just put:
(recordWeakRoot and: [ isFullGC not ])
to get bug busted. Just not sure, if extra memory read (checking full gc flag ) will impact the performance that much..
actually for performance, its better to actually 'call' #markAndTrace: with right argument, then it will be inlined and compiler will produce no-op there.