Hi Andreas,

First, I agree with you: my fix should NOT be included in the main VM/image.
It does, however, fix my personal problems. When they first appeared about three months ago, I fixed them by compiling the VM without optimisation. Unfortunately, the fact that this helped is a bad sign. It indicates either that the difference between working and non-working code lies in an area where the compiler has the right to choose, or that there is a bug in the compiler. A compiler bug is very unlikely, but so is a bug in the interpreter/garbage collector.
It is unfortunate that it fixes my problems, but the fact that it does isolates them a lot, which is very fortunate. Yes, I'm working with a custom VM. But I managed to run my version of your test and produce a nil even with a stock VM, though not with a stock image. Thinking about it, I don't think my version (with the message send) should behave any differently from yours. Weird.
That my fix solves my problems does narrow things down. It's something that can stop a root weak object from being collected. That implies that the mark bit is set, and that bit should not be set. I very much doubt that my code is setting it; that would involve my code producing an otherwise good header word with a bad mark bit, which is highly unlikely.
Only in one place do I deal with headers, and that is the part of the test suite that crashes. However, the test that crashes does nothing, and I know this because I've stepped through the machine code, instruction by instruction, multiple times over a three-month period. Actually, that specific test verifies that Exupery does not add anything to the root table when both objects in an assignment are old. Unfortunately, this also rules out unexpected GCs, because a call instruction would definitely be noticeable.
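For readers less familiar with generational collectors, the rule that test exercises can be sketched roughly as follows: after a store, the target object only needs to go into the root table when an old object is made to point at a young one, so an old-into-old assignment should leave the table untouched. This is a minimal illustration in C, assuming an address-based old/young split (everything at or above a hypothetical `youngStart` is young) and a hypothetical `ROOT_BIT` header flag. The names and layout are illustrative only, not the Squeak VM's or Exupery's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical object layout: just a header word. */
typedef struct Obj {
    uintptr_t header;
} Obj;

/* Hypothetical header flag meaning "already recorded in the root table". */
#define ROOT_BIT ((uintptr_t)1 << 30)
#define ROOT_TABLE_SIZE 2500

static uintptr_t youngStart;               /* objects at or above this address are young */
static Obj *rootTable[ROOT_TABLE_SIZE];    /* old objects that may point into young space */
static int rootTableCount = 0;

static int isYoung(const Obj *o) {
    return (uintptr_t)o >= youngStart;
}

/* Called after storing valueObj into a slot of targetObj.
   Only an old target receiving a young value is recorded, and only once. */
static void writeBarrier(Obj *targetObj, const Obj *valueObj) {
    if (!isYoung(targetObj) && isYoung(valueObj) && !(targetObj->header & ROOT_BIT)) {
        /* Sketch only: a real VM would force a collection when the table fills
           up instead of asserting. */
        assert(rootTableCount < ROOT_TABLE_SIZE);
        targetObj->header |= ROOT_BIT;
        rootTable[rootTableCount++] = targetObj;
    }
}
```

With this rule, an assignment where both objects are old never touches `rootTable`, which is the property the crashing test checks.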
So my situation is this: I have a bug that is possibly caused by the garbage collector, and I have a fix that works. Unfortunately, the fix works for the wrong reasons, which is at least enlightening, especially with your help. I can continue working with my fix, but that leaves the real bug undiscovered, or I can spend more time chasing a better fix. Given that my fix solves my problem, it really isolates the kind of issue, and that kind of issue is not something my VM modifications could cause, especially as I've single-stepped through the machine code I'm running.
Currently, I feel that I should release the next Exupery version with a Linux VM that includes my fix, see how that VM works in real use for a few weeks rather than just under explicit testing, and hope inspiration strikes or (more likely) a better time comes to chase this bug further. Releasing a modified VM is necessary to let people play with Exupery without needing to compile the VM themselves.
Oh, the stock VM was a 3.4-2 Linux VM from Ian's site. My compiled VMs were modified versions built from Ned's SourceForge VM branch with the latest version of VMMaker.
Exupery does require a few VM modifications to run. First, it needs the addresses of various VM variables for code generation. Second, it needs to modify the message-sending code so it can override methods with compiled code. This is why, until I had that fix, I assumed the bug was due to my code. However, to test rootTable updating I run global collects frequently, and this produces the bug that I see. When testing the assignment, I run identical code elsewhere without the garbage collect, and that does not crash. The test that causes the crash does not update the rootTable; I've checked this both by reading the generated assembly and by single-stepping through the machine code while watching the contents of the rootTable (only four entries in this case).
If there is interest, I'm happy to chase this further now. If it isn't affecting anybody else, I'll leave it until a better time. A better time would be when working on the Exupery/VM integration, which is the guts of the next release, or on things that involve GC interaction, such as inlining code where type tests need types that are objects the garbage collector can move.
Bryce