bryce@kampjes.demon.co.uk wrote:
You'd need to serialise object creation and access to the root table in the write barrier. That may be possible without too much work, but there's likely to be some overhead.
Yes. Such is the price of shared state concurrency.
Providing a parallel object memory as part of a garbage collector rewrite that speeds up single-CPU code should be possible. The major design change would be switching the write barrier from a remembered set to card marking. Unfortunately, that might make it necessary to separate pointer object space from byte storage space.
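For readers unfamiliar with the two barrier styles mentioned above, here is a minimal sketch in C. It is not taken from the Squeak VM: the names (heap, root_table, card_table, store_remembered, store_card_marked) and the sizes are hypothetical, and the remembered-set version leaves out the old-to-young filtering a real barrier would do first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEAP_WORDS      4096
#define HEAP_BYTES      (HEAP_WORDS * sizeof(uintptr_t))
#define CARD_SHIFT      9                     /* 512-byte cards (assumed size) */
#define NUM_CARDS       ((HEAP_BYTES + (1u << CARD_SHIFT) - 1) >> CARD_SHIFT)
#define ROOT_TABLE_SIZE 256

uintptr_t heap[HEAP_WORDS];

/* Remembered set: an explicit table of updated slots plus a shared count. */
uintptr_t *root_table[ROOT_TABLE_SIZE];
size_t root_count = 0;

/* Card table: one byte per fixed-size region ("card") of the heap. */
unsigned char card_table[NUM_CARDS];

/* Remembered-set barrier: store, then append the slot to the table.
   Returns 0 if the table is full (a real VM would collect at that point).
   root_count is shared mutable state, so concurrent writers would have
   to serialise on it. */
int store_remembered(uintptr_t *slot, uintptr_t value) {
    *slot = value;
    if (root_count == ROOT_TABLE_SIZE)
        return 0;
    root_table[root_count++] = slot;
    return 1;
}

/* Card-marking barrier: store, then set the byte for the slot's card.
   Marking is idempotent and touches no shared counter. */
void store_card_marked(uintptr_t *slot, uintptr_t value) {
    size_t offset = (size_t)((unsigned char *)slot - (unsigned char *)heap);
    assert(offset < HEAP_BYTES);              /* slot must lie inside the heap */
    *slot = value;
    card_table[offset >> CARD_SHIFT] = 1;
}
```

With card marking the collector later has to scan every word in a dirty card for pointers, which is presumably why raw byte data might need to live apart from pointer objects.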
Given the sensitivity of GC algorithms in real-world situations, rewriting the garbage collector is not my idea of a practical approach. The overhead of an atomic allocator/GC mechanism can be reduced to a single decrement and test in the default single-threaded case, so the price is only paid when you run multiple threads. That seems like a more straightforward approach, in particular if one can tweak the GC parameters to run GC less often (and so incur the overhead less often).
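The message above does not spell out the mechanism, so the following is only one possible reading, sketched in C11: a gate whose uncontended entry is one atomic decrement and a test, wrapped around a toy bump allocator. All names (gate, gate_enter, gate_leave, alloc_object) are hypothetical, the contended path simply yields and retries, and alignment is ignored.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* 1 = free; 0 or below = held (possibly with waiters mid-retry). */
static atomic_int gate = 1;

/* Toy bump-allocated heap standing in for object memory. */
static unsigned char heap[1 << 16];
static size_t heap_top = 0;

void gate_enter(void) {
    /* Fast path: a single atomic decrement and a test of the old value. */
    while (atomic_fetch_sub(&gate, 1) != 1) {
        /* Contended: undo our decrement, let the holder run, retry. */
        atomic_fetch_add(&gate, 1);
        sched_yield();
    }
}

void gate_leave(void) {
    atomic_fetch_add(&gate, 1);
}

/* Allocate nbytes from the heap, or return NULL if it does not fit. */
void *alloc_object(size_t nbytes) {
    void *result = NULL;
    gate_enter();
    assert(atomic_load(&gate) <= 0);          /* we hold the gate here */
    if (nbytes <= sizeof heap - heap_top) {
        result = &heap[heap_top];
        heap_top += nbytes;
    }
    gate_leave();
    return result;
}
```

With only one thread running, the loop body in gate_enter never executes, so each allocation pays just the decrement and test on entry and an increment on exit. Whether the single-threaded case could skip the atomic instruction altogether is left open here.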
Cheers,
  - Andreas