Hi Clément,

On Wed, May 2, 2018 at 11:42 PM, Clément Bera <bera.clement@gmail.com> wrote:
Hi Eliot,

I am more annoyed about using mmap to get memory at higher addresses, and about segment positioning, than about using mmap itself.

Allocating memory at higher addresses:
- is impossible on some platforms such as rumpkernel
- is annoying since it relies on APIs such as sbrk, which is deprecated in SUSv2 and not present at all in POSIX.

Malloc is not amazing, but it's much more portable. I would rather have something like:

#ifdef mmap

I think the thing to do is something like
- a cross platform define, USE_MALLOC (see e.g. USE_MMAP in the Unix sources)
- either files that implement each interface, e.g.
    sqUnixSpurMMapMemory.c, sqUnixSpurMallocMemory.c
  and a small file that includes one or the other
- or each function implemented twice, surrounded by #if USE_MALLOC ... #else ... #endif /* USE_MALLOC */
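A minimal C sketch of the second option, assuming hypothetical function names allocate_segment/free_segment (not the VM's actual API) and a USE_MALLOC flag that defaults to 0. Note that the malloc branch has to ignore the address hint entirely:

```c
#define _DEFAULT_SOURCE /* for MAP_ANON and posix_memalign on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#ifndef USE_MALLOC
# define USE_MALLOC 0 /* hypothetical flag, cf. USE_MMAP */
#endif

#if USE_MALLOC
/* posix_memalign gives alignment but offers no control over placement. */
static void *allocate_segment(void *hint, size_t size, size_t alignment)
{
    void *p = NULL;
    (void)hint; /* malloc-family allocators take no address hint */
    assert(size > 0);
    return posix_memalign(&p, alignment, size) == 0 ? p : NULL;
}

static void free_segment(void *p, size_t size)
{
    (void)size;
    free(p);
}
#else /* !USE_MALLOC */
#include <sys/mman.h>

/* mmap answers page-aligned memory and accepts a placement hint. */
static void *allocate_segment(void *hint, size_t size, size_t alignment)
{
    void *p;
    (void)alignment; /* assumes alignment <= page size */
    assert(size > 0);
    p = mmap(hint, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

static void free_segment(void *p, size_t size)
{
    munmap(p, size);
}
#endif /* USE_MALLOC */
```

The first option (two files plus a small file that includes one of them) would contain the same two bodies, just split across files.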

But before you go there, how do you hope to get posix_memalign to answer memory where you need it?  It seems to me that if you go with malloc then you're forced to allocate memory at start-up, because there's no guarantee that memory will appear at higher or lower addresses than the initial alloc.  This was the situation with the original V3 allocator; it did a large allocation, defaulting to 512MB, and allocated the heap from that.  One couldn't release memory back to the OS, one couldn't have a small footprint and be allowed to grow afterwards, etc.  And the only interface I know that allows address hints is the memory mapping one.
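To illustrate the point about address hints: posix_memalign has no address parameter at all, whereas mmap takes one. Without MAP_FIXED the kernel treats it only as a hint, so the caller has to check where the segment actually landed. A sketch, with map_segment_near a hypothetical helper:

```c
#define _DEFAULT_SOURCE /* for MAP_ANON on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Ask for a segment at (or near) hint. Returns NULL on failure and sets
 * *honoured to 1 iff the segment landed exactly at hint. */
static void *map_segment_near(void *hint, size_t size, int *honoured)
{
    void *p;
    assert(size > 0 && honoured != NULL);
    p = mmap(hint, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);
    if (p == MAP_FAILED) {
        *honoured = 0;
        return NULL;
    }
    *honoured = (p == hint); /* the kernel may ignore the hint */
    return p;
}
```

A caller that needs ordering (e.g. new segments above old ones) would unmap and retry with another hint whenever honoured comes back 0.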

I get that there are platforms for which mmap doesn't work.  But I would suggest that on these platforms one has to do something very different from what makes sense for desktop OSs, and so one has to accommodate their limitations somehow.

1. all primitive functions are above 1024

That's not a problem whatsoever.

I know.  I was just being precise.

2. New space is below all old space segments

Here there's the alignment solution, which improves performance and removes the constraint.

What, that objects are allocated on a 0 modulo 16 byte boundary or an 8 modulo 16 byte boundary?  That's fine, but it wastes 5% of the heap.  And don't we encounter more problems using the alignment solution if we want to do shared segments?  (I guess not; we simply choose the 0 modulo 16 alignment for old space segments).  Have you implemented the alignment solution?
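Roughly how the store check could look under the alignment scheme, assuming (hypothetically) that young objects start at 0 mod 16 and old objects at 8 mod 16, so address bit 3 distinguishes them without reference to any segment address. Immediates are ignored for brevity:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scheme: objects are 8-byte aligned; young ones start at
 * 0 mod 16, old ones at 8 mod 16. */
enum { OLD_SPACE_TAG = 8 };

static bool is_old_object(uintptr_t oop)
{
    assert((oop & 7) == 0); /* non-immediate, 8-byte aligned */
    return (oop & OLD_SPACE_TAG) != 0;
}

/* A store of value into target must be remembered iff an old object
 * ends up referring to a young one. */
static bool store_needs_remembering(uintptr_t target, uintptr_t value)
{
    return is_old_object(target) && !is_old_object(value);
}
```

The cost Eliot mentions shows up on the allocation side: an object may need 8 bytes of padding to land on the right residue.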

3. the code zone is below new space. This allows isReallyYoungObject: to use two comparisons, instead of three.

isReallyYoungObject: objOop
"Answer if obj is young. Require that obj is non-immediate. Override to filter-out Cog methods"
self assert: (self isNonImmediate: objOop).
^(self oop: objOop isLessThan: newSpaceLimit)
  and: [self oop: objOop isGreaterThanOrEqualTo: newSpaceStart]

I don't think that method would change.

Right.  Sorry.
I think the method isMachineCodeFrame: would change (2 comparisons instead of one).
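A sketch of that difference, with a hypothetical Layout struct standing in for the VM's variables (this is not Cog's actual code): when the code zone lies below all of object memory, one comparison against the heap start decides whether a frame's method field points at machine code; if the code zone can be anywhere, both bounds must be tested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout parameters; names are illustrative only. */
typedef struct {
    uintptr_t code_zone_start, code_zone_limit; /* machine-code methods */
    uintptr_t heap_start;                       /* first object-memory byte */
} Layout;

/* Ordered layout: code zone below the whole heap, so one comparison. */
static bool is_machine_code_frame_ordered(const Layout *l, uintptr_t method)
{
    return method < l->heap_start;
}

/* Unordered layout: code zone may be anywhere, so two comparisons. */
static bool is_machine_code_frame_unordered(const Layout *l, uintptr_t method)
{
    assert(l->code_zone_start < l->code_zone_limit);
    return method >= l->code_zone_start && method < l->code_zone_limit;
}
```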

I dream of a world where young space, code zone and old space segments are in different segments which do not have any position requirement. That way:
- no constraints for platforms like rumpkernel
- no reliance on API such as sbrk.
- quicker segment allocation, since NULL can be passed as the address hint (the OS places the segment wherever it wants)
- quicker write barrier, with a bit check instead of a comparison against a constant
- growing code zone at runtime is fairly easy (divorce allFrames, alloc new segment and free old one)
- growing new space at runtime is fairly easy (do a tenureAll, alloc new segment and free the old new space segment)
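A sketch of the "alloc new segment, free the old one" step, assuming the old segment has already been emptied (by a tenureAll for new space, or by divorcing all frames for the code zone). Segment, segment_map and segment_replace are hypothetical names; the NULL hint lets the OS choose the placement:

```c
#define _DEFAULT_SOURCE /* for MAP_ANON on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

typedef struct {
    char  *base;
    size_t size;
} Segment;

/* Map a fresh segment wherever the OS likes (NULL hint). */
static int segment_map(Segment *s, size_t size)
{
    void *p;
    assert(size > 0);
    p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    s->base = p;
    s->size = size;
    return 0;
}

/* Replace an already-evacuated segment by a new one; nothing is copied. */
static int segment_replace(Segment *s, size_t new_size)
{
    Segment fresh;
    if (segment_map(&fresh, new_size) != 0)
        return -1; /* keep the old segment on failure */
    munmap(s->base, s->size);
    *s = fresh;
    return 0;
}
```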

Sure.  Some things to consider:
- boundary checks are frequent.  The ParcPlace code (written by David Ungar and Frank Jackson) was very careful to have as many single boundary checks as possible.

And my own complaint.
- sbrk is regrettable, but an interface like mmap, which lets one supply a position hint but provides no convenient way of finding out what the memory map is, seems to me to be broken without sbrk.
It's just details, all right. I will see if I can try that someday.

Right.  And implementing things using subclasses (e.g. of SpurMemoryManager) means we can mix and match and experiment.

P.S.  sorry to have replied so slowly...

On Wed, May 2, 2018, 02:19 Eliot Miranda <eliot.miranda@gmail.com> wrote:
Hi Clément,

   sorry for the late reply...

On Sat, Apr 28, 2018 at 1:57 AM, Clément Bera <bera.clement@gmail.com> wrote:
Hi Eliot, Hi all,

On Mac and Linux, Spur uses mmap to allocate new segments. The V3 memory manager used malloc instead. I've looked into many other VMs (JavaScript and Java), and most of them use posix_memalign (basically malloc where you can ask for a specific alignment).

And on Windows it uses VirtualAlloc.  So it is consistent in using memory mapping to allocate segments across the platforms, where available.
I am wondering why we are using mmap over posix_memalign / malloc. The only reason I can find is that Spur always allocates new memory segments at a higher address than past segments, to guarantee that young objects are at lower addresses than old objects for the write barrier. Is that correct?

Well, I don't like using malloc because one is layering unnecessarily and hence there is wastage.  Many malloc implementations are optimized for small block sizes, and a huge block allocated through them:
- may have a segment allocated all to itself
- won't necessarily be on a page boundary (especially on systems with very large pages)
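A way to check the page-boundary point, sketched in C: malloc only guarantees alignment suitable for fundamental types (typically 8 or 16 bytes), so a large block need not start on a page, whereas mmap always returns page-aligned memory and posix_memalign can be asked for page alignment at the cost of possible padding. is_page_aligned and page_aligned_alloc are hypothetical helpers:

```c
#define _DEFAULT_SOURCE /* for MAP_ANON and posix_memalign on glibc */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

static bool is_page_aligned(const void *p)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    return ((uintptr_t)p % (uintptr_t)page) == 0;
}

/* Page-aligned block via posix_memalign; free it with free(). */
static void *page_aligned_alloc(size_t size)
{
    void *p = NULL;
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    return posix_memalign(&p, (size_t)page, size) == 0 ? p : NULL;
}
```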

Assuming it is correct, let's say I change Spur to implement the write barrier differently (for instance, I change all objects to be aligned on 128 bits instead of 64 and use different allocation alignments for young objects (128-bit alignment) and old objects (128+64-bit alignment)). Will we be able to use posix_memalign / malloc to allocate new memory segments if I do that?
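A sketch of the allocation side of that proposal, assuming (hypothetically) 8-byte-aligned allocation pointers: round the pointer up to 0 mod 16 for young objects or 8 mod 16 for old ones, which costs at most 8 bytes of padding per object. Because the young/old test then looks only at address bits, it no longer matters where the allocator places each segment:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: round an 8-byte-aligned allocation pointer up so the
 * object starts at `residue` modulo 16 (0 for young, 8 for old). */
static uintptr_t align_to_residue(uintptr_t p, uintptr_t residue)
{
    uintptr_t q;
    assert((p & 7) == 0 && (residue == 0 || residue == 8));
    q = (p & ~(uintptr_t)15) | residue;
    return q >= p ? q : q + 16; /* pads by at most 8 bytes */
}
```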

Sure, but why?  Given that mmap/VirtualAlloc gives page alignment, one is going to get alignment of at least 512 bytes (the ancient VAX page size) and more typically 4 KB (x86/x86_64).
Or does the VM rely on segments being at higher addresses for other reasons? For example, does the VM assume CogMethods are at lower addresses than objects on the heap, and rely on that to check whether a stack frame is an mframe or an iframe?

Well, indeed, being able to rely on ordering makes the boundary checks in the store checks simpler.  I think you wrote a blog post on this, so you've actually captured this info before.  But to reiterate, the Cog and Stack VMs assume the following memory orderings:

1. all primitive functions are above 1024.  This allows the quick primitives to be stored in the method cache with a primitive function pointer that is their index, and for executeNewMethod et al. to compare the primitiveFunctionPointer against MaxQuickPrimitiveIndex and dispatch to quickPrimitiveResponse.

2. New space is below all old space segments, and is immediately below the first old space segment.  This allows isOldObject:/isYoungObject: et al. to compare an oop against newSpaceLimit/oldSpaceStart/nilObj (yes, we have three different names for exactly the same value; we only need two; the fact that nilObj = oldSpaceStart is incidental).

3. the code zone is below new space.  This allows isReallyYoungObject: to use two comparisons, instead of three.
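The three orderings above, sketched as C predicates. The Layout struct and function names are hypothetical stand-ins for newSpaceStart, newSpaceLimit and friends, and MaxQuickPrimitiveIndex is assumed here to be 519. Note that the single-comparison young check also accepts code-zone addresses, which is why isReallyYoungObject: needs its second comparison:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the VM's layout variables. */
typedef struct {
    uintptr_t code_zone_start; /* lowest: machine code                */
    uintptr_t new_space_start; /* code zone lies below new space      */
    uintptr_t new_space_limit; /* == start of first old-space segment */
} Layout;

enum { MAX_QUICK_PRIMITIVE_INDEX = 519 }; /* assumed value */

/* 1. Real primitive functions live above 1024, so a small value is an index. */
static bool is_quick_primitive(uintptr_t prim_fn_or_index)
{
    return prim_fn_or_index <= MAX_QUICK_PRIMITIVE_INDEX;
}

/* 2. Old space starts where new space ends: one comparison each way. */
static bool is_old_object(const Layout *l, uintptr_t oop)
{
    return oop >= l->new_space_limit;
}

static bool is_young_object(const Layout *l, uintptr_t oop)
{
    return oop < l->new_space_limit; /* also true for code-zone addresses */
}

/* 3. Filtering out Cog methods needs the lower bound too: two comparisons. */
static bool is_really_young_object(const Layout *l, uintptr_t oop)
{
    assert(l->code_zone_start < l->new_space_start);
    return oop < l->new_space_limit && oop >= l->new_space_start;
}
```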

So let me ask you the corollary.  If mmap/VirtualAlloc provides memory aligned on a page boundary, with no overhead and with control over placement, why would one use posix_memalign or malloc to allocate memory?

best, Eliot

best, Eliot