Multiprocessing with Squeak


Levente Uzonyi

27 Jan 2010

11:04 p.m.

Hi,

I had an idea a few days ago, and even though I don't have the time or knowledge to try it myself, I just can't get it out of my head. The idea is to let an interpreter use two images at once. One of them is a read-only, "fully working" image; let's call it S (source). The other is empty (contains no objects), writable, and possibly generated on the fly; let's call it W (working). The vm knows whether an object is in S or W by checking the object pointer. Whenever an object in S is about to be modified, a copy is created in W and all references to it are changed to point to the copy (which means that more than one object might have to be copied). This means a slower startup, but once all necessary objects have been copied, performance would be normal. (This approach is similar to the way sources are handled today: the sources file is read-only, and new source code goes to the changes file.)

What are the benefits?
- the source image can be shared among interpreters (even vms)
- the garbage collector has much less work, since it only has to check the objects in W

How does it help with multiprocessing?
- combine it with HydraVM, it might give Erlang-like capabilities (cheap and fast processes)
- reduces memory usage if multiple interpreters (vms) use the same source image

Possible caveats?
- too many objects might have to be copied after startup (this is solvable, see below)
- too many objects might have to be copied overall (this is unlikely, but who knows)
- ? (you name it)

Possible enhancements
- let the interpreter use a non-empty, possibly user-specified, W image for quick startup

Opinions? Ideas?

Levente
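
The two-image scheme proposed above can be modelled in a few lines. This is only a toy sketch under loud assumptions: pointers are plain integers, objects are lists of slots, and the names TwoImageHeap, write, slots and moved are hypothetical, not part of any Squeak VM. It follows the "change all references" rule literally, so a write into S also drags every S object that refers to the written object into W.

```python
class TwoImageHeap:
    """Toy model of a read-only source image S plus a writable image W.

    A pointer is an int: values below len(S) refer to S, the rest to W.
    Each object is a sequence of slots holding pointers or None.
    """

    def __init__(self, source):
        self.S = tuple(tuple(obj) for obj in source)  # immutable, shareable
        self.W = []                                   # private, writable
        self.moved = {}                               # S pointer -> W pointer

    def in_source(self, ptr):
        # The "check the object pointer" test from the proposal.
        return ptr < len(self.S)

    def slots(self, ptr):
        if self.in_source(ptr):
            return self.S[ptr]
        return self.W[ptr - len(self.S)]

    def _copy_to_w(self, ptr):
        if not self.in_source(ptr):
            return ptr
        if ptr in self.moved:
            return self.moved[ptr]
        new_ptr = len(self.S) + len(self.W)
        self.moved[ptr] = new_ptr
        # Slots pointing at already-copied objects are redirected right away.
        self.W.append([self.moved.get(s, s) for s in self.S[ptr]])
        # Redirect W objects that still point at the old S copy.
        for obj in self.W:
            for i, s in enumerate(obj):
                if s == ptr:
                    obj[i] = new_ptr
        # S is read-only, so every S object referring to ptr must move too.
        for ref, obj in enumerate(self.S):
            if ptr in obj and ref not in self.moved:
                self._copy_to_w(ref)
        return new_ptr

    def write(self, ptr, index, value):
        """Store value in a slot, copying the object into W first if needed."""
        ptr = self._copy_to_w(ptr)
        self.W[ptr - len(self.S)][index] = self.moved.get(value, value)
        return ptr


# Chain 0 -> 1 -> 2 in S, plus an unrelated object 3.
heap = TwoImageHeap([(1,), (2,), (None,), (None,)])
new_ptr = heap.write(2, 0, 3)
print(new_ptr, heap.W, heap.moved)
```

In this example, writing to the object at the end of the chain copies all three chained objects into W, while object 3 stays in S. That is the "too many objects might have to be copied" caveat in miniature.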


Colin Putney

29 Jan 2010

7:42 a.m.

On 2010-01-27, at 2:04 PM, Levente Uzonyi wrote:

...

Hi,

I had an idea a few days ago and even though I don't have the time or knowledge to try it myself, I just can't get it out of my head. The idea is to let an interpreter use two images at once. One of them is read only "fully working" image let's call it S (source), the other is empty (contains no objects), writeable, possibly generated on the fly, let's call it W (working). The vm knows if an object is in S or W by checking the object pointer. Whenever an object in S is about to be modified, a copy is created in W and all references to it are changed to the new one (which means that more than one object might have to be copied). This means a slower startup, but once all necessary objects are copied performance would be normal. (This approach is similar to the way sources are handled today: the sources file is read only, new source code goes to the changes file.)

I believe VW can do something like this - they call it "Shared Perm Space." There's a special section of memory that's immutable, not subject to garbage collection, and shared between several VM processes.

...

combine it with HydraVM, it might give Erlang-like capabilities (cheap and fast processes)

Well, we already have cheap and fast processes. The overhead for creating a new instance of Process and scheduling it is very low. What we lack is isolation between them. Squeak seems to be drifting in that direction, though. Islands are a good start. Josh's recent contribution of futures to the trunk is another step away from shared-state concurrency.

My sense of it is that efficient use of memory isn't the most important problem to solve at the moment. Further steps toward event-loop concurrency would be more fruitful.

Colin

Igor Stasenko

9:51 a.m.

On 29 January 2010 08:42, Colin Putney cputney@wiresong.ca wrote:


Well, we already have cheap and fast processes. The overhead for creating a new instance of Process and scheduling it is very low. What we lack is isolation between them. Squeak seems to be drifting in that direction, though. Islands are a good start. Josh's recent contribution of futures to the trunk is another step away from shared-state concurrency.

My sense of it is that efficient use of memory isn't the most important problem to solve at the moment. Further steps toward event-loop concurrency would be more fruitful.

Well, at some point we should start using some kind of native-based concurrency, not just green threading. Processes still run on top of a single object space, i.e. all objects are equally reachable from any process, since they use a non-concurrent memory model. Oh, never mind, we had long talks about it in the past, let's not start over again :)


-- Best regards, Igor Stasenko AKA sig.

Josh Gargus

10:22 a.m.

On Jan 29, 2010, at 12:51 AM, Igor Stasenko wrote:


Well, at some point we should start using some kind of native-based concurrency, not just green threading. Processes still run on top of a single object space, i.e. all objects are equally reachable from any process, since they use a non-concurrent memory model. Oh, never mind, we had long talks about it in the past, let's not start over again :)

:-)

Modern multi-core/hyper-threaded CPUs present a lot of low-hanging fruit for us to harvest. I was tickled to learn that my new desktop machine can compile a Squeak VM from scratch in 15 seconds if I let it use 10 threads ("make -j 10").

Hydra seems like the easiest way to do so. Luckily, it's both orthogonal and complementary to efforts to facilitate event-loop concurrency. For example, the receiver of a message can be an "eventual reference" (Mark Miller's terminology... I prefer "far-ref") to an object in a different Hydra image: "foo := aRef future bar: baz". This would result in a Promise being assigned to "foo"; it would resolve once the message executed in the remote image, and communicated the result back via a Hydra channel.

Cheers, Josh
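
The "foo := aRef future bar: baz" flow described above can be approximated as follows. This is a rough sketch, not Hydra's API: RemoteImage, FarRef and Counter are hypothetical names, a thread with a queue stands in for a separate image and its channel, and the target object lives in the same Python heap, which a real multi-image VM would not allow.

```python
import queue
import threading
from concurrent.futures import Future


class RemoteImage:
    """Stand-in for another image: one thread draining one inbound channel."""

    def __init__(self):
        self.channel = queue.Queue()
        self.thread = threading.Thread(target=self._loop, daemon=True)
        self.thread.start()

    def _loop(self):
        while True:
            item = self.channel.get()
            if item is None:          # shutdown marker
                break
            target, selector, args, promise = item
            try:
                # Run the message here, then resolve the sender's promise.
                promise.set_result(getattr(target, selector)(*args))
            except Exception as exc:
                promise.set_exception(exc)

    def stop(self):
        self.channel.put(None)
        self.thread.join()


class FarRef:
    """Eventual reference to an object owned by a RemoteImage."""

    def __init__(self, image, target):
        self._image = image
        self._target = target

    def future(self, selector, *args):
        # Queue the message and return immediately with an unresolved promise.
        promise = Future()
        self._image.channel.put((self._target, selector, args, promise))
        return promise


class Counter:
    def __init__(self):
        self.n = 0

    def add(self, k):
        self.n += k
        return self.n


image = RemoteImage()
a_ref = FarRef(image, Counter())
foo = a_ref.future("add", 5)      # returns at once; foo is a promise
print(foo.result(timeout=5))      # blocks until the remote side has answered
image.stop()
```

Because each image processes its channel one message at a time, the target object is only ever touched by a single thread, which is the event-loop property the futures are meant to preserve.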


Igor Stasenko

11:13 a.m.

On 29 January 2010 11:22, Josh Gargus josh@schwa.ca wrote:


Modern multi-core/hyper-threaded CPUs present a lot of low-hanging fruit for us to harvest. I was tickled to learn that my new desktop machine can compile a Squeak VM from scratch in 15 seconds if I let it use 10 threads ("make -j 10").

Hydra seems like the easiest way to do so. Luckily, it's both orthogonal and complementary to efforts to facilitate event-loop concurrency. For example, the receiver of a message can be an "eventual reference" (Mark Miller's terminology... I prefer "far-ref") to an object in a different Hydra image: "foo := aRef future bar: baz". This would result in a Promise being assigned to "foo"; it would resolve once the message executed in the remote image, and communicated the result back via a Hydra channel.

If you remember, I recently added a primitive in Hydra which can spawn a 'child' object memory based on a hand-crafted set of objects from the main one. Not much rocket science there: it just clones a closed object graph which you specify. But by proceeding with this approach, one could generate 'islands' on the fly, which could serve small sub-tasks that run in parallel or on demand. This is much more space-efficient than spawning a full-image object memory when all you need is to do a specific set of tasks.


-- Best regards, Igor Stasenko AKA sig.
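
Cloning a closed object graph into a fresh object memory, as described above, might look roughly like this. It is a sketch under assumptions, not the Hydra primitive: the heap is modelled as a dict from object id to the ids it references, and clone_closed_graph is a hypothetical name.

```python
def clone_closed_graph(heap, selected):
    """Copy the selected objects into a new, compactly numbered heap.

    heap: dict mapping object id -> list of referenced object ids.
    selected: ids to copy; the set must be closed, i.e. no selected
    object may refer to an object outside the selection.
    """
    selected = list(dict.fromkeys(selected))   # drop duplicates, keep order
    chosen = set(selected)
    for oid in selected:
        outside = [ref for ref in heap[oid] if ref not in chosen]
        if outside:
            raise ValueError(f"object {oid} refers outside the set: {outside}")
    # Renumber so the child memory starts at 0 and has no holes.
    new_id = {oid: i for i, oid in enumerate(selected)}
    return {new_id[oid]: [new_id[ref] for ref in heap[oid]] for oid in selected}


# Main memory: 10 -> 11 -> {12, 10}; 13 points into the group but is not part of it.
main = {10: [11], 11: [12, 10], 12: [], 13: [10]}
child = clone_closed_graph(main, [10, 11, 12])
print(child)
```

The child holds three objects regardless of how large the main memory is, which is where the space saving over spawning a full image comes from.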

Levente Uzonyi

2:17 p.m.

On Thu, 28 Jan 2010, Colin Putney wrote:


I believe VW can do something like this - they call it "Shared Perm Space." There's a special section of memory that's immutable, not subject to garbage collection, and shared between several VM processes.

After a bit of googling I found this: http://cincomsmalltalk.com/userblogs/runarj/blogView?showComments=true&p... and it looks similar, though it doesn't describe what Shared Perm Space is.


Well, we already have cheap and fast processes. The overhead for creating a new instance of Process and scheduling it is very low. What we lack is isolation between them. Squeak seems to be drifting in that direction, though. Islands are a good start. Josh's recent contribution of futures to the trunk is another step away from shared-state concurrency.

Our cheap processes can't do multiprocessing and futures won't help with that.

...

My sense of it is that efficient use of memory isn't the most important problem to solve at the moment. Further steps toward event-loop concurrency would be more fruitful.

I always see people saying: the image is large, we need a kernel image, squeak is bloated, I can't run this large image on a server, etc.

If a vm could do this, running 1000 images on a single server wouldn't hurt much (assuming ~15MB source image and <1MB worker images).

Levente
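
The memory estimate above can be checked with a little arithmetic. The sketch takes the figures from the message (1000 images, ~15MB source image) and uses 1MB as an upper bound for each worker image; total_memory_mb is a hypothetical helper.

```python
def total_memory_mb(instances, source_mb, worker_mb, shared):
    """Rough memory footprint of running many images on one server."""
    if shared:
        # One copy of the source image, plus a private worker image each.
        return source_mb + instances * worker_mb
    # Every instance carries its own full copy of everything.
    return instances * (source_mb + worker_mb)


shared = total_memory_mb(1000, 15, 1, shared=True)
private = total_memory_mb(1000, 15, 1, shared=False)
print(shared, private)
```

Under these assumptions the shared setup needs about 1 GB in total, against about 16 GB when every instance loads its own copy of the source image.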


David T. Lewis

2:41 p.m.

On Fri, Jan 29, 2010 at 02:17:46PM +0100, Levente Uzonyi wrote:

...

On Thu, 28 Jan 2010, Colin Putney wrote:

...
I believe VW can do something like this - they call it "Shared Perm Space." There's a special section of memory that's immutable, not subject to garbage collection, and shared between several VM processes.

After a bit of googling I found this: http://cincomsmalltalk.com/userblogs/runarj/blogView?showComments=true&p... and it looks similar, though it doesn't describe what Shared Perm Space is.

I think this may be more or less the same thing I was trying to demonstrate in the "poor man's multiprocessing" thread: http://lists.squeakfoundation.org/pipermail/squeak-dev/2010-January/143841.h...

The major difference seems to be that Cincom's implementation actually works, whereas mine just crashed a lot of other people's Squeak images ;)

Dave

Eliot Miranda

6:25 p.m.

On Thu, Jan 28, 2010 at 10:42 PM, Colin Putney cputney@wiresong.ca wrote:


I believe VW can do something like this - they call it "Shared Perm Space." There's a special section of memory that's immutable, not subject to garbage collection, and shared between several VM processes.

It used to exist and then was broken when Barry Hayes and I added memory mapping of new heap segments back in the late 90's. I was working on bringing it back when I left.

You're almost right (and I'm probably being pedantic; forgive me). PermSpace (not shared) is a third generation that is not collected unless one does a global GC. VW has a scavenger, a stop-the-world mark-sweep collector and an incremental mark-sweep collector. The scavenger collects only new space. The incremental collector, run in short bursts for a few milliseconds under image-level control, collects oldSpace. The stop-the-world collector will collect oldSpace or oldSpace + permSpace. So permSpace is only collected when one does a global stop-the-world collection (globalGarbageCollect) not an oldSpace collection (garbageCollect). To populate permSpace one does a "perm save" which does an otherwise normal image save that sets a bit in the image header that causes the VM to load the entire image into permSpace. One then does a globalGarbageCollect and saves, resulting in an image in which most objects are in permSpace (particularly all classes and methods) but where transient objects (font descriptions loaded at startup etc) are in oldSpace. So the incremental collector, collecting oldSpace, doesn't waste time scan-marking classes and methods, and hence is much more effective.

Shared permSpace extends the scheme by memory mapping an image file's permSpace segment using copy-on-write. So as objects in permSpace are written to, pages of the permSpace part of the image file are copied into private memory. No effort is made to do things like cluster class variables (which are the most likely targets of writes into permSpace) together on pages to reduce the amount of copying when writes do occur. A tracer approach would do much better here.
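
The copy-on-write mapping itself can be demonstrated with Python's mmap module, whose ACCESS_COPY mode gives a private, writable view of a read-only file. This is only an illustration of the operating-system mechanism, not VW's implementation; the four-page temporary file is a stand-in for an image's permSpace segment, and map_private is a hypothetical helper.

```python
import mmap
import os
import tempfile


def map_private(path):
    """Map a file copy-on-write; returns (file object, mapping)."""
    f = open(path, "rb")
    return f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)


# Build a stand-in "image file" of four pages filled with the byte S.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(b"S" * (4 * mmap.PAGESIZE))

f1, vm1 = map_private(path)   # first "VM"
f2, vm2 = map_private(path)   # second "VM" mapping the same file

vm1[0:1] = b"W"               # the write lands in vm1's private memory only

seen_by_vm1 = vm1[0:1]
seen_by_vm2 = vm2[0:1]
with open(path, "rb") as check:
    on_disk = check.read(1)
print(seen_by_vm1, seen_by_vm2, on_disk)

for m in (vm1, vm2):
    m.close()
for f in (f1, f2):
    f.close()
os.remove(path)
```

The write is visible only through the mapping that made it; the other mapping and the file on disk are unchanged. Pages that are never written stay backed by the file and can be shared between processes mapping it.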

You can infer that memory mapping new oldSpace segments broke shared permSpace because shared permSpace was hacked to map the file at a hard-coded address. I was trying to bring back shared permSpace for 64-bit images (where it would have more impact because 64-bit objects are bigger) by doing things like aligning the object headers of oldSpace objects on a 16-byte boundary and permSpace objects 8 bytes from a 16-byte boundary so that the permSpace test was a tag test (there being 3 bits of immediate tags).

HTH Eliot
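
The alignment trick for the 64-bit permSpace test can be sketched as plain address arithmetic. The helper names are hypothetical and this is not VW code; it only assumes what the message states: oldSpace headers on a 16-byte boundary, permSpace headers 8 bytes past one, and the low 3 bits reserved for immediate tags.

```python
ALIGN = 16        # oldSpace object headers sit on 16-byte boundaries
PERM_OFFSET = 8   # permSpace headers sit 8 bytes past a 16-byte boundary


def is_perm(pointer):
    # The low 3 bits are immediate tags (zero for object pointers),
    # so bit 3 alone tells the two spaces apart.
    return pointer & (ALIGN - 1) == PERM_OFFSET


def align_old(addr):
    """Round addr up to the next 16-byte boundary."""
    return (addr + ALIGN - 1) & ~(ALIGN - 1)


def align_perm(addr):
    """Round addr up to the next address that is 8 modulo 16."""
    return align_old(addr - PERM_OFFSET) + PERM_OFFSET


for addr in (0, 9, 17, 40):
    print(addr, align_old(addr), align_perm(addr))
```

With this layout the space check needs no address-range comparison, so it does not depend on where the file happens to be mapped.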
