On 2/22/08, Stephen Pair stephen@pairhome.net wrote:
I must say, this is a really impressive development. I really think this is the right way to approach multi-core systems.
I disagree about it being the right approach in the long term.
In the short term, the Hydra VM allows the use of multiple cores without large changes to the core of Squeak, which is good and IMHO the right decision for a quick and reliable solution (for whoever Igor is doing his work for... Qwaq?). The disadvantage with the Hydra VM is that all inter-process communication needs to go through a pipe; this makes sharing objects and synchronising access while still getting good performance more difficult. I can't back up my claims yet; we'll see how Hydra VM works out.
In the long term, a VM that can run its green threads (aka Process) on multiple OS threads (aka pthreads) should be the long-term goal.
I can't imagine that hacking the VM with multiple processes, per-process state and a global VM lock for garbage collection and new object creation would be too difficult. The global VM lock would kill scalability and could make object creation slow, but it should still get some speedup on multi-cored CPUs. More advanced VMs with per-thread eden space would take a bit longer to write.
Gulik.
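Michael's global-VM-lock idea can be sketched in a few lines. This is a purely illustrative toy (the `LockedHeap` class and all names are invented, not Squeak VM code): every allocating thread must take a single lock, which is simple and correct but serializes all object creation, which is exactly the scalability concern raised above.

```python
# Hypothetical sketch of a "global VM lock" around allocation.
# Not Squeak VM code; names are invented for illustration.
import threading

class LockedHeap:
    """A bump-pointer allocator guarded by one global VM lock."""
    def __init__(self, size):
        self.next = 0
        self.size = size
        self.lock = threading.Lock()   # the "global VM lock"

    def allocate(self, nbytes):
        with self.lock:                # every interpreter thread contends here
            addr = self.next
            if addr + nbytes > self.size:
                raise MemoryError("out of space; a real VM would GC here")
            self.next += nbytes
            return addr

heap = LockedHeap(1024)
addrs = []

def worker():
    for _ in range(10):
        addrs.append(heap.allocate(8))

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()

# All 40 allocations happened, none overlapped: correctness at the
# price of serializing every allocation through one lock.
assert heap.next == 4 * 10 * 8
assert len(set(addrs)) == 40
```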
On 23/02/2008, Michael van der Gulik mikevdg@gmail.com wrote:
On 2/22/08, Stephen Pair stephen@pairhome.net wrote:
I must say, this is a really impressive development. I really think this
is the right way to approach multi-core systems.
I disagree about it being the right approach in the long term.
In the short term, the Hydra VM allows the use of multiple cores without large changes to the core of Squeak, which is good and IMHO the right decision for a quick and reliable solution (for whoever Igor is doing his work for... Qwaq?). The disadvantage with the Hydra VM is that all inter-process communication needs to go through a pipe; this makes sharing objects and synchronising access while still getting good performance more difficult. I can't back up my claims yet; we'll see how Hydra VM works out.
In the long term, a VM that can run its green threads (aka Process) on multiple OS threads (aka pthreads) should be the long-term goal.
I can't imagine that hacking the VM with multiple processes, per-process state and a global VM lock for garbage collection and new object creation would be too difficult. The global VM lock would kill scalability and could make object creation slow, but it should still get some speedup on multi-cored CPUs. More advanced VMs with per-thread eden space would take a bit longer to write.
The major challenge with multi-core over a single shared object memory is writing the GC, because the GC is the most complex part of the Squeak VM. Now imagine adding concurrency-awareness to it... Once you have such a GC, the rest will look like a piece of cake :)
P.S. Global locks suck; you need to pick something less disastrous :) I have read some papers describing run-time GCs and background GCs running in a separate thread. The question is whether adapting them to the current object model is possible without changing the model itself.
Gulik.
-- http://people.squeakfoundation.org/person/mikevdg http://gulik.pbwiki.com/
On 2/23/08, Igor Stasenko siguctua@gmail.com wrote:
P.S. Global locks suck; you need to pick something less disastrous :)
I know. It's a simple and implementable solution and would be a good first attempt at making a multi-threaded VM.
Gulik.
A compromise approach would be to allow something like Erlang's processes to run on each CPU within the same image. You would still be required to copy any object that passes the process boundary, but the advantages are separate GCs for each process, and the only kernel-level synchronizations would be via async queues.
It would end up very much like the Hydra model (insofar as I understand it), but without the full IPC context switch.
To avoid confusion, these processes would not map 1-to-1 to Squeak processes, which would continue as normal. These would be special, uber-cpu processes.
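Stephen's copy-at-the-boundary processes might look roughly like this (a hypothetical sketch; `UberProcess` and its methods are invented names, not a real API): the mailbox queue is the only synchronized structure, and deep-copying at `send` guarantees no mutable object is ever shared between processes.

```python
# Illustrative sketch of Erlang-style processes that copy any object
# crossing the process boundary. All names are invented.
import copy
import queue

class UberProcess:
    def __init__(self):
        self.mailbox = queue.Queue()   # the only synchronized structure

    def send(self, obj):
        # Copy at the boundary: sender and receiver never share state.
        self.mailbox.put(copy.deepcopy(obj))

    def receive(self):
        return self.mailbox.get()

a, b = UberProcess(), UberProcess()
msg = {"counter": [1, 2, 3]}
b.send(msg)
received = b.receive()

msg["counter"].append(4)                 # mutating the sender's object...
assert received["counter"] == [1, 2, 3]  # ...does not affect the receiver's copy
```

The cost of this design is the copy itself; the benefit is that each process can have its own GC and no locks are ever exposed to user code.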
Michael van der Gulik wrote:
In the long term, a VM that can run its green threads (aka Process) on multiple OS threads (aka pthreads) should be the long-term goal.
I can't imagine that hacking the VM with multiple processes, per-process state and a global VM lock for garbage collection and new object creation would be too difficult. The global VM lock would kill scalability and could make object creation slow, but it should still get some speedup on multi-cored CPUs. More advanced VMs with per-thread eden space would take a bit longer to write.
On Feb 22, 2008, at 7:01 PM, Michael van der Gulik wrote:
On 2/22/08, Stephen Pair stephen@pairhome.net wrote: I must say, this is a really impressive development. I really think this is the right way to approach multi-core systems.
I disagree about it being the right approach in the long term.
In the short term, the Hydra VM allows the use of multiple cores without large changes to the core of Squeak, which is good and IMHO the right decision for a quick and reliable solution (for whoever Igor is doing his work for... Qwaq?). The disadvantage with the Hydra VM is that all inter-process communication needs to go through a pipe;
This is incorrect. There is no inter-process communication because there is only one process with multiple threads, each running a separate VM.
this makes sharing objects and synchronising access while still getting good performance more difficult. I can't back up my claims yet; we'll see how Hydra VM works out.
In the long term, a VM that can run its green threads (aka Process) on multiple OS threads (aka pthreads) should be the long-term goal.
This is debatable. Why are you convinced that fine-grained concurrency will not involve a large performance hit due to CPU cache invalidations? I haven't heard a compelling argument that this won't be a problem (and increasingly so, as the number of cores grows). We can't pretend that it takes zero time to make an object available for processing on a different core. As I've said before, I'm willing to be convinced otherwise.
Josh
I can't imagine that hacking the VM with multiple processes, per-process state and a global VM lock for garbage collection and new object creation would be too difficult. The global VM lock would kill scalability and could make object creation slow, but it should still get some speedup on multi-cored CPUs. More advanced VMs with per-thread eden space would take a bit longer to write.
Gulik.
-- http://people.squeakfoundation.org/person/mikevdg http://gulik.pbwiki.com/
On 2/23/08, Joshua Gargus schwa@fastmail.us wrote:
On Feb 22, 2008, at 7:01 PM, Michael van der Gulik wrote:
this makes sharing objects and synchronising access while still getting good performance more difficult. I can't back up my claims yet; we'll see how Hydra VM works out.
In the long term, a VM that can run its green threads (aka Process) on multiple OS threads (aka pthreads) should be the long-term goal.
This is debatable. Why are you convinced that fine-grained concurrency will not involve a large performance hit due to CPU cache invalidations? I haven't heard a compelling argument that this won't be a problem (and increasingly so, as the number of cores grows). We can't pretend that it takes zero time to make an object available for processing on a different core. As I've said before, I'm willing to be convinced otherwise.
Equally so, why then would any other concurrent implementation, such as the HydraVM, not also have exactly the same problem? Or why would any other concurrent application not have this problem?
Real operating systems implement some form of processor affinity[1] to keep a process's cached data on a single processor. The same could be done for the Squeak scheduler. I'm sure the scheduling algorithm could be tuned to minimize cache invalidations.
[1] http://en.wikipedia.org/wiki/Processor_affinity
Gulik.
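The affinity idea Michael describes could be sketched as a scheduling heuristic (illustrative only; `pick_core`, `Process`, and `last_core` are invented names, and a real VM would additionally pin its OS threads with an OS facility such as sched_setaffinity on Linux):

```python
# Toy affinity-aware scheduling decision: prefer to resume a green
# process on the core that last ran it, so its working set is still
# warm in that core's cache. Names are invented for illustration.

class Process:
    """A green process that remembers the core it last ran on."""
    def __init__(self, last_core=None):
        self.last_core = last_core

def pick_core(process, idle_cores):
    """Choose a core for `process` from the set of currently idle cores."""
    if process.last_core in idle_cores:
        return process.last_core       # cache-warm: keep affinity
    return min(idle_cores)             # otherwise migrate (cache-cold start)

p = Process(last_core=2)
assert pick_core(p, {0, 1, 2, 3}) == 2   # sticks to its warm core
assert pick_core(p, {0, 1}) == 0         # its core is busy, so it migrates
```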
On Feb 22, 2008, at 11:51 PM, Michael van der Gulik wrote:
On 2/23/08, Joshua Gargus schwa@fastmail.us wrote: On Feb 22, 2008, at 7:01 PM, Michael van der Gulik wrote:
this makes sharing objects and synchronising access while still getting good performance more difficult. I can't back up my claims yet; we'll see how Hydra VM works out.
In the long term, a VM that can run its green threads (aka Process) on multiple OS threads (aka pthreads) should be the long-term goal.
This is debatable. Why are you convinced that fine-grained concurrency will not involve a large performance hit due to CPU cache invalidations? I haven't heard a compelling argument that this won't be a problem (and increasingly so, as the number of cores grows). We can't pretend that it takes zero time to make an object available for processing on a different core. As I've said before, I'm willing to be convinced otherwise.
Equally so, why then would any other concurrent implementation, such as the HydraVM, not also have exactly the same problem?
Because within HydraVM, each VM has its own ObjectMemory in a single, contiguous chunk of memory.
Below, you mention processor-affinity. This is certainly necessary, but is orthogonal to the issue. Let's simplify the discussion by assuming that the number of VMs is <= the number of cores, and that each VM is pinned to a different core.
32-bit CPU caches typically work on 4KB pages of memory. You can fit quite a few objects in 4KB. The problem is that if processor A and processor B are operating in the same ObjectMemory, they don't even have to touch the same object to cause cache contention... they merely have to touch objects on the same memory page. Can you provide a formal characterization of worst-case and average-case performance under a variety of application profiles? I wouldn't know where to start.
Happily, HydraVM doesn't have to worry about this, because each thread operates on a separate ObjectMemory.
Or why would any other concurrent application not have this problem?
They can, depending on the memory access patterns of the application.
Real operating systems implement some form of processor affinity[1] to keep cache on a single processor. The same could be done for the Squeak scheduler. I'm sure that the scheduling algorithm could be tuned to minimize cache invalidations.
As I described above, the problem is not simply ensuring that each thread tends to run on the same processor. I believe that you're overlooking a crucial aspect of real-world processor-affinity schemes: when a Real Operating System pins a process to a particular processor, the memory for that process is only touched by that processor.
I haven't had a chance to take more than a glance at it, but Ulrich Drepper from Red Hat has written a paper named "What Every Programmer Should Know About Memory". It's dauntingly comprehensive.
It might help to think of a multi-core chip as a set of separate computers connected by a network (I don't have the reference off-hand, but I've seen an Intel whitepaper that explicitly takes this viewpoint). It's expensive and slow to send messages over the network to ensure that my cached version of an object isn't stale. In general, it's better to structure our computation so that we know exactly when memory needs to be touched by multiple processors.
Cheers, Josh
[1] http://en.wikipedia.org/wiki/Processor_affinity
Gulik.
-- http://people.squeakfoundation.org/person/mikevdg http://gulik.pbwiki.com/
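Josh's contention point can be put in arithmetic form (a toy sketch; the 64-byte line size is a common but hardware-dependent assumption, and real contention also depends on coherence protocol details): two objects can contend whenever they fall in the same cache line, even if no two threads ever touch the same object.

```python
# Illustration of "false sharing": objects contend if they land in the
# same cache line, regardless of whether they are the same object.
# 64 bytes is an assumed, hardware-dependent line size.
LINE = 64

def same_cache_line(addr_a, addr_b, line=LINE):
    """True if both addresses fall in the same cache line."""
    return addr_a // line == addr_b // line

# Two small 16-byte objects allocated back to back share a line,
# so a write to either one invalidates the other core's cached copy:
assert same_cache_line(0x1000, 0x1010)

# Padding them a full line apart removes the false sharing:
assert not same_cache_line(0x1000, 0x1040)
```

This is why separate, contiguous ObjectMemories (as in Hydra) sidestep the problem: no two VMs ever allocate into the same line.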
On 2/23/08, Joshua Gargus schwa@fastmail.us wrote:
On Feb 22, 2008, at 11:51 PM, Michael van der Gulik wrote:
On 2/23/08, Joshua Gargus schwa@fastmail.us wrote:
On Feb 22, 2008, at 7:01 PM, Michael van der Gulik wrote:
this makes sharing objects and synchronising access while still getting good performance more difficult. I can't back up my claims yet; we'll see how Hydra VM works out.
In the long term, a VM that can run its green threads (aka Process) on multiple OS threads (aka pthreads) should be the long-term goal.
This is debatable. Why are you convinced that fine-grained concurrency will not involve a large performance hit due to CPU cache invalidations? I haven't heard a compelling argument that this won't be a problem (and increasingly so, as the number of cores grows). We can't pretend that it takes zero time to make an object available for processing on a different core. As I've said before, I'm willing to be convinced otherwise.
Equally so, why then would any other concurrent implementation, such as the HydraVM, not also have exactly the same problem?
Because within HydraVM, each VM has its own ObjectMemory in a single, contiguous chunk of memory.
Below, you mention processor-affinity. This is certainly necessary, but is orthogonal to the issue. Let's simplify the discussion by assuming that the number of VMs is <= the number of cores, and that each VM is pinned to a different core.
32-bit CPU caches typically work on 4KB pages of memory. You can fit quite a few objects in 4KB. The problem is that if processor A and processor B are operating in the same ObjectMemory, they don't even have to touch the same object to cause cache contention... they merely have to touch objects on the same memory page. Can you provide a formal characterization of worst-case and average-case performance under a variety of application profiles? I wouldn't know where to start.
Well... we'll revisit this when we actually have a VM capable of running a single image on multiple threads.
I haven't had a chance to take more than a glance at it, but Ulrich Drepper from Red Hat has written a paper named "What Every Programmer Should Know About Memory". It's dauntingly comprehensive. http://people.redhat.com/drepper/cpumemory.pdf
Thanks for the link; I'll read it tomorrow.
Gulik.
On Sat, Feb 23, 2008 at 4:43 AM, Michael van der Gulik mikevdg@gmail.com wrote:
Well... we'll revisit this when we actually have a VM capable of running a single image on multiple threads.
Michael, people here are just trying to help you save a whole lot of work. There is educational value in the work, but you really do need to think about both process affinity and concurrent access to shared memory. Both are equally important (at least for today's architectures). Intel's manuals are all online; all you have to do is read them to get an idea of the cost of concurrent access to shared memory.
- Stephen
On Sat, Feb 23, 2008 at 4:01 AM, Michael van der Gulik mikevdg@gmail.com wrote:
I disagree about it being the right approach in the long term.
The correct mid-term approach is to do what Erlang did: Have one image, and one OS-thread per *scheduler*. Then when new processes run they get a particular scheduler. All IO is non-blocking, etc.
The long-term goal will be to remove the OS threads, since when we have hundreds of cores, memory sharing simply won't be possible.
In the short term, the Hydra VM allows the use of multiple cores without large changes to the core of Squeak, which is good and IMHO the right decision for a quick and reliable solution (for whoever Igor is doing his work for... Qwaq?). The disadvantage with the Hydra VM is that all inter-process communication needs to go through a pipe; this makes sharing objects and synchronising access while still getting good performance more difficult. I can't back up my claims yet; we'll see how Hydra VM works out.
Fine-grained locking should be considered as obsolete as manual memory management (at least at the language level; the VM can do it internally so long as it's hidden, just like memory management).
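The scheduler-per-OS-thread model Jason describes might be sketched like this (illustrative only; green processes are modeled as plain thunks and all names are invented): each OS thread drains its own run queue, and new processes are assigned to a particular scheduler, so no locking is ever visible at the language level.

```python
# Toy sketch of the Erlang-style model: one OS thread per scheduler,
# each with its own run queue of green processes. Names are invented.
import queue
import threading

def scheduler(run_queue, results):
    """One OS thread: repeatedly run green processes from its own queue."""
    while True:
        green_process = run_queue.get()   # a green process = a thunk here
        if green_process is None:
            return                        # shutdown sentinel
        results.append(green_process())   # list.append is atomic in CPython

run_queues = [queue.Queue() for _ in range(2)]    # one queue per scheduler
results = []
threads = [threading.Thread(target=scheduler, args=(q, results))
           for q in run_queues]
for t in threads: t.start()

# New processes are assigned to a particular scheduler (round-robin):
for i in range(10):
    run_queues[i % 2].put(lambda i=i: i * i)
for q in run_queues:
    q.put(None)                           # tell each scheduler to stop
for t in threads: t.join()

assert sorted(results) == [i * i for i in range(10)]
```

The only shared structures are the queues themselves; everything a green process touches belongs to its scheduler, which is what makes per-scheduler GC feasible.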
On 2/23/08, Jason Johnson jason.johnson.081@gmail.com wrote:
On Sat, Feb 23, 2008 at 4:01 AM, Michael van der Gulik mikevdg@gmail.com wrote:
I disagree about it being the right approach in the long term.
The correct mid-term approach is to do what Erlang did: Have one image, and one OS-thread per *scheduler*. Then when new processes run they get a particular scheduler.
I'd agree on that one.
Gulik.
Jason Johnson wrote:
On Sat, Feb 23, 2008 at 4:01 AM, Michael van der Gulik mikevdg@gmail.com wrote:
I disagree about it being the right approach in the long term.
The correct mid-term approach is to do what Erlang did: Have one image, and one OS-thread per *scheduler*. Then when new processes run they get a particular scheduler.
What is the advantage of doing this compared to Hydra?
Cheers, - Andreas
On 2/23/08, Andreas Raab andreas.raab@gmx.de wrote:
Jason Johnson wrote:
On Sat, Feb 23, 2008 at 4:01 AM, Michael van der Gulik mikevdg@gmail.com wrote:
I disagree about it being the right approach in the long term.
The correct mid-term approach is to do what Erlang did: Have one
image, and one OS-thread per *scheduler*. Then when new processes run
they get a particular scheduler.
What is the advantage of doing this compared to Hydra?
Access to shared objects is much easier. In the above scenario, they're just there - normal objects - that can be used by multiple Processes concurrently. With Hydra, you need some form of inter-image communication, which is a lot more work.
Gulik.
Michael van der Gulik wrote:
On 2/23/08, Andreas Raab andreas.raab@gmx.de wrote:
Jason Johnson wrote:
On Sat, Feb 23, 2008 at 4:01 AM, Michael van der Gulik mikevdg@gmail.com wrote:
I disagree about it being the right approach in the long term.
The correct mid-term approach is to do what Erlang did: Have one image, and one OS-thread per *scheduler*. Then when new processes run they get a particular scheduler.
What is the advantage of doing this compared to Hydra?
Access to shared objects is much easier. In the above scenario, they're just there - normal objects - that can be used by multiple Processes concurrently. With Hydra, you need some form of inter-image communication, which is a lot more work.
Hi,
you forgot that Erlang doesn't even allow mutable shared objects. It only has processes communicating with each other, and a variable, once defined, cannot be changed later on.
Furthermore, SMP machines don't scale well for the same reasons global locks don't scale well, so some sophisticated techniques are needed. NUMA is one of them: it starts to completely separate CPUs and their memory while providing a fast message bus between them.
So while almost any multiprocess architecture that shares no memory, like Erlang and Hydra (?), will be able to compete on such hardware because it relies only on message passing, shared-memory architectures will be stuck on SMP machines. There, however, they will outperform the non-shared ones, I think. Nevertheless, IMHO shared-memory architectures will always remain more complex to develop and program with.
Regards, Martin
In short: Less sharing - less contention. More sharing - more contention.
If you put 2 points on a line and call them 'no sharing' and 'share everything', then any system which allows you to run on multiple cores and operate over a single domain (be it a single memory or multiple standalone memories) lies somewhere in the middle.
You can pick the starting point from which you move towards that golden point: from 'share everything' or from 'share nothing'. But there's no doubt that, no matter where you started, you will always move towards the 'golden' middle point.
On Sat, Feb 23, 2008 at 1:51 PM, Igor Stasenko siguctua@gmail.com wrote:
In short: Less sharing - less contention. More sharing - more contention.
If you put 2 points on a line and call them 'no sharing' and 'share everything', then any system which allows you to run on multiple cores and operate over a single domain (be it a single memory or multiple standalone memories) lies somewhere in the middle.
You can pick the starting point from which you move towards that golden point: from 'share everything' or from 'share nothing'. But there's no doubt that, no matter where you started, you will always move towards the 'golden' middle point.
But the question is where you make your trade-offs. If you take the way that's simple *for you*, then you just give everyone access to threading and let them suffer the pain of a paradigm too complex to get right.
If you take the way that's simple for *everyone else*, then you put this sharing inside the VM in the places where it makes sense and hide it from the language level (e.g. as Erlang, at least, does).
On 02/03/2008, Jason Johnson jason.johnson.081@gmail.com wrote:
On Sat, Feb 23, 2008 at 1:51 PM, Igor Stasenko siguctua@gmail.com wrote:
In short: Less sharing - less contention. More sharing - more contention.
If you put 2 points on a line and call them 'no sharing' and 'share everything', then any system which allows you to run on multiple cores and operate over a single domain (be it a single memory or multiple standalone memories) lies somewhere in the middle.
You can pick the starting point from which you move towards that golden point: from 'share everything' or from 'share nothing'. But there's no doubt that, no matter where you started, you will always move towards the 'golden' middle point.
But the question is where you make your trade-offs. If you take the way that's simple *for you*, then you just give everyone access to threading and let them suffer the pain of a paradigm too complex to get right.
If you take the way that's simple for *everyone else*, then you put this sharing inside the VM in the places where it makes sense and hide it from the language level (e.g. as Erlang, at least, does).
I'd vote for *everyone*: put threading control on the language side, like everything else in Smalltalk. Any 'magic' should be code that I can read and change, placed in the image, not in the VM. No magic is the spirit of Smalltalk, after all.
Igor Stasenko wrote:
On 02/03/2008, Jason Johnson jason.johnson.081@gmail.com wrote:
But the question is where you make your trade-offs. If you take the way that's simple *for you*, then you just give everyone access to threading and let them suffer the pain of a paradigm too complex to get right.
If you take the way that's simple for *everyone else*, then you put this sharing inside the VM in the places where it makes sense and hide it from the language level (e.g. as Erlang, at least, does).
I'd vote for *everyone*: put threading control on the language side, like everything else in Smalltalk. Any 'magic' should be code that I can read and change, placed in the image, not in the VM. No magic is the spirit of Smalltalk, after all.
Yes, but the spirit is also to build a VM able to hide some low-level details like memory allocation... Smalltalk programmers are relieved of these low-level problems... free to concentrate on higher-level problems.
Wouldn't this apply to threads too?
Nicolas
On 02/03/2008, nicolas cellier ncellier@ifrance.com wrote:
Igor Stasenko wrote:
On 02/03/2008, Jason Johnson jason.johnson.081@gmail.com wrote:
But the question is where you make your trade-offs. If you take the way that's simple *for you*, then you just give everyone access to threading and let them suffer the pain of a paradigm too complex to get right.
If you take the way that's simple for *everyone else*, then you put this sharing inside the VM in the places where it makes sense and hide it from the language level (e.g. as Erlang, at least, does).
I'd vote for *everyone*: put threading control on the language side, like everything else in Smalltalk. Any 'magic' should be code that I can read and change, placed in the image, not in the VM. No magic is the spirit of Smalltalk, after all.
Yes, but the spirit is also to build a VM able to hide some low-level details like memory allocation... Smalltalk programmers are relieved of these low-level problems... free to concentrate on higher-level problems.
Wouldn't this apply to threads too?
It does, but developers should be free to choose whether to use locking semantics or vats/islands/E-style concurrency, simply because there is no single, ultimately best solution for all kinds of parallel computing.
Nicolas
On Sun, Mar 2, 2008 at 10:42 PM, Igor Stasenko siguctua@gmail.com wrote:
Wouldn't this apply to threads too?
Absolutely.
It is, but developers should be free in choice whether use locking semantics or use vats/islands/E. Simply because there is no single, ultimately best solution for all kinds of parallel computing.
There is no "best" solution for memory management either. I'm sure it would be trivial to make some kind of memory management scheme that would work better for most applications now running on GCs, but the productivity gain we get from GCs makes them worth it, and we can use that extra time to find better GC algorithms.
Threading is no different: you can certainly find cases where Vats/Messaging/whatever isn't the very best possible solution, but in the vast majority of cases it will be good enough, and it saves the developer a huge amount of time.
nicolas cellier wrote:
Yes but the spirit is also to build a VM able to hide some low level details like memory allocation...
As well as the details of method lookup etc.
Smalltalk programmers are released from these release problems... Free to concentrate on higher level problems.
Wouldn't this apply to threads too?
Absolutely. What we need is a *model* of concurrency (just like we have a *model* for managing memory, a *model* for sending messages, a *model* for linked stack frames) and then have the VM implement that model of concurrency as effectively as possible.
Cheers, - Andreas
"Andreas Raab" said:
Absolutely. What we need is a *model* of concurrency (just like we have a *model* for managing memory, a *model* for sending messages, a *model* for linked stack frames) and then have the VM implement that model of concurrency as effectively as possible.
Out of curiosity, are those models exposed as Smalltalk objects?
My interest is not to have the most efficient VM implementation but to gain understanding of how the Garbage Collector or the PIC works.
I am thinking about something like what Dan Ingalls presented at ESUG 2004, where he showed different Smalltalk versions (72, 76, 80) running on top of Squeak. He mentioned that the overall speed of the interpreted system was comparable to the originals running on the Xerox machines.
Cheers, Francisco
On Sat, Feb 23, 2008 at 10:39 AM, Andreas Raab andreas.raab@gmx.de wrote:
The correct mid-term approach is to do what Erlang did: Have one image, and one OS-thread per *scheduler*. Then when new processes run
they get a particular scheduler.
What is the advantage of doing this compared to Hydra?
Cheers,
- Andreas
Sorry for the delayed response. I'm not familiar with what Hydra is doing and I didn't mean my comment as a comparison. I was simply responding to the comment about what is the best mid/long term approach.
As far as what advantage this approach provides in general: it allows the VM to fully take advantage of multiple threads on a system without exposing "real" threading to the language.
I said this is the best *mid-term* approach because even this won't be tenable once we reach a certain number of cores. Everyone keeps finding a way to use more cores under the old model, but it's getting more and more complex; at some point it just won't push any further, and then we will have to switch completely away from shared memory. At that point, having n threads per CPU probably won't buy anything anymore.
Jason Johnson wrote:
As far as what advantage this approach provides in general: it allows the VM to fully take advantage of multiple threads on a system without exposing "real" threading to the language.
But that's exactly what Hydra does (thus my question). Hydra uses separate object memories for (as you phrase it) "one OS-thread per scheduler". Therefore it allows all the existing code to continue to run with the current assumptions about threading (i.e., green threads only) and requires new code to be explicit about the concurrency model it assumes (i.e., channels for communication between the images).
Cheers, - Andreas
On Sun, Mar 2, 2008 at 10:47 PM, Andreas Raab andreas.raab@gmx.de wrote:
But that's exactly what Hydra does (thus my question). Hydra uses separate object memories for (as you phrase it) "one OS-thread per scheduler". Therefore it allows all the existing code to continue to run with the current assumptions about threading (i.e., green threads only) and requires new code to be explicit about the concurrency model it assumes (i.e., channels for communication between the images).
Very cool, sounds like a nice project. Sorry for not being knowledgeable about it, but as I said, I wasn't speaking to Hydra at all but rather answering someone's "in general" question with an "in general" answer. :)
Thanks for the information, I will definitely make a point to look into it.
Hi Gulik, all,
"Correct" is a strong word, heavily coupled to the paradigm that gave birth to it. "Correct" things for procedural processors are procedural languages. We don't have object-oriented processors yet. We are not using decent hardware to make this technology run. We are forced to make trade-offs due to a lack of better resources.
I'm glad to see that simplicity in the object paradigm is prioritized. I'm skeptical of extremely complex machines, especially where scaling matters.
What Igor did was create a network of Squeaks working in one machine as one. A network scales well. That's a powerful idea. Its simplicity is its strength.
I think the Hydra concept is a pragmatically brilliant choice,
cheers,
Sebastian Sastre
________________________________
From: squeak-dev-bounces@lists.squeakfoundation.org [mailto:squeak-dev-bounces@lists.squeakfoundation.org] On behalf of Michael van der Gulik
Sent: Saturday, 23 February 2008, 00:02
To: The general-purpose Squeak developers list
Subject: [squeak-dev] The "correct" approach to multi-core systems.
On 2/22/08, Stephen Pair stephen@pairhome.net wrote:
I must say, this is a really impressive development. I really think this is the right way to approach multi-core systems.
I disagree about it being the right approach in the long term.

In the short term, the Hydra VM allows the use of multiple cores without large changes to the core of Squeak, which is good and IMHO the right decision for a quick and reliable solution (for whoever Igor is doing his work for... Qwaq?). The disadvantage with the Hydra VM is that all inter-process communication needs to go through a pipe; this makes sharing objects and synchronising access while still getting good performance more difficult. I can't back up my claims yet; we'll see how Hydra VM works out.

In the long term, a VM that can run its green threads (aka Process) on multiple OS threads (aka pthreads) should be the long-term goal.

I can't imagine that hacking the VM with multiple processes, per-process state and a global VM lock for garbage collection and new object creation would be too difficult. The global VM lock would kill scalability and could make object creation slow, but it should still get some speedup on multi-cored CPUs. More advanced VMs with per-thread eden space would take a bit longer to write.

Gulik.

-- http://people.squeakfoundation.org/person/mikevdg http://gulik.pbwiki.com/