How is Squeak going to handle multi-core CPUs, if at all? If we see 100-plus cores in the future and Squeak stays as it is, I would imagine other languages, such as Erlang, will look more attractive.
This is not my area, but I imagine that Squeak processes should somehow map to OS native threads that can be parallelized across the cores. Any chance Exupery could be of some help with that? I ask because if it can, then it is a must for that future.
regards,
Sebastian Sastre
-----Original Message----- From: squeak-dev-bounces@lists.squeakfoundation.org [mailto:squeak-dev-bounces@lists.squeakfoundation.org] On behalf of gruntfuttuck Sent: Wednesday, October 17, 2007 06:10 To: squeak-dev@lists.squeakfoundation.org Subject: Multy-core CPUs
I don't know if mapping Smalltalk processes to native threads is the way to go, given the pain I've seen in the Java and C# space.
What might be interesting is to develop low-level primitives (along the lines of the famed map/reduce operations) that provide parallel processing versions of commonly used collection functions.
No idea how easy this would be to do, but on the surface seems more promising than trying to do process/thread jiggery pokery.
Steve
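Steve's idea of parallel versions of common collection functions can be sketched in miniature. A hedged illustration in Python rather than Smalltalk (chosen only because it is easy to run); the `parallel_collect` helper and its name are my own invention, not anything from Squeak:

```python
# Sketch of a "parallel collect:"-style primitive: split the work across
# worker processes and reassemble the results in order, like Smalltalk's
# collect: but spread over multiple cores.
from concurrent.futures import ProcessPoolExecutor

def parallel_collect(items, fn, workers=4):
    """Apply fn to every element of items using a pool of worker processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so the result lines up with items
        return list(pool.map(fn, items))

def square(x):
    return x * x

if __name__ == "__main__":
    print(parallel_collect(list(range(8)), square))  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The appeal of the idea is that the programmer keeps writing ordinary `collect:`/`inject:into:` style code while the runtime decides how to spread it over cores.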
Mmmm, original. Yeah, that's a very different approach. Hard to say which one is best. But from your comments, maybe the primitives way is a path that is *better* for the human beings who program the system, in terms of easing that pain.
Question: would that be a path that prioritizes usability at the Smalltalk developer level? If so, it is more interesting to me, even if it is less efficient than the other by a couple of [whatever measure unit] per second, which in one year will be doubled by a CPU with 2 more cores for a few dollars.
Not prioritizing usability and intellectual ergonomics is equal to not getting the point of this whole Smalltalk thing, perhaps even more: the whole IT thing. Just a thought.
I'm quite sure that multicore is the beginning of a new crisis for the industry. But it's a good one!
cheers,
Sebastian Sastre
Steve Wart wrote:
What might be interesting is to develop low-level primitives (along the lines of the famed map/reduce operations) that provide parallel processing versions of commonly used collection functions.
This would only make things more complicated since then the primitives would have to start parallel native threads working on the same object memory. The problem with native threads is that the current object memory is not designed to work with multiple independent mutator threads. There are GC algorithms which work with parallel threads, but AFAIK they all have quite some overhead relative to the single-thread situation.
IMO, a combination of native threads and green threads would be the best (although it still has the problem of parallel GC): The VM runs a small fixed number of native threads (default: number of available cores, but could be a little more to efficiently handle blocking calls to external functions) which compete for the runnable Smalltalk processes. That way, a number of processes could be active at any one time instead of just one. The synchronization overhead in the process-switching primitives should be negligible compared to the overhead needed for GC synchronization.
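As a toy model of the scheme Hans-Martin describes (in Python rather than at the VM level, and with all names invented): a small, fixed pool of native worker threads competes for runnable "Smalltalk processes" taken from a shared run queue, so several can be active at once.

```python
# Fixed pool of native threads competing for runnable green processes.
import threading, queue

run_queue = queue.Queue()
results = []
results_lock = threading.Lock()

def worker():
    while True:
        proc = run_queue.get()         # compete for the next runnable process
        if proc is None:               # sentinel: no more processes to run
            run_queue.task_done()
            return
        name, work = proc
        value = work()                 # run the process to completion
        with results_lock:
            results.append((name, value))
        run_queue.task_done()

NUM_CORES = 4                          # default: number of available cores
threads = [threading.Thread(target=worker) for _ in range(NUM_CORES)]
for t in threads:
    t.start()

for i in range(10):                    # schedule ten runnable processes
    run_queue.put(("proc-%d" % i, lambda i=i: i * i))
for _ in threads:
    run_queue.put(None)                # one sentinel per worker thread
for t in threads:
    t.join()

print(sorted(v for _, v in results))   # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

The shared-queue synchronization here corresponds to the process-switching primitives in the proposal; as noted above, its cost is small next to what GC synchronization would require.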
The simple yet efficient ObjectMemory of current Squeak can not be used with parallel threads (at least not without significant synchronization overhead). AFAIK, efficient algorithms require every thread to have its own object allocation area to avoid contention on object allocations. Tenuring (making young objects old) and storing new objects into old objects (remembered table) require synchronization. In other words, grafting a threadsafe object memory onto Squeak would be a major project.
In contrast, for a significant subset of applications (servers) it is orders of magnitude simpler to run several images in parallel. Those images don't stomp on each other's object memory, so there is absolutely no synchronization overhead. For stateful sessions, a front end can route requests to the image which currently holds a session's state; stateless requests can be handled by any image.
Cheers, Hans-Martin
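The multi-image scheme with a routing front end that Hans-Martin describes can be sketched like so. A hedged Python illustration: the `FrontEnd` class and every name in it are invented, and real images would of course be separate OS processes, not dicts.

```python
# Stateful requests are pinned to the image that holds the session;
# stateless requests go to any image, round-robin.
import itertools

class FrontEnd:
    def __init__(self, num_images):
        self.images = [{} for _ in range(num_images)]   # stand-ins for images
        self.session_map = {}                           # session id -> image index
        self.round_robin = itertools.cycle(range(num_images))

    def route(self, session_id=None):
        if session_id is None:
            return next(self.round_robin)               # stateless: any image
        if session_id not in self.session_map:          # first request: pick one
            self.session_map[session_id] = next(self.round_robin)
        return self.session_map[session_id]             # sticky thereafter

fe = FrontEnd(3)
first = fe.route("session-A")
assert all(fe.route("session-A") == first for _ in range(5))  # same image every time
```

Because no state is shared between images, the front end is the only place that needs to know where a session lives.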
On Oct 17, 2007, at 11:12 PM, Hans-Martin Mosner wrote:
IMO, a combination of native threads and green threads would be the best (although it still has the problem of parallel GC): The VM runs a small fixed number of native threads (default: number of available cores, but could be a little more to efficiently handle blocking calls to external functions) which compete for the runnable Smalltalk processes. That way, a number of processes could be active at any one time instead of just one. The synchronization overhead in the process-switching primitives should be negligible compared to the overhead needed for GC synchronization.
This is exactly what I have started work on. I want to use the foundations of SqueakElib as a msg passing mechanism between objects assigned to different native threads. There would be one native thread per core. I am currently trying to understand what to do with all of the global variables used in the interp loop, so I can have multiple threads running that code. I have given very little thought to what would need to be protected in the object memory or in the primitives. I take this very much as a learning project. Just think, I'll be able to see how the interpreter works, the object memory, bytecode dispatch, primitives....all of it in fact. If I can come out with a working system that does msg passing, even at the cost of poorly performing object memory, et al., then it will be a major success for me.
It is going to be slower, anyway, because I have to intercept each msg send as a possible non-local send. To this end, the Macro Transforms had to be disabled so I could intercept them. The system slowed considerably. I hope to speed them up with runtime info: is the receiver in the same thread that's running?
I do appreciate your comments and know that I may be wasting my time. :)
I am currently trying to understand what to do with all of the global variables used in the interp loop, so I can have multiple threads running that code.
Ah, well my intent was to ensure there was no globals, however there are a few because.
sqInt extraVMMemory;          /* historical, for Mac OS 9 setup; not needed as a global now */
sqInt (*compilerHooks[16])(); /* earlier CodeWarrior compilers had issues when this was put in a structure; should be fixed now */
usqInt memory;                /* there were some uses of memory in ccode: constructs in the interpreter; I think these might be gone now */
void* showSurfaceFn;          /* not sure about this one */
struct VirtualMachine* interpreterProxy; /* points to the interpreterProxy; it's there historically to allow direct linking from support code, but really you should use an accessor */
The rest are set to values which you can't do in a struct; however, somewhere in or before readImageFromFileHeapSizeStartingAt you could allocate the foo structure and initialize these values. There is of course some messy setup code in the VM that might refer to procedures in interp.c before an image is loaded; that is poor practice, and you would need to root it out.
char* obsoleteIndexedPrimitiveTable[][3] = {
const char* obsoleteNamedPrimitiveTable[][3] = {
void *primitiveTable[577] = {
const char *interpreterVersion =
-- John M. McIntosh johnmci@smalltalkconsulting.com Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
Thanks John! I'll save this for when I actually start looking at it, even though I said I already was. I am reading Tim's chapter to get familiar with it all. I'm going to need to add a word to the object header, I think.
cheers, Rob
On Oct 18, 2007, at 9:06 AM, Robert Withers wrote:
It is going to be slower, anyway, because I have to intercept each msg send as a possible non-local send.
Isn't this a show-stopper for a practical system? Or is this a stepping-stone? If so, how do you envision resolving this in the future?
FWIW, Croquet was at one time envisioned to work in the way that you describe. The architects weren't able to produce a satisfactory design/implementation within the necessary time frame, and instead developed the current "Islands" mechanism. This has worked out very well in practice, and there is no pressing need to try to implement the original idea.
In my understanding, Croquet islands and E vats are quite similar in that regard (and the latter informed the design of the former)... both use an explicit far-ref proxy to an object in another island/vat. What is the motivation for the approach you have chosen, other than it being a fun learning process (which may certainly be a good enough reason on its own)?
Cheers, Josh
On Oct 18, 2007, at 10:01 AM, Joshua Gargus wrote:
On Oct 18, 2007, at 9:06 AM, Robert Withers wrote:
On Oct 17, 2007, at 11:12 PM, Hans-Martin Mosner wrote:
This would only make things more complicated since then the primitives would have to start parallel native threads working on the same object memory. The problem with native threads is that the current object memory is not designed to work with multiple independent mutator threads. There are GC algorithms which work with parallel threads, but AFAIK they all have quite some overhead relative to the single-thread situation.
IMO, a combination of native threads and green threads would be the best (although it still has the problem of parallel GC): The VM runs a small fixed number of native threads (default: number of available cores, but could be a little more to efficiently handle blocking calls to external functions) which compete for the runnable Smalltalk processes. That way, a number of processes could be active at any one time instead of just one. The synchronization overhead in the process-switching primitives should be negligible compared to the overhead needed for GC synchronization.
This is exactly what I have started work on. I want to use the foundations of SqueakElib as a msg passing mechanism between objects assigned to different native threads. There would be one native thread per core. I am currently trying to understand what to do with all of the global variables used in the interp loop, so I can have multiple threads running that code. I have given very little thought to what would need to be protected in the object memory or in the primitives. I take this very much as a learning project. Just think, I'll be able to see how the interpreter works, the object memory, bytecode dispatch, primitives....all of it in fact. If I can come out with a working system that does msg passing, even at the cost of poorly performing object memory, et al., then it will be a major success for me.
It is going to be slower, anyway, because I have to intercept each msg send as a possible non-local send.
Isn't this a show-stopper for a practical system?
Probably. Although if a single thread executes code more slowly than current Squeak, yet all the threads together deliver higher throughput, then it is to be considered faster.
Or is this a stepping-stone?
It is a stepping-stone to see what inter-thread messaging looks like and how it behaves.
If so, how do you envision resolving this in the future?
My thinking is that getting the messaging working is the first step, followed by looking at synchronization problems, and then looking at what things like Exupery may offer to speed things up.
The example I gave of MacroTransforms is telling. Currently an #ifTrue: message is macro transformed into bytecodes that do the #ifTrue: inline. I have had to back that out so the #ifTrue: can be intercepted if the receiver is non-local. At runtime, it would be nice to see that if the receiver is in fact local, then some form of inlining could be used, otherwise intercept. Since this is runtime selected bytecodes, I thought of Exupery.
I think there could be lots of interesting optimization work once the basic system is functional.
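The runtime check Robert describes, take a fast, direct path when the receiver is local and intercept the send otherwise, can be sketched outside the VM. An illustrative Python model; the `Vat`, `send`, and `Counter` names are all invented for this sketch:

```python
# Every send goes through a dispatcher: direct call for a local receiver
# (the case worth re-inlining), queued interception for a non-local one.
import threading

class Vat:
    def __init__(self, name):
        self.name = name
        self.inbox = []                # queued non-local sends

current_vat = threading.local()        # which vat the running thread belongs to

def send(receiver_vat, receiver, selector, *args):
    if getattr(current_vat, "vat", None) is receiver_vat:
        # local receiver: direct call
        return getattr(receiver, selector)(*args)
    # non-local receiver: intercept and queue for the owning vat
    receiver_vat.inbox.append((receiver, selector, args))
    return None

class Counter:
    def __init__(self): self.n = 0
    def increment(self): self.n += 1; return self.n

vat_a, vat_b = Vat("A"), Vat("B")
current_vat.vat = vat_a
c = Counter()
print(send(vat_a, c, "increment"))     # 1   (local: executed directly)
send(vat_b, c, "increment")            # non-local: queued, not executed yet
print(len(vat_b.inbox))                # 1
```

The `if` guard at the top of `send` is exactly the kind of runtime test that a compiler like Exupery could fold away when it can prove the receiver is local.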
FWIW, Croquet was at one time envisioned to work in the way that you describe. The architects weren't able to produce a satisfactory design/implementation within the necessary time frame, and instead developed the current "Islands" mechanism. This has worked out very well in practice, and there is no pressing need to try to implement the original idea.
I didn't know that, that's cool. Islands is neat.
In my understanding, Croquet islands and E vats are quite similar in that regard (and the latter informed the design of the former)... both use an explicit far-ref proxy to an object in another island/vat. What is the motivation for the approach you have chosen, other than it being a fun learning process (which may certainly be a good enough reason on its own)?
As I described above, maybe it's a stepping-stone. Having a thread-based vat means there are resolved refs like NearRef (same thread), ThreadRef (same process/memory, different thread), possibly ProcessRef (different process, uses pipes), and FarRef (on the net).
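The ref hierarchy Robert lists amounts to choosing a ref kind from where the target lives relative to the sender. A small sketch; the names NearRef, ThreadRef, ProcessRef, and FarRef come from the message above, while the location model is invented for illustration:

```python
# Pick a ref kind by comparing the sender's and target's locations.
from dataclasses import dataclass

@dataclass(frozen=True)
class Location:
    host: str
    process: int
    thread: int

def ref_kind(sender: Location, target: Location) -> str:
    if sender.host != target.host:
        return "FarRef"        # on the net
    if sender.process != target.process:
        return "ProcessRef"    # different OS process, e.g. over pipes
    if sender.thread != target.thread:
        return "ThreadRef"     # same memory, different native thread
    return "NearRef"           # same thread: plain direct reference

here = Location("hostA", 100, 1)
print(ref_kind(here, Location("hostA", 100, 1)))  # NearRef
print(ref_kind(here, Location("hostA", 100, 2)))  # ThreadRef
print(ref_kind(here, Location("hostB", 7, 1)))    # FarRef
```

Only NearRef sends can take the direct path; each other kind pays a progressively larger messaging cost.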
I'm not very experienced with the vm/object memory, so this is also a fun learning experience!
Cheers, Rob
Cheers, Josh
Robert Withers writes:
The example I gave of MacroTransforms is telling. Currently an #ifTrue: message is macro transformed into bytecodes that do the #ifTrue: inline. I have had to back that out so the #ifTrue: can be intercepted if the receiver is non-local. At runtime, it would be nice to see that if the receiver is in fact local, then some form of inlining could be used, otherwise intercept. Since this is runtime selected bytecodes, I thought of Exupery.
One option would be to just disable ifTrue: inlining using Klaus's code and wait for Exupery to solve the speed problem introduced. Full message inlining should be able to optimise the message sends out of ifTrue:. This optimisation is planned for Exupery 2.0.
I'm still working to get to 1.0, so waiting doesn't make sense if you need the speed now or within the next year or two.
Optimising ifTrue: implemented with message sends is a simple use of the dynamic inlining work pioneered by Urs Hölzle in Self.
Bryce
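The dynamic optimisation Bryce refers to can be illustrated with a toy monomorphic inline cache, in the spirit of the Self work he cites. This is a hedged Python sketch, not how Exupery is implemented; every class and name here is invented:

```python
# Each send site caches the last receiver class and its looked-up method,
# falling back to a full lookup only when the class-check guard fails.
class SendSite:
    def __init__(self, selector):
        self.selector = selector
        self.cached_class = None
        self.cached_method = None
        self.lookups = 0               # count slow-path lookups for the demo

    def send(self, receiver, *args):
        cls = type(receiver)
        if cls is not self.cached_class:        # guard failed: cache miss
            self.lookups += 1
            self.cached_class = cls
            self.cached_method = getattr(cls, self.selector)
        return self.cached_method(receiver, *args)

class True_:
    def if_true(self, block): return block()
class False_:
    def if_true(self, block): return None

site = SendSite("if_true")
for _ in range(5):
    site.send(True_(), lambda: "ran")
print(site.lookups)                    # 1: all sends after the first hit the cache
site.send(False_(), lambda: "ran")
print(site.lookups)                    # 2: receiver class changed, one more lookup
```

With the cached method known at the send site, a compiler can go one step further and inline its body behind the class-check guard, which is what would recover the speed lost by un-inlining ifTrue:.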
"In February, 2007 NVIDIA, the worldwide leader in programmable graphics processor technologies, launched CUDA, a C-Compiler and developer's kit that gives software developers access to the parallel processing power of the GPU through the standard language of C."
"Until recently, graphic cards' GPUs couldn't be used for applications such as password recovery. Older graphics chips could only perform floating-point calculations, and most cryptography algorithms require fixed-point mathematics. Today's chips can process fixed-point calculations. And with as much as 1.5 Gb of onboard video memory and up to 128 processing units, these powerful GPU chips are much more effective than CPUs in performing many of these calculations."
"Since high-end PC mother boards can work with four separate video cards, the future is bright for even faster ... applications."
Some applications have experienced a 25x speed up using a $150 graphics card's GPU.
http://www.net-security.org/secworld.php?id=5567
"NVIDIA® CUDA^(TM) technology is a fundamentally new computing architecture that enables the GPU to solve complex computational problems in consumer, business, and technical applications. CUDA (Compute Unified Device Architecture) technology gives computationally intensive applications access to the tremendous processing power of NVIDIA graphics processing units (GPUs) through a revolutionary new programming interface. Providing orders of magnitude more performance and simplifying software development by using the standard C language, CUDA technology enables developers to create innovative solutions for data-intensive problems. For advanced research and language development, CUDA includes a low level assembly language layer and driver interface." http://developer.nvidia.com/object/cuda.html
Hi,
How can Squeak leverage this? Certainly in the area of graphics. Which other areas?
Squeak for a GPU anyone?
What can be accomplished with 128 x 4 GPU processing units per cheap PC node?
All the best,
Peter
Peter William Lount wrote:
How can Squeak leverage this? Certainly in the area of graphics. Which other areas?
Squeak for a GPU anyone?
What can be accomplished with 128 x 4 GPU processing units per cheap PC node?
One option is to wait for the Wheel of Hardware Reincarnation to crank through a couple of more steps giving us a huge number of processors, all alike. Then we are back to the subject of this thread :-)
http://www.cap-lore.com/Hardware/Wheel.html
-- Jecel
Jecel Assumpcao Jr wrote:
One option is to wait for the Wheel of Hardware Reincarnation to crank through a couple of more steps giving us a huge number of processors, all alike. Then we are back to the subject of this thread :-)
Hi Jecel,
Sweet.
Fortunately the cycle is swinging back around with the Tile-64 (and Tile-N) core processors that are just now being released. Also, Intel has a similar 80-core chip that they've shown off, but it isn't slated for production (quite yet). General-purpose, highly connected chips using on-chip networks to communicate are likely the way of the future. Intel will eventually produce x86-64 variants (and hopefully Itaniums) that have N cores where N is 64 or larger - maybe sooner than we think.
The next steps of N-core and N-threading design for Squeak and Smalltalk are crucial.
Even the magical Erlang way of concurrency won't solve real world issues such as multiple processes contending for limited hardware resources. These need synchronization. No one answered Igor's point on this.
It would still be nice if someone who supports the Erlangisation (or is that Erlangization?) of Smalltalk's processes would write up a complete description of what they are actually proposing. It would make debating it easier. Thanks.
All the best,
Peter
On 10/23/07, Peter William Lount peter@smalltalk.org wrote:
Even the magical Erlang way of concurrency won't solve real world issues such as multiple processes contending for limited hardware resources. These need synchronization. No one answered Igor's point on this.
But they do deal with it: points of contention like this get their own process. When you open a file in Erlang a process is started to manage it. All reads and writes go through this process so you can have as many processes doing these read/writes as you want.
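The Erlang pattern described here, one process owns the resource and serializes all access to it, can be sketched with a manager thread and a queue. A Python illustration under invented names; the "file" is just a list standing in for the real resource:

```python
# One manager thread owns the shared resource; clients only talk to it
# through a queue, so the clients themselves need no locks at all.
import threading, queue

requests = queue.Queue()

def file_manager(log):
    # The only thread that ever touches `log`; requests are served in order.
    while True:
        msg = requests.get()
        if msg is None:                # shutdown sentinel
            return
        op, payload, reply = msg
        if op == "write":
            log.append(payload)
            reply.put(len(log))
        elif op == "read":
            reply.put(list(log))

log = []
manager = threading.Thread(target=file_manager, args=(log,))
manager.start()

def write(line):
    reply = queue.Queue()
    requests.put(("write", line, reply))
    return reply.get()                 # block until the manager has written it

threads = [threading.Thread(target=write, args=("line-%d" % i,)) for i in range(8)]
for t in threads: t.start()
for t in threads: t.join()
requests.put(None)
manager.join()
print(len(log))                        # 8: every write arrived exactly once
```

The contention point still exists, but it is encapsulated in one process's mailbox rather than smeared across the program as explicit locks.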
----- Original Message ----- From: bryce@kampjes.demon.co.uk To: "The general-purpose Squeak developers list" squeak-dev@lists.squeakfoundation.org Sent: Monday, October 22, 2007 1:30 PM Subject: Re: Multy-core CPUs
One option would be to just disable ifTrue: inlining using Klaus's code and wait for Exupery to solve the speed problem introduced. Full message inlining should be able to optimise the message sends out of ifTrue:. This optimisation is planned for Exupery 2.0.
That would be awesome. I look forward to it.
cheers, Rob
Dear Hans-Martin,
Thanks for your clear explanation. It is really instructive.
Regards, Alexandre
On 10/17/07, Steve Wart steve.wart@gmail.com wrote:
I don't know if mapping Smalltalk processes to native threads is the way to go, given the pain I've seen in the Java and C# space.
Shared-memory parallelism has always been difficult. People claimed it was the language, the environment, or that they needed better training. They always thought that with one more thing, they could "fix" shared-memory parallelism and make it usable. But Java has done a good job of providing reasonable language primitives, there has been a lot of work on making threads efficient, and plenty of people have learned to write multi-threaded Java. It is still way too hard.
I think that shared-memory parallelism, with explicit synchronization, is a bad idea. Transactional memory might be a solution, since it eliminates explicit synchronization. I think the most likely solution is to avoid shared memory altogether and go with message passing. Erlang is a perfect example of this. We could take this approach in Smalltalk by making minimal images like Spoon, making images that are designed to be used by other images (again, like Spoon), and then implementing our systems as hundreds or thousands of separate images. Image startup would have to be very fast. I think that this is more likely to be useful than rewriting garbage collectors to support parallelism.
-Ralph Johnson
Hey, this sounds like an interesting path to me. If we think of nature and its design, those images could be analogous to the cells of a larger body. Fragmentation keeps things simple without compromising scalability. Nature concluded that it is more efficient not to develop a few super-complex brain cells, but to develop zillions of far simpler ones, each just complex enough, and let them assemble into an unimaginably complex network: a brain.
Another reason I find this interesting is that we know an object that is too smart smells bad. It easily starts to become less flexible, so less scalable in complexity, less intuitive (you have to learn more about how to use it), with more to memorize, maintain, document, and so on. It is smarter, but it may become a bad deal because it is too costly. That said, if we think of those flexible mini-images as objects, each one using a core, we can scale enormously and almost trivially in this whole multi-core business, and in a way we know works.
Another interesting point is fault tolerance. If one of those images suffers downtime (because of a power failure on its host, or for whatever reason), the system might feel it somehow without being in complete failure, because there are other images to handle demand. A small (hence efficient), well-protected critical system can coordinate contention measures for the "crisis", and hopefully the system never lets users feel its own crisis.
Again, I find this is a trade-off between scaling horizontally and vertically. In hardware, Intel and friends scaled vertically (more bits and more Hz, for instance) for years, as far as they were physically able. Now they have hit a kind of barrier and have started to scale horizontally (adding cores). Please don't fall into endless discussions, like the ones I've seen out there, comparing apples with bananas just because both are fruit; they are not comparable. Both are about scaling, but they are two different axes of a multidimensional scaling space (complexity, load, performance, etc.).
I'm thinking of vertical here as making one Squeak smarter, capable of being thread-safe, and horizontal as making one smart network of N Squeaks.
Sometimes one choice will be good business and sometimes the other. I feel the horizontal time has come. If that's true, investing (time, money, effort) now in vertical scaling may turn out to have a lower cost/benefit ratio than the same investment in horizontal scaling.
The truth is that this is all speculative and I don't know. But I do trust in nature.
Cheers,
Sebastian Sastre
-----Original Message----- From: squeak-dev-bounces@lists.squeakfoundation.org [mailto:squeak-dev-bounces@lists.squeakfoundation.org] On behalf of Ralph Johnson Sent: Thursday, October 18, 2007 08:09 To: The general-purpose Squeak developers list Subject: Re: Multy-core CPUs
On 18/10/2007, Sebastian Sastre ssastre@seaswork.com wrote:
I have often thought myself about making a Smalltalk 'vertical' (by making it multithreaded with a single shared memory). Now, after reading this post, I think your approach is much better. So I think it would be good to make some steps towards supporting multiple images in a single executable: make one executable capable of running a number of images in separate native threads. This would save memory resources and could also help make inter-image messaging less costly.
Hi,
What Ralph and the others have said is on target in many respects. The Erlang and Smalltalk models of messaging have a lot to offer, and when combined they may provide a powerful and compelling computing platform.
However, having just worked on a very large multi-threaded commercial application in live production, it is very clear that even single-native-threaded Smalltalk applications have very nasty concurrency problems. Our team of seven was able to solve many of the worst of these concurrency problems in a year and a half, and improved the reliability of this important production application.
It's important that concurrency be taken into account at all levels of an application's design, from the lowest levels of the virtual machine through the end user experience (which is where concurrency on multiple cores can really make a significant paradigm adjusting difference if done well).
One of the lessons learned from this complex real-world application was that almost ALL of the concurrency problems you have with multiple cores running N native threads, you also have with a single core running one native thread. The implication is that the proposed solutions of running multiple images with one native thread each won't really save you from concurrency problems, as each image on its own can have serious concurrency issues.
When you have a single native thread running, say, ten to twenty Smalltalk green threads (aka Smalltalk processes), the concurrency problems can be a real nightmare to contemplate. Comprehension of what is happening is exacerbated by the limited debugging information captured in runtime crash dumps.
Diagnosing the real world concurrency problems in a live production application revealed that it's not an easy problem even with one native thread running! Additional native threads really wouldn't have changed much (assuming that the VM can properly handle GC and other issues as is done in Smalltalk MT) with the concurrency problems we were dealing with. This includes all the nasty problems with the standard class library collection classes.
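Peter's point, that green threads on one native thread already race, can be shown deterministically. The sketch below (illustrative Python, not Smalltalk) fakes two green threads as generators driven by a toy round-robin scheduler; because a context switch falls between the read and the write of a shared balance, one deposit is silently lost:

```python
balance = {'value': 100}

def deposit(amount):
    read = balance['value']            # step 1: read the shared state
    yield                              # green-thread switch happens here
    balance['value'] = read + amount   # step 2: write back a stale value
    yield

def round_robin(tasks):
    # Toy cooperative scheduler: advance each live task one step per turn.
    tasks = list(tasks)
    while tasks:
        still_running = []
        for task in tasks:
            try:
                next(task)
                still_running.append(task)
            except StopIteration:
                pass
        tasks = still_running

round_robin([deposit(10), deposit(10)])
print(balance['value'])  # 110 -- one deposit was lost, not 120
```

No native threads are involved at all; the unprotected read-modify-write is enough.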
It is for the above reasons that I support many approaches being implemented, so that we can find out the best one(s) for various application domains.
It's unlikely that there is a one solution fits all needs type of paradigm.
1) With existing Smalltalks (and other languages) it's relatively easy to support one image per native "process" (aka task), each with its own separate memory space. This seems to be trivial for Squeak. The main thing that is needed is an effective and appropriate distributed object-messaging system over TCP/IP. This also has the advantage of easily distributing the image-nodes across multiple server nodes on a network.
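A minimal sketch of what point 1 needs at the transport level (invented names throughout; a real distributed object-messaging system would add message framing, serialization, and object proxies on top): one "image" listens on a TCP socket, a second one sends it a message and reads the reply.

```python
import socket, threading

def serve_one(server):
    # Accept a single connection, echo the request back, then close.
    # A real image would deserialize the message and dispatch it to an object.
    conn, _ = server.accept()
    with conn:
        request = conn.recv(1024).decode()
        conn.sendall(('echo: ' + request).encode())

server = socket.socket()
server.bind(('127.0.0.1', 0))            # bind to any free port
server.listen(1)
port = server.getsockname()[1]
t = threading.Thread(target=serve_one, args=(server,))
t.start()

client = socket.socket()
client.connect(('127.0.0.1', port))
client.sendall(b'at: #answer put: 42')
chunks = []
while True:                               # read until the peer closes
    data = client.recv(1024)
    if not data:
        break
    chunks.append(data)
reply = b''.join(chunks).decode()
client.close()
t.join()
server.close()
print(reply)  # echo: at: #answer put: 42
```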
I propose that any distributed object messaging system that is developed for inter-image communication meet a wide range of criteria and application needs before being considered as a part of the upcoming next Smalltalk Standard. These criteria would need to be elucidated from the literature and the needs of members of the Smalltalk community and their clients.
2) It's been mentioned that it would be straightforward to have Squeak start up multiple copies of the image (or even multiple different images) in one process (task) memory space, with each image having its own native thread and keeping its object table and memory separate within the larger memory space. This sounds like a very nice approach, and is very likely practical for multi-core CPUs such as the N-core (where N is 2, 4, 8, 64) CPUs from AMD, Intel, and Tilera.
3) A single image running on N-cores with M-native threads (M may be larger than N) is the full generalization of course.
This may be the best way to take advantage of paradigm shaking chips such as the Tile64 processor from Tilera.
However, we may need to rethink the entire architecture of the Smalltalk virtual machine, since the Tile64 chip has capabilities that radically alter the paradigm. Messages between processor nodes take less time to pass than the same amount of data takes to be written into memory. Think about that. It offers a new paradigm unavailable on other N-core processors (at this time).
I believe that we, the Smalltalk community, need to have Smalltalk capable of being deployed in the fully generalized scenario: running on N cores with M native threads, and with O images in one memory space able to communicate with P other nodes. It is we who need to do the hard work of providing systems that work correctly in the face of the multi-core, multi-threaded reality that is now upon us. If we run away from the hard work, the competitors who tackle it and provide workable solutions will prevail.
Food for thought.
All the best,
Peter William Lount Smalltalk.org Editor peter@smalltalk.org
On 10/18/07, Peter William Lount peter@smalltalk.org wrote:
Hi,
However, having just worked on a very large multi-threaded commercial application in live production it is very clear that even single native threaded Smalltalk applications have very nasty concurrency problems.
Yes, and these problems are exacerbated by (what I would call) the old way of doing threaded programming, i.e. shared state with fine-grained locking.
It's important that concurrency be taken into account at all levels of an application's design, from the lowest levels of the virtual machine through the end user experience (which is where concurrency on multiple cores can really make a significant paradigm adjusting difference if done well).
But if we truly move to the n-core (n being 100 and above) world to improve computation speed, and it looks as though we must, then this simply isn't realistic for most programmers. No more than manual memory management was realistic for large applications.
One of the lessons learned from this complex real-world application was that almost ALL of the concurrency problems you have with multiple cores running N native threads, you also have with a single core running one native thread.
Depending on when execution can be interrupted, you have exactly the same issues.
When you have a single native thread running, say, ten to twenty Smalltalk green threads (aka Smalltalk processes), the concurrency problems can be a real nightmare to contemplate. Comprehension of what is happening is exacerbated by the limited debugging information captured in runtime crash dumps.
But this depends largely on the model. If you go away from the old, tried and untrue method of fine-grained locking then debugging gets much easier. It's no problem at all, for example, in Erlang. Sometimes when something is really really hard to do, it is a sign that we are going about it the wrong way.
It is for the above reasons that I support many approaches being implemented, so that we can find out the best one(s) for various application domains.
We know from long experience what fine-grained locking is like. At least one STM implementation is out there to try, and I believe the actor model Erlang uses is either out there, or easy to set up.
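The actor model is indeed easy to set up in miniature. In this hedged sketch (Python; all names invented), an actor's state is reachable only through its mailbox, so no locking is needed around the count itself:

```python
import queue, threading

class Counter:
    """An actor whose private count is only reachable via messages."""
    def __init__(self):
        self.mailbox = queue.Queue()
        self._count = 0
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        # The actor loop: the single place that ever touches self._count.
        while True:
            msg, reply_to = self.mailbox.get()
            if msg == 'increment':
                self._count += 1
            elif msg == 'get':
                reply_to.put(self._count)
            elif msg == 'stop':
                break

    def send(self, msg, reply_to=None):
        self.mailbox.put((msg, reply_to))

counter = Counter()
for _ in range(5):
    counter.send('increment')
reply = queue.Queue()
counter.send('get', reply)     # replies travel back through a queue too
value = reply.get()
counter.send('stop')
print(value)  # 5
```

Because the mailbox is FIFO, all five increments are processed before the 'get', with no explicit synchronization in the client.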
It's unlikely that there is a one solution fits all needs type of paradigm.
No, but we can get the 99% like Garbage collection has.
- It's been mentioned that it would be straightforward to have Squeak start up multiple copies of the image (or even multiple different images) in one process (task) memory space, with each image having its own native thread and keeping its object table and memory separate within the larger memory space. This sounds like a very nice approach, and is very likely practical for multi-core CPUs such as the N-core (where N is 2, 4, 8, 64) CPUs from AMD, Intel, and Tilera.
- A single image running on N cores with M native threads (M may be larger than N) is the full generalization, of course.
This may be the best way to take advantage of paradigm shaking chips such as the Tile64 processor from Tilera.
If you mean by this a form of shared-state, fine-grained programming, then I disagree wholeheartedly. We have long experience with fine-grained locking in C++, Java, now C#, Smalltalk, on and on. It just can't be the path to the future.
Smalltalk needs to keep inventing the future. Chasing this primitive form of threading would put us firmly behind languages like C++ that have been doing it this way for decades.
However, we may need to rethink the entire architecture of the Smalltalk virtual machine, since the Tile64 chip has capabilities that radically alter the paradigm. Messages between processor nodes take less time to pass than the same amount of data takes to be written into memory. Think about that. It offers a new paradigm unavailable on other N-core processors (at this time).
I wonder how Erlang will run on these machines.
Hi Jason,
However, having just worked on a very large multi-threaded commercial application in live production it is very clear that even single native threaded Smalltalk applications have very nasty concurrency problems.
Yes, and these problems are exacerbated by (what I would call) the old way of doing threaded programming, i.e. shared state with fine-grained locking.
That may be the case from your - and others' - perspective, and I have empathy for it; however, they are still valid techniques, and others, such as myself, don't share your perspective.
Smalltalk should let people - the (educated) users - choose the mechanism of concurrency, not dictate it. In my humble opinion.
It's important that concurrency be taken into account at all levels of an application's design, from the lowest levels of the virtual machine through the end user experience (which is where concurrency on multiple cores can really make a significant paradigm adjusting difference if done well).
But if we truly move to the n-core (n being 100 and above) world to improve computation speed, and it looks as though we must, then this simply isn't realistic for most programmers. No more than manual memory management was realistic for large applications.
The reality of processor designs like the Tile 64 require us to have all available techniques at our disposal.
One of the lessons learned from this complex real-world application was that almost ALL of the concurrency problems you have with multiple cores running N native threads, you also have with a single core running one native thread.
Depending on when execution can be interrupted, you have exactly the same issues.
Exactly my point. Thus the solutions proposed as being "simpler" are just an illusion. They might be simpler in some cases, but when you really need complex concurrency controls you sometimes need the other "dirtier" techniques at your disposal. Smalltalk is supposed to be a computer language with the general power to control the computer and access its true power and potential. Limiting the solution space by implementing only a limited set of concurrency primitives makes no sense. You'll just give the market to other, lesser systems like Erlang- and Java-type systems.
When you have a single native thread running, say, ten to twenty Smalltalk green threads (aka Smalltalk processes), the concurrency problems can be a real nightmare to contemplate. Comprehension of what is happening is exacerbated by the limited debugging information captured in runtime crash dumps.
But this depends largely on the model. If you go away from the old, tried and untrue method of fine-grained locking then debugging gets much easier. It's no problem at all, for example, in Erlang. Sometimes when something is really really hard to do, it is a sign that we are going about it the wrong way.
Yes, the model is always important.
Yes, sometimes that is a sign for improvement. Maybe I simply need detailed specifics of what you are talking about, however, if you really want Smalltalk to be general purpose - as I do - then it needs to cover the full domain of techniques for concurrency!
It is for the above reasons that I support many approaches be implemented so that we can find out the best one(s) for various application domains.
We know from long experience what fine-grained is like. At least one STM implementation is out there to try, and I believe the actor model Erlang uses is either out there, or easy to set up.
Yes, we do. Sometimes it is what is needed.
For example, when building hard core operating systems.
It's unlikely that there is a one solution fits all needs type of paradigm.
No, but we can get the 99% like Garbage collection has.
Not really.
- It's been mentioned that it would be straightforward to have Squeak start up multiple copies of the image (or even multiple different images) in one process (task) memory space, with each image having its own native thread and keeping its object table and memory separate within the larger memory space. This sounds like a very nice approach, and is very likely practical for multi-core CPUs such as the N-core (where N is 2, 4, 8, 64) CPUs from AMD, Intel, and Tilera.
- A single image running on N cores with M native threads (M may be larger than N) is the full generalization, of course.
This may be the best way to take advantage of paradigm shaking chips such as the Tile64 processor from Tilera.
If you mean by this a form of shared-state, fine-grained programming, then I disagree wholeheartedly. We have long experience with fine-grained locking in C++, Java, now C#, Smalltalk, on and on. It just can't be the path to the future.
There are many paths. I'm excited about the path that you are forging. All I ask is that you don't make that the only path to travel for people using Smalltalk.
Smalltalk needs to keep inventing the future. Chasing this primitive form of threading would put us firmly behind languages like C++ that have been doing it this way for decades.
While I support Smalltalk inventing the future, keeping it from supporting valid concurrency techniques is ignoring the future (and the past) of what works!
However, we may need to rethink the entire architecture of the Smalltalk virtual machine, since the Tile64 chip has capabilities that radically alter the paradigm. Messages between processor nodes take less time to pass than the same amount of data takes to be written into memory. Think about that. It offers a new paradigm unavailable on other N-core processors (at this time).
I wonder how Erlang will run on these machines.
I do as well.
All the best,
Peter
On 10/19/07, Peter William Lount peter@smalltalk.org wrote:
That may be the case from your - and others - perspective - and I have empathy for it -, however they are still valid techniques and others, such as myself, don't share your perspective.
Sure, just as manual memory management is still valid and needed at the lowest levels of programming. It's just not valid in most applications.
Smalltalk should let people - the (educated) users - choose the mechanism of concurrency, not dictate it. In my humble opinion.
But herein lies the problem. As discussed in the previous thread I linked to, adding actor-style message passing should be relatively easy. Adding software transactional memory is doable. Making the Squeak VM fully multi-threaded (natively) is going to be a lot of pain and hard to get right. Just ask the Java VM team.
The payback from adding this obsolete (except in the lowest-level cases) method of dealing with threading just isn't going to be worth the pain of implementing it.
The reality of processor designs like the Tile64 requires us to have all available techniques at our disposal.
Why?
Exactly my point. Thus the solutions proposed as being "simpler" are just an illusion.
? They are unquestionably simpler to the programmer who is using them (which is what I meant).
They might be simpler in some cases, but when you really need complex concurrency controls you sometimes need the other "dirtier" techniques at your disposal.
This is like saying that Smalltalk is wrong to not expose manual memory management to you for when you need to get "down and dirty". It's simply not the case. You move to a higher level, just as we do with all abstractions.
Smalltalk is supposed to be a computer language with the general power to control the computer and access its true power and potential. Limiting the solution space by implementing only a limited set of concurrency primitives makes no sense. You'll just give the market to other, lesser systems like Erlang- and Java-type systems.
This last sentence is quite odd, and to be frank not well reasoned at all.
First of all Erlang is not lesser, it is in fact currently the leader in this area. It's funny though, that you suggest we would "give the market over" to Erlang, since Erlang supports precisely *one* form of concurrency: share-nothing message passing. Erlang can run in multiple threads, but only the interpreter does that, and it's transparent to the processes running in the VM.
Second of all, do you seriously think adding fine-grained threading to Smalltalk will automatically cause it to take over the market? Of course not. Everyone would just say "what? You *just now* got that?", except the Erlang folks, who would simply laugh at a language that put in so much effort to add a feature that gets less relevant all the time.
Ironically, the fact of the matter is: the languages that make threading *simpler for implementers* are going to be the ones that win in the apparently coming multi-core world. Just ask Tim Sweeney.
For example, when building hard core operating systems.
If you want to build a hard-core operating system in Smalltalk, you have other, more pressing issues to deal with than how threading is accomplished. And actually, there has been some work done on operating systems that do not support the silly pthreads model we have today, but use something closer to the Erlang model. It's interesting work, but sadly one still can't get much traction with an OS other than Windows, Mac, or a Unix variant.
Not really.
Aren't you the one who always requests well-thought-out arguments? :) I really don't see what it is you think you lose by not having this old, outdated, fine-grained threading model.
There are many paths. I'm excited about the path that you are forging. All I ask is that you don't make that the only path to travel for people using Smalltalk.
Well, at the moment I'm forging nothing, only stating what I know of the situation. At some later point I do intend to look at what's required to make this happen in Squeak, but I have some other more pressing issues for the present.
While I support Smalltalk inventing the future, keeping it from supporting valid concurrency techniques is ignoring the future (and the past) of what works!
We have very different definitions of "works". Here you are using it the same way someone would use it for <insert crappy programming language>. It works in the same way that you can paint a house with a toothbrush.
Hi Jason,
That may be the case from your - and others - perspective - and I have empathy for it -, however they are still valid techniques and others, such as myself, don't share your perspective.
Sure, just as manual memory management is still valid and needed at the lowest levels of programming. It's just not valid in most applications.
I've not yet seen any serious discussion of the case for your point of view that bridges the gap of concurrency complexity the way automatic memory management magically does. Please illuminate us with specific and complete details of your proposal for such a breakthrough in concurrency complexity.
Making the Squeak VM fully multi-threaded (natively) is going to be a lot of pain and hard to get right. Just ask the Java VM team.
Then either the hard work needs to be done, or the VM needs to be completely rethought.
The pay back of adding this obsolete (except in the lowest level cases) method of dealing with threading just isn't going to be worth the pain to implement it.
What are you going on about? What techniques are you saying are obsolete exactly? How are they obsolete?
The reality of processor designs like the Tile 64 require us to have all available techniques at our disposal.
Why?
Why? 64 processors on a single chip - with 128 coming next year and 1024 planned - that's why.
With that many processors on a single chip, it's important that systems and applications run smoothly, taking advantage of all the opportunities for parallelism. This has many implications, some of which work better with one method of concurrency than with another. One size of shoe doesn't fit all solutions.
Exactly my point. Thus the solutions proposed as being "simpler" are just an illusion.
? They are unquestionably simpler to the programmer who is using them (which is what I meant).
You've missed the point. Even the simplest of the concurrency methods proposed so far by people in this Squeak thread lead to the most complex concurrency-control error scenarios. That's one of the points. Another is that the simplest of concurrency models can't handle all the scenarios.
As asked above, please describe in detail, and completely, the proposed "simple" approach to concurrency. Links to appropriate descriptions, if they exist, would also be fine (unless they contain too much extraneous text).
They might be simpler in some cases, but when you really need complex concurrency controls you sometimes need the other "dirtier" techniques at your disposal.
This is like saying that Smalltalk is wrong to not expose manual memory management to you for when you need to get "down and dirty". It's simply not the case. You move to a higher level, just as we do with all abstractions.
Nonsense; it's not like saying that at all.
Sometimes moving to a higher level abstraction isn't the solution. Sometimes moving laterally provides the insight for the solution. Often moving down to the lowest levels and rethinking how they work provides the solutions without higher levels of abstraction.
A case in point for clarity: Exokernels. They remove higher levels of abstraction so that we have access to the power of the real hardware.
"The idea behind exokernels is to force as few abstractions as possible on developers, enabling them to make as many decisions as possible about hardware abstractions." - http://en.wikipedia.org/wiki/Exokernel.
The problem with concurrency is that it's much more complex by orders of magnitude than garbage collection. Much more complex a beast, so much more so that the comparison breaks down.
Stephen Wolfram's work on cellular automata (page 27 of A New Kind of Science, http://www.wolframscience.com/nksonline/page-27) proves, yes, proves that even (some) simple systems can generate results (i.e. behaviour) as complex as any generated by a complex system.
Wolfram states: "The picture [of cellular automaton rule 30] shows what happens when one starts with just one black cell and then applies this rule over and over again. And what one sees is something quite startling - and probably the single most surprising scientific discovery I have ever made. Rather than getting a simple regular pattern as we might expect, the cellular automaton instead produces a pattern that seems extremely irregular and complex." Most of the rest of his book explores just how complex this behavior really is.
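Rule 30 is small enough to reproduce here, which makes the point vivid. A hedged Python sketch (the rule itself is Wolfram's; the helper names are invented) that grows the pattern from a single black cell:

```python
def rule30_step(cells):
    # Rule 30: new cell = left XOR (center OR right). Pad with two zeros
    # so the row can grow by one cell on each side per generation.
    padded = [0, 0] + cells + [0, 0]
    return [padded[i - 1] ^ (padded[i] | padded[i + 1])
            for i in range(1, len(padded) - 1)]

rows = [[1]]                      # generation 0: a single black cell
for _ in range(4):
    rows.append(rule30_step(rows[-1]))

for row in rows:                  # print the familiar ragged triangle
    print(''.join('#' if c else '.' for c in row).center(9))
```

The first few rows (1; 111; 11001; 1101111; ...) already show the irregular structure Wolfram describes.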
Often this feature of simple systems is just what we want to take advantage of. Certainly Smalltalk leverages the power of this simplicity with its simple syntax. So if there is a way (or ways) to have simple concurrency that is effective, I'm all for it.
However, there is a dark side to Stephen Wolfram's discovery as well that needs addressing and that I'm attempting to point out here. The dark side is that simple systems can generate complex results. Complex results (beyond comprehension) are just what we don't want when we enter the world of concurrency. The rub is that there isn't any way to avoid the complex results as far as I can see, for simple systems can generate complexity as complex as complex systems do.
I fear that the solutions space isn't as straightforward as having a set of simplified concurrency primitives as proposed by some for Smalltalk. The reality is harsher than you think. The solution space requires more than a simple set of concurrency primitives.
Smalltalk is supposed to be a computer language with general power to control the computer and access its true power and potential. Limiting the solution space by implementing only a limited set of concurrency primitives makes no sense. You'll just give the market to other lesser systems like Erlang and Java-type systems.
This last sentence is quite odd, and to be frank not well reasoned at all.
Thank you for calling it "odd". That's what happens when you think different; at first people think it odd. I often encourage people to think different, as Apple did in its marketing a few years ago.
As for how it's reasoned, yes, it is well reasoned even if you don't get it at first or even if I wasn't clear about it. Let me attempt to clarify the reasoning for you.
First of all Erlang is not lesser, it is in fact currently the leader in this area.
How is that?
It's funny though, that you suggest we would "give the market over" to Erlang, since Erlang supports precisely *one* form of concurrency: share-nothing message passing.
Yes, but Erlang is a purely functional, non-object-oriented, non-keyword-message-passing language.
While it has a form of message passing it's not the same as Smalltalk's. It's simply passing parameters to functions that run in separate green or native threads.
Yes it is impressive what they have accomplished, but it isn't the be all and end all.
Erlang can run in multiple threads, but only the interpreter does that, and it's transparent to the processes running in the VM.
Every system needs improvement.
Second of all, do you seriously think adding fine-grained threading to Smalltalk automatically will cause it to take over the market?
No I don't think that it will "cause" Smalltalk to "automatically take over the market". Of course not! Nor did I imply that or intend to imply that.
I simply think that having all the tools at our disposal is important to maintaining and growing market share.
Ironically, the fact of the matter is: the languages that make threading *simpler for implementers* are going to be the ones that win in the apparently coming multi-core world.
"Simpler to implement" concurrency leads to software systems that are just as difficult to manage as more complex, well-thought-out concurrency. In fact, I think that making it simplistic will lead many programmers to implement software that is impossible to debug without enormous effort. The problem is that even the simplest concurrency leads to the nastiest and most complex bugs in software.
For example what could be simpler than Smalltalk's forking of blocks of code? It sure seems simple doesn't it; just a simple message to a block of code: "[ ... ] fork". In many cases it is simple and no harm is done as the code will execute and everything will be good and consistent afterwards. The problems begin when you "fork" code that wasn't designed to be forked and run concurrently. Then all hell can break loose and boom your program crashes unexpectedly with a mystery. It gets even worse when it only happens occasionally - try figuring it out then.
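To make the danger concrete, here is a small sketch (my own, not from the thread; the exact outcome depends on the scheduler and may not reproduce on a cooperatively scheduled green-thread VM) of two forked Smalltalk processes mutating a shared counter with no synchronization:

```smalltalk
"Two forked processes each increment a shared counter. The increment is a
 read-modify-write with no protection, so if the scheduler preempts one
 process between the read and the write, updates can be silently lost."
| counter done |
counter := 0.
done := Semaphore new.
2 timesRepeat: [
    [ 100000 timesRepeat: [ counter := counter + 1 ].
      done signal ] fork ].
2 timesRepeat: [ done wait ].
Transcript show: counter printString; cr.
"Anything less than 200000 means interleaving lost some increments."
```

The fix would be a `Semaphore forMutualExclusion` around the increment, which is exactly the kind of discipline the forked code has to have been designed for.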
I was just working on fixing a "fork"-happy approach in a major Smalltalk production application. In the end we took out many of the forking statements or fixed them in other ways. That application was running in a Smalltalk with ONE native thread for all the Smalltalk processes (aka green lightweight threads). So much for simple threading being easier.
Part of the problem was the programmers being "fork" happy. Part of the problem is that the Smalltalk class library isn't designed to be thread safe. Very few class libraries are. Most of the problem is that the concurrency simply wasn't thought out well.
I don't see how you can have a simple concurrency threading model solve the problems of when and how to use concurrency properly to avoid the many pitfalls of threading. If you can see that please illuminate it for the rest of us.
Just ask Tim Sweeny.
Do you mean Tim Sweeney the game developer? http://en.wikipedia.org/wiki/Tim_Sweeney_(game_developer)
Alright even though I don't know Tim I'll take the bait and see where it goes, Tim Sweeney (or Sweeny) what do you think? (If someone who knows him would be kind enough to pass this thread on to him or post his thoughts on this topic that would be great - thanks).
For example, when building hard core operating systems.
If you want to build a hard-core operating system in Smalltalk you have other, more pressing issues to deal with than how threading is accomplished.
Yes, there are many issues in implementing an operating system in a language such as Smalltalk. In exploring these issues ZokuScript was born. Real native processes with protected memory spaces and multiple threads are just one of these important issues. Performance is another.
Threading including native threading on one core or N cores (where N can be large) under existing operating systems is very important to the future of Smalltalk.
I really don't see what it is you think you lose by not having this old, outdated fine-grained threading model.
For clarity purposes, please define in detail what you mean when you use the phrase "fine-grained threading model" so that we can make sure that we are on the same page.
There are many paths. I'm excited about the path that you are forging. All I ask is that you don't make that the only path to travel for people using Smalltalk.
Well, at the moment I'm forging nothing, only stating what I know of the situation. At some later point I do intend to look at what's required to make this happen in Squeak, but I have some other more pressing issues for the present.
While I support Smalltalk inventing the future, keeping it from supporting valid concurrency techniques is ignoring the future (and the past) of what works!
We have very different definitions for "works". Here you are using it the same way someone would use for <insert crappy programming language>. It works in the same way you can paint a house with a tooth brush.
You seem to think that there is some magical breakthrough in the world of concurrency that is on par with the magic of automatic garbage collection. I'd sure love to know what that is and how it avoids the pitfalls with even simple concurrency models and issues that occur in real world projects. If you could, please describe in full detail and completely with real world examples. Thanks very much.
All the best,
Peter William Lount peter@smalltalk.org Smalltalk.org Editor
I've not yet seen any serious discussion of the case for your point of view which bridges the gap of complexity in concurrency as automatic memory management magically does. Please illuminate us with specific and complete details of your proposal for such a breakthrough in concurrency complexity.
Peter, Jason is not saying that eliminating shared memory will make concurrent programming as easy as automatic memory management. What he said is that, just like a system that mostly uses automatic memory management might use manual memory management in a few places, so a system that mostly uses message passing for concurrency might use threads and semaphores in a few places.
Making the Squeak VM fully multi-threaded (natively) is going to be a lot of pain and hard to get right. Just ask the Java VM team.
Then either the hard work needs to be done, or the VM needs to be completely rethought.
What Jason said was that, for any VM design, making the VM fully multi-threaded is hard. It has nothing to do with Squeak or with the Squeak VM.
The pay back of adding this obsolete (except in the lowest level cases) method of dealing with threading just isn't going to be worth the pain to implement it.
What are you going on about? What techniques are you saying are obsolete exactly? How are they obsolete?
He is saying that shared memory parallel programming is obsolete. It doesn't scale. By the time we get to thousands of processors (which is only a decade away) it won't work at all. Experience shows that it doesn't work very well now even when hardware can support it, because it is just too hard to make correct programs using that model.
Jason's point, which I agree with, is that programming with threads in shared memory and using semaphores (or monitors, or critical sections) to eliminate interference is a bad idea. Parallel programming with no shared memory, i.e. by having processes communicate only by message passing, is much easier to program.
-Ralph
Hi Ralph,
It's good to converse again with you. It's been many years.
I've not yet seen any serious discussion of the case for your point of view which bridges the gap of complexity in concurrency as automatic memory management magically does. Please illuminate us with specific and complete details of your proposal for such a breakthrough in concurrency complexity.
Peter, Jason is not saying that eliminating shared memory will make concurrent programming as easy as automatic memory management.
That's good, for that makes no sense given real-world experience with systems that don't use shared memory as a basis for their concurrency control.
What he said is that, just like a system that mostly uses automatic memory management might use manual memory management in a few places, so a system that mostly uses message passing for concurrency might use threads and semaphores in a few places.
Ok, that sounds nice and rosy, but so far that's all it is. Can someone please explain in full detail and completely how it would actually work? Thanks.
Making the Squeak VM fully multi-threaded (natively) is going to be a lot of pain and hard to get right. Just ask the Java VM team.
Then either the hard work needs to be done, or the VM needs to be completely rethought.
What Jason said was that, for any VM design, making the VM fully multi-threaded is hard. It has nothing to do with Squeak or with the Squeak VM.
Yes, I'm clear about that. That's why I said that the hard work needs to be done by those of us who are knowledgeable and more experienced than the typical programmer using Smalltalk. We are supposed to be systems people aren't we? We are supposed to do the hard work so that others have an easier time aren't we? Of course if we can avoid the hard work then I'm all for it. However when it comes to concurrency control and program consistency it just isn't possible to avoid the hard work.
The pay back of adding this obsolete (except in the lowest level cases) method of dealing with threading just isn't going to be worth the pain to implement it.
What are you going on about? What techniques are you saying are obsolete exactly? How are they obsolete?
He is saying that shared memory parallel programming is obsolete. It doesn't scale. By the time we get to thousands of processors (which is only a decade away) it won't work at all. Experience shows that it doesn't work very well now even when hardware can support it, because it is just too hard to make correct programs using that model.
So, two processes that share a chunk of RAM memory across their protected memory spaces is obsolete in your view?
What about two or N light weight threads (aka Smalltalk processes) in one memory space sharing objects in that single memory space? Is that obsolete as well?
Jason's point, which I agree with, is that programming with threads in shared memory and using semaphores (or monitors, or critical sections) to eliminate interference is a bad idea.
Ok. I get that it's complex and that there are scaling issues with some of the techniques when N-core is very large. I don't see how it's a bad idea though - I don't see how it's any worse than the alternative that's being suggested.
Parallel programming with no shared memory, i.e. by having processes communicate only by message passing, is much easier to program.
So you mean one thread per protected memory space? No light weight threads (since they are using shared memory by definition)? Not more than one Smalltalk process per protected memory space? Just one thread of execution for each operating system process/task? So if I want one hundred Smalltalk processes running in my application I will need one hundred operating system processes?
All objects are to be copied across the memory space boundaries via serialized objects or via references (for copying later or for return messages back to the originating node later)?
No "active" object running in its own operating system process can respond to more than one inbound message at once? Since it only has one thread/Smalltalk process to avoid shared memory it must complete all the work that the current message send caused. What about deadlock avoidance in your model?
Maybe I'm misunderstanding your definitions but it seems to me that that is what is implied by what you are saying.
To ensure clarity on this complex topic please provide definitions and full explanations with examples. Please be very detailed. Thanks very much.
All the best,
Peter
Hi Ralph,
Continued.
Parallel programming with no shared memory, i.e. by having processes communicate only by message passing, is much easier to program.
Another way to express what I'm saying is the following. You can run multiple processes in either separate or one memory space. When running in one memory space the following applies.
Sure you could have N threads - native or green - each running ONE (Squeak) Smalltalk process running in one protected memory space as long as NONE of these threads share ANY memory between them. That means no objects being shared. That means each Smalltalk process is really its own image running independently with only one user level Smalltalk process. Any communication between them with objects is serialized and transmitted in some manner that - oh dear - avoids using shared memory or a shared communications "buffer" between two or more threads. This is done even though sending messages via shared memory buffers in one protected memory space is very efficient, or sending messages via a shared memory space when more than one protected memory space is in use is also very efficient.
It seems to me that as soon as you have more than one Smalltalk process running in a Smalltalk image you will have concurrency control issues - unless the forked-off threads are trivial in nature. This fact is the reason that I don't share your optimism about your solution - for if you have more than one thread accessing the same memory space - regardless of whether they are in the same protected memory space or across operating system processes - you've got the concurrency complexity and program data consistency issues. How do you avoid that?
Please educate me so that I fully understand your proposed solution. Thank you.
All the best,
peter
I'm sure I'm going to regret saying anything in this thread but what the hell...
Sure you could have N threads - native or green - each running ONE (Squeak) Smalltalk process running in one protected memory space as long as NONE of these threads share ANY memory between them.
I don't think that was actually specified but yes that would be one way of doing it.
That means no objects being shared.
No it doesn't. Objects can be shared via communication; that's what messages are for. It only happens that the Smalltalk systems we're mostly used to share by sharing memory.
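In today's Squeak, Tim's "share via communication" point can already be approximated with SharedQueue: processes rendezvous on the queue and pass copies instead of sharing mutable objects. A minimal sketch of my own (not from the thread):

```smalltalk
"Share-nothing style within one image: the queue is the only rendezvous
 point, and the payload is a copy, so neither process can mutate the
 other's state behind its back."
| queue |
queue := SharedQueue new.
[ | msg |
  msg := queue next.                      "blocks until something arrives"
  Transcript show: 'received: ', msg printString; cr ] fork.
queue nextPut: #(1 2 3) copy.             "send a copy, not the original"
```

The discipline of copying is by convention here, which is exactly the gap between this sketch and a language like Erlang where the VM enforces it.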
That means each Smalltalk process is really its own image running independently with only one user level Smalltalk process.
Works for me. Like an ecology of cells. Now where did I hear that analogy before? Oh, yes, I think it might have been either Alan Kay's doctoral thesis or an early talk on the idea of objects.
Any communication between them with objects is serialized and transmitted in some manner that - oh dear - avoids using shared memory or a shared communications "buffer" between two or more threads.
Transputer.
This is done even though sending messages via shared memory buffers in one protected memory space is very efficient or sending messages via a shared memory space when more than one protected memory space is in use is also very efficient.
Only applies in the limited sphere of the typical single processor systems we have got used to over the last few years. And for the purposes of any discussion about real parallelism, current 2/4/8 core systems are really a poor quality patch on the single cpu idea.
A key idea that people are going to have to get used to is, oddly enough, just like the one they had to get used to in order to accept late binding and dynamic languages. That is, to paraphrase an old quotation from Dan Ingalls (I'm pretty certain): "we've got past the stage of worrying about the number of computational cycles; we need to start worrying about the quality". Nominal inefficiency in having message passing across processes done by something 'slower' than shared memory may be a key to allowing massively spread computation. Trading the 'efficiency' of hand-coded assembler for higher-level languages made it more practical to build bigger programs that could do more. Trading the 'efficiency' of C for a decent late-bound language allows more conceptually complex problems to be tackled. Trading the 'efficiency' of shared memory as a medium for sharing information with some other transmission method may be the lever to unlock really complex systems.
I think it's time people read up a bit on some computational history. Quite a bit of this stuff was worked on in the old days before the x86 started to dominate the world with a saurian single-core brutishness. Learn about Occam for example.
And Peter, before I "explain in full detail and completely how it would actually work" how about you explain in full detail and completely how you're going to fund my research? ;-)
tim PS another 'randomly chosen' sigline that manages to be eerily appropriate -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Try not to let implementation details sneak into design documents.
tim Rowledge wrote:
I'm sure I'm going to regret saying anything in this thread but what the hell...
;--)
Sure you could have N threads - native or green - each running ONE (Squeak) Smalltalk process running in one protected memory space as long as NONE of these threads share ANY memory between them.
I don't think that was actually specified but yes that would be one way of doing it.
That is essentially the Erlang way, isn't it?
That means no objects being shared.
No it doesn't. Objects can be shared via communication; that's what messages are for. It only happens that the Smalltalk systems we're mostly used to share by sharing memory.
I meant what you said. ;--)
That means each Smalltalk process is really its own image running independently with only one user level Smalltalk process.
Works for me. Like an ecology of cells. Now where did I hear that analogy before? Oh, yes, I think it might have been either Alan Kay's doctoral thesis or an early talk on the idea of objects.
It would be good to turn up a reference to that. Alan? Is your doctoral thesis online? If not could it be?
Any communication between them with objects is serialized and transmitted in some manner that - oh dear - avoids using shared memory or a shared communications "buffer" between two or more threads.
Transputer.
Yeah, those were sweet. The Tile64 seems similar in many ways.
This is done even though sending messages via shared memory buffers in one protected memory space is very efficient or sending messages via a shared memory space when more than one protected memory space is in use is also very efficient.
Only applies in the limited sphere of the typical single processor systems we have got used to over the last few years. And for the purposes of any discussion about real parallelism, current 2/4/8 core systems are really a poor quality patch on the single cpu idea.
Well, not really. Regardless of that, using a shared memory buffer is an efficient transport for a serialized object message (with or without complete objects, or just references, or a mix of the two) in the case of a system with N cores and shared memory between protected memory spaces. When the images are on separate server nodes then using TCP/IP is needed. Using TCP/IP locally on one server box may or may not make sense and would likely be slower than a direct shared memory buffer, although it's possible that the OS might optimize for that case.
A key idea that people are going to have to get used to is, oddly enough, just like the one they had to get used to in order to accept late binding and dynamic languages. That is, to paraphrase an old quotation from Dan Ingalls (I'm pretty certain): "we've got past the stage of worrying about the number of computational cycles; we need to start worrying about the quality". Nominal inefficiency in having message passing across processes done by something 'slower' than shared memory may be a key to allowing massively spread computation. Trading the 'efficiency' of hand-coded assembler for higher-level languages made it more practical to build bigger programs that could do more. Trading the 'efficiency' of C for a decent late-bound language allows more conceptually complex problems to be tackled. Trading the 'efficiency' of shared memory as a medium for sharing information with some other transmission method may be the lever to unlock really complex systems.
I'm all for wisdom in computing, if I wasn't I'd not be using Smalltalk!
I think it's time people read up a bit on some computational history. Quite a bit of this stuff was worked on in the old days before the x86 started to dominate the world with a saurian single-core brutishness. Learn about Occam for example.
And Peter, before I "explain in full detail and completely how it would actually work" how about you explain in full detail and completely how you're going to fund my research? ;-)
;--) That was my next line for you!!! ;--)
Seriously, if I could I would.
Ok, so if you really are talking about a "strict" Erlang style model with ONE Smalltalk process per "image" space (whether or not they are in one protected memory space or many protected memory spaces) where objects are not shared with any other threads except by copying them over the "serialization wire" or by "reference" then I get what you are talking about.
However, you'll still end up with concurrency control issues and you've got an object version explosion problem occurring as well. How will you control concurrency problems with your simplified system? Is there a succinct description of the way that Erlang does it? Would that apply to Smalltalk?
If on the other hand, you are allowing more than one process per "image" then you don't gain anything at all since anytime you have more than one thread on a memory space you have all the concurrency problems that shared memory between operating system process give you.
Your simplified concurrency system also dramatically alters the Smalltalk paradigm.
What have I missed or left out from what is being proposed or worked on?
Is this the approach that Cincom is using in their VisualWorks system? They seem not to be embracing the notion of native threads. However it's also unlikely that they are embracing the notion of only ONE Smalltalk process per image either. At least in the sense of preventing a user from forking off additional Smalltalk processes in the same memory space. To truly support the above simplified model, "forking" of blocks of code will need to slice off a copy of the entire image (someone did that for Squeak a while back). Then you have all the problems with object versions. Sigh.
One shoe doesn't fit all solutions.
All the best,
Peter
On 10/21/07, Peter William Lount peter@smalltalk.org wrote:
tim Rowledge wrote:
Ok, so if you really are talking about a "strict" Erlang style model with ONE Smalltalk process per "image" space (whether or not they are in one protected memory space or many protected memory spaces) where objects are not shared with any other threads except by copying them over the "serialization wire" or by "reference" then I get what you are talking about.
That is a strange way of putting it. The fact is, Erlang has many processes per image. Many more than you could ever get as real processes or native threads (as a test I made a little program that spawned 64 *thousand* threads and passed messages between them on my laptop).
But with their model, process creation is extremely cheap. And since there is no sharing as far as the language is concerned, there is no need for locking to slow everything down.
Smalltalk can do this too. I think it needs a little work still, but I'm optimistic about what can be done here.
However, you'll still end up with concurrency control issues and you've got an object version explosion problem occurring as well. How will you control concurrency problems with your simplified system? Is there a succinct description of the way that Erlang does it? Would that apply to Smalltalk?
Much like how Smalltalk does it, as it turns out. That is, you don't have a version problem so much as you have "old" and "new". So when ready you send the "upgrade" message to the system and all new calls to the main functions of a process will be the new version. All currently running code will access the old code until its completion, and all new code runs in the new space.
Your simplified concurrency system also dramatically alters the Smalltalk paradigm.
The current paradigm is fine-grained locked/shared state. In my opinion, and the opinion of many (probably most, in fact, outside of the Java community) people who are more expert in this area than you or I, we *have* to move away from this paradigm.
Is this the approach that Cincom is using in their VisualWorks system? They seem not to be embracing the notion of native threads.
Thank God. :)
However it's also unlikely that they are embracing the notion of only ONE Smalltalk process per image either.
If I understand you correctly, then I would suggest not using the word "image" as this is confusing. Another way to put it would be "each process has its own view of the world". And honestly, what is the problem you see with this?
Right now, if you run two separate images with only one thread or process, then you have two processes that each have their own set of objects in their own space interacting with each other.
Now we add a way for one image to send a message *between* images. Perhaps the VM can detect when we are trying to do this, but instead of complicating the default Smalltalk message sending subsystem, let's make it explicit with some special binary message:
Processes at: 'value computer' ! computeValue.
Now we have the ability to send messages locally within a process, and a way of freely sending between processes. No locking and the problems associated with locking.
So, now what is stopping us from moving this separate process *inside the same image*? If you fork a process and he starts making objects, no other processes have references to those objects. No shared state issue there. This part could work right now today with no changes to the VM.
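Jason's "works today" claim can be illustrated: objects created inside a forked block are reachable only through that process's own temporaries until it explicitly hands them over. A hedged sketch of my own (the hand-over channel is the only shared object):

```smalltalk
"The worker builds its objects privately; nothing outside the block can
 reach 'local'. The finished result crosses over only via the queue."
| results |
results := SharedQueue new.               "explicit hand-over channel"
[ | local |
  local := OrderedCollection new.         "no other process holds a reference"
  1 to: 5 do: [:i | local add: i * i].
  results nextPut: local asArray ] fork.
Transcript show: results next printString; cr.   "blocks until the worker finishes"
```

Note the worker sends a fresh Array rather than the collection it keeps mutating, which is the copy-on-hand-over discipline the share-nothing model depends on.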
The only issue I can think of are globals, the most obvious being class side variables. Note that even classes themselves are not an issue because without class side variables, they are effect free (well, obviously basicNew would have to be looked at).
But I think this issue is solvable. The VM could take a "copy on write" approach on classes/globals. That is, a class should be side effect free (to itself, i.e. it's the same after every call), so let all processes share the memory space where meta-class objects live. But as soon as any process tries to modify the class in some way (literally, it would be the class modifying itself), he gets his own copy. Processes must not see changes made by other processes, so a modification to a global class is a "local only" change.
Of course the only big thing left would be; what happens when we add a new class. But Erlang has had success with the old/new space approach, and what Smalltalk has now is very similar.
On 21/10/2007, Jason Johnson jason.johnson.081@gmail.com wrote:
On 10/21/07, Peter William Lount peter@smalltalk.org wrote:
tim Rowledge wrote:
Ok, so if you really are talking about a "strict" Erlang style model with ONE Smalltalk process per "image" space (whether or not they are in one protected memory space or many protected memory spaces) where objects are not shared with any other threads except by copying them over the "serialization wire" or by "reference" then I get what you are talking about.
That is a strange way of putting it. The fact is, Erlang has many processes per image. Many more than you could ever get as real processes or native threads (as a test I made a little program that spawned 64 *thousand* threads and passed messages between them on my laptop).
But with their model, process creation is extremely cheap. And since there is no sharing as far as the language is concerned, there is no need for locking to slow everything down.
Smalltalk can do this too. I think it needs a little work still, but I'm optimistic about what can be done here.
However, you'll still end up with concurrency control issues and you've got an object version explosion problem occurring as well. How will you control concurrency problems with your simplified system? Is there a succinct description of the way that Erlang does it? Would that apply to Smalltalk?
Much like how Smalltalk does it, as it turns out. That is, you don't have a version problem so much as you have "old" and "new". So when ready you send the "upgrade" message to the system and all new calls to the main functions of a process will be the new version. All currently running code will access the old code until its completion, and all new code runs in the new space.
Your simplified concurrency system also dramatically alters the Smalltalk paradigm.
The current paradigm is fine-grained locked/shared state. In my opinion, and the opinion of many (probably most, in fact, outside of the Java community) people who are more expert in this area than you or I, we *have* to move away from this paradigm.
Is this the approach that Cincom is using in their VisualWorks system? They seem not to be embracing the notion of native threads.
Thank God. :)
However it's also unlikely that they are embracing the notion of only ONE Smalltalk process per image either.
If I understand you correctly, then I would suggest not using the word "image" as this is confusing. Another way to put it would be "each process has its own view of the world". And honestly, what is the problem you see with this?
Right now, if you run two separate images with only one thread or process, then you have two processes that each have their own set of objects in their own space interacting with each other.
Now we add a way for one image to send a message *between* images. Perhaps the VM can detect when we are trying to do this, but instead of complicating the default Smalltalk message sending subsystem, let's make it explicit with some special binary message:
Processes at: 'value computer' ! computeValue.
Now we have the ability to send messages locally within a process, and a way of freely sending between processes. No locking, and none of the problems associated with locking.
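To make the shape of this concrete, here is a hedged sketch in Python (threads stand in for images, and the selector protocol is purely illustrative): the worker owns its objects, and the only way to reach them is an explicit send through its inbox, much like `Processes at: 'value computer' ! computeValue`.

```python
from queue import Queue
from threading import Thread

def value_computer(inbox, outbox):
    # The worker owns its state; the only way in or out is a message.
    while True:
        selector, arg = inbox.get()
        if selector == 'computeValue':
            outbox.put(arg * arg)          # local computation, nothing shared
        elif selector == 'shutDown':
            return

inbox, outbox = Queue(), Queue()
Thread(target=value_computer, args=(inbox, outbox), daemon=True).start()

inbox.put(('computeValue', 7))   # the explicit inter-"image" send
result = outbox.get()            # reply arrives as another message
inbox.put(('shutDown', None))
```

No object reference ever crosses the boundary; only values travel in messages, which is what removes the need for locks in user code.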
So, now what is stopping us from moving this separate process *inside the same image*? If you fork a process and it starts making objects, no other processes have references to those objects. No shared-state issue there. This part could work right now, today, with no changes to the VM.
The only issue I can think of is globals, the most obvious being class-side variables. Note that even classes themselves are not an issue, because without class-side variables they are effect-free (well, obviously basicNew would have to be looked at).
But I think this issue is solvable. The VM could take a "copy on write" approach to classes/globals. That is, a class should be side-effect free (to itself, i.e. it's the same after every call), so let all processes share the memory space where metaclass objects live. But as soon as any process tries to modify a class in some way (literally, it would be the class modifying itself), it gets its own copy. Processes must not see changes made by other processes, so a modification to a global class is a "local only" change.
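A rough sketch of that copy-on-write behavior, in Python rather than at the VM level (the `ProcessView` class is purely illustrative): every process reads the shared class-side state until it writes, at which point it gets a private copy and the shared original stays untouched.

```python
# Shared "class-side" state, visible to every process until one writes.
shared_classvars = {'Logger': {'level': 'info'}}

class ProcessView:
    """One process's view of the globals: reads are shared, writes are private."""
    def __init__(self, shared):
        self._shared = shared
        self._local = {}                       # populated only on first write

    def read(self, name):
        return self._local.get(name, self._shared[name])

    def write(self, name, changes):
        if name not in self._local:            # copy on first write
            self._local[name] = dict(self._shared[name])
        self._local[name].update(changes)

a, b = ProcessView(shared_classvars), ProcessView(shared_classvars)
a.write('Logger', {'level': 'debug'})
a.read('Logger')    # sees its own local change
b.read('Logger')    # still sees the shared original
```

The real thing would have to live in the VM's object memory, but the invariant is the same: no process ever observes another process's modification of a global.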
I don't think this makes sense. Classes should be treated as regular objects (everything is an object - remember?), and because of that you can't privilege one kind of object over another. Class objects and instance objects are the same: objects holding mutable state. A method dictionary is mutable state of a class, so changing it should affect all objects of the given class immediately.
Of course the only big thing left is: what happens when we add a new class? But Erlang has had success with the old/new space approach, and what Smalltalk has now is very similar.
And by the way, I'm starting to doubt the 'no sharing at all' paradigm. In one of the links you gave about Erlang, one of the readers commented (http://armstrongonsoftware.blogspot.com/2006/08/concurrency-is-easy.html):
--- Anonymous said... One of your premises is obviously wrong:
We don't have shared memory. I have my memory, you have yours, we have two brains, one each, they are not joined together.
The premise that the above statement is based on - is that we are not internally concurrent. The earlier statements in your blog entry say that we are. The actuality is that all of our individual (i.e. internal) processes are concurrent and share memory. Even including our main conscious "thinking" process.
It is our interaction between individuals that doesn't always share memory - and even here we do share memory in various ways; there are many examples (diaries, libraries, system processes, etc). ---
There is obvious evidence of a system which performs in parallel over a single shared state - the human brain. Do we experience heavy 'locking' or 'concurrency' problems? Personally, I don't :) And if not, then why should sharing state be considered evil?
Also, in fact, you can't eliminate shared state at all. You can only delegate shared-state management to an outer context (such as the OS, or the VM, etc.). 'Fine-grained' locking is nothing more than manual control of shared state by the developer. Try implementing Erlang (or any other system which uses parallel computation over shared state) without locks and semaphores. Something is always needed to manage the locking, even if it is hidden from your eyes. So, tell me, what is the difference between controlling it manually and 'magically', like Erlang does?
The proposed message passing between different OS processes (Squeak images) is nothing more than letting the OS deal with the shared-state problems (locks, queuing, etc.), naively thinking that we got rid of shared-state concurrency. Did we really?
On 10/21/07, Igor Stasenko siguctua@gmail.com wrote:
I don't think this makes sense. Classes should be treated as regular objects (everything is an object - remember?), and because of that you can't privilege one kind of object over another.
Ah, but they are the same, I'm not making a difference, I'm making an optimization and not getting caught. ;)
Class objects and instance objects are the same: objects holding mutable state.
Fine, you can have mutable state. But only within your own process. If you change it no other process will see it.
A method dictionary is mutable state of a class, so changing it should affect all objects of the given class immediately.
This is a special case in my envisioned system, just as it's a special case now.
And by the way, I'm starting to doubt the 'no sharing at all' paradigm. In one of the links you gave about Erlang, one of the readers commented (http://armstrongonsoftware.blogspot.com/2006/08/concurrency-is-easy.html):
Ok, so someone presented a counter-point about a meaningless analogy, and that disproves the power of message passing?
You can doubt it if you want. You can work incredibly hard for the next 5, 10 years or however long it takes and add native-thread shared-state concurrency to Squeak. But Erlang will continue to gain market share. Dinosaurs like Java, with their old shared-state model, will get less and less relevant as concurrent software hits the complexity explosion that web software is going through now. And if you're fast, you will make it just in time to charge into the party as the last light goes out.
There is obvious evidence of a system which performs in parallel over a single shared state - the human brain. Do we experience heavy 'locking' or 'concurrency' problems? Personally, I don't :) And if not, then why should sharing state be considered evil?
For the same reason manual memory management is in most cases: It's too hard to get right, it's too hard to reason about, it's impossible to compose with other software made independently.
Now of course we can get back into the watershed of beating ourselves up for being "bad programmers", but after so many failed projects - with success itself meaning simply having few enough critical bugs to be usable - maybe we should admit we're trying to put the shoe on backwards.
Also, in fact, you can't eliminate shared state at all.
I don't think that's been proven at all. In fact there have been OS'es to do exactly that.
You can only delegate shared-state management to an outer context (such as the OS, or the VM, etc.).
But it doesn't have to be shared state. Does Linux have shared state? I don't recall ever having to lock a mutex before reading a disk, or allocating memory, or anything. Of course Linux does have shared resources internally, but if you broke it up into smaller "kernels" that each had the same interface to each other, then perhaps this could be eliminated.
'Fine-grained' locking is nothing more than manual control of shared state by the developer.
Exactly, and manual memory management is nothing more than manual control of memory by the developer.
Try implementing Erlang (or any other system which uses parallel computation over shared state) without locks and semaphores.
That depends on the OS and CPU, not the problem domain.
So, tell me, what is the difference between controlling it manually and 'magically', like Erlang does?
Seriously? Well, let's see: (1) lines of code needed to accomplish a given task, (2) literally man-*years* of work time, (3) stability of the finished product (or perhaps you know of a Java/C++/whatever system that is in heavy use and achieves 9 9's of reliability?), (4) difficulty of solving the bugs that do appear due to concurrency, (5) the ability to solve concurrency issues at *design time* instead of *implementation time* (probably the most important of all of them), ....
The proposed message passing between different OS processes (Squeak images) is nothing more than letting the OS deal with the shared-state problems (locks, queuing, etc.), naively thinking that we got rid of shared-state concurrency. Did we really?
Please reread what you wrote here, because... what you basically said is "doing X is nothing more than letting the OS and/or system deal with the complexity for us and naively thinking it just goes away". Is this not the exact reason we are using Smalltalk instead of C++, C or assembler?
On 21/10/2007, Jason Johnson jason.johnson.081@gmail.com wrote:
Hehe, then all we need is to implement ST in Erlang and we are fine, yes? :) If you noticed, I'm not arguing about whether it's possible to make the ST VM ready for multi-core. I'm really pointing at how efficient it can be (depending on the proposed solutions) and what we can do to make a good VM that competes on speed with other language platforms. If we don't take speed into account, then there's nothing to talk about.
And don't tell me that trying to make ST run as fast as it can is pointless... Such thoughts are only for people who take the car and drive to Wal-Mart to buy a glass of water located 20 feet from their house.
For the same reasons you don't want to use C++/C/assembler for your projects, I don't want them either. But at the same time, I don't like sitting and waiting for other parts of the system (OS/hardware) to become advanced enough to make my code run smoothly. If I don't like how it runs, I try to fix it instead of waiting until someone else does.
As a side note, I don't want to map one ST Process to a single native thread. Such a solution is really 'obsolete' and leads nowhere. I'm talking about a VM which harnesses all the CPUs' power while managing execution/GC seamlessly for the language. The in-language Processes and semaphores must not have any relation to native threads or the number of cores. It's just a language-level abstraction, nothing more. In that case, I can't see how 'manual fine-grained locking' can do any harm. If you need to synchronize access to some object you simply can't avoid that - at any level of abstraction (be it Erlang, the OS, or assembler).
On 10/21/07, Igor Stasenko siguctua@gmail.com wrote:
Hehe, then all we need is to implement ST in Erlang and we are fine, yes? :)
There is always room for improvement. Especially when we start getting more flexibility on the hardware side. But Erlang shows a good way of dealing with concurrency that Smalltalk can learn from.
I'm really pointing at how efficient it can be (depending on the proposed solutions) and what we can do to make a good VM that competes on speed with other language platforms. If we don't take speed into account, then there's nothing to talk about.
And don't tell me that trying to make ST run as fast as it can is pointless... Such thoughts are only for people who take the car and drive to Wal-Mart to buy a glass of water located 20 feet from their house.
Of course I care about speed. I have a few ideas for speeding Squeak up myself. But in this case, don't assume that message passing is fundamentally slower. Keep in mind that in a pure message-passing environment there are no locks to slow everything down, and the system is easier and more likely to be designed to maximize concurrent operations. So I would expect it to be faster than other approaches as the scale gets larger. Yaws vs. Apache is a perfect illustration of this.
As a side note, I don't want to map one ST Process to a single native thread. Such a solution is really 'obsolete' and leads nowhere. I'm talking about a VM which harnesses all the CPUs' power while managing execution/GC seamlessly for the language. The in-language Processes and semaphores must not have any relation to native threads or the number of cores. It's just a language-level abstraction, nothing more. In that case, I can't see how 'manual fine-grained locking' can do any harm.
The harm is that it's simply the wrong abstraction. Having the option there leads people to use it, and gives us a code base using a model that can't scale or compose.
If you need to synchronize access to some object you simply can't avoid that - at any level of abstraction (be it Erlang, the OS, or assembler).
That's the point: you *don't* have synchronized access to any object. All you have is messages. Think of it as an OO view of processes. You can't see what's inside, you can only ask the process to do things on your behalf.
On 21/10/2007, Jason Johnson jason.johnson.081@gmail.com wrote:
As a side note, I don't want to map one ST Process to a single native thread. Such a solution is really 'obsolete' and leads nowhere. I'm talking about a VM which harnesses all the CPUs' power while managing execution/GC seamlessly for the language. The in-language Processes and semaphores must not have any relation to native threads or the number of cores. It's just a language-level abstraction, nothing more. In that case, I can't see how 'manual fine-grained locking' can do any harm.
The harm is that it's simply the wrong abstraction. Having the option there leads people to use it, and gives us a code base using a model that can't scale or compose.
If there's nothing else that can replace this model, then you don't have a choice but to use it.
If you need to synchronize access to some object you simply can't avoid that - at any level of abstraction (be it Erlang, the OS, or assembler).
That's the point: you *don't* have synchronized access to any object. All you have is messages. Think of it as an OO view of processes. You can't see what's inside, you can only ask the process to do things on your behalf.
Again, a question arises: how do you ensure that messages are passed in the correct order, and how do you make sure that messages are delivered? Now let's look inside: to make it work properly, you need to implement a message queue. And a queue means that you must make the 'enqueue' and 'dequeue' operations synchronized. And that's exactly what I mean: even if you hide the concurrency problems from the eyes of the developer, that doesn't mean the problems are gone - now you have to deal with them on your own. If you know another way to build a proper message-passing scheme without a synchronized object (such as a queue), I am all ears.
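Igor's point can be made concrete with a minimal mailbox sketch in Python (the standard `queue.Queue` does essentially the same internally): the "lock-free" message send the developer sees still rests on a lock hidden inside enqueue/dequeue.

```python
import threading
from collections import deque

class Mailbox:
    """A minimal thread-safe message queue; the Condition's lock is the hidden shared state."""
    def __init__(self):
        self._items = deque()
        self._ready = threading.Condition()   # wraps a mutex internally

    def enqueue(self, msg):
        with self._ready:                     # synchronized, exactly as Igor notes
            self._items.append(msg)
            self._ready.notify()

    def dequeue(self):
        with self._ready:                     # same lock on the way out
            while not self._items:
                self._ready.wait()
            return self._items.popleft()

box = Mailbox()
threading.Thread(target=lambda: box.enqueue('computeValue')).start()
msg = box.dequeue()
```

Whether that lock lives in application code, the VM, or the OS scheduler is the real question under debate here; the lock itself does not disappear.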
On Mon, 22/10/2007 at 10.49 +0300, Igor Stasenko wrote:
Again, a question arises: how do you ensure that messages are passed in the correct order, and how do you make sure that messages are delivered? Now let's look inside: to make it work properly, you need to implement a message queue. And a queue means that you must make the 'enqueue' and 'dequeue' operations synchronized. And that's exactly what I mean: even if you hide the concurrency problems from the eyes of the developer, that doesn't mean the problems are gone - now you have to deal with them on your own. If you know another way to build a proper message-passing scheme without a synchronized object (such as a queue), I am all ears.
The Erlang way: don't care about the order of arrival of the messages, and let the developer care about that when it's important.
Giovanni
On 22/10/2007, Giovanni Corriga giovanni@corriga.net wrote:
The Erlang way: don't care about the order of arrival of the messages, and let the developer care about that when it's important.
Yes, a simple example where I need the correct order: Collection>>do:
To print an array I want all items ordered from start to end, not in random order.
And of course there are cases when I don't need items iterated in a specific order - when I simply need to visit all items in a collection to send a message to them.
So we need at least 2 messages to reflect the different behaviours: #do: and #orderedDo:
and that's only the simplest case...
The Erlang way: don't care about the order of arrival of the messages, and let the developer care about that when it's important.
Yes, a simple example where I need the correct order: Collection>>do:
To print an array I want all items ordered from start to end, not in random order.
And of course there are cases when I don't need items iterated in a specific order - when I simply need to visit all items in a collection to send a message to them.
So we need at least 2 messages to reflect the different behaviours: #do: and #orderedDo:
and that's only the simplest case...
Giovanni
Are you sure, Igor? Why would a developer use an OrderedCollection if he/she doesn't care about order? I think it's more proper to use a Set or a Bag, or even to perform something on the elements of that ordered collection in an unordered way, instead of (pre)assuming how #do: implements the traversal.
Perhaps you can find another counterexample.
Cheers,
Sebastian
On 22/10/2007, Sebastian Sastre ssastre@seaswork.com wrote:
Are you sure, Igor? Why would a developer use an OrderedCollection if he/she doesn't care about order? I think it's more proper to use a Set or a Bag, or even to perform something on the elements of that ordered collection in an unordered way, instead of (pre)assuming how #do: implements the traversal.
Perhaps you can find another counterexample.
Well, it's maybe not a proper example; I just wanted to show that we will need changes to the codebase (not only the VM) to better support parallelism.
Perhaps you can find another counterexample.
Well, it's maybe not a proper example; I just wanted to show that we will need changes to the codebase (not only the VM) to better support parallelism.
I see your point. Would it be possible to make a rough early estimation of how big the codebase earthquake will be? If we define the essential parts needed to run the system as what must survive, do you think surviving is feasible?
Cheers,
Sebastian PS: Survivors will be stronger for sure ;-)
On 10/22/07, Igor Stasenko siguctua@gmail.com wrote:
Well, it's maybe not a proper example; I just wanted to show that we will need changes to the codebase (not only the VM) to better support parallelism.
Well, personally I'm not trying to add transparent parallelism (Erlang doesn't try this either). I want inter-process communication to be completely explicit, just easy. I hadn't planned on adding anything to the core libraries; a parallelDo: wouldn't know where to send the work anyway if you don't tell it.
On 10/22/07, Igor Stasenko siguctua@gmail.com wrote:
Yes, a simple example where I need the correct order: Collection>>do:
Well, first of all, do you want #do: to be parallel? I would personally prefer to have a #parallelDo: for that. Second, the point of having multiple processes is that we can run these things in parallel in different threads and different processes. How do you suggest controlling execution order in that scenario?
If you need work done in parallel and then need the results sorted back into the original order, you have to do it in multiple steps, e.g.:
1) collect the collection of "things" into a collection of associations that have the "thing" as key and its position as value, 2) run the parallelDo: on this new collection, 3) take the results and collect them back into the correct order using the values
or you could do something like this: http://bc.tech.coop/blog/070520.html
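The three steps above read like this in Python (`parallel_collect` is a hypothetical name, and a thread pool stands in for the worker processes; this is a sketch of the recipe, not anyone's actual library):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def parallel_collect(items, fn, workers=4):
    tagged = list(enumerate(items))        # 1) key each thing by its position
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(lambda pair: (pair[0], fn(pair[1])), p)
                   for p in tagged]
        # 2) the parallel "do": results come back in completion order
        results = [f.result() for f in as_completed(futures)]
    results.sort(key=lambda pair: pair[0])  # 3) restore the original order
    return [value for _, value in results]

parallel_collect([1, 2, 3, 4], lambda x: x * x)
```

Only the final sort cares about order; the workers themselves are free to finish in any order, which is exactly the separation Jason is describing.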
So we need at least 2 messages to reflect the different behaviours: #do: and #orderedDo:
and that's only the simplest case...
How do you envision this working? It's no better than shared-state/fine-grained locking unless you are modifying the collection in place, which #do: and co. are not doing now.
That's the point: you *don't* have synchronized access to any object. All you have is messages. Think of it as an OO view of processes. You can't see what's inside, you can only ask the process to do things on your behalf.
Again, a question arises: how do you ensure that messages are passed in the correct order, and how do you make sure that messages are delivered? Now let's look inside: to make it work properly, you need to implement a message queue. And a queue means that you must make the 'enqueue' and 'dequeue' operations synchronized. And that's exactly what I mean: even if you hide the concurrency problems from the eyes of the developer, that doesn't mean the problems are gone - now you have to deal with them on your own. If you know another way to build a proper message-passing scheme without a synchronized object (such as a queue), I am all ears.
-- Best regards, Igor Stasenko AKA sig.
1. For the correct order: I understand that Erlang is open source so, to some point, nothing stops us from looking at how Erlang's VM does message passing in the correct order, right? It seems that somehow they solved that question, and we can probably study how to assimilate that virtue. 2. For ensuring message sends: "send and pray".
That way, when a Smalltalk 'erlangized' message send is in a process that terminates, it should end with some cause for the termination. Maybe this would allow us, for instance, to implement DNU: the VM doesn't find a proper method in the object to receive the message, so it terminates the process stating that as the cause.
Cheers,
Sebastian
On 22/10/2007, Sebastian Sastre ssastre@seaswork.com wrote:
- For the correct order: I understand that Erlang is open source so, to some point, nothing stops us from looking at how Erlang's VM does message passing in the correct order, right? It seems that somehow they solved that question, and we can probably study how to assimilate that virtue.
While reading this topic, I googled to see what solutions exist in this area for non-locking queues. No wonder (still) - they are all based on atomic CAS (compare-and-swap) processor instructions. It is of course interesting how Erlang manages message passing, but I doubt it is based on anything much different.
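For illustration, here is the shape of a CAS-based push. Python exposes no CAS instruction, so `compare_and_swap` below *simulates* the atomic compare-then-store with a lock; a real lock-free queue or stack gets that atomicity from the hardware, which is exactly Igor's point about where the synchronization ends up.

```python
import threading

_cas_guard = threading.Lock()   # stands in for hardware atomicity only

class Cell:
    def __init__(self, value):
        self.value = value

def compare_and_swap(cell, expected, new):
    """One indivisible compare-then-store, as the CPU instruction provides."""
    with _cas_guard:
        if cell.value is expected:
            cell.value = new
            return True
        return False

# Treiber-style lock-free stack push: read the head, build the new node,
# and retry until our CAS wins against any concurrent pushers.
head = Cell(None)

def push(item):
    while True:
        old = head.value
        node = (item, old)
        if compare_and_swap(head, old, node):
            return
```

The retry loop is where the "hidden" synchronization lives: no mutex is held across the push, but the atomic instruction is still doing the arbitration.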
- For ensuring message sends: "send and pray"
That way, when a Smalltalk 'erlangized' message send is in a process that terminates, it should end with some cause for the termination. Maybe this would allow us, for instance, to implement DNU: the VM doesn't find a proper method in the object to receive the message, so it terminates the process stating that as the cause.
An 'erlangization' of sends means we need to deal with contexts differently. I think the best way is to rethink a context to make it look closer to what a process is in Erlang. Yes, we must pay the price of making all contexts real objects for each message send, so we might expect a real slow-down of single-threaded execution. Then the only way to regain this loss is to use highly parallelizable algorithms.
Cheers,
Sebastian
- For the correct order: I understand that Erlang is open source so, to some point, nothing stops us from looking at how Erlang's VM does message passing in the correct order, right? It seems that somehow they solved that question, and we can probably study how to assimilate that virtue.
While reading this topic, I googled to see what solutions exist in this area for non-locking queues. No wonder (still) - they are all based on atomic CAS (compare-and-swap) processor instructions. It is of course interesting how Erlang manages message passing, but I doubt it is based on anything much different.
But I think that until we have async CPUs, it will always have to be implemented somewhat like that. I see it as a hardware limitation, not as a problem per se. My point is, as you say, that even so it's very interesting how they managed to take advantage of it, build such a good message-passing machine, and make the wonder of it assimilable into the objects paradigm.
- For ensuring message sends: "send and pray"
That way, when a Smalltalk 'erlangized' message send is in a process that terminates, it should end with some cause for the termination. Maybe this would allow us, for instance, to implement DNU: the VM doesn't find a proper method in the object to receive the message, so it terminates the process stating that as the cause.
An 'erlangenization' of sends means that we need to deal differently with contexts. I think the best way is to rethink contexts to make them look closer to what a process is in Erlang. Yes, we must pay the price of making all contexts real objects for each message send, so we might expect a real slowdown of single-threaded execution. Then the only way we could regain this loss is to use highly parallelisable algorithms.
Cheers,
Sebastian
-- Best regards, Igor Stasenko AKA sig.
I see that consequence, but we are forced to think big by healthy trends. Today's systems are kind of optimal for single-core CPUs because they were not designed for multicore, and adapting to multicore CPUs means trading away that single-process optimization. But that is a worry for what, one or two years? That is a very short time to worry about. The future seems to have everything in favor of parallelization. This is all about that: hundreds of cores, maybe higher orders of magnitude.
Cheers,
Sebastian
On Mon, 22 Oct 2007 14:30:28 +0200, Igor Stasenko wrote:
On 22/10/2007, Sebastian Sastre wrote:
That's the point: you *don't* have synchronized access to any object. All you have is messages. Think of it as an OO view of processes. You can't see what's inside; you can only ask the process to do things on your behalf.
Again, a question is raised: how do you ensure that messages are passed in the correct order, and make sure that messages are delivered? Now let's look inside: to make it work properly, you need to implement a message queue. And a queue means that you must make the 'enqueue' and 'dequeue' operations synchronized. And that's exactly what I mean: even if you hide the concurrency problems from the eyes of the developer, it doesn't mean the problems are gone; now you have to deal with them yourself. If you know another way to build a proper message-passing scheme without using a synchronized object (such as a queue), I am all ears.
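A rough sketch of the mailbox idea under discussion, in Python for illustration (the names `Mailbox`, `send`, `receive` are invented for this example, not Squeak or Erlang API). The synchronization Igor points at does exist, but it lives in exactly one place, inside the mailbox, hidden from user code; here `queue.Queue` is internally locked:

```python
import queue, threading

class Mailbox:
    """A per-process message queue. Enqueue/dequeue synchronization
    is real but confined here, out of sight of user code."""
    def __init__(self):
        self._q = queue.Queue()   # internally synchronized

    def send(self, msg):
        self._q.put(msg)          # asynchronous: does not wait for the receiver

    def receive(self, timeout=None):
        return self._q.get(timeout=timeout)   # blocks until a message arrives
```

Note what this does and does not guarantee: between one sender and one receiver, FIFO order is preserved (which is essentially the ordering guarantee Erlang gives per sender/receiver pair); across several concurrent senders, no global order is promised.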
-- Best regards, Igor Stasenko AKA sig.
An 'erlangenization' of sends means that we need to deal differently with contexts. I think the best way is to rethink contexts to make them look closer to what a process is in Erlang. Yes, we must pay the price of making all contexts real objects for each message send, so we might expect a real slowdown of single-threaded execution.
Not only slow-down :( For an example, have a look at the implementor of #debug:title:full: in class Process, where thisContext is assigned to a variable.
When #ifTrue:ifFalse: is really sent, ([thisContext] class) is BlockContext *and* its sender is nil so the test for #hasContext: in the next statement fails.
But Squeak's compiler [usually] doesn't emit code for sending #ifTrue:ifFalse: so ([thisContext] class) is MethodContext and #hasContext: doesn't fail (in this example).
Then the only way we could regain this loss is to use highly parallelisable algorithms.
... which can be employed regardless of 'erlangenization' :)
/Klaus
Cheers,
Sebastian
On 22/10/2007, Klaus D. Witzel klaus.witzel@cobss.com wrote:
Not only slow-down :( For an example, have a look at the implementor of #debug:title:full: in class Process, where thisContext is assigned to a variable.
When #ifTrue:ifFalse: is really sent, ([thisContext] class) is BlockContext *and* its sender is nil so the test for #hasContext: in the next statement fails.
I think this is because of optimization. For a BlockContext the sender should be the context of the method #ifTrue:ifFalse: (which sends #value to the block). But the compiler never creates such a context, due to optimization. In this case, since the compiler 'cuts' the #ifTrue:ifFalse: out, the correct context, I think, should be the sender of #ifTrue:ifFalse:, not nil.
But Squeak's compiler [usually] doesn't emit code for sending #ifTrue:ifFalse: so ([thisContext] class) is MethodContext and #hasContext: doesn't fail (in this example).
Then the only way we could regain this loss is to use highly parallelisable algorithms.
... which can be employed regardless of 'erlangenization' :)
A trivial piece of code comes to mind:
(1 to: 1000) do: [:i | [ aBlock value: i ] fork ]
but this burdens our parallel processes with scheduling. I would like, instead, to be able to run a number of parallel branches for the same process (to schedule one process instead of each of these branches):
(1 to: 1000) doInParallel: [:i | aBlock value: i ]
I really don't like adding another abstraction like Thread in addition to Process. Maybe we should stick with Process and have a subclass of it, like ProcessNoScheduling. I'm just thinking: in what ways can we avoid excessive scheduling/preempting? Or maybe, by following the road of 'erlangisation', we should make a Process more lightweight, so that spawning thousands of them will not cause a speed degradation.
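The hypothetical `doInParallel:` selector above could be sketched roughly as follows, in Python for illustration: a fixed pool of workers plays the role of "one scheduled process with many parallel branches", instead of forking a process per element. The function name is invented; this is not an existing API.

```python
from concurrent.futures import ThreadPoolExecutor

def do_in_parallel(collection, block, max_workers=8):
    """Rough analogue of the hypothetical `doInParallel:` selector:
    the caller schedules one pool, and the branches are distributed
    over its workers rather than each being a schedulable process."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of `collection` in the results
        return list(pool.map(block, collection))
```

The design point is the same one Igor raises: the scheduler sees a handful of workers, not a thousand forked processes.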
Seems to me that OS processes and native threads are not the best choice (too costly to create, etc.).
Creating a process in Erlang is as cheap as creating an object in Smalltalk, so these how-tos are exactly what I think we can look for in Erlang's VM design/internals, and/or we could talk with people familiar with the Erlang VM. It would be cool to meet them and have a smalltalk :) to see what comes of it.
Cheers,
Sebastian
-- Best regards, Igor Stasenko AKA sig.
On Mon, 22 Oct 2007 19:10:49 +0200, Igor Stasenko wrote:
On 22/10/2007, Klaus D. Witzel wrote:
I think that the present multi-core CPU thread would benefit from a look at what other people have achieved in this area; I mean people who bet their whole career on optimizing resource allocation and resource scheduling, for example this one:
Scalability of Microkernel-Based Systems - http://l4ka.org/publications/2005/uhlig_phd-thesis_scalability.pdf (just skip the few pages in German, the paper is in English)
And one shouldn't mind that the L4 folks are mainly concerned with OS components, since those are objects like any other :) They [the L4 folks] have a minimalistic number of concepts and tough, very tough requirement definitions which have to be matched with reality ;-)
But, the way I understand the present multi-core CPU thread, Squeak people aim to save the multi-core processing world by reinventing it :-D
I'm just thinking: in what ways can we avoid excessive scheduling/preempting?
I think that you can find answers to this [with benchmarks and comparisons, also at the conceptual level] in the abovementioned paper :)
Or maybe, by following the road of 'erlangisation', we should make a Process more lightweight, so that spawning thousands of them will not cause a speed degradation.
But it will. There are hidden costs associated with our present understanding of massive parallelism (you mentioned the cost of resource allocation and resource scheduling; add to that that messages can get lost, non-local updates for keeping the system viable, etc.).
And you have to find problems which can be solved with massively parallel threads/processes ;-)
/Klaus
On 23/10/2007, Klaus D. Witzel klaus.witzel@cobss.com wrote:
But, the way I understand the present multi-core CPU thread, Squeak people aim to save the multi-core processing world by reinventing it :-D
Of course not. I read somewhere a description of a system designed with such a 'micro-kernel' architecture in mind: a system divided into parts which communicate by establishing 'contracts', a kind of agreement between system parts on protocols and security.
Multi-core CPUs work in parallel by having their own 'private' memory (caches) plus shared memory. Why can't the VM do the same? All we need to do is put this to use.
On Tue, 23 Oct 2007 17:38:02 +0200, Igor Stasenko wrote:
On 23/10/2007, Klaus D. Witzel wrote:
[...]
But, the way I understand the present multi-core CPU thread, Squeak people aim to save the multi-core processing world by reinventing it :-D
Of course not.
:)
I read somewhere a description of a system designed with such a 'micro-kernel' architecture in mind: a system divided into parts which communicate by establishing 'contracts', a kind of agreement between system parts on protocols and security.
Go ahead. Don't stop here. What's it about?
Multi-core CPUs work in parallel by having their own 'private' memory (caches) plus shared memory. Why can't the VM do the same? All we need to do is put this to use.
I know you are familiar with the many aspects of the Squeak VM. Where would you start with a parallelized VM? Perhaps here:
- http://en.wikipedia.org/wiki/Automatic_parallelization
which needs complex program analysis on the compiler side, or with something completely different?
/Klaus
Multi-core CPUs work in parallel by having their own 'private' memory (caches) plus shared memory. Why can't the VM do the same? All we need to do is put this to use.
I think it's because we don't have machinery to let VMs communicate without disrupting the user experience, or to make the N-core thing make a real difference. But if machinery for that is found acceptable enough for most problem domains, then it will become appealing (like the N spoons idea).
Cheers,
Sebastian
On 10/22/07, Igor Stasenko siguctua@gmail.com wrote:
but this burdens our parallel processes with scheduling. I would like, instead, to be able to run a number of parallel branches for the same process (to schedule one process instead of each of these branches).
Scheduling doesn't have to be a problem if it is done, e.g., event driven [1]. This is one of the optimizations I planned to have as an advantage over the Erlang implementation.
[1] In an event-driven scheduler you look at what the process did, and demote or promote it in priority based on that. You end up touching just two processes per switch, but processes that quickly give up the CPU (e.g. a process that just sends messages to have work done) get the CPU any time they want it.
On 10/22/07, Igor Stasenko siguctua@gmail.com wrote:
An 'erlangenization' of sends means that we need to deal differently with contexts. I think the best way is to rethink contexts to make them look closer to what a process is in Erlang. Yes, we must pay the price of making all contexts real objects for each message send, so we might expect a real slowdown of single-threaded execution. Then the only way we could regain this loss is to use highly parallelisable algorithms.
Aha! OK, the confusion is indeed coming from us talking about two different things.
My suggestion: add true (explicit!) concurrency to Squeak by way of async "Actor"-style messages (like what Erlang has).
What you seem to think I'm suggesting: making Squeak message sends transparently inter-process.
But that is exactly what I *don't* want. In my experience, trying to abstract these different concepts into one thing just makes code that's impossible to reason about. I want my inter-process communication doable and easy, but as explicit as I can make it.
On 10/22/07, Igor Stasenko siguctua@gmail.com wrote:
If there's nothing else which can be a replacement for this model, then you don't have a choice but to use the above.
The *VM* will have to, but no one using the Smalltalk system would.
Again, a question is raised: how do you ensure that messages are passed in the correct order, and make sure that messages are delivered?
Message delivery is a guarantee of the system, but order is absolutely not guaranteed.
Now let's look inside: to make it work properly, you need to implement a message queue. And a queue means that you must make the 'enqueue' and 'dequeue' operations synchronized. And that's exactly what I mean: even if you hide the concurrency problems from the eyes of the developer, it doesn't mean the problems are gone; now you have to deal with them yourself.
.... I don't get your objection. Again, the VM abstracts *lots* of tough details away from us so that we *never* have to think about them. Yes, with the current OS/hardware options the VM will of course have to do some synchronization on message "mailboxes", but so what? It does memory management for us now and saves us a great burden.
If you know another way to build a proper message-passing scheme without using a synchronized object (such as a queue), I am all ears.
There are things out there, but who cares? This is a low level detail the VM can hide from us just fine. Ask any Erlang programmer how often they worry about synchronization issues with their message mailboxes.
Hi there, I'm following this discussion with great interest.
After introducing myself to what Erlang is by reading http://armstrongonsoftware.blogspot.com/2006/08/concurrency-is-easy.html, referenced by Igor, I have a question (maybe it's silly, I don't know; that's why it's a question after all :)
Some contextualization first:
1. Speculative, by analogy:
1.1 Casually, about a month ago I attended a seminar by a neurologist who explained his research and his proposal for a new, more complete theory of dreams. What I learned there, and what I'm associating with this topic, is the concept of tides, very common among neurologists by the way. They talk about tides of zillions of impulses across the interconnections of brain cells, triggered by the senses or even by thoughts, emotions, etc.
1.2 The interpretation of (1) objects as conceptacles, in the sense of a receptacle of a concept: "a definition of behavior and/or a holder of memory of other conceptacle(s)"; and the interpretation of (2) that something triggers a tide which is composed of lots of little impulses.
1.3 This huge quantity of impulses is like well-organized little messages that have no problem at all working concurrently.
1.4 I'm interpreting the thousands of processes Erlang normally manages as analogous (similar in function) to one of these tides, and I ask: why can't we make tides of Smalltalk message sends in an image?
2. Erlang made its point in theory about making parallelism really scalable, then materialized it into enough success stories to attract the interest of people with pragmatic/con$ervative profiles. The hardware trend indicates this will be of great value.
3. If I understood well, Erlang's main strength is not that it has functions to do things, but that it has a really great message-passing technique, designed to take advantage of parallelism simply and efficiently and to make processes very cheap (cheaper than the OS ones).
4. If I recall correctly (please correct me if I'm wrong), in "The Computer Revolution Hasn't Happened Yet" (OOPSLA 1997 keynote) Alan Kay says that "object oriented" was not the happiest way of naming OOP, because after the initial happiness of focusing thought on objects, it becomes a higher priority to think about the processes of those objects; that is, not the "body" they have but the "lives" they live with it. So, he also said, "process oriented" might be a better name for it today, making his point about the importance of message passing, and he continues his talk.
Having said that, and given that I have no experience with VM internals or with Erlang, in an exercise of imagination I naively ask:
Can the interstitial space between virtual objects, AKA message sends, be "erlanged" by making each message send live in a "process" (the cheap kind) of its own, like Erlang messages between processes?
Same in other words:
What consequences would we face if we made Squeak use a VM that passes messages using processes à la Erlang (which are simple, cheap, efficient, and scalable in a world of "cores")?
Could this allow us to assimilate Erlang's process paradigm into the Smalltalk object paradigm? That is: would it let us gain those parallelization benefits while sparing us a change of programming paradigm?
If I understood well, this would be a change (of unknown size?) in the message-passing part of the VM, and it would probably have an impact on how an image today understands what aProcess is, letting processes become what they always should have been, with the importance they always deserved and that hardware technology was never able to deliver cheaply (deviating the focus of software developers due to sad hardware limitations) until today. So every message send *is* in one of these different, cheap, and enormously scalable processes (tens of thousands of them) that are increasing in value in the industry today.
Sorry if I'm being too naive, but I had to ask to be able to sleep :)
Sebastian Sastre
PS: I've tried to imagine whether this saves us from having to make code thread safe or not. I was unable to settle this by myself, so I also kindly ask the most experienced and critical minds to collaborate on this point.
-----Original Message----- From: squeak-dev-bounces@lists.squeakfoundation.org [mailto:squeak-dev-bounces@lists.squeakfoundation.org] On behalf of Igor Stasenko Sent: Sunday, 21 October 2007 09:36 To: The general-purpose Squeak developers list Subject: Re: Multy-core CPUs
On 21/10/2007, Jason Johnson jason.johnson.081@gmail.com wrote:
On 10/21/07, Peter William Lount peter@smalltalk.org wrote:
tim Rowledge wrote:
Ok, so if you really are talking about a "strict" Erlang-style model with ONE Smalltalk process per "image" space (whether they are in one protected memory space or many protected memory spaces), where objects are not shared with any other threads except by copying them over the "serialization wire" or by "reference", then I get what you are talking about.
That is a strange way of putting it. The fact is, Erlang has many processes per image. Many more than you could ever get as real processes or native threads (as a test I made a little program that spawned 64 *thousand* threads and passed messages between them on my laptop).
But with their model, process creation is extremely cheap. And since there is no sharing as far as the language is concerned, there is no need for locking to slow everything down.
Smalltalk can do this too. I think it needs a little work still, but I'm optimistic about what can be done here.
However, you'll still end up with concurrency control issues, and you've got an object version explosion problem occurring as well. How will you control concurrency problems with your simplified system? Is there a succinct description of the way that Erlang does it? Would that apply to Smalltalk?
Much like how Smalltalk does it, as it turns out. That is, you don't have a version problem so much as you have "old" and "new". So when ready you send the "upgrade" message to the system, and all new calls to the main functions of a process will be the new version. All currently running code will access the old code until its completion, and all new code runs in the new space.
Your simplified concurrency system also dramatically alters the Smalltalk paradigm.
The current paradigm is fine-grained locked/shared state. In my opinion, and in the opinion of many (probably most, in fact, outside of the Java community) people who are more expert in this area than you or I, we *have* to move away from this paradigm.
Is this the approach that Cincom is using in their VisualWorks system? They seem to not be embracing the notion of native threads.
Thank God. :)
However, it's also unlikely that they are embracing the notion of only ONE Smalltalk process per image either.
If I understand you correctly, then I would suggest not using the word "image", as this is confusing. Another way to put it would be "each process has its own view of the world". And honestly, what is the problem you see with this?
Right now, if you run two separate images with only one thread or process each, then you have two processes that each have their own set of objects in their own space, interacting with each other.
Now we add a way for one image to send a message *between* images. Perhaps the VM can detect when we are trying to do this, but instead of complicating the default Smalltalk message-sending subsystem, let's make it explicit with some special binary message:
Processes at: 'value computer' ! computeValue.
Now we have the ability to send messages locally within a process, and a way of freely sending between processes. No locking, and none of the problems associated with locking.
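The explicit `!` send to a registered process could be sketched roughly as below, in Python for illustration. Everything here is hypothetical (the `Actor` class, the `bang` method standing in for `!`, the `registry` dictionary, the `BankAccount` example): the state lives inside exactly one actor, and the only way to reach it is an asynchronous mailbox send.

```python
import queue, threading

class Actor:
    """A process with a private world: its state is reachable only
    through its mailbox, echoing the explicit `!` send proposed above."""
    def __init__(self, state):
        self._state = state                 # no other thread ever touches this
        self._mailbox = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def bang(self, selector, *args, reply=None):
        """The `!` send: enqueue and return immediately."""
        self._mailbox.put((selector, args, reply))

    def _loop(self):
        while True:
            selector, args, reply = self._mailbox.get()
            # Runs inside the actor's single thread, so no locking of state.
            result = getattr(self._state, selector)(*args)
            if reply is not None:
                reply.put(result)

class BankAccount:
    def __init__(self):
        self.balance = 0

    def addUSD(self, amount):
        self.balance += amount
        return self.balance

registry = {"bank account": Actor(BankAccount())}
```

A send then looks like `registry["bank account"].bang("addUSD", 5000)`; because one thread drains the mailbox, sends are processed one at a time, with no locks in user code.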
So, now what is stopping us from moving this separate process *inside the same image*? If you fork a process and it starts making objects, no other processes have references to those objects. No shared-state issue there. This part could work right now, today, with no changes to the VM.
The only issue I can think of is globals, the most obvious being class-side variables. Note that even classes themselves are not an issue, because without class-side variables they are effect free (well, obviously basicNew would have to be looked at).
But I think this issue is solvable. The VM could take a "copy on write" approach to classes/globals. That is, a class should be side-effect free (to itself, i.e. it's the same after every call), so let all processes share the memory space where meta-class objects live. But as soon as any process tries to modify the class in some way (literally, it would be the class modifying itself), it gets its own copy. Processes must not see changes made by other processes, so a modification to a global class is a "local only" change.
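The copy-on-write idea for globals could be sketched like this, in Python for illustration (the `CowGlobals` name and overlay design are invented for the example): every process reads the shared table until its first write, at which point the writer gets a private overlay that shadows the shared entry for that process only.

```python
class CowGlobals:
    """Per-process copy-on-write view of a shared globals table.
    Reads fall through to the shared table; the first write creates
    a private overlay, so other processes never see local changes."""
    def __init__(self, shared):
        self._shared = shared     # read-mostly table shared by every process
        self._local = None        # created lazily on first write

    def __getitem__(self, name):
        if self._local is not None and name in self._local:
            return self._local[name]   # this process's private version
        return self._shared[name]      # the shared version

    def __setitem__(self, name, value):
        if self._local is None:
            self._local = {}           # the "copy" in copy-on-write
        self._local[name] = value      # local-only: invisible elsewhere
```

This only illustrates the visibility rule; a real VM would also have to decide what happens when the shared table itself is upgraded, which is the old/new-space question discussed next.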
I don't think this makes sense. Classes should be treated as regular objects (everything is an object, remember?), and due to that fact you can't give preference to one kind of object over another. Class objects and instance objects are the same: objects holding mutable state. A method dictionary is mutable state of a class, so changing it should affect all objects of the given class immediately.
Of course, the only big thing left would be: what happens when we add a new class? But Erlang has had success with the old/new space approach, and what Smalltalk has now is very similar.
And btw, I'm starting to doubt the 'no sharing at all' paradigm. In one of the links you gave about Erlang (http://armstrongonsoftware.blogspot.com/2006/08/concurrency-is-easy.html), one of the readers commented:
Anonymous said... One of your premises is obviously wrong: "We don't have shared memory. I have my memory, you have yours, we have two brains, one each, they are not joined together." The premise that the above statement is based on is that we are not internally concurrent. The earlier statements in your blog entry say that we are. The actuality is that all of our individual (i.e. internal) processes are concurrent and share memory, even including our main conscious "thinking" process. It is our interaction between individuals that doesn't always share memory, and even here we do share memory in various ways; there are many examples (diaries, libraries, system processes, etc.).
There is obvious evidence of a system which performs in parallel while having a single shared state: the human brain. Do we experience heavy 'locking' or 'concurrency' problems? Personally, I don't :) And if so, then why should sharing state be considered evil?
Also, in fact, you can't eliminate shared state at all. There are only ways to delegate shared-state management to an outer context (such as the OS, or the VM, etc.). 'Fine-grained' locking is nothing more than manual control of shared state by the developer. Try to implement Erlang (or any other system which uses parallel computations and has shared state) without locks and semaphores. There is always something needed to manage the locking, even if it is hidden from your eyes. So, tell me, what is the difference between controlling it manually, or 'magically', like Erlang does?
The proposed message passing between different OS processes (Squeak images) is nothing more than letting the OS deal with the shared-state problems (locks/queueing, etc.), naively thinking that we got rid of shared-state concurrency. Did we really?
-- Best regards, Igor Stasenko AKA sig.
On 10/22/07, Sebastian Sastre ssastre@seaswork.com wrote:
1.4 I'm interpreting the thousands of processes Erlang normally manages as analogous (similar in function) to one of these tides, and I ask: why can't we make tides of Smalltalk message sends in an image?
We can. In fact, I think we could do it right now with the existing VM and some packages that have been written already.
3. If I understood well, Erlang's main strength is not that it has functions to do things, but that it has a really great message-passing technique, designed to take advantage of parallelism simply and efficiently and to make processes very cheap (cheaper than the OS ones).
Erlang processes are much cheaper than OS processes *and* OS threads. But then, so are Smalltalk's. The only difference is that Erlang processes are encapsulated entities that have no shared memory [1], while Smalltalk's do.
Can the interstitial space between virtual objects, AKA message sends, be "erlanged" by making each message send live in a "process" (the cheap kind) of its own, like Erlang messages between processes?
Well, keep in mind that Erlang is a functional language, and code written in it uses functions. Message sends are for communicating between processes.
Same in other words: what consequences would we face if we made Squeak use a VM that passes messages using processes à la Erlang (which are simple, cheap, efficient, and scalable in a world of "cores")?
If you mean every Smalltalk message send is what an Erlang message send is, then the results would be devastating. As I mentioned above, Erlang does its work with functions. In Smalltalk, the equivalent way of doing work is what Smalltalk calls "messages". In Erlang there is a concept of sending messages between processes, and I would do the same for Smalltalk.
Can this allow us to assimilate into Smalltalk's object paradigm Erlang's process paradigm? That is: will this allow us to gain those parallelizing benefits without us having to change the programming paradigm?
We can, and it would [2]. But I think we should, at least at first, make inter-process message sends very obviously different from inter-object message sends. It would be possible, for example, that objects of type "Process" have a different way of handling messages so that:
(Processes at: 'bank account') addUSD: 5000
is actually an inter-process send, but I would still want to use the ! syntax, or something equivalent, so it's completely obvious that we are doing something different.
Inter-process message sends have their own lookup complexity that I think should be separate from the inter-object message sends we have now. For example, in Erlang if you send a message to a process that happens to be in the same image, a simple reference copy happens (no danger since variables are immutable). The other two cases would be: a different OS/native thread in the same image, and a totally different image (same computer or on the network).
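The "same image, cheap reference copy" case above can be sketched minimally in Python (a hedged illustration only: `Process`, `send`, and the "bank account" registry are invented names, not any actual Squeak or Erlang API). Each process owns its state exclusively; other processes reach it only through its mailbox, so no user-level locking of the account's state is needed:

```python
import queue
import threading

class Process:
    """A lightweight 'process': a private mailbox plus a worker thread.
    All state lives inside the process; others interact only by
    enqueueing (selector, arguments) messages."""
    def __init__(self, handler):
        self.mailbox = queue.Queue()
        self.handler = handler
        threading.Thread(target=self._loop, daemon=True).start()

    def _loop(self):
        while True:
            selector, args, reply = self.mailbox.get()
            # Messages are handled one at a time: state access is serialized.
            reply.put(self.handler(selector, *args))

    def send(self, selector, *args):
        # Synchronous inter-process send: enqueue and wait for the reply.
        reply = queue.Queue(maxsize=1)
        self.mailbox.put((selector, args, reply))
        return reply.get()

def make_account():
    balance = {'usd': 0}   # owned exclusively by this process
    def handle(selector, *args):
        if selector == 'addUSD:':
            balance['usd'] += args[0]
            return balance['usd']
        if selector == 'balance':
            return balance['usd']
    return Process(handle)

processes = {'bank account': make_account()}
print(processes['bank account'].send('addUSD:', 5000))  # 5000
```

The `(Processes at: 'bank account') addUSD: 5000` send above would correspond to the last line; a cross-image variant would serialize the message instead of enqueueing a reference.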
Now if what you're talking about is basically promoting every object to its own process (using the terms I have described so far), then I haven't really given this much thought. This would be a totally different paradigm and area of research (maybe like CORBA?). Though I'm sure someone somewhere has done research on it (or is currently). :)
Sebastian Sastre PS: I've tried to imagine whether this saves us from having to make code thread-safe or not. I was unable to refute this by myself, so I kindly ask that the most experienced and critical minds collaborate on this point.
Us as people who use Smalltalk? Yes I believe it does. It doesn't make it impossible to make a design that has deadlocks, but imo the big win is that these concerns move to design time instead of implementation time.
[1] This is enforced by the language. As far as I know, the processes actually share the same heap, etc.. They just don't know it and can't take advantage of it :)
[2] Well, I believe so anyway, and aim to find out. The issue is that Erlang has it easy: you *can't* share data at a language level in Erlang. In Smalltalk this gets a little tricky, mainly due to one of Smalltalk's greatest strengths: Classes are live objects (with state and so on). I believe this can be overcome, while preserving the Smalltalk semantics, but I can't prove it yet. :)
On 10/23/07, Jason Johnson jason.johnson.081@gmail.com wrote:
Sebastian Sastre PS: I've tried to imagine whether this saves us from having to make code thread-safe or not. I was unable to refute this by myself, so I kindly ask that the most experienced and critical minds collaborate on this point.
Us as people who use Smalltalk? Yes I believe it does. It doesn't make it impossible to make a design that has deadlocks, but imo the big win is that these concerns move to design time instead of implementation time.
Ah, and I forgot to mention: if you mean "us" as in the people who write the VM, then no. The VM will get more complex to deal with this stuff and will have to take steps to ensure message operations are atomic.
-----Original Message----- From: squeak-dev-bounces@lists.squeakfoundation.org [mailto:squeak-dev-bounces@lists.squeakfoundation.org] On behalf of Jason Johnson Sent: Tuesday, 23 October 2007 13:58 To: The general-purpose Squeak developers list Subject: Re: What about "Erlanging" the smalltalk interstitial space? (was RE: Multy-core CPUs)
On 10/22/07, Sebastian Sastre ssastre@seaswork.com wrote:
1.4 I'm interpreting the normal thousands of processes Erlang manages as analogous (similar in function) to one of these tides, and asking why we cannot make tides of Smalltalk message sends in an image?
We can. In fact, I think we could do it right now with the existing VM and some packages that have been written already.
3. If I understood well, Erlang's main strength is not that it has functions to do things but that it has a really great message-passing technique that was designed to take advantage of parallelism simply and efficiently, making processes very cheap (cheaper than the OS ones).
Erlang processes are much cheaper than OS processes *and* OS threads. But then, so are Smalltalk's. The only difference is that Erlang processes are encapsulated entities that have no shared memory [1], while Smalltalk's do.
The interstitial space of virtual objects, AKA message sends, can be "Erlanged" by making each message send run in a "process" (the cheap kind) of its own, like Erlang messages between "processes"?
Well, keep in mind that Erlang is a functional language, and code written in it uses functions. Message sends are for communicating between processes.
Same in other words: what consequences will affect us if we make Squeak use a VM that passes messages via processes à la Erlang (which are simple, cheap, efficient and scalable in a world of "cores")?
If you mean every Smalltalk message send is what an Erlang message send is, then the results would be devastating. As I mentioned above, Erlang does its work with functions. In Smalltalk, the equivalent way of doing work is what Smalltalk calls "messages". In Erlang there is a concept of sending messages between processes, and I would do the same for Smalltalk.
Erlang does not have objects, so I don't think one paradigm can map 1:1 onto the other in both directions. Trying to compare them literally will be noisy at a minimum. That's why I'm using, as the Borg do :), the word "assimilate", meant to be parsed as "to take from its conceptual essence its virtues, discarding its vices".
Can this allow us to assimilate into Smalltalk's object paradigm Erlang's process paradigm? That is: will this allow us to gain those parallelizing benefits without us having to change the programming paradigm?
We can, and it would [2]. But I think we should, at least at first, make inter-process message sends very obviously different from inter-object message sends. It would be possible, for example, that objects of type "Process" have a different way of handling messages so
But that will introduce a singularity in the paradigm. I'm afraid that accepting that is too much. Can you find a way of achieving the goal of your proposal without devastating the "all is an object" premise?
that:
(Processes at: 'bank account') addUSD: 5000
is actually an inter-process send, but I would still want to use the ! syntax, or something equivalent, so it's completely obvious that we are doing something different.
And accepting singularities like that is how a language gets its syntax polluted, and developers have to compensate for that incompleteness by having to remember (and model) in their brains N more rules. The worst part, of course, is not the syntax but damaging the paradigm. That is accepting the policy of unloading work from the machines in order to load it onto humans. As I see things, humans are not here for that, and machines are not here for that either. Dear Jason, I'm in the "opposite corner of the ring" on that policy.
Inter-process message sends have their own lookup complexity that I think should be separate from the inter-object message sends we have now. For example, in Erlang if you send a message to a process that happens to be in the same image, a simple reference copy happens (no danger since variables are immutable). The other two cases would be: a different OS/native thread in the same image, and a totally different image (same computer or on the network).
Now if what you're talking about is basically promoting every object to its own process (using the terms I have described so far), then I haven't really given this much thought. This would be a totally different paradigm and area of research (maybe like CORBA?). Though I'm sure someone somewhere has done research on it (or is currently). :)
Mmmm, no. I mean that every message send should have a process à la Erlang. Of course this will only offload to the other cores the message sends that are parallelizable (discerning which ones are is a question that deserves cogitation). Maybe it's just a modest improvement to take advantage of multicore, but it never had any intention to disrupt the paradigm.
Sebastian Sastre PS: I've tried to imagine whether this saves us from having to make code thread-safe or not. I was unable to refute this by myself, so I kindly ask that the most experienced and critical minds collaborate on this point.
Us as people who use Smalltalk? Yes I believe it does. It doesn't make it impossible to make a design that has deadlocks, but imo the big win is that these concerns move to design time instead of implementation time.
[1] This is enforced by the language. As far as I know, the processes actually share the same heap, etc.. They just don't know it and can't take advantage of it :)
[2] Well, I believe so anyway, and aim to find out. The issue is that Erlang has it easy: you *can't* share data at a language level in Erlang. In Smalltalk this gets a little tricky, mainly due to one of Smalltalk's greatest strengths: Classes are live objects (with state and so on). I believe this can be overcome, while preserving the Smalltalk semantics, but I can't prove it yet. :)
Well... To be honest, I interpret belief as being the user of a system of thought, which is of course different from a fact, or from a model that has enough proofs of concept to deserve investment (time, effort, $$, energy, etc.). But I think I see your point. I also think that there is no solution without tradeoffs, and I'm not willing to disrupt the paradigm. To gain my support (and probably others'), show a more complete model that works first ;)
All the best,
Sebastian Sastre
On 10/23/07, Sebastian Sastre ssastre@seaswork.com wrote:
But that will introduce a singularity in the paradigm. I'm afraid that accepting that is too much. Can you find a way of achieving the goal of your proposal without devastating the "all is an object" premise?
Where is it broken?
And accepting singularities like that is how a language gets its syntax polluted, and developers have to compensate for that incompleteness by having to remember (and model) in their brains N more rules. The worst part, of course, is not the syntax but damaging the paradigm. That is accepting the policy of unloading work from the machines in order to load it onto humans. As I see things, humans are not here for that, and machines are not here for that either. Dear Jason, I'm in the "opposite corner of the ring" on that policy.
Huh? I'm not talking about adding new syntax, I'm talking about using the (afaik) unused ! binary operator for sending messages.
Mmmm no. I mean that every message send should have a process ala Erlang. Of course this will only optimize in the other cores the messages sends that are parallelizable (discern on which is a question that deserves cogitation). Maybe is just a modest improvement to take advantage of multicore but it never has any intention to disrupt the paradigm.
Your ideas are interesting, but I'm quite an incremental builder. Add one little thing after another and see how far we get.
Jason Johnson wrote:
On 10/21/07, Peter William Lount peter@smalltalk.org wrote:
tim Rowledge wrote:
Ok, so if you really are talking about a "strict" Erlang style model with ONE Smalltalk process per "image" space (whether or not they are in one protected memory space or many protected memory spaces) where objects are not shared with any other threads except by copying them over the "serialization wire" or by "reference" then I get what you are talking about.
That is a strange way of putting it.
Why? That is what Erlang achieves via its total encapsulation of state, which is only transferred by message passing to and back from a process. To achieve the same thing in Smalltalk you'd need to isolate the component objects running in an "image" object space with the process; otherwise you'd be breaking the encapsulation that provides the protection against a large number of classes of concurrency problems.
The principle is that any time you have more than one thread or process working on the same memory space, or object space, you WILL have concurrency issues (unless your code is doing only very simple concurrency). The point is that in order to implement your utopia-vision-of-simple-problem-free-concurrency (utopia-concurrencia, for lack of a better name) in Smalltalk, you MUST isolate the objects to ONLY ONE thread of possible alteration of their state; otherwise you end up with the possibility of many classes of concurrency problems. Shared memory problems exist even within one protected memory space, not just between them.
To isolate the objects involved in a process you can have a separate object space which contains the objects that will be operated on. This is the Erlang way, isn't it? The thing about Erlang, unless I'm mistaken (and if I am, I'd expect to be corrected), is that the objects in a process are only visible to that process until the results are returned. The objects that pass in and out of an Erlang process are only primitive data types, not complex objects. However, for Smalltalk you'd need to pass in complex object graphs of arbitrary size and connectedness to be general purpose. This then results in a version problem.
For example, let's say that you have a graph of one million objects that is highly connected, and you want to perform not just a simple read operation on it but a massive number of edits, which would result in the graph growing by 50% and the number of connections growing by 70%. For speed you decide to implement the algorithms so that they can run in parallel upon this moderately large graph of objects. Let's say that you have enough compute and memory resources to split this into 10,000 processes.
Now you have the problem of sharing the one million objects with the 10,000 processes. That's a lot of data to move around just to get things started, assuming you packaged up the whole mess into a serial blob and spat it at the various processes. A lot of redundant data. Ok, maybe it's better to do this in small chunks; after all, incrementalism is a powerful technique. For this approach you send each of the 10,000 processes a starting node plus a "search pattern" and the type of edits it will perform upon the graph, along with the actual edits as they flow in from another source. So now you have 10,000 processes each vying to traverse the one-million-node graph, scanning for patterns and applying edits as they find what they are looking for. Some of these processes will then update the "shared graph".
Oh. What happens when two processes both update the same node in this graph, but in different ways? Let's say one edit in one process adds a connection, while the edit in the other process modifies an instance variable on that node. Let's say that these two edits occur at the same time and are mutually exclusive - that is, both edits would break the object's own internal consistency rules. So now you have two edits that either must both fail, or one must succeed while the other fails, or vice versa - both can't succeed. Now you've got a problem that the magical Erlang message passing won't solve.
If it does what is the erlang solution to this million node parallel editing problem?
Now someone mentioned Software Transactional Memory (STM) so briefly that it would be easy to miss. Is that your solution? If so you still have other concurrency issues, object versioning issues, plus more to deal with. No solution is a panacea for all problems unless you are an advocate of silver bullet solutions.
The problem of editing a large graph of objects with many parallel threads is the generalized case of a nasty and complex set of concurrency and transactional issues. There are many ways to solve this. If you reply to this example I would hope that you do so fully explaining how you'd handle the concurrency and - importantly - the object consistency issues.
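One standard way to make the colliding-edit problem concrete is optimistic versioning: each edit records which version of a node it was based on, and a stale edit is rejected and must be retried or merged. This Python sketch is only an illustration of why some coordination beyond message passing is unavoidable (`Node` and `apply` are invented names, not a claim about Erlang's or any Smalltalk's actual mechanism):

```python
class Node:
    """One graph node guarded by optimistic versioning: an edit is
    accepted only if it was prepared against the current version."""
    def __init__(self):
        self.version = 0
        self.connections = []
        self.value = None

    def apply(self, based_on_version, edit):
        # Serialized check-and-apply: a stale edit must be rejected
        # (or retried) -- delivering it as a message does not decide this.
        if based_on_version != self.version:
            return False  # conflict: the other edit already won
        edit(self)
        self.version += 1
        return True

node = Node()
v = node.version
# Two edits prepared concurrently against the same version of the node:
ok1 = node.apply(v, lambda n: n.connections.append('other-node'))
ok2 = node.apply(v, lambda n: setattr(n, 'value', 42))
print(ok1, ok2)  # True False -- one succeeds, the other must fail or retry
```

Whatever policy resolves the `False` case (retry, merge, abort both) is exactly the concurrency control being discussed; moving the edits into separate processes relocates that decision, it does not remove it.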
The fact is, Erlang has many processes per image.
Yes, I understand that early tests indicate Erlang can handle approximately 100,000 or so processes at a time without hiccups, while Java can handle about 8,000 or so before blowing up. I don't know what the various Smalltalks can handle, but I doubt it's as high as Erlang; it's more likely less than even Java - just a guess though. Maybe someone has worked it out.
Many more than you could ever get as real processes or native threads (as a test I made a little program that spawned 64 *thousand* threads and passed messages between them on my laptop).
That's only because the current crop of operating systems were designed and envisioned when a few hundred processes and threads was considered a lot. Also because native operating system processes take a lot of resources.
But with their model, process creation is extremely cheap. And since there is no sharing as far as the language is concerned, there is no need for locking to slow everything down.
Yes, and how would the no sharing be implemented in Smalltalk?
How would you solve the concurrency one million node editing problem above without locking in your utopian threading implementation?
Smalltalk can do this too. I think it needs a little work still, but I'm optimistic about what can be done here.
What would you do to Smalltalk to make it do this? So far you and the others have been very short on specifics and have just argued that something magical can be done to make concurrency happen without locks. A few papers and web sites have been linked to, but no one has written down what they are proposing or what they mean beyond "it can be done".
I'll grant you that you can see that it can be done. Please illuminate what it is that you see can be done in detail and how you might do it. Thanks.
However, you'll still end up with concurrency control issues and you've got an object version explosion problem occurring as well. How will you control concurrency problems with your simplified system? Is there a succinct description of the way that Erlang does it? Would that apply to Smalltalk?
Much like how Smalltalk does it, as it turns out. That is, you don't have a version problem so much as you have "old" and "new". So when ready, you send the "upgrade" message to the system, and all new calls to the main functions of a process will use the new version. All currently running code will access the old code until its completion, and all new code runs in the new space.
Ok, so there would be 10,000 separate process-object-spaces with the one million nodes being edited and new nodes being created in each of these 10,000 separate spaces. How do you expect to "merge" the results and solve the edits that will inevitably cause "logical data inconsistency" collisions?
You simplified concurrency system also dramatically alters the Smalltalk paradigm.
The current paradigm is fine-grained locked/shared state.
So?
In my opinion and the opinion of many (probably most in fact, outside of the Java community) people who are more expert is this area then you or I, we *have* to move away from this paradigm.
Why? Please provide more than anecdotal or belief-driven comments for this point of view. What are the reasons? What is it that you'd be moving towards?
Is this the approach that Cincom is using in their Visual Works system? They seem to not be embracing the notion of native threads.
Thank God. :)
It's a huge mistake on their part in my humble view.
While it may be easy from the point of view of adapting their image, it's a huge mistake. I've had many people comment that that's one of the reasons Java is better than Smalltalk - it already works with multiple CPU cores. Yes, they have to solve the concurrency problems, but those are NO WORSE than the concurrency problems that already exist within Smalltalk when running with a single native process and multiple (green threads, aka) Smalltalk Processes. No different. Do you actually get that?
If you don't, then you fail to appreciate that the approach Cincom is taking isn't going to solve the concurrency problems, since - unless they correct me on this - it seems their direction is simply to have N instances of their image (in the same memory space or in separate operating system processes), where N would frequently equal the number of cores on the computer (or server) in question (although the instances could be more or fewer as needed). Each individual image would still have the problems of multi-threading within it IF AND ONLY IF multiple threads are forked. Then you have all the same concurrency problems that happen with multiple threads on objects in one memory space.
Sure, this is a simpler approach for them, as they don't have to completely toss their current virtual machine design - they can hack it by simply using one image space per native processor or per native operating system process. Then all they need is a cheap and dirty distributed object transport system to move object graphs (complete or partial) around between the various images. This will work for them and ALL Smalltalk systems, including Squeak. In fact this can work now with essentially unmodified Smalltalk systems - all that's really needed is the distributed objects framework, and there are a few of those kicking around.
This is of course a far cry from the radical concurrency system that is being proposed by the erlangization concurrency proponents.
However it's also unlikely that they are embracing the notion of only ONE Smalltalk process per image either.
If I understand you correctly, then I would suggest not using the word "image", as this is confusing. Another way to put it would be "each process has its own view of the world". And honestly, what is the problem you see with this?
Ok. How will you implement that?
Right now, if you run two separate images with only one thread or process, then you have two processes that each have their own set of objects in their own space interacting with each other.
Yes, exactly. This is the illusion that Erlang provides. This can also be achieved now with ANY Smalltalk version just by starting multiple images - one for each core if you want to map them that way as may be "natural" to want to do.
Now we add a way for one image to send a message *between* images.
Yes. That can be done now.
Perhaps the VM can detect when we are trying to do this, but instead of complicating the default Smalltalk message-sending subsystem, let's make it explicit with some special binary message:
Processes at: 'value computer' ! computeValue.
There isn't any need for new syntax with the "!" character. Sure, you're using it as a binary message selector "!", but why obfuscate it? I'd recommend using a keyword selector for better clarity. Thanks.
Now we have the ability to send messages locally within a process, and a way of freely sending between processes. No locking and the problems associated with locking.
Not so. You'd have to transmit - in my example above - one million objects to the various images and have them compute and return their results, which would then have to be combined in a manner that leaves the graph of objects in a consistent state with one and a half million objects and 70% more interconnections between them. It is this parallel updating of many parts of the same data graph that will require the concurrency controls.
So, now what is stopping us from moving this separate process *inside the same image*?
Nothing but you've got to address the concurrency problem that I've mentioned above.
If you fork a process and he starts making objects, no other processes have references to those objects. No shared state issue there. This part could work right now today with no changes to the VM.
Are you talking about forking a new operating system process with a copy of the image? The "copied" objects or the objects that were in the "image" to begin with are "duplicates" (or N-plicates really) which is a real headache if they get modified in multiple images and need to be "recombined" into one real persistent state.
These are object database problems and attempting to split the processing into multiple threads to avoid the "locking" issues does not solve the problem. It just pushes it further away. While it might work for some applications like telephone switching systems it can't generalize to ALL types of problems which could benefit from concurrency solutions. That's wishful thinking and a pipe dream otherwise known as a silver bullet.
The only issue I can think of are globals,
All Object Databases have a couple of rooted objects. Maybe many more than a couple.
the most obvious being class side variables. Note that even classes themselves are not an issue because without class side variables, they are effect free (well, obviously basicNew would have to be looked at).
I'm not sure what you mean.
But I think this issue is solvable. The VM could take a "copy on write" approach on classes/globals. That is, a class should be side-effect free (to itself, i.e. it's the same after every call), so let all processes share the memory space where metaclass objects live. But as soon as any process tries to modify the class in some way (literally, it would be the class modifying itself), it gets its own copy. Processes must not see changes made by other processes, so a modification to a global class is a "local only" change.
Yes, a variant of the Software Transactional Memory. However, you still have the problems mentioned above.
Of course the only big thing left would be: what happens when we add a new class? But Erlang has had success with the old/new space approach, and what Smalltalk has now is very similar.
Having two spaces, old and new space, won't solve the problems mentioned above when you have N processes (threads) running on M-objects in parallel and need to combine the results of the parallel computations.
Many problems have this "split processes off with their chunk of data" and "recombine" the results. Many of these problems are simplified - if possible - so that the results can't collide with the issues presented above. However, we are not talking about those special cases - such as parallel ray tracing algorithms. We are talking about the completely generic cases that occur in general purpose and every day use of code in Smalltalk applications - such as the massive Smalltalk business database front end applications which are typical at many corporations today and which utilize many threads to accomplish their parallel tasks in order to speed up the user experience. A real world consequence of this is increased productivity of thousands of users day in and day out at these corporations.
Maybe your applications aren't as complex as these, but I don't see the benefits of an Erlang-ONLY approach. I do see the benefit of STM and Erlang approaches in some cases, but why intentionally limit the tool box to just a few cases? It makes no sense to ignore the harsh reality of concurrency issues by picking a limited set of solutions.
All the best,
Peter William Lount Peter@smalltalk.org
Peter William Lount wrote:
Jason Johnson wrote:
On 10/21/07, Peter William Lount peter@smalltalk.org wrote:
tim Rowledge wrote:
Ok, so if you really are talking about a "strict" Erlang style model with ONE Smalltalk process per "image" space (whether or not they are in one protected memory space or many protected memory spaces) where objects are not shared with any other threads except by copying them over the "serialization wire" or by "reference" then I get what you are talking about.
That is a strange way of putting it.
Why? That is what Erlang achieves via its total encapsulation of state, which is only transferred by message passing to and back from a process. To achieve the same thing in Smalltalk you'd need to isolate the component objects running in an "image" object space with the process; otherwise you'd be breaking the encapsulation that provides the protection against a large number of classes of concurrency problems.
[more stuff snipped]
Hello all, I think that Erlang does have mechanisms to share stuff between processes. First, the code is shared. When I update a module, all processes using the code of the module will (eventually) switch to the new version. And then there is the Mnesia database and its parts that can be used to share data between processes.
And, slightly off topic probably: One thing that strikes me as remarkable about the Erlang system is that, since there is non-destructive assignment, you cannot have cycles in your object graphs. I think this simplifies the GC tremendously. But I can think of no way of doing something similar with Smalltalk objects, unfortunately.
Cheers, Wolfgang
Wolfgang Eder wrote:
[more stuff snipped]
Hello all, I think that Erlang does have mechanisms to share stuff between processes. First, the code is shared. When I update a module, all processes using the code of the module will (eventually) switch to the new version. And then there is the Mnesia database and its parts that can be used to share data between processes.
And, slightly off topic probably: One thing that strikes me as remarkable about the Erlang system is that, since there is non-destructive assignment, you cannot have cycles in your object graphs. I think this simplifies the GC tremendously. But I can think of no way of doing something similar with Smalltalk objects, unfortunately.
Cheers, Wolfgang
Hi,
That's interesting. Thus Erlang DOES IN FACT HAVE SHARED MEMORY between processes: for code and for data. I'd like to learn more about that. Could anyone provide more details?
One proposal was a "copy-on-write" object space model where objects that are about to be written to in a Smalltalk process would be copied to that processes private object space - in effect that processes view of the "image".
To implement a copy-on-write technique would require operating system support for the typical modern mainstream operating system. To implement copy-on-write requires a synchronization primitive to be used by the operating system - if I'm not mistaken - at least for a few instructions while the page tables are updated - a critical section.
To implement copy-on-write requires a language to have an ability to go beyond the Erlang style of concurrency capabilities.
One of the crucial aspects that Alan Kay (and others) have promoted over and over again is the ability of a language to be expressed in itself. This has a certain beauty to it, as well as a mathematical aesthetic, with important ramifications that go way beyond those characteristics. To have a "mobius" system that can rewrite itself while retaining functioning versions across a continuous evolutionary path, one requires a system that can be expressed in itself. Alan Kay points to a page in the Lisp Manual where Lisp is implemented in itself. Since Smalltalk is supposed to be a general purpose programming language, it is crucial that it have this aspect of being able to implement itself with itself.
So far Squeak comes close to this - at least with respect to the virtual machine, which is written in the Slang subset of Smalltalk. Unfortunately Squeak relies upon manually written C files for binding with the various operating systems. Co-existence with C-based technology has its price, and it's high, in that it blocks access to the entire system from within the system; being blocked, one is prevented from the online interactive exploration and experimentation that we are used to at the Smalltalk source code level. At least this is being addressed in the amazing work of Ian Piumarta (http://piumarta.com/pepsi/pepsi.html) and the incredible work of LLVM (http://llvm.org). In fact I highly recommend that Squeak move from its current obsolete C compilers to make use of either of these two projects as the bottom of the VM. Apple is funding LLVM, and Ian's work seems to be part of the work of Alan Kay's Viewpoints Research Institute (http://www.vpri.org).
The "non-destructive" assignment aspect of Erlang is typical of non-write-in-place functional and object database systems. It's a key aspect of the ZokuScript Object Database Management System and Technologies. However it's not a panacea that the silver bullet utopians think it is. As with any other solution matrix it has it's benefits, payoffs, minuses and costs. These need to be balanced for every application. As Wolfgang points out there are issues with it such as the "cycle" problem that need to be overcome via implementation exceptions.
The other issue is how fine do you cut the objects? At what point do you say enough is enough? That is, at what point does a process say: oh, I don't really have control of changes to the object in question, as that object is private to another object space? Control then needs to be passed to a process in the other object space, likely on another compute node. For example, corporate security constraints may require that certain data remain on the server while only permitting some data to be shared with a laptop node running remotely.
It's important to consider the wider issues involved in distributed systems that are to be deployed in the real world. For Smalltalk to evolve we must get really serious about these issues ahead of the curve that others are pursuing now.
It's shocking that a system like Flash MX's JavaScript-compatible language has a few features that are more advanced than Smalltalk's. It's shocking that Flash is so popular even though the language also has serious flaws - for example in its handling of exceptions.
One of the tremendous strengths of Smalltalk is shared with Unix systems. If you visit the Smalltalk versions page at Smalltalk.org (http://Smalltalk.org/versions) you'll see a great many versions of Smalltalk. In fact the page isn't complete, as there are older historical versions of Smalltalk that are missing, as well as a slate (no pun intended) of Smalltalk and Smalltalk-like languages missing from the roster listed there. Smalltalk shares this proliferation aspect with Unix. Count the Unix variants and it's in the hundreds, if not approaching thousands, of distributions that have been or are available now. Linux alone has hundreds of variants.
Compare this variety with Java and Microsoft. They are stagnant with just one thread of evolution. Smalltalk and Unix are undergoing a much wider range of co-evolutionary development much of which is parallel and much of which is divergent. Both aspects are important.
Divergence is important for strong vendors so that they can distinguish their products and meet the needs of their set of vertical markets.
Parallel co-evolution, cross pollination and open sharing of code via libraries and the ANSI Standard for Smalltalk (new version in the works - please contribute) is important for the language as a unified entity.
Parallelism is one of the low level aspects that needs to be shared openly between the vendors for such features to become "standard" features. Otherwise parallelism across the vendors' products will become or remain a hodgepodge (as it is now).
The same goes for the Graphical User Interface but that's an entirely different conversation.
The basic point is that for a language to be expressible in itself, ALL the computer science techniques used to implement the language must be expressible in the language. It goes beyond this self referential definition, since the language must also be able to express ANY computer science technique that is needed for the full range of systems that will be implemented in it. To do less is to create a language that is less than capable.
With the advances in static and just-in-time compiler technologies (LLVM, Code-Pepsi, etc.) that can co-exist with the C universe, it's possible for Smalltalk to become a full fledged systems language again, as it once was. To limit the language and prevent this from happening would create a version of Smalltalk that addresses only the needs of a small segment of the market.
Concurrency control issues are a very important aspect of any general purpose programming language. To limit the solution space to a tiny corner of solutions would be a mistake by design.
Certainly making concurrency easier and foolproof is a laudable goal. However the cost might be too high a price if it's not done well or if it alters the language beyond its current shape.
One of the reasons that I'm implementing a new language, ZokuScript, is that it does change the paradigm beyond that of Smalltalk. Keeping connected with Smalltalk is done via ZokuTalk. However the execution engine (not a virtual machine) will translate ZokuTalk (i.e. Smalltalk) into ZokuScript and then compile it to native code. ZokuTalk and Smalltalk are subsets of ZokuScript which is a fusion of many ideas and concepts from other languages and - above all else - application requirements.
The erlangification (erlangization, or erlangisation) of Smalltalk may be a radical enough transformation that it's no longer Smalltalk. If that's the way of Squeak, that's fine; however it seems that a fork is likely the result (and yes, the pun of forking was intended).
Since the driver is the requirements and not just awe of the technology, what are the requirements for concurrency in Squeak and in Smalltalk (since Squeak is diverging from Smalltalk more and more)?
Inventing the future is fun and hard work. Which future are you inventing?
All the best,
Peter William Lount
Hi,
Continued.
Of course one could also implement a copy-on-write-bit for objects in the "read-only-shared-top-level-object-space-of-the-image". In order to accomplish any work a process must be forked! Also, this way any process that forks off will need to copy all of the objects it modifies into its own private object-space until the process commits its changes into the top level object-space or until it aborts. That's assuming a Software Transactional Memory scheme is added to Smalltalk.
Actually this idea is quite appealing if done right.
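To make the idea concrete, here is a rough sketch of a process-private object space with copy-on-write and an STM-style commit/abort, in illustrative Python rather than Squeak/Slang. Every name here (ObjectSpace, Transaction, the version counter) is invented for the sketch and is not an existing design.

```python
class ObjectSpace:
    """Shared top-level space: maps object ids to state."""
    def __init__(self):
        self.objects = {}          # oid -> state
        self.version = 0           # bumped on every successful commit

class Transaction:
    """A forked process's private view: reads fall through to the shared
    space; the first write copies the object into the private space."""
    def __init__(self, space):
        self.space = space
        self.read_version = space.version
        self.private = {}          # copy-on-write cache

    def read(self, oid):
        return self.private.get(oid, self.space.objects.get(oid))

    def write(self, oid, state):
        self.private[oid] = state  # copy-on-write: shared copy untouched

    def commit(self):
        # Abort if another process committed since we started.
        if self.space.version != self.read_version:
            return False
        self.space.objects.update(self.private)
        self.space.version += 1
        return True

space = ObjectSpace()
space.objects["point"] = (1, 2)

t = Transaction(space)
t.write("point", (3, 4))
assert space.objects["point"] == (1, 2)   # shared space still unchanged
assert t.commit()                          # changes published atomically
assert space.objects["point"] == (3, 4)
```

The versioning here is deliberately crude (whole-space, not per-object); a real implementation would need finer-grained conflict detection.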
Of course there are a host of other awesomely complex problems implied by the above that a simple concurrency model will NOT solve.
Concurrency isn't like automatic garbage collection - which is actually quite broad and complex a field - at all. The sets of problems with concurrent systems are way more complex. This is especially the case when you bring distribution beyond a single compute node into the fold and especially when other issues such as distributed garbage collection are required. Welcome to the complex world of tomorrow today.
What do the vendors' staff who write the virtual machines actually think about this?
All the best,
Peter William Lount
I don't have access to it but maybe someone has this paper?: http://portal.acm.org/citation.cfm?id=38844&coll=portal&dl=ACM
I wonder if there is some light there.
cheers,
Sebastian Sastre
-----Mensaje original----- De: squeak-dev-bounces@lists.squeakfoundation.org [mailto:squeak-dev-bounces@lists.squeakfoundation.org] En nombre de Peter William Lount Enviado el: Martes, 23 de Octubre de 2007 12:09 Para: The general-purpose Squeak developers list Asunto: Re: Multy-core CPUs, ERLANG
On 10/23/07, Peter William Lount peter@smalltalk.org wrote:
Of course one could also implement a copy-on-write-bit for objects in the "read-only-shared-top-level-object-space-of-the-image". In order to accomplish any work a process must be forked! Also, this way any process that forks off will need to copy all of the objects it modifies into it's own private object-space until the process commits it's changes into the top level object-space or until it aborts.
Once again I have no idea what you're talking about. I guess you're not responding to me with this, since the system I'm talking about would not commit any changes back to a top level process.
Concurrency isn't like automatic garbage collection - which is actually quite broad and complex a field - at all.
*sigh*. Ok, if you're going to respond to things I say, please read what I write. Speed reading obviously isn't working. I said message passing is *ANALOGOUS*.
analogous
adjective 1. similar or equivalent in some respects though otherwise dissimilar; "brains and computers are often considered analogous"; "salmon roe is marketed as analogous to caviar"
Manual memory management is hard to do and does not scale or compose well as explained in the email I originally linked to.
Shared state fine grained locking is hard to do and does not scale or compose well as explained in the email I originally linked to.
"Jason Johnson" jason.johnson.081@gmail.com wrote:
On 10/23/07, Peter William Lount peter@smalltalk.org wrote:
<snip>
Concurrency isn't like automatic garbage collection - which is actually quite broad and complex a field - at all.
*sigh*. Ok, if you're going to respond to things I say, please read what I write. Speed reading obviously isn't working. I said message passing is *ANALOGOUS*.
Interestingly enough, there was a paper at OOPSLA titled "The transactional memory/garbage collection analogy":
http://portal.acm.org/citation.cfm?id=1297080&jmp=abstract&coll=port...
(The URL requires an ACM subscription to read.)
The abstract reads:
This essay presents remarkable similarities between transactional memory and garbage collection. The connections are fascinating in their own right, and they let us better understand one technology by thinking about the corresponding issues for the other.
frank
On Wed, 24 Oct 2007 14:16:38 +0200, "Frank Shearar" frank.shearar@angband.za.org wrote:
Interestingly enough, there was a paper at OOPSLA titled "The transactional memory/garbage collection analogy":
Here's the paper:
http://www.cs.washington.edu/homes/djg/papers/analogy_oopsla07.pdf
(Thanks to Google Scholar)
Later, Jon
-------------------------------------------------------------- Jon Hylands Jon@huv.com http://www.huv.com/jon
Project: Micro Raptor (Small Biped Velociraptor Robot) http://www.huv.com/blog
Jason Johnson wrote:
On 10/23/07, Peter William Lount peter@smalltalk.org wrote:
Of course one could also implement a copy-on-write-bit for objects in the "read-only-shared-top-level-object-space-of-the-image". In order to accomplish any work a process must be forked! Also, this way any process that forks off will need to copy all of the objects it modifies into it's own private object-space until the process commits it's changes into the top level object-space or until it aborts.
Once again I have no idea what you're talking about. I guess you're not responding to me with this, since the system I'm talking about would not commit any changes back to a top level process.
Concurrency isn't like automatic garbage collection - which is actually quite broad and complex a field - at all.
*sigh*. Ok, if you're going to respond to things I say, please read what I write. Speed reading obviously isn't working. I said message passing is *ANALOGOUS*.
analogous
adjective
- similar or equivalent in some respects though otherwise dissimilar; "brains and computers are often considered analogous"; "salmon roe is marketed as analogous to caviar"
Manual memory management is hard to do and does not scale or compose well as explained in the email I originally linked to.
Shared state fine grained locking is hard to do and does not scale or compose well as explained in the email I originally linked to.
Hi,
Yes I read what you said. I simply don't think they are analogous.
Certainly the parallels that you see between them are not clear from your analogy since this reader didn't get it.
Many things don't scale well, or don't compose well in computer science. It doesn't mean that they are all analogous.
Now I've not yet had a chance to read the PDF pointed to by Jon Hylands but it seems to me that they are more dissimilar than similar.
Peter
Hi there,
I think that Peter's posts are very pragmatic and educational, and are making this discussion richer.
Erlangization (and friends) of Smalltalk message sends is, I'm afraid, literally not possible. Erlang simplifies reality too much to achieve its goal, exploiting the fact that its paradigm is based on data instead of arbitrary object graphs, plus a use of processes and messages a la the object paradigm. That way it achieves management of about 100k processes (bear in mind that, for example, the Squeak image I'm developing on right now has about 754794 subinstances of ProtoObject).
But in this discussion we are exploring, and maybe defining the basis of, what could be a better cocktail of technological techniques to bring to Smalltalk an efficient use of the incoming multi-core hardware. This should be an uncontroversial point for us all.
Now... Peter, you said in a previous post that implementing a Smalltalk that does not share is not possible. But then you said, if I understood you right, that if we found a solution to "fine-cutting objects", plus a little transactional memory, we could open the door to an appealing solution space that solves, for instance, your million-object model graph.
I'm curious about what problems are left outside with a concurrency solution space like that.
In fact, even in such a new field, I think an enumeration of the solution spaces, with their requisites (the changes they promote), pros and cons, is necessary to help us all order our ideas (keeping them as candidates or discarding them).
The community should judge which space solves the most valuable solution space (brings the most promising solutions to the most frequent problems of this community). I also think we need to give community tides more time to cook a good solution or solutions in this world of increasing complexity.
cheers,
Sebastian Sastre
PS: remember that complexity was always out there. The problem is just that we are trying to make something useful of it with less rudimentary models, using automatons as adobe.
-----Mensaje original----- De: squeak-dev-bounces@lists.squeakfoundation.org [mailto:squeak-dev-bounces@lists.squeakfoundation.org] En nombre de Peter William Lount Enviado el: Martes, 23 de Octubre de 2007 11:52 Para: The general-purpose Squeak developers list Asunto: Re: Multy-core CPUs, ERLANG
Wolfgang Eder wrote:
[more stuff snipped]
Hello all, I think that Erlang does have mechanisms to share stuff between processes. First, the code is shared. When I update a module, all processes using the code of the module will (eventually) switch to the new version. And then there is the Mnesia database and its parts that can be used to share data between processes.
And, slightly off topic probably: One thing that strikes me as remarkable about the Erlang system is that, since there is non-destructive assignment, you cannot have cycles in your object graphs. I think this simplifies the GC tremendously. But I can think of no way of doing something similar with Smalltalk objects, unfortunately.
Cheers, Wolfgang
Hi,
That's interesting. Thus Erlang DOES IN FACT HAVE SHARED MEMORY between processes: for code and for data. I'd like to learn more about that. Could anyone provide more details?
One proposal was a "copy-on-write" object space model where objects that are about to be written to in a Smalltalk process would be copied to that processes private object space - in effect that processes view of the "image".
On 10/23/07, Peter William Lount peter@smalltalk.org wrote:
That's interesting. Thus Erlang DOES IN FACT HAVE SHARED MEMORY between processes: for code and for data. I'd like to learn more about that. Could anyone provide more details?
*sigh*. It does *not* have shared *mutable* memory, and that is the key. If I have to contextualize everything I say every time I say it my mails are going to get even longer, and I'm writing books as it is. But I stated several times that sharing is no problem *when it's read only*.
One proposal was a "copy-on-write" object space model where objects that are about to be written to in a Smalltalk process would be copied to that processes private object space - in effect that processes view of the "image".
To implement a copy-on-write technique would require operating system support for the typical modern mainstream operating system. To implement copy-on-write requires a synchronization primitive to be used by the operating system - if I'm not mistaken - at least for a few instructions while the page tables are updated - a critical section.
What are you on about? This has been done in Smalltalk before with a system that had certain objects in ROM. It's pretty simple and requires no OS help and no locking.
The VM handles requests made by processes. So if a process makes a request that modifies something in the read only space, the VM simply copies the data to the process's "area" and makes the update. Simple. It requires no OS help, obviously, and *no locking* because *by definition the read only space can not be changed*.
The only issue not handled by this is *code* changes, so this needs a separate mechanism as it does in Erlang (which also can change the code at runtime): the old and new code areas. Smalltalk does this now with "ObsoleteObject", so it shouldn't be a show stopper.
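Jason's read-only-space scheme can be sketched along these lines. This is an illustrative Python model, not VM code; the dict-based spaces and all names are assumptions of the sketch, not Squeak internals.

```python
# Objects live in a shared read-only space; a process's first attempt to
# modify one copies it into that process's private area. No locking is
# needed because the read-only space, by definition, never changes.

READ_ONLY = {"greeting": ["hello"]}         # shared, never mutated

class ProcessArea:
    def __init__(self):
        self.local = {}                      # this process's private copies

    def lookup(self, name):
        # Reads prefer the private copy, falling back to the shared space.
        return self.local.get(name, READ_ONLY.get(name))

    def modify(self, name, mutator):
        if name not in self.local:           # first write: copy it over
            self.local[name] = list(READ_ONLY[name])
        mutator(self.local[name])

p1, p2 = ProcessArea(), ProcessArea()
p1.modify("greeting", lambda g: g.append("world"))
assert p1.lookup("greeting") == ["hello", "world"]
assert p2.lookup("greeting") == ["hello"]    # p2 still sees the shared copy
assert READ_ONLY["greeting"] == ["hello"]    # read-only space untouched
```

As the paragraph above notes, code changes would still need a separate old/new-area mechanism; this sketch covers only data.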
One of the crucial aspects that Alan Kay (and others) have promoted over and over again is the ability of a language to be expressed in itself. This has a certain beauty to it as well as a mathematical aesthetic that has important ramifications that go way beyond those characteristics. To have a "mobius" system that can rewrite itself while retaining functioning versions across a continuous evolutionary path one requires a system that can be expressed in itself. Alan Kay points to a page in the Lisp Manual where Lisp is implemented in itself. Since Smalltalk is supposed to be a general purpose programming language it is crucial that it have this aspect of being able to implement itself with itself. So far Squeak comes close to this - at least with respect to the virtual machine which is written in the slang subset of Smalltalk.
Ok, so what's the problem? The system I'm proposing would also be written in Slang. I'm certainly not going to do it in C anymore then I have to.
Unfortunately Squeak relies upon manually written C files for binding with the various operating systems. Co-existence with C based technology has it's price and it's high in that it blocks access to the entire system from within the system; by being blocked one is prevented from online interactive exploration and experimentation that we are used to at the Smalltalk source code level. At least this is being addressed in the amazing work of Ian Piumarta (http://piumarta.com/pepsi/pepsi.html) and the incredible work of LLVM (http://llvm.org). In fact I highly recommend that Squeak move from it's current obsolete C compilers to make use of either of these two projects as the bottom of the VM. Apple is funding LLVM and Ian's work seems to be part of the work of Alan Kay's Viewpoints Research Institute (http://www.vpri.org).
Now we move into our areas of common ground. :) I too look forward to the day that we can walk away from the ultimate premature optimization that is C.
The "non-destructive" assignment aspect of Erlang is typical of non-write-in-place functional and object database systems. It's a key aspect of the ZokuScript Object Database Management System and Technologies. However it's not a panacea that the silver bullet utopians think it is. As with any other solution matrix it has it's benefits, payoffs, minuses and costs. These need to be balanced for every application. As Wolfgang points out there are issues with it such as the "cycle" problem that need to be overcome via implementation exceptions.
Of course. In CS everything is a trade off. I never proposed message passing as a silver bullet, but rather the analogical equivalent of a GC in the concurrency world.
The other issue is how fine to you cut the objects? At what point do you say enough is enough? That is at what point does a process say oh, I don't really have control of changes to the object in question... as that object is private to another object space. Thus control needs to be passed to a process in the other object space likely on another compute node. For example corporate security constraints may require that certain data remain on the server while only permitting some data to be shared with a laptop node running remotely.
If the message passing is explicit, and the system isn't trying to do anything fancy for me, this is no issue. As far as "oh I don't control the object in question", this is encapsulation. Some people even consider encapsulation a good thing. ;)
It's important to consider the wider issues involved in distributed systems that are to be deployed in the real world. For Smalltalk to evolve we must get really serious about these issues ahead of the curve that others are pursuing now.
Exactly. And wasting precious resources to get where Java was 10 years ago when everyone else has realized that model can't scale is exactly what we need to avoid.
The erlangification (erlangization, or erlangisation) of Smalltalk may be a radical enough transformation that it's no longer Smalltalk. If that's the way of Squeak that's fine however it seems that a fork is likely the result (and yes, the pun of forking was intended).
No, what I envision will act just like Smalltalk does today, with the singular exception that you won't have need of #critical:, Semaphore or any of that stuff anymore. And since most people don't use it, it doesn't look that painful to me.
All the interprocess communication is just going to be accomplished as message sends like everything else.
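As a hedged illustration of "interprocess communication as message sends", here is a minimal actor-style sketch in Python. The Actor/Counter classes and selectors are invented for the example; a real VM-level design would differ, but the shape is the same: each process owns its state and a mailbox, so user code never touches a semaphore directly.

```python
import queue, threading

class Actor:
    def __init__(self):
        self.mailbox = queue.Queue()    # the only synchronized structure
        threading.Thread(target=self._run, daemon=True).start()

    def send(self, selector, *args):    # the message-send interface
        self.mailbox.put((selector, args))

    def _run(self):
        # One thread drains the mailbox, so state has exactly one writer.
        while True:
            selector, args = self.mailbox.get()
            if selector == "stop":
                break
            getattr(self, selector)(*args)

class Counter(Actor):
    def __init__(self):
        self.count = 0                  # private state, never shared
        super().__init__()

    def increment(self, reply):
        self.count += 1
        reply.put(self.count)           # replies are messages too

c = Counter()
reply = queue.Queue()
for _ in range(3):
    c.send("increment", reply)
results = [reply.get() for _ in range(3)]
assert results == [1, 2, 3]             # sends processed in order
c.send("stop")
```

The queue does use a lock internally, but that lock lives in the runtime, not in user code - which is the division of labor Jason is describing.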
Jason Johnson wrote:
PWL wrote
The erlangification (erlangization, or erlangisation) of Smalltalk may be a radical enough transformation that it's no longer Smalltalk. If that's the way of Squeak that's fine however it seems that a fork is likely the result (and yes, the pun of forking was intended).
No, what I envision will act just like Smalltalk does today, with the singular exception that you won't have need of #critical:, Semaphore or any of that stuff anymore. And since most people don't use it, it doesn't look that painful to me.
All the interprocess communication is just going to be accomplished as message sends like everything else.
Hi,
Ok, then how does this magic actually work? Rather than bringing Erlang or other systems into your description please just describe how you see it working in Smalltalk. In detail please.
Peter
On 23/10/2007, Peter William Lount peter@smalltalk.org wrote:
[ your message was here ]
Peter William Lount Peter@smalltalk.org
A BIG +1 to your point. You expressed most of the things I had in mind (I'm not a native English speaker, so sometimes it's hard to say what I have in mind).
Absolutely, there is no magical cure for concurrency. And hoping that we can deal with it by using __insert tool__ is an illusion.
And what is more frustrating (unfortunately) is that concurrency can be solved only when we come at the problem from both sides: VM and language. By changing the VM and not touching a bit of the Smalltalk codebase we will have a crappy solution. By changing the Smalltalk codebase but keeping the old single-threaded VM we also have a crappy solution - while very useful for generic distributed computing, it's too inefficient for computationally heavy problems (for example, raytracing). While you can do things in parallel, in the end you must gather results into the same memory space. That's good when your problem domain can simply be split into smaller parts which can be computed in parallel, and the overhead of object serialization is low compared to the time spent computing partial results. But as Peter said, in the general case we may expect that the overhead will be too high for some tasks, and some tasks can't be parallelised at all, leaving a single OS process working while the others simply hang in memory, eating space and consuming CPU resources while computing nothing.
Also, by spawning parallel OS processes we hand over many aspects of our parallel processing control to the OS and lose many elegant and simple solutions. For example, why do I have to lose the simple 'send and receive' model and have it replaced by a 'send and pray' paradigm?
And don't take me wrong: I'm a big fan of Spoon and Croquet islands, but these solutions can hardly be considered generic from a multi-core perspective. Let's make it clear: multi-core is _NOT_ distributed computing. They have much in common, but to use them most effectively we need different approaches.
On 10/23/07, Peter William Lount peter@smalltalk.org wrote:
The principle is that anytime you have more than one thread or process working on the same memory space, or object space, you WILL have concurrency issues (unless your code is only doing very simple concurrency). The point is that in order to implement your utopia-vision-of-simple-problem-free-concurrency (utopia-concurrencia, for lack of a better name) in Smalltalk you MUST isolate the objects to ONLY ONE thread that can possibly alter their state, otherwise you end up with the possibility of many classes of concurrency problems.
Yes, this is mostly true. The insight with Erlang is that they don't actually have to be in a different memory space, it just has to be impossible at the language level for one process to get a reference to an object of another process *and modify it*.
Shared memory problems exist even within one protected memory space and not just between them. To isolate the objects involved in a process you can have a separate object space which contains the objects that will be operated on. This is the Erlang way, isn't it?
Kind of. The Erlang approach works so well for them because variables can't be changed. Once you create a variable it is frozen in that form. Other processes *can* look at it because no change can happen from either side.
Obviously more care will have to be taken in Smalltalk as the objects can always be changed.
The thing about Erlang, unless I'm mistaken (and if I am mistaken I'd expect to be corrected), is that the objects in a process are only visible to that process until the results are returned. The objects that pass in and out of an Erlang process are only primitive data types and not complex objects.
The last sentence is incorrect. The message can be any complexity, including sending functions, file handles, whatever.
However for Smalltalk you'd need to pass in complex object graphs of arbitrary size and connectedness to be general purpose. This then results in a version problem.
Only if what you pass can be modified by either side.
For example, <snip> Now you've got a problem that the magical erlang message passing won't solve.
Problem: your example is using shared data and updating of variables. In the message passing paradigm *there is no shared data*. Period. None. In Erlang specifically there isn't updating of variables even within a process. So this would be done in Erlang something like this:
some_process(DataStructure) ->
    break_up_structure(DataStructure, 10000),
    get_new_structure({}, 10000).               % return result of get_new_structure

break_up_structure(_, 0) -> done;               % base case, no processes left
break_up_structure(DataStructure, Processes) -> % otherwise
    RestOfDataStructure = split_and_send(DataStructure),    % cut off a piece and send
    break_up_structure(RestOfDataStructure, Processes - 1). % tail call with new values

get_new_structure(DataStructure, 0) -> DataStructure;      % base case, return what we built
get_new_structure(DataStructure, Processes) ->
    Data = receive Piece -> Piece end,          % pseudo-code for brevity
    NewDataStructure = add_data_to_structure(Data, DataStructure),
    get_new_structure(NewDataStructure, Processes - 1).
The fact that variables are immutable is dealt with in the normal functional programming way of using tail recursion and passing any variables that need "updating" as arguments.
In case the above code isn't clear: The process breaks up the parts of the data structure and farms them out to the different processes, then waits for responses and incrementally assembles them into the new data structure.
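The same break-up/farm-out/reassemble pattern can be sketched in Python, with threads and queues standing in for Erlang processes. The squaring step and all names here are placeholders for the real per-piece computation:

```python
import queue
import threading

def worker(in_q, out_q):
    # Each worker computes its piece independently and sends the result back.
    while True:
        piece = in_q.get()
        if piece is None:                      # sentinel: no more work
            return
        out_q.put([x * x for x in piece])      # placeholder "expensive" step

def process_in_parallel(data, n_workers=4, chunk=250):
    in_q, out_q = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(in_q, out_q))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    # break_up_structure: cut off pieces and send them out
    pieces = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    for p in pieces:
        in_q.put(p)
    for _ in threads:
        in_q.put(None)
    # get_new_structure: receive results and reassemble incrementally
    result = []
    for _ in pieces:
        result.extend(out_q.get())
    for t in threads:
        t.join()
    return sorted(result)                      # arrival order is nondeterministic

print(process_in_parallel(list(range(1000)))[:3])  # [0, 1, 4]
```

As the thread notes below, this only pays off when the per-piece computation outweighs the cost of splitting and reassembling.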
Now, the issue here is obviously: this only makes sense when the processing of the data that was carved out is more expensive than the carving out and reattaching. If the structure is very large that may well not be the case.
In that case I'm not sure how I would handle it, but I look at it like any other performance issue: I would try algorithm changes before I looked at going to a lower level.
Now someone mentioned Software Transactional Memory (STM) so briefly that it would be easy to miss. Is that your solution?
No, if someone else wants to look at this it's ok. I'm a bit concerned about the bookkeeping.
If so you still have other concurrency issues, object versioning issues, plus more to deal with. No solution is a panacea for all problems unless you are an advocate of silver bullet solutions.
There is no such thing, but just as a generational garbage collector is "good enough" in all but the most special cases, I believe message passing will be "good enough" as well.
The problem of editing a large graph of objects with many parallel threads is the generalized case of a nasty and complex set of concurrency and transactional issues. There are many ways to solve this. If you reply to this example I would hope that you do so fully explaining how you'd handle the concurrency and - importantly - the object consistency issues.
Transactional and concurrency issues arise because you are sharing something. If you give one entity alone access to that something and all access must go through him these issues go away. They are traded for new issues, but issues that are much easier to reason about.
Yes, I understand that early tests indicate that Erlang can handle approximately 100,000 or so processes at a time without hiccups while Java can handle about 8,000 or so before blowing up.
Nowhere near 8,000. At least not on any box I've ever seen (or do you have a reference?). The problem is Java's just too fat; on a 32-bit operating system you run out of memory well before 8k processes or threads.
I don't know what the various Smalltalks can handle, but I doubt it's as high as Erlang and is more likely less than even Java - just a guess though. Maybe someone has worked it out.
Actually Smalltalk is not so far from Erlang right now (theoretically; the question mark is the scheduling). Erlang is optimized for this, so the size of each process might be half the size of a Smalltalk one (but I'm not sure of even that), but it's *certainly* much higher than any native process or thread solution can hope to achieve.
That's only because the current crop of operating systems were designed and envisioned when a few hundred processes and threads was considered a lot. Also because native operating system processes take a lot of resources.
It's because of the resources and how the OS deals with them. Keep in mind that a thread can call "detach" and become a running process, so some care has to be taken that space will be available. Of course linux deals with this by not having real threads at all, just processes that have the same memory map as other processes.
Yes, and how would the no sharing be implemented in Smalltalk?
This is what my investigations will reveal. As I alluded to in a previous mail, any immutable data is not a concurrency issue. It doesn't matter who can see it so long as it can't be updated. Mutable data (e.g. objects) is also no issue provided you can guarantee no process can get access to it besides the process that created it.
So that leaves globals, especially classes. Until I get into this I'm not 100% sure how I'll deal with it, but I can't imagine that it's not solvable.
How would you solve the concurrency one million node editing problem above without locking in your utopian threading implementation?
As described above.
What would you do to Smalltalk to make it do this? So far you and the others have been very short on specifics and have just argued that something magical can be done to make concurrency happen without locks.
With current hardware/OSes, there will be locks, but in the VM where they belong. The only structure in Erlang that must be atomic is the message "mailbox"; it's the only place that can be accessed at the same time by multiple processes.
A few papers and web sites have been linked to but no one has written down what they are proposing or what they mean past it can be done.
Well, I'm a Smalltalker. I form a vague idea and then go try to do it. I'll let you know what the specification is when I've implemented it. :) But I have researched into this as far as what exists today and I haven't seen anything I feel is a show stopper, nor anything that will require a change in Smalltalk semantics. It's very possible (even likely) that there's something I've overlooked, but I'll need to get into it to find that out.
I'll grant you that you can see that it can be done. Please illuminate what it is that you see can be done in detail and how you might do it. Thanks.
Is it clearer now? I feel that I have detailed it out twice now (the relevant details anyway).
However, you'll still end up with concurrency control issues and you've got an object version explosion problem occurring as well. How will you control concurrency problems with your simplified system? Is there a succinct description of the way that Erlang does it? Would that apply to Smalltalk?
Can you give an example of one of these issues, so I can explain how I would deal with it? Please note, there is *no data sharing, period* in this paradigm. At least at the language level.
Ok, so there would be 10,000 separate process-object-spaces with the one million nodes being edited and new nodes being created in each of these 10,000 separate spaces. How do you expect to "merge" the results and solve the edits that will inevitably cause "logical data inconsistency" collisions?
By having just one process that owns the data (or lots of processes that own their own piece of it) that all processes must talk to if they wish to make changes.
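One way to picture "a single process owns the data and all changes go through it", sketched in Python. The graph, the edit format, and all names here are hypothetical:

```python
import queue
import threading

def owner(edits, done, graph):
    # The single owner applies every edit serially; no other thread ever
    # touches `graph`, so no user-level locks are needed.
    while True:
        edit = edits.get()
        if edit is None:              # sentinel: editing finished
            done.put(graph)
            return
        node, delta = edit
        graph[node] = graph.get(node, 0) + delta

edits, done = queue.Queue(), queue.Queue()
threading.Thread(target=owner, args=(edits, done, {}), daemon=True).start()

def editor(nodes):
    # Editors never see the graph; they only send edit messages.
    for node in nodes:
        edits.put((node, 1))

workers = [threading.Thread(target=editor, args=(range(1000),)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
edits.put(None)
graph = done.get()
print(graph[0])  # 4: each of the 4 editors incremented node 0 once
```

The trade named in the surrounding discussion is visible here: contention does not vanish, it is funneled into one mailbox that serializes all edits.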
Your simplified concurrency system also dramatically alters the Smalltalk paradigm.
The current paradigm is fine-grained locked/shared state.
So?
So obviously this part of the current paradigm will be altered, and I say it needs to be. Even if we find that certain parallel tasks need the old shared-state method, this shouldn't be provided anywhere most people will find it. The problem is that most people who know how to write concurrent code only know this shared-state model, so if you present multiple options they will all use this, the familiar.
Why? Please provide more than anecdotal or belief-driven comments for this point of view. What are the reasons? What is it that you'd be moving towards?
Because of the reasons I've laid out several times in this thread: 1) it does not scale, 2) it can not be composed, 3) it's incredibly difficult to reason about, 4) it's a low level detail, 5) it ensures encapsulation violation and on and on.
There are plenty of papers out there on this subject; if you are looking for me to go through them all and condense them for you in a summary beyond what I've already done, then I'm afraid that's not going to happen. It's a pretty well known fact that shared-state fine-grained locking *can not scale*.
It's a huge mistake on their part in my humble view.
While it may be easy from the point of view of adapting their image it's a huge mistake. I've had many people comment that that's one of the reasons that Java is better than Smalltalk
If someone thinks that mess that is Java is better than Smalltalk, I already question what useful information they can bring to the table. Java has *some things* better than Smalltalk, sure, but such a statement is an "information smell" or a "taste smell" to say the least.
- it already works with multiple CPU cores. Yes, they have to solve the concurrency problems, but those are NO WORSE than the concurrency problems that already exist within Smalltalk when running with a single native process and multiple Smalltalk Processes (aka green threads). No different. Do you actually get that?
For someone who is so violently against personal attacks, you sure hang over the fence, eh? Just because you're not understanding where I'm coming from doesn't mean these concepts are just beyond me and only you get it.
If you don't then you fail to appreciate that the approach that Cincom is taking isn't going to solve the concurrency problems since - unless they correct me on this - it seems that their direction is to simply have N-instances of their image (in the same memory space or in separate operating system processes) where N would frequently be the same as the number of cores on the computer (or server) in question (although the instances could be more or less as needed).
Which *does* solve it! And conveniently walks right past all the terrible issues that shared-state concurrency programming has. Once again, while Java people are trying to debug issues the Smalltalk guys will already be adding features to the next release.
Each individual image would still have the problems of multi-threading within it IF AND ONLY IF there are multiple threads forked.
Right, so don't do that. :)
This is of course a far cry from the radical concurrency system that is being proposed by the erlangization concurrency proponents.
Actually not so much. Erlang spans actual CPUs by running more images, just like Smalltalk. So only the processes inside the image are different, but even this can be done today with discipline. I would like to remove this need for discipline by making it *impossible* to affect other processes, but so long as you make sure you don't update anything that other processes can see, you could do this kind of message passing today.
There isn't any need for new syntax with the "!" character. Now sure you're using it with a binary message selector "!" but why obfuscate it. I'd recommend using a keyword selector for better clarity. Thanks.
This is just what Erlang uses. I want inter-process sends to stand out clearly.
Not so. You'd have to transmit - in my example above - one million objects to the various images and have them compute and return their results, which would then have to be combined in a manner that leaves the graph of objects in a consistent state with one and a half million objects and 70% more interconnections between them. It is this parallel updating of many parts of the same data graph that will require the concurrency controls.
No. You are describing shared data which doesn't exist. No shared data = no locking needed.
Nothing but you've got to address the concurrency problem that I've mentioned above.
It wasn't a problem with message passing style, but for shared-state concurrency programming.
Are you talking about forking a new operating system process with a copy of the image?
I'm talking about: [ "some code" ] fork
These are object database problems and attempting to split the processing into multiple threads to avoid the "locking" issues does not solve the problem. It just pushes it further away. While it might work for some applications like telephone switching systems it can't generalize to ALL types of problems which could benefit from concurrency solutions.
No it can't, and I don't believe I ever said it did. But garbage collection can't either and we do fine with that as our only option in Squeak. If we need more we step outside the normal bounds, as it should be.
All Object Databases have a couple of rooted objects. Maybe many more than a couple.
Object databases are a whole other can of worms. I don't know how I would deal with it, but I would start by looking at what Mnesia (basically an object db for Erlang) does.
Yes, a variant of the Software Transactional Memory. However, you still have the problems mentioned above.
No, software transactional memory means we update several variables inside an "atomic" block, and if the system notices something changed while these changes were being made, it rolls the block back to what it was before.
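That commit-or-rollback idea can be sketched as a toy in Python (not a real STM; version counters and a single global commit lock stand in for the real machinery, and all names are invented):

```python
import threading

class TVar:
    """A transactional variable: commits are validated against its version."""
    _commit_lock = threading.Lock()   # one global commit point, for simplicity

    def __init__(self, value):
        self.value = value
        self.version = 0

def atomically(transaction, *tvars):
    # Optimistic concurrency: run the transaction on snapshots, then commit
    # only if nothing changed underneath; otherwise roll back and retry.
    while True:
        versions = [tv.version for tv in tvars]
        snapshot = [tv.value for tv in tvars]
        new_values = transaction(*snapshot)
        with TVar._commit_lock:
            if all(tv.version == v for tv, v in zip(tvars, versions)):
                for tv, nv in zip(tvars, new_values):
                    tv.value = nv
                    tv.version += 1
                return new_values
        # a concurrent commit won the race: retry with fresh snapshots

a, b = TVar(100), TVar(0)

def transfer(x, y):
    return x - 10, y + 10   # a pure function of the snapshots

atomically(transfer, a, b)
print(a.value, b.value)  # 90 10
```

The bookkeeping concern raised above is visible even in this toy: every variable carries a version, and every commit re-validates the whole read set.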
I'm talking about a VM optimization to deal with metaclasses.
Having two spaces, old and new space, won't solve the problems mentioned above when you have N processes (threads) running on M-objects in parallel and need to combine the results of the parallel computations.
Old space and new space is purely for dealing with live code updates. Nothing more. I'm not trying to solve any object versioning issues, because I haven't seen any real evidence they will exist.
Many problems have this "split processes off with their chunk of data" and "recombine the results" shape. Many of these problems are simplified - if possible - so that the results can't collide with the issues presented above.
However, we are not talking about those special cases - such as parallel ray tracing algorithms. We are talking about the completely generic cases that occur in general purpose and every day use of code in Smalltalk applications
The only things I can think of that wouldn't work in this model are problems where splitting up and rebuilding a dataset is more expensive than the actual processing. But I think that can usually be solved by design changes.
- such as the massive Smalltalk business database front end applications
which are typical at many corporations today and which utilize many threads to accomplish their parallel tasks in order to speed up the user experience. A real world consequence of this is increased productivity of thousands of users day in and day out at these corporations.
I'm not sure what you're saying here. Apparently these Smalltalk applications aren't doing real multithreading now right (since it's only an option on a few ST implementations)? So how is offering a simple way to achieve concurrency going to make this worse?
Maybe your applications aren't as complex as these, but I don't see the benefits of an Erlang-ONLY approach. I do see the benefit of STM and Erlang approaches in some cases, but why intentionally limit the toolbox to just a few cases? It makes no sense to ignore the harsh reality of concurrency issues by picking a limited set of solutions.
For the reasons mentioned above. Choice isn't the holy grail you seem to think it is. If it was we would all be on the C++ list talking. Funny that we ended up in a language that 1) doesn't allow you to allocate your own memory, 2) forces you to use single inheritance, 3) forces you to use an image instead of files, etc.
I'm comfortable with the simplest thing being what works 90-99% the time and having to work much harder if I need something more.
Jason, I'll describe what I understood about your idea of solution so please correct me if I don't get you right.
<description> The idea you are exploring is about a Smalltalk in which one can send messages to processes, that is, object processes which are instances of something that can't be defined like the processes we know today, neither in the OS nor in the current Smalltalk VMs, but more like the Erlang ones (extremely cheap). Note: for practical use I will disambiguate this new concept by calling them as if they were instances of ObjectInProcess.
So let's imagine now this Smalltalk which, instead of the Object class we know today, has an ObjectProcess base class, and all its subclasses form the kind of hierarchy we have today.
In this Smalltalk, the VM guarantees that any instance can look at anything about any creature of this virtual universe, but nobody can modify it. So there exists a strict respect of encapsulation. The modification of instVars can only happen if made by the objectProcess instance itself, and the VM makes it impossible for it to be modified in any other way.
So anObjectProcess will listen to messages, but for the rest of the creatures of this image it has read-only instVars. If any other objectProcess wants to say something to it that demands a modification of some instVar, that modification is again guaranteed by the VM to be made only by the object itself. So creatures in this virtual universe can pressure or kindly ask any other objectProcess creature to make a change, but the reality is that the universe strictly guarantees that the change can be made only by the object itself. </description>
<analogicCuriousObservation> This is starting to sound strangely familiar to me: anybody can try to convince me about how things are, etc., but as owner of my "hardware" I'm the only creature that can change my synapses. </analogicCuriousObservation>
So, in a previous example of yours, any instance of any object (that reaches it) can do:
(Processes at: 'bank account') addUSD: 5000
and the correct instance, in its own process, modifies the instVar.
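A sketch of that example in Python, with a registry of named mailboxes playing the role of (Processes at: 'bank account'). Everything here is an invented illustration, not a proposed API:

```python
import queue
import threading

Processes = {}  # hypothetical registry: name -> mailbox

def spawn(name, initial_state, handler):
    # Start a process whose state only its own thread can rebind.
    mailbox = queue.Queue()
    def run(state):
        while True:
            selector, args, reply = mailbox.get()
            state = handler(state, selector, args, reply)
    threading.Thread(target=run, args=(initial_state,), daemon=True).start()
    Processes[name] = mailbox

def send(name, selector, *args):
    # The Python analogue of: (Processes at: name) selector: args
    reply = queue.Queue()
    Processes[name].put((selector, args, reply))
    return reply.get()

def account_handler(balance, selector, args, reply):
    # Only this process ever changes the balance "instVar".
    if selector == 'addUSD':
        balance += args[0]
    reply.put(balance)
    return balance

spawn('bank account', 0, account_handler)
send('bank account', 'addUSD', 5000)
print(send('bank account', 'addUSD', 0))  # 5000
```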
OK... here goes what I think:
What I think is that messages should not be made special between processes. That smells bad. They should be normal and homogeneous. What should be refactored is the Object concept, refining its definition (in this new Smalltalk version) to an object that lives in a process. We don't need to change the word, but we do need to refine the definition, and of course the conceptual interpretation, of what a Smalltalk object is!
We always talk about Smalltalk being a space of live objects and having a kind of anthropomorphic philosophy. Sorry, this is strong, but maybe this hardware shift is telling us that the time has come to ask ourselves whether we need a LiveObject class instead of Object, to keep having the most complete and minimalist paradigm in the industry. I'm not proposing the name formally; it's just to illustrate the value of time affecting instances. Of course it is not alive; the name is also to illustrate emphatically the evolutionary timeline of any instance: the things it will experience once it starts to exist in the image. That evolution in time is what we call a "Process" (not today in this industry, but maybe soon).
IMHO the Object class, in the way we know it today, does not contemplate what Alan claimed at OOPSLA '97 about keeping in mind that the process part is of such importance. The very existence of an instance depends (inherently) on the experience it will undergo in time. And today we have no holder for that.
Besides, the very nature of objects (now I mean in real life) is that they are not frozen in time. Even a piece of ice has strong molecular activity that we can interpret as __molecules in processes__ inside.
Refining that concept to allow Object to become more like this LiveObject, ProcessObject or ObjectInProcess (whatever; I don't care about names now, and I prefer to keep calling it Object) will mean that we care about creating the possibility of seriously modeling this reality of the experience in time that objects undergo.
If we use Smalltalk because its nature is to be heuristic, that is, familiar with respect to reality, then maybe it's time to stop keeping the process concept at such a low priority from the design point of view, and to equalize, again at the design level, the importance of the experience of objects in time.
What we have today is that the Smalltalk VMs make objects be supported by other objects running in VM threads, but this idea is about making objects exist __in__ a process. A process of their own. All of them.
So in this imaginary Smalltalk a process and an object are indissociable; they don't exist as separate entities.
In a hypothetical simulation of the dynamics of one of these brain tides, every MessageOfNeuron object (there may be hundreds of millions) runs chemically in a process that advances on its own through someAxon and someDendrites of someNeurons.
So this simulation will run more efficiently in this Smalltalk, modeled as objects and modeled more closely to reality, and every MessageOfNeuron instance will run balanced across N cores.
Wow! I'll stop here for now, but this is as radical as it is interesting.
I'll appreciate criticism of all of this. Any kind of it; I don't care, this idea seems too important to me.
Now a technical question:
As said before, a Squeak image easily has 800k instances. Erlang was claimed to handle about 100k processes without problems (I don't know on which hardware). Of course a proof of concept should start small, but I'm worried that we would still need to run about 800k VM processes to achieve the goal. I wonder how that will not be a problem?
Jason, please keep us informed about any progress,
All the best,
Sebastian
On 23/10/2007, Jason Johnson jason.johnson.081@gmail.com wrote:
This has a perspective only if you have unlimited memory resources and zero-cost memory allocation. Let's look at this more precisely. I will write only in ST (I don't know Erlang), and assuming that I understood your concept well, take the following ST code:
SomeClass>>setVars
    self setVar1: value1.
    self setVar2: value2.
    ...
    ^ self
Here at each message send, instead of writing to the receiver's memory, we do copy-on-write cloning. So 'self setVar1: value1' will return a modified copy, self'. To keep things semantically correct, we then substitute self in 'self setVar2: value2' with the copy just received, and so on; at the end, by returning self, we substitute it with self'''''''. So each time we modify an object we get a modified copy instead of modifying the original. Now think about the costs: memory allocation, and orders of magnitude more garbage generated. Even if we assume that each process has its own private memory region, it still must be located somewhere in physical memory. And as you may know, physical memory is shared among all cores, so your 'topmost' memory manager has no choice but to use the locking you so dislike to deal with concurrent requests for resources. As you can see from this example, in this model the memory manager quickly becomes the great bottleneck, because of the orders of magnitude higher memory consumption.
And now consider the alternative: even with a dumb lock-write-unlock we can waste far fewer cycles, because in your 'non-locking' model the main load is just producing tons of garbage by cloning objects over and over.
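Igor's cloning objection can be made concrete with a tiny Python sketch: treating every setter as copy-on-write means one logical update allocates a fresh object each time (here namedtuple's `_replace` stands in for the hypothetical cloning setters):

```python
from collections import namedtuple

# An immutable "object": every setter allocates a whole new instance.
Point = namedtuple('Point', ['x', 'y'])

p0 = Point(0, 0)
p1 = p0._replace(x=1)   # self'  : a fresh allocation; p0 becomes garbage
p2 = p1._replace(y=2)   # self'' : another allocation for one logical update
assert p0 is not p1 and p1 is not p2
print(p2)  # Point(x=1, y=2)
```

Three objects were allocated to arrive at one logical point; a chain of n setters produces n-1 intermediate objects for the collector.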
But I don't understand why you want to do that, if you can just make the process of the receiver update its own memory when it receives that message, so that (1) the assignment has already happened in one place and (2) no copy is needed at all.
Cheers,
Sebastian
-- Best regards, Igor Stasenko AKA sig.
On 24/10/2007, Sebastian Sastre ssastre@seaswork.com wrote:
Because a receiver is an object which is passed as a parameter to a method. We can't modify it; that's why. We can't know where it was passed from, and we can't know whether it is used in another parallel process, so the best we can do is clone it. And given the above, you are forced to do the same for _any_ method, so you'll go on cloning and cloning even when sending 'self setVar: ...'.
Cheers,
Sebastian
we do copy-on-write cloning.
But I don't understand why you want to do that, if you can just have the receiver's process update its own memory when it receives that message, so that (1) the value gets assigned and (2) no copy is needed at all.
Because the receiver is an object passed as a parameter to the method. We can't modify it - that's why: we can't assume where it was passed from, and can't assume it isn't being used by another parallel process, so the best we can do is clone. And given the above, you are forced to do the same for _any_ method, so you go on cloning and cloning even when sending self setVar: ..
Well, I see your point. But let me clarify that I'm talking about something that does not suffer from that problem at all.
If we can set that problem aside for a minute, I can try to show you what I see. So here I go:
I don't know if you saw the reference I cited to Alan Kay's OOPSLA '97 presentation. It's about an hour and a half or less in duration, and it's reachable on YouTube.
There, at some point, Alan recalls someone saying that every host must be able to have a valid IP on the internet, and he states that every object should be able to have a valid IP. At first that statement is shocking because it is too radical, and we don't have the resources for that yet. But no matter how radical it is, it is still a great idea.
I saw it as simply scaling the message-passing-between-objects paradigm from the image level to the internet level (which is of course massive).
We have no technology to make use of something like that, and maybe it is not important to attempt it today or in the next five years. But that unhappy fact, forced on us by today's lack of resources, does not make the idea any less good.
So we can decide to prepare ourselves for the moment when hardware and the industry bring that reality closer. Could that be in the next 10? 20? 30 years? Nobody knows, but we all know that this industry is very accelerated and that we are inventing the future.
Now I'm trying to think <metaphor>with the same software Alan used to state that phrase</metaphor> but in another domain: the domain of processes, and with something that today is closer to us: multicore technology.
So I'm stating here that in a Smalltalk image of the future *every object should have a process*. Every instance. All of them.
We could also interpret the phenomenon of the Erlang VM successfully managing 100k processes messaging each other as a proof of concept that hardware is becoming a *less worst* creature than in the past. From there we can start to take seriously the idea of hardware managing a number of processes of the same order of magnitude as the number of instances in a Smalltalk image. That makes a 1:1 mapping of processes to instances feasible.
Having said that, I return to the problem you stated about the need to copy, copy, copy: this premise changes things, and you don't need to copy anymore, because a VM like that guarantees that, no matter who or when, whenever an instVar of an object is to be modified, the write will be made by the process that corresponds to that instance.
This idea redefines what we understand today as anObject by *coupling* it to aProcess. In this hypothetical Smalltalk, anObject can't live without a process. They are indissociable.
So.. in this hypothetical smalltalk:
- we can suppose that every object lives in a process
- we can suppose that nobody but the very owner of an instVar can write that instVar
- we can suppose that no other process but the one of that instance will write that piece of RAM
- we can suppose that everything is an object
- we can suppose that all the instance-processes can be freely balanced across cores
Besides:
- we have no need to pollute the syntax or the Smalltalk rules
- we are not introducing singularities into the paradigm
- we do consume more resources, but in exchange we gain unprecedented scalability
- we keep the heuristics, completeness and simplicity that define Smalltalk
- we take a step forward in the anthropomorphism that keeps Smalltalk concepts familiar to people
- last but not least: we take advantage of multicore CPUs transparently
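One way to read the invariants above is as an actor-style discipline: each instance owns a mailbox, and a single worker is the only code allowed to touch its state. A minimal sketch in Python (ActorObject, send, and the thread-per-instance layout are illustrative, not a proposal for the Squeak VM):

```python
import queue
import threading

class ActorObject:
    """Toy 'one process per instance': the only code that ever writes
    this object's state runs on the object's own private thread."""
    def __init__(self):
        self._state = {}
        self._inbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            selector, args, reply = self._inbox.get()
            if selector is None:          # shutdown sentinel
                return
            reply.put(getattr(self, selector)(*args))  # owner thread only

    def send(self, selector, *args):
        """A 'message send': enqueue the message and wait for the answer."""
        reply = queue.Queue()
        self._inbox.put((selector, args, reply))
        return reply.get()

    def stop(self):
        self._inbox.put((None, (), None))

    # the 'methods' below are only ever executed by self._thread
    def set_var(self, name, value):
        self._state[name] = value
        return value

    def get_var(self, name):
        return self._state.get(name)

point = ActorObject()
point.send('set_var', 'x', 5)
assert point.send('get_var', 'x') == 5   # writes serialized, no lock, no copy
point.stop()
```

Because only the owner thread ever touches _state, no other synchronization is needed, regardless of which core that thread is scheduled on.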
Take a minute to think about its consequences. This supposition - the assumption that anObject is indissociable from aProcess, objects being 1:1 with processes - makes all the difference and dramatically simplifies everything. And we know that simplification improves scalability.
I am, of course, being extremely speculative in exploring this idea together with you all. But thinking is cheap :). In fact I don't even buy it myself yet. The problem is that I'm honestly unable to refute myself on the convenience of this path :) so it grows stronger.
I hope you and others understand why I'm starting to think that this is a powerful idea.
all the best,
Sebastian Sastre
PS: Sorry for the length. I tried to express this in my previous post; I'm trying to be didactic and illustrative.
Cheers,
Sebastian
...
-- Best regards, Igor Stasenko AKA sig.
Sebastian, you can already envision any unique object in the VM as a unique process. No changes to the VM are required. Your concept has zero worth for me, because the VM already ensures that each object has its own encapsulated state and that you can change an object's state only by sending messages to it. So we might already say that all objects are living, and can be represented as processes triggered by sending message(s) to them.
-----Original Message----- From: squeak-dev-bounces@lists.squeakfoundation.org [mailto:squeak-dev-bounces@lists.squeakfoundation.org] On behalf of Igor Stasenko Sent: Wednesday, October 24, 2007 17:32 To: The general-purpose Squeak developers list Subject: Re: Multy-core CPUs
Sebastian, you can already envision any unique object in the VM as a unique process. No changes to the VM are required. Your concept has zero worth for me, because the VM already ensures that each object has its own encapsulated state and that you can change an object's state only by sending messages to it. So we might already say that all objects are living, and can be represented as processes triggered by sending message(s) to them.
It seems to me that you are very near to getting the idea. You have described what we have now, but you seem to be missing the process part. I'm saying that an object has a double nature: it is an object as we know it today, but it lives not statically (as in a photograph) but in a process. And I'm stating that this relation exists by nature; it is the result of coupling the concept of an object with the concept of a process. Now we can cleverly clamp the process part of it and somehow invent a support layer that maps this process-nature of the object onto some process running on some core of this incoming hardware.
I think it was with Spoon - I can't recall exactly now, but someone in this group made a visual map of Smalltalk memory with its instances. Just to get an idea, try to visualize an instance. It is held somewhere in RAM, right? Metaphorically it has one "foot" planted firmly in RAM. Currently any VM process can tell it to modify itself, and it writes to its piece of RAM. OK, now the idea I'm exploring and trying to communicate here is that objects should (metaphorically) have "two feet": one in RAM, and the other in a process running on some core. And that VM *will guarantee* that every instance has its piece of RAM as usual, plus a process that is guaranteed to be the only one writing to that piece of RAM: the process that *is* the process part of the instance's double nature.
As someone has already properly stated: no shared memory, no concurrency problems.
Now, a good question to answer regarding this hypothetical Smalltalk is: how do we translate this conceptual model into a bit-based model that fits our current (and incoming) hardware?
I bet that the VM will have to be modified.
As you properly said, the current VM guarantees that the state of an object can only be changed by sending it some message. What it does not guarantee right now is which process of the VM will issue the instructions that do the write. Today it could be any VM process. And <metaphor>that's how we purchased the concurrency problem</metaphor> we are trying to solve now.
But once again: what I'm describing is a VM that guarantees which process writes a given piece of RAM: the only one assigned to the piece of RAM belonging to that instance.
Why did we "purchase" that concurrency problem? Because of a trade-off that prioritized pragmatism given the resources available at the time. We polluted the conceptual domain with implementation matters (the need to make N processes share the same RAM for reads and writes) in order to get what we have today with the hardware we have today. But ideas like this one, on better hardware, may de-pollute it.
See: mathematics is nothing but a tool for modeling. All models are rudimentary simplifications of some system. Modeling reality with boolean algebra may be theoretically possible, but it is extremely inhuman. Programming in assembler is a palliative that lets us walk that path with less "brain damage".
Smalltalk is also a tool for modeling. The difference lies in its nature: it is heuristic by design. It is intellectually ergonomic. It respects the way we humans form thoughts and concepts - maturing old ones by refinement and exploring new ones by prototyping - and relegates to a secondary place how machines need (for this couple of decades?) things to be done so that they obey.
So Smalltalk pays the price of being less efficient (than C, etc.) to give you the freedom of mapping what you see in reality directly onto a computer. It frees you from having to map it onto mathematics only to reify it later, even more polluted (one order of magnitude of distortion in the modeling).
That way you can model while keeping the machinery madness to a minimum, gaining the chance to build a less polluted virtual model by mapping the model your brain quickly makes 1:1 onto the virtual model computers need. It breaks the tendency to model things booleanly, mathematically, even relationally. It gives you a tool that can take your concept from timid, fragile and embryonic to rock solid and ready for production: that tool is the virtual object, an instance.
Current hardware is based on mathematics - boolean mathematics. So in that trade-off we had no option but to take the boolean path in order to use hardware to build a Smalltalk, opening computers to a bigger space of solutions, one closer to humans.
Maybe hardware is reaching a point where, excuse my French, it sucks less, and we can step back from that old, absolutely understandable trade-off and regain the conceptual refinement we always needed in this system. Something that maybe was seen back then, or is perhaps being seen now because it is time to reach a new degree of subtlety in the cogitation of this clever artifice we know as Smalltalk.
Returning to planet Earth now: in the idea I'm exploring here, the VM of this hypothetical Smalltalk also guarantees that the process that issues the instructions to write to memory is the one that belongs to the instance [1]. That way you never get inconsistent state, nor contention on writes, because you never shared anything [2].
That VM should create the instance processes in such a fashion that it does not matter which core each one runs on - so they can be balanced - nor which part of RAM each is assigned. Once assigned, that RAM is for that process and that process only. It will be written by that process and only by that process.
I care about getting this message across, so I ask you kindly: do you see the value now?
cheers,
Sebastian
[1] Does the process belong to the instance, or the instance to the process? A Moebius thing here? It's reasonable, because I have somehow fused the two concepts. [2] The brain analogy makes sense because you don't share your brain's memory. You "serialize" your thoughts into written text or spoken words, but you never, ever share RAM - which, at the hardware level, is your synapses.
On 10/24/07, Igor Stasenko siguctua@gmail.com wrote:
Sebastian, you can already envision any unique object in the VM as a unique process. No changes to the VM are required. Your concept has zero worth for me, because the VM already ensures that each object has its own encapsulated state and that you can change an object's state only by sending messages to it. So we might already say that all objects are living, and can be represented as processes triggered by sending message(s) to them.
But not exactly. The difference is the thread of execution. Today the thread of execution belongs to processes, not objects. So if two processes call a method on the same object at the same time: race condition. If the thread of execution belonged to the objects themselves, this couldn't happen; the different requests would have to wait in line.
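The race described here is easy to reproduce: when any process may execute a method on a shared object, an unsynchronized read-modify-write loses updates. A small Python sketch (SharedPoint is a made-up class; the lock shows what making requests "wait in line" buys):

```python
import threading

class SharedPoint:
    """Today's model: any process may run a method on the same object."""
    def __init__(self):
        self.x = 0
        self._lock = threading.Lock()

    def unsafe_inc(self):
        v = self.x        # two processes can both read the same value here...
        self.x = v + 1    # ...and both write v + 1: one update is lost

    def safe_inc(self):
        with self._lock:  # serializing the writes removes the race
            self.x += 1

pt = SharedPoint()
workers = [
    threading.Thread(target=lambda: [pt.safe_inc() for _ in range(10000)])
    for _ in range(4)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
assert pt.x == 40000      # no update lost once the writes wait in line
```

With unsafe_inc instead, the final count can come up short, and by a different amount on every run, which is what makes such bugs so hard to diagnose.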
Quoting Sebastian Sastre ssastre@seaswork.com:
... Take a minute to think about its consequences. This supposition - the assumption that anObject is indissociable from aProcess, objects being 1:1 with processes - makes all the difference and dramatically simplifies everything. And we know that simplification improves scalability.
I am, of course, being extremely speculative in exploring this idea together with you all. But thinking is cheap :). In fact I don't even buy it myself yet. The problem is that I'm honestly unable to refute myself on the convenience of this path :) so it grows stronger.
I hope you and others understand why I'm starting to think that this is a powerful idea.
all the best,
Sebastian Sastre
PS: Sorry for the length. I tried to express this in my previous post; I'm trying to be didactic and illustrative.
I was always impressed by an idea of the Self implementers. They chose a structure that is inherently inefficient to implement, namely that every object has its own named slots.
However, the smart idea was that, behind the scenes they used implementation 'tricks' to mitigate the efficiency hit, while maintaining the model at the top level. For example, they defined "maps to transparently group objects cloned from the same prototype, providing data type information and eliminating the apparent space overhead for prototype-based systems."
Similarly, we could accept the object-process model, and then explore ways to make this efficient behind the scenes. It seems to me that on a uniprocessor, the 'behind the scenes' would look much like Smalltalk today. But as multi-core and distributed computing becomes more prevalent we would gain the benefit. Certainly we could explore the realities and consequences of this model.
David
-----Original Message----- From: squeak-dev-bounces@lists.squeakfoundation.org [mailto:squeak-dev-bounces@lists.squeakfoundation.org] On behalf of dpharris@telus.net Sent: Wednesday, October 24, 2007 17:53 To: The general-purpose Squeak developers list Subject: RE: Multy-core CPUs
Because they prioritized the conceptual model, "working around" the machines, to provide an acceptable overall user experience. Machines are to be used by persons; they matter only for materializing the consequences of their use. They should be meant to improve the quality of life of persons and/or the environment. With a personal value like that deeply affecting all your thought, it's obvious that systems have to be built prioritizing intellectual ergonomics and heuristic principles.
The case you cite, David, about Self - a system whose designers managed to prioritize the conceptual model over the boolean one - is a rare and precious coherent sample of what I'm saying.
If we have programmed computers for some time, *we know* there will always be ways to optimize how things are done. So if a system like this "one process per instance" Smalltalk turns out to consume more computer resources than others, I say "I don't care", because that's the machine's work. Hardware is becoming less worst, and RAM and cores are getting cheaper very fast. I feel no empathy for machines; I use them to extract value for me and for other persons. So let's do clever things to mitigate the conceptual-boolean impedance mismatch instead of endlessly discussing the machinery madness that pollutes our ideas. As David says, the Self people showed us by example that mitigation is possible in hard cases. Smalltalk is a splendid example too.
But this idea is about refining the conceptual model, rising one step - stepping forward in a direction that will bring us unprecedented scalability without compromising the simplicity we have in Smalltalk today. I'm not saying it's the only path, but I think it deserves more exploration, because it seems to me a path to success in a world where persons increasingly need to model complexity using machines. And on this path, I think Smalltalk has a chance to be vanguardist. Again.
All the best,
Sebastian
PS1: Some people claim I'm being too philosophical; maybe I am, because it is necessary to gestate a solid foundation to build things on. But nothing is wrong with that claim - let's embrace pragmatism too. Use the list; express yourself about this. Your experience or opinions may be important to others or to continuing the cogitation.
PS2: I have also read Peter's posts; I think he argues very well and seems to have very valuable experience. Peter, what do you think about this one-process-per-instance Smalltalk?
Quoting Sebastian Sastre ssastre@seaswork.com:
... what do you think about this One Process Per Instance Smalltalk?
I'll answer with an important aspect that just came to my mind: persistence.
Question: did we win a persistence solution for the same price as making a one-process-per-instance Smalltalk?
Well, let's see... "one crisis at a time".
I have decided to interpret all the database madness as the industry trying to treat shared writes as normal. And I interpret the shared write not as normal, but as a trade-off the industry made to win some usability at the price of "buying the concurrency problem" for us all.
With one process per instance we have put the shared write aside, and so closed the door on its whole big space of problems.
I speculate about how an image of this Smalltalk would start up on a host with the million objects of the intensely dense graph Peter gave as an example.
Of course a million processes can't reasonably run on any of our laptops today, so we are forced to think differently from how a current image starts up. Perhaps we can manage that million by taking, say, 4K light processes at a time on a 4-core CPU with 4 GB of RAM.
But how would this work? Unlike a current Smalltalk image, which reifies all instances in RAM at startup, this one would reify objects on demand (lazily), in swarms of N thousand instances at a time, with N depending trivially and directly on the hardware resources.
And what happens when a demand to reach an instance occurs? If it has not been reified before, the VM assigns that instance a process and a little portion of RAM, which stay stuck to it for as long as it is reified and not garbage collected. Note that, as I said, only that process is allowed to write. Once an instance is reified - once it has its process and its portion of RAM - it can receive the message.
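The lazy reification described here can be sketched as a proxy that faults the instance in on first message send. A Python toy (LazyRef, the store and the loader are illustrative stand-ins for the VM machinery, not anything from Squeak):

```python
class LazyRef:
    """Toy on-demand reification: the instance is only 'faulted in' the
    first time a message reaches it."""
    def __init__(self, oid, loader):
        self._oid = oid
        self._loader = loader   # stands in for locating the instance on disk
        self._target = None

    def send(self, selector, *args):
        if self._target is None:          # first touch: reify now
            self._target = self._loader(self._oid)
        return getattr(self._target, selector)(*args)

# a made-up object store and loader
store = {42: {'name': 'point'}}
loaded = []

def load(oid):
    loaded.append(oid)
    return store[oid]

ref = LazyRef(42, load)
assert loaded == []                       # nothing reified at startup
assert ref.send('get', 'name') == 'point'
assert loaded == [42]                     # reified once, on first demand
ref.send('get', 'name')
assert loaded == [42]                     # and never again
```

An image full of such references could start instantly and pull instances in (and let them be collected out) in swarms sized to the available RAM and cores.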
And what happens when you save an image like that? The VM has assigned that portion of RAM to that process; the system just gives that piece of RAM a different support (medium). It trivially maps RAM to disk. No impedance mismatch: just the instance format plus a way to locate it when needed again.
So? Well, that way every piece of RAM dedicated to an instance has a homologous dedicated piece of disk.
And? And that piece of disk can only be written by the process of its own instance, providing a guarantee that it will never, ever be inconsistent.
Oh... so we won a persistence solution for the same price as making a one-process-per-instance Smalltalk? I have serious doubts about tolerance of power failures and the like, but putting that on a second plane, I'm starting to think that yes, we did, because a commit would be a mere flush to that disk or medium, and you never need to roll anything back: you either write or you don't.
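The "homologous piece of disk" idea can be sketched as a write-through cell whose single owner flushes on every write, so a commit is just that flush. PersistentCell and the JSON format are assumptions for illustration, not the proposed instance format:

```python
import json
import os
import tempfile

class PersistentCell:
    """Toy version of 'each instance owns a homologous piece of disk':
    a single writer flushes its file on every write, so a 'commit' is
    just that flush and there is nothing to roll back."""
    def __init__(self, path):
        self._path = path
        self._state = {}
        if os.path.exists(path):           # reify from the instance's file
            with open(path) as f:
                self._state = json.load(f)

    def set_var(self, name, value):
        self._state[name] = value
        with open(self._path, 'w') as f:   # the 'commit': write and flush
            json.dump(self._state, f)
            f.flush()
            os.fsync(f.fileno())

path = os.path.join(tempfile.mkdtemp(), 'cell.json')
a = PersistentCell(path)
a.set_var('x', 7)
b = PersistentCell(path)   # a later reification of the same instance
assert b._state == {'x': 7}
```

Note the caveat in the text still stands: a crash between the write and the fsync can leave a torn file, which is exactly the power-failure doubt raised above.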
Intriguing... don't you think?
Sebastian Sastre
On 10/24/07, Sebastian Sastre ssastre@seaswork.com wrote:
So I'm stating here that in a smalltalk image of the future *every object should have a process*. Every instance. All of them.
That is an interesting idea. It would open a door to a new way of doing garbage collection, because collection could then be tied to the exit of a process.
Said that I return to the problem you stated about the need of copy copy copy, saying that this premise changes things and you don't need to copy anymore because a VM like that, no matter who or when, an instVar of an object is to be modified it will provide you of guarantee that the write will be made by the process that corresponds to that instance.
Yes, in such a system, you don't need to copy because all that gets passed around are references to processes.
Jason Johnson wrote:
hi,
What? That just won't work. Think of the memory overhead.
Tying an object instance to a particular process makes no sense. If you did that, you'd likely end up with just as many deadlocks and other concurrency problems, since message sends to the object would now be queued on the process's input queue. Since a process can only handle one message at a time, deadlocks can occur - plus all kinds of nasty problems arising from the order of messages in the queue. (There is a similar nasty problem with GUI event processing in VisualAge Smalltalk that leads to very difficult-to-diagnose and hard-to-comprehend concurrency problems.) It's a rat's maze that's best avoided.
Besides, in some cases an object with multiple threads could respond to many messages - literally - at the same time, given multiple cores. Why slow the system down by putting all the messages into a single queue when you don't have to!?
Tying an object's lifetime to the lifetime of a process doesn't make sense either, since there could be references to the object all over the place. If the process quits, the object should still be alive IF there are still references to it.
You'd also need to pass around more than references to processes, for if a process had more than one object you'd not get the resolution you need. No, passing object references around is way better.
Even if you considered an object as having its own "logical" process, you'd run into the queuing problems hinted at above.
Besides, objects in Smalltalk are really fine-grained. The notion that each object would have its own thread would require so much thread switching that no current processor could handle it. It would also be a huge waste of resources.
Again, one solution does not fit all problems - if it did programming would be easier.
All the best,
Peter
Well ok, that's a pretty good breakdown of the problems with such a thing. Though it wasn't my idea and I'm not invested in it.
Of course, the one thing I have to disagree with in your message is the comment about an object responding to multiple messages. I wouldn't want that. If you want an object to respond to more messages concurrently, make more of them.
On 10/25/07, Peter William Lount peter@smalltalk.org wrote:
Jason Johnson wrote: On 10/24/07, Sebastian Sastre ssastre@seaswork.com wrote:
So I'm stating here that in a Smalltalk image of the future *every object should have a process*. Every instance. All of them.
Jason Johnson wrote:
Of course, the one thing I have to disagree with in your message is the comment about an object responding to multiple messages. I wouldn't want that. If you want an object to respond to more messages concurrently, make more of them.
Hi,
If you make more of them then they aren't the same object! That could have wide implications for an object graph. Identity is an important concept, especially in large object graphs.
Also, if you do make more than one of the object, you have to synchronize any changes between its copies - unless of course it's the simple case where the objects are read-only.
Simplification can get one in trouble when it comes to concurrency, distribution and other areas of computing. Part of that trouble is thinking that one's solution is general when it just covers a subset of the desired solution space. This is why discussion is a good thing, as it tends to reveal the blind spots that one has. This goes for me too!
Thanks, by the way, for the correction about Erlang being able to "message" more than simple data types across its wire. I'd forgotten that part.
All the best,
Peter
hi,
What? That just won't work. Think of the memory overhead.
I don't give credit to unfounded apriorisms. I think it deserves to be proved that it does not work. Anyway, let's just assume that it may be too much for state-of-the-art hardware in common computers in the year 2007. What about in 2009? What about in 2012? Remember the attitude you're taking now on the first day of 2012.
Tying an object instance to a particular process makes no sense. If you did that you'd likely end up with just as many deadlocks and other concurrency problems, since you'd now have message sends to the object being queued up on the process's input queue. Since a process could only handle one message at a time, deadlocks can occur - plus all kinds of nasty problems resulting from the order of messages in the queue. (There is a similar nasty problem with the GUI event processing in VisualAge Smalltalk that leads to very difficult-to-diagnose concurrency problems.) It's a rat's maze that's best avoided.
Besides, in some cases an object with multiple threads could respond to many messages - literally - at the same time given multiple cores. Why slow down the system by putting all the messages into a single queue when you don't have to!?
You didn't understand the model I'm talking about. There isn't such a thing as an object with multiple threads. That does not exist in this model. There exists one process per instance, no more, no less. I think you're thinking about processes and threads the same way you know them today. Let's see if this helps you get the idea. Disambiguation: in this model I'm talking about a process not as an OS process but as a lightweight VM process - what we also call threads. So I'm saying that in this model you have only one process per instance, but that process is not a process that can have threads belonging to it. That generates a hell of complexity. The process I'm saying is tied to an instance is closer to the dictionary meaning of the word "process", plus what you know an instance is, with the process implemented by a VM that can balance it across cores.
I'm not falling into the pitfall of trying to parallelize code automagically. This is far from it. In fact I think this is better than that illusion. Every message is guaranteed by the VM to reach its destination, in guaranteed order. Otherwise it would be chaos. And we want an ordered chaos, like the one we have now in a Squeak reified image.
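A per-instance process with guaranteed in-order delivery is essentially an actor mailbox. The following is a minimal, illustrative sketch of that idea in Python (it assumes nothing about Squeak's actual VM internals, and the class and selector names are invented): each object owns a FIFO mailbox and a single worker loop, so every write to its state happens on its own process, in arrival order, and only references need to be passed around.

```python
import queue
import threading

class ActorObject:
    """One object = one process: all state changes run on its own worker."""
    def __init__(self):
        self._mailbox = queue.Queue()   # FIFO: delivery order is guaranteed
        self._state = []
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def send(self, selector, *args):
        # Any process may enqueue a message; none may touch _state directly.
        self._mailbox.put((selector, args))

    def _run(self):
        while True:
            selector, args = self._mailbox.get()
            if selector == "stop":
                break
            getattr(self, selector)(*args)  # executed only by this process

    def append(self, value):                # stands in for an instVar write
        self._state.append(value)

obj = ActorObject()
for i in range(3):
    obj.send("append", i)
obj.send("stop")
obj._worker.join()
print(obj._state)  # [0, 1, 2] -- writes applied in send order
```

Note the trade-off the rest of this thread debates: the model removes data races on `_state` by construction, at the cost of one thread and one queue per instance.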
Having clarified that, I ask: why do you think there could be deadlocks? And what other kinds of concurrency problems do you think this model will suffer?
Tying an object's life time to the lifetime of a process doesn't make sense since there could be references to the object all over the place. If the process quits the object should still be alive IF there are still references to it.
You'd need to pass around more than references to processes. For if a process has more than one object you'd not get the resolution you'd need. No, passing object references around is way better.
Yes, of course there will be. In this system a process termination is one of two things: A) that instance is being reclaimed in a garbage collection, or B) that instance has been written to disk in a kind of hibernation that can be reified again on demand. Please refer to my previous post with subject "One Process Per Instance.." where I talk more about exactly this.
Even if you considered an object as having it's own "logical" process you'd get into the queuing problems hinted at above.
Which I don't see, and I ask your help to understand if you still find them after the clarifications made about the model.
Besides objects in Smalltalk are really fine grained. The notion that each object would have it's own thread would require so much thread switching that no current processor could handle that. It would also be a huge waste of resources.
And what do you think was coming out of the mouths of the critics of initiatives like the one the Xerox PARC team undertook in the 1970s, making a Smalltalk at the CPU and RAM prices of that time? That VMs are a smart, efficient use of resources?
So I copy-paste myself: "I don't give credit to unfounded apriorisms. It deserves to be proven that it does not work. Anyway, let's just assume that it may be too much for state-of-the-art hardware in common computers in the year 2007. What about in 2009? What about in 2012?"
Again, one solution does not fit all problems - if it did programming would be easier.
But programming should be easier. Smalltalk made it easier in a lot of aspects. Listen... I'm not a naive silver-bullet purchaser nor a faithful person. I'm a critical Smalltalker who thinks he gets the point of OOP and tries to find solutions to surpass the multicore crisis by getting an empowered system, not consoling himself with a weaker one. Peter, please try to forget about how systems are made and think about how you want to make them.
cheers,
Sebastian
All the best,
Peter
Hi,
Sebastian Sastre wrote:
hi,
What? That just won't work. Think of the memory overhead.
I don't give credit to unfounded apriorisms. I think it deserves to be proved that it does not work. Anyway, let's just assume that it may be too much for state-of-the-art hardware in common computers in the year 2007. What about in 2009? What about in 2012? Remember the attitude you're taking now on the first day of 2012.
It's not an unfounded apriorism as you put it.
Current hardware and technology expected in the next ten years isn't optimized for N hundred thousand or N million threads of execution. Maybe in the future that will be the case.
The Tile-64 processor is expected to grow to about 4096 processors by pushing the limits of technology beyond what they are today. To reach the levels you are talking about for a current Smalltalk image with millions of objects each having their own thread (or process) isn't going to happen anytime soon.
I work with real hardware.
I am open and willing to be pleasantly surprised however.
Tying an object instance to a particular process makes no sense. If you did that you'd likely end up with just as many deadlocks and other concurrency problems, since you'd now have message sends to the object being queued up on the process's input queue. Since a process could only handle one message at a time, deadlocks can occur - plus all kinds of nasty problems resulting from the order of messages in the queue. (There is a similar nasty problem with the GUI event processing in VisualAge Smalltalk that leads to very difficult-to-diagnose concurrency problems.) It's a rat's maze that's best avoided.
Besides, in some cases an object with multiple threads could respond to many messages - literally - at the same time given multiple cores. Why slow down the system by putting all the messages into a single queue when you don't have to!? You didn't understand the model I'm talking about.
That is likely the case.
There isn't such a thing as an object with multiple threads. That does not exist in this model.
Ok. I got that.
There exists one process per instance, no more, no less.
I did get that. Even if you only do that logically you've got serious problems.
I think you're thinking about processes and threads the same way you know them today.
I can easily see such a scenario working and also breaking all over the place.
Let's see if this helps you get the idea. Disambiguation: in this model I'm talking about a process not as an OS process but as a lightweight VM process - what we also call threads.
Ok.
So I'm saying that in this model you have only one process per instance, but that process is not a process that can have threads belonging to it.
ok.
That generates a hell of complexity.
You lost me there. What complexity?
The process I'm saying is tied to an instance is closer to the dictionary meaning of the word "process", plus what you know an instance is, with the process implemented by a VM that can balance it across cores.
I didn't understand. Please restate.
I'm not falling into the pitfall of trying to parallelize code automagically. This is far from it. In fact I think this is better than that illusion. Every message is guaranteed by the VM to reach its destination, in guaranteed order. Otherwise it would be chaos. And we want an ordered chaos, like the one we have now in a Squeak reified image.
Yes, squeak is ordered chaos. ;--).
Having clarified that, I ask: why do you think there could be deadlocks? And what other kinds of concurrency problems do you think this model will suffer?
If a number of messages are waiting in the input queue of a process that can only process one message at a time (since it's not multi-threaded), then those messages are BLOCKED while in the queue. Now imagine another process with messages in its queue that are also BLOCKED, since they are waiting in the queue and only one message can be processed at a time. Now imagine that process A and process B each have messages that the other needs before it can proceed, but those messages are BLOCKED waiting for processing in the queues.
This is a real example of what can happen with message queues. The order isn't guaranteed. Simple concurrency solutions often have deadlock scenarios. This can occur when objects must synchronize events or information. As soon as you have multiple threads of execution you've got problems that need solving, regardless of the concurrency model in place.
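The A/B scenario can be reproduced in a few lines. This is a hypothetical Python model (all names invented for illustration): two single-threaded actors, each blocked on its own mailbox waiting for a message the other will never send - a wait-for cycle. A short timeout stands in for "blocked forever" so the demo actually terminates.

```python
import queue
import threading

mailbox_a, mailbox_b = queue.Queue(), queue.Queue()
outcome = {}

def actor(name, inbox, peer):
    # Each actor insists on receiving before it sends: a wait-for cycle.
    try:
        msg = inbox.get(timeout=0.5)    # blocked: the peer hasn't sent yet
        peer.put(f"reply from {name}")  # never reached -- the peer is blocked too
        outcome[name] = msg
    except queue.Empty:
        outcome[name] = "deadlocked"    # timeout stands in for waiting forever

ta = threading.Thread(target=actor, args=("A", mailbox_a, mailbox_b))
tb = threading.Thread(target=actor, args=("B", mailbox_b, mailbox_a))
ta.start(); tb.start(); ta.join(); tb.join()
print(sorted(outcome.items()))  # both actors give up waiting
```

No lock is ever taken here; the deadlock comes purely from the ordering of receives, which is the point being made about single-mailbox processes.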
Tying an object's lifetime to the lifetime of a process doesn't make sense, since there could be references to the object all over the place. If the process quits, the object should still be alive IF there are still references to it. You'd need to pass around more than references to processes. For if a process has more than one object you'd not get the resolution you'd need. No, passing object references around is way better.
Yes, of course there will be. In this system a process termination is one of two things: A) that instance is being reclaimed in a garbage collection, or B) that instance has been written to disk in a kind of hibernation that can be reified again on demand. Please refer to my previous post with subject "One Process Per Instance.." where I talk more about exactly this.
If all there is is one object per process and one process per object - a 1-to-1 mapping - then yes, GC would work that way. But the 1-to-1 mapping isn't likely to ever happen given current and future hardware prospects.
Even if you considered an object as having its own "logical" process you'd get into the queuing problems hinted at above.
Which I don't see, and I ask your help to understand if you still find them after the clarifications made about the model.
See the example above.
Besides, objects in Smalltalk are really fine grained. The notion that each object would have its own thread would require so much thread switching that no current processor could handle it. It would also be a huge waste of resources. And what do you think was coming out of the mouths of the critics of initiatives like the one the Xerox PARC team undertook in the 1970s, making a Smalltalk at the CPU and RAM prices of that time? That VMs are a smart, efficient use of resources?
That's not really relevant. If you want to build that please go ahead - please don't let me stop you, that's the last thing I'd want. I wish you luck. I get to play with current hardware and hardware that's coming down the pipe such as the Tile-64 or the newest GPUs when they are available to the wider market.
So I copy-paste myself: "I don't give credit to unfounded apriorisms. It deserves to be proven that it does not work. Anyway, let's just assume that it may be too much for state-of-the-art hardware in common computers in the year 2007. What about in 2009? What about in 2012?"
Well, just get out your calculator. There is an overhead to a thread or process, in bytes. Say 512 bytes per thread, plus its stack. There is the number of objects - say 1 million for a small-to-medium image. Now multiply those and you get 1/2 gigabyte. Oh, we forgot the stack space and the memory for the objects themselves. Add a multiplier for that, say 8, and you get 4 gigabytes. Oh wait, we forgot that the stack is kind of weird, since each message send that isn't to self must be an interprocess or interthread message send, so you've got some weirdness going on - let alone all the thread context switching for each message send that occurs. Then you've got to add more for who knows what... the list could go on for quite a bit. It's just mind boggling.
Simply put, current CPU architectures are simply not designed for that approach. Heck, they are even highly incompatible with dynamic message passing, since they favor static code in terms of optimizations.
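The back-of-envelope arithmetic checks out as stated. The figures below - 512 bytes per thread, one million objects, a rough x8 multiplier for stacks plus the objects themselves - are the thread's own assumptions, not measured numbers:

```python
# Back-of-envelope estimate for one-thread-per-object in a Smalltalk image.
THREAD_OVERHEAD_BYTES = 512      # assumed per-thread/process header cost
OBJECT_COUNT = 1_000_000         # assumed small-to-medium image
STACK_AND_OBJECT_FACTOR = 8      # rough multiplier for stacks + object memory

base = THREAD_OVERHEAD_BYTES * OBJECT_COUNT   # 512,000,000 bytes
total = base * STACK_AND_OBJECT_FACTOR        # 4,096,000,000 bytes

print(base / 10**9)    # ~0.5 GB just for the per-thread headers
print(total / 10**9)   # ~4.1 GB once stacks and objects are included
```

Even before counting context-switch costs, the memory figure alone lands in the gigabytes for a modest image, which is the argument being made.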
Again, one solution does not fit all problems - if it did programming would be easier.
But programming should be easier.
Yes, I concur, whenever it's possible to do so. But we also shouldn't ignore the hard problems.
Smalltalk made it easier in a lot of aspects.
Sure, I concur. That's why I am working here in this group, spending time (time is money) on these emails.
Listen... I'm not a naive silver-bullet purchaser nor a faithful person. I'm a critical Smalltalker who thinks he gets the point of OOP and tries to find solutions to surpass the multicore crisis by getting an empowered system, not consoling himself with a weaker one.
I do get that about you.
Peter, please try to forget about how systems are made and think about how you want to make them.
I do think about how I want to make them. However to make them I have no choice but to consider how to actually build them using existing technologies and the coming future technologies.
Currently we have 2-core and 4-core processors as the mainstream, with 3-core and 8-core coming to a computer store near you. We have the current crop of GPUs from NVidia with 128 processing units that can be programmed in a variant of C for some general-purpose tasks using a SIMD (single instruction, multiple data) format - very useful for number-crunching applications like graphics, cryptology and numeric analysis, to name just a few. We also have the general-purpose networked Tile-64 coming - lots of general-purpose compute power with an equal amount of scalable networked IO power - very impressive. Intel even has a prototype with 80 cores that is similar. Intel also has its awesomely impressive Itanium processor with instruction-level parallelism as well as multiple cores - just wait till that's a 128-core beastie. Plus, there is hardware that we likely don't know about or that hasn't been invented yet. Please bring it on!!!
The bigger problem is that in order to build real systems I need to think about how they are constructed.
So yes, I want easy parallel computing, but it's just a harsh reality that concurrency, synchronization, distributed processing, and other advanced topics are not always easy, or possible to simplify as much as we would want them to be. That is the nature of computers.
Sorry for being a visionary-realist. Sorry if I've sounded like the critic. I don't mean to be the critic that kills your dreams - if I've done that I apologize. I've simply meant to be the realist who informs the visionary that certain adjustments are needed.
All the best,
Peter
But Smalltalk methods are sequential procedures by nature, so having a process per object would maybe address the mutual exclusion problem, but will not introduce parallelism per se.
Someone has to decide to break execution sequential path into parallel paths.
As long as we have 2,3,4 processing units, maybe we can trust programmer can use old concurrency model, working with already existing Smalltalk::Process objects, we can try hard to make them robust to parallelism...
This obviously won't scale to a 1000 or more processing units!
It seems to me that this makes the thread an old-concurrency-few-cores discussion: - it focuses exclusively on solving the mutual exclusion problem, sharing whole state or none (though I don't understand the value), or partial state via some duplication and syncing mechanism (a-la-Croquet or a-la-Spoon-Sauce maybe); - it does not address the parallelisation problem (except the early doInParallel: proposal).
Maybe there is a future for declarative language revival...
PS: for fun, what happens to all-is-object paradigm if each and every object has a MessageQueue object? What is the MessageQueue of the MessageQueue of ...
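The doInParallel: proposal mentioned above is essentially a parallel version of the usual collection protocol. A sketch of the shape such a primitive could take, in Python for illustration (`collect_in_parallel` is an invented name, not an actual Squeak selector):

```python
from concurrent.futures import ThreadPoolExecutor

def collect_in_parallel(collection, block, workers=4):
    # Parallel analogue of Smalltalk's collect: -- apply block to every
    # element using a worker pool; pool.map preserves element order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(block, collection))

print(collect_in_parallel(range(5), lambda x: x * x))  # [0, 1, 4, 9, 16]
```

The appeal of this approach is that the programmer opts into parallelism at a few well-defined points instead of the system trying to parallelize every message send.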
-----Original Message----- From: squeak-dev-bounces@lists.squeakfoundation.org [mailto:squeak-dev-bounces@lists.squeakfoundation.org] On behalf of nicolas cellier Sent: Thursday, October 25, 2007 17:40 To: squeak-dev@lists.squeakfoundation.org Subject: Re: Multy-core CPUs
But Smalltalk methods are sequential procedures by nature, so having a process per object maybe would adress mutual exclusion problem, but will not introduce parallelism per se.
But that's a black hole which I don't want to enter or be near. I never wanted to introduce parallelism per se. What I do want is *just* a Smalltalk that can conveniently balance the CPU load across an arbitrary number of cores.
Someone has to decide to break execution sequential path into
....
PS: for fun, what happens to all-is-object paradigm if each and every object has a MessageQueue object? What is the MessageQueue of the MessageQueue of ...
LOL... Good question; a kind of class/Metaclass Moebius thing.
Cheers,
Sebastian
_____
From: squeak-dev-bounces@lists.squeakfoundation.org [mailto:squeak-dev-bounces@lists.squeakfoundation.org] On behalf of Peter William Lount Sent: Thursday, 25 October 2007, 16:29 To: The general-purpose Squeak developers list Subject: Re: Multy-core CPUs
Hi,
Sebastian Sastre wrote:
hi,
What? That just won't work. Think of the memory overhead.
I don't give credit to unfounded apriorisms. I think it deserves to be proven that it does not work. Anyway, let's just assume that it may be too much for state-of-the-art hardware in common computers in the year 2007. What about in 2009? What about in 2012? Remember the attitude you had saying this, come the first day of 2012.
It's not an unfounded apriorism as you put it.
Current hardware and technology expected in the next ten years isn't optimized for N hundred thousand or N million threads of execution. Maybe in the future that will be the case.
The Tile-64 processor is expected to grow to about 4096 processors by pushing the limits of technology beyond what they are today. To reach the levels you are talking about for a current Smalltalk image with millions of objects each having their own thread (or process) isn't going to happen anytime soon.
I work with real hardware.
I am open and willing to be pleasantly surprised however.
Peter.. Peter.. you have to fight that demon a little harder. Look, I asked you to read my previous post with the subject "One Process Per Instance", where I took the time (and time is money) to explain, as didactically as I can, how *your* million-object example could be managed in a system like the one I'm speculating about. So please, please Peter, I ask you not to make me repeat myself: go read it and make your statements there if you find problems. As I already said, I think the experiences you are sharing on this matter are precious, so the discussion just gets richer.
Tying an object instance to a particular process makes no sense. If you did that, you'd likely end up with just as many deadlocks and other concurrency problems, since you'd now have message sends to the object being queued up on the process's input queue. Since processes can only process one message at a time, deadlocks can occur - plus all kinds of nasty problems resulting from the order of messages in the queue. (There is a similar nasty problem with the GUI event processing in VisualAge Smalltalk that leads to very difficult-to-diagnose concurrency problems.) It's a rat's maze that's best avoided.
Besides, in some cases an object with multiple threads could respond to many messages - literally - at the same time given multiple cores. Why slow down the system by putting all the messages into a single queue when you don't have to!?
You didn't understand the model I'm talking about.
That is likely the case.
So I ask you kindly to read my previous emails, where I have taken the trouble of expressing the exploratory thoughts that led me to this model, and the speculation about the consequences of its existence.
There isn't such a thing as an object with multiple threads. That does not exist in this model.
Ok. I got that.
There exists one process per instance, no more, no less.
I did get that. Even if you only do that logically you've got serious problems.
If you have read where I talk about how to manage N million objects with limited hardware under this model and you still found problems, please be my guest and inform me here, because I want to know that as soon as possible.
I think you're thinking about processes and threads the same way you know them today.
I can easily see such a scenario working and also breaking all over the place.
Why?
Let's see if this helps you get the idea. Disambiguation: in this model I'm talking about a process not as an OS process but as a VM lightweight process, which we also call a thread.
Ok.
So I'm saying that in this model you have only one process per instance but that process is not a process that can have threads belonging to it.
ok.
That generates a hell of complexity.
You lost me there. What complexity?
It doesn't matter; that is another model, not the one I'm speculating about (probably one you imagined before the clarification of the 1:1 object-process thing).
The process I'm saying is tied to an instance is closer to the word "process" as you know it from the dictionary, plus what you know an instance is, with the process implemented by a VM that can balance it across cores.
I didn't understand. Please restate.
I restated that N times in my previous emails, which got too long. To give you a clue: it's about the double nature I'm saying the object has, an amalgam of object and process; their conceptual indissociability. More in those previous emails.
I'm not falling into the pitfall of trying to parallelize code automagically. This is far from that. In fact I think this is better than that illusion. Every message is guaranteed by the VM to reach its destination in guaranteed order. Otherwise there would be chaos. And we want an ordered chaos like the one we have now in a Squeak reified image.
Yes, squeak is ordered chaos. ;--).
Having clarified that, I ask: why do you think there could be deadlocks? And what other kinds of concurrency problems do you think this model will suffer?
If a number of messages are waiting in the input queue of a process that can only process one message at a time (since it's not multi-threaded), then those messages are BLOCKED while in the queue. Now imagine another process with messages in its queue that are also BLOCKED, since they are waiting in the queue and only one message can be processed at a time. Now imagine that process A and process B each have messages queued that the other needs before it can proceed, but those messages are BLOCKED waiting for processing in the queues.
This is a real example of what can happen with message queues. The order isn't guaranteed. Simple concurrency solutions often have deadlock scenarios. This can occur when objects must synchronize events or information. As soon as you have multiple threads of execution you've got problems that need solving regardless of the concurrency model in place.
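The blocked-queue scenario Peter describes can be made concrete. Below is a minimal illustrative sketch (not from the original thread) in Python: two single-mailbox "processes" each handle one message at a time, and each needs a reply from the other before it can finish its current message, so the queued requests behind them can never be served. The `Actor` class and message names are hypothetical, invented purely for this demonstration.

```python
from collections import deque

class Actor:
    """One mailbox, one logical process: handles one message at a time."""
    def __init__(self, name):
        self.name = name
        self.mailbox = deque()
        self.waiting_for = None   # reply we are blocked on, if any

    def send(self, msg, sender=None):
        self.mailbox.append((msg, sender))

    def step(self, peers):
        """Try to make progress; return True if anything happened."""
        if self.waiting_for:
            # Blocked: we may only consume the reply we are waiting for;
            # every other queued message stays BLOCKED behind us.
            if self.mailbox and self.mailbox[0][0] == self.waiting_for:
                self.mailbox.popleft()
                self.waiting_for = None
                return True
            return False
        if not self.mailbox:
            return False
        msg, sender = self.mailbox.popleft()
        if msg == "start":
            # Ask our counterpart something and block until it replies.
            peers[self.name].send("query", self)
            self.waiting_for = "reply"
        elif msg == "query":
            sender.send("reply", self)
        return True

a, b = Actor("a"), Actor("b")
peers = {"a": b, "b": a}          # each actor's counterpart
a.send("start")
b.send("start")

# Run until no actor can make progress any more.
while any(actor.step(peers) for actor in (a, b)):
    pass

# Both are stuck waiting for a reply that sits unprocessable in a queue.
deadlocked = a.waiting_for is not None and b.waiting_for is not None
print("deadlock:", deadlocked)    # prints: deadlock: True
```

Each actor's "query" is sitting in the other's mailbox, but neither actor will ever reach it, which is exactly the A/B deadlock described above.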
But that can happen right now if you make bad use of processes in a current Smalltalk. I don't want to solve deadlocks for anybody using parallelism badly. I just want a Smalltalk that works like today's but balances CPU load across cores and scales to an arbitrary number of them. This whole thread is about that.
Tying an object's life time to the lifetime of a process doesn't make sense since there could be references to the object all over the place. If the process quits the object should still be alive IF there are still references to it.
You'd need to pass around more than references to processes. For if a process has more than one object you'd not get the resolution you'd need. No, passing object references around is way better.
Yes, of course there will be. In this system a process termination is one of two things: A) that instance is being reclaimed in a garbage collection, or B) that instance has been written to disk in a kind of hibernation from which it can be reified again on demand. Please refer to my previous post with the subject "One Process Per Instance..", where I talk more about exactly this.
If all there is is one object per process and one process per object - a 1-to-1 mapping - then yes, GC would work that way, but the 1-to-1 mapping isn't likely to ever happen given current and future hardware prospects.
But Peter, don't lower your guard that easily! We know techniques for administering resources, like navigating 10 thousand instances at a time in a 10-gigabyte image of 10 million objects! Don't shoot hope before it is born! I talk about some details I've imagined about this in my "One Process Per Instance" post.
Even if you considered an object as having its own "logical" process, you'd get into the queuing problems hinted at above.
Which I don't see, and I ask your help to understand if you still find them after the clarifications made about the model.
See the example above.
Besides, objects in Smalltalk are really fine-grained. The notion that each object would have its own thread would require so much thread switching that no current processor could handle it. It would also be a huge waste of resources.
And what do you think came out of the mouths of the critics of initiatives like the one the PARC team had in the 1970s, making a Smalltalk at the prices of CPUs and RAM at that time? That VMs are a smart, efficient use of resources?
That's not really relevant. If you want to build that please go ahead - please don't let me stop you, that's the last thing I'd want. I wish you luck. I get to play with current hardware and hardware that's coming down the pipe such as the Tile-64 or the newest GPUs when they are available to the wider market.
We all have to use cheap hardware. Please (re)think about what I said about administering hardware resources over this model.
So I copy-paste myself: "I don't give credit to unfounded apriorisms. It deserves to be proven that it does not work. Anyway, let's just assume that it may be too much for state-of-the-art hardware in common computers in the year 2007. What about in 2009? What about in 2012?"
Well, just get out your calculator. There is an overhead to a thread or process in bytes - say 512 bytes per thread, plus its stack. There is the number of objects - say 1 million for a medium-to-small image. Now multiply those and you get 1/2 gigabyte. Oh, we forgot the stack space and the memory for the objects themselves. Add a multiplier for that, say 8, and you get 4 gigabytes. Oh wait, we forgot that the stack is kind of weird, since each message send that isn't to self must be an interprocess or interthread message send, so you've got some weirdness going on, let alone all the thread context switching for each message send that occurs. Then you've got to add more for who knows what... the list could go on for quite a bit. It's just mind boggling.
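Peter's back-of-the-envelope arithmetic can be checked directly. This is just his stated figures multiplied out (the 512 bytes and the 8x multiplier are his assumptions, not measurements):

```python
objects = 1_000_000        # a medium-to-small image, per the estimate above
per_thread = 512           # assumed bytes of bare thread overhead, sans stack

base = objects * per_thread
print(base / 1e9, "GB")    # 0.512 GB -> the "1/2 gigabyte" figure

total = base * 8           # crude 8x multiplier for stacks + object memory
print(total / 1e9, "GB")   # 4.096 GB -> the "4 gigabytes" figure
```

So even before counting per-send context switches, the naive 1-thread-per-object image blows past typical 2007-era RAM.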
I just can't believe we really can't find clever ways of administering resources to the point at which this becomes acceptable.
Simply put current cpu architectures are simply not designed for that approach. Heck they are even highly incompatible with dynamic message passing since they favor static code in terms of optimizations.
Yes, that happens with machines based on mathematical models like the Boolean model. It injects an impedance mismatch between the conceptual modeling and the virtual modeling.
Again, one solution does not fit all problems - if it did programming would be easier.
But programming should be easier.
Yes, I concur, whenever it's possible to do so. But it also shouldn't ignore the hard problems either.
Smalltalk made it easier in a lot of aspects.
Sure I concur. That's why I am working here in this group spending time (is money) on these emails.
Listen.. I'm not a naive silver-bullet purchaser nor a faithful person. I'm a critical Smalltalker who thinks he gets the point of OOP and tries to find solutions to surpass the multicore crisis by getting an empowered system, not consoling himself with a weaker one.
I do get that about you.
Peter, please try to forget about how systems are made and think about how you want to make them.
I do think about how I want to make them. However to make them I have no choice but to consider how to actually build them using existing technologies and the coming future technologies.
Currently we have 2-core and 4-core processors as the mainstream, with 3-core and 8-core coming to a computer store near you. We have the current crop of GPUs from NVidia with 128 processing units that can be programmed in a variant of C for some general-purpose tasks using a SIMD (single instruction, multiple data) format - very useful for number-crunching applications like graphics, cryptology, and numeric analysis, to name just a few. We also have the general-purpose networked Tile-64 coming - lots of general-purpose compute power with an equal amount of scalable networked IO power - very impressive. Intel even has a similar prototype with 80 cores. Intel also has its awesomely impressive Itanium processor with instruction-level parallelism as well as multiple cores - just wait till that's a 128-core beastie. Plus there is hardware that we likely don't know about or that hasn't been invented yet. Please bring it on!!!
The bigger problem is that in order to build real systems I need to think about how they are constructed.
So yes, I want easy parallel computing, but it's just a harsh reality that concurrency, synchronization, distributed processing, and other advanced topics are not always easy or possible to simplify as much as we would like them to be. That is the nature of computers.
Sorry for being a visionary-realist. Sorry if I've sounded like the critic. I don't mean to be the critic that kills your dreams - if I've done that I apologize. I've simply meant to be the realist who informs the visionary that certain adjustments are needed.
All the best,
Peter
Please take your time to think about what I've stated about administering resources: that it is possible to manage a load of millions of instances with a swarm of a few at a time. And don't be sorry about anything. I love criticism. Our culture needs tons of criticism to be stronger. It's the only way we can uninstall deprecated or obsolete ideas. You are helping here. If I'm really dreaming and this doesn't work, I want that dream to be killed now so I can spend my time on something better. That helps.
So far this model is just getting stronger. Please try to take it down!!! :)))
cheers,
Sebastian
Hi,
Sebastian Sastre wrote:
Hi,
Sebastian Sastre wrote:
hi, What? That just won't work. Think of the memory overhead. I don't give credit to unfounded apriorisms. I think it deserves to be proven that it does not work. Anyway, let's just assume that it may be too much for state-of-the-art hardware in common computers in the year 2007. What about in 2009? What about in 2012? Remember the attitude you had saying this, come the first day of 2012.
It's not an unfounded apriorism as you put it.
Current hardware and technology expected in the next ten years isn't optimized for N hundred thousand or N million threads of execution. Maybe in the future that will be the case.
The Tile-64 processor is expected to grow to about 4096 processors by pushing the limits of technology beyond what they are today. To reach the levels you are talking about for a current Smalltalk image with millions of objects each having their own thread (or process) isn't going to happen anytime soon.
I work with real hardware.
I am open and willing to be pleasantly surprised however.
Peter.. Peter.. you have to fight that demon a little harder. Look, I asked you to read my previous post with the subject "One Process Per Instance", where I took the time (and time is money) to explain, as didactically as I can, how *your* million-object example could be managed in a system like the one I'm speculating about. So please, please Peter, I ask you not to make me repeat myself: go read it and make your statements there if you find problems. As I already said, I think the experiences you are sharing on this matter are precious, so the discussion just gets richer.
I recall a counter example of the million objects was to split the data objects into 10,000 chunks. However, that's a different problem not the one that I have to deal with.
Tying an object instance to a particular process makes no sense. If you did that, you'd likely end up with just as many deadlocks and other concurrency problems, since you'd now have message sends to the object being queued up on the process's input queue. Since processes can only process one message at a time, deadlocks can occur - plus all kinds of nasty problems resulting from the order of messages in the queue. (There is a similar nasty problem with the GUI event processing in VisualAge Smalltalk that leads to very difficult-to-diagnose concurrency problems.) It's a rat's maze that's best avoided. Besides, in some cases an object with multiple threads could respond to many messages - literally - at the same time, given multiple cores. Why slow down the system by putting all the messages into a single queue when you don't have to!? You didn't understand the model I'm talking about.
That is likely the case.
So I ask you kindly to read my previous emails, where I have taken the trouble of expressing the exploratory thoughts that led me to this model, and the speculation about the consequences of its existence.
There are so many emails in this thread please link to the emails you'd like me to reread. Thanks very much.
There isn't such a thing as an object with multiple threads. That does not exist in this model.
Ok. I got that.
There exists one process per instance, no more, no less.
I did get that. Even if you only do that logically you've got serious problems.
If you have read where I talk about how to manage N million objects with limited hardware under this model and you still found problems, please be my guest and inform me here, because I want to know that as soon as possible.
Yes, I know it's possible for systems like Erlang to have 100,000 virtual processes, aka lightweight threads, that can run in one real native operating-system process or across many native processes.
Are you saying that you've figured out how to do that with millions of processes?
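To make the Erlang-style idea concrete, here is a hedged sketch in Python (illustrative only; this is not how Erlang or any Smalltalk VM is actually implemented): each lightweight "process" is a generator with its own mailbox, suspended cooperatively instead of occupying a native thread, which is why hundreds of thousands of them are affordable where native threads are not. The `process` function and the scheduling convention are invented for this example.

```python
from collections import deque

def process(mailbox):
    """A lightweight 'process': a generator with a private mailbox."""
    total = 0
    while True:
        while not mailbox:
            yield             # cooperative suspend: no native thread held
        total += mailbox.popleft()
        yield total           # report running total after each message

N = 100_000                   # far beyond what native threads could sustain
mailboxes = [deque() for _ in range(N)]
procs = [process(mb) for mb in mailboxes]
for p in procs:
    next(p)                   # start each process up to its first suspend

mailboxes[42].append(7)       # send a message to process number 42
result = next(procs[42])      # a scheduler resumes just that one process
print(result)                 # prints: 7
```

The point is only that per-process cost here is a small heap object, not a native stack plus register set, which is the crux of the lightweight-process argument.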
I think you're thinking about processes and threads the same way you know them today.
I can easily see such a scenario working and also breaking all over the place.
Why?
Let's see if this helps you get the idea. Disambiguation: in this model I'm talking about a process not as an OS process but as a VM lightweight process, which we also call a thread.
Ok.
So I'm saying that in this model you have only one process per instance but that process is not a process that can have threads belonging to it.
ok.
That generates a hell of complexity.
You lost me there. What complexity?
It doesn't matter; that is another model, not the one I'm speculating about (probably one you imagined before the clarification of the 1:1 object-process thing).
Alan Kay's original work suggested that each object has a process. There is the logical view and the idealized view. Then there is the concrete, and how to implement things. How one explains things to end users is often via the idealized view. How one implements is more often with something that isn't quite the ideal.
The process I'm saying is tied to an instance is closer to the word "process" as you know it from the dictionary, plus what you know an instance is, with the process implemented by a VM that can balance it across cores.
I didn't understand. Please restate.
I restated that N times in my previous emails, which got too long. To give you a clue: it's about the double nature I'm saying the object has, an amalgam of object and process; their conceptual indissociability. More in those previous emails.
Alright I'll have to reread the entire thread since no one wants to clearly state their pov in one email as I attempt to do out of courtesy. I don't have time to reread the entire thread today though. (That's why it is a courtesy to repost - it saves your readers time).
I'm not falling into the pitfall of trying to parallelize code automagically. This is far from that. In fact I think this is better than that illusion. Every message is guaranteed by the VM to reach its destination in guaranteed order. Otherwise there would be chaos. And we want an ordered chaos like the one we have now in a Squeak reified image.
Yes, squeak is ordered chaos. ;--).
Having clarified that, I ask: why do you think there could be deadlocks? And what other kinds of concurrency problems do you think this model will suffer?
If a number of messages are waiting in the input queue of a process that can only process one message at a time (since it's not multi-threaded), then those messages are BLOCKED while in the queue. Now imagine another process with messages in its queue that are also BLOCKED, since they are waiting in the queue and only one message can be processed at a time. Now imagine that process A and process B each have messages queued that the other needs before it can proceed, but those messages are BLOCKED waiting for processing in the queues.
This is a real example of what can happen with message queues. The order isn't guaranteed. Simple concurrency solutions often have deadlock scenarios. This can occur when objects must synchronize events or information. As soon as you have multiple threads of execution you've got problems that need solving regardless of the concurrency model in place.
But that can happen right now if you make bad use of processes in a current Smalltalk. I don't want to solve deadlocks for anybody using parallelism badly. I just want a Smalltalk that works like today's but balances CPU load across cores and scales to an arbitrary number of them. This whole thread is about that.
Yes it can happen now. That's why it's important to actually learn concurrency control techniques. Books like the Little Book of Semaphores can help with that learning process.
The point is that a number of people in this thread are proposing solutions that seem to claim that these problems magically go away in some utopian manner with process-based concurrency. All I'm pointing out is that there isn't a silver bullet or a concurrency utopia, and now I'm getting flak for pointing out that non-ignorable reality. So be it. Those that push ahead are often the ones with many arrows in their back.
Tying an object's lifetime to the lifetime of a process doesn't make sense, since there could be references to the object all over the place. If the process quits, the object should still be alive IF there are still references to it. You'd need to pass around more than references to processes, for if a process has more than one object you'd not get the resolution you'd need. No, passing object references around is way better. Yes, of course there will be. In this system a process termination is one of two things: A) that instance is being reclaimed in a garbage collection, or B) that instance has been written to disk in a kind of hibernation from which it can be reified again on demand. Please refer to my previous post with the subject "One Process Per Instance..", where I talk more about exactly this.
If all there is is one object per process and one process per object - a 1-to-1 mapping - then yes, GC would work that way, but the 1-to-1 mapping isn't likely to ever happen given current and future hardware prospects.
But Peter, don't lower your guard that easily! We know techniques for administering resources, like navigating 10 thousand instances at a time in a 10-gigabyte image of 10 million objects! Don't shoot hope before it is born! I talk about some details I've imagined about this in my "One Process Per Instance" post.
Well then you must have a radically different meaning of 1 to 1 object to process mapping than I have or a radically different implementation that I've understood from your writings. If you can make it work that is all the proof that you need, isn't it!?
Even if you considered an object as having its own "logical" process, you'd get into the queuing problems hinted at above. Which I don't see, and I ask your help to understand if you still find them after the clarifications made about the model.
See the example above.
Besides, objects in Smalltalk are really fine-grained. The notion that each object would have its own thread would require so much thread switching that no current processor could handle it. It would also be a huge waste of resources. And what do you think came out of the mouths of the critics of initiatives like the one the PARC team had in the 1970s, making a Smalltalk at the prices of CPUs and RAM at that time? That VMs are a smart, efficient use of resources?
That's not really relevant. If you want to build that please go ahead - please don't let me stop you, that's the last thing I'd want. I wish you luck. I get to play with current hardware and hardware that's coming down the pipe such as the Tile-64 or the newest GPUs when they are available to the wider market.
We all have to use cheap hardware. Please (re)think about what I said about administering hardware resources over this model.
So I copy-paste myself: "I don't give credit to unfounded apriorisms. It deserves to be proven that it does not work. Anyway, let's just assume that it may be too much for state-of-the-art hardware in common computers in the year 2007. What about in 2009? What about in 2012?"
Well, just get out your calculator. There is an overhead to a thread or process in bytes - say 512 bytes per thread, plus its stack. There is the number of objects - say 1 million for a medium-to-small image. Now multiply those and you get 1/2 gigabyte. Oh, we forgot the stack space and the memory for the objects themselves. Add a multiplier for that, say 8, and you get 4 gigabytes. Oh wait, we forgot that the stack is kind of weird, since each message send that isn't to self must be an interprocess or interthread message send, so you've got some weirdness going on, let alone all the thread context switching for each message send that occurs. Then you've got to add more for who knows what... the list could go on for quite a bit. It's just mind boggling.
I just can't believe we really can't find clever ways of administering resources to the point at which this becomes acceptable.
Each thread needs a stack and a stored set of registers. That's at least two four-kilobyte memory pages (one for the stack and one for the registers) with current hardware, assuming your thread is mapped to a real processor thread of execution at some point. The two pages are there so that the processor can detect if the stack grows beyond its four-kilobyte page. Now, you could pack them into one page when the thread is not executing, but that would increase your context-switch time to pack and unpack. If you avoid that and simply use the same page for both, then you're risking having your stack overwrite memory used for the process/thread, which would be unsafe multi-threading.
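Scaling Peter's two-page minimum to the 1-process-per-instance model gives another quick sanity check (again just multiplying out the figures stated above, not measuring anything):

```python
page = 4 * 1024              # bytes per memory page, per the figure above
per_thread = 2 * page        # one page for the stack, one for the registers
objects = 1_000_000          # one thread per object under the 1:1 model

total = per_thread * objects
print(total / 1e9, "GB")     # 8.192 GB of page overhead alone
```

So even the bare two-page minimum, before any object memory, roughly doubles the earlier 4 GB estimate.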
Maybe some hardware designers will figure it out.
However, there is still the worst pitfall of the 1-to-1 mapping of process to object: the overhead of each message send to another object would require a thread context switch! That is inescapably huge.
Simply put current cpu architectures are simply not designed for that approach. Heck they are even highly incompatible with dynamic message passing since they favor static code in terms of optimizations.
Yes, that happens with machines based on mathematical models like the Boolean model. It injects an impedance mismatch between the conceptual modeling and the virtual modeling.
Again, one solution does not fit all problems - if it did, programming would be easier. But programming should be easier.
Yes, I concur, whenever it's possible to do so. But it also shouldn't ignore the hard problems either.
Smalltalk made it easier in a lot of aspects.
Sure I concur. That's why I am working here in this group spending time (is money) on these emails.
Listen.. I'm not a naive silver-bullet purchaser nor a faithful person. I'm a critical Smalltalker who thinks he gets the point of OOP and tries to find solutions to surpass the multicore crisis by getting an empowered system, not consoling himself with a weaker one.
I do get that about you.
Peter, please try to forget about how systems are made and think about how you want to make them.
I do think about how I want to make them. However to make them I have no choice but to consider how to actually build them using existing technologies and the coming future technologies.
Currently we have 2-core and 4-core processors as the mainstream, with 3-core and 8-core coming to a computer store near you. We have the current crop of GPUs from NVidia with 128 processing units that can be programmed in a variant of C for some general-purpose tasks using a SIMD (single instruction, multiple data) format - very useful for number-crunching applications like graphics, cryptology, and numeric analysis, to name just a few. We also have the general-purpose networked Tile-64 coming - lots of general-purpose compute power with an equal amount of scalable networked IO power - very impressive. Intel even has a similar prototype with 80 cores. Intel also has its awesomely impressive Itanium processor with instruction-level parallelism as well as multiple cores - just wait till that's a 128-core beastie. Plus there is hardware that we likely don't know about or that hasn't been invented yet. Please bring it on!!!
The bigger problem is that in order to build real systems I need to think about how they are constructed.
So yes, I want easy parallel computing, but it's just a harsh reality that concurrency, synchronization, distributed processing, and other advanced topics are not always easy or possible to simplify as much as we would like them to be. That is the nature of computers.
Sorry for being a visionary-realist. Sorry if I've sounded like the critic. I don't mean to be the critic that kills your dreams - if I've done that I apologize. I've simply meant to be the realist who informs the visionary that certain adjustments are needed.
All the best,
Peter
Please take your time to think about what I've stated about administering resources: that it is possible to manage a load of millions of instances with a swarm of a few at a time. And don't be sorry about anything. I love criticism. Our culture needs tons of criticism to be stronger. It's the only way we can uninstall deprecated or obsolete ideas. You are helping here. If I'm really dreaming and this doesn't work, I want that dream to be killed now so I can spend my time on something better. That helps.
Well, having a load of millions of instances managed by a swarm of a few at once is what I was assuming. In fact Linux does this (well, for thousands, not millions, anyway). It turns out that Intel's X86/IA32 architecture can only handle 4096 threads in hardware. What Linux did was virtualize them so that only one hardware thread was used for the active thread (per core, I would assume). This allowed Linux to avoid the glass ceiling of 4096 threads. However, there are limits due to the overhead of context-switching time and the overhead of space that each thread would have, even with the minimal stack that the model you are proposing might use. It's just too onerous for practical use.
Unless you are doing something radically different that I don't understand that is.
So far this model is just getting stronger. Please try to take it down!!! :)))
I thought I crushed it already!!! ;--)
Certainly until you can provide a means for it to handle the one million data objects across 10,000 processes with edits going to the 10,000 processes plus partial object graph seeding (and any object on demand) to them and end up with one and a half million output objects with the total number of interconnections increased by 70% I'll consider it crushed. ;--)
Forward to the future - to infinity and beyond with real hardware!
Cheers,
Peter
Hi Peter,
here I wrote (conceptually) about how an image with One Process Per Instance should start, work and be written in disk:
http://www.nabble.com/One-Process-Per-Instance-%28RE%3A-Multy-core-CPUs%29-p13408771.html
Here I state some of the reasons to prioritize an idea like this, and invoke your attention to your 1M instances example:
http://www.nabble.com/RE%3A-Multy-core-CPUs-p13406112.html
You may consider this option crushed, but I consider it more onerous than what we are used to, yet still valid. I'm convinced that clever VM techniques can make it usable enough for most typical Smalltalk needs. So maybe right now it is not suitable for real-time sampling of high-quality sound or real-time ray tracing because of the hardware, but that does not make it an invalid option.
For lots of applications it will still be usable. Remember that cheap or expensive is subjective. This model unloads people from having to pay for the impedance mismatch by loading it onto machines, so the machines do the hard work for us at that price. It's a trade-off that you have shown you're not willing to take. But I'm very confident that, while not being a silver bullet, this has a wide space of solutions.
In short: you don't use GemStone/S to sample audio. And by the way, with all the overhead you cited, how do you think an application like this would perform compared to one that uses a relational database for persistence? Or compared to GemStone/S itself?
Anyway, I feel we have reached the limit of the theoretical discussion. Tests will be needed to go forward. With them, maybe the VM numbers will surprise us and it will not be that onerous, or we can apply lots of known optimizations to mitigate the initial implementation cost. Sadly I'm unable to invest in this now to see where it leads.
So I think I'll hibernate this for now. Maybe in 2 or 4 years it will turn out to be a convenient approach to something.
With a still valid model, but with "wings clipped" by the lack of resources :)
all the best!
Sebastian
_____
From: squeak-dev-bounces@lists.squeakfoundation.org [mailto:squeak-dev-bounces@lists.squeakfoundation.org] On behalf of Peter William Lount Sent: Thursday, 25 October 2007 18:08 To: The general-purpose Squeak developers list Subject: Re: Multy-core CPUs
Hi,
...
I recall a counter example of the million objects was to split the data objects into 10,000 chunks. However, that's a different problem not the one that I have to deal with.
Tying an object instance to a particular process makes no sense. If you did that you'd likely end up with just as many deadlocks and other concurrency problems, since you'd now have message sends to the object being queued up on the process's input queue. Since processes can only process one message at a time, deadlocks can occur - plus all kinds of nasty problems resulting from the order of messages in the queue. (There is a similar nasty problem with the GUI event processing in VisualAge Smalltalk that leads to very difficult to diagnose and comprehend concurrency problems.) It's a rat's maze that's best avoided.
Besides, in some cases an object with multiple threads could respond to many messages - literally - at the same time given multiple cores. Why slow down the system by putting all the messages into a single queue when you don't have to!?
You didn't understand the model I'm talking about.
That is likely the case.
So I kindly ask that you read my previous emails, where I have taken on the job of expressing my exploratory thoughts up to reaching this model, and the speculation about the consequences of this model.
There are so many emails in this thread please link to the emails you'd like me to reread. Thanks very much.
There isn't such a thing as an object with multiple threads. That does not exist in this model.
Ok. I got that.
There exists one process per instance, no more, no less.
I did get that. Even if you only do that logically you've got serious problems.
If you have read where I talk about how to manage, with this model, N million objects on limited hardware, and you still find problems, please be my guest and inform me here, because I want to know that as soon as possible.
Yes, I know it's possible for systems like Erlang to have 100,000 virtual processes, aka lightweight threads, that can be in one real native operating system process or across many native processes.
Are you saying that you've figured out how to do that with millions of processes?
I think you're thinking about processes and threads the same way you know them today.
I can easily see such a scenario working and also breaking all over the place.
Why?
Let's see if this helps you get the idea. Disambiguation: in this model I'm talking about a process not as an OS process but as a lightweight VM process, which we also call a thread.
Ok.
So I'm saying that in this model you have only one process per instance but that process is not a process that can have threads belonging to it.
ok.
That generates a hell of complexity.
You lost me there. What complexity?
It does not matter; that is another model, not the one I'm speculating about (probably one you imagined before I clarified the 1:1 object-process thing).
Alan Kay's original work suggested that each object had a process. There is the logical view and the idealized view. Then there is the concrete and how to implement things. How one explains things to end users is often with the idealized view. How one implements is more often with something that isn't quite the ideal.
The process I'm saying is tied to an instance is closer to the dictionary meaning of the word "process", plus what you know an instance is, with the process implemented by a VM that can balance it across cores.
I didn't understand. Please restate.
I restated that N times in my previous emails, at too great a length. To give you a clue, it's about the double nature I'm saying the object has: an amalgam of object and process, conceptually indissociable. More in those previous emails.
Alright I'll have to reread the entire thread since no one wants to clearly state their pov in one email as I attempt to do out of courtesy. I don't have time to reread the entire thread today though. (That's why it is a courtesy to repost - it saves your readers time).
I'm not falling into the pitfall of trying to parallelize code automagically. This is far from it. In fact I think this is better than that illusion. Every message is guaranteed by the VM to reach its destination in guaranteed order. Otherwise it would be chaos. And we want an ordered chaos like the one we have now in a Squeak reified image.
Yes, squeak is ordered chaos. ;--).
Having clarified that, I ask: why do you think there could be deadlocks? And what other kinds of concurrency problems do you think this model will suffer?
If a number of messages are waiting in the input queue of a process that can only process one message at a time, since it's not multi-threaded, then those messages are BLOCKED while in the queue. Now imagine another process with messages in its queue that are also BLOCKED, since they are waiting in the queue and only one message can be processed at a time. Now imagine that process A and process B each have messages that the other needs before it can proceed, but those messages are BLOCKED waiting for processing in the queues.
This is a real example of what can happen with message queues. The order isn't guaranteed. Simple concurrency solutions often have deadlock scenarios. This can occur when objects must synchronize events or information. As soon as you have multiple threads of execution you've got problems that need solving regardless of the concurrency model in place.
But that can happen right now if you use processes badly in a current Smalltalk. I don't want to solve deadlocks for anybody using parallelism badly. I just want a Smalltalk that works like today's but balances CPU load across cores and scales to an arbitrary number of them. This whole thread is about that.
Yes it can happen now. That's why it's important to actually learn concurrency control techniques. Books like the Little Book of Semaphores can help with that learning process.
The point is that a number of people in this thread are proposing solutions that seem to claim that these problems magically go away in some utopian manner with process-based concurrency. All I'm pointing out is that there isn't a silver bullet or concurrency utopia and now I'm getting flack for pointing out that non-ignorable reality. So be it. Those that push ahead are often the ones with many arrows in their back.
Tying an object's lifetime to the lifetime of a process doesn't make sense, since there could be references to the object all over the place. If the process quits, the object should still be alive IF there are still references to it.
You'd need to pass around more than references to processes. For if a process has more than one object you'd not get the resolution you'd need. No, passing object references around is way better.
Yes, of course there will be. In this system a process termination is one of two things: A) that instance is being reclaimed in a garbage collection, or B) that instance has been written to disk in a kind of hibernation and can be reified again on demand. Please refer to my previous post with subject "One Process Per Instance.." where I talk more about exactly this.
If all there is is a one object per process and one process per object - a 1 to 1 mapping then yes gc would work that way but the 1 to 1 mapping isn't likely to ever happen given current and future hardware prospects.
But Peter, don't lower your guard on that so easily! We know techniques for administering resources, like navigating 10 thousand instances at a time in a 10-gigabyte image of 10 million objects! Don't shoot hope before it is born! I discuss some details I've imagined about this in my "One Process Per Instance" post.
Well then you must have a radically different meaning of 1 to 1 object to process mapping than I have or a radically different implementation that I've understood from your writings. If you can make it work that is all the proof that you need, isn't it!?
Even if you considered an object as having its own "logical" process you'd get into the queuing problems hinted at above.
Which I don't see, and I ask for your help to understand whether you still find them after the clarifications made about the model.
See the example above.
Besides, objects in Smalltalk are really fine-grained. The notion that each object would have its own thread would require so much thread switching that no current processor could handle it. It would also be a huge waste of resources.
And what do you think was coming out of the mouths of critics of initiatives like the one the PARC team had in the 1970s, making a Smalltalk at the CPU and RAM prices of that time? That VMs are a smart, efficient use of resources?
That's not really relevant. If you want to build that please go ahead - please don't let me stop you, that's the last thing I'd want. I wish you luck. I get to play with current hardware and hardware that's coming down the pipe such as the Tile-64 or the newest GPUs when they are available to the wider market.
We all have to use cheap hardware. Please (re)think about what I said about administering hardware resources over this model.
So I copy-paste myself: "I don't give credit to unfounded apriorisms. It deserves to be proven that it does not work. Anyway, let's just assume that it may be too much for state-of-the-art hardware in common computers in 2007. What about in 2009? What about in 2012?"
Well just get out your calculator. There is an overhead to a thread or process in bytes. Say 512 bytes per thread plus its stack. There is the number of objects. Say 1 million for a medium to small image. Now multiply those and you get 1/2 gigabyte. Oh, we forgot the stack space and the memory for the objects themselves. Add a multiplier for that, say 8, and you get 4 gigabytes. Oh, wait, we forgot that the stack is kind of weird, since each message send that isn't to self must be an interprocess or interthread message send, so you've got some weirdness going on, let alone all the thread context switching for each message send that occurs. Then you've got to add more for who knows what... the list could go on for quite a bit. It's just mind boggling.
I just can't believe we really can't find clever ways of administering resources to the point where this becomes acceptable.
Each thread needs a stack and a stored set of registers. That's at least two four-kilobyte memory pages (one for the stack and one for the registers) with current hardware, assuming your thread is mapped to a real processor thread of execution at some point. The two pages are there so that you can have the processor detect if the stack grows beyond its four-kilobyte page. Now you could pack them into one page when it's not being executed, but that would increase your context switch time to pack and unpack. If you avoid that and simply use the same page for both then you're risking having your stack overwrite memory used for the process/thread, which would be unsafe multi-threading.
Maybe some hardware designers will figure it out.
However, there is still the worst pitfall of the 1 to 1 mapping of process to object: the overhead of each message send to another object would require a thread context switch! That is inescapably huge.
Simply put current cpu architectures are simply not designed for that approach. Heck they are even highly incompatible with dynamic message passing since they favor static code in terms of optimizations.
Yes, that happens with machines based on mathematical models like the boolean model. It injects an impedance mismatch between the conceptual modeling and the virtual modeling.
Again, one solution does not fit all problems - if it did programming would be easier.
But programming should be easier.
Yes, I concur, whenever it's possible to do so. But it also shouldn't ignore the hard problems either.
Smalltalk made it easier in a lot of aspects.
Sure, I concur. That's why I am working here in this group, spending time (which is money) on these emails.
Listen.. I'm not a naive silver-bullet purchaser nor a faithful person. I'm a critical Smalltalker who thinks he gets the point of OOP and tries to find solutions to surpass the multicore crisis by building an empowered system, not consoling himself with a weaker one.
I do get that about you.
Peter please try to forget about how systems are made and think in how you want to make them.
I do think about how I want to make them. However to make them I have no choice but to consider how to actually build them using existing technologies and the coming future technologies.
Currently we have 2-core and 4-core processors as the mainstream, with 3-core and 8-core coming to a computer store near you. We have the current crop of GPUs from NVidia that have 128 processing units that can be programmed in a variant of C for some general purpose program tasks using a SIMD (single instruction multiple data) format - very useful for those number crunching applications like graphics, cryptology and numeric analysis, to name just a few. We also have the general purpose networked Tile-64 coming - lots of general purpose compute power with an equal amount of scalable networked IO power - very impressive. Intel even has a prototype with 80 cores that is similar. Intel also has its awesomely impressive Itanium processor with instruction level parallelism as well as multiple cores - just wait till that's a 128-core beastie. Plus there is hardware that we likely don't know about or that hasn't been invented yet. Please bring it on!!!
The bigger problem is that in order to build real systems I need to think about how they are constructed.
So yes, I want easy parallel computing but it's just a harsh reality that concurrency, synchronization, distributed processing, and other advanced topics are not always easy or possible to simplify as much as we try to want them to be. That is the nature of computers.
Sorry for being a visionary-realist. Sorry if I've sounded like the critic. I don't mean to be the critic that kills your dreams - if I've done that I apologize. I've simply meant to be the realist who informs the visionary that certain adjustments are needed.
All the best,
Peter
Please take your time to think about what I've stated about administering resources, making it possible to manage a load of millions of instances through a swarm of a few at a time. And don't be sorry about anything. I love criticism. Our culture needs tons of criticism to be stronger. It's the only way we can uninstall deprecated or obsolete ideas. You are helping here. If I'm really dreaming and this doesn't work, I want that dream to be killed now so I can spend my time on something better. That helps.
On Oct 25, 2007, at 12:28 PM, Peter William Lount wrote:
The Tile-64 processor is expected to grow to about 4096 processors by pushing the limits of technology beyond what they are today. To reach the levels you are talking about for a current Smalltalk image with millions of objects each having their own thread (or process) isn't going to happen anytime soon.
I work with real hardware.
A couple of numbers:
- Montecito, the new dual-core Itanic, has 1.72 billion transistors.
- The ARM6 macrocell has around 35000 transistors.
- Divide the two, and you will find that you could get more ARM6 cores for the Montecito transistor budget than the ARM6 has transistors.
So we can have a 35K object system with every object having its own CPU core and all message-passing being asynchronous. This is likely to be highly inefficient, with most of the CPUs waiting/idle most of the time, say 99%. With 1% efficiency, and say, a 200MHz clock, the effective throughput would still be 200M * 35000 / 100 = 70 billion instructions per second. That's a lot of instructions. And wait, what happens if we have some really parallel algorithm that cranks efficiency up to 10%!
I am not saying any of these numbers are valid or that this is a realistic system, but I do find the numbers of that little thought experiment... interesting. And of course, while Moore's law appears to have stopped for cycle times, it does seem to still be going for transistors per chip.
Marcel
Interesting. I really think that to make real progress in parallelization we will have to walk away from the Intel model of shared memory stacked on 3 levels of cache, snoopy busses to propagate writes and so on. Of course they will force that model to scale to a point, but it can't go on forever and there just has to be a simpler way.
On 10/26/07, Marcel Weiher marcel.weiher@gmail.com wrote:
...
On 25-Oct-07, at 9:25 PM, Marcel Weiher wrote:
- Montecito, the new dual-core Itanic, has 1.72 billion transistors.
- The ARM6 macrocell has around 35000 transistors.
- Divide the two, and you will find that you could get more ARM6 cores for the Montecito transistor budget than the ARM6 has transistors.
Nicely pointed out Marcel! I've been trying to make a similar point for about, oh two decades now....
In fact around ten years ago TI announced some new technology relating to wafer scale fabrication (I think, don't hold me to this) and as an illustration of its possibilities they said it meant they could put (something like) 128 StrongARM cpus each with 4MB ram on a wafer. Now let's say we take an easy path and put a mere 1000 ARM cores on a chip, so as to leave some room for caches and transputer-like links (I think someone actually did those for ARM at some point in the past) and interface stuff. ARM 1176 cores are rated for 800MHz with claims of up to 1GHz, so we have potential for a trillion instructions per second. Even Microsoft would surely have trouble soaking up that much cpu with pointless fiddle-faddle.
If we got no better than 1% useful work because of poor code we'd still be getting 10 gips.
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Useful Latin Phrases:- Utinam logica falsa tuam philosophiam totam suffodiant! = May faulty logic undermine your entire philosophy!
tim Rowledge wrote:
...
Hi,
That is essentially what Tilera is doing with their Tile-N processors (where N is 36, 64, 128, 1024, 4096, ...). They are shipping the Tile-64 chip now or shortly. http://www.Tilera.com.
They have a design "kill rule" which states that if they increase the surface area by N% the cpu performance must also increase by at least N%.
The Itanium however is an awesome processor in its own right regardless of the number of transistors it uses. It has predicate registers plus 128 64-bit integer registers and 128 floating point registers. Lots of registers, so the arguments about not enough registers can be put to bed. In fact the register file is sort of like, but not quite like, the Sun Sparc processors'. It has instruction level parallelism, which is good for a great many problems. Overall a very interesting and powerful processor.
When it comes to transistor budgets your analysis is correct... and may win the day in the marketplace. We'll see if Tilera or Intel will bring these internally networked grid chips to the mainstream market.
Peter
-----Original Message----- From: squeak-dev-bounces@lists.squeakfoundation.org [mailto:squeak-dev-bounces@lists.squeakfoundation.org] On behalf of Marcel Weiher Sent: Friday, 26 October 2007 01:26 To: peter@smalltalk.org; The general-purpose Squeak developers list Subject: Re: Multi-core CPUs
...
Marcel, as Smalltalk always shines at scaling in complexity, I posted, back at the beginning of this matter, about the different dimensions of scalability. For the CPU, if cycle time is vertical and cores are horizontal, we are, as you suggest, entering a horizontal CPU scaling moment (in the coming years) measurable in transistors per chip.
No matter which model we choose to map the conceptual model onto boolean processors (due to the holy transistors), it will have an impedance mismatch.
When we select a solution, our trade-off will necessarily be made by balancing that impedance, caused by complexity, between machines (the boolean domain) and persons (the conceptual domain).
Ironically, this industry made by persons has an incredible talent to make things easier for machines at the cost of polluting the conceptual model.
Given that, we can choose a path that pollutes the conceptual model, or not. As I see things, polluting the conceptual model is a "shot in the foot". I think the Smalltalk community should again prioritize the heuristic spirit of Smalltalk by showing a willingness to avoid injecting pollution into the conceptual model. Anyway, it's our choice.
Regards,
Sebastian
Peter,
I also want to thank you for this link: http://www.greenteapress.com/semaphores/downey05semaphores.pdf I started to read it after David's comment about it and it is entertaining and I am learning lots.
I also plan on using it in phase 3 of my multi-threaded vm project.
Phase 1, my current phase, is to get all msg sends to be interceptable by the SqueakElib promise framework. This includes things that have been macro transformed by the Compiler, like #ifTrue:, #ifNil:, #whileTrue:, and so on. It also includes bytecodeMethods like #class and #==.
Phase 2 is to allow all primitives and bytecode methods to have a promise as an argument. Here, my plan is to stop short the primitive call and send the encapsulated primitive call to the promise(s) as part of a whenMoreResolved call. When the promise resolves, the primitive call will be made. QoS can be satisfied by joining the promise with a timer, such that if the promise does not resolve in xxx milliseconds, it will become broken and the primitive call will "fail".
Phase 3 is to make the Interpreter multithreaded, while protecting ObjectMemory with Semaphores. I have a quad-core chip and so I want 4 Interpreter threads (Vats). Only one of them can be inside of ObjectMemory at a time and that could be for purposes of allocation, mutation, or GC. It's possible that a simple mutex semaphore would suffice, initially. In this model, references to objects in other Vats will be ThreadRefs (a form of a FarRef) and msgs will be serialized to the other Vat (reassigned the VatID in the same shared ObjectMemory, or copied to a different but co-located ObjectMemory).
I don't think having a single ObjectMemory will scale to 10's of "processors", but will probably also need to be multithreaded with one per Vat. It's good from the standpoint of no shared memory. One challenge then is what if refs from 2 Vats are involved in the same primitive call. Well, memory reads don't have to be protected, unless memory can be relocated, that is. One thing at a time, I tell myself.
I have 0 experience in this area (Interpreter+ObjectMemory), but I thought it would be fun. Your link will help tremendously.
Cheers, Rob
Rob Withers wrote:
...
Hi Rob,
Yeah, it's an awesome little book that cuts right to the chase. I particularly like that they show some solutions that won't work as concurrency is quite difficult and sometimes you think it's correct when it isn't. It's good to learn about those pitfalls.
Your plan sounds excellent. Thank you for taking up the task of making the Squeak VM multi-threaded with native threads!
If you need anything...
All the best,
Peter
http://news.squeak.org/2007/10/26/wait-for-it-the-little-book-of-semaphores/
I like it too! :)
Ron Teitelbaum Squeak News Team Leader
-----Original Message----- From: Peter William Lount
Rob Withers wrote:
Peter,
I also want to thank you for this link: http://www.greenteapress.com/semaphores/downey05semaphores.pdf I started to read it after David's comment about it; it is entertaining and I am learning lots.
I also plan on using it in phase 3 of my multi-threaded vm project.
Phase 1, my current phase, is to get all msg sends to be interceptable by the SqueakElib promise framework. This includes things that have been macro transformed by the Compiler, like #ifTrue:, #ifNil:, #whileTrue:, and so on. It also includes bytecodeMethods like #class and #==.
Phase 2 is to allow all primitives and bytecode methods to have a promise as an argument. Here, my plan is to stop the primitive call short and send the encapsulated primitive call to the promise(s) as part of a whenMoreResolved call. When the promise resolves, the primitive call will be made. QoS can be satisfied by joining the promise with a timer, such that if the promise does not resolve in xxx milliseconds, it will become broken and the primitive call will "fail".
Phase 3 is to make the Interpreter multithreaded, while protecting ObjectMemory with Semaphores. I have a quad-core chip and so I want 4 Interpreter threads (Vats). Only one of them can be inside of ObjectMemory at a time and that could be for purposes of allocation, mutation, or GC. It's possible that a simple mutex semaphore would suffice, initially. In this model, references to objects in other Vats will be ThreadRefs (a form of a FarRef) and msgs will be serialized to the other Vat (reassigned the VatID in the same shared ObjectMemory, or copied to a different but co-located ObjectMemory).
I don't think having a single ObjectMemory will scale to tens of "processors"; it will probably need to become one ObjectMemory per Vat. That is good from the standpoint of no shared memory. One challenge then is: what if refs from 2 Vats are involved in the same primitive call? Well, memory reads don't have to be protected, unless memory can be relocated, that is. One thing at a time, I tell myself.
I have 0 experience in this area (Interpreter+ObjectMemory), but I thought it would be fun. Your link will help tremendously.
Cheers, Rob
Hi Rob,
Yeah, it's an awesome little book that cuts right to the chase. I particularly like that they show some solutions that won't work as concurrency is quite difficult and sometimes you think it's correct when it isn't. It's good to learn about those pitfalls.
Your plan sounds excellent. Thank you for taking up the task of making Squeak VM multi-threaded with native threads!
If you need anything...
All the best,
Peter
Hi,
It seems that the "patterns" of synchronization in "The Little Book of Semaphores" are just that, patterns. Like other patterns they could be implemented as abstract and concrete classes, so that rather than having to rewrite the solutions from scratch each time, they are off the shelf and available for use. A class library of synchronization using semaphores might help enable people to leverage multi-threading on N-core CPUs in Smalltalk (where N is greater than or equal to 1), using green threads, native threads, or both.
Just a thought.
Cheers,
peter
"The Little Book of Semaphores" http://www.greenteapress.com/semaphores/downey05semaphores.pdf
----- Original Message ----- From: "Peter William Lount" peter@smalltalk.org
Your plan sounds excellent. Thank you for taking up the task of making Squeak VM multi-threaded with native threads!
If you need anything...
I don't want to make it sound like I can't use some help, especially if it's offered. I can't do this alone; forget it, especially with the day job. No, I figure it to be a 2 year task, at least. But I would rather build something than talk about all the theory. I fleshed out the phases I posted earlier with what I thought were some more manageable tasks. I'd like to point out that Phase 3, implementing the multithreaded vm, is entirely independent of SqueakElib and would be usable by anyone wanting to do multithreading.
Here's the new page, add what you like, help where you can, holler to talk it over: http://wiki.squeak.org/squeak/6011
Cheers, Rob
On 10/26/07, Peter William Lount peter@smalltalk.org wrote:
Your plan sounds excellent. Thank you for taking up the task of making Squeak VM multi-threaded with native threads!
Yes, thanks. I will need a true multi-threaded VM at some point as well. I just have to make it transparent to the processes running in the VM. :)
Jason Johnson wrote:
On 10/26/07, Peter William Lount peter@smalltalk.org wrote:
Your plan sounds excellent. Thank you for taking up the task of making Squeak VM multi-threaded with native threads!
Yes, thanks. I will need a true multi-threaded VM at some point as well. I just have to make it transparent to the processes running in the VM. :)
Hi,
I really do like the notion of easy multi threading - really. I've admired Erlang for what it's achieved in that regard for years now. I encourage everyone interested in that to keep persevering and searching for a practical way forward towards your vision.
All the best,
Peter
Well, what I plan to try out isn't the only way, and probably not the best, but I think it's a baby step in the right direction. As Andreas pointed out, there are other solutions that may even be better from a high level point of view (message passing still requires careful design).
I really believe shared state concurrency with fine grained locking can't scale much further than it already has. And I'm by no means the only one. Here is another thread on the matter:
http://lambda-the-ultimate.org/node/2048
On 10/27/07, Peter William Lount peter@smalltalk.org wrote:
Jason Johnson wrote: On 10/26/07, Peter William Lount peter@smalltalk.org wrote:
Your plan sounds excellent. Thank you for taking up the task of making Squeak VM multi-threaded with native threads!
Yes, thanks. I will need a true multi-threaded VM at some point as well. I just have to make it transparent to the processes running in the VM. :)
Hi,
I really do like the notion of easy multi threading - really. I've admired Erlang for what it's achieved in that regard for years now. I encourage everyone interested in that to keep persevering and searching for a practical way forward towards your vision.
All the best,
Peter
The Design and Implementation of ConcurrentSmalltalk
http://www.amazon.com/Implementation-Concurrent-Smalltalk-Computer-Science/d...
From the Introduction: "In Concurrent Smalltalk, an object is not only
a unit of data abstraction but also a unit of execution."
On 10/25/07, Jason Johnson jason.johnson.081@gmail.com wrote:
On 10/24/07, Sebastian Sastre ssastre@seaswork.com wrote:
So I'm stating here that in a Smalltalk image of the future *every object should have a process*. Every instance. All of them.
That is an interesting idea. That would open a door to a new way of garbage collection, because it could then be tied to the exit of a process.
That said, I return to the problem you stated about the need to copy, copy, copy: this premise changes things, and you don't need to copy anymore, because a VM like that will guarantee that, no matter who or when, any write to an instVar of an object will be made by the process that corresponds to that instance.
Yes, in such a system, you don't need to copy because all that gets passed around are references to processes.
On 10/24/07, Igor Stasenko siguctua@gmail.com wrote:
This has a perspective only if you have unlimited memory resources and zero-cost memory allocation.
I don't understand.
Let's look at this more precisely. I will write only in ST (I don't know Erlang) and, assuming that I understood your concept well, consider the following ST code:
SomeClass>>setVars
    self setVar1: value1.
    self setVar2: value2.
    ...
    ^ self
Here, at each message send, instead of writing to the receiver's memory, we do copy-on-write cloning.
No, this is exactly what we *do not* do. As I have mentioned several times, I want message passing to be explicit. I had hoped the Erlang code would be clear; since it wasn't, here is the same thing again in proposed Smalltalk code:
SomeProcess>>run
    "arbitrary name, doesn't have to be run or anything like that"
    self breakupStructureWith: 10000.
    self buildNewStructureFrom: 10000.
    ^ structure
SomeProcess>>breakupStructureWith: aProcessCount
    | rest |
    rest := structure.
    1 to: aProcessCount do: [ rest := self splitAndSend: rest ]
SomeProcess>>buildNewStructureFrom: aProcessCount
    1 to: aProcessCount do: [ | data |
        data := self process receive.
        self addDataToStructure: data ]
OK, not optimal code in either case, but this Smalltalk code is the equivalent of what the Erlang code above did. Note that in ST I'm not passing the structure around or doing recursion, because in ST I can modify variables. Also note that in the Erlang example and this example the actual send was not shown. The functions (split_and_send and splitAndSend respectively) were not shown because I didn't want to write a bunch of code breaking up some imaginary structure.
These are all just normal message sends. The only interprocess stuff is the unshown send (I had planned to use the binary message #!) and the receive method.
So, each time we modify an object we get a modified copy instead of modifying the original.
Why? Inside a given process I don't see a reason to disallow regular mutability.
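Jason's explicit-messaging model, a private mailbox per process with an explicit send (his proposed binary #!) and a blocking receive, can be sketched with queues; the Python names below are invented, not part of his proposal:

```python
import queue
import threading

class Process:
    """A process with a private mailbox; the only interprocess
    operations are an explicit send and an explicit receive."""
    def __init__(self):
        self.mailbox = queue.Queue()

    def send(self, message):          # the role of Jason's proposed #!
        self.mailbox.put(message)

    def receive(self):                # blocks until a message arrives
        return self.mailbox.get()

consumer = Process()
def run_consumer(out):
    total = 0
    for _ in range(3):
        total += consumer.receive()   # pull three chunks, like the example
    out.append(total)

out = []
t = threading.Thread(target=run_consumer, args=(out,))
t.start()
for chunk in (1, 2, 3):
    consumer.send(chunk)              # explicit, fire-and-forget
t.join()
print(out[0])  # 6
```

Ordinary message sends inside a process remain untouched; only `send` and `receive` cross the process boundary, which is the whole point of keeping the passing explicit.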
On 24/10/2007, Jason Johnson jason.johnson.081@gmail.com wrote:
On 10/24/07, Igor Stasenko siguctua@gmail.com wrote:
This has a perspective only if you have unlimited memory resources and zero-cost memory allocation.
I don't understand.
Let's look at this more precisely. I will write only in ST (I don't know Erlang) and, assuming that I understood your concept well, consider the following ST code:
SomeClass>>setVars
    self setVar1: value1.
    self setVar2: value2.
    ...
    ^ self
Here, at each message send, instead of writing to the receiver's memory, we do copy-on-write cloning.
No, this is exactly what we *do not* do. As I have mentioned several times, I want message passing to be explicit. I had hoped the Erlang code would be clear; since it wasn't, here is the same thing again in proposed Smalltalk code:
SomeProcess>>run
    "arbitrary name, doesn't have to be run or anything like that"
    self breakupStructureWith: 10000.
    self buildNewStructureFrom: 10000.
    ^ structure
SomeProcess>>breakupStructureWith: aProcessCount
    | rest |
    rest := structure.
    1 to: aProcessCount do: [ rest := self splitAndSend: rest ]
SomeProcess>>buildNewStructureFrom: aProcessCount
    1 to: aProcessCount do: [ | data |
        data := self process receive.
        self addDataToStructure: data ]
OK, not optimal code in either case, but this Smalltalk code is the equivalent of what the Erlang code above did. Note that in ST I'm not passing the structure around or doing recursion, because in ST I can modify variables. Also note that in the Erlang example and this example the actual send was not shown. The functions (split_and_send and splitAndSend respectively) were not shown because I didn't want to write a bunch of code breaking up some imaginary structure.
These are all just normal message sends. The only interprocess stuff is the unshown send (I had planned to use the binary message #!) and the receive method.
So, each time we modify an object we get a modified copy instead of modifying the original.
Why? Inside a given process I don't see a reason to disallow regular mutability.
Aha, now I get it. So, your approach is to establish a fence between different processes, so they can't share objects. Or maybe it is more correct to say that any callee process can have read-only access to any objects which belong to the caller process?
It's unclear how you would determine to which process an object belongs. This would at minimum require an additional slot per object (OK, this is doable easily).
Also, it's unclear how you would persist state (or results of computation), since all you can do now is send a message to a process, which will return an object in answer. Now, since the returned object most probably belongs to the callee process, you must copy it to the caller process. But in reality you should take care of copying a whole subgraph of objects (since you can return a collection of newly created objects, which in turn can be collections, etc., all belonging to the callee process). Then, after you are done merging the graph, you can simply wipe all memory which was allocated by the callee process. This part is easy.
Now, the most interesting part: mutating objects in the caller process. Suppose my starting process calls two different processes, and they come to the point that they want to update the state of some object(s) in the caller process (to be clear: process A contains object a; it calls processes B and C in parallel, and now B and C want to change the state of object a). This could be done by detecting that the active process tries to perform a write to an object which does not belong to the current process, so you could transform this attempt into an implicit message sent to the caller process, like: callerProcess setInstVarOf: object index: x value: y or callerProcess setIndexVarOf: object index: x value: y (the number of cases is not very interesting here)
Now, if that _is_ allowed, we have a race condition when two or more processes try to update the state of the same object(s) in the parent process. And there is no way other than !! locking !! semantics to solve this. Or maybe I'm wrong here? ;)
If this is not allowed, then you must go one of two ways: - do copy-on-write at any attempt to update a 'foreign' object. This raises a new problem: how to merge the copied and modified state of an object with the previous one? How to propagate these changes to other processes? (If you don't propagate them, then you actually break the semantics.)
- generate an exception on any write attempt. Basta. This option diverges your implementation from any current implementation of Smalltalk, and it's impossible to adapt old code to a new VM with such limitations.
On 10/24/07, Igor Stasenko siguctua@gmail.com wrote:
Aha, now I get it.
Good, I should have just posted some theoretical Smalltalk code to begin with; this thread would probably be half as big. :)
So, your approach is to establish a fence between different processes, so they can't share objects. Or maybe it is more correct to say that any callee process can have read-only access to any objects which belong to the caller process?
No, the plan was that since in Smalltalk objects are mutable, I will have to pay an extra cost for internal message sends and have the VM do a deep copy for the sent objects.
Another alternative would be to introduce an immutable flag on references, then the "receiver" gets a reference to the object but flagged immutable. This way might be better, but requires more changes.
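The immutable-flag alternative can be mimicked in Python with a read-only view: the other process can read through the reference, but any write through it raises. `types.MappingProxyType` plays the role of the flagged reference here; this is only an analogy, not a VM design:

```python
import types

owned = {"balance": 100}                     # mutable only inside its owning process
shared_view = types.MappingProxyType(owned)  # what another process would receive

print(shared_view["balance"])    # reads through the reference are fine: 100
try:
    shared_view["balance"] = 0   # writes through the reference are rejected
except TypeError:
    print("write refused")

owned["balance"] = 50            # the owner can still mutate freely
print(shared_view["balance"])    # and the view tracks it: 50
```

Note the last line also shows why the flag alone is weaker than a copy: the receiver can observe the owner's later mutations, which a deep copy would prevent.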
It's unclear how you would determine to which process an object belongs. This would at minimum require an additional slot per object (OK, this is doable easily).
Not needed. The boundaries are: object creation, object send and object receive. All controlled from the VM or in the library, so I just have to guarantee that a mutable object can never sneak out of its process, i.e. a process can never get a mutable reference to an object owned by a different process.
Also, unclear how you would persist state (or results of computation). Since all you can do now is to send a message to process, which will return an object in answer.
No, all message sends are async, fire-and-forget. You don't get a future or anything back. You can easily build sync messages on top of this if you want, but the base system isn't planned to directly support it.
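Building a synchronous call on top of async fire-and-forget sends is indeed straightforward: put a private reply queue into the message and block on it. A minimal sketch with invented names:

```python
import queue
import threading

def server(mailbox):
    """Only ever handles async messages of the form (payload, reply_queue)."""
    while True:
        payload, reply_to = mailbox.get()
        if payload is None:           # invented shutdown convention
            break
        reply_to.put(payload * 2)     # the reply is itself a fire-and-forget send

def sync_call(mailbox, payload):
    """A synchronous request/reply built from two async sends."""
    reply_to = queue.Queue()
    mailbox.put((payload, reply_to))
    return reply_to.get()             # block until the reply message arrives

mailbox = queue.Queue()
t = threading.Thread(target=server, args=(mailbox,))
t.start()
print(sync_call(mailbox, 21))  # 42
mailbox.put((None, None))      # shut the server down
t.join()
```

The base system stays purely asynchronous; the blocking behavior lives entirely in the caller's `sync_call` wrapper, which is exactly the layering described above.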
Now, since the returned object most probably belongs to the callee process, you must copy it to the caller process. But in reality you should take care of copying a whole subgraph of objects (since you can return a collection of newly created objects, which in turn can be collections, etc., all belonging to the callee process). Then, after you are done merging the graph, you can simply wipe all memory which was allocated by the callee process. This part is easy.
Ah, if you're talking about the receive call, yes, that will get an object returned. My first cut will just be (as mentioned above) a deep copy. Performance will likely drive me to adding immutable references, or objects, or something. Or perhaps what you're suggesting here.
Now, the most interesting part: mutating objects in the caller process. Suppose my starting process calls two different processes, and they come to the point that they want to update the state of some object(s) in the caller process (to be clear: process A contains object a; it calls processes B and C in parallel, and now B and C want to change the state of object a).
Can't happen. There is no shared state. Ever. The only thing that's shared is the Process' mail box, but that is an internal VM detail, not visible to the processes.
(remaining comments snipped since I think they assume shared state which does not exist in my plan. If you have some reason you think I can't avoid it let me know because I don't see it so far).
On 10/24/07, Jason Johnson jason.johnson.081@gmail.com wrote:
No, the plan was that since in Smalltalk objects are mutable, I will have to pay an extra cost for internal message sends and have the VM do a deep copy for the sent objects.
Ack, terminology overload. :) What I meant here is: obviously, if I send a message between two literal images, there is no choice but to do a deep copy. Erlang gains some benefit by sending interprocess messages by reference when the sender and receiver are in the same literal image, but I can't, because Smalltalk can mutate variables. So this means I have to do the deep copy in *every* case. Unless I make some changes.
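The cost Jason accepts, a deep copy at every send so that no mutable object can cross a process boundary, looks like this in Python; the channel class is a stand-in for the VM's interprocess send, not anyone's actual design:

```python
import copy
import queue

class CopyingChannel:
    """A send boundary that always deep-copies, so the sender's later
    mutations can never be observed by the receiver."""
    def __init__(self):
        self.q = queue.Queue()

    def send(self, obj):
        self.q.put(copy.deepcopy(obj))   # the unavoidable per-send cost

    def receive(self):
        return self.q.get()

channel = CopyingChannel()
structure = {"items": [1, 2, 3]}
channel.send(structure)
structure["items"].append(4)      # sender mutates after the send

received = channel.receive()
print(received["items"])  # [1, 2, 3]: the receiver's copy is unaffected
```

Erlang skips this copy for same-node sends because its data is immutable; with mutable Smalltalk objects, the copy (or an immutability flag) is what restores that isolation.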
Jason Johnson wrote:
On 10/24/07, Jason Johnson jason.johnson.081@gmail.com wrote:
No, the plan was that since in Smalltalk objects are mutable, I will have to pay an extra cost for internal message sends and have the VM do a deep copy for the sent objects.
Ack, terminology overload. :) What I meant here is: obviously, if I send a message between two literal images, there is no choice but to do a deep copy. Erlang gains some benefit by sending interprocess messages by reference when the sender and receiver are in the same literal image, but I can't, because Smalltalk can mutate variables. So this means I have to do the deep copy in *every* case. Unless I make some changes.
Hi,
No, you'd not have to deep copy every time you send the messages. You can send references and when accessing them in the remote image (or image B if you prefer) you can ask the local image (or image A if you prefer) to send the missing data. Now this assumes that the objects in image A didn't change in the meantime. Yikes. Problems are getting worse. You can't avoid them. There is no silver bullet with this attempt at simplifying concurrency. It's a harsh reality.
Cheers,
Peter
On 10/25/07, Peter William Lount peter@smalltalk.org wrote:
Hi,
No, you'd not have to deep copy every time you send the messages. You can send references and when accessing them in the remote image (or image B if you prefer) you can ask the local image (or image A if you prefer) to send the missing data.
Or I can just not do that, do the deep copy and not have the problems mentioned in the rest of your mail. Again you are talking about something that *I'm not* and then explaining why *your approach* is hard to do.
Now this assumes that the objects in image A didn't change in the meantime. Yikes. Problems are getting worse. You can't avoid them. There is no silver bullet with this attempt at simplifying concurrency. It's a harsh reality.
Cheers,
Peter
The insight that Bell Labs had with Unix over the mainframe makers was that *we don't need a silver bullet*. We need to get 90% and provide some way that the small % of people who need more can use.
Hi,
Slicing up the data objects and shipping them in one deep copied parcel from one Smalltalk image to another isn't a general solution. It may be a solution for some problems that can be simplified but it will NOT work for the majority of problems. In fact I'd find a "deep copy and split up the data objects only solution" quite useless for most of the parallel problems that I'm working on solving.
In the general large scale case that I gave as an example (in an earlier email) the one million data objects could be retrieved or accessed by any of the 10,000 processes used in the example. While shipping them all in a deep copied parcel in one message is possible it's not always the wisest move. If the compute nodes are going offline then it may be required, but otherwise the general case of shipping references and a core set of objects is a better approach. In the example it was the "search patterns" that were sliced up across the processes. By slicing up the data objects across the processes the example given won't even work! This of course alters the example in a dramatic way. Now this might be successful for that group of problems, such as rendering where the pieces are independent.

A key characteristic of the general problems is that the data objects can and must be accessible from ANY of the forked off processes with ANY of them being able to alter the objects at any point in time with those changes being propagated back to the central node (assuming there is just one central node) when a commit occurs and then updating the other forked off processes that have an interest in seeing updates to objects in mid transaction. Some of these changes will of course nullify the work of some of these processes requiring them to abort and possibly start over with the newest changes. Too many interacting changes between the processes will of course cause too many aborts and retries (assuming that's the chosen mechanism for dealing with overlapping changes that are mutually exclusive resulting in inconsistency to the data objects).
So while it's useful for some problems to simplify things by splitting up the data and spreading it across N process nodes, the set of problems for which that is not viable is much larger than the set for which it is. Solving for 90% of the cases will thus require much more than what is being proposed by the simplify-concurrency-at-the-loss-of-capability proponents.
It should be noted that there isn't one solution for the general case. What are needed are solutions that cover various chunks of the solution space and a way of selecting the correct solution mechanisms either manually or automatically (preferred if viable). Then the "deep copy and split up the data objects only solution" may do its part as a piece in a wider matrix of solutions. To dispense with the tools we need to solve the general problems is folly IMHV (In My Humble View).
An excellent book for learning the ins and outs of concurrency control - and most importantly the common mistakes - is the free PDF book, "The Little Book of Semaphores", by Allen B. Downey and his students: http://www.greenteapress.com/semaphores/downey05semaphores.pdf. Enjoy the full power of parallel programming.
As an aside, one of the reasons that we don't have better object filing out across all the Smalltalk versions is that the original Smalltalk only provided the extremes of shallow and deep copying of objects. To work the middle ground where only portions of an object graph were copied took a lot of work since you had to write it from scratch each time. What is needed is a general purpose method of doing this important job for the widest range of use cases.
All the best,
Peter William Lount
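Peter's middle ground between shallow and deep copying, copying only a portion of an object graph, can be sketched as a depth-limited copy in which everything past the cutoff is shared by reference. The depth policy is just one of many possible; this is an illustration, not the general-purpose mechanism he calls for:

```python
def copy_to_depth(obj, depth):
    """Copy dicts and lists down to `depth` levels; share everything deeper."""
    if depth == 0:
        return obj                      # past the cutoff: share, don't copy
    if isinstance(obj, dict):
        return {k: copy_to_depth(v, depth - 1) for k, v in obj.items()}
    if isinstance(obj, list):
        return [copy_to_depth(v, depth - 1) for v in obj]
    return obj                          # leaves are shared either way

graph = {"meta": {"name": "demo"}, "rows": [[1, 2], [3, 4]]}
partial = copy_to_depth(graph, 1)

print(partial is graph)                   # False: the top level was copied
print(partial["meta"] is graph["meta"])   # True: the deeper part is shared
```

A production version would let the traversal policy be pluggable (by class, by slot, by marker objects) rather than by raw depth, which is closer to the "general purpose method" the message asks for.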
On 10/25/07, Peter William Lount peter@smalltalk.org wrote:
Hi,
Slicing up the data objects and shipping them in one deep copied parcel from one Smalltalk image to another isn't a general solution. It may be a solution for some problems that can be simplified but it will NOT work for the majority of problems. In fact I'd find a "deep copy and split up the data objects only solution" quite useless for most of the parallel problems that I'm working on solving.
Ok, you just made a jump there from "majority of problems" to "most of the problems that *I'm* working on". It works just fine for the problems I'm interested in at the moment, and obviously for most of the problems businesses, etc. are using Erlang for.
So what are these "majority of problems" you're talking about and who is solving them now?
In the general large scale case that I gave as an example (in an earlier email) the one million data objects could be retrieved or accessed by any of the 10,000 processes used in the example. While shipping them all in a deep copied parcel in one message is possible it's not always the wisest move.
And why would you do that? No paradigm can remove the responsibility to think from the programmer. As I've said I don't know how many times, the point of this is simply to move this to design where it belongs instead of implementation where it is now.
Sharing memory is breaking encapsulation.
A key characteristic of the general problems is that the data objects can and must be accessible from ANY of the forked off processes with ANY of them being able to alter the objects at any point in time with those changes being propagated back to the central node (assuming there is just one central node) when a commit occurs and then updating the other forked off processes that have an interest in seeing updates to objects in mid transaction. Some of these changes will of course nullify the work of some of these processes requiring them to abort and possibly start over with the newest changes. Too many interacting changes between the processes will of course cause too many aborts and retry (assuming that's the chosen mechanism for dealing with overlapping changes that are mutually exclusive resulting in inconsistency to the data objects).
You lost me again. You seem to come up with a problem *and the way you want it solved* and then complain that a message passing paradigm can't solve it. What does that prove?
Solving for 90% of the cases will thus require much more than what is being proposed by the simplify-concurrency-at-the-loss-of-capability proponents.
Your 90% is different than mine then. In fact, I can't think of any concurrency problems I wish to resolve that this *won't* handle.
What are needed are solutions that cover various chunks of the solution space and a way of selecting the correct solution mechanisms either manually or automatically (preferred if viable). Then the "deep copy and split up the data objects only solution" may do its part as a piece in a wider matrix of solutions. To dispense with the tools we need to solve the general problems is folly IMHV (In My Humble View).
We already have. Or is Squeak/most Smalltalks folly? There are certain classes of memory management that don't work well with a GC, but would with manual memory management. Is it a mistake to deny its use at the language level?
An excellent book for learning the ins and outs of concurrency control - and most importantly the common mistakes - is the free PDF book, "The Little Book of Semaphores", by Allen B. Downey and his students: http://www.greenteapress.com/semaphores/downey05semaphores.pdf. Enjoy the full power of parallel programming.
For me that's just old-think: solutions to incredibly complex problems we never would have had if we had simply avoided violating encapsulation.
Hi,
Jason Johnson wrote:
On 10/25/07, Peter William Lount peter@smalltalk.org wrote:
Hi,
Slicing up the data objects and shipping them in one deep copied parcel from one Smalltalk image to another isn't a general solution. It may be a solution for some problems that can be simplified but it will NOT work for the majority of problems. In fact I'd find a "deep copy and split up the data objects only solution" quite useless for most of the parallel problems that I'm working on solving.
Ok, you just made a jump there from "majority of problems" to "most of the problems that *I'm* working on". It works just fine for the problems I'm interested in at the moment, and obviously for most of the problems businesses, etc. are using Erlang for.
So what are these "majority of problems" you're talking about and who is solving them now?
You're splitting hairs with my use of English. The vast majority of concurrency problems won't fit well with the process-based model of concurrency (of Erlang and other systems).
The process-based model of concurrency won't solve ALL the problems. Consider that there are an infinite number of concurrency problems. Are you telling me that the solution you are proposing solves them all?
Sure if you have a set of problems that you are solving that fit perfectly with the process-based model of concurrency - as used in Erlang - then excellent use that approach to solve your problems. Go for it.
However, please don't force your "process-based ONLY model of concurrency" onto Smalltalk, since there are those of us out here with other real-world problems that the process-based model of concurrency won't solve easily or at all. This also applies to the deep copying of objects as the only solution; it works for just a tiny subset of problems. Please don't force ill-conceived deep-copying solutions upon us as the ONLY solution.
One solution won't fit all. That's the point of a programming language - to be general purpose enough to enable programmers or users to solve as wide a range of problems as possible.
It sounds like you need to put your code into a library of objects that are optional rather than forcing them into the virtual machine.
In the general large scale case that I gave as an example (in an earlier email) the one million data objects could be retrieved or accessed by any of the 10,000 processes used in the example. While shipping them all in a deep copied parcel in one message is possible it's not always the wisest move.
And why would you do that? No paradigm can remove the responsibility to think from the programmer. As I've said I don't know how many times, the point of this is simply to move this to design where it belongs instead of implementation where it is now.
Sharing memory is breaking encapsulation.
No, sharing memory does not necessarily break encapsulation. Many things break encapsulation for sure. Blocks for one (see Alan Kay's comments on this).
It's how you look at it. If many processes are accessing an object via the object's methods then encapsulation is not broken, since the object is in control of who can see what and when. It doesn't matter how many processes see the object if the object is in control of what is happening to it. That is a current capability of Smalltalk.
A key characteristic of the general problems is that the data objects can and must be accessible from ANY of the forked off processes with ANY of them being able to alter the objects at any point in time with those changes being propagated back to the central node (assuming there is just one central node) when a commit occurs and then updating the other forked off processes that have an interest in seeing updates to objects in mid transaction. Some of these changes will of course nullify the work of some of these processes requiring them to abort and possibly start over with the newest changes. Too many interacting changes between the processes will of course cause too many aborts and retry (assuming that's the chosen mechanism for dealing with overlapping changes that are mutually exclusive resulting in inconsistency to the data objects).
You lost me again. You seem to come up with a problem *and the way you want it solved* and then complain that a message passing paradigm can't solve it. What does that prove?
I apologize if my writing wasn't clear.
I provide an example and one way (of many ways) to solve it. Yes. That's being responsible from my perspective: since I know a solution that will work, I'm not going to hide it as some sort of covert test.
Obviously the message passing paradigm of Smalltalk can solve the problem since that's what I've presented. What's not clear is how the "process-based model of concurrency" - of Erlang - can solve every concurrency problem out there. What's not clear is how Erlang would solve the example I put forward without changing it into a slice and dice of the original one million objects.
The point, or "what does it prove", is that no one concurrency solution will solve all the problems. I provided an example to show that, and to show the kind of nasty problem that simpler concurrency solutions can't handle.
See the "Little Book of Semaphores" that I linked to for many other solutions to concurrency problems.
Solving for 90% of the cases will thus require much more than what is being proposed by the proponents of simplifying concurrency at the cost of capability.
Your 90% is different from mine then. In fact, I can't think of any concurrency problems I wish to solve that this *won't* handle.
As I said in another recent email today, your set of problems is of a nature that can be solved that way. That's great. However, please don't force your approach onto us all, since it won't work for all of us.
Please put your solution into a library so that those who wish to use it can do so if they so choose. Thanks.
What are needed are solutions that cover various chunks of the solution space, and a way of selecting the correct solution mechanism either manually or automatically (preferred, if viable). Then the "deep copy and split up the data objects" solution may do its part as a piece in a wider matrix of solutions. To dispense with the tools we need to solve the general problems is folly IMHV (In My Humble View).
We already have.
How so?
Or is Squeak/most Smalltalks folly? There are certain classes of memory management that don't work well with a GC but would with manual memory management. Is it a mistake to deny their use at the language level?
Sure, Smalltalk has limits to what it can do. It is very important to recognize its limits.
However, within its current limits Smalltalk can work powerfully with concurrency (albeit in a single native thread for most, but not all, implementations of Smalltalk). To reduce this current capability to solving just a subset of concurrency problems is what I'm calling folly.
If you can provide a proof that your proposed process-based model of concurrency methodology can solve all the concurrency problems that can currently be solved by Smalltalk then you'll have convinced me (and likely others). So far I'm still not even clear what you are proposing.
I work quite hard to implement more power and capability for Smalltalk not take it away. Altering the Smalltalk language - or even one version of it, Squeak - in such a way as to make it less powerful by removing concurrency control capabilities isn't going to fly with me and a large number of users of Squeak.
If you wish to do that then you'll likely need to be very creative and put your concurrency solution into an optional class library somehow. That will fly. Or fork a new version off of Squeak. That would also fly with many or almost all I suspect.
In fact there are many small versions of Smalltalk such as Little Smalltalk or Susie Smalltalk or ... that are quite limited (actually almost all Smalltalks are quite limited when compared with VisualAge with Envy but that's another discussion ;--). They are separate versions of Smalltalk for that reason and it's a good thing too. (I use Susie Smalltalk and Squeak Smalltalk in the Smalltalk.org web site for various jobs - the right tool for the right job).
Some of the innovations that I've encountered or created will enable Smalltalk to be a more powerful language while retaining all aspects of Smalltalk - if they are adopted. (See my other postings in this group and on Smalltalk.org for some of the details of those).
Other innovations that I've encountered or created however, alter the Smalltalk language so much so that it's no longer really Smalltalk - more powerful and highly influenced by Smalltalk but clearly not Smalltalk anymore (at least to me). For this reason and others I'm implementing ZokuScript which is highly influenced by Smalltalk but goes beyond it enough that it isn't Smalltalk anymore. At the same time I'm implementing ZokuTalk which is meant to be a version of Smalltalk. Actually for leverage ZokuTalk will be converted to ZokuScript on the fly and the ZokuScript equivalent will be compiled for execution.
In the meantime I regularly use the following Smalltalks for various projects (paying and otherwise): Squeak, Susie, Dolphin, Visual Age, Visual Works, Coke-Pepsi-idst, and less often others. Naturally, in the way of a mobius system that evolves to its next level, I'm writing ZokuScript/ZokuTalk in Smalltalk. Once the Zoku Execution Engine (not a virtual machine) has reached a stage where it can run by itself, development will proceed self-hosted.
An excellent book for learning the ins and outs of concurrency control - and most importantly the common mistakes - is the free PDF book, "The Little Book of Semaphores", by Allen B. Downey and his students: http://www.greenteapress.com/semaphores/downey05semaphores.pdf. Enjoy the full power of parallel programming.
For me that's just old-think: solutions to incredibly complex problems we never would have had if we had simply avoided violating encapsulation.
Well even the Greeks and Babylonians got some things right even if some of their ideas have been refined. As a general principle I concur and support your efforts - as there is some overlap with some of the solutions that I've been working on with Transactions and parallel processing. However, I'm also grounded in the harsh realities of concurrency control in a wide range of contexts - not all problems can be solved by the solution you are presenting. To ignore that is to ignore reality.
If you can however provide a proof, I'd have to adapt, since the science aspect of computing implies an evidence-based approach.
All the best,
Peter
On 10/25/07, Peter William Lount peter@smalltalk.org wrote:
You're splitting hairs with my use of English.
I was under the impression you were a native English speaker. Am I incorrect?
The vast majority of concurrency problems won't fit well with the process-based model of concurrency (of Erlang and other systems).
Process-based model of concurrency won't solve ALL the problems.
And here another jump. It won't solve *the vast majority* or it won't solve *all*? Which is it?
Consider that there are an infinite number of concurrency problems. You're telling me that the solution you are proposing solves them all?
No! Again, will you please stop asking me to defend statements *you made*???
Please don't force ill conceived deep copying solutions upon us as the ONLY solution.
Who is forcing anything on anyone? I'm pointing out what I thought (prior to this thread) was already obvious: shared state locking can't scale. The fact is, Squeak already *has* futures in several incarnations, shared state locking in various incarnations and so on.
I'm simply pointing out that adding true threading like Java has to Squeak is a poor use of resources we don't have.
One solution won't fit all.
And too many unnecessary choices produce indecision and worse.
Options are good. Too many options are less good. Bad options are bad. Look at C++.
No, sharing memory does not necessarily break encapsulation. Many things break encapsulation for sure. Blocks for one (see Alan Kay's comments on this).
Of course it does. Two separate entities that "own" the same resource at the same time.
I provide an example and one way (of many ways) to solve it. Yes. That's being responsible from my perspective: since I know a solution that will work, I'm not going to hide it as some sort of covert test.
That's fine to do, but you seem to assume there is no other solution, and therefore that Actor-style message passing can't handle it.
The point, or "what does it prove", is that no one concurrency solution will solve all the problems. I provided an example to show that, and to show the kind of nasty problem that simpler concurrency solutions can't handle.
And what exactly is going to solve this problem? Shared state fine-grained locking? Across 10k nodes! You can't be serious.
I work quite hard to implement more power and capability for Smalltalk not take it away. Altering the Smalltalk language - or even one version of it, Squeak - in such a way as to make it less powerful by removing concurrency control capabilities isn't going to fly with me and a large number of users of Squeak.
And this is the funniest part. The current model we have (shared-state) is the weakest of the models.
On Wed, Oct 24, 2007 at 11:12:13PM -0700, Peter William Lount wrote:
An excellent book for learning the ins and outs of concurrency control - and most importantly the common mistakes - is the free PDF book, "The Little Book of Semaphores", by Allen B. Downey and his students: http://www.greenteapress.com/semaphores/downey05semaphores.pdf.
Peter, Thanks for this reference. Dave
David T. Lewis wrote:
On Wed, Oct 24, 2007 at 11:12:13PM -0700, Peter William Lount wrote:
An excellent book for learning the ins and outs of concurrency control - and most importantly the common mistakes - is the free PDF book, "The Little Book of Semaphores", by Allen B. Downey and his students: http://www.greenteapress.com/semaphores/downey05semaphores.pdf.
Peter, Thanks for this reference. Dave
Hi,
You're welcome. It's an invaluable book for learning and reviewing the issues with Semaphores.
Peter
David T. Lewis wrote:
On Wed, Oct 24, 2007 at 11:12:13PM -0700, Peter William Lount wrote:
An excellent book for learning the ins and outs of concurrency control - and most importantly the common mistakes - is the free PDF book, "The Little Book of Semaphores", by Allen B. Downey and his students: http://www.greenteapress.com/semaphores/downey05semaphores.pdf.
Peter, Thanks for this reference. Dave
Hi,
You're welcome. It's an invaluable book for learning and reviewing the issues with Semaphores.
It sounds like the other links people submitted are also quite interesting. I've seen a few before but they obviously need a bit more in-depth study to understand everyone's point of view.
Peter
David T. Lewis wrote:
On Wed, Oct 24, 2007 at 11:12:13PM -0700, Peter William Lount wrote:
An excellent book for learning the ins and outs of concurrency control - and most importantly the common mistakes - is the free PDF book, "The Little Book of Semaphores", by Allen B. Downey and his students: http://www.greenteapress.com/semaphores/downey05semaphores.pdf.
Peter, Thanks for this reference. Dave
Hi,
You're welcome. It's an invaluable book for learning and reviewing the issues with Semaphores.
It sounds like the other links people submitted are also quite interesting. I've seen a few before but they obviously need a bit more in-depth study to understand all the points of view - some of which seem to be in the formation stages rather than fully developed, which wasn't apparent to this reader at the start of the thread.
Peter
On 24/10/2007, Jason Johnson jason.johnson.081@gmail.com wrote:
On 10/24/07, Igor Stasenko siguctua@gmail.com wrote:
Aha, now I get it.
Good, I should have just posted some theoretical Smalltalk code to begin with; this thread would probably be half as big. :)
So, your approach is to establish a fence between different processes so they can't share objects. Or maybe it's more correct to say that any callee process can have read-only access to any objects which belong to the caller process?
No, the plan was that since in Smalltalk objects are mutable, I will have to pay an extra cost for internal message sends and have the VM do a deep copy for the sent objects.
Another alternative would be to introduce an immutable flag on references, then the "receiver" gets a reference to the object but flagged immutable. This way might be better, but requires more changes.
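The deep-copy-on-send scheme just described can be sketched in plain Python (purely an illustration of the idea; `send` and `mailbox` are made-up names, and the real mechanism would live in the VM, not in user code):

```python
import copy

def send(mailbox, obj):
    # The "VM" deep-copies on send, so the receiver can never
    # mutate the sender's original object through the message.
    mailbox.append(copy.deepcopy(obj))

sender_state = {'counter': 0, 'items': [1, 2, 3]}
mailbox = []
send(mailbox, sender_state)

received = mailbox.pop()
received['items'].append(99)   # receiver mutates only its private copy

assert sender_state['items'] == [1, 2, 3]   # sender's object is untouched
```

The cost being debated in this thread is exactly that `copy.deepcopy` call: it traverses the whole object graph on every send, which the immutable-flag alternative would avoid.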
It's unclear how you would determine which process an object belongs to. At minimum this would require an additional slot per object (ok, this is easily doable).
Not needed. The boundaries are: object creation, object send and object receive. All controlled from the VM or in the library, so I just have to guarantee that a mutable object can never sneak out of its process, i.e. a process can never get a mutable reference to an object owned by a different process.
Also, it's unclear how you would persist state (or results of computation), since all you can do now is send a message to a process, which will return an object in answer.
No, all message sends are async, fire-and-forget. You don't get a future or anything back. You can easily build sync messages on top of this if you want, but the base system isn't planned to directly support it.
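The fire-and-forget model described here can be sketched in Python, with queues standing in for process mailboxes (all names are illustrative, not Squeak or Erlang API): sends are async, and a synchronous call is nothing more than an async send plus a caller that chooses to block on a reply mailbox.

```python
import queue
import threading

def adder_process(mailbox):
    # A toy "process": it owns no shared state and only drains its mailbox.
    while True:
        a, b, reply_box = mailbox.get()
        reply_box.put(a + b)   # the reply is itself just an async send

inbox = queue.Queue()
threading.Thread(target=adder_process, args=(inbox,), daemon=True).start()

# The send is fire-and-forget: put() returns immediately, no future comes back.
reply = queue.Queue()
inbox.put((2, 3, reply))

# A "synchronous" message exists only because this caller chooses to wait.
assert reply.get(timeout=1) == 5
```

This mirrors the design point above: the mailbox is the only shared structure, and it is hidden inside the runtime rather than visible to the processes.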
Now, since the returned object most probably will belong to the callee process, you must copy it to the caller process. But really you need to take care of copying a whole subgraph of objects (since you can return a collection of newly created objects, which in turn can be collections etc. - all belonging to the callee process). Then, after you're done merging the graph, you can simply wipe all memory which was allocated by the callee process. This part is easy.
Ah, if you're talking about the receive call, yes, that will get an object returned. My first cut will just be (as mentioned above) a deep copy. Performance will likely drive me to adding immutable references, or objects, or something. Or perhaps what you're suggesting here.
Now, the most interesting part: mutating objects in the caller process. Suppose my starting process calls two different processes, and they come to the point that they want to update the state of some object(s) in the caller process (to be clear: process A contains object a, it calls processes B and C in parallel, and now B and C want to change the state of object a).
Can't happen. There is no shared state. Ever. The only thing that's shared is the Process' mail box, but that is an internal VM detail, not visible to the processes.
(Remaining comments snipped since I think they assume shared state, which does not exist in my plan. If you have some reason to think I can't avoid it, let me know, because I don't see it so far.)
No sharing you say? Oh.. don't get me started on this. How about the common procedure of creating a new class? Creating or modifying a method in some class? Note that these changes propagate globally in the current implementation due to having a single global namespace (SystemDictionary). How are you planning to deal with that without breaking the uniform model of Smalltalk (everything is an object etc.)?
On 10/24/07, Igor Stasenko siguctua@gmail.com wrote:
No sharing you say? Oh.. don't get me started on this. How about the common procedure of creating a new class? Creating or modifying a method in some class?
Ok, when I say "no sharing" I mean of mutable data, which is where all the problems you and everyone else have mentioned come from.
I have mentioned several times in this thread that classes and code *are a concern*. But I think a solvable one. Doing things to classes (e.g. creating, renaming, adding methods, removing methods, changing methods) is a special case now, and it will be a special case in my proposed system as well. Note that Erlang has this same issue and they solved it in their case 10 years ago. I think this is a solvable problem.
Note that these changes propagate globally in the current implementation due to having a single global namespace (SystemDictionary). How are you planning to deal with that without breaking the uniform model of Smalltalk (everything is an object etc.)?
As mentioned in probably no fewer than 10 other emails: I think the "ObsoleteClass" mechanism will be workable for this.
On 24/10/2007, Jason Johnson jason.johnson.081@gmail.com wrote:
On 10/24/07, Igor Stasenko siguctua@gmail.com wrote:
Aha, now I get it.
Good, I should have just posted some theoretical Smalltalk code to begin with; this thread would probably be half as big. :)
So, your approach is to establish a fence between different processes so they can't share objects. Or maybe it's more correct to say that any callee process can have read-only access to any objects which belong to the caller process?
No, the plan was that since in Smalltalk objects are mutable, I will have to pay an extra cost for internal message sends and have the VM do a deep copy for the sent objects.
Sorry for the spurious replies.. but.. this statement means that your processes are not as cheap as Erlang's. By passing a single object to a new process you could trigger cloning of a substantial part of the image (read: megabytes of data). Or, even if not cloning, then marking objects as read-only, or creating 'hollow' references to objects, which also has its own costs - extra space and access time. Even to spawn a process that does no more than adding 1+1, I need to copy/mark the SmallInteger class and all its references, until I've marked everything reachable from it.. Honestly, I can't see how this concept can be considered cheap and scalable.
On 10/24/07, Igor Stasenko siguctua@gmail.com wrote:
Sorry for the spurious replies.. but.. this statement means that your processes are not as cheap as Erlang's.
Why not? At this point in time (afaik) they do the exact same thing in all but one case (communication between 2 "green" processes in the same image).
By passing a single object to new process you could trigger a cloning a substantial part of image in this case (read - megabytes of data).
Yes, you can. But this isn't the common case.
Or, even if not cloning, then marking objects as read-only, or creating 'hollow' references to objects, which also has its own costs - extra space and access time.
Well, I have no plans of doing "futures". Others are well down that path, so no need for me to duplicate their research. As far as immutable references, all class have a header with various flags, I would only need one more for mutability. Such a change does scare me a bit because the VM would have to be changed so that all instVar sets do an extra check, which would impact non-concurrent code as well, but it could turn out to be a good trade-off vs. doing a deep copy every time.
Even to spawn a process that does no more than adding 1+1, I need to copy/mark the SmallInteger class and all its references, until I've marked everything reachable from it..
Bad example. :) SmallInteger isn't mutable and isn't an object.
But I understand what you mean; the required traversals do sound expensive, but this has to be done *every time* when you send between two images anyway. Let's not optimize prematurely. :)
Honestly, I can't see how this concept can be considered cheap and scalable.
A system with 9 9's of reliability comes to mind. :)
http://www.cincomsmalltalk.com/userblogs/ralph/blogView?entry=3364027251
But really, as soon as you talk between two systems, no other approach is better. A "futures" concept can lazily load the data, saving time when parts of a structure aren't used, but what are the numbers on this? How often do you send a message with a % of unused data high enough to offset the complexity cost of the "futures" mechanism?
Yes, interprocess communication between two same-image processes would be a disadvantage, but I'm pretty sure Erlang started this way and optimized from there. That's my plan as well.
After all, we don't know how expensive this would actually be in practice anyway. It's easy to come up with theoretical examples that cripple the system, but will anyone actually do this? And if they do, it will break and they can code around it.
Back before Unix, people were trying to figure out ways to ensure resource deadlocks could not happen. This theoretical problem effectively kept them from releasing a system. Unix simply ignored it. They gave you tools to see you had a deadlock, and a way to kill deadlocked processes. Seemed to work out for them. :)
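The "detect and kill, don't prevent" philosophy described above can be sketched with a tiny wait-for-graph check (a Python illustration with made-up names; a real system would also pick a victim process to kill rather than just report):

```python
def find_deadlock(wait_for):
    # wait_for maps each process to the process it is blocked on (or None).
    # A deadlock exists iff following the chain from some process
    # revisits a node, i.e. the wait-for graph contains a cycle.
    for start in wait_for:
        seen = set()
        p = start
        while p is not None and p not in seen:
            seen.add(p)
            p = wait_for.get(p)
        if p is not None:        # stopped because we revisited a node
            return True
    return False

# A waits on B and B waits on A: the classic deadlock.
assert find_deadlock({'A': 'B', 'B': 'A'}) is True
# A waits on B, but B is running freely: no deadlock.
assert find_deadlock({'A': 'B', 'B': None}) is False
```

This is the cheap runtime check the Unix attitude favors: rather than proving deadlock impossible up front, observe who is blocked on whom and act only when a cycle actually appears.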
On 10/25/07, Jason Johnson jason.johnson.081@gmail.com wrote:
As far as immutable references, all class have a header with various flags
Ugh, my proofreading is failing me. Here, of course, I meant "objects" not "class(es)".
Hello Peter,
PWL> Jason Johnson wrote:
PWL> Ok, so if you really are talking about a "strict" Erlang style model PWL> with ONE Smalltalk process per "image" space (whether or not they are in PWL> one protected memory space or many protected memory spaces) where PWL> objects are not shared with any other threads except by copying them PWL> over the "serialization wire" or by "reference" then I get what you are PWL> talking about.
PWL> That is a strange way of putting it.
these posts of yours are very hard to read as it's not easy to tell what you are saying and what you are quoting.
Would be nice if you could change that.
I wouldn't say that if the topic weren't interesting as well as complicated.
Cheers,
Herbert mailto:herbertkoenig@gmx.net
Ok, what can I do to make myself clearer and easier to understand?
On 10/24/07, Herbert König herbertkoenig@gmx.net wrote:
Hello Peter,
PWL> Jason Johnson wrote:
PWL> Ok, so if you really are talking about a "strict" Erlang style model PWL> with ONE Smalltalk process per "image" space (whether or not they are in PWL> one protected memory space or many protected memory spaces) where PWL> objects are not shared with any other threads except by copying them PWL> over the "serialization wire" or by "reference" then I get what you are PWL> talking about.
PWL> That is a strange way of putting it.
these posts of yours are very hard to read as it's not easy to tell what you are saying and what you are quoting.
Would be nice if you could change that.
I wouldn't say that if the topic weren't interesting as well as complicated.
Cheers,
Herbert mailto:herbertkoenig@gmx.net
Hello Jason,
JJ> Ok, what can I do to make myself clearer and easier to understand?
JJ> On 10/24/07, Herbert König herbertkoenig@gmx.net wrote:
Hello Peter,
nothing on your side, I wish Peter would use proper quoting in his html mails. In my mailer (The Bat) I can't distinguish his argument from his quote of your argument. He also sends text mails with proper quoting.
Cheers,
Herbert mailto:herbertkoenig@gmx.net
Ah! I misread your email. :)
On 10/24/07, Herbert König herbertkoenig@gmx.net wrote:
Hello Jason,
JJ> Ok, what can I do to make myself clearer and easier to understand?
JJ> On 10/24/07, Herbert König herbertkoenig@gmx.net wrote:
Hello Peter,
nothing on your side, I wish Peter would use proper quoting in his html mails. In my mailer (The Bat) I can't distinguish his argument from his quote of your argument. He also sends text mails with proper quoting.
Cheers,
Herbert mailto:herbertkoenig@gmx.net
Herbert König wrote:
Hello Jason,
JJ> Ok, what can I do to make myself clearer and easier to understand?
JJ> On 10/24/07, Herbert König herbertkoenig@gmx.net wrote:
Hello Peter,
nothing on your side, I wish Peter would use proper quoting in his html mails. In my mailer (The Bat) I can't distinguish his argument from his quote of your argument. He also sends text mails with proper quoting.
Cheers,
Herbert mailto:herbertkoenig@gmx.net
Hi,
I'm using ThunderBird for emails. I use nested quoting and everything looks good to me. Obviously your email program isn't working the same way. Which program do you use? What is "the bat"?
Peter
Hello Peter,
PWL> I'm using ThunderBird for emails. I use nested quoting and PWL> everything looks good to me. Obviously your email program isn't
I get two kinds of mails from you, textmode, properly indented and html which I don't see indented.
I just verified on Gmane that your html is also properly indented, so I'll have to fiddle with the settings of my mailer.
Sorry for bothering you!
PWL> working the same way. Which program do you use? What is "the bat"?
My email program is "The Bat" www.ritlabs.com. In the past when I made this decision, Thunderbird didn't have a thread view.
Cheers
Herbert mailto:herbertkoenig@gmx.net
Hi Herbert,
It's all good. I'd not heard of The Rat before. Sounds like an interesting reader. Thunderbird packages up both text and html so others can choose which way to view the posting.
What are your thoughts on the whole concurrency issues being debated?
Cheers,
Peter
Herbert König wrote:
Hello Peter,
PWL> I'm using ThunderBird for emails. I use nested quoting and PWL> everything looks good to me. Obviously your email program isn't
I get two kinds of mails from you, textmode, properly indented and html which I don't see indented.
I just verified on Gmane that your html is also properly indented, so I'll have to fiddle with the settings of my mailer.
Sorry for bothering you!
PWL> working the same way. Which program do you use? What is "the bat"?
My email program is "The Bat" www.ritlabs.com. In the past when I made this decision, Thunderbird didn't have a thread view.
Cheers
Herbert mailto:herbertkoenig@gmx.net
Hello Peter,
PWL> It's all good. I'd not heard of The Rat before. Sounds like an
                                     ^^^^^^^ nice Typo :-)))
PWL> What are your thoughts on the whole concurrency issues being debated?
my practical knowledge of this comes from hardware interrupts (mainly digital signal processing) with hard realtime demands. (If you lose an audio sample everyone will hear it. The live performing musician will hear a 5ms processing delay, even worse if it's not constant.)
This is a very special case. I decided against memory locking to avoid complexity. I use a kind of semaphore for synching but simpler.
The number of processes (5) is manually manageable but barely. Hard realtime constraints are out of the scope of Win/Mac/Linux OS.
Otherwise I clearly lean towards Jason's proposal of
self breakupStructureWith: 10000 ... and no fine grained locking.
But that's because the problem areas I'm concerned with all fall into categories which benefit from this proposal and won't die of excessive copying. But that's no valid argument in the discussion :-) But I tucked away Jason's mail in case I start parallelizing apps.
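The breakupStructureWith: style Herbert favors can be sketched in Python (an illustration of the idea only, not the actual Smalltalk code; chunk sizes and the per-chunk work are arbitrary): split the structure into chunks, farm the chunks out to worker processes that share no mutable state, and combine the partial results afterwards.

```python
from multiprocessing import Pool

def chunk(data, size):
    # Break the structure up into independent pieces.
    return [data[i:i + size] for i in range(0, len(data), size)]

def work(part):
    # Stand-in for the real per-chunk computation; no shared state,
    # so no fine-grained locking is needed.
    return sum(x * x for x in part)

if __name__ == '__main__':
    data = list(range(10000))
    with Pool(4) as pool:
        partials = pool.map(work, chunk(data, 1000))
    # Combining the partial results is the only sequential step.
    assert sum(partials) == sum(x * x for x in data)
```

As the thread notes, this wins when the chunks are large relative to the copying cost of handing them to workers, and loses for workloads with heavy cross-chunk interaction.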
So I follow the discussion closely and with great interest, and I hope never to have to solve the kind of problems you use as examples :-).
If I have to, the first thing I'll do is read your suggested book on semaphores. Maybe earlier, for fun.
BTW I'm an engineer not a cs person.
Cheers
Herbert mailto:herbertkoenig@gmx.net
Peter William Lount wrote:
Ok, that sounds nice and rosy, but that's all so far. Can someone please explain in full detail and completely how it would actually work? Thanks.
Cheers, - Andreas
On 10/21/07, Peter William Lount peter@smalltalk.org wrote:
I've not yet seen any serious discussion of the case for your point of view which bridges the gap of complexity in concurrency as automatic memory management magically does. Please illuminate us with specific and complete details of your proposal for such a breakthrough in concurrency complexity.
Here is a reference to my break down again: http://lists.squeakfoundation.org/pipermail/squeak-dev/2007-February/114181.....
I'm not saying *actor style concurrency is the same complexity as garbage collection* nor *manual memory management is as complex as fine-grained locking*.
It is an analogy: Fine-grained locking is to Message passing what manual memory is to generational garbage collection.
Then either the hard work needs to be done, or the VM needs to be completely rethought.
I don't think you realize the level of work required here. Look at the work of David Griswold and Strongtalk. I wanted to provide a reference, but I believe he states his position most clearly on one of the "Industry Misinterpretations" podcasts. What he said was something to the effect of "we did this in Java with a large group of people and much money focused on it, and it took a long time. I don't think this is going to be possible in the free software world with our limited time". Something like that, and I agree. Though I take it one step further and say it's not needed.
What are you going on about? What techniques are you saying are obsolete exactly? How are they obsolete?
Here again I mean obsolete in the way that using manual memory management for everything is obsolete. Of course one must use manual memory management at the very lowest levels. And likely (dependent on the OS or CPU architecture) one will need fine-grained locking at the very lowest levels. But no one else should.
In this paper a team writes some software in a locking style, and again with STM and no explicit locking. You can go straight to chapter 4 for the graphs, and keep in mind: the STM version behaves properly in the face of exceptions while the locking version does not. I also don't recall if they mentioned how long each implementation took, but certainly a locking version is harder to get right.
http://www.haskell.org/~simonmar/papers/lockfreedatastructures.pdf
Why? 64 processors on a single chip - with 128 coming next year and 1024 planned - that's why.
Ok, that explains why we need parallelization, but it doesn't explain why we need fine-grained locking.
You've missed the point. Even the simplest of concurrency methods proposed so far by people in the Squeak thread lead to the most complex concurrency control error scenarios. That's one of the points. Another is that the simplest of concurrency models can't handle all the scenarios.
Where do you come up with this information? From one application you worked on that was threaded? The Erlang people have been working on this stuff for the bulk (if not the entirety) of their careers. Forgive me if I give more weight to their research than your opinion:
http://armstrongonsoftware.blogspot.com/2006/08/concurrency-is-easy.html (note how close this is to the message passing we talk about in Smalltalk. more on this below) http://armstrongonsoftware.blogspot.com/2006/09/why-i-dont-like-shared-memor...
http://ll2.ai.mit.edu/talks/armstrong.pdf http://www.sics.se/~joe/thesis/armstrong_thesis_2003.pdf http://pragmaticprogrammer.com/articles/erlang.html
And I really don't understand why you think one thing can't be done in the other. Anything you can do with fine-grained locking you can do with message passing (at least from an application level, though depending on the CPU it might be equivalent at all levels).
http://armstrongonsoftware.blogspot.com/2006/09/pure-and-simple-transaction-...
As asked above please describe in detail and completely the proposed "simple" approach to concurrency that is being proposed. Links to appropriate descriptions if it exist would also be fine (unless it contains too much extraneous text).
I think the texts above should provide pretty good detail. The one wrinkle is that in Smalltalk we do in fact have shared state, so something *would* have to change, as discussed in the thread I linked to near the top of this message.
But I still think it's doable. Objects themselves are not normally a problem, it's class side variables that would cause the biggest problem in our current system. I have some ideas on how to deal with this, but it will be a while before I can look at it.
The problem with concurrency is that it's much more complex by orders of magnitude than garbage collection. Much more complex a beast, so much more so that the comparison breaks down.
I don't believe it does. One *does* have to write programs differently in an Erlang style message passing world (in a more OO way!), but there is no functionality you can achieve with fine-grained/shared state that you can't with message passing.
Thank you for calling it "odd". That's what happens when you think different; at first people think it odd. I often encourage people to think different, as Apple did in their marketing a few years ago.
No, I said it was odd because you put Java and Erlang in the same boat despite the fact that Java is completely in bed with the old fine-grained/shared state and Erlang is on the exact opposite side of the board. And then you said Smalltalk would somehow lose market share to them because of this...
How is that?
Show me another system achieving the same level of parallelism, fault tolerance, lines of code and *9 9's* (!!!) of reliability.
http://www.cincomsmalltalk.com/userblogs/ralph/blogView?showComments=true&am...
Yes, but Erlang is a purely function non-object-oriented non-keyword-message passing language.
Are you saying that someone who disagrees with how you see the world is wrong/lesser?
Anyway, this is one way to look at it. Another way to look at it is that Erlang is not so far from the vision Alan Kay talked about. Think about it; a process is an encapsulated entity which you can only interact with via messages. This is OO at a whole new level.
While it has a form of message passing it's not the same as Smalltalk's. It's simply passing parameters to functions that run in separate green or native threads.
Or on completely different machines. But the calls look the same. Encapsulation is a nice thing, eh? :)
Yes it is impressive what they have accomplished, but it isn't the be all and end all.
I didn't say it was. If I thought it was I would be there, instead of here planning to see how far Smalltalk can go with this.
I simply think that having all the tools at our disposal is important to maintaining and growing market share.
That's not a sure thing. Ask C++. Sometimes finding a simple idea that's equivalent (e.g. Smalltalk and for that matter even Java) beats "having all the tools at our disposal" (e.g. C++).
"Simpler to implement" concurrency leads to just as difficult to manage software systems as more complex well thought out concurrency. In fact, I think, that making it simplistic will lead many programmers to implement software that is impossible to debug without enormous efforts. The problem is that even the simplest concurrency leads to the nastiest and most complex bugs in software.
This is simply not true in the case of message passing. Message passing is to concurrency what OO is to programming. That is, in shared-state/fine-grained you drown in complexity because every new piece of code has to be considered against all the existing code to see if new deadlocks/et al. are possible.
In message passing you have a process that does X. You want to add something new to the system that uses this service/process/object/whatever you want to call it? Just do it. The only question is: should we spawn another X to handle the extra load. But no possible solution can relieve you of the responsibility to think.
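To make that concrete, here is a minimal sketch (in Python rather than Smalltalk, purely to show the shape): the worker owns its state and is driven only by the messages in its inbox, so adding a new client never requires adding a new lock.

```python
import threading
import queue

def counter_worker(inbox: queue.Queue):
    """A 'process that does X': owns its state, reachable only via messages."""
    count = 0
    while True:
        msg = inbox.get()
        if msg[0] == "increment":
            count += 1
        elif msg[0] == "get":
            reply_box = msg[1]
            reply_box.put(count)      # reply via the sender's own queue
        elif msg[0] == "stop":
            break

inbox = queue.Queue()
threading.Thread(target=counter_worker, args=(inbox,), daemon=True).start()

# Any number of clients can send messages; no client ever touches `count`.
for _ in range(3):
    inbox.put(("increment",))
reply = queue.Queue()
inbox.put(("get", reply))
print(reply.get())   # → 3
inbox.put(("stop",))
```

Nothing here needs to be reconsidered against existing code when a new client appears; the only design question, as above, is whether to spawn another worker for load.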
I don't see how you can have a simple concurrency threading model solve the problems of when and how to use concurrency properly to avoid the many pitfalls of threading. If you can see that please illuminate it for the rest of us.
Well, let's see. The pitfalls of fine-grained locking revolve around locking. Locking is needed to protect the consistency of shared state. If you get rid of shared state, you get rid of locking and you get rid of the problems associated with it.
It is still possible to make software in such a way that it deadlocks, priority starves and so on, but not nearly as easy as in the shared-state/fine-grained locking model, and much easier to correct.
Do you mean Tim Sweeney the game developer? http://en.wikipedia.org/wiki/Tim_Sweeney_(game_developer)
Yes.
Alright even though I don't know Tim I'll take the bait and see where it goes, Tim Sweeney (or Sweeny) what do you think? (If someone who knows him would be kind enough to pass this thread on to him or post his thoughts on this topic that would be great - thanks).
Here is what I was talking about: http://www.st.cs.uni-sb.de/edu/seminare/2005/advanced-fp/docs/sweeny.pdf
I don't agree with his conclusions, but given that his domain is high performance games, he is worried about speed and thinks message passing can't be as fast.
Threading including native threading on one core or N cores (where N can be large) under existing operating systems is very important to the future of Smalltalk.
Unless of course the idea of a native thread is itself wrong. The idea is just that a process has one or more "threads of execution" (basically, just an instruction counter and a stack), but is this good encapsulation? After decades of race conditions, I would say no; it was in fact a premature optimization.
This may look good on current OSes and hardware, but instead of working from there, let's imagine how things would look if the purest design were also the fastest/most efficient. Would you use the native thread model? I wouldn't. I would make messages the key concept, and make them the fastest method of interprocess communication.
For clarity purposes, please define in detail what you mean when you use the phrase "fine-grained threading model" so that we can make sure that we are on the same page.
I think it's laid out pretty clearly in my post to the other thread (referenced above). But here is the reference card version:
fine-grained locking/shared state: State is shared among threads of execution. Access is guarded by synchronization mechanisms such as mutexes, semaphores and so on.
STM: Think relational database transactions, including rollbacks and so on.
Message passing: Erlang is the most successful version of this I know of, but certainly not the only one. Lisp had this option for a long time, as did many others (Smalltalk as well I'm sure).
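For contrast with the message-passing sketch, here is the first row of that card as a minimal Python illustration: the state is visible to every thread, so every access site must remember to take the lock.

```python
import threading

# Fine-grained locking / shared state: `balance` is visible to every
# thread, so every access must be guarded by the lock.
balance = 0
balance_lock = threading.Lock()

def deposit(amount):
    global balance
    with balance_lock:   # forget this at any one site and you have a race
        balance += amount

threads = [threading.Thread(target=deposit, args=(10,)) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(balance)   # → 1000 (without the lock, occasionally less)
```

The correctness of this program depends on a convention (always take the lock) that no single piece of code can enforce, which is exactly the complexity argument above.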
You seem to think that there is some magical breakthrough in the world of concurrency that is on par with the magic of automatic garbage collection. I'd sure love to know what that is and how it avoids the pitfalls with even simple concurrency models and issues that occur in real world projects. If you could, please describe in full detail and completely with real world examples. Thanks very much.
Let me know if the above wasn't enough detail. But just to reiterate: most concurrency problems we have come from trying to share memory. Get rid of that and many of these issues literally disappear. The rest become a design concern instead of an implementation detail, and surely you would agree that this is how it should be.
On Thursday 18 October 2007 10:58 pm, Peter William Lount wrote:
I propose that any distributed object messaging system that is developed for inter-image communication meet a wide range of criteria and application needs before being considered as a part of the upcoming next Smalltalk Standard. These criteria would need to be elucidated from the literature and the needs of members of the Smalltalk community and their clients.
- It's been mentioned that it would be straightforward to have squeak
start up multiple copies of the image (or even multiple different images) in one process (task) memory space, with each image having its own native thread and keeping its object table and memory separate within the larger memory space. This sounds like a very nice approach.
I am not so sure. Squeak VM is a processor hog. Threads within VM will need processor for bytecode interpretation. So a VM process can only scale to a few threads before it starves for processor. On the downside, coding errors could trash object memory across threads making testing and debugging difficult. Will the juice be worth the squeeze?
- A single image running on N-cores with M-native threads (M may be
larger than N) is the full generalization of course. This may be the best way to take advantage of paradigm shaking chips such as the Tile64 processor from Tilera.
With single or few processors, we tend to "serialize" logic ourselves and create huge linear programs. When processors are aplenty, we are free to exploit inherent parallelism and create many small co-ordinating programs. So the N-cores are a problem only for small N (around 8).
However, we may need to rethink the entire architecture of the Smalltalk virtual machine, since the Tile 64 chip has capabilities that radically alter the paradigm. Messages between processor nodes take less time to pass than the same amount of data takes to be written into memory. Think about that. It offers a new paradigm unavailable on other N-core processors (at this current time).
True. Squeak's VM could virtualize display/sensors and spawn each project in its own background process bound to a specific processor. The high-speed, low latency paths are well-suited for UI events. Imagine running different projects on each face of a rotating hexecontahedron :-).
Subbu
On 19-Oct-07, at 1:06 PM, subbukk wrote:
I am not so sure. Squeak VM is a processor hog.
No it isn't. It uses cpu when there is a process to run. If there is no process to run, it sleeps.
It's the code in the image that gets to decide when processes run or sleep.
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Fractured Idiom:- MAZEL TON - Lots of luck
On Saturday 20 October 2007 1:44 am, tim Rowledge wrote:
On 19-Oct-07, at 1:06 PM, subbukk wrote:
I am not so sure. Squeak VM is a processor hog.
No it isn't. It uses cpu when there is a process to run. If there is no process to run, it sleeps. It's the code in the image that gets to decide when processes run or sleep.
I was referring to VM process executing bytecodes in images. Bytecode interpretation is a cpu intensive process.
For instance, the Linux VM running latest etoy-dev consumes a steady 7-12% of cpu if I just drag a polygon object and make it do a forward/turn loop about once a second.
Still, Squeak is a lot smaller and more efficient compared to other interpreters.
Subbu
On Oct 19, 2007, at 23:53 , subbukk wrote:
On Saturday 20 October 2007 1:44 am, tim Rowledge wrote:
On 19-Oct-07, at 1:06 PM, subbukk wrote:
I am not so sure. Squeak VM is a processor hog.
No it isn't. It uses cpu when there is a process to run. If there is no process to run, it sleeps. It's the code in the image that gets to decide when processes run or sleep.
I was referring to VM process executing bytecodes in images. Bytecode interpretation is a cpu intensive process.
For instance, the Linux VM running latest etoy-dev consumes a steady 7-12% of cpu if I just drag a polygon object and make it do a forward/turn loop about once a second.
This is probably much more the fault of Morphic and Etoys than the VM's. Would that we had time to start optimizing for OLPC ... but even then it's not certain how far you can get with the current Morphic design.
- Bert -
On Saturday 20 October 2007 3:41 am, Bert Freudenberg wrote:
For instance, the Linux VM running latest etoy-dev consumes a steady 7-12% of cpu if I just drag a polygon object and make it do a forward/turn loop about once a second.
This is probably much more the fault of Morphic and Etoys than the VM's. Would that we had time to start optimizing for OLPC ... but even then it's not certain how far you can get with the current Morphic design.
Being cpu-bound is not a sin :-) per se. Morphic is quite useful as is. I get more work in Morphic than in some of the other graphical apps. Morphic 3 deserves a separate discussion thread.
My point was that a Squeak VM process with cpu-bound threads will max out a core with just a few threads. On large multi-core processors, we could scale better if VMs can spawn out into different communicating lightly-threaded processes rather than a single heavily-threaded process.
Subbu
As for Multi-Core, see this: http://rt07.raytracing.nl/
My dream is to see this running interactively and implemented completely in smalltalk.
Igor Stasenko wrote:
As for Multi-Core, see this: http://rt07.raytracing.nl/
My dream is to see this running interactively and implemented completely in smalltalk.
Hi Igor,
I for one like your vision! Bring it on!
All the best,
Peter William Lount peter@smalltalk.org
Hi,
I propose that any distributed object messaging system that is developed for inter-image communication meet a wide range of criteria and application needs before being considered as a part of the upcoming next Smalltalk Standard. These criteria would need to be elucidated from the literature and the needs of members of the Smalltalk community and their clients.
- It's been mentioned that it would be straightforward to have squeak
start up multiple copies of the image (or even multiple different images) in one process (task) memory space, with each image having its own native thread and keeping its object table and memory separate within the larger memory space. This sounds like a very nice approach.
I am not so sure. Squeak VM is a processor hog. Threads within VM will need processor for bytecode interpretation. So a VM process can only scale to a few threads before it starves for processor.
It's not the bytecodes themselves that cause high cpu usage; it's the number of processor instructions being executed. If you run lots of code, you can expect higher cpu usage. The more densely capability is packed into the language's library of objects, the more processor instructions may be executed. To find out what Squeak is doing while chewing through the ~12% cpu you mentioned elsewhere, you'd have to trace the code; then you'd see exactly what's going on. Tracing at two levels would be helpful: first at the Smalltalk level and then at the VM primitive/bytecode level. The bytecodes may be fine while the image you've deployed is doing many things you don't actually need for your particular application.
On the downside, coding errors could trash object memory across threads making testing and debugging difficult.
Yes. The point that I'm making is that even with so-called simple concurrency models these errors can happen. Basically there is no such thing as hassle-free simple concurrency when it comes to computers!!! Simple concurrency is a myth and a lie. Don't fall for it.
Will the juice be worth the squeeze?
That depends on what you are using your computer for. If it's an application that benefits from massive parallelism then yes it is worth the squeeze. If you have a very serial sort of application, like a series of complex dependent computations then it might not be worth the squeeze at all.
If you have a complex business application that is highly threaded - running say ten to twenty Smalltalk processes - on a single native thread, then it's worth the squeeze if the users can work noticeably faster without incurring concurrency nightmares. Otherwise, no, it's not worth it, as users get very frustrated.
- A single image running on N-cores with M-native threads (M may be
larger than N) is the full generalization of course. This may be the best way to take advantage of paradigm shaking chips such as the Tile64 processor from Tilera.
With single or few processors, we tend to "serialize" logic ourselves and create huge linear programs. When processors are aplenty, we are free to exploit inherent parallelism and create many small co-ordinating programs. So the N-cores are a problem only for small N (around 8).
Eh? Why only "small N (around 8)"? Please illuminate further.
However, we may need to rethink the entire architecture of the Smalltalk virtual machine, since the Tile 64 chip has capabilities that radically alter the paradigm. Messages between processor nodes take less time to pass than the same amount of data takes to be written into memory. Think about that. It offers a new paradigm unavailable on other N-core processors (at this current time).
True. Squeak's VM could virtualize display/sensors and spawn each project in its own background process bound to a specific processor. The high-speed, low latency paths are well-suited for UI events. Imagine running different projects on each face of a rotating hexecontahedron :-)
That would be cool.
The power of the Tile-64 processor from Tilera is that processors can form arbitrary "compute streams" on the fly, where data is computed in one processor and passed along to another without ever touching RAM. Oh, WOW! This means, for example, that the six typical stages of rendering could be implemented on six, or six * N, processors in the Tile-N (where N = 36, 64, 128, 512, 1024, 4096 or more processors). WOW! Now, how would you have the Smalltalk system generate objects and messaging binary code from Smalltalk source code to model and program that? How? Let's do it! This requires a shift in paradigm. This requires a shift in your thinking. This requires a shift in my thinking. Think it through. What solutions can you come up with?
All the best,
Peter
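The compute-stream idea Peter describes - each stage computing and streaming its result straight to the next without touching shared RAM - can be sketched as a chain of workers connected by queues. (Python here, with made-up toy stages, purely for illustration; a real Tile-64 stream would of course bypass memory entirely.)

```python
import threading
import queue

def stage(work, inbox, outbox):
    """One pipeline stage: read, transform, pass along. No shared memory."""
    while True:
        item = inbox.get()
        if item is None:          # sentinel: shut down and propagate it
            outbox.put(None)
            break
        outbox.put(work(item))

# Three stages chained by queues, like cores streaming to their neighbours.
q0, q1, q2, q3 = (queue.Queue() for _ in range(4))
for work, i, o in [(lambda v: v * 2, q0, q1),
                   (lambda v: v + 1, q1, q2),
                   (lambda v: v * v, q2, q3)]:
    threading.Thread(target=stage, args=(work, i, o), daemon=True).start()

for v in [1, 2, 3]:
    q0.put(v)
q0.put(None)

results = []
while (r := q3.get()) is not None:
    results.append(r)
print(results)   # → [9, 25, 49]
```

Each stage runs concurrently with the others, so throughput scales with the number of stages even though any single item still flows through them in order.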
On Thu, Oct 18, 2007 at 06:36:00PM +0300, Igor Stasenko wrote:
Then I think it would be good to make some steps towards supporting multiple images in a single executable:
- make single executable capable of running a number of images in
separate native threads. This will save memory resources and also could help in making inter-image messaging not so costly.
What memory and resources do you think that you will save? Squeak already does almost exactly what you describe when you run it on a unix/linux/OSX platform. Putting the interpreters into separate threads (as opposed to unix processes) would at best save a trivial amount of memory, and would add a lot of complexity.
I don't see any savings for inter-image messaging either, but maybe I'm missing something there.
Dave
On 19/10/2007, David T. Lewis lewis@mail.msen.com wrote:
On Thu, Oct 18, 2007 at 06:36:00PM +0300, Igor Stasenko wrote:
Then i think, it would be good to make some steps towards supporting multiple images by single executable:
- make single executable capable of running a number of images in
separate native threads. This will save memory resources and also could help in making inter-image messaging not so costly.
What memory and resources do you think that you will save? Squeak already does almost exactly what you describe when you run it on a unix/linux/OSX platform. Putting the interpreters into separate threads (as opposed to unix processes) would at best save a trivial amount of memory, and would add a lot of complexity.
I can't see how OS process handling can be less complex than threading. Also, unix is the best example among OSes of trying to do things nicely. But Windows, for instance, does not share a DLL's instances between processes, resulting in a copy of the same DLL for each process.
I don't see any savings for inter-image messaging either, but maybe I'm missing something there.
Well, Spoon's 'imprinting' requires copying object behaviors between images. By keeping the images in the same process, you can just refer to them as external references. For the same reasons, inter-image message sends can be done without serializing objects, because all objects of all images are accessible at any time. Also, none of the above restricts inter-image processing to images kept in the same process. But to me it's obvious that inter-image processing between images in the same process address space can be greatly simplified.
Dave
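Igor's serialization point can be made concrete with a toy comparison (Python, illustrative only): a message between address spaces must be serialized and copied, while within one address space a "send" can hand over a bare reference.

```python
import pickle

big = list(range(100_000))

# Between separate OS processes the object must be serialized and copied...
wire_bytes = pickle.dumps(big)
received = pickle.loads(wire_bytes)   # a full copy on the receiving side
assert received == big and received is not big

# ...whereas within one address space a "message send" can hand over a
# reference: no copy, no encoding, just a pointer.
reference = big
assert reference is big

# Serialization has real overhead: the wire form is bigger than the list
# is long, before even counting the encode/decode time.
print(len(wire_bytes) > len(big))   # → True
```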
Hi. What do you recommend for communication among running images? RemoteMessagingToolkit (RMT)? Remote Smalltalk (rST)? SOAP (ehm)? Others (not via the TCP/IP stack - for multiple images running locally)?
Thanks, p.
On 18.10.2007, at 16:18, Sebastian Sastre wrote:
Hey, this sounds like an interesting path to me. If we think of nature and its design, these images could be analogous to cells of a larger body. Fragmentation keeps things simple without compromising scalability. Nature concluded that it is more efficient not to develop a few supercomplex brain cells, but to develop zillions of far simpler ones - just complex enough - and let them arrange themselves into an unimaginably complex network: a brain.
Another observation that makes me find this interesting: we know that an object that is too smart smells bad. It easily becomes less flexible, so less scalable in complexity, less intuitive (you have to learn more to use it), more to memorize, maintain, document, etc. It is smarter, but it may turn out to be a bad deal because it is too costly. So if we think of these flexible mini images as objects, each one using a core, we can scale enormously and almost trivially in this whole multicore thing, and in a way we know works.
Another interesting point is fault tolerance. If one of those images suffers downtime (because of a power failure on its host, or whatever reason), the system might feel it somehow, but it would not be a complete failure, because other images can handle the demand. A small (hence efficient), well-protected critical system can coordinate measures of containment for the "crisis", and hopefully the system never lets users feel its own crisis.
Again, I find this is a tradeoff about when to scale horizontally versus vertically. For hardware, Intel and friends scaled vertically (more bits and Hz, for instance) for years, as far as they were physically able. Now they have hit a kind of barrier and started to scale horizontally (adding cores). Please don't fall into endless discussions, like the ones I've seen out there, comparing apples with bananas just because both are fruit. It is all scaling, but along two different axes of a multidimensional space (complexity, load, performance, etc.).
I'm thinking of vertical here as making one Squeak smart enough to be thread safe, and horizontal as making one smart network of N Squeaks.
Sometimes one choice will be the good business and sometimes the other. I feel the horizontal time has come. If that's true, investing (time, $, effort) now in vertical scaling could turn out to have a lower cost/benefit ratio compared to investing in horizontal scaling.
The truth is this is all speculative and I don't know. But I do trust nature.
Cheers,
Sebastian Sastre
-----Mensaje original----- De: squeak-dev-bounces@lists.squeakfoundation.org [mailto:squeak-dev-bounces@lists.squeakfoundation.org] En nombre de Ralph Johnson Enviado el: Jueves, 18 de Octubre de 2007 08:09 Para: The general-purpose Squeak developers list Asunto: Re: Multy-core CPUs
On 10/17/07, Steve Wart steve.wart@gmail.com wrote:
I don't know if mapping Smalltalk processes to native
threads is the
way to go, given the pain I've seen in the Java and C# space.
Shared-memory parallelism has always been difficult. People claimed it was the language, the environment, or that they needed better training. They always thought that with one more thing, they could "fix" shared-memory parallelism and make it usable. But Java has done a good job of providing reasonable language primitives, there has been a lot of work on making threads efficient, and plenty of people have learned to write multi-threaded Java. And it is still way too hard.
I think that shared-memory parallelism, with explicit synchronization, is a bad idea. Transactional memory might be a solution, but it eliminates explicit synchronization. I think the most likely solution is to avoid shared memory altogether and go with message passing. Erlang is a perfect example of this. We could take this approach in Smalltalk by making minimal images like Spoon, making images that are designed to be used by other images (again, like Spoon), and then implementing our systems as hundreds or thousands of separate images. Image startup would have to be very fast. I think that this is more likely to be useful than rewriting garbage collectors to support parallelism.
-Ralph Johnson
Check out MaClientServer (developed for Magma, but useful on its own):
http://liststest.squeakfoundation.org/pipermail/squeak-dev/2004-June/078767....
On 10/18/07, Petr Fischer petr.fischer@praguesoft.cz wrote:
Hi. What do you recommend for communication among running images? RemoteMessagingToolkit (RMT)? Remote smalltalk (rST)? Soap (ehm)? other (not via. TCP/IP stack - for multiple images running locally)?
Thanks, p.
Just reading http://wiki.squeak.org/squeak/2978 Looks great, thanks for tip. pf
On 18.10.2007, at 18:40, David Mitchell wrote:
Check out MaClientServer (developed for Magma, but useful on its own):
http://liststest.squeakfoundation.org/pipermail/squeak-dev/2004- June/078767.html
Hi Peter, look.. I've implemented RemotedObjects, a remake of rST available on SqueakSource, but even with the performance improvement of using the sockets in full duplex I was hoping for better results, so I've frozen development there. I can't say anything about RMT or SOAP because I have no experience with them. Honestly, I think we should consult someone with more Squeak and networking experience than me. Perhaps people involved with Croquet or Spoon can bring some experience/ideas/frameworks?
Cheers,
Sebastian Sastre
-----Mensaje original----- De: squeak-dev-bounces@lists.squeakfoundation.org [mailto:squeak-dev-bounces@lists.squeakfoundation.org] En nombre de Petr Fischer Enviado el: Jueves, 18 de Octubre de 2007 12:36 Para: The general-purpose Squeak developers list Asunto: Re: Multy-core CPUs - communication among images
Hi. What do you recommend for communication among running images? RemoteMessagingToolkit (RMT)? Remote smalltalk (rST)? Soap (ehm)? other (not via. TCP/IP stack - for multiple images running locally)?
Thanks, p.
On 18.10.2007, at 16:18, Sebastian Sastre wrote:
Hey, this sounds like an interesting path to me. If we think of nature and its design, those images could be analogous to cells of a larger body. Fragmentation keeps things simple without compromising scalability. Nature concluded that it is more efficient not to develop a few supercomplex brain cells but to develop zillions of far simpler ones, that is, cells that are just complex enough, and to make them able to assemble into an unimaginably complex network: a brain.
Another observation that makes me find this interesting is that we know an object that is too smart smells bad. I mean, it easily starts to become less flexible, so less scalable in complexity, less intuitive (you have to learn more about how to use it), more to memorize, maintain, document, etc. So it is smarter, but it may turn into a bad deal because it is too costly. That said, if we think of those flexible mini images as objects, each one using a core, we can scale enormously and almost trivially in this whole multicore thing, and in a way we know works.
Another interesting point is fault tolerance. If one of those images suffers downtime (because of a power failure on its host, or for whatever reason), the system might feel it somehow without being in complete failure, because there are other images to handle the demand. A small (and therefore efficient), well-protected critical system can coordinate containment measures for the "crisis", and hopefully the system never really makes its own crisis felt by the users.
Ugh, not SOAP. Unless you plan to talk to non-Smalltalk entities. Smalltalk already deals with binary-serialized objects and handles the case where the save was done on a machine with a different byte ordering. I would think that system could be exploited to dump Smalltalk data raw across a link (including running methods frozen before transfer).
Maybe Spoon is doing something like this? As old as Smalltalk is, someone must be. :)
On 10/18/07, Petr Fischer petr.fischer@praguesoft.cz wrote:
Hi. What do you recommend for communication among running images? RemoteMessagingToolkit (RMT)? Remote smalltalk (rST)? Soap (ehm)? other (not via. TCP/IP stack - for multiple images running locally)?
Thanks, p.
Here is a breakdown from February of the different options for dealing with threading (and therefore multi-cores):
http://lists.squeakfoundation.org/pipermail/squeak-dev/2007-February/114181.....
I see that since then (or it could have been before, I don't see a date) Lukas and co. have written a paper about adding STM to Squeak.
http://www.lukas-renggli.ch/files/95/wwpettvsbj457o5i530ou2lrptx0is/transmem...
On 10/17/07, Sebastian Sastre ssastre@seaswork.com wrote:
This is not my area but I imagine that somehow Squeak processes should map to OS native threads paralellizable by each of the cores. Any chance to Exupery be of some help on that? I ask because if it is then is a must for that future.
regards,
Sebastian Sastre
-----Mensaje original----- De: squeak-dev-bounces@lists.squeakfoundation.org [mailto:squeak-dev-bounces@lists.squeakfoundation.org] En nombre de gruntfuttuck Enviado el: Miércoles, 17 de Octubre de 2007 06:10 Para: squeak-dev@lists.squeakfoundation.org Asunto: Multy-core CPUs
How is squeak going to handle multy-core CPUs, if at all? If we see cores of 100 plus in the future and squeak stay as it is, I would imagine other languages such as erlang, will look more attractive. -- View this message in context: http://www.nabble.com/Multy-core-CPUs-tf4639074.html#a13249733 Sent from the Squeak - Dev mailing list archive at Nabble.com.
Hi all! Have you knowledge of tools to retrieve metrics on Smalltalk code like McCabe, coupling, NCSS...? TIA Davide
Have you knowledge of tools to retrieve metrics on Smalltalk code like McCabe, coupling, NCSS...?
Lukas
Thanks Lukas. Davide
Lukas Renggli wrote:
Have you knowledge of tools to retrieve metrics on Smalltalk code like McCabe, coupling, NCSS...?
Lukas
On 10/17/07, gruntfuttuck gruntfuttuck@gmail.com wrote:
How is squeak going to handle multy-core CPUs, if at all? If we see cores of 100 plus in the future and squeak stay as it is, I would imagine other languages such as erlang, will look more attractive.
The answer seems pretty obvious: modify the VM to support them. I'll skim over the details, which I'm sure everybody already knows.
My question is: what are we going to do with multi-core CPUs? The code in the image is almost all single-threaded. Morphic freezes up when I run something in a workspace (!!). Smalltalkers just don't seem to understand multi-threaded code, even though the basic capabilities have been available to them since day one.
I use Futures now and then; I implemented them myself:
f := Future doing: [ "some long computation" ].
"... insert more code here, which runs in parallel with the long computation ..."
f printResult. "Will block until the long computation has returned the result into f, and then print the result."
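A Future like the one above can be built with just Processes and Semaphores. The sketch below is hypothetical (the poster's actual implementation is not shown); the class shape, variable names, and the #startOn: helper are all assumptions:

```smalltalk
"Hypothetical Future sketch: the block runs in a forked Process,
and reading the value blocks on a Semaphore until the result is ready."
Object subclass: #Future
	instanceVariableNames: 'value done'
	classVariableNames: ''
	category: 'Concurrency-Sketch'

Future class >> doing: aBlock
	^ self new startOn: aBlock

Future >> startOn: aBlock
	done := Semaphore new.
	[ value := aBlock value. done signal ] fork
	"startOn: answers self, so #doing: returns the new future"

Future >> value
	done wait.   "block until the forked computation signals"
	done signal. "re-signal so later readers pass straight through"
	^ value
```

The printResult from the example would then just be something like `Transcript showln: self value printString`.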
I imagine that a parallel collection package would be possible to make:
c := ParOrderedCollection new. "or ParSet, ParBag..."
c addAll: lotsOfStuff.
c do: [ :each | each doSomething ]. "Will fork a Process for each element in c."
c map: [ :each | each transform ] andGather: [ :each :sum | sum combineWith: each ]. "Google's map and gather algorithm"
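A minimal sketch of what the forking do: might look like, assuming the parallel collection wraps an ordinary collection in a hypothetical `items` variable:

```smalltalk
"Hypothetical sketch: fork one Process per element, then wait on a
Semaphore until every forked block has signalled completion."
ParOrderedCollection >> do: aBlock
	| finished |
	finished := Semaphore new.
	items do: [ :each |
		[ aBlock value: each. finished signal ] fork ].
	items size timesRepeat: [ finished wait ]
```

Without the final wait the method would return while Processes are still running; whether do: should block until completion is a design choice such a package would have to make.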
Object>>changed: can be modified to be parallel; this makes the dependents/updating framework parallel. I did this and the image seemed to work fine.
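For illustration, a parallel changed: along the lines described could look like this sketch (the stock Squeak method simply iterates over the dependents in the sender's Process, without forking):

```smalltalk
"Hypothetical sketch: notify each dependent in its own forked
Process instead of sequentially in the sender's Process."
Object >> changed: aParameter
	self dependents do: [ :aDependent |
		[ aDependent update: aParameter ] fork ]
```

Dependents must then tolerate updates arriving concurrently and in no particular order, which is why it is notable that the image "seemed to work fine".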
There's heaps of parallel stuff you can do in Squeak. One day I'd like to have a crack at making the VM use pthreads more, but that will be the day after people actually start writing parallel code.
Gulik.
gruntfuttuck wrote:
How is squeak going to handle multy-core CPUs, if at all? If we see cores of 100 plus in the future and squeak stay as it is, I would imagine other languages such as erlang, will look more attractive.
Anyone followed links that Andreas gave?
???
There is a dissertation that addresses two fundamental problems:
- it introduces synchronization mechanisms that are meant to escape from the situation where you
  - either have interference
  - or non-deterministic deadlock
  (Erlang does not solve these problems. The recent book about Erlang does not even mention the word "deadlock")
- *security*
It is a point-of-view-changing reading.
It is true that it is far less mature for production quality, but taking ideas from something as primitive as Erlang ... well, good luck.
--
Matej Kosik
ICQ: 300133844 skype: matej_kosik
"Matej Kosik" kosik@fiit.stuba.sk wrote:
gruntfuttuck wrote:
How is squeak going to handle multy-core CPUs, if at all? If we see cores of 100 plus in the future and squeak stay as it is, I would imagine other languages such as erlang, will look more attractive.
Anyone followed links that Andreas gave?
???
There is a dissertation that addresses two fundamental problems:
- it introduces synchronization mechanisms that are meant to escape from the situation where you
  - either have interference
  - or non-deterministic deadlock
  (Erlang does not solve these problems. The recent book about Erlang does not even mention the word "deadlock")
- *security*
It is a point-of-view-changing reading.
It is true that it is far less mature for production quality, but taking ideas from something as primitive as Erlang ... well, good luck.
Why exactly is Erlang primitive? What do you mean? Do you mean that it can't solve real-world problems?
It's proven itself in at the very least its original target market, that of the highly concurrent environment found in things like telephony switches. It certainly seems applicable in other highly concurrent environments. We're not talking about some new thought experiment - this is a language that was released in its open source form in 1998, and whose initial development started in 1986. There are case studies of _successful_ large systems (Ericsson's AXD301, in particular).
frank
Hi Frank,
Frank Shearar wrote:
Why exactly is Erlang primitive? What do you mean? Do you mean that it can't solve real-world problems?
It's proven itself in at the very least its original target market, that of the highly concurrent environment found in things like telephony switches. It certainly seems applicable in other highly concurrent environments. We're not talking about some new thought experiment - this is a language that was released in its open source form in 1998, and whose initial development started in 1986. There are case studies of _successful_ large systems (Ericsson's AXD301, in particular).
In what ways is Erlang better than E? In what ways E cannot be extended to be as good as Erlang?
E supports object-oriented programming. Erlang is not object-oriented.
Non-deterministic deadlocks cannot occur in E. Non-deterministic deadlocks can occur in Erlang.
E can be used for building open systems (where various parties can cooperate despite the fact that they do not ultimately trust each other). Erlang does not provide such mechanisms.
E can be used for building systems internally composed of multiple untrusted components without security risk, because E provides mechanisms that enable programmers to follow the principle of least authority. The principle of least authority cannot be followed in Erlang. Any part of an Erlang system has the authority to destroy the whole system and perform irreversible damage.
I do not say that E is perfect. I only believe that it is more valuable than Erlang (with respect to the future) and should not be demoted just because it is not popular. Of course, the more people use a particular language, the bigger the chance that it will ripen (to its full potential, but no further).
--
Matej Kosik
ICQ: 300133844 skype: matej_kosik
"Matej Kosik" kosik@fiit.stuba.sk wrote:
Hi Frank,
In what ways is Erlang better than E? In what ways E cannot be extended to be as good as Erlang?
Well, Erlang is "better than" E because I know of at least one large system written in Erlang, while I know of none in E. That's not much, I know :)
I certainly don't mean to imply that E has nothing to teach Smalltalk - I think there are LOTS of languages out there that can teach us LOTS of things.
E supports object-oriented programming. Erlang is not object-oriented.
Well, _that's_ certainly not evidence of being primitive! Haskell's not object oriented either, nor is Lisp (without CLOS).
Non-deterministic deadlocks cannot occur in E. Non-deterministic deadlocks can occur in Erlang.
What is the difference between E's promise architecture and Erlang's combination of ! and receive? (! sends an asynchronous message to a process, and receive checks the mailbox of a process for messages in a blocking fashion, and can time out.) I mean, is one more powerful than the other? (From my cursory reading of E in a Walnut, I think they're equivalent.)
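To make the comparison concrete, here is a rough Smalltalk rendering of the E-style promise side (hypothetical names throughout; real E also does promise pipelining, which this sketch omits): an asynchronous send answers a promise immediately, and work on the result is registered as a callback instead of blocking in a receive.

```smalltalk
"Hypothetical promise sketch. resolve: is what the asynchronous
machinery would call once the remote computation answers."
Object subclass: #Promise
	instanceVariableNames: 'value resolved callbacks'
	classVariableNames: ''
	category: 'Concurrency-Sketch'

Promise >> initialize
	resolved := false.
	callbacks := OrderedCollection new

Promise >> whenResolved: aBlock
	"Run aBlock now if the value is already here, else remember it."
	resolved
		ifTrue: [ aBlock value: value ]
		ifFalse: [ callbacks add: aBlock ]

Promise >> resolve: anObject
	value := anObject.
	resolved := true.
	callbacks do: [ :each | each value: value ]
```

The Erlang counterpart would instead block in a `receive` clause until a matching message arrives (possibly with a timeout).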
E can be used for building open systems (where various parties can cooperate despite the fact that they do not trust each other ultimately). Erlang does not provide such mechanisms.
Sure; Erlang focuses on _concurrency_ issues, not security ones.
E can be used for building systems internally composed of multiple untrusted components without security risk, because E provides mechanisms that enable programmers to follow the principle of least authority. The principle of least authority cannot be followed in Erlang. Any part of an Erlang system has the authority to destroy the whole system and perform irreversible damage.
Yes, much like, say, Smalltalk. That doesn't make Smalltalk primitive. On the other hand, perhaps one could implement an E-on-Erlang.
Then again, while it's usual for distributed Erlang applications to run in a trusted environment, there's nothing stopping one from writing applications that use sockets to communicate, in which case the Erlang application is as secure or insecure as you make it. (Yes, I know, E addresses exactly this issue.)
I do not say that E is perfect. I only believe that it is more valuable than Erlang (with respect to the future) and should not be demoted just because it is not popular. Of course, the more people use a particular language, the bigger the chance that it will ripen (to its full potential, but no further).
So you feel that E is superior to Erlang because, while both support equivalent ways of writing distributed programs, E focuses on security issues right from the get-go?
Anyway, I just object to (pejoratively) labelling a language as "primitive" when it has delivered on what it set out to do.
frank
Frank Shearar wrote:
In what ways is Erlang better than E? In what ways E cannot be extended to be as good as Erlang?
Well, Erlang is "better than" E because I know of at least one large system written in Erlang, while I know of none in E. That's not much, I know :) I certainly don't mean to imply that E has nothing to teach Smalltalk - I think there are LOTS of languages out there that can teach us LOTS of things.
E supports object-oriented programming. Erlang is not object-oriented.
Well, _that's_ certainly not evidence of being primitive! Haskell's not object oriented either, nor is Lisp (without CLOS).
I am sorry. That wasn't the proper word.
Certainly, I believe that there are things that cannot be modelled in a pure functional language (one whose constructs can, without exception, be mapped to the lambda-calculus). This is not only the case for input/output. It is also the case for modelling stateful systems that interact with their environment over time (not only at the beginning and at the end). So pretending that functional programming can cover all the important aspects of the systems we need to model is unfaithful. Those impurities are useful.
Non-deterministic deadlocks cannot occur in E. Non-deterministic deadlocks can occur in Erlang.
What is the difference between E's promise architecture and Erlang's combination of ! and receive? (! sends an asynchronous message to a process, and receive checks the mailbox of a process for messages in a blocking fashion, and can time out.) I mean, is one more powerful than the other? (From my cursory reading of E in a Walnut, I think they're equivalent.)
Well, their semantics are different. I am used to objects and I want to describe systems as mutually interacting objects. My bank account is an object; it has some state and some methods. I find that natural.
Erlang's processes are somewhat similar to objects; they create an illusion of polymorphism (this is useful) and of encapsulation. However, internally, Erlang processes are expressed procedurally. In E, the behavior of processes (vats) is expressed as a mutually interacting set of objects. So the objects are polymorphic, not only the whole process.
E and Erlang are similar in that the internal behavior of a process is sequential and the whole process acts as a "monitor" (although E also ensures partial ordering of messages, which is likewise useful).
If the statement "(non-deterministic) deadlocks cannot occur in E" is correct (and this is fantastic), then E's communication mechanisms are less expressive than Erlang's. In Erlang you can write systems with non-deterministic deadlock; in E you cannot. So a less expressive language need not be a disadvantage.
E ensures partial ordering of messages passed around (also transitively). Erlang does not. This has implications.
Erlang does not provide you directly with promise pipelining (it had to be added), whereas it is directly available in E.
E can be used for building open systems (where various parties can cooperate despite the fact that they do not trust each other ultimately). Erlang does not provide such mechanisms.
Sure; Erlang focuses on _concurrency_ issues, not security ones.
E can be used for building systems internally composed of multiple untrusted components without security risk, because E provides mechanisms that enable programmers to follow the principle of least authority. The principle of least authority cannot be followed in Erlang. Any part of an Erlang system has the authority to destroy the whole system and perform irreversible damage.
Yes, much like, say, Smalltalk.
I agree. Neither Erlang nor Smalltalk was designed with security in mind. The funny thing about security is that if you do not get it right at the beginning, you cannot "add it later".
That doesn't make Smalltalk primitive. On the other hand, perhaps one could implement an E-on-Erlang.
It is possible, but it would be E, not Erlang. (although it might be more efficient than E-on-Java)
Then again, while it's usual for distributed Erlang applications to run in a trusted environment
Consider WatchMorph in Squeak. It can delete your home directory. A watch morph is simple, so you can read the code and check it. In the case of a non-trivial system, that would not be realistic.
In E, you can put the untrusted code into a powerbox, thus giving it only as much authority as it needs and as you are willing to grant. Then, despite the fact that it is an untrusted piece of code, you can use it, and it can be useful. Not using these techniques is a missed chance. Security cannot be "added later".
Erlang cannot be used for writing software this way.
there's nothing stopping one from writing applications that use sockets to communicate, in which case the Erlang application is as secure or insecure as you make it. (Yes, I know, E addresses exactly this issue.)
This would mean emulating E.
I do not say that E is perfect. I only believe that it is more valuable than Erlang (with respect to the future) and should not be demoted just because it is not popular. Of course, the more people use a particular language, the bigger the chance that it will ripen (to its full potential, but no further).
So you feel that E is superior to Erlang because, while both support equivalent ways of writing distributed programs, E focuses on security issues right from the get-go?
Anyway, I just object to (pejoratively) labelling a language as "primitive" when it has delivered on what it set out to do.
Again. Sorry.
frank
--
Matej Kosik
ICQ: 300133844 skype: matej_kosik
On 10/25/07, Matej Kosik kosik@fiit.stuba.sk wrote:
I am sorry. This wasn't the proper word.
Certainly, I believe that there are things that cannot be modelled in the pure functional language (whose constructs can be without exceptions mapped to the lambda-calculus). This is not only the case of input/output. This is also the case of modelling stateful systems that interact with their environment over time (not only at the beginning and at the end). So pretending that functional programming can cover all the important aspects of systems we need to model is unfaithful. Those impurities are useful.
Um, you are aware that the lambda-calculus is Turing equivalent, right?
http://en.wikipedia.org/wiki/Turing-complete
"The untyped lambda calculus is Turing-complete, but many typed lambda calculi, including System F, are not."
Jason Johnson wrote:
On 10/25/07, Matej Kosik kosik@fiit.stuba.sk wrote:
I am sorry. This wasn't the proper word.
Certainly, I believe that there are things that cannot be modelled in the pure functional language (whose constructs can be without exceptions mapped to the lambda-calculus). This is not only the case of input/output. This is also the case of modelling stateful systems that interact with their environment over time (not only at the beginning and at the end). So pretending that functional programming can cover all the important aspects of systems we need to model is unfaithful. Those impurities are useful.
Um, you are aware that lambda-calculus is Turing equivalent right?
Absolutely correct.
But there are strictly more interesting things that we want to describe whose behavior cannot be described in the lambda-calculus.
The things that come to my mind are:
- ClockMorph
- the web-server
- the programmable interrupt timer
- Erlang concurrent, mutually interacting, processes
...
There are other (equivalent) formalisms that formally cover these systems as well. It is true that dealing with them is more difficult, but they exist, and it makes no sense to pretend they do not exist just because we (myself not excluded) do not fully understand them.
"The untyped lambda calculus is Turing-complete, but many typed lambda calculi, including System F, are not."
-- Matej Kosik ICQ: 300133844 skype: matej_kosik
On 10/25/07, Matej Kosik kosik@fiit.stuba.sk wrote:
But there are strictly more interesting things that we want to describe whose behavior cannot be described in the lambda-calculus.
The thing that comes to my mind is
- ClockMorph
- the web-server
- the programmable interrupt timer
- Erlang concurrent, mutually interacting, processes.
Why not? You don't need to update variables to have updates. And you know that Erlang cannot modify variables after creation, right? Although I don't know if this is the property of L-C you had in mind.
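One way to see "you don't need to update variables to have updates": an Erlang-style process loop never reassigns a variable; each receive binds a fresh state and recurses. A rough Python sketch of that shape (the function and variable names are invented for illustration):

```python
# Erlang-style process loop sketched in Python: no variable is ever
# reassigned; each "tick" of the process binds a brand-new state.
def counter(state, inbox):
    if not inbox:                      # mailbox empty: report final state
        return state
    msg, rest = inbox[0], inbox[1:]    # receive one message
    return counter(state + msg, rest)  # recurse with the *new* state

print(counter(0, [1, 2, 3]))  # 6
```

The "update" is real — the observable state changes over time — yet every binding is immutable, which is exactly what makes such processes easy to reason about and to distribute.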
Jason Johnson wrote:
Why not? You don't need to update variables to have updates. And, you know that Erlang can not modify variables after creation right, although I don't know if this is the property of L-C you had in mind?
Sorry, what is L-C?
Some (infinitely many) processes expressed in the pi-calculus cannot be modeled in the lambda-calculus because they cannot be regarded as algorithms. http://www.amazon.com/s/ref=nb_ss_gw/102-3481753-9537767?initialSearch=1&...
Am I missing something?
Best regards -- Matej Kosik ICQ: 300133844 skype: matej_kosik
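The kind of system being contrasted with algorithms here — processes defined by ongoing interaction rather than by a final answer — can at least be gestured at with two threads exchanging messages over channels. A pi-calculus-flavored sketch in Python (names like `ponger` are invented):

```python
import queue
import threading

def ponger(inbox, outbox):
    # A process characterized by its interactions over channels,
    # not by a value it returns to a caller.
    while True:
        msg = inbox.get()
        if msg is None:          # conventional shutdown signal
            return
        outbox.put(msg + 1)

to_pong = queue.Queue()
from_pong = queue.Queue()
t = threading.Thread(target=ponger, args=(to_pong, from_pong))
t.start()

to_pong.put(41)
answer = from_pong.get()  # interact with the running process
to_pong.put(None)         # ask it to stop
t.join()
print(answer)  # 42
```

Unlike a function, the process above has no result of its own; its behavior only exists relative to whoever is on the other end of its channels — which is the pi-calculus's subject matter.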
Sorry, I was trying to make an abbreviation for lambda-calculus
Jason Johnson wrote:
Um, you are aware that lambda-calculus is Turing equivalent right?
http://en.wikipedia.org/wiki/Turing-complete
"The untyped lambda calculus is Turing-complete, but many typed lambda calculi, including System F, are not."
Hi,
As an aside, the simplest possible Universal Turing Machine was just established by a mathematical proof. See http://www.wolframscience.com/prizes/tm23/solution_news.html.
Yes, almost any computing machine can compute any computable function. But ick, not in practice. Shivers.
All the best,
Peter
On 10/25/07, Matej Kosik kosik@fiit.stuba.sk wrote:
I agree. Neither Erlang nor Smalltalk were designed with security in mind. The funny thing about security is that if you do not get it right in the beginning, you cannot "add it later".
The funny thing about security is that it sits at one end of a sliding scale, with "user-friendliness" at the other.
As far as "you cannot add it later" goes, given that nearly all the secure systems I can think of off the top of my head were indeed "added later" (e.g. HTTPS), I think this claim is false.
Jason Johnson wrote:
As far as "you cannot add it later", given that nearly all the secure systems I can think off of the top of my head were indeed "added later" (e.g. HTTPS), I think this claim is false.
Hi,
I agree with the point of view that it's better to add it beforehand when it comes to certain topics like security.
Cheers,
peter
And I prefer to see it at interface points. Not deep down in a language where it's making everything inconvenient.
On 10/25/07, Matej Kosik kosik@fiit.stuba.sk wrote:
Anyone followed links that Andreas gave?
Yes, some of it. Looks like it's based on the futures model.
(Erlang does not solve these problems.
No one *solves* these problems, the actor model Erlang uses is just a good way of dealing with most of the cases.
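For readers who haven't met the futures model mentioned here: a future is a placeholder for a value still being computed, and the caller blocks only at the moment it finally asks for the result. A minimal sketch with Python's standard library (this is the generic pattern, not E's implementation, which adds promise pipelining and event-loop concurrency):

```python
from concurrent.futures import ThreadPoolExecutor

def slow_sum(n):
    # stands in for any long-running computation
    return sum(range(n))

with ThreadPoolExecutor(max_workers=2) as pool:
    future = pool.submit(slow_sum, 10)   # returns immediately
    # ... the caller is free to do other work here ...
    print(future.result())               # blocks only now: 45
```

The appeal is that synchronization is tied to data dependence: you wait exactly when you need the value, not when the computation happens to be scheduled.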
Jason Johnson wrote:
No one *solves* these problems, the actor model Erlang uses is just a good way of dealing with most of the cases.
Hi,
"...most of the cases"? Hardly! It barely scratches the surface.
The "process-based model of concurrency" - as used in Erlang - is but one approach in a wide range of techniques that provide solutions for concurrency. It doesn't solve every problem in concurrency - I don't even think that they claim that for it. If they do please show us where.
Further, the problem with the example of the one-million-object graph being processed by 10,000 compute nodes is that you don't know in advance how to slice up the data. If you could know in advance how to slice it up, you'd have simplified and possibly optimized the problem solving. But that is exactly the problem: slicing up real-world object sets that are highly interconnected with each other, and processing them in parallel. That's an example of a more general case. There are other examples that won't compute with the slice-'em-and-dice-'em approach of the process-based model of concurrency.
Peter
On 10/25/07, Peter William Lount peter@smalltalk.org wrote:
The "process-based model of concurrency" - as used in Erlang - is but one approach in a wide range of techniques that provide solutions for concurrency.
A wide range? I'm aware of variations of only 3 ideas. Could you expand on "wide range"?
It doesn't solve every problem in concurrency - I don't even think that they claim that for it. If they do please show us where.
Would you please stop making a statement that I obviously didn't say (you even quoted me!) and then attacking that statement you made as though it were mine? I find that quite disingenuous.
Further the example of the one million object graph being processed by 10,000 compute nodes processing the problem is that you don't know in advance how to slice up the data. If you can know in advance how to slice up the data then you've simplified and possibly optimized the problem solving. However, that's the problem, slicing up real world data object sets that are highly interconnected with each other and processing them in parallel. That's an example of a more general case. There are other examples that won't compute with the slice em and dice em approach using the process-based model of concurrency.
Do you have any real-world cases where it's a problem? I'm not interested in solving theoretical problems that never come up in actual practice.
Hi,
Jason Johnson wrote:
On 10/25/07, Peter William Lount peter@smalltalk.org wrote:
The "process-based model of concurrency" - as used in Erlang - is but one approach in a wide range of techniques that provide solutions for concurrency.
A wide range? I'm aware of variations of only 3 ideas.
What are they?
Could you expand on "wide range"?
Sure, one only has to search the internet for "concurrency" and one sees a wide range of problems and potential solutions. Look at the Little Book of Semaphores for a breathtaking look at a few of the many possible solutions to various problems. Open your eyes to the wider horizon.
It doesn't solve every problem in concurrency - I don't even think that they claim that for it. If they do please show us where.
Would you please stop making a statement that I obviously didn't say (you even quoted me!) and then attacking that statement you made as though it were mine? I find that quite disingenuous.
I never said you stated that explicitly - I'd have to check all your postings to find that out. It's implied by what you are saying in many of your postings. At least that is the impression that I'm getting from your writing. You've certainly not acknowledged the opposite.
Further the example of the one million object graph being processed by 10,000 compute nodes processing the problem is that you don't know in advance how to slice up the data. If you can know in advance how to slice up the data then you've simplified and possibly optimized the problem solving. However, that's the problem, slicing up real world data object sets that are highly interconnected with each other and processing them in parallel. That's an example of a more general case. There are other examples that won't compute with the slice em and dice em approach using the process-based model of concurrency.
Do you have any real-world cases where it's a problem? I'm not interested in solving theoretical problems that never come up in actual practice.
Yes, the example I gave is a good summary of real world problems that actually occur (and that I'm working to solve for one project). It's not just theory, it's a harsh reality.
Cheers,
Peter
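For anyone following the pointer to the Little Book of Semaphores: its first classic pattern, the rendezvous, shows the flavor — two threads each signal their own arrival and wait for the other, so neither passes the meeting point alone. A small Python sketch (variable names invented):

```python
import threading

# Rendezvous (Little Book of Semaphores, ch. 3): neither thread passes
# the meeting point until the other has arrived.
a_arrived = threading.Semaphore(0)
b_arrived = threading.Semaphore(0)
log = []

def thread_a():
    log.append("a1")
    a_arrived.release()   # signal: A has arrived
    b_arrived.acquire()   # wait for B
    log.append("a2")

def thread_b():
    log.append("b1")
    b_arrived.release()   # signal: B has arrived
    a_arrived.acquire()   # wait for A
    log.append("b2")

ta = threading.Thread(target=thread_a)
tb = threading.Thread(target=thread_b)
ta.start()
tb.start()
ta.join()
tb.join()
print(log)  # both "1" entries always precede both "2" entries
```

Note the symmetry: swapping the release/acquire order in just one thread still works, but swapping it in both produces deadlock — which is the book's running theme about how subtle these primitives are.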
On 10/25/07, Peter William Lount peter@smalltalk.org wrote:
What are they?
Here. Again. http://sixtyk.blogspot.com/2007/05/threading.html
Sure, one only has to search the internet for "concurrency" and one sees a wide range of problems and potential solutions. Look at the Little Book of Semaphores for a breathtaking look at a few of the many possible solutions to various problems. Open your eyes to the wider horizon.
Open my eyes to the wider horizon of yesterday? I've seen it. It's complicated. I prefer to look at tomorrow.
I never said you stated that explicitly - I'd have to check all your postings to find that out. It's implied by what you are saying in many of your postings. At least that is the impression that I'm getting from your writing. You've certainly not acknowledged the opposite.
I really believe that over-all your intentions are good, but this seems downright dishonest. Either that or you simply don't read what I write. I have told you *every single time* you brought up this charge that I don't think it will solve all cases.
Ok, we're getting nowhere with this. I apologized to the list for what this thread has turned into, and I'll try to do a better job of staying out of this sort of pointless "nu uh", "uh hu!", "nu uh!" discussion in the future. (If I start it again, just warn me! It's a bit of a weakness of mine.)
Hi,
Jason Johnson wrote:
On 10/25/07, Peter William Lount peter@smalltalk.org wrote:
What are they?
Here. Again. http://sixtyk.blogspot.com/2007/05/threading.html
Thanks for the link.
Sure, one only has to search the internet for "concurrency" and one sees a wide range of problems and potential solutions. Look at the Little Book of Semaphores for a breathtaking look at a few of the many possible solutions to various problems. Open your eyes to the wider horizon.
Open my eyes to the wider horizon of yesterday? I've seen it. It's complicated. I prefer to look at tomorrow.
Well, you might create a simpler tomorrow for SOME PROBLEMS, but not for many real-world problems.
I never said you stated that explicitly - I'd have to check all your postings to find that out. It's implied by what you are saying in many of your postings. At least that is the impression that I'm getting from your writing. You've certainly not acknowledged the opposite.
I really believe that over-all your intentions are good, but this seems downright dishonest. Either that are you simply don't read what I write. I have told you *every single time* you brought up this charge that I don't think it will solve all cases.
Well, I don't recall that, and it's hard enough keeping up with this thread and the other stuff going on. It just seems that you and some of the others are dismissing some of the more complex real problems with simplistic solutions. As Einstein said: simple, but not simplistic. In terms of deep copy, that means that yes, by all means, a full deep copy is needed; but to avoid being simplistic, a partial deep copy, with or without references, is also required.
Ok, we're getting no where with this. I apologized to the list for what this thread has turned into, and I'll try to do a better job staying out of this sort of pointless "nu uh", "uh hu!", "nu uh!" discussions in the future. (if I start it again just warn me! It's a bit of a weakness of mine).
I think this has been a very good discussion. It's uncovered some interesting ideas that are out there. It's also shown that the wider Smalltalk group is getting ready - maybe - to accept some of the transaction processing notions that I've been supporting for over fifteen years now.
Keep up the good work Jason.
All the best,
peter
Ok, we're getting no where with this. I apologized to the list for what this thread has turned into, and I'll try to do a better job staying out of this sort of pointless "nu uh", "uh hu!", "nu uh!" discussions in the future. (if I start it again just warn me! It's a bit of a weakness of mine).
You have nothing to apologize about. Controversy is good when you need to agitate the mind a little. IMHO "fencing of arguments" (plz not persons) is healthy. Without that we would be frozen in comfort.
Please continue defending your theses as long as they can survive. This is a discussion list. So it is by definition.
cheers,
Sebastian
I don't apologize for my statements, just for the fact that I don't think we've gained any ground in at least a day, and I'm part of that.
Sebastian Sastre wrote:
Please continue defending your thesis as long as they can survive. This is a discussion list. So it is by definition.
Hi,
I concur with Sebastian.
All the best,
Peter
Hi,
Matej Kosik wrote:
Anyone followed links that Andreas gave? http://www.erights.org
There is a dissertation that addresses two fundamental problems:
- it introduces synchronization mechanisms that are meant to escape from the situation where you either have interference or non-deterministic deadlock (Erlang does not solve these problems. The recent book about Erlang does not even mention the word "deadlock")
- *security*
It is a point-of-view-changing reading.
???
It would be helpful if people pointed to specific web pages or papers rather than simply pointing to ENTIRE web sites. Even better would be to quote the material (unless that's inappropriate for copyright reasons) in their posting. It does take a while to digest this stuff and there are lots of links and papers flying around. Thanks very much.
Is there a particularly cogent link on erights.org that we should look at?
Thanks again,
Peter
Peter William Lount wrote:
Is there a particularly cogent link on erights.org that we should look at?
For some initial examples, after some initial introduction: http://www.erights.org/elang/intro/index.html
These things might be understandable (concerning distributed programming, and event-loop concurrency) http://www.erights.org/elib/index.html
Many things (together with the reasons why they are done as they are done) are sequentially described here: http://www.erights.org/talks/thesis/index.html
Some powerbox examples (in the E programming language as well as in other programming languages) are here: http://altair.sk:60001/mediawiki/upload/f/f9/Powerbox-rants.article.pdf It is a basic technique, not available in Erlang, but available in some (I do not say in every respect perfect) other languages.
I am sure that at e-lang http://www.eros-os.org/mailman/listinfo/e-lang you can get detailed answers to questions.
Best regards -- Matej Kosik ICQ: 300133844 skype: matej_kosik
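The powerbox technique mentioned above rests on object-capability discipline: hand a component a narrow facet of an authority rather than the authority itself. A rough sketch of that attenuation pattern in plain Python (not E — Python cannot truly enforce the confinement, so `ReadOnlyFacet` here is only an illustration):

```python
class ReadOnlyFacet:
    """Attenuated authority: exposes only reads on an underlying store.
    (Python's underscore is convention, not enforcement -- E enforces
    this kind of confinement at the language level.)"""
    def __init__(self, store):
        self._store = dict(store)   # defensive copy of the granted data

    def get(self, key):
        return self._store.get(key)

secrets = {"api_key": "xyz"}
facet = ReadOnlyFacet(secrets)      # grant only the narrow facet
print(facet.get("api_key"))         # xyz: reading is allowed
print(hasattr(facet, "set"))        # False: no write authority was granted
```

A powerbox generalizes this: a trusted broker that decides, per request, which such facets an untrusted component receives.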
Thanks for the summaries. The language does sound interesting, I admit.