Hi,

I propose that any distributed object messaging system developed for
inter-image communication meet a wide range of criteria and application
needs before being considered as part of the next Smalltalk Standard.
These criteria would need to be drawn from the literature and from the
needs of members of the Smalltalk community and their clients.

2) It's been mentioned that it would be straightforward to have Squeak
start up multiple copies of the image (or even multiple different
images) in one process (task) memory space, with each image having its
own native thread and keeping its object table and memory separate
within the larger memory space. This sounds like a very nice approach.

I am not so sure. The Squeak VM is a processor hog. Threads within the VM
will need processor time for bytecode interpretation. So a VM process can
only scale to a few threads before it starves for processor time.

It's not the bytecodes as such that cause high CPU usage. It's the number of processor instructions being executed. If you run lots of code then you can expect higher CPU usage. The more densely capability is packed into the language's library of objects, the more processor instructions may be executed. To find out what Squeak is doing while it uses the ~12% CPU you mentioned elsewhere, you'd have to trace the code. Then you'd see exactly what's going on. Tracing at two levels would be helpful: first at the Smalltalk level and then at the VM primitive and bytecode level. The bytecodes may be fine, while the image you've deployed might be doing many things that your particular application doesn't need.

On the downside, coding errors 
could trash object memory across threads making testing and debugging 
difficult.

Yes. The point that I'm making is that even with so-called simple concurrency models these errors can happen. Basically there is no such thing as hassle-free simple concurrency when it comes to computers! Simple concurrency is a myth and a lie. Don't fall for it.

Will the juice be worth the squeeze?

That depends on what you are using your computer for. If it's an application that benefits from massive parallelism, then yes, it is worth the squeeze. If you have a very serial sort of application, like a series of complex dependent computations, then it might not be worth the squeeze at all.

If you have a complex business application that is highly threaded (running, say, ten to twenty Smalltalk processes on a single native thread), then it is worth the squeeze if the users can work noticeably faster without incurring concurrency nightmares. Otherwise, no, it's not worth it, as users get very frustrated.

3) A single image running on N cores with M native threads (M may be
larger than N) is the full generalization, of course.
This may be the best way to take advantage of paradigm-shaking chips
such as the TILE64 processor from Tilera.

With single or few processors, we tend to "serialize" logic ourselves and 
create huge linear programs. When processors are aplenty, we are free to 
exploit inherent parallelism and create many small co-ordinating programs. So 
the N-cores are a problem only for small N (around 8).

Eh? Why only "small N (around 8)"? Please elaborate.

However, we may need to rethink the entire architecture of the Smalltalk
virtual machine, since the TILE64 chip has capabilities that radically
alter the paradigm. A message takes less time to pass between processor
nodes than the same amount of data takes to be written into memory.
Think about that. It offers a new paradigm unavailable to other N-core
processors (at this time).

True. Squeak's VM could virtualize display/sensors and spawn each project in 
its own background process bound to a specific processor. The high-speed, low 
latency paths are well-suited for UI events. Imagine running different 
projects on each face of a rotating hexecontahedron :-)

That would be cool.

The power of the TILE64 processor from Tilera is that processors can form arbitrary "compute streams" on the fly, where data is computed in one processor and passed along to another without ever touching RAM. Oh, WOW! This means, for example, that the six typical stages of rendering could be implemented on six, or six * N, processors in a Tile-N chip (where N = 36, 64, 128, 512, 1024, 4096 or more processors). WOW!

Now how would you have the Smalltalk system generate objects and messaging binary code from Smalltalk source code to model and program that? How? Let's do it! This requires a shift in paradigm. This requires a shift in your thinking. This requires a shift in my thinking. Think it through. What solutions can you come up with?
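
To make the "compute stream" idea concrete, here is a rough sketch in Python (not Smalltalk, and not anything Tilera-specific). It only models the shape of the idea: each stage runs on its own worker and hands its result directly to the next stage over a queue, rather than publishing it in a shared structure. Everything in it is an assumption for illustration: the names stage, run_stream and make_tagger are made up, the six stages are placeholders rather than real rendering steps, and Python threads and queues stand in for processor tiles and the on-chip network.

```python
import queue
import threading

SENTINEL = object()  # hypothetical end-of-stream marker


def stage(fn, inbox, outbox):
    """Run one stage: apply fn to each incoming item and forward the result."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # pass the shutdown signal downstream
            return
        outbox.put(fn(item))


def run_stream(stage_fns, items):
    """Chain stage_fns with queues, feed items in, and collect outputs in order."""
    queues = [queue.Queue() for _ in range(len(stage_fns) + 1)]
    workers = [
        threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stage_fns)
    ]
    for worker in workers:
        worker.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(SENTINEL)

    results = []
    while True:
        out = queues[-1].get()
        if out is SENTINEL:
            break
        results.append(out)
    for worker in workers:
        worker.join()
    return results


def make_tagger(name):
    """Build a placeholder stage that records its name on the item's trace."""
    return lambda trace: trace + [name]


# Six placeholder stages standing in for six rendering stages (assumed names).
SIX_STAGES = [make_tagger("stage%d" % n) for n in range(1, 7)]

if __name__ == "__main__":
    for trace in run_stream(SIX_STAGES, [["frame-a"], ["frame-b"]]):
        print(trace)
```

Each stage has exactly one worker and the queues are first-in, first-out, so items leave the stream in the order they entered it. The open question from above remains: how a Smalltalk system would generate and place such stages on real tiles.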

All the best,

Peter