Interesting. I really think that to make real progress in parallelization we will have to walk away from the Intel model of shared memory stacked on three levels of cache, snoopy buses to propagate writes, and so on. Of course they will force that model to scale for a while, but it can't go on forever, and there just has to be a simpler way.
On 10/26/07, Marcel Weiher marcel.weiher@gmail.com wrote:
On Oct 25, 2007, at 12:28 PM, Peter William Lount wrote:
The Tile-64 processor is expected to grow to about 4096 processors by pushing the limits of technology beyond what they are today. To reach the levels you are talking about for a current Smalltalk image with millions of objects each having their own thread (or process) isn't going to happen anytime soon.
I work with real hardware.
A couple of numbers:
- Montecito, the new dual-core Itanic has 1.72 billion transistors.
- The ARM6 macrocell has around 35000 transistors
- divide the two, and you will find that you could fit more ARM6 cores into the Montecito transistor budget than the ARM6 has transistors
So we could have a 35K-object system with every object having its own CPU core and all message-passing being asynchronous. This is likely to be highly inefficient, with most of the CPUs waiting/idle most of the time, say 99%. Even at 1% efficiency, and say a 200 MHz clock, the effective throughput would still be 200M * 35000 / 100 = 70 billion instructions per second. That's a lot of instructions. And just wait until we find some really parallel algorithm that cranks efficiency up to 10%!
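The back-of-envelope arithmetic above can be checked in a few lines. This is just a sketch of the same thought experiment; the transistor counts, 200 MHz clock, and 1% efficiency figure are the assumptions stated in the text, not measured numbers:

```python
# Thought-experiment numbers from the text (assumptions, not measurements).
montecito_transistors = 1_720_000_000   # dual-core Montecito transistor count
arm6_transistors = 35_000               # approximate ARM6 macrocell size

# How many ARM6 cores fit in the Montecito transistor budget?
cores = montecito_transistors // arm6_transistors
print(cores)  # -> 49142, i.e. more cores than the ARM6 has transistors

# Aggregate throughput: one core per object, 35K objects,
# 200 MHz clock, 99% of cores idle (1% efficiency).
clock_hz = 200_000_000
throughput = clock_hz * 35_000 // 100
print(throughput)  # -> 70_000_000_000, i.e. 70 billion instructions/second
```

So even under pessimistic utilization the aggregate numbers come out large, which is the point of the exercise.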
I am not saying any of these numbers are valid or that this is a realistic system, but I do find the numbers of that little thought experiment...interesting. And of course, while Moore's law appears to have stopped for cycle times, it does seem to still be going for transistors per chip.
Marcel