Re: Multy-core CPUs

23 Oct 2007

      On 10/23/07, Peter William Lount peter@smalltalk.org wrote:
...
The principle is that anytime you have more than one thread or process
working on the same memory space, or object space, you WILL have concurrency
issues (unless your code is just running very simple concurrency). The point
is that in order to implement your
utopia-vision-of-simple-problem-free-concurrency
(utopia-concurrencia for lack of a better name) in Smalltalk you MUST
isolate the objects to ONLY ONE thread of possible alteration of their state
otherwise you end up with the possibility of many classes of concurrency
problems.
Yes, this is mostly true.  The insight with Erlang is that they don't
actually have to be in a different memory space, it just has to be
impossible at the language level for one process to get a reference to
an object of another process *and modify it*.
...
Shared memory problems exist even within one protected memory
space and not just between them. To isolate the objects involved in a
process you can have a separate object space which contains the objects that
will be operated on. This is the Erlang way, isn't it?
Kind of.  The Erlang approach works so well for them because variables
can't be changed.  Once you create a variable it is frozen in that
form.  Other processes *can* look at it because no change can happen
from either thread.
Obviously more care will have to be taken in Smalltalk as the objects
can always be changed.
...
The thing about
Erlang, unless I'm mistaken (and if I am mistaken I'd expect to be
corrected), is that the objects in a process are only visible to that
process until the results are returned. The objects that pass in and out of
an Erlang process are only primitive data types and not complex objects.
The last sentence is incorrect.  The message can be any complexity,
including sending functions, file handles, whatever.
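To make that concrete, here is a minimal sketch (the module and function
names are made up for illustration) in which a single message carries a
pid, a closure and a list, all inside one tuple:

```erlang
-module(msg_demo).
-export([run/0]).

%% Hypothetical worker: waits for one request, applies the function it was
%% sent to the data it was sent, and mails the result back to the sender.
worker() ->
    receive
        {From, Fun, Data} when is_function(Fun, 1) ->
            From ! {self(), Fun(Data)}
    end.

run() ->
    Pid = spawn(fun worker/0),
    %% The message is a tuple holding a pid, a closure and a list.
    Pid ! {self(), fun(L) -> lists:sum(L) end, [1, 2, 3]},
    %% Only accept the reply that is tagged with the worker's pid.
    receive
        {Pid, Result} -> Result
    end.
```

Calling msg_demo:run() hands the closure to the worker and returns
whatever the closure computes on the list (here the sum, 6).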
...
However for Smalltalk you'd need to pass in complex object graphs of
arbitrary size and connectedness to be general purpose. This then results in
a version problem.
Only if what you pass can be modified by either side.
...
For example, <snip>
Now
you've got a problem that the magical erlang message passing won't solve.
Problem: your example is using shared data and updating of variables.
In the message passing paradigm *there is no shared data*. Period.
None.  In Erlang specifically there isn't updating of variables even
within a process.  So this would be done in Erlang something like
this:

%% split_and_send/1 and add_data_to_structure/2 are left undefined for brevity.
some_process(DataStructure) ->
  break_up_structure(DataStructure, 10000),
  get_new_structure({}, 10000).            % return result of get_new_structure

break_up_structure(_, 0) -> done;          % base case, no processes left
break_up_structure(DataStructure, Processes) ->            % otherwise
  RestOfDataStructure = split_and_send(DataStructure),     % cut off a piece and send
  break_up_structure(RestOfDataStructure, Processes - 1).  % tail call with new values

get_new_structure(DataStructure, 0) -> DataStructure;      % base case, return what we built
get_new_structure(DataStructure, Processes) ->
  Data = receive Result -> Result end,     % block until the next reply arrives
  NewDataStructure = add_data_to_structure(Data, DataStructure),
  get_new_structure(NewDataStructure, Processes - 1).      % tail call with new values

The fact that variables are immutable is dealt with in the normal
functional programming way of using tail recursion and passing any
variables that need "updating" as arguments.
In case the above code isn't clear: The process breaks up the parts of
the data structure and farms them out to the different processes, then
waits for responses and incrementally assembles them into the new data
structure.
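A complete, runnable version of the same shape might look like the
following.  This is only a sketch under assumptions chosen for
illustration: the data structure is a plain list, the "expensive" work is
squaring a number, and the names scatter, gather and run are invented
here.

```erlang
-module(scatter).
-export([run/1]).

%% Entry point: square every element of List, one worker process per element.
run(List) ->
    Parent = self(),
    Count = scatter(List, Parent, 0),
    gather(Count, []).

%% Cut off one piece, hand it to a fresh process, recurse on the rest.
%% Returns how many workers were started.
scatter([], _Parent, Sent) -> Sent;
scatter([Piece | Rest], Parent, Sent) ->
    spawn(fun() -> Parent ! {result, Piece * Piece} end),
    scatter(Rest, Parent, Sent + 1).

%% Collect one reply per worker.  The accumulator is never modified in
%% place; the "updated" value is passed as an argument to the tail call.
%% Replies arrive in no particular order, so the result is sorted.
gather(0, Acc) -> lists:sort(Acc);
gather(Remaining, Acc) ->
    receive
        {result, Value} -> gather(Remaining - 1, [Value | Acc])
    end.
```

For example, scatter:run([3, 1, 2]) starts three workers and returns
[1, 4, 9].  No worker ever sees the accumulator; each one only gets its
own piece and a pid to reply to.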
Now, the issue here is obviously:  This only makes sense when the
processing of the data that was carved out is more expensive than the
carving out and reattaching.  If the structure is very large that may
well not be the case.
In that case I'm not sure how I would handle it, but I look at it like
any other performance issue:  I would try algorithm changes before I
looked at going to a lower level.
...
Now someone mentioned Software Transactional Memory (STM) so briefly that
it would be easy to miss. Is that your solution?
No.  If someone else wants to look at it, that's fine; I'm a bit
concerned about the bookkeeping.
...
If so you still have other
concurrency issues, object versioning issues, plus more to deal with. No
solution is a panacea for all problems unless you are an advocate of silver
bullet solutions.
There is no such thing, but just as a generational garbage collector
is "good enough" in all but the most special cases, I believe message
passing will be "good enough" as well.
...
The problem of editing a large graph of objects with many parallel threads
is the generalized case of a nasty and complex set of concurrency and
transactional issues. There are many ways to solve this. If you reply to
this example I would hope that you do so fully explaining how you'd handle
the concurrency and - importantly - the object consistency issues.
Transactional and concurrency issues arise because you are sharing
something.  If you give one entity alone access to that something and
all access must go through him these issues go away.  They are traded
for new issues, but issues that are much easier to reason about.
...
Yes, I understand that early tests indicate that Erlang can handle
approximately 100,000 or so processes at a time without hiccups while Java
can handle about 8,000 or so before blowing up.
Nowhere near 8,000.  At least not on any box I've ever seen (or do
you have a reference?).  The problem is that Java is just too fat: on a
32-bit operating system you run out of memory well before 8k processes
or threads.
...
I don't know what the
various Smalltalks can handle, but I doubt it's as high as Erlang and is
more likely less than even Java - just a guess though. Maybe someone has
worked it out.
Actually Smalltalk is not so far from Erlang right now (theoretically;
the question mark is the scheduling).  Erlang is optimized for this,
so each process might be half the size of a Smalltalk one (but I'm not
sure of even this), but the number Smalltalk can handle is *certainly*
much higher than any native process or thread solution can hope to achieve.
...
That's only because the current crop of operating systems were designed and
envisioned when a few hundred processes and threads were considered a lot.
Also because native operating system processes take a lot of resources.
It's because of the resources and how the OS deals with them.  Keep in
mind that a thread can call "detach" and become a running process, so
some care has to be taken that space will be available.  Of course
Linux deals with this by not having real threads at all, just
processes that have the same memory map as other processes.
...
Yes, and how would the no sharing be implemented in Smalltalk?
This is what my investigations will reveal.  As I alluded to in a
previous mail, any immutable data is not a concurrency issue.  It
doesn't matter who can see it so long as it can't be updated.  Mutable
data (e.g. objects) are also no issue provided you can guarantee no
process can get access to it besides the process that created it.
So that leaves globals, especially classes.  Until I get into this I'm
not 100% sure how I'll deal with it, but I can't imagine that it's not
solvable.
...
How would you solve the concurrency one million node editing problem above
without locking in your utopian threading implementation?
As described above.
...
What would you do to Smalltalk to make it do this. So far you and the
others have been very short on specifics and have just argued that something
magical can be done to make concurrency happen without locks.
With current hardware/OSes, there will be locks, but in the VM where
they belong.  The only structure in Erlang that must be atomic is the
message "mailbox"; it's the only place that can be accessed at
the same time by multiple processes.
...
A few papers
and web sites have been linked to but no one has written down what they are
proposing or what they mean past it can be done.
Well, I'm a Smalltalker.  I form a vague idea and then go try to do
it.  I'll let you know what the specification is when I've implemented
it. :)  But I have researched into this as far as what exists today
and I haven't seen anything I feel is a show stopper, nor anything
that will require a change in Smalltalk semantics.  It's very possible
(even likely) that there's something I've overlooked, but I'll need to
get into it to find that out.
...
I'll grant you that you can see that it can be done. Please illuminate what
it is that you see can be done in detail and how you might do it. Thanks.
Is it clearer now?  I feel that I have detailed it out twice now (the
relevant details anyway).
...
However, you'll still end up with concurrency control issues and you've
got an object version explosion problem occurring as well. How will you
control concurrency problems with your simplified system? Is there a
succinct description of the way that Erlang does it? Would that apply to
Smalltalk?
Can you give an example of one of these issues, so I can explain how I
would deal with it?  Please note, there is *no data sharing, period*
in this paradigm.  At least at the language level.
...
Ok, so there would be 10,000 separate process-object-spaces with the one
million nodes being edited and new nodes being created in each of these
10,000 separate spaces. How do you expect to "merge" the results and solve
the edits that will inevitably cause "logical data inconsistency"
collisions?
By having just one process that owns the data (or lots of processes
that own their own piece of it) that all processes must talk to if
they wish to make changes.
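As a rough sketch of what a single owner looks like (a hypothetical
module; the data is assumed to be a map of counters, chosen only to keep
the example small):

```erlang
-module(owner).
-export([start/1, update/3, lookup/2]).

%% Start the one process that owns the data.  Nobody else holds the map.
start(Map) -> spawn(fun() -> loop(Map) end).

%% The owner handles one message at a time, so every change is applied
%% in full before the next request is even looked at.
loop(Map) ->
    receive
        {update, Key, Fun} ->
            Old = maps:get(Key, Map, 0),          % missing keys start at 0
            loop(maps:put(Key, Fun(Old), Map));
        {lookup, From, Ref, Key} ->
            From ! {Ref, maps:find(Key, Map)},
            loop(Map)
    end.

%% Ask the owner to change a value; the caller does not wait.
update(Owner, Key, Fun) ->
    Owner ! {update, Key, Fun},
    ok.

%% Ask the owner for a value and wait for the reply tagged with our Ref.
lookup(Owner, Key) ->
    Ref = make_ref(),
    Owner ! {lookup, self(), Ref, Key},
    receive
        {Ref, Reply} -> Reply
    end.
```

Any number of processes can call owner:update/3 at once.  The requests
simply queue up in the owner's mailbox, so there is nothing to merge and
nothing for the callers to lock.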
...
Your simplified concurrency system also dramatically alters the Smalltalk
paradigm.
The current paradigm is fine-grained locked/shared state.
So?
So obviously this part of the current paradigm will be altered, and I
say it needs to be.  Even if we find that certain parallel tasks need
the old shared state method, this shouldn't be provided anywhere most
people will find it.  The problem is that most people who know how to
do concurrency code only know this shared state model, so if you
present multiple options they will all use this, the familiar.
...
Why? Please provide more than anecdotal or belief driven comments for this
point of view. What are the reasons? What is it that you'd be moving
towards?
Because of the reasons I've laid out several times in this thread:  1)
it does not scale, 2) it can not be composed, 3) it's incredibly
difficult to reason about, 4) it's a low level detail, 5) it ensures
encapsulation violation and on and on.
There are plenty of papers out there on this subject, if you are
looking for me to go through them all and condense it for you in a
summary more than I've already done then I'm afraid that's not going
to happen.  It's a pretty well known fact that shared-state
fine-grained locking *can not scale*.
...
It's a huge mistake on their part in my humble view.
While it may be easy from the point of view of adapting their image it's a
huge mistake. I've had many people comment that that's one of the reasons
that Java is better than Smalltalk
If someone thinks that the mess that is Java is better than Smalltalk, I
already question what useful information they can bring to the table.
Java has *some things* better than Smalltalk, sure, but such a
statement is an "information smell" or a "taste smell" to say the
least.
...

it already works with multiple CPU cores. Yes they have to solve the
concurrency problems, but those are NO
WORSE than the concurrency problems that already exist within Smalltalk when
running with a single native process and multiple (green threads aka)
Smalltalk Processes. No different. Do you actually get that?
For someone who is so violently against personal attacks, you sure
hang over the fence, eh?  Just because you're not understanding where
I'm coming from doesn't mean these concepts are just beyond me and
only you get it.
...
If you don't
then you fail to appreciate that the approach that Cincom is taking isn't
going to solve the concurrency problems since - unless they correct me on
this - it seems that their direction is to simply have N-instances of their
image (in the same memory space or in separate operating system processes)
where N would frequently be the same as the number of cores on the computer
(or server) in question (although the instances could be more or less as
needed).
Which *does* solve it!  And conveniently walks right past all the
terrible issues that shared-state concurrency programming has.  Once
again, while Java people are trying to debug issues the Smalltalk guys
will already be adding features to the next release.
...
Each individual image would still have the problems of
multi-threading within it IF AND ONLY IF there are multiple threads forked.
Right, so don't do that. :)
...
This is of course a far cry from the radical concurrency system that is
being proposed by the erlangization concurrency proponents.
Actually not so much.  Erlang spanned actual CPUs by running more
images, just like Smalltalk.  So only the processes inside the image
are different, but even this can be done today with discipline.  I
would like to remove this need for discipline by making it
*impossible* to affect other processes, but so long as you make sure
you don't update anything that other processes can see you could do
this kind of message passing today.
...
There isn't any need for new syntax with the "!" character. Now sure you're
using it with a binary message selector "!" but why obfuscate it. I'd
recommend using a keyword selector for better clarity. Thanks.
This is just what Erlang uses.  I want inter-process sends to stand out clearly.
...
Not so. You'd have to transmit - in my example above - one million objects
to the various images and have them compute and return their results which
would then have to be combined in a manner that leaves the graph of objects
in a consistent state with one and a half million objects and 70% more
interconnections between them. It is this parallel updating of many parts of
the same data graph that will require the concurrency controls.
No.  You are describing shared data which doesn't exist.  No shared
data = no locking needed.
...
Nothing but you've got to address the concurrency problem that I've
mentioned above.
It wasn't a problem with message passing style, but for shared-state
concurrency programming.
...
Are you talking about forking a new operating system process with a copy of
the image?
I'm talking about:   [ "some code" ] fork
...
These are object database problems and attempting to split the processing
into multiple threads to avoid the "locking" issues does not solve the
problem. It just pushes it further away. While it might work for some
applications like telephone switching systems it can't generalize to ALL
types of problems which could benefit from concurrency solutions.
No it can't, and I don't believe I ever said it did.  But garbage
collection can't either and we do fine with that as our only option in
Squeak.  If we need more we step outside the normal bounds, as it
should be.
...
All Object Databases have a couple of rooted objects. Maybe many more than
a couple.
Object databases are a whole other can of worms.  I don't know how I
would deal with it, but I would start by looking at what Mnesia
(basically an object db for Erlang) does.
...
Yes, a variant of the Software Transactional Memory. However, you still
have the problems mentioned above.
No, Software transactional memory means we update several variables
inside an "atomic" block and if the system notices something changed
while these changes were being made it rolls the block back to what it
was before.
I'm talking about a VM optimization to deal with metaclasses.
...
Having two spaces, old and new space, won't solve the problems mentioned
above when you have N processes (threads) running on M-objects in parallel
and need to combine the results of the parallel computations.
Old space and new space is purely for dealing with live code updates.
Nothing more.  I'm not trying to solve any object versioning issues,
because I haven't seen any real evidence they will exist.
...
Many problems have this "split processes off with their chunk of data" and
"recombine" the results. Many of these problems are simplified - if possible -
so that the results can't collide with the issues presented above.
However, we are not talking about those special cases - such as parallel ray
tracing algorithms. We are talking about the completely generic cases that
occur in general purpose and every day use of code in Smalltalk applications
The only thing I can think of that wouldn't work in this model is
problems where splitting up and rebuilding a dataset is more expensive
than the actual processing.  But I think that can usually be solved by
design changes.
...

such as the massive Smalltalk business database front end applications
which are typical at many corporations today and which utilize many threads
to accomplish their parallel tasks in order to speed up the user experience.
A real world consequence of this is increased productivity of thousands of
users day in and day out at these corporations.
I'm not sure what you're saying here.  Apparently these Smalltalk
applications aren't doing real multithreading now right (since it's
only an option on a few ST implementations)?  So how is offering a
simple way to achieve concurrency going to make this worse?
...
Maybe your applications aren't as complex as these but I don't see the
benefits of an Erlang ONLY approach. I do see the benefit of STM and Erlang
approaches in some cases but why intentionally limit the tool box to just a
few cases? It makes no sense to ignore the harsh reality of concurrency
issues by picking a limited set of solutions.
For the reasons mentioned above.  Choice isn't the holy grail you seem
to think it is.  If it was we would all be on the C++ list talking.
Funny that we ended up in a language that 1) doesn't allow you to
allocate your own memory, 2) forces you to use single inheritance, 3)
forces you to use an image instead of files, etc.
I'm comfortable with the simplest thing being what works 90-99% of the
time and having to work much harder if I need something more.