Re: Concurrent Futures

30 Oct 2007


      I wish you had gotten involved in this thread earlier on.  I think you
explained everything better in this one message than I have in the
whole thread.  :)
On 10/30/07, Andreas Raab andreas.raab@gmx.de wrote:
...
Igor Stasenko wrote:
...
How would you define the boundaries of these entities in the same image?
It is defined implicitly by the island in which a message executes. All
objects created by the execution of a message are part of the island the
computation occurs in.
To create an object in another island you need to artificially "move the
computation" there. That's why islands implement the #new: message, so
that you can create an object in another island by moving the
computation, for example:
space := island future new: TSpace.
This will create an instance of TSpace in the target island. Once we
have created the "root object" further messages that create objects will
be inside that island, too. For example, take this method:
TSpace>>makeNewCube
   "Create a new cube in this space"
   | cube |
   cube := TCube new.
   self addChild: cube.
   ^cube
and then:
cube := space future makeNewCube.
Both cube and space will be in the same island.
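To make the membership rule concrete, here is a minimal single-threaded
sketch in Python rather than Croquet code. Island, IslandObject,
future_new and future_send are hypothetical names invented for the
illustration, and results are handed to a callback instead of a real
promise or far reference.

```python
from collections import deque


class Island:
    """Hypothetical stand-in for an island: just a queue of pending messages."""

    current = None  # the island whose message is executing right now

    def __init__(self, name):
        self.name = name
        self.queue = deque()

    def future_new(self, cls, on_result):
        # "Move the computation": the instance is created later, inside this island.
        self.queue.append((cls, (), on_result))

    def future_send(self, receiver, selector, on_result):
        # Schedule a method call on an object that lives in this island.
        self.queue.append((getattr(receiver, selector), (), on_result))

    def run(self):
        while self.queue:
            fn, args, on_result = self.queue.popleft()
            previous, Island.current = Island.current, self
            try:
                result = fn(*args)
            finally:
                Island.current = previous
            on_result(result)


class IslandObject:
    def __init__(self):
        # Membership is implicit: whichever island is executing the message.
        self.island = Island.current


class TCube(IslandObject):
    pass


class TSpace(IslandObject):
    def __init__(self):
        super().__init__()
        self.children = []

    def make_new_cube(self):
        cube = TCube()  # created during a message to the space, so same island
        self.children.append(cube)
        return cube


island = Island("remote")
created = {}
island.future_new(TSpace, lambda space: created.update(space=space))
island.run()
island.future_send(created["space"], "make_new_cube",
                   lambda cube: created.update(cube=cube))
island.run()
```

Nothing in make_new_cube mentions an island; the cube ends up next to
the space only because the message that created it ran there.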
...
Could you illustrate with some simple examples, or a strategy that can be
used to apply them to concurrent execution within a single VM?
I'm confused about your use of the term "concurrent". Earlier you wrote
"There is a BIG difference between concurrency (parallel execution with
shared memory) and distributed computing." which seems to imply that you
discount all means of concurrency that do not use shared memory. If that
is really what you mean (which is clearly different from the usual
meaning of the term concurrent) then indeed, there is no way for it to
be "concurrent" because there simply is no shared mutable state between
islands.
...
I'm very interested in practical usage of futures myself.
What will you do, or how would you avoid the situation, when
two different islands holding a reference to the same
object in the VM sometimes send direct messages to it, causing a race
condition?
The implementation of future message sending uses locks and mutexes. You
might say "aha! so it *is* using locks and mutexes" but just as with
automatic garbage collection (which uses pointers and pointer arithmetic
and explicit freeing) it is simply a means to implement the higher-level
semantics. And since no mutual/nested locks are required,
deadlock-freeness can again be proven.
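As a rough illustration of that point (a Python sketch under my own
assumptions, not the Croquet implementation), each island below owns one
lock that guards only its message queue. The lock is taken to enqueue or
dequeue and is always released before a message runs, so no thread ever
holds two locks at once. Island, future, ping and pong are hypothetical
names.

```python
import threading
from collections import deque


class Island:
    """One thread, one queue, one lock that guards nothing but the queue."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ready = threading.Condition(self._lock)
        self._queue = deque()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def future(self, fn, *args):
        # The lock is held only for the append, never while user code runs.
        with self._ready:
            self._queue.append((fn, args))
            self._ready.notify()

    def stop(self):
        self.future(None)  # sentinel message that ends the loop
        self._thread.join()

    def _loop(self):
        while True:
            with self._ready:
                while not self._queue:
                    self._ready.wait()  # idle wait for work, not for another island
                fn, args = self._queue.popleft()
            # Lock released here, so a message can freely send to any island.
            if fn is None:
                return
            fn(*args)


a, b = Island(), Island()
log = []
done = threading.Event()


def ping(n):
    log.append(("a", n))
    b.future(pong, n)


def pong(n):
    log.append(("b", n))
    if n < 3:
        a.future(ping, n + 1)
    else:
        done.set()


a.future(ping, 1)
finished = done.wait(timeout=5)
a.stop()
b.stop()
```

The two islands send to each other in both directions, which is the
classic recipe for lock-ordering trouble, yet there is no ordering to get
wrong because the queue lock is a leaf lock.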
...
Yes. But this example is a significant one. Sometimes I want these
messages to run in parallel, sometimes I don't. Even for a single 'island'.
In the island model, this is not an option. The unit of concurrency is
an island, period. If you want to run computations in parallel that share
data you either make the data immutable (which can enable sharing in
some limited cases) or you copy the needed data to "worker islands".
Basic load balancing.
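A minimal sketch of the "copy to worker islands" option, in Python and
under stated assumptions: Island, future, work and collect are
hypothetical names, arguments are deep-copied at the island boundary, and
the islands are drained one after another here where a real system would
run them in parallel.

```python
import copy
from collections import deque


class Island:
    def __init__(self):
        self.queue = deque()

    def future(self, fn, *args):
        # Copy arguments at the boundary so no mutable state is ever shared.
        self.queue.append((fn, copy.deepcopy(args)))

    def run(self):
        while self.queue:
            fn, args = self.queue.popleft()
            fn(*args)


main = Island()
workers = [Island() for _ in range(3)]
data = list(range(9))
totals = []


def collect(partial_sum):
    # Runs in the main island; results arrive as ordinary messages.
    totals.append(partial_sum)


def work(snapshot, offset):
    chunk = snapshot[offset::3]
    snapshot.clear()  # mutating the worker's private copy cannot affect `data`
    main.future(collect, sum(x * x for x in chunk))


for offset, worker in enumerate(workers):
    worker.future(work, data, offset)

for worker in workers:
    worker.run()
main.run()
```

Each worker is free to destroy its copy, and the original list in the
main island is untouched; the partial sums come back as messages and are
combined there.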
...
Then, for a general solution we need these islands to be either very small (the
smallest one being a single object) or to contain a large number of objects. The
question is how to give control of their sizes to the developer. How
can a developer define the boundaries of an island within a single image?
By sending messages. See above.
...
I will not accept any solutions like 'multiple images' because this
drives us into the distributed computing domain, which is _NOT_ concurrent
computing anymore, simply because it's not using shared memory, and in
fact there is no sharing at all, only a glimpse of it.
Again, you have a strange definition of the term concurrency. It does
not (neither in general English nor in CS) require use of shared memory.
There are two main classes of concurrent systems, namely those relying
on (mutable) shared memory and those relying on message passing
(sometimes utilizing immutable shared memory for optimization purposes
because it's indistinguishable from copying). Erlang and E (and Croquet
as long as you use it "correctly") all fall into the latter category.
...
...
This may be the outcome for an interim period. The good thing here is
that you can *prove* that your program is deadlock-free simply by not
using waits. And ain't that a nice property to have.
You mean waits like this (consider the following two lines of code run in parallel):
[ a isUnlocked ] whileFalse: [ ]. b unlock.
and
[ b isUnlocked ] whileFalse: [ ]. a unlock.
Just like in your previous example, this code is meaningless in Croquet.
You are assuming that a and b can be sent synchronous messages and
that these resolve while you are in the busy-loop. As I have pointed out
earlier this simply doesn't happen. Think of it this way: results are
themselves communicated using future messages, e.g.,
Island>>invokeMessage: aMessage
   "Invoke the message and post the result back to the sender island"
   | result |
   result := aMessage value. "compute result of the message"
   aMessage promise future value: result. "resolve associated promise"
so you cannot possibly wait for the response to a message you just
scheduled. It is simply not possible, neither actively nor passively.
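The same mechanism can be modelled in a few lines of Python. This is a
sketch under my own assumptions, not Croquet code: Promise, Island,
future, step and when_resolved are hypothetical names, and the busy-loop
is capped at 1000 iterations only so that the demonstration terminates.

```python
from collections import deque


class Promise:
    def __init__(self):
        self.resolved = False
        self.value = None
        self.callbacks = []

    def when_resolved(self, callback):
        if self.resolved:
            callback(self.value)
        else:
            self.callbacks.append(callback)

    def resolve(self, value):
        self.resolved, self.value = True, value
        for callback in self.callbacks:
            callback(value)
        self.callbacks.clear()


class Island:
    current = None  # the island whose message is executing right now

    def __init__(self, name):
        self.name = name
        self.queue = deque()

    def future(self, fn, *args):
        # Schedule fn here; remember the sending island for the reply.
        promise = Promise()
        self.queue.append((fn, args, promise, Island.current))
        return promise

    def step(self):
        """Run one queued message; return False if there was none."""
        if not self.queue:
            return False
        fn, args, promise, sender = self.queue.popleft()
        previous, Island.current = Island.current, self
        try:
            result = fn(*args)
        finally:
            Island.current = previous
        if promise is not None:
            # The result travels back as just another message to the sender.
            sender.queue.append((promise.resolve, (result,), None, None))
        return True


a, b = Island("a"), Island("b")
seen = []


def compute():
    return 6 * 7


def ask():
    promise = b.future(compute)
    spins = 0
    # The resolution is queued behind the message we are executing right now,
    # so spinning can never observe it.
    while not promise.resolved and spins < 1000:
        spins += 1
    seen.append(("after spinning", promise.resolved))
    promise.when_resolved(lambda value: seen.append(("resolved", value)))


a.queue.append((ask, (), None, None))  # bootstrap message, no reply wanted
while a.step() or b.step():
    pass
```

The promise is still unresolved when the spin gives up, and the value
only shows up once ask has returned and island a gets around to
processing the reply message.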
...
And how could you guarantee that any bit of code in the current ST image
does not contain such hidden locks, like loops or recursive loops
which will never return until some external entity changes the
state of some object(s)?
No more than I can or have to guarantee that any particular bit of the
Squeak library is free of infinite loops. All we need to guarantee is
that we don't introduce new dependencies, which thanks to future
messages and promises we can guarantee. So if the subsystem is
deadlock-free before, it will stay so in our usage of it. If it's not then, well,
broken code is broken code no matter how you look at it.
...
...
...
I pointed out that futures as an 'automatic lock-free' approach are not
quite parallel to 'automatic memory management by GC'.
The similarity is striking. Both in terms of tradeoffs (trade low-level
control for better productivity) as well as the style of arguments made
against it ;-) Not that I mind by the way, I find these discussions
necessary.
The striking thing is that introducing GC does good things: it removes the
need to care about memory, which helps a lot in development and
makes code clearer and smaller. But I can't see how futures do the
same. There are still a lot of things for the developer to consider even when
using futures.
The main advantages are increased robustness and productivity. We worry
a *lot* about deadlocks since some of our usage of Croquet shows exactly
the kind of "mixed usage" that you pointed out. But never, not once,
have we had a deadlock or even have had to worry about it, in places
where we used event-loop concurrency consistently. (interesting aside:
Just today we had a very complex deadlock on one of our servers and my
knee-jerk reaction was to try to convert it to event-loop concurrency
because although we got stack traces we may not be able to completely
figure out how the system ended up in that deadlock :-)
We've gradually continued to move to event-loop concurrency more and
more in many areas of our code because the knowledge that this code will
be deadlock-free allows us to concentrate on solving the problem at hand
instead of figuring out the most unlikely occurrences that can cause
deadlock - I suspect that I'll be faster rewriting the code from today
as event-loops than figuring out what caused and how to avoid that deadlock.
And that is in my understanding the most important part - how many hours
have you spent thinking about how exactly a highly concurrent system
could possibly deadlock? What if you could spend this time on improving
the system instead, knowing that deadlock *simply cannot happen*?
Cheers,

Andreas