Hi,
This link will be of interest to those wanting easy concurrency. It works best for simple cases and can be used as a primitive in building up more complex concurrency controls. However, use it with caution, especially when larger amounts of code that could conflict with other threads are involved. This applies even in the single-native-thread, multiple lightweight "green" Smalltalk processes environment of most Smalltalks.
http://onsmalltalk.com/programming/smalltalk/smalltalk-concurrency-playing-w...
Cheers,
Peter
That's a nice example. The difference with E's promises is that a Promise doesn't block for the value; it sends the new msg eventually and returns a second promise. The way I have this implemented in SqueakElib is: you send #eventual to get an eventual ref, then you can send a msg to the eventual ref, returning a promise, to which you can then send a second msg, returning a second promise.
    | promise1 promise2 |
    promise1 := anObject eventual foo.
    promise2 := promise1 bar.
    promise1 whenResolved: [:val | Transcript show: ' foo: ', val printString].
    promise2 whenResolved: [:val | Transcript show: ' bar: ', val printString].
    Transcript show: 'sent foo bar...'
This prevents deadlocks. It is also very interesting when looked at in a distributed system, using promise pipelining. Let us say that anObject resides in Vat B on a different machine. Instead of a two-round-trip communications situation, we have one round trip. So instead of:

1) send #foo, 2) receive fooResult, 3) send #bar, 4) receive barResult

we have:

1) send #foo, 2) send #bar to fooResultPromise, 3) receive barResult.

Here #foo and #bar are both sent to the objects in Vat B, and they are sent in the same packet.
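For contrast, here is the blocking style that costs the two round trips (a sketch assuming an ordinary synchronous remote ref rather than SqueakElib's eventual refs):

    | fooResult barResult |
    fooResult := anObject foo.     "round trip 1: blocks until Vat B answers"
    barResult := fooResult bar.    "round trip 2: blocks again"
    Transcript show: ' bar: ', barResult printString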
Rob
Hi Rob,
That sounds quite interesting - chains of promises until the results start flowing. It should be easy to adapt the http://onsmalltalk.com/programming/smalltalk/smalltalk-concurrency-playing-w... example implementation to do the same kind of deferral.
Of all the classes of possible deadlocks, which do E's promises help resolve? (Please don't say "all" unless you can back up such a general statement with a proof. Thanks.)
All the best,
Peter
----- Original Message ----- From: "Peter William Lount" peter@smalltalk.org
Hi Rob,
That sounds quite interesting - chains of promises until the results start flowing. It should be easy to adapt the http://onsmalltalk.com/programming/smalltalk/smalltalk-concurrency-playing-w... example implementation to do the same kind of deferral.
It is easy. The Promise just needs to hold onto pending msgs, which are sent to the result when the promise resolves. The hard part is getting the Squeak environment to behave in the presence of promises. To that end I added the protocol #immediate, which, you guessed it, waits on a Semaphore. I'll be removing that in Phase 3.
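A minimal sketch of that mechanism (the ivars isResolved, value, and pendingMsgs and the selector #resolve: are hypothetical, not the actual SqueakElib code; assume initialize sets isResolved to false and pendingMsgs to an empty OrderedCollection, and note a real implementation would deliver via the vat's event queue rather than synchronously):

    Promise>>doesNotUnderstand: aMessage
        "Hold the msg until resolution; answer a promise for its result."
        | next |
        next := Promise new.
        isResolved
            ifTrue: [next resolve: (aMessage sendTo: value)]
            ifFalse: [pendingMsgs add: aMessage -> next].
        ^next

    Promise>>resolve: anObject
        "Deliver all held msgs to the resolved value, in order."
        value := anObject.
        isResolved := true.
        pendingMsgs do: [:assoc | assoc value resolve: (assoc key sendTo: value)].
        pendingMsgs := OrderedCollection new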
Of all the classes of possible deadlocks, which do E's promises help resolve? (Please don't say "all" unless you can back up such a general statement with a proof. Thanks.)
What are all the classes of possible deadlocks? (Please provide simple Squeak examples demonstrating each class of deadlock.) I'll simply refer you to http://www.erights.org/elib/concurrency/event-loop.html - see towards the bottom.
Cheers, Rob
What are all the classes of possible deadlocks?
The first one I thought of was the Dining Philosophers problem. Google showed me http://www.erights.org/e/satan/index.html
This is an interesting article. It uses capabilities to ensure freedom from deadlock. If a philosopher holds a fork for too long, the system will revoke his capability for the fork.
If E was automatically free from deadlock, why was this solution so complex?
-Ralph
Ralph Johnson wrote:
If E was automatically free from deadlock, why was this solution so complex?
Because it's trying to illustrate three issues at once (security, capabilities, concurrency) and consequently fails to illustrate any of them very well :-(
It is trivial to solve this problem via event-loop concurrency though. Consider a class Table with operations "grabLeftFork:", "grabRightFork:" (which take a seat index as argument and return the fork or nil). You can now implement the dining philosophers "Croquet style" as follows:
    Philosopher>>eat
        "Start eating"
        | leftForkPromise rightForkPromise |
        "Try to acquire the left fork first"
        leftForkPromise := table future grabLeftFork: seat.
        leftForkPromise whenResolved: [:leftFork |
            "If we couldn't get the left fork, we need to decide what to do.
             Simply retry later, but use a random amount of time to avoid live-lock"
            leftFork ifNil: [^(self future: self randomMSecs) eat].
            rightForkPromise := table future grabRightFork: seat.
            rightForkPromise whenResolved: [:rightFork |
                rightFork
                    ifNil: [
                        "Same as before, but return the fork first"
                        table future returnLeftFork: seat.
                        ^(self future: self randomMSecs) eat "still hungry"]
                    ifNotNil: [
                        "We got both forks, eat"
                        state := #eating.
                        "And return the forks when time is up"
                        ^(self future: self timeToEat) returnFork: leftFork and: rightFork]]].
Note that there is no wait anywhere in the above - the only things that exist are #future sends that schedule messages to be executed at some point later. Because of this, the philosopher never really "waits" for anything; he will happily keep thinking (or grumbling ;-) until he obtains the forks. Even *while* he is eating, one could schedule an emergency request to "drop those forks" if that were necessary.
Cheers, - Andreas
On 28/10/2007, Andreas Raab andreas.raab@gmx.de wrote:
A small correction: the Dining Philosophers problem is about how to share limited resources (forks) among consumers (philosophers) effectively, to prevent starvation of any of them. And your solution is based on 'nice' behavior by the philosopher, who drops the first fork when he's unable to obtain the second one. Now imagine a greedy philosopher, who grabs the first fork and never hands it over until he obtains the second one and is done eating. This is a good illustration that you must not pass responsibility for resource management to the consumers - you must manage the resources yourself (if, of course, you want to prevent any kind of deadlock/resource starvation).
'Bad' behavior is much more probable, because most developers tend to write code in a simple imperative style: do this, and after it's done, do that. The understanding that their code can contain problems (deadlocks or whatever) comes to mind only after a problem shows itself. And considering that deadlocks are very hard to track down and reproduce, they may never know what caused it :)
Igor Stasenko wrote:
A small correction: the Dining Philosophers problem is about how to share limited resources (forks) among consumers (philosophers) effectively, to prevent starvation of any of them. And your solution is based on 'nice' behavior by the philosopher, who drops the first fork when he's unable to obtain the second one.
Right. I was trying to keep things simple for the benefit of illustrating the concurrency aspects only. If you look at the full version in E, it addresses that and probably half a dozen problems that none of us has ever thought about (these guys really are amongst the smartest people I've ever met).
'Bad' behavior is much more probable, because most developers tend to write code in a simple imperative style: do this, and after it's done, do that. The understanding that their code can contain problems (deadlocks or whatever) comes to mind only after a problem shows itself. And considering that deadlocks are very hard to track down and reproduce, they may never know what caused it :)
Indeed. However, this is where E really helps you. Everything you can express in E is by definition deadlock-free (in the classic sense), so you stop worrying about these things and focus more on the solution to the problem at hand. Not so different from manual memory management, where before the advent of GC you always had that nagging feeling in the back of your head saying "and who's gonna know when and where to free that reference?". The point is that, just as memory issues become tractable under GC by not causing a segfault, concurrency issues become tractable by not having your system come to a screeching halt every time something goes wrong.
Cheers, - Andreas
On 29/10/2007, Andreas Raab andreas.raab@gmx.de wrote:
Indeed. However, this is where E really helps you. Everything you can express in E is by definition deadlock-free (in the classic sense), so you stop worrying about these things and focus more on the solution to the problem at hand. Not so different from manual memory management, where before the advent of GC you always had that nagging feeling in the back of your head saying "and who's gonna know when and where to free that reference?". The point is that, just as memory issues become tractable under GC by not causing a segfault, concurrency issues become tractable by not having your system come to a screeching halt every time something goes wrong.
I'm not really sure that the GC example is a good parallel for 'automatic deadlock-freedom'. What prevents me (or anyone) from writing a deadlock like the following?

    [self grabFork] whileFalse: ["do nothing"]

There is no language (except ones without loops/branches) which prevents you from writing this. And the example above is 'busy waiting', which is even worse than waiting on semaphores, because it wastes CPU resources evaluating the same expression until some state is changed by an external entity. So, unless a developer writes better code, we will have the same issues. As for GC - you have automatic memory management instead of manual. But there's no automatic algorithm management, and never will be, in any language :)
Igor Stasenko wrote:
I'm not really sure that the GC example is a good parallel for 'automatic deadlock-freedom'. What prevents me (or anyone) from writing a deadlock like the following?

    [self grabFork] whileFalse: ["do nothing"]
I know this can be hard to grok, but the above is sort of the equivalent of asking "how do I free an object in Smalltalk". It doesn't make sense in the way you are asking the question, because in the above you assume that #grabFork is a remote message returning a value. In E (and Croquet) remote message invocations *do not return values*; they return promises (futures) which get resolved only *after* the current message is completed. In other words, writing a loop like this:

    p := self future doSomething.
    [p isResolved] whileFalse.

will *never* get past that line. Not once, not ever. So in order to do what you are trying to do, you need to write it like this:

    p := self future doSomething.
    p whenComplete: [:value | ...].

This allows the message invoking the above to complete, and once the promise is resolved, the associated resolver block is executed as a new message. BTW, the above *does* allow you to write a recursion similar to what you were trying to do, e.g.:

    getMyFork
        table future grabFork whenComplete: [:haveIt |
            haveIt ifFalse: [self getMyFork]. "retry"].

but it is still deadlock-free, since there is no wait involved (neither "classic" nor "busy-wait"). In fact, we use recursions like the above in various places in Croquet.
There is no language (except ones without loops/branches) which prevents you from writing this. And the example above is 'busy waiting', which is even worse than waiting on semaphores, because it wastes CPU resources evaluating the same expression until some state is changed by an external entity.
You can *write* it, but it makes about as much sense as writing "Object new free". The promise will not resolve, not ever, while you are "busy-waiting" (just as the object will not get freed when you send the free message or assign nil to a slot), so you'll very quickly abstain from that practice ;-)
So, unless a developer writes better code, we will have the same issues.
Absolutely not. See above. There is no deadlock in it.
As for GC - you have automatic memory management instead of manual. But there's no automatic algorithm management, and never will be, in any language :)
And what's that supposed to mean?
Cheers, - Andreas
On 29/10/2007, Andreas Raab andreas.raab@gmx.de wrote:
but it is still deadlock-free, since there is no wait involved (neither "classic" nor "busy-wait"). In fact, we use recursions like the above in various places in Croquet.
See, unless you make all message sends in the language futures, you can't guarantee that some code will not end up with locking semantics. This is exactly what GC does - it revokes all manual memory-management control from the developer. But can we do the same with all message sends?
Let's see what happens if we have only future sends. Then, given the code:

    a print. b print.

this will not guarantee that a will be printed before b. Now you must ensure that you preserve imperative semantics, which may be done like the following:

    futureA := a print.
    futureA whenComplete: [b print].
Yes, we could make the 'futureA whenComplete:' check implicit (by modifying the VM); then we could preserve old code. But do we really need futures everywhere?

As I personally see it, an imperative coding style stands on the opposite side from future message sends. If we use the first, then it's difficult to effectively support the second, and vice versa.

Either we give up the imperative style and use something different which fits better with futures, or we give up futures. Or we use both of them by mixing (which I think is most appropriate). But then stating that such a system can be really lock-free is wrong, because it depends on the decisions of the concrete developer and his code.
As for GC - you have automatic memory management instead of manual. But there's no automatic algorithm management, and never will be, in any language :)
And what's that supposed to mean?
My point was that futures as an 'automatic lock-free' approach are not quite parallel to 'automatic memory management by GC'.
----- Original Message ----- From: "Igor Stasenko" siguctua@gmail.com
Let's see what happens if we have only future sends. Then, given the code:

    a print. b print.
this will not guarantee that a will be printed before b.
In SqueakElib, it will guarantee order if a and b are in the same Vat and already resolved (not Promises/Futures). If a is a promise and b is resolved, then they will print out of order. Likewise, if a is remote and b is local, they will print out of order. Msgs to objects in the same Vat are scheduled in order.
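An illustration of the out-of-order case, reusing the printing example (a sketch assuming SqueakElib's #eventual protocol from earlier in the thread, and that a msg sent to a promise is itself delivered eventually):

    | a b |
    a := anObject eventual foo.    "a is a promise; msgs to it run only on resolution"
    b := 42.                       "b is local and already resolved"
    a print.                       "queued until the promise resolves"
    b print.                       "runs immediately, so b prints first"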
Yes, we could make the 'futureA whenComplete:' check implicit (by modifying the VM); then we could preserve old code. But do we really need futures everywhere?
This is what I am trying to do with SqueakElib. Any old object referenced in the system is an eventual local ref, but the system should handle promises or non-local refs anywhere.
Or we use both of them by mixing (which I think is most appropriate). But then stating that such a system can be really lock-free is wrong, because it depends on the decisions of the concrete developer and his code.
We may be able to rewrite Semaphores in terms of promises, and modify Processes to run on a particular Vat as events, thus removing possible deadlocks from old code.
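A sketch of what such a promise-based Semaphore might look like (hypothetical: excessSignals is borrowed from the real Semaphore, while the waiting collection, Promise, and #resolve: are assumptions):

    Semaphore>>waitPromise
        "Answer a promise that resolves once the semaphore is acquired,
         instead of blocking the calling process."
        | p |
        p := Promise new.
        excessSignals > 0
            ifTrue: [excessSignals := excessSignals - 1. p resolve: self]
            ifFalse: [waiting addLast: p].
        ^p

    Semaphore>>signal
        "Resolve the oldest waiter's promise, or bank the signal."
        waiting isEmpty
            ifTrue: [excessSignals := excessSignals + 1]
            ifFalse: [waiting removeFirst resolve: self]

A caller would then write (sema waitPromise) whenResolved: [:s | "critical section"] rather than sema wait.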
Rob
On 29/10/2007, Rob Withers reefedjib@yahoo.com wrote:
Yes, we could make the 'futureA whenComplete:' check implicit (by modifying the VM); then we could preserve old code. But do we really need futures everywhere?
This is what I am trying to do with SqueakElib. Any old object referenced in the system is an eventual local ref, but the system should handle promises or non-local refs anywhere.
Or we use both of them by mixing (which I think is most appropriate). But then stating that such a system can be really lock-free is wrong, because it depends on the decisions of the concrete developer and his code.
We may be able to rewrite Semaphores in terms of promises, and modify Processes to run on a particular Vat as events, thus removing possible deadlocks from old code.
Yes we do, but what prevents others from implementing their own locking semantics based on direct message sends (not futures)?
----- Original Message ----- From: "Igor Stasenko" siguctua@gmail.com To: "The general-purpose Squeak developers list" squeak-dev@lists.squeakfoundation.org Sent: Monday, October 29, 2007 2:16 PM Subject: Re: Concurrent Futures
Yes we do, but what prevents others from implementing their own locking semantics based on direct message sends (not futures)?
Remove #primitiveWait from the VM.
Rob
On Oct 29, 2007, at 2:16 PM, Igor Stasenko wrote:
Yes we do, but what prevents others from implementing their own locking semantics based on direct message sends (not futures)?
What prevents me from using FFI to allocate memory that isn't managed by the garbage collector? Nothing, of course. But if I create a memory leak in this way, it's silly to blame the garbage collector.
I think the analogy is clear, but I'll be explicit... if you circumvent a future-based concurrency mechanism by implementing locking mechanisms, then it is silly to blame the futures for the deadlock that you've created.
Cheers, Josh
On 10/29/07, Rob Withers reefedjib@yahoo.com wrote:
This is what I am trying to do with SqueakElib. Any old object referenced in the system is an eventual local ref, but the system should handle promises or non-local refs anywhere.
Are you going to make the "vats" Smalltalk processes? Are you going to isolate them from other Smalltalk processes in the same image? If so then I will probably have some synergy with that.
----- Original Message ----- From: "Jason Johnson" jason.johnson.081@gmail.com To: "The general-purpose Squeak developers list" squeak-dev@lists.squeakfoundation.org Sent: Tuesday, October 30, 2007 1:31 PM Subject: Re: Concurrent Futures
Are you going to make the "vats" Smalltalk processes?
They already are.
Are you going to isolate them from other Smalltalk processes in the same image? If so then I will probably have some synergy with that.
I am trying to ensure that objects in one Vat don't get directly manipulated by objects in other Processes.
Rob
On 10/31/07, Rob Withers reefedjib@yahoo.com wrote:
I am trying to ensure that objects in one Vat don't get directly manipulated by objects in other Processes.
I don't see this part as too difficult, since you can't modify what you can't get a reference to. But there is the issue of mutable globals. The most obvious example here is class-side variables: if multiple vats can see the same class, and that class has class-side variables that it mutates, then that's an issue.
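A hypothetical example of the hazard (Counter and its class variable are made up for illustration):

    Object subclass: #Counter
        instanceVariableNames: ''
        classVariableNames: 'Count'
        poolDictionaries: ''
        category: 'Concurrency-Example'

    Counter class>>increment
        "Unsynchronized read-modify-write on class-side state.
         Two vats interleaving here can lose an increment."
        Count := (Count ifNil: [0]) + 1.
        ^Count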
On 31/10/2007, Jason Johnson jason.johnson.081@gmail.com wrote:
I don't see this part as too difficult, since you can't modify what you can't get a reference to. But there is the issue of mutable globals. The most obvious example here is class-side variables: if multiple vats can see the same class, and that class has class-side variables that it mutates, then that's an issue.
This is not the only case. It's just a special case of two or more processing units having access to the same mutable object. There are other problems related to the reflective nature of Smalltalk. A killer example is passing a reference to a context (thisContext) or a stack to another vat/thread/processing unit.
----- Original Message ----- From: "Igor Stasenko" siguctua@gmail.com To: "The general-purpose Squeak developers list" squeak-dev@lists.squeakfoundation.org Sent: Wednesday, October 31, 2007 6:30 AM Subject: Re: Concurrent Futures
This is not the only case. It's just a special case of two or more processing units having access to the same mutable object. There are other problems related to the reflective nature of Smalltalk. A killer example is passing a reference to a context (thisContext) or a stack to another vat/thread/processing unit.
In both of these cases, the objects are owned by a particular Vat. Any messages to them from another Vat would be eventual. If they were passed to a different Vat (1->2), they would be VatRefs from 2 into 1. Anyway, this is the plan; it isn't working at this time.
A concern of mine is how well the development tools would work in a different Vat. So let's say we have the class browser open and we try to define a new class. The superclass is eventual, since it is owned by a different Vat. Where does the new class get created, in the current Vat or in the superclass's Vat? There are definitely issues here.
Igor Stasenko wrote:
but it is still deadlock-free, since there is no wait involved (neither "classic" nor "busy-wait"). In fact, we use recursions like the above in various places in Croquet.
See, unless you make all message sends in the language futures, you can't guarantee that some code will not end up with locking semantics.
Not "all messages sends". Only messages between concurrent entities (islands). This is the main difference to the all-out actors model (where each object is its own unit of concurrency) and has the advantage that you can reuse all of todays single-threaded code.
Let's see what happens if we have only future sends. Then, given the code:

    a print. b print.

this will not guarantee that a will be printed before b.
Actually it will, if and only if a and b are in the same unit of concurrency (island). Your example is a bit misleading because it has different receivers - if those were in different islands then indeed there would be no guarantee that a prints before b. So for simplicity let's change this to:
    Transcript future print: a.
    Transcript future print: b.
Do we need a whenResolved: block to serialize execution? No, we don't, because messages between two islands are executed in the order in which they were scheduled. Everything else would be a straight ticket to insanity ;-)
Yes, we could make the 'futureA whenComplete:' check implicit (by modifying the VM); then we could preserve old code. But do we really need futures everywhere?
No, we don't. See above.
Either we give up the imperative style and use something different which fits better with futures, or we give up futures.
The nice thing about futures is that they can be put on top of everything else. We use them in Croquet today.
Or we use both of them by mixing (which I think is most appropriate). But then stating that such a system can be really lock-free is wrong, because it depends on the decisions of the concrete developer and his code.
This may be the outcome for an interim period. The good thing here is that you can *prove* that your program is deadlock-free simply by not using waits. And ain't that a nice property to have.
As for GC - you have automatic memory management instead of manual. But there's no automatic algorithm management, and never will be, in any language :)
And what's that supposed to mean?
My point was that futures as an 'automatic lock-free' approach are not quite parallel to 'automatic memory management by GC'.
The similarity is striking, both in terms of tradeoffs (trade low-level control for better productivity) and in the style of arguments made against it ;-) Not that I mind, by the way; I find these discussions necessary.
Cheers, - Andreas
On Mon, 29/10/2007 at 13:34 -0800, Andreas Raab wrote:
Not "all messages sends". Only messages between concurrent entities (islands). This is the main difference to the all-out actors model (where each object is its own unit of concurrency) and has the advantage that you can reuse all of todays single-threaded code.
Just out of curiosity, how much work do you think would be necessary to port the Islands system to the standard Squeak image?
Giovanni
On 30/10/2007, Giovanni Corriga giovanni@corriga.net wrote:
Just out of curiosity, how much work do you think would be necessary to port the Islands system to the standard Squeak image?
This is what I mean. There is a BIG difference between concurrency (parallel execution with shared memory) and distributed computing. Islands fit distributed computing well, but do they fit concurrent parallel execution? I doubt it.
----- Original Message ----- From: "Igor Stasenko" siguctua@gmail.com To: "The general-purpose Squeak developers list" squeak-dev@lists.squeakfoundation.org Sent: Monday, October 29, 2007 3:52 PM Subject: Re: Concurrent Futures
This is what I mean. There is a BIG difference between concurrency (parallel execution with shared memory) and distributed computing. Islands fit distributed computing well, but do they fit concurrent parallel execution? I doubt it.
Igor, where would you place concurrency with disjoint memories?
Would assigning objects to specific processes within a shared memory, such that only those processes could mutate those objects and the processes were non-interruptible and non-waitable - would that be sufficient to make it disjoint? Imagine that every object reference (header) was assigned to a specific Vat, and only processes within that Vat could interact with that object. All msg sends to that object from other Vats would be eventually scheduled with that object's Vat.
Rob
On 30/10/2007, Rob Withers reefedjib@yahoo.com wrote:
Igor, where would you place concurrency with disjoint memories?
Would assigning objects to specific processes within a shared memory, such that only those processes could mutate those objects and the processes were non-interruptible and non-waitable - would that be sufficient to make it disjoint? Imagine that every object reference (header) was assigned to a specific Vat, and only processes within that Vat could interact with that object. All msg sends to that object from other Vats would be eventually scheduled with that object's Vat.
Simply because it doesn't scale well. Consider 1000 and 1 Vats (the 1001 tales come to mind :). 1000 Vats send a message to the same object, which is scheduled in a single Vat. So there will be a HUGE difference in time between when the first sender and the last sender receive an answer. Yes, for some messages that's the only answer, but for other messages, like simply asking for state (an accessor), there's no need for synchronization, because there's no change in state and no real need for scheduling.
The concept you give is really close to the one I have in mind, except for the example above, which it seems will always be a bottleneck.
----- Original Message ----- From: "Igor Stasenko" siguctua@gmail.com
Simply because it doesn't scale well. Consider 1000 and 1 Vats (the 1001 tales come to mind :). 1000 Vats send a message to the same object, which is scheduled in a single Vat. So there will be a HUGE difference in time between when the first sender and the last sender receive an answer. Yes, for some messages that's the only answer, but for other messages, like simply asking for state (an accessor), there's no need for synchronization, because there's no change in state and no real need for scheduling.
The concept you give is really close to the one I have in mind, except for the example above, which it seems will always be a bottleneck.
OK, so we have coded a bottleneck situation. It's not a surprise, since we have 1000 objects asking for its services. It's time to refactor and remove that bottleneck with some service duplication and load balancing. Using promises or futures doesn't remove bad code; it just limits the kinds of bad code you can have. This would also be a problem in a distributed application that was sending 1000 service requests to a single object.
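A sketch of the kind of service duplication and load balancing meant here (ForkService and #handleRequest: are made-up names, and one replica per Vat is assumed):

    | replicas |
    replicas := (1 to: 4) collect: [:i | ForkService new].   "one replica per Vat"
    1 to: 1000 do: [:i |
        "round-robin the requests instead of funneling them all into one Vat"
        (replicas atWrap: i) eventual handleRequest: i].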
Regarding accessing some state for a read: if two Vats or Islands are on the same processor with shared memory between them, and Vat A tries to access the state of an object owned by Vat B, then as long as no relocation is occurring we should be able to optimize access to this state with a direct memory read. Of course, in Smalltalk we access state owned by an object by sending that object a read-accessor msg, and I don't see how we could determine that a given msg send was there just for a memory read and do the optimization. It wouldn't surprise me if Exupery knew such things.
I'd like to hear some description of the concept you have in mind, if you have the time. What are you thinking about in this arena?
Cheers, Rob
On 30/10/2007, Rob Withers reefedjib@yahoo.com wrote:
I'd like to hear some description of the concept you have in mind, if you have the time. What are you thinking about in this arena?
Well, in my concept I don't have vats. Computing resources (native threads) are managed by the VM and are not visible at the language level. So VM parallelism is not language parallelism. You can't predict which native thread will serve a concrete message for a concrete object; thus the load is distributed evenly. This has its own limitations, mostly when you interact with external libraries through primitives - some libraries cannot work (or work differently) when you access them from different native threads. Some libraries are designed with no multithreading in mind, so using them concurrently may cause a system crash. But I don't think I'm alone with this problem. We need to find good ways to ensure that all calls to certain external libraries are invoked only from a single thread of execution, while for other libraries it's OK to call them from anywhere.
-- Best regards, Igor Stasenko AKA sig.
How do you protect against simultaneous accesses to memory for writes/reads?
I agree that thread-safing the primitives is a key task.
Cheers, Rob
On 30/10/2007, Rob Withers reefedjib@yahoo.com wrote:
How do you protect against simultaneous accesses to memory for writes/reads?
I have described this previously. The idea is simple. Since we have a limited set of write operations in object memory, we could design a system which prevents primitive operations (such as primAt:put:) from running concurrently on the same object. This means (as in your concept) that each object should have an additional slot identifying the context where it is in use as a receiver. The only write operations which can modify object memory within any method body are writes to the receiver's memory (like setting an ivar or indexed var). Other operations (like pushing on the stack) are safe, because we'll have a separate stack for each execution thread. The message evaluation sequence can then be done as follows:

- check the receiver's assigned context
- if it's nil, assign it to the current context and start executing the method
- if it's not nil, schedule the method execution to the thread which owns the given context

When performing a message send within a method, we deactivate the context, which means that we can set the assigned object's context to nil and set it again when we activate the context again. Or, if we don't, then all message sends to the given object will be scheduled on the same native thread. I don't know which is best - both approaches have their own pros and cons. Retaining a context can help us make all calls to some external library be invoked from a single thread.

Either way, we should ensure that at any point in time there is a single active context in the running system using a given receiver. Also, we can leave most primitives unmodified and be careless about concurrency in those primitives, because by design there can be only a single primitive running in the system for a particular object at any point in time.

As you can see, such an approach does not do much to solve concurrency issues on the language side. But that's where I think developers should choose their own way. We can provide everything they need (semaphores/promises/futures etc.) in libraries and provide some additional primitives for controlling the new aspects of the VM.
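A sketch of that dispatch rule as pseudocode (Interpreter, assignedContext, and the helper selectors are all hypothetical; a real VM would claim the slot with an atomic test-and-set rather than Smalltalk-level code):

    Interpreter>>send: aMessage to: receiver
        "First sender claims the receiver; later sends from other
         threads queue behind the owning thread."
        receiver assignedContext
            ifNil: [
                receiver assignedContext: self activeContext.
                self executeMethod: aMessage for: receiver]
            ifNotNil: [:owner |
                (self threadOwning: owner) schedule: aMessage for: receiver]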
----- Original Message ----- From: "Igor Stasenko" siguctua@gmail.com To: "The general-purpose Squeak developers list" squeak-dev@lists.squeakfoundation.org Sent: Monday, October 29, 2007 7:59 PM Subject: Re: Concurrent Futures
On 30/10/2007, Rob Withers reefedjib@yahoo.com wrote:
----- Original Message ----- From: "Igor Stasenko" siguctua@gmail.com To: "The general-purpose Squeak developers list" squeak-dev@lists.squeakfoundation.org Sent: Monday, October 29, 2007 5:59 PM Subject: Re: Concurrent Futures
On 30/10/2007, Rob Withers reefedjib@yahoo.com wrote:
I'd like to hear some description if the concept you have in mind, if you have the time. What are you thinking about in this arena?
Well, in my concept i don't have a vats. A computing resources(native threads) managed by VM and not visible at language level. So, a VM parallelism is not a language parallelism. You can't predict what native thread will serve concrete message for concrete object, thus a load is distributed evenly. This haves own limitations, but mostly when you interacting with external libraries through primitives - some of the libraries can not work (or can work differently) when you accessing them from different native threads. Some of the libraries designed with no multithreading in mind, so using them concurrently may cause a system crash. But i don't think that i'm alone with this problem. We need to find some good ways how to control that all calls to some external library must be invoked only from a single thread of execution, while for other libraries its ok to call them from anywhere.
How do you protect against simultaneous accesses to memory for writes/reads?
I have described this previously.
I must have missed it.
The idea is simple. Since we have a limited set of write operations in object memory, we could design a system which prevents primitive operations (such as primAt:put:) from running concurrently on the same object. This means (as in your concept) that each object should have an additional slot identifying the context where it is in use as a receiver. The only write operations which can modify object memory within any method body are writes to the receiver's memory (like setting an ivar or indexed var). Other operations (like pushing on the stack) are safe, because we'll have a separate stack for each execution thread. The message evaluation sequence can then be done as follows:

- check the receiver's assigned context
- if it's nil, assign it to the current context and start executing the method
- if it's not nil, schedule the method execution to the thread which owns the given context
It seems to me that you are building event-loops in the VM. Consider anObject that is sent a msg on thread 1, so it is assigned to thread 1 and you start processing it with thread 1. In the meantime, three msgs are sent to this object on threads 2, 3, and 4. Each message send is scheduled for processing by thread 1, when it becomes available. There is your event-loop.
The difference is that you allow objects to be reassigned based on thread availability. Since we have shared memory, that is as simple as setting the new context. This is similar to ideas I had for redistributing objects to different Vats based upon msg-send density (which could be monitored) or user specification.
When performing a message send within a method, we deactivate the context, which means that we can set the assigned object's context to nil and set it again when we activate the context again. Or, if we don't, then all message sends to the given object will be scheduled on the same native thread. I don't know which is best - both approaches have their own pros and cons. Retaining a context can help us make all calls to some external library be invoked from a single thread.
Indeed. That would be a Vat.
Either way, we should ensure that at any point in time there is a single active context in the running system using a given receiver. Also, we can leave most primitives unmodified and be careless about concurrency in those primitives, because by design there can be only a single primitive running in the system for a particular object at any point in time.
But the primitive may have internal state that can't be shared.
As you can see, such an approach does not do much to solve concurrency issues on the language side. But that's where I think developers should choose their own way. We can provide everything they need (semaphores/promises/futures etc.) in libraries and provide some additional primitives for controlling the new aspects of the VM.
I could certainly build the language features I am interested in on top of such a VM.
cheers, Rob
On 30/10/2007, Rob Withers reefedjib@yahoo.com wrote:
The idea is simple. Since we have a limited set of write operations in object memory, we could design a system which prevents primitive operations (such as the at:put: primitive) from running concurrently on the same object. This means (as in your concept) that each object should have an additional slot identifying the context where it is used as receiver. The only write operations which can modify object memory within any method body are writes to receiver memory (like setting an ivar or indexed var). Other operations (like pushing on the stack) are safe, because we'll have a separate stack for each execution thread. The message evaluation sequence can then be done as follows:
- check the receiver's assigned context
- if it's nil, then assign it to the current one and start executing the method
- if it's not nil, then schedule the method execution to the thread which owns the given context.
It seems to me that you are building event-loops in the VM. Consider anObject that is sent a msg on thread 1, so it is assigned to thread 1 and you start processing it with thread 1. In the meantime, 3 msgs are sent to this object on threads 2, 3, and 4. Each message send is scheduled for processing by thread 1, when it becomes available. There is your event-loop.
Yes, exactly. I'm still unsure if it is really needed for all objects in the system. For instance, true/false/nil are just singletons without any internal state, so scheduling them onto a single thread of execution can simply be omitted.
The difference is that you allow objects to be reassigned based on thread availability. Since we have shared memory, that is as simple as setting the new context. This is similar to ideas I had for redistributing objects to different Vats based upon msg send density, which could be monitored, or user specification.
The other thing which bothers me is reflection. Should we expose properties of native threads on the language side (for using/showing them in the debugger)? And how well will the new contexts deal with continuations, which are used in Seaside? It's hard to give a brief answer to such things.
When performing a message send in a method we deactivate the context, which means that we can set the assigned object's context to nil and set it again when the context is activated again. Or, if we don't, then all message sends to a given object will be scheduled to the same native thread. I don't know which is best - both approaches have their own pros and cons. Retaining a context can help us make all calls to some external library be invoked from a single thread.
Indeed. That would be a Vat.
Either way we should ensure that, at any point in time, there is only a single active context in the running system using the same receiver. Also, we can leave most primitives unmodified and not worry about concurrency in these primitives, because by design there can be only a single primitive running in the system for a particular object at any point in time.
But the primitive may have internal state that can't be shared.
I know, but I'm talking about the basic primitives which are used most frequently and operate only on object memory. Such primitives usually don't have any internal state, or use only interpreter state.
All other primitives which have internal state should be reviewed. In the first stages we could schedule all these 'old' primitives onto a single 'main' thread, so they will work as in the current VM, not knowing that the VM is actually multithreaded.
As you can see, such an approach does not do much to solve concurrency issues on the language side. But that's where I think developers should choose their own way. We can provide everything they need (semaphores/promises/futures etc.) in libraries and provide some additional primitives for controlling the new aspects of the VM.
I could certainly build the language features I am interested in on top of such a VM.
I'd like to hear more criticism of such a model :) If it proves to be viable and more or less easily doable (compared to other models) then I could start working on it :)
Igor Stasenko wrote:
I'd like to hear more criticism of such a model :) If it proves to be viable and more or less easily doable (compared to other models) then I could start working on it :)
Hi Igor, I like your approach. The main problem I see is that a lot of methods in the current image are not multi-process safe! Imagine one SortedCollection: one Process iterates over it, another adds to it. Even now, with a single-threaded image, you have to be careful! (see http://bugs.squeak.org/view.php?id=6030).
Exactly the OrderedCollection counter-argument you served to Andreas! Except Andreas knows he has to carefully choose shared state, and also has explicit futures and promises which he uses sparingly and can identify easily.
Your users will need atomic operations. Thus you have to introduce another state attached to the receiver object telling that it can be read concurrently but not written. (In fact, you have 3 states: read-locked <-> free <-> write-locked.)
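To make "atomic operations" concrete with what the image offers today - a minimal sketch using a plain mutex (a real solution would want the 3-state read/write lock above, so that readers don't exclude each other):

| guard sorted |
guard := Semaphore forMutualExclusion.
sorted := SortedCollection new.
"writer process"
[1 to: 100 do: [:i | guard critical: [sorted add: i]]] fork.
"reader process - must take the same lock, or #do: can break mid-iteration"
[guard critical: [sorted do: [:each | Transcript show: each printString; cr]]] fork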
From a pragmatic POV, prepare to put atomic blocks everywhere before having something usable! (Maybe an <#atomic> pragma to be lighter.) You cannot simply state "that's a programmer problem, I provide the framework". Bugs might occur from time to time - very hard to debug ones! And your framework won't help much.
Besides, your users will have to deal with deadlock cases too... We'd better start thinking about automatic deadlock detection...
Nicolas
On 30/10/2007, nicolas cellier ncellier@ifrance.com wrote:
Igor Stasenko wrote:
I'd like to hear more criticism of such a model :) If it proves to be viable and more or less easily doable (compared to other models) then I could start working on it :)
Hi Igor, I like your approach. The main problem I see is that a lot of methods in the current image are not multi-process safe! Imagine one SortedCollection: one Process iterates over it, another adds to it. Even now, with a single-threaded image, you have to be careful! (see http://bugs.squeak.org/view.php?id=6030).
Exactly the OrderedCollection counter-argument you served to Andreas! Except Andreas knows he has to carefully choose shared state, and also has explicit futures and promises which he uses sparingly and can identify easily.
Your users will need atomic operations. Thus you have to introduce another state attached to the receiver object telling that it can be read concurrently but not written. (In fact, you have 3 states: read-locked <-> free <-> write-locked.)
From a pragmatic POV, prepare to put atomic blocks everywhere before having something usable! (Maybe an <#atomic> pragma to be lighter.) You cannot simply state "that's a programmer problem, I provide the framework". Bugs might occur from time to time - very hard to debug ones! And your framework won't help much.
Besides, your users will have to deal with deadlock cases too... We'd better start thinking about automatic deadlock detection...
Hi Nicolas. I stated previously that my approach does not deal with multithreading problems on the language side. There is no way to deal with them at a low level (such as the VM). And if you read the previous discussions, it was proven that there can't be a single generic solution for all the problems which arise when we go into the parallel world. Some solutions work best for some problems, but can be too ineffective for others.
That's why I proposed not to bind VM parallelism to language parallelism. We can't have multiple ways of doing concurrency in a single VM implementation, simply because the complexity of such a system would be paramount. So we must choose a single solution (be it good or bad :) ). In the same way as you can currently have multiple ST processes running in parallel, you will be able to in a future VM. The rest is in your own hands. You are free to use mutexes/futures or anything you like to deal with concurrency. A new VM simply should utilize CPU power better, so that as you get more and more cores from year to year, your code runs faster and faster. Of course this happens only if you are using algorithms which can divide your task into a number of parallel sub-tasks.
Nicolas
On 10/30/07, Igor Stasenko siguctua@gmail.com wrote:
And if you read the previous discussions, it was proven that there can't be a single generic solution for all the problems which arise when we go into the parallel world.
Uh, no such thing was *proven* in any of these discussions. It was strongly suggested, and I even personally conceded that it may be the case. But that is by no means *proof*.
On 10/30/07, Igor Stasenko siguctua@gmail.com wrote:
Simply because it does not scale well. Consider 1000 and 1 Vats (1000 and 1 tales come to mind :). 1000 Vats send a message to the same object, which is scheduled in a single Vat. So there will be a HUGE difference in time between when the first sender and the last sender receive an answer.
Are you somehow under the impression that a shared state solution would actually scale *better*? Think about that. In the E solution, those 1000 vats basically post an event to that same object in the single vat... then they go on about their business.
Shared state on the other hand... if there is *any* code *anywhere* in the image that can modify this object, then *all* access to the object has to be synchronized [1]. This means that while the E code is chugging away doing all kinds of things, your synchronized native-thread code has 1000 processes all sitting in a queue waiting their turn to get the lock. Of course you could spawn a new thread for every blocking access so you don't have to wait, but then you'll just be where E already is, with much higher resource usage.
[1] The exception here would be if the object is so simple that you can *prove* that any writes to it are atomic. But you had better put a huge flashing comment on it saying that if anyone adds *anything* they will have to add synchronization in a *lot* of places, and will possibly fundamentally alter the run-time performance of the program.
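To make the contrast concrete in Squeak terms - one event-loop process draining a SharedQueue while any number of senders just post and continue (a sketch of the idea, not E's or Croquet's actual machinery):

| inbox |
inbox := SharedQueue new.
"the vat's single event loop"
[[inbox next value] repeat] fork.
"1000 'vats' posting events - none of them ever waits on a lock"
1 to: 1000 do: [:i |
	[inbox nextPut: [Transcript show: i printString; cr]] fork]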
On 29/10/2007, Andreas Raab andreas.raab@gmx.de wrote:
Igor Stasenko wrote:
but it is still deadlock-free since there is no wait involved (neither "classic" nor "busy-wait"). In fact, we use recursions like the above in various places in Croquet.
See, unless you make all message sends in the language futures, you can't guarantee that some code will not end up with locking semantics.
Not "all messages sends". Only messages between concurrent entities (islands). This is the main difference to the all-out actors model (where each object is its own unit of concurrency) and has the advantage that you can reuse all of todays single-threaded code.
How would you define the boundaries of these entities in the same image? Could you illustrate with some simple examples, or a strategy for using them for concurrent execution within a single VM? I'm very interested in the practical usage of futures myself. What will you do, or how would you avoid the situation where two different islands containing a reference to the same object in the VM send direct messages to it, causing a race condition?
The difference between distributed computing and parallel (or concurrent) computing with shared memory is that with shared memory, any change of state of any object is automatically visible across the entire image, while in a distributed system it is not.
Let's see what happens if we have only future sends. Then, given the code: a print. b print.
there is no guarantee that a will be printed before b.
Actually it will, if and only if a and b are in the same unit of concurrency (island). Your example is a bit misleading because of having different receivers - if those were in different islands then indeed there will be no guarantee that a prints before b. So for simplicity let's change this to:
Transcript future print: a. Transcript future print: b.
Do we need a whenResolved: block to serialize execution? No, we don't, because messages between two islands are executed in the order in which they were scheduled. Everything else would be a straight ticket to insanity ;-)
Yes. But this example is a significant one. Sometimes I want these messages to run in parallel, sometimes I don't. Even for a single 'island'. Then, for a general solution, we need these islands to be either very small (the smallest being a single object) or to contain a big number of objects. The question is how to give control of their sizes to the developer. How can a developer define the boundaries of an island within a single image? I will not accept any solution like 'multiple images' because this drives us into the distributed computing domain, which is _NOT_ concurrent computing anymore, simply because it's not using shared memory - in fact there is no sharing at all, only a glimpse of it.
Yes, we could make the 'futureA whenComplete:' check implicit (by modifying the VM); then we could preserve old code. But do we really need futures everywhere?
No, we don't. See above.
Either we give up the imperative style and use something different which fits better with futures, or we give up on futures.
The nice thing about futures is that they can be put on top of everything else. We use them in Croquet today.
Or we use both of them mixed... (which I think is most appropriate). But then stating that such a system can be really lock-free is wrong, because it depends on the decisions of the concrete developer and his code.
This may be the outcome for an interim period. The good thing here is that you can *prove* that your program is deadlock-free simply by not using waits. And ain't that a nice property to have.
You mean waits like this (consider the following two snippets of code running in parallel):
[ a isUnlocked ] whileFalse: [ ]. b unlock.
and
[ b isUnlocked] whileFalse: []. a unlock.
You can remove waits, but you can't remove bad usage patterns from the brain of the developer :) And how could you guarantee that any bit of code in the current ST image does not contain such hidden locks - like loops or recursive loops which will never return until some external entity changes the state of some object(s)?
As for GC - you have automatic memory management instead of manual. But there's no automatic algorithm management, and never will be, given any language :)
And what's that supposed to mean?
I pointed out that futures as an 'automatic lock-free' approach are not quite parallel to 'automatic memory management by GC'.
The similarity is striking. Both in terms of tradeoffs (trade low-level control for better productivity) as well as the style of arguments made against it ;-) Not that I mind by the way, I find these discussions necessary.
The striking thing is that introducing GC does good things - it removes the necessity to care about memory, which helps a lot in development and makes code clearer and smaller. But I can't see how futures do the same. There are still a lot of things for the developer to consider even when using futures. Of course, I don't have practice in using futures and can't tell the difference, but from what I see, it's not that easy to use compared to automatic memory management. Yes, it's easy to pick a random piece of code and blindly type the word 'future'. But I don't think that every such piece will work as before ;)
Cheers,
- Andreas
Igor Stasenko wrote:
How would you define the boundaries of these entities in the same image?
It is defined implicitly by the island in which a message executes. All objects created by the execution of a message are part of the island the computation occurs in.
To create an object in another island you need to artificially "move the computation" there. That's why islands implement the #new: message, so that you can create an object in another island by moving the computation, for example:
space := island future new: TSpace.
This will create an instance of TSpace in the target island. Once we have created the "root object" further messages that create objects will be inside that island, too. For example, take this method:
TSpace>>makeNewCube
	"Create a new cube in this space"
	cube := TCube new.
	self addChild: cube.
	^cube
and then:
cube := space future makeNewCube.
Both, cube and space will be in the same island.
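And if the sending island later needs the cube itself, it stays eventual, e.g. (a sketch assuming the returned promise understands #whenResolved:, as that selector is used elsewhere in this thread):

cube := space future makeNewCube.
cube whenResolved: [:c | Transcript show: 'cube created: ', c printString]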
Could you illustrate with some simple examples, or a strategy for using them for concurrent execution within a single VM?
I'm confused about your use of the term "concurrent". Earlier you wrote "There is a BIG difference between concurrency (parallel execution with shared memory) and distributed computing." which seems to imply that you discount all means of concurrency that do not use shared memory. If that is really what you mean (which is clearly different from the usual meaning of the term concurrent) then indeed, there is no way for it to be "concurrent" because there simply is no shared mutable state between islands.
I'm very interested in the practical usage of futures myself. What will you do, or how would you avoid the situation where two different islands containing a reference to the same object in the VM send direct messages to it, causing a race condition?
The implementation of future message sending uses locks and mutexes. You might say "aha! so it *is* using locks and mutexes" but just as with automatic garbage collection (which uses pointers and pointer arithmetic and explicit freeing) it is simply a means to implement the higher-level semantics. And since no mutual/nested locks are required, deadlock-freeness can again be proven.
Yes. But this example is a significant one. Sometimes I want these messages to run in parallel, sometimes I don't. Even for a single 'island'.
In the island model, this is not an option. The unit of concurrency is an island, period. If you want to run computations in parallel that share data, you either make the data immutable (which can enable sharing in some limited cases) or you copy the needed data to "worker islands". Basic load balancing.
Then, for a general solution, we need these islands to be either very small (the smallest being a single object) or to contain a big number of objects. The question is how to give control of their sizes to the developer. How can a developer define the boundaries of an island within a single image?
By sending messages. See above.
I will not accept any solution like 'multiple images' because this drives us into the distributed computing domain, which is _NOT_ concurrent computing anymore, simply because it's not using shared memory - in fact there is no sharing at all, only a glimpse of it.
Again, you have a strange definition of the term concurrency. It does not (neither in general English nor in CS) require use of shared memory. There are two main classes of concurrent systems, namely those relying on (mutable) shared memory and those relying on message passing (sometimes utilizing immutable shared memory for optimization purposes because it's indistinguishable from copying). Erlang and E (and Croquet as long as you use it "correctly") all fall into the latter category.
This may be the outcome for an interim period. The good thing here is that you can *prove* that your program is deadlock-free simply by not using waits. And ain't that a nice property to have.
You mean waits like this (consider the following two snippets of code running in parallel):
[ a isUnlocked ] whileFalse: [ ]. b unlock.
and
[ b isUnlocked] whileFalse: []. a unlock.
Just like in your previous example, this code is meaningless in Croquet. You are assuming that a and b can be sent synchronous messages and that they resolve while being in the busy-loop. As I have pointed out earlier, this simply doesn't happen. Think of it that way: results are themselves communicated using future messages, e.g.,
Island>>invokeMessage: aMessage
	"Invoke the message and post the result back to the sender island"
	result := aMessage value. "compute result of the message"
	aMessage promise future value: result. "resolve associated promise"
so you cannot possibly wait for the response to a message you just scheduled. It is simply not possible, neither actively nor passively.
And how could you guarantee that any bit of code in the current ST image does not contain such hidden locks - like loops or recursive loops which will never return until some external entity changes the state of some object(s)?
No more than I can or have to guarantee that any particular bit of the Squeak library is free of infinite loops. All we need to guarantee is that we don't introduce new dependencies, which thanks to future messages and promises we can guarantee. So if the subsystem is deadlock free before it will stay so in our usage of it. If it's not then, well, broken code is broken code no matter how you look at it.
I pointed out that futures as an 'automatic lock-free' approach are not quite parallel to 'automatic memory management by GC'.
The similarity is striking. Both in terms of tradeoffs (trade low-level control for better productivity) as well as the style of arguments made against it ;-) Not that I mind by the way, I find these discussions necessary.
The striking thing is that introducing GC does good things - it removes the necessity to care about memory, which helps a lot in development and makes code clearer and smaller. But I can't see how futures do the same. There are still a lot of things for the developer to consider even when using futures.
The main advantages are increased robustness and productivity. We worry a *lot* about deadlocks since some of our usage of Croquet shows exactly the kind of "mixed usage" that you pointed out. But never, not once, have we had a deadlock or even had to worry about one, in places where we used event-loop concurrency consistently. (Interesting aside: just today we had a very complex deadlock on one of our servers and my knee-jerk reaction was to try to convert it to event-loop concurrency, because although we got stack traces we may not be able to completely figure out how the system ended up in that deadlock :-)
We've gradually continued to move to event-loop concurrency more and more in many areas of our code because the knowledge that this code will be deadlock-free allows us to concentrate on solving the problem at hand instead of figuring out the most unlikely occurrences that can cause deadlock - I suspect that I'll be faster rewriting the code from today as event-loops than figuring out what caused that deadlock and how to avoid it.
And that is in my understanding the most important part - how many hours have you spent thinking about how exactly a highly concurrent system could possibly deadlock? What if you could spend this time on improving the system instead, knowing that deadlock *simply cannot happen*?
Cheers, - Andreas
On 30/10/2007, Andreas Raab andreas.raab@gmx.de wrote:
Igor Stasenko wrote:
How would you define the boundaries of these entities in the same image?
It is defined implicitly by the island in which a message executes. All objects created by the execution of a message are part of the island the computation occurs in.
To create an object in another island you need to artificially "move the computation" there. That's why islands implement the #new: message, so that you can create an object in another island by moving the computation, for example:
space := island future new: TSpace.
This will create an instance of TSpace in the target island. Once we have created the "root object" further messages that create objects will be inside that island, too. For example, take this method:
TSpace>>makeNewCube
	"Create a new cube in this space"
	cube := TCube new.
	self addChild: cube.
	^cube
and then:
cube := space future makeNewCube.
Both, cube and space will be in the same island.
When I talk about boundaries, I am talking about how you could prevent a particular object from slipping through these boundaries and then being used in multiple islands by sending direct messages to it.
Islands provide an isolation layer for some computation. It's just a tool with which we can obtain the data results of some computation. Now suppose that you returned a result (an OrderedCollection, for instance) which is then used in other islands for further processing. But at the same time, a reference to this object is kept in the original island. Now there is a probability that we can send messages to a mutable object from different islands, and there is no protection. Or is there? I'm talking about the current implementation. Is it possible for such a situation to happen? And if not, what mechanisms do you provide to prevent it?
Could you illustrate with some simple examples, or a strategy for using them for concurrent execution within a single VM?
I'm confused about your use of the term "concurrent". Earlier you wrote "There is a BIG difference between concurrency (parallel execution with shared memory) and distributed computing." which seems to imply that you discount all means of concurrency that do not use shared memory. If that is really what you mean (which is clearly different from the usual meaning of the term concurrent) then indeed, there is no way for it to be "concurrent" because there simply is no shared mutable state between islands.
Sorry if my definition was misleading or incorrect. I just see a difference between running two islands in different images and running them in the same image on separate native threads. Even if we suppose that we don't have shared state (to some extent) between islands, we definitely will have shared state in the same image (on the VM side at least), and we should deal with it carefully.
I'm very interested in the practical usage of futures myself. What will you do, or how would you avoid the situation where two different islands containing a reference to the same object in the VM send direct messages to it, causing a race condition?
The implementation of future message sending uses locks and mutexes. You might say "aha! so it *is* using locks and mutexes" but just as with automatic garbage collection (which uses pointers and pointer arithmetic and explicit freeing) it is simply a means to implement the higher-level semantics. And since no mutual/nested locks are required, deadlock-freeness can again be proven.
No, I don't wanna say 'a-ha' :) The implementation details don't really matter here because they can be improved later. What is more important is the proof of concept. For instance, we could add an 'atomic update' primitive which could be used for implementing non-locking queues and/or lists.
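For instance, a lock-free stack push built on such a primitive might look like this - everything here is hypothetical, and #primCAS:from:to: does not exist in any VM today:

LockFreeStack>>push: aNode
	| old |
	[old := head.
	aNode next: old.
	"atomically: if the head ivar still equals old, set it to aNode and answer true"
	self primCAS: #head from: old to: aNode] whileFalse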
Yes. But this example is a significant one. Sometimes I want these messages to run in parallel, sometimes I don't. Even for a single 'island'.
In the island model, this is not an option. The unit of concurrency is an island, period. If you want to run computations in parallel that share data, you either make the data immutable (which can enable sharing in some limited cases) or you copy the needed data to "worker islands". Basic load balancing.
Got it.
Then, for a general solution, we need these islands to be either very small (the smallest being a single object) or to contain a big number of objects. The question is how to give control of their sizes to the developer. How can a developer define the boundaries of an island within a single image?
By sending messages. See above.
I will not accept any solution like 'multiple images' because this drives us into the distributed computing domain, which is _NOT_ concurrent computing anymore, simply because it's not using shared memory - in fact there is no sharing at all, only a glimpse of it.
Again, you have a strange definition of the term concurrency. It does not (neither in general English nor in CS) require use of shared memory. There are two main classes of concurrent systems, namely those relying on (mutable) shared memory and those relying on message passing (sometimes utilizing immutable shared memory for optimization purposes because it's indistinguishable from copying). Erlang and E (and Croquet as long as you use it "correctly") all fall into the latter category.
Correct. Exactly! But I'm interested in making 'use incorrectly' impossible.
This may be the outcome for an interim period. The good thing here is that you can *prove* that your program is deadlock-free simply by not using waits. And ain't that a nice property to have.
You mean waits like this (consider the following two snippets of code running in parallel):
[ a isUnlocked ] whileFalse: [ ]. b unlock.
and
[ b isUnlocked] whileFalse: []. a unlock.
Just like in your previous example, this code is meaningless in Croquet. You are assuming that a and b can be sent synchronous messages and that they resolve while being in the busy-loop. As I have pointed out earlier, this simply doesn't happen. Think of it that way: results are themselves communicated using future messages, e.g.,
Island>>invokeMessage: aMessage
	"Invoke the message and post the result back to the sender island"
	result := aMessage value. "compute result of the message"
	aMessage promise future value: result. "resolve associated promise"
so you cannot possibly wait for the response to a message you just scheduled. It is simply not possible, neither actively nor passively.
And how could you guarantee that any bit of code in the current ST image does not contain such hidden locks - like loops or recursive loops which will never return until some external entity changes the state of some object(s)?
No more than I can or have to guarantee that any particular bit of the Squeak library is free of infinite loops. All we need to guarantee is that we don't introduce new dependencies, which thanks to future messages and promises we can guarantee. So if the subsystem is deadlock free before it will stay so in our usage of it. If it's not then, well, broken code is broken code no matter how you look at it.
I pointed out that futures as an 'automatic lock-free' approach are not quite parallel to 'automatic memory management by GC'.
The similarity is striking. Both in terms of tradeoffs (trade low-level control for better productivity) as well as the style of arguments made against it ;-) Not that I mind by the way, I find these discussions necessary.
The striking thing is that introducing GC does good things - it removes the necessity to care about memory, which helps a lot in development and makes code clearer and smaller. But I can't see how futures do the same. There are still a lot of things for the developer to consider even when using futures.
The main advantages are increased robustness and productivity. We worry a *lot* about deadlocks since some of our usage of Croquet shows exactly the kind of "mixed usage" that you pointed out. But never, not once, have we had a deadlock or even had to worry about one, in places where we used event-loop concurrency consistently. (Interesting aside: just today we had a very complex deadlock on one of our servers and my knee-jerk reaction was to try to convert it to event-loop concurrency, because although we got stack traces we may not be able to completely figure out how the system ended up in that deadlock :-)
We've gradually continued to move to event-loop concurrency more and more in many areas of our code because the knowledge that this code will be deadlock-free allows us to concentrate on solving the problem at hand instead of figuring out the most unlikely occurrences that can cause deadlock - I suspect that I'll be faster rewriting the code from today as event-loops than figuring out what caused that deadlock and how to avoid it.
And that is in my understanding the most important part - how many hours have you spent thinking about how exactly a highly concurrent system could possibly deadlock? What if you could spend this time on improving the system instead, knowing that deadlock *simply cannot happen*?
Cheers,
- Andreas
I ran into a problem when I tried to process the svg example files that come with the SVG specification from W3C.
The following is valid xml and even a valid SVG (Scalable Vector Graphics) file.
<?xml version="1.0" standalone="no"?> <svg width="10cm" height="3cm" viewBox="0 0 1000 300" xmlns="http://www.w3.org/2000/svg">
<text x="200" y="150" fill="blue" font-family="Verdana" font-size="45"> Text with <tspan fill="red" >read</tspan> and <tspan fill="green">green</tspan> text spans </text> </svg>
(This code draws one line of blue text; the words 'red' and 'green' are displayed in red and in green. The tspan elements are used to encode text adornment. Try a newer release of Mozilla Firefox to render this svg file. Note also that some xml readers expect a DOCTYPE specification. A suitable DOCTYPE specification for svg is <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 20010904//EN" "http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd">
)
The text element is a mixed contents element (in XML terminology, see section 3.2.2 of the XML reference from Oct 2000); it contains three #PCDATA items and two child elements (the tspan elements).
When we parse this piece of xml with XMLDOMParser, the text element is translated into an instance of XMLElement with the following values of its more important instance variables:
name = 'text'
element = an OrderedCollection with two XMLElements, one for each tspan item
contents = an OrderedCollection with the three instances of XMLStringNode for the strings 'Text with ', 'and', 'text spans'
attributes = a Dictionary with 5 elements
The problem here is that this XMLElement does not contain information about the sequence of the strings and the text spans. This is, however, very important information when you want to render an svg text element. I really think that it is an error to put #PCDATA and child elements into separate collections.
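To see the loss concretely, using just the two accessors named above (textElement standing for the parsed <text> node):

"each collection keeps its own order, but the interleaving is gone"
textElement elements do: [:tspan | Transcript show: tspan printString; cr].
textElement contents do: [:str | Transcript show: str printString; cr].
"nothing records that the original order was:
'Text with ' <tspan>red</tspan> ' and ' <tspan>green</tspan> ' text spans'"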
For now, I changed XMLDOMParser>>characters from
characters: aString
	| newElement |
	newElement _ XMLStringNode string: aString.
	self top addContent: newElement
to
characters: aString
	| newElement |
	newElement _ XMLStringNode string: aString.
	self top addElement: newElement
With that change, I put all substructures into the 'elements' collection; the 'contents' collection becomes obsolete.
This solves my problem with decorated svg text but my change will certainly break a lot of other applications that use the XMLDOMParser.
My questions:
* what is your experience with xml elements that have mixed contents?
* what do you think should be done with svg text like the one in my example?
Any comments are welcome.
Greetings Boris
2007/10/30, Boris.Gaertner Boris.Gaertner@gmx.net:
I ran into a problem when I tried to process the svg example files that come with the SVG specification from W3C.
The following is valid xml and even a valid SVG (Scalable Vector Graphics) file.
<?xml version="1.0" standalone="no"?>
<svg width="10cm" height="3cm" viewBox="0 0 1000 300" xmlns="http://www.w3.org/2000/svg">
<text x="200" y="150" fill="blue" font-family="Verdana" font-size="45"> Text with <tspan fill="red" >read</tspan> and <tspan fill="green">green</tspan> text spans </text>
</svg>
(This code draws one line of blue text; the words 'red' and 'green' are displayed in red and in green. The tspan elements are used to encode text adornment. Try a newer release of Mozilla Firefox to render this svg file. Note also that some xml readers expect a DOCTYPE specification. A suitable DOCTYPE specification for svg is
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 20010904//EN" "http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd">
)
The text element is a mixed contents element (in XML terminology, see section 3.2.2 of the XML reference from Oct 2000); it contains three #PCDATA items and two child elements (the tspan elements).
When we parse this piece of xml with XMLDOMParser, the text element is translated into an instance of XMLElement with the following values of its more important instance variables:
name = 'text'
element = an OrderedCollection with two XMLElements, one for each tspan item
contents = an OrderedCollection with the three instances of XMLStringNode for the strings 'Text with ', 'and', 'text spans'
attributes = a Dictionary with 5 elements
The problem here is that this XMLElement does not contain information about the sequence of the strings and the text spans. This is, however, very important information when you want to render an svg text element. I really think that it is an error to put #PCDATA and child elements into separate collections.
For now, I changed XMLDOMParser>>characters from
characters: aString
	| newElement |
	newElement _ XMLStringNode string: aString.
	self top addContent: newElement
to
characters: aString
	| newElement |
	newElement _ XMLStringNode string: aString.
	self top addElement: newElement
With that change, I put all substructures into the 'elements' collection; the 'contents' collection becomes obsolete.
This solves my problem with decorated svg text but my change will certainly break a lot of other applications that use the XMLDOMParser.
My questions:
- what is your experience with xml elements that have mixed contents?
They're broken in Yaxo. The other Yaxo bug I know of is: http://bugs.squeak.org/view.php?id=3082
Cheers Philippe
- what do you think should be done with svg text like the one in my example?
Any comments are welcome.
Greetings Boris
Philippe Marschall wrote:
2007/10/30, Boris.Gaertner Boris.Gaertner@gmx.net:
Finally managed to look into these long known issues. Hopefully will have a new version soon :-)
Michael
Hi all,
there is a new version of Yaxo up at http://source.impara.de/infrastructure/XML-Parser-mir.10.mcz
You need the two attached 3.8.2/3.10 fixes for the new package to work.
Please test the new version. For now I only tried to verify against some examples I had readily available.
Once declared stable I'll officially release it on SqueakSource and also push the 3.8.2 changes and new release images.
Michael
----------
Fixed a number of issues (see below) and converted _ to :=.
There are two major changes in this version: whitespace handling and the unification of elements and contents. For backward compatibility the elements and contents methods preserve their semantics. elementsAndContents and elementsAndContentsDo: access the new unified collection.
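So iterating mixed content in document order should now be possible with something like this (a sketch based on the selector names above, untested; textElement stands for a parsed mixed-content element):

textElement elementsAndContentsDo: [:node |
	Transcript show: node printString; cr]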
Some of the fixes rely on fixes in 3.8.2 or 3.10, most prominently String class>>findFirstInString:inSet:startingAt:
http://bugs.squeak.org/view.php?id=32 http://bugs.squeak.org/view.php?id=33 http://bugs.squeak.org/view.php?id=34 http://bugs.squeak.org/view.php?id=547 http://bugs.squeak.org/view.php?id=888 http://bugs.squeak.org/view.php?id=928 http://bugs.squeak.org/view.php?id=3082 http://bugs.squeak.org/view.php?id=3083 http://bugs.squeak.org/view.php?id=6746
"Michael Rueger" michael@impara.de wrote:
there is a new version of Yaxo up at http://source.impara.de/infrastructure/XML-Parser-mir.10.mcz
Thank you very much for your prompt fix!
You need the two attached 3.8.2/3.10 fixes for the new package to work.
Please test the new version. For now I only tried to verify against some examples I had readily available.
Here are my first test results:
1. Squeak 3.9 #7067 does not contain the classes CharacterSetComplement and WideCharacterSet, therefore I could not immediately use your code with that version. To proceed, I dropped working with 3.9 for now and moved to 3.10beta #7158.
2. In Squeak 3.10beta #7158 I could try your code. The surprising result is that a structured xml element with subelements can be parsed, but in the parse tree, all subelements are missing.
Problem analysis:
1. During parsing, a new element is inserted into its parent in one of the methods
XMLDOMParser>>startElement:attributeList:
XMLDOMParser>>startElement:namespaceURI:namespace:attributeList:
The relevant statement is:
self top addElement: newElement
addElement: is defined in XMLNodeWithElements where we find:
addElement: element
	self elements add: element
Method #elements is an accessor with lazy initialization in XMLNodeWithElements, but in the subclass XMLElement it is now a method that uses a #select: to construct a new collection.
Conclusion: We do not add into the instance variable element, but into a collection that was constructed on the fly and that is subsequently dropped.
Proposal of a workaround: As #elements does not answer a lazily initialized instance variable, we cannot use in XMLElement the #addElement: method that we inherit from XMLNodeWithElements.
Rather, I use these instance methods in XMLElement:
addContent: contentString
	super elements add: contentString

addElement: element
	super elements add: element
The first method is a modification of your method, the second one is added.
With this change, I can correctly build the parse tree for xml elements with subelements - also for xml elements that mix #PCDATA (that is, strings) with structured elements.
Comments on these proposals are of course welcome. Perhaps you can find a better solution.
By the way: do we have any tests for the XML parser? At this moment I am deeply involved in xml; that would be a good moment to write some tests. Any interest?
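A possible seed for such a test, e.g. in an XMLParserTest class (selector names are taken from this thread and from memory - adjust to the actual API):

testMixedContentOrder
	| doc a |
	doc := XMLDOMParser parseDocumentFrom: '<a>one<b>two</b>three</a>' readStream.
	a := doc firstTagNamed: 'a'.
	"mixed #PCDATA and child elements must stay in document order"
	self assert: a elementsAndContents size = 3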
Greetings Boris
Boris.Gaertner wrote:
Here are my first test results:
- Squeak 3.9 #7067 does not contain the classes CharacterSetComplement and WideCharacterSet, therefore I could not immediately use your code with that version. To proceed, I dropped working with 3.9 for now and moved to 3.10beta #7158.
Yes, those are changes that came in in 3.10. And I just realized that I need to include them in the 3.8.2 fixes as well.
- In Squeak 3.10beta #7158 I could try your code. The surprising result is that a structured xml element with subelements can be parsed, but in the parse tree, all subelements are missing.
Problem analysis:
Thanks for tracking this down. I will look at your proposed changes.
By the way: do we have any tests for the XML parser? At this moment I am deeply involved in xml; that would be a good moment to write some tests. Any interest?
There is a huge (generated) test library that I need to dig out again and make available.
Michael
Michael Rueger wrote:
Boris.Gaertner wrote:
- In Squeak 3.10beta #7158 I could try your code. The surprising result is that a structured xml element with subelements can be parsed, but in the parse tree, all subelements are missing. Problem analysis:
Thanks for tracking this down. I will look at your proposed changes.
Fixes are in http://source.impara.de/infrastructure/XML-Parser-mir.11.mcz
Remember that the fixed parser will only work with 3.8.2 (not released yet) and 3.10.
Michael
On Wed, 2007-11-14 at 17:38 +0100, Michael Rueger wrote:
Michael Rueger wrote:
Boris.Gaertner wrote:
- In Squeak 3.10beta #7158 I could try your code. The surprising result is that a structured xml element with subelements can be parsed, but in the parse tree, all subelements are missing. Problem analysis:
Thanks for tracking this down. I will look at your proposed changes.
Fixes are in http://source.impara.de/infrastructure/XML-Parser-mir.11.mcz
Remember that the fixed parser will only work with 3.8.2 (not released yet) and 3.10.
Is there a short summary you can give why it won't run in 3.9?
thanks,
Norbert
On Wed, 14/11/2007 at 21:24 +0100, Michael Rueger wrote:
Norbert Hartl wrote:
Is there a short summary you can give why it won't run in 3.9?
it misses the character and string related fixes that are in 3.10 and 3.8.2.
Maybe it's time to think about Squeak 3.9.1?
Giovanni
Giovanni Corriga wrote:
On Wed, 14/11/2007 at 21:24 +0100, Michael Rueger wrote:
Norbert Hartl wrote:
Is there a short summary you can give why it won't run in 3.9?
it misses the character and string related fixes that are in 3.10 and 3.8.2.
Maybe it's time to think about Squeak 3.9.1?
Giovanni
Stephane and Marcus both left after the release, so I think 3.9.1 is up for grabs. Karl
Giovanni Corriga wrote:
On Wed, 14/11/2007 at 21:24 +0100, Michael Rueger wrote:
Norbert Hartl wrote:
Is there a short summary you can give why it won't run in 3.9?
it misses the character and string related fixes that are in 3.10 and 3.8.2.
Maybe it's time to think about Squeak 3.9.1?
Giovanni
I did think about it for a long while; there are/were scripts available with bugs to be harvested for this... but to be honest I conceded that 3.10 is as good a 3.9.1 as you are likely to get.
best regards
Keith
Keith Hodges wrote:
Giovanni Corriga wrote:
Maybe it's time to think about Squeak 3.9.1?
Giovanni
I did think about it for a long while; there are/were scripts available with bugs to be harvested for this... but to be honest I conceded that 3.10 is as good a 3.9.1 as you are likely to get.
That is not true for someone who uses 3.9 in production settings. For anyone who is using a particular version of Squeak in production the step to the next version needs to be very carefully evaluated. Providing a small set of fixes addressing the most important problems without fear "what else it may break" is a very valuable service. Unfortunately, few people in the Squeak community seem to understand that.
Cheers, - Andreas
On Nov 15, 2007, at 06:56 , Andreas Raab wrote:
Maybe it's time to think about Squeak 3.9.1?
Giovanni
I did think about it for a long while; there are/were scripts available with bugs to be harvested for this... but to be honest I conceded that 3.10 is as good a 3.9.1 as you are likely to get.
That is not true for someone who uses 3.9 in production settings. For anyone who is using a particular version of Squeak in production the step to the next version needs to be very carefully evaluated. Providing a small set of fixes addressing the most important problems without fear "what else it may break" is a very valuable service. Unfortunately, few people in the Squeak community seem to understand that.
Indeed.
At netstyle.ch we are using 3.9 for many projects now and we will not move to 3.10 soon. We're using some of the critical fixes, so basically that means we already have our own 3.9.1 image... I assume, quite some people are using 3.9 in production as well. So, if somebody is interested we could join forces and release a 3.9.x from time to time.
Adrian
Adrian Lienhard wrote:
At netstyle.ch we are using 3.9 for many projects now and we will not move to 3.10 soon. We're using some of the critical fixes, so basically that means we already have our own 3.9.1 image... I assume, quite some people are using 3.9 in production as well. So, if somebody is interested we could join forces and release a 3.9.x from time to time.
Which fixes are you using? It might make sense to also include them in 3.8.2, if they're not already. The 3.8.2 updates should be working in 3.9 as well, but I need to check. So we would have at least a starting point.
Michael
Hi Michael,
We are currently using the following 3 fixes:
0006581: Image freezes (background processes like Seaside make no progress) and Squeak hoggs CPU
This is the image freezing problem, where connecting with VNC and moving the mouse would bring the image alive again. When stuck, Squeak hogs the CPU; memory consumption is stable.
http://bugs.squeak.org/view.php?id=6581
Note: the fix only takes effect if Preferences enable: #serverMode
----- 0006588: Broken Semaphore>>critical: leads to frozen processes in Delay
VNC doesn't respond to UI events, 0% cpu usage, several processes frozen in Delay although our Seaside server still responds.
http://bugs.squeak.org/view.php?id=6588
----- 0006576: Delay is not thread-safe
Delay is not thread-safe since currently the calling process updates the internal Delay structure itself. If the process gets terminated while doing this (e.g., adding/removing from SuspendedDelays) the whole system is left in an inconsistent state and breaks.
http://bugs.squeak.org/view.php?id=6576
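The hazard in a nutshell (don't run this on a production image - before the fix, terminating a process while it is inside Delay's bookkeeping could leave the shared state corrupted):

| p |
p := [(Delay forSeconds: 10) wait] fork.
(Delay forMilliseconds: 5) wait.
p terminate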
-----
In addition, I unload OmniBrowser (which is then added again with some other IDE enhancements during our automatic build process but only for development images and not for test server and production images). Also, I do some cleanup and change some preferences, and do some manual cleanup (e.g., remove workspaces, hide flaps etc.).
Adrian
On Nov 15, 2007, at 09:19 , Michael Rueger wrote:
Adrian Lienhard wrote:
At netstyle.ch we are using 3.9 for many projects now and we will not move to 3.10 soon. We're using some of the critical fixes, so basically that means we already have our own 3.9.1 image... I assume, quite some people are using 3.9 in production as well. So, if somebody is interested we could join forces and release a 3.9.x from time to time.
Which fixes are you using? It might make sense to also include them in 3.8.2, if they're not already. The 3.8.2 updates should be working in 3.9 as well, but I need to check. So we would have at least a starting point.
Michael
Hi,
we are using the latter two. Isn't the first solved by the other two? Maybe I should incorporate the first one, too.
Norbert
0006581: Image freezes (background processes like Seaside make no progress) and Squeak hoggs CPU
0006588: Broken Semaphore>>critical: leads to frozen processes in Delay
0006576: Delay is not thread-safe
On Nov 15, 2007, at 10:39 , Norbert Hartl wrote:
Hi,
we are using the latter two. Isn't the first solved by the other two?
No, it is a very different problem (but has similar symptoms). For details see my mail http://lists.squeakfoundation.org/pipermail/squeak-dev/2007-July/119023.html
Maybe I should incorporate the first one, too.
I have seen this problem on only one server so far (but it was very annoying). The SqueakSource problems reported by Impara, however, seem like the very same problem IIRC.
Adrian
Norbert
0006581: Image freezes (background processes like Seaside make no progress) and Squeak hoggs CPU
0006588: Broken Semaphore>>critical: leads to frozen processes in Delay
0006576: Delay is not thread-safe
I could help if needed.
Maybe it's time to think about Squeak 3.9.1?
Giovanni
I did think about it for a long while; there are/were scripts available with bugs to be harvested for this... but to be honest I conceded that 3.10 is as good a 3.9.1 as you are likely to get.
That is not true for someone who uses 3.9 in production settings. For anyone who is using a particular version of Squeak in production the step to the next version needs to be very carefully evaluated. Providing a small set of fixes addressing the most important problems without fear "what else it may break" is a very valuable service. Unfortunately, few people in the Squeak community seem to understand that.
Indeed.
At netstyle.ch we are using 3.9 for many projects now and we will not move to 3.10 soon. We're using some of the critical fixes, so basically that means we already have our own 3.9.1 image... I assume, quite some people are using 3.9 in production as well. So, if somebody is interested we could join forces and release a 3.9.x from time to time.
I can imagine a lot of reasons for not moving to 3.10, but can you let us know yours? :)
Stef
On Nov 15, 2007, at 10:32 , stephane ducasse wrote:
At netstyle.ch we are using 3.9 for many projects now and we will not move to 3.10 soon. We're using some of the critical fixes, so basically that means we already have our own 3.9.1 image... I assume, quite some people are using 3.9 in production as well. So, if somebody is interested we could join forces and release a 3.9.x from time to time.
I can imagine a lot of reasons for not moving to 3.10, but can you let us know yours? :)
Well, in general, upgrading can be a major effort and you always have the risk that stuff will break in unexpected ways. Obviously, problems and the effort to fix them are potentially expensive and customers may lose their confidence.
In particular, I haven't looked in detail at what happened in 3.10, but since 3.9 has proven to be a very good release (excluding the recently addressed production problems, which have been around for a long time), we will stick with it for now. In summary, I trust 3.9 more than 3.10 and I don't know a reason that would justify the potential hassle of upgrading.
Adrian
Hi folks!
A few things for possible inclusion; I don't have time right now to check whether some of these are already included, but I think not.
Two of them are nice speedups (time and printing SmallIntegers), the other one is a simple addition of behavior (parsing dates) that I wrote as a reaction to a post on the beginners list a while back. Oh, and I also added a fix for SMTPClient - well, two rolled into one changeset actually - harvested from the Gjallar project.
I consider 6513 to be trivial to add (just added behavior - can't break stuff), 6512 seems trivial too (easy to verify), 4669 should be green with all Date/Time tests and has been recommended for inclusion earlier, but perhaps someone else did something in that area that has made it obsolete (Keith?), I don't recall. 6768 also seems easy to include, and it has been battle tested in Gjallar.
regards, Göran
-----------------------------------------
http://bugs.squeak.org/view.php?id=6512
Much faster printString (and thus also asString) for SmallIntegers: One override of #printString for SmallInteger and a new method for counting digits in base 10. This makes #printString for SmallIntegers more than 4 times faster than before.
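Easy to sanity-check the speedup before and after installing the changeset with a do-it like:

[1 to: 1000000 do: [:i | i printString]] timeToRun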
-----------------------------------------
http://bugs.squeak.org/view.php?id=6513

Read a Date from the stream based on the pattern which can include the tokens:
y = A year with 1-n digits
yy = A year with 2 digits
yyyy = A year with 4 digits
m = A month with 1-n digits
mm = A month with 2 digits
d = A day with 1-n digits
dd = A day with 2 digits
...and any other Strings in between. Representing $y, $m and $d is done using \y, \m and \d, and slash itself with \. Simple example patterns:
'yyyy-mm-dd'
'yyyymmdd'
'yy.mm.dd'
'y-m-d'
A year given using only two decimals is considered to be >2000.
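Presumably used along these lines (I'm guessing a selector like Date class>>readFrom:pattern: here - check the changeset for the actual name):

Date readFrom: '2007-11-15' readStream pattern: 'yyyy-mm-dd'.
Date readFrom: '07.11.15' readStream pattern: 'yy.mm.dd'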
--------------------------------------------
http://bugs.squeak.org/view.php?id=4669

Ok, so we were trying to figure out why our logger in Gjallar spent so much time in creating DateAndTime etc... after looking, tweaking, profiling and even some thinking I came up with this "SpeedPack" of changes.
I might have broken some subtleties (I don't think I have though) but the tests are still green. Glad for any in-depth review.
And? Well, the speed differences are quite huge. :) Lots of it comes from better Duration instance creation methods and from not going back and forth too much between nanos and seconds etc.
"DateAndTime now" is about 6 times faster and "Date today" 4 times. The full chronology test suite runs almost 2 times faster. See preamble for a do-it that shows this (run before and after installing cs).
So far only tested in 3.8.
--------------------------------------------
http://bugs.squeak.org/view.php?id=6768
Added support for EHLO (sending SMTP using Exchange) and fixed a problem with authentication - #initiateSession needs to be sent before login occurs.
Cool, are these going to be harvested?
On Nov 15, 2007 12:16 PM, goran@krampe.se wrote:
Hi folks!
A few things for possible inclusion; I don't have time right now to check whether some of these are already included, but I think not.
Two of them are nice speedups (time and printing SmallIntegers), the other one is a simple addition of behavior (parsing dates) that I wrote as a reaction to a post on the beginners list a while back. Oh, and I also added a fix for SMTPClient - well, two rolled into one changeset actually - harvested from the Gjallar project.
I consider 6513 to be trivial to add (just added behavior - can't break stuff), 6512 seems trivial too (easy to verify), 4669 should be green with all Date/Time tests and has been recommended for inclusion earlier, but perhaps someone else did something in that area that has made it obsolete (Keith?), I don't recall. 6768 also seems easy to include, and it has been battle tested in Gjallar.
regards, Göran
http://bugs.squeak.org/view.php?id=6512
Much faster printString (and thus also asString) for SmallIntegers: One override of #printString for SmallInteger and a new method for counting digits in base 10. This makes #printString for SmallIntegers more than 4 times faster than before.
http://bugs.squeak.org/view.php?id=6513
Read a Date from the stream based on the pattern which can include
the tokens:
y = A year with 1-n digits
yy = A year with 2 digits
yyyy = A year with 4 digits
m = A month with 1-n digits
mm = A month with 2 digits
d = A day with 1-n digits
dd = A day with 2 digits

...and any other Strings in between. Representing $y, $m and $d is done using \y, \m and \d, and slash itself with \. Simple example patterns:
'yyyy-mm-dd'
'yyyymmdd'
'yy.mm.dd'
'y-m-d'

A year given using only two decimals is considered to be >2000.
http://bugs.squeak.org/view.php?id=4669
Ok, so we were trying to figure out why our logger in Gjallar spent so much time in creating DateAndTime etc... after looking, tweaking, profiling and even some thinking I came up with this "SpeedPack" of changes.
I might have broken some subtleties (I don't think I have though) but the tests are still green. Glad for any indepth review.
And? Well, the speed differences are quite huge. :) Lots of it comes from better Duration instance creation methods and from not going back and forth too much between nanos and seconds etc.
"DateAndTime now" is about 6 times faster and "Date today" 4 times. The full chronology test suite runs almost 2 times faster. See preamble for a do-it that shows this (run before and after installing cs).
So far only tested in 3.8.
http://bugs.squeak.org/view.php?id=6768
Added support for EHLO (sending SMTP using Exchange) and fixed a problem with authentication - #initiateSession needs to be sent before login occurs.
Adrian Lienhard wrote:
On Nov 15, 2007, at 10:32 , stephane ducasse wrote:
At netstyle.ch we are using 3.9 for many projects now and we will not move to 3.10 soon. We're using some of the critical fixes, so basically that means we already have our own 3.9.1 image... I assume quite a few people are using 3.9 in production as well. So, if somebody is interested, we could join forces and release a 3.9.x from time to time.
I can imagine a lot of reasons for not moving to 3.10, but can you let us know yours? :)
Well, in general, upgrading can be a major effort and you always run the risk that stuff will break in unexpected ways. Obviously, problems and the effort to fix them are potentially expensive, and customers may lose their confidence.
In particular, I haven't looked in detail at what happened in 3.10, but since 3.9 has proven to be a very good release (excluding the recently addressed production problems, which have been around for a long time), we will stick with it for now. In summary, I trust 3.9 more than 3.10 and I don't know a reason that would justify the potential hassle of upgrading.
Adrian
All you have to do is add your fixes to those listed on http://squeak310.pbwiki.com and run the script
cheers
Keith
Just to add that the contents of that site were put up over a year ago and everything was fairly speculative at best. So a lot of what you see there would need to be rethought and stripped down in the light of more conservative goals.
best regards
Keith
stephane ducasse wrote:
At netstyle.ch we are using 3.9 for many projects now and we will not move to 3.10 soon. We're using some of the critical fixes, so basically that means we already have our own 3.9.1 image... I assume quite a few people are using 3.9 in production as well. So, if somebody is interested, we could join forces and release a 3.9.x from time to time.
I can imagine a lot of reasons for not moving to 3.10, but can you let us know yours? :)
It amazes me that you even need to ask the question. It's simply basic risk management. It would be crazy for anyone in a production setting to play the version-jumping game. If you have a stable and working environment, you use it until there is a real need to change. This of course holds for OLPC just as well as for Netstyle, which is why I found your recent comments on the eToys list so completely inappropriate.
Cheers, - Andreas
I wish you had gotten involved in this thread earlier on. I think you explained everything better in this one message than I have in the whole thread. :)
On 10/30/07, Andreas Raab andreas.raab@gmx.de wrote:
Igor Stasenko wrote:
How would you define the boundaries of these entities in the same image?
It is defined implicitly by the island in which a message executes. All objects created by the execution of a message are part of the island the computation occurs in.
To create an object in another island you need to artificially "move the computation" there. That's why islands implement the #new: message, so that you can create an object in another island by moving the computation, for example:
space := island future new: TSpace.
This will create an instance of TSpace in the target island. Once we have created the "root object", further messages that create objects will be inside that island, too. For example, take this method:
TSpace>>makeNewCube
	"Create a new cube in this space"
	cube := TCube new.
	self addChild: cube.
	^cube
and then:
cube := space future makeNewCube.
Both cube and space will be in the same island.
Could you illustrate with some simple examples, or a strategy, how these can be used for concurrent execution within a single VM?
I'm confused about your use of the term "concurrent". Earlier you wrote "There is a BIG difference between concurrency (parallel execution with shared memory) and distributed computing." which seems to imply that you discount all means of concurrency that do not use shared memory. If that is really what you mean (which is clearly different from the usual meaning of the term concurrent) then indeed, there is no way for it to be "concurrent" because there simply is no shared mutable state between islands.
I'm very interested in the practical usage of futures myself. What will you do, or how would you avoid the situation, where two different islands holding a reference to the same object in the VM send direct messages to it, causing a race condition?
The implementation of future message sending uses locks and mutexes. You might say "aha! so it *is* using locks and mutexes", but just as with automatic garbage collection (which uses pointers and pointer arithmetic and explicit freeing), it is simply a means to implement the higher-level semantics. And since no mutual/nested locks are required, deadlock-freeness can again be proven.
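For intuition about what event-loop concurrency means operationally - this is only a sketch in plain Squeak, not Croquet's implementation - one island is one queue drained by one process, and clients enqueue work instead of taking locks:

| island |
island := SharedQueue new. "one queue = one island's event loop"
[[island next value] repeat] fork. "a single process runs all of the island's messages"
island nextPut: [Transcript show: 'first message'; cr].
island nextPut: [Transcript show: 'runs strictly after the first'; cr]

Since every message runs to completion on the island's single process, there is no lock for client code to hold, and hence nothing to deadlock on.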
Yes. But this example is a significant one. Sometimes I want these messages to run in parallel, sometimes I don't. Even for a single 'island'.
In the island model, this is not an option. The unit of concurrency is an island, period. If you want to run computations in parallel that share data, you either make the data immutable (which can enable sharing in some limited cases) or you copy the needed data to "worker islands". Basic load balancing.
Then, for a general solution, we need these islands to be either very small (the smallest being a single object) or to contain a big number of objects. The question is how to give the developer control over their sizes. How can a developer define the boundaries of an island within a single image?
By sending messages. See above.
I will not accept any solution like 'multiple images', because this drives us into the distributed computing domain, which is _NOT_ concurrent computing anymore, simply because it's not using shared memory; in fact there is no sharing at all, only a glimpse of it.
Again, you have a strange definition of the term concurrency. It does not (neither in general English nor in CS) require the use of shared memory. There are two main classes of concurrent systems, namely those relying on (mutable) shared memory and those relying on message passing (sometimes utilizing immutable shared memory for optimization purposes, because it's indistinguishable from copying). Erlang and E (and Croquet, as long as you use it "correctly") all fall into the latter category.
This may be the outcome for an interim period. The good thing here is that you can *prove* that your program is deadlock-free simply by not using waits. And ain't that a nice property to have.
you mean waits like this (consider the following two snippets of code run in parallel):
[ a isUnlocked ] whileFalse: [ ]. b unlock.
and
[ b isUnlocked] whileFalse: []. a unlock.
Just like in your previous example, this code is meaningless in Croquet. You are assuming that a and b can be sent synchronous messages and that these resolve while you are in the busy-loop. As I have pointed out earlier, this simply doesn't happen. Think of it this way: results are themselves communicated using future messages, e.g.,
Island>>invokeMessage: aMessage
	"Invoke the message and post the result back to the sender island"
	result := aMessage value. "compute result of the message"
	aMessage promise future value: result. "resolve associated promise"
so you cannot possibly wait for the response to a message you just scheduled. It is simply not possible, neither actively nor passively.
And how could you guarantee that any bit of code in the current ST image does not contain such hidden locks - like loops or recursive loops which never return until some external entity changes the state of some object(s)?
No more than I can or have to guarantee that any particular bit of the Squeak library is free of infinite loops. All we need to guarantee is that we don't introduce new dependencies, which thanks to future messages and promises we can guarantee. So if the subsystem was deadlock-free before, it will stay so in our usage of it. If it's not then, well, broken code is broken code no matter how you look at it.
I pointed out that futures as an 'automatic lock-free' approach are not quite parallel to 'automatic memory management by GC'.
The similarity is striking. Both in terms of tradeoffs (trade low-level control for better productivity) as well as the style of arguments made against it ;-) Not that I mind by the way, I find these discussions necessary.
The striking thing is that introducing GC does good things - removing the necessity to care about memory, which helps a lot in development and makes code clearer and smaller. But I can't see how futures do the same. There are still a lot of things for the developer to consider, even when using futures.
The main advantages are increased robustness and productivity. We worry a *lot* about deadlocks since some of our usage of Croquet shows exactly the kind of "mixed usage" that you pointed out. But never, not once, have we had a deadlock, or even had to worry about one, in places where we used event-loop concurrency consistently. (Interesting aside: just today we had a very complex deadlock on one of our servers and my knee-jerk reaction was to try to convert it to event-loop concurrency, because although we got stack traces we may not be able to completely figure out how the system ended up in that deadlock :-)
We've gradually continued to move more and more of our code to event-loop concurrency in many areas, because the knowledge that this code will be deadlock-free allows us to concentrate on solving the problem at hand instead of figuring out the most unlikely occurrences that can cause deadlock - I suspect that I'll be faster rewriting the code from today as event-loops than figuring out what caused that deadlock and how to avoid it.
And that is in my understanding the most important part - how many hours have you spent thinking about how exactly a highly concurrent system could possibly deadlock? What if you could spend this time on improving the system instead, knowing that deadlock *simply cannot happen*?
Cheers,
- Andreas
I would like to mention some of my previous work in this area:
- tinySelf 1 (1996) http://www.lsi.usp.br/~jecel/tiny.html#rel1
This was a Self interpreter written in Self which implemented the one-thread-per-object model. All messages were future messages, but since sending a message to an unresolved future would block, you would get deadlock on any recursion (direct or indirect). This problem was solved by detecting the cycles and preempting the blocked message with the one it depends on. This results in interleaved execution, but since the semantics are exactly the same as in a sequential execution of the recursive code, any bugs that appear won't be due to concurrency.
I was able to test simple expressions and was very happy with how much parallelism I was able to extract from seemingly sequential code, but I made the mistake of introducing a significant optimization (tail send elimination) that made debugging so much harder that I was unable to finish in the two weeks that I was able to dedicate to this project.
- 64 node Smalltalk machine (1992) http://www.lsi.usp.br/~jecel/ms8702.html
The most interesting result in this project was the notion that most objects in the system are immutable at any given time and that a security system might be used to detect this. For example, just because you can edit some font today doesn't mean that you will do it. And if you and everyone currently logged on the local system only have read permission for that font then it is effectively immutable. Only when the font's owner logs in is this assumption invalid.
The advantage of knowing that an object is immutable is that you can replicate it and you can allow multiple threads to access it at the same time.
The only paper in English from this project describes how adaptive compilation could be used to trim away excessive concurrency by transforming future message passing into sequential message passing (the semantics allow this) and then inlining them away. So if a machine has 64 processors and the application initially starts out with 10 thousand threads, the compiler will eventually change this into code with 200 or so threads (some are blocked at any given instant, so going down to 64 threads would not be good). http://www.lsi.usp.br/~jecel/jabs1.html
- operating system in an Objective-C like language (1988) http://www.lsi.usp.br/~jecel/atos.html (this page has download links but the text still hasn't been written)
This operating system for 286 machines used the virtual memory of that hardware to isolate groups of objects, with one thread per group. This would be similar to the vat/island model. All messages were sent in exactly the same way; if the receiver was a local object then it was just a fancy subroutine call, but for remote objects you got a "segment not present" fault and the message was packed up and sent to the other task (possibly over the network). All messages were synchronous since I was not aware of futures at that time.
-- current model --
I moved back to the one-thread-per-object-group model since I feel that makes it easier for programmers to control things without having to worry too much about details most of the time. Since my target is children, this is particularly important. An alternative that I experimented with was having a separation between active and passive objects. A passive object could be known only to a single active one, but it is just too hard to program without ever accidentally letting references to passive objects "leak". With the group/vat/island model there is just one kind of object and things are simpler for the programmer (but more complicated for the implementor). I have a limitation that you can only create new objects in your own group or in an entirely new group - I think forcing some other random group to create an object for you is rude, though of course you can always ask an object there to please do it.
Some of the loaded groups are read/write but many are read-only. The latter don't actually have their own threads but instead their code executes in the thread of the calling group. I have hardware support for this.
Speaking of hardware, I would like to stress how fantastically slow (relatively speaking) main memory is these days. If I have a good network connecting processor cores in a single chip then I can probably send a message from one to another, get a reply, send a second message and get another reply in the time that it takes to read a byte from external RAM. So we should start thinking of DDR SDRAM as a really fast disk to swap objects to/from and not as a shared memory. We should start to take message passing seriously.
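Rough orders of magnitude (assumed 2007-era figures, not from the original post): an uncached DRAM access costs on the order of 50-100 ns, while an on-die hop between cores costs on the order of 10-20 ns, so two request/reply round trips - four hops, roughly 40-80 ns - can indeed fit within a single external memory read.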
-- Jecel
On 10/30/07, Igor Stasenko siguctua@gmail.com wrote:
which is _NOT_ concurrent computing anymore, simply because it's not using shared memory; in fact there is no sharing at all, only a glimpse of it.
Huh? What does sharing have to do with concurrency? The one and only thing shared state has to do with concurrency is the desire to speed it up, i.e. a premature optimization. That's it.
On 30/10/2007, Jason Johnson jason.johnson.081@gmail.com wrote:
Huh? What does sharing have to do with concurrency? The one and only thing shared state has to do with concurrency is the desire to speed it up, i.e. a premature optimization. That's it.
Look. A current multi-core architecture uses shared memory. So the logical way to utilize such an architecture at maximum power is to build on top of it. Anything else (such as share-nothing) introduces too much noise on such architectures.
Igor Stasenko wrote:
Look. A current multi-core architecture uses shared memory. So the logical way to utilize such an architecture at maximum power is to build on top of it.
That's like saying: "Look. A current multi-core architecture uses x86 instructions. So the logical way to utilize it is to write assembler programs". It's neither logical nor true.
Anything else (such as share-nothing) introduces too much noise on such architectures.
And if that statement were true, then Erlang shouldn't be able to benefit from multiple cores. In fact, not even operating systems should be able to, because running processes in their own address spaces is a "share nothing" approach.
More importantly, you are missing a major point in the comparison: programmer effectiveness. The biggest cost factor for any software is programmer hours. If you can get 80% of the efficiency by investing half the resources, you have a powerful advantage already. If you put this together with added robustness in a modern 24/7 software-as-a-service environment, you've got a winner. In this sense it may be true that these solutions "introduce noise" (I'm not really sure what you mean when saying this, but whatever), but they more than make up for it in the amount of time you can spend actually solving the problem at hand.
Cheers, - Andreas
On 31/10/2007, Andreas Raab andreas.raab@gmx.de wrote:
And if that statement were true, then Erlang shouldn't be able to benefit from multiple cores. In fact, not even operating systems should be able to, because running processes in their own address spaces is a "share nothing" approach.
I'm not saying that it shouldn't be able to benefit. I'm just saying that this has a considerable cost (noise). And my point is to select the model which has minimum noise for the given architecture (for what I mean by noise, read below).
More importantly, you are missing a major point in the comparison: programmer effectiveness. The biggest cost factor for any software is programmer hours. If you can get 80% of the efficiency by investing half the resources, you have a powerful advantage already. If you put this together with added robustness in a modern 24/7 software-as-a-service environment, you've got a winner. In this sense it may be true that these solutions "introduce noise" (I'm not really sure what you mean when saying this, but whatever), but they more than make up for it in the amount of time you can spend actually solving the problem at hand.
By noise I mean, for instance: to perform a simple sum of two integers we write the first integer to file A, write the second to file B, then run an OS script which reads them, computes the sum and writes the result to file C. Then we read from file C, convert the result from text, and finally get the result. That's what I call noise :)
Cheers,
- Andreas
On Oct 30, 2007, at 3:28 PM, Igor Stasenko wrote:
Look. A current multi-core architecture uses shared memory. So the logical way to utilize such an architecture at maximum power is to build on top of it. Anything else (such as share-nothing) introduces too much noise on such architectures.
It is unreasonable to assume that ad-hoc, fine-grained sharing of objects between processors will give you the fastest performance on the upcoming machines with 100s and 1000s of cores. What about memory locality and cache coherency? It is not cheap to juggle an object between processors now, and it will become more expensive as the number of cores increases.
In a different email in the thread, you made it clear that you consider distributed computing to be a fundamentally different beast from concurrency. Intel's chip designers don't see it this way. In fact, they explicitly formulate inter-core communication as a networking problem. For example, see http://www.intel.com/technology/itj/2007/v11i3/1-integration/4-on-die.htm (I've had better links in the past, but this is the best I could quickly find now).
I think that your proposal is very "clever", elegant, and fun to think about. But I don't see what real problem it solves. It doesn't help the application programmer write correct programs (you delegate this responsibility to the language/libraries). It doesn't make code run at maximum speed, since it doesn't handle memory locality. In short, it seems like too much work to do for such uncertain gains... I think that we can get farther by examining some of our assumptions before we start, and revising our goals accordingly.
Cheers, Josh
-- Best regards, Igor Stasenko AKA sig.
On 31/10/2007, Joshua Gargus schwa@fastmail.us wrote:
It is unreasonable to assume that ad-hoc, fine-grained sharing of objects between processors will give you the fastest performance on the upcoming machines with 100s and 1000s of cores. What about memory locality and cache coherency? It is not cheap to juggle an object between processors now, and it will become more expensive as the number of cores increases.
In a different email in the thread, you made it clear that you consider distributed computing to be a fundamentally different beast from concurrency. Intel's chip designers don't see it this way. In fact, they explicitly formulate inter-core communication as a networking problem. For example, see http://www.intel.com/technology/itj/2007/v11i3/1-integration/4-on-die.htm (I've had better links in the past, but this is the best I could quickly find now).
Then I wonder why they don't drop the idea of having shared memory at all. Each CPU could then have its own memory, and they could interact by sending messages in network-style fashion. We would then write code which uses such an architecture in the best way. But while this is not the case, should we assume that such code will work faster than code which 'knows' that there is a single shared memory for all CPUs and uses that knowledge in the best way?
I think that your proposal is very "clever", elegant, and fun to think about.
Thanks :)
But I don't see what real problem it solves. It doesn't help the application programmer write correct programs (you delegate this responsibility to the language/libraries). It doesn't make code run at maximum speed, since it doesn't handle memory locality. In short, it seems like too much work to do for such uncertain gains... I think that we can get farther by examining some of our assumptions before we start, and revising our goals accordingly.
I thought the goals were pretty clear. We have a single image. And we want to run multiple native threads upon it to utilize all cores of multi-core CPUs. What we currently have is a VM which can't do that. So, I think, any other VM that can, even naively implemented, is better than nothing. If you have any ideas what such a VM would look like, I'm glad to hear them.
On Oct 30, 2007, at 5:26 PM, Igor Stasenko wrote:
Then I wonder why they don't drop the idea of having shared memory at all.
It's convenient for programmers. Aside from the huge complexity of programming everything this way, we might also have to program AMD chips differently from Intel ones (at least until a standard emerged).
Each CPU could then have its own memory, and they could interact by sending messages in network-style fashion. We would then write code which uses such an architecture in the best way. But while this is not the case, should we assume that such code will work faster than code which 'knows' that there is a single shared memory for all CPUs and uses that knowledge in the best way?
Could you restate this? I don't understand what you mean.
I think that your proposal is very "clever", elegant, and fun to think about.
Thanks :)
You're welcome :-)
But I don't see what real problem it solves. It doesn't help the application programmer write correct programs (you delegate this responsibility to the language/libraries). It doesn't make code at maximum speed, since it doesn't handle memory locality. In short, it seems like too much work to do for such uncertain gains... I think that we can get farther by examining some of our assumptions before we start, and revising or goals accordingly.
I thought the goals were pretty clear. We have a single image. And we want to run multiple native threads upon it to utilize all cores of multi-core CPUs. What we currently have is a VM which can't do that. So, I think, any other VM that can, even naively implemented, is better than nothing. If you have any ideas what such a VM would look like, I'm glad to hear them.
Very briefly, because this is Andreas's idea (my original one was similar but worse), and I think I convinced him to write it up. My take on it is to rework the VM so that it can support multiple images in one process, each with its own thread. Give each image an event-loop to process messages from other images. Make it easy for an image to spawn another image. A small Spoon-like image could be spawned in very few milliseconds.
I like this approach because it is dead simple, and it keeps all objects that can talk directly to one another on the same processor. Because of its simplicity, it is a low-risk way to quickly get somewhere useful. Once we have some experience with it, we can decide whether we need finer-grained object access between threads (as I've said, I don't think that we will).
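To make that concrete, a purely hypothetical sketch of what the image-facing API might look like (none of these classes or selectors exist; the future/whenResolved: protocol is borrowed from earlier in the thread):

| worker promise |
worker := ImageVat spawnFromFile: 'tiny-worker.image'. "hypothetical class and selector"
promise := worker future primesUpTo: 1000000. "a future send crossing images"
promise whenResolved: [:primes | Transcript show: primes size printString; cr]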
Cheers, Josh
-- Best regards, Igor Stasenko AKA sig.
On 31/10/2007, Joshua Gargus schwa@fastmail.us wrote:
Each CPU could then have its own memory, and they could interact by sending messages in network-style fashion. We would then write code which uses such an architecture in the best way. But while this is not the case, should we assume that such code will work faster than code which 'knows' that there is a single shared memory for all CPUs and uses that knowledge in the best way?
Could you restate this? I don't understand what you mean.
I don't know what to add to the above. I just said that we should use the approaches which best fit the architecture our project(s) will run on. Of course, what fits best is arguable. But I don't think we should drop shared-memory support when we are building a system on top of an architecture which has it.
Very briefly, because this is Andreas's idea (my original one was similar but worse), and I think I convinced him to write it up. My take on it is to rework the VM so that it can support multiple images in one process, each with its own thread. Give each image an event-loop to process messages from other images. Make it easy for an image to spawn another image. A small Spoon-like image could be spawned in very few milliseconds.
I like this approach because it is dead simple, and it keeps all objects that can talk directly to one another on the same processor. Because of its simplicity, it is a low-risk way to quickly get somewhere useful. Once we have some experience with it, we can decide whether we need finer-grained object access between threads (as I've said, I don't think that we will).
Ah, yes, this was mentioned before. And I like that idea in general because of its simplicity, but I don't like the memory overhead and some other issues, like:
- persistence
- generic usage
Let's imagine that I run a Squeak, start a couple of servers, start a tetris game and play it. Then at some moment I feel the urgent need to shut down my PC to insert an additional 64 cores into my CPU, to be able to run two tetrises instead of one. ;) What is interesting now is how I can have a dead-simple one-click 'save image(s)' action such that, after I restart the system, I am able to continue running the same set of images from the same point, as we have in the current single-image Squeak. If we store each image in its own file(s), then soon there will be thousands of them polluting the same directory, and I will be lost as to what I really need in order to move my project to a different PC. Or maybe we should merge multiple images into the same file? That is much better. But what about the .changes file?
About generic usage. Small images, like ants, are good when you deal with small-sized tasks and the job of each ant is simple and monotonous. But what about real-world applications such as CAD systems, or business applications, which could have a couple hundred megabytes of image size? Spawning multiple images of such size is a waste of resources. Ok, then let's suppose we have a single 'queen' image and smaller 'ant' images. But now we need a sophisticated framework for coordinating them. And it is also clear that most of the workload will still be in the queen image (because most of the objects are located there), which means it doesn't scale well.
On Oct 30, 2007, at 7:56 PM, Igor Stasenko wrote:
I don't know what to add to the above. I just said that we should use the approaches which best fit the architecture our project(s) will run on. Of course, what fits best is arguable. But I don't think we should drop shared-memory support when we are building a system on top of an architecture which has it.
OK, this is basically what I thought you meant, but wasn't sure.
<snip>
Let's imagine that I run a Squeak, start a couple of servers, start a tetris game and play it. Then at some moment I feel the urgent need to shut down my PC to insert an additional 64 cores into my CPU, to be able to run two tetrises instead of one. ;)
Only 2 tetrises? I hope we can do better than that :-)
What is interesting now is how I can have a dead-simple one-click 'save image(s)' action such that, after I restart the system, I am able to continue running the same set of images from the same point, as we have in the current single-image Squeak. If we store each image in its own file(s), then soon there will be thousands of them polluting the same directory, and I will be lost as to what I really need in order to move my project to a different PC. Or maybe we should merge multiple images into the same file? That is much better. But what about the .changes file?
Or source code control in general. Good point.
There are a number of options. For simplicity, I'd say to connect a single blessed development image to the single changes file. Another option would be to use a Monticello-like approach (although there are still things that changesets are needed for, and we don't want to bring in all sorts of new fundamental dependencies). However, these are just the first thoughts that popped into my head... you seem to have thought about this more than I have.
Re: persistence in general... I used to do all of my thinking in a Squeak image, using morphs and projects. I grew tired of the breakage of my media when I updated my code to the latest version, or of screwing up my image in some way. I much prefer the way that we deal with data in Croquet, using a separate serialization format instead of the Squeak image format. So for me, it's acceptable to have many of the images be transient... created on startup and discarded ("garbage-collected" :-) ) at shutdown.
(BTW, I hope you'll excuse me for continually saying "we" even there's no chance I'll have time to work on it with you all... I'll have to be content with cheering from the sidelines :-( )
About generic usage. Small images, like ants, are good when you deal with small-sized tasks and the job of each ant is simple and monotonous. But what about real-world applications such as CAD systems, or business applications, which could have a couple hundred megabytes of image size? Spawning multiple images of such size is a waste of resources.
Each image can be as big as it needs to be. We don't need to spawn off identical clones of a single image.
I don't have a lot of first-hand experience with business systems. I'm picturing large tables of relational data... I'm not sure how I would approach this with the model I've described.
Honestly, it's difficult for me to think about this stuff outside of the context of Croquet. There might be use cases that aren't addressed well. But my instinct is that if the model is good enough for a world-wide network of interconnected, collaborative 3D spaces (and I believe that it is), then it is good enough for most anything you'd want to do with it.
Ok, then let's suppose we have a single 'queen' image and smaller 'ant' images. But now we need a sophisticated framework for coordinating them.
If the application is that sophisticated, it will need a sophisticated framework for coordinating concurrency anyway. No way around it. It doesn't matter if it is one queen and many ants, or a heterogeneous mix of islands.
Cheers, Josh
And it is also clear that most of the workload will still be in the queen image (because most of the objects are located there), which means it doesn't scale well.
-- Best regards, Igor Stasenko AKA sig.
On 10/31/07, Igor Stasenko siguctua@gmail.com wrote:
I don't know what to add to the above. I just said that we should use the approaches which best fit the architecture our project(s) will run on. Of course, what fits best is arguable. But I don't think we should drop shared-memory support when we are building a system on top of an architecture which has it.
So what we can build must be constrained by an implementation detail that's not even visible to us [1]? If I had seen this on a C++ list I wouldn't be so surprised but Smalltalk? :)
[1] Obviously we don't because Intel and AMD don't handle shared memory access the same way. AMD already does something a bit closer to message passing: http://www.digit-life.com/articles2/cpu/rmma-numa.html
On 31/10/2007, Jason Johnson jason.johnson.081@gmail.com wrote:
So what we can build must be constrained by an implementation detail that's not even visible to us [1]? If I had seen this on a C++ list I wouldn't be so surprised but Smalltalk? :)
But why Smalltalk? I'm talking about the VM, which is much closer to the hardware than Smalltalk. As you know, the Squeak VM is compiled from C sources. You are free to use any model you want in Smalltalk, but for the VM?
[1] Obviously we don't because Intel and AMD don't handle shared memory access the same way. AMD already does something a bit closer to message passing: http://www.digit-life.com/articles2/cpu/rmma-numa.html
Igor Stasenko wrote:
If you have any ideas what such a VM would look like, I'm glad to hear them.
Okay, so Josh convinced me to write up the ideas. The main problem as I see it with a *practical* solution to the problem is that all of the solutions so far require huge leaps and can't be implemented step-by-step (which almost certainly dooms them to failure).
So what do we know and what do we actually all pretty much agree on? It's that we need to be able to utilize multiple cores and that we need a practical way to get there (if you disagree with the latter, this message is not meant for you ;-) Running multiple processes is one option but it is not always sufficient. For example, some OSes would have trouble firing off a couple of thousand processes whereas the same OS may have no problem at all with a couple of thousand threads in one process. To give an example, starting a thread on Windows costs somewhere in the range of a millisecond, which is admittedly slow, but still orders of magnitude faster than creating a new process. Then there are issues with resource sharing (like file handles) which are practically guaranteed not to work across process boundaries etc. So while there are perfectly good reasons to run multiple processes, there are reasons just as good for wanting to run multiple threads in one process.
The question then is, can we find an easy way to extend the Squeak VM to run multiple threads, and if so how? Given the simplistic nature of the Squeak interpreter, there is actually very little global state that is not encapsulated in objects on the Squeak heap - basically all the variables in class Interpreter. So if we would put them into state that is local to each thread, we could trivially run multiple instances of the byte code interpreter in the same VM. This gets us to the two major questions:
* How do we encapsulate the interpreter state?
* How do we deal with primitives and plugins?
Let's start with the first one. Obviously, the answer is "make it an object". The way I would go about it is by modifying the CCodeGenerator such that it generates all functions with an argument of type "struct VM", that variable accesses prefix things properly, and that all function calls pass the extra argument along. In short, what used to be translated as:
sqInt primitiveAdd(void) {
    integerResult = stackIntegerValue(1) + stackIntegerValue(0);
    /* etc. */
}
will then become something like this:
sqInt primitiveAdd(struct VM *vm) {
    integerResult = stackIntegerValue(vm,1) + stackIntegerValue(vm,0);
    /* etc. */
}
This is a *purely* mechanical step that can be done independently of anything else. It should be possible to generate code that is entirely equivalent to today's code, and with a bit of tweaking it should be possible to make that code roughly as fast as we have today (not that I think it matters, but understanding the speed difference between this and the default interpreter is important for judging relative speed improvements later).
The above takes care of the interpreter, but there are still primitives and plugins that need to be dealt with. What I would do here is define operations like ioLock(struct VM) and ioUnlock(struct VM) that are the effective equivalent of Python's GIL (global interpreter lock) and allow exclusive access for primitives that have not been converted to multi-threading yet. How exactly this conversion should happen is deliberately left open here; maybe changing the VM's major proxy version is the right thing to do to indicate the changed semantics. In any case, the GIL allows us to readily reuse all existing plugins without having to worry about conversion early on.
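In Slang terms (the Smalltalk subset the VM is generated from), an unconverted primitive might simply be bracketed by the lock - a sketch, where ioLock/ioUnlock are the operations proposed above and everything else is hypothetical:

Interpreter>>primitiveSomeLegacyOperation
	"Only one interpreter thread at a time may execute unconverted code"
	self ioLock.
	self someLegacySingleThreadedBody.
	self ioUnlock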
So now we've taken care of the two major parts of Squeak: We have the ability to run new interpreters and we have the ability to use primitives. This is when the fun begins, because at this point we have options:
For example, if you are into shared-state concurrency, you might implement a primitive that forks a new instance of the interpreter running in the same object memory that your previous interpreter is running in.
Or, and that would be the path that I would take, implement a primitive that loads an image into a new object memory (I can explain in more detail how memory allocation needs to work for that; it is a fairly straightforward scheme but a little too long for this message) and run that interpreter.
And at this point, the *real* fun begins because we can now start to define the communication patterns we'd like to use (initially sockets, later shared memory or event queues or whatever else). We can have tiny worker images that only do minimal stuff but we can also do a Spoon-like thing where we have a "master image" that contains all the code possibly needed and fire off micro-images that (via imprinting) swap in just the code they need to run.
[Whoa! I just got interrupted by a little 5.6 quake some 50 miles away]
Sorry but I lost my train of thought here. Happens at 5.6 Richter ;-) Anyway, the main thing I'm trying to say in the above is that for a *practical* solution to the problem there are some steps that are pretty much required whichever way you look at it. And I think that regardless of your interest in shared state or message passing concurrency, we may be able to define a road that leads to interesting experiments without sacrificing the practical artifact. A VM built as described above would be strictly a superset of the current VM, so it would be able to run any current images and leave room for further experiments.
Cheers, - Andreas
This is a very good plan, Andreas.
It will immediately make available the power of multicore CPUs to software researchers and [perhaps] to production environments. I believe that can bring a new class of users (and their experiments and their experience) to the Squeak community.
For the primitive that forks a new instance of the interpreter running in the same object memory, wouldn't it be necessary to do something on the GC side as well?
Anyways, your suggestion is the top entry in the category of the best *practical* solution.
/Klaus
That is a very interesting plan, Andreas. However, I don't see garbage collection on the list. Won't you have to make a concurrent garbage collector?
-Ralph
Ralph Johnson wrote:
That is a very interesting plan, Andreas. However, I don't see garbage collection on the list. Won't you have to make a concurrent garbage collector?
I don't think so. First, you don't need a new garbage collector for the direction that I would take. Since the threads don't operate on the same object memory, no change to the collector is needed. And for shared state concurrency, simple solutions (like a gc request flag per thread which is checked on each send) can be used to ensure atomic operation of the collector.
Cheers, - Andreas
Andreas Raab writes:
I don't think so. First, you don't need a new garbage collector for the direction that I would take. Since the threads don't operate on the same object memory, no change to the collector is needed. And for shared state concurrency, simple solutions (like a gc request flag per thread which is checked on each send) can be used to ensure atomic operation of the collector.
You'd need to serialise object creation and accessing the root table in the write barrier. That may be possible without too much work but there's likely to be some overhead.
Providing a parallel object memory as part of a garbage collector rewrite that speed up single CPU code should be possible. The major design change would be changing the write barrier from a remembered set to card marking. That unfortunately might make it necessary to separate pointer object space from byte storage space.
From the reading I did when tuning Exupery's memory access, it looks like a mostly parallel old space collector should be about the same amount of work as writing an incremental collector. The trick is to only run the big mark phase and the big sweep phase in parallel with the interpreter, then stop the interpreter to do the final marks.
That said, share nothing scales to multiple computers. If you really need CPU power it's often cheaper to buy many smaller boxes than a few big ones.
Bryce
bryce@kampjes.demon.co.uk wrote:
You'd need to serialise object creation and accessing the root table in the write barrier. That may be possible without too much work but there's likely to be some overhead.
Yes. Such is the price of shared state concurrency.
Providing a parallel object memory as part of a garbage collector rewrite that speed up single CPU code should be possible. The major design change would be changing the write barrier from a remembered set to card marking. That unfortunately might make it necessary to separate pointer object space from byte storage space.
Given the sensitivity of GC algorithms in real-world situations, rewriting the garbage collector is not my understanding of a practical approach. The overhead of using an atomic allocator/GC mechanism can be reduced to a single decrement and test for the default single-threaded case, so that the price would only be paid when you run multiple threads. That seems like a more straightforward approach, in particular if one can tweak the GC parameters to run GC less often (and incur the overhead less often).
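A sketch of that single decrement-and-test fast path (hypothetical names; a real allocator differs in detail): each thread allocates from a private chunk, and only refilling the chunk takes the shared lock:

ObjectMemory>>allocateBytes: byteSize
	"Fast path: spend from a thread-local budget; slow path refills under the shared lock"
	| oop |
	remainingBytes := remainingBytes - byteSize.
	remainingBytes < 0 ifTrue: [^self refillLocalChunkThenAllocate: byteSize].
	oop := freePointer.
	freePointer := freePointer + byteSize.
	^oop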
Cheers, - Andreas
On 31/10/2007, Andreas Raab andreas.raab@gmx.de wrote:
Let's start with the first one. Obviously, the answer is "make it an object". The way I would go about it is by modifying the CCodeGenerator such that it generates all functions with an argument of type "struct VM", that variable accesses prefix things properly, and that all function calls pass the extra argument along. In short, what used to be translated as:
sqInt primitiveAdd(void) {
    integerResult = stackIntegerValue(1) + stackIntegerValue(0);
    /* etc. */
}
will then become something like this:
sqInt primitiveAdd(struct VM *vm) {
    integerResult = stackIntegerValue(vm,1) + stackIntegerValue(vm,0);
    /* etc. */
}
This is a *purely* mechanical step that can be done independently of anything else. It should be possible to generate code that is entirely equivalent to today's code, and with a bit of tweaking it should be possible to make that code roughly as fast as we have today (not that I think it matters, but understanding the speed difference between this and the default interpreter is important for judging relative speed improvements later).
There are already some steps done in this direction. The sources for the RISC architecture generate a 'foo' struct which holds all interpreter globals. Also, I made some changes in Exupery to create a single struct of all VM globals (not only variables, but functions too). This was done to make it easier to get the address of any global symbol that Exupery needs. I also experimented with replacing all direct function calls with indirect ones (i.e. foo->primAdd(x,y) instead of primAdd(x,y)). This caused about a 1% speed degradation in tinyBenchmarks :) Also, moving forward on this renders the InterpreterProxy struct useless, because we can just pass the address of our 'foo' struct to plugins, and it already contains everything a plugin can reach.
The above takes care of the interpreter, but there are still primitives and plugins that need to be dealt with. What I would do here is define operations like ioLock(struct VM) and ioUnlock(struct VM) that are the effective equivalent of Python's GIL (global interpreter lock) and allow exclusive access for primitives that have not been converted to multi-threading yet. How exactly this conversion should happen is deliberately left open here; maybe changing the VM's major proxy version is the right thing to do to indicate the changed semantics. In any case, the GIL allows us to readily reuse all existing plugins without having to worry about conversion early on.
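For concreteness, a rough sketch of what such a GIL could look like, assuming pthreads (ioLock/ioUnlock and the struct VM argument follow the naming above; everything else is made up for illustration):

    #include <pthread.h>

    struct VM;                                /* per-thread interpreter state, as above */
    typedef long sqInt;                       /* simplified */

    static pthread_mutex_t gil = PTHREAD_MUTEX_INITIALIZER;

    void ioLock(struct VM *vm)   { pthread_mutex_lock(&gil); }
    void ioUnlock(struct VM *vm) { pthread_mutex_unlock(&gil); }

    sqInt someLegacyPrimitive(struct VM *vm); /* any primitive not yet converted */

    /* wrapper the VM would use for primitives that are not thread-safe yet */
    sqInt callUnconvertedPrimitive(struct VM *vm) {
        sqInt result;
        ioLock(vm);                           /* exclusive access, Python-GIL style */
        result = someLegacyPrimitive(vm);
        ioUnlock(vm);
        return result;
    }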
Or, as I proposed in earlier posts, the other way could be to schedule all primitive calls which currently don't support multi-threading onto a single 'main' thread. Then we don't need the GIL.
So now we've taken care of the two major parts of Squeak: We have the ability to run new interpreters and we have the ability to use primitives. This is when the fun begins, because at this point we have options:
For example, if you are into shared-state concurrency, you might implement a primitive that forks a new instance of the interpreter running in the same object memory that your previous interpreter is running in.
Or, and that would be the path that I would take, implement a primitive that loads an image into a new object memory (I can explain in more detail how memory allocation needs to work for that; it is a fairly straightforward scheme but a little too long for this message) and run that interpreter.
And at this point, the *real* fun begins because we can now start to define the communication patterns we'd like to use (initially sockets, later shared memory or event queues or whatever else). We can have tiny worker images that only do minimal stuff but we can also do a Spoon-like thing where we have a "master image" that contains all the code possibly needed and fire off micro-images that (via imprinting) swap in just the code they need to run.
[Whoa! I just got interrupted by a little 5.6 quake some 50 miles away]
Sorry but I lost my train of thought here. Happens at 5.6 Richter ;-) Anyway, the main thing I'm trying to say in the above is that for a *practical* solution to the problem there are some steps that are pretty much required whichever way you look at it. And I think that regardless of your interest in shared state or message passing concurrency we may be able to define a road that leads to interesting experiments without sacrificing the practical artifact. A VM built like described in the above would be strictly a superset of the current VM so it would be able to run any current images and leave room for further experiments.
Cheers,
- Andreas
----- Original Message ----- From: "Igor Stasenko" siguctua@gmail.com
There are already some steps done in this direction. The sources for the RISC architecture generate a 'foo' struct which holds all interpreter globals. Also, I made some changes in Exupery to create a single struct of all VM globals (not only variables, but functions too). This was done to make it easier to get the address of any global symbol that Exupery needs. I also experimented with replacing all direct function calls with indirect ones (i.e. foo->primAdd(x,y) instead of primAdd(x,y)). This caused about a 1% speed degradation in tinyBenchmarks :)
Wouldn't this mean that consideration of using C++ isn't off the mark? There would be some degradation of performance. However, there would be the benefit of structuring the VM classes, of not having to add VM as an argument everywhere, and it may even be possible to subclass Thread so we know where the thread-local storage is.
Just a thought.
Igor Stasenko wrote:
There are already some steps done in this direction. The sources for the RISC architecture generate a 'foo' struct which holds all interpreter globals. Also, I made some changes in Exupery to create a single struct of all VM globals (not only variables, but functions too). This was done to make it easier to get the address of any global symbol that Exupery needs. I also experimented with replacing all direct function calls with indirect ones (i.e. foo->primAdd(x,y) instead of primAdd(x,y)). This caused about a 1% speed degradation in tinyBenchmarks :)
Ah, indeed, I forgot about that.
Also, moving forward on this renders the InterpreterProxy struct useless, because we can just pass the address of our 'foo' struct to plugins, and it already contains everything a plugin can reach.
But this isn't quite true. One of the reasons for the proxy is to abstract from the actual implementation, since C doesn't look up struct members by name but rather by index. And so, if you happen to add or remove a method from that struct, your plugins will be screwed ;-)
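A tiny illustration of that offset problem (hypothetical structs, not the real proxy):

    typedef long sqInt;  /* simplified */

    /* layout a plugin was compiled against */
    struct VMExports_v1 {
        sqInt (*foo)(void);   /* slot 0 */
        sqInt (*bar)(void);   /* slot 1 */
    };

    /* layout after someone inserts a new function at the front */
    struct VMExports_v2 {
        sqInt (*baz)(void);   /* slot 0 (new) */
        sqInt (*foo)(void);   /* slot 1 */
        sqInt (*bar)(void);   /* slot 2 */
    };

    /* a plugin compiled against v1 that calls exports->bar
       now actually invokes foo when handed a v2 struct */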
Or, as I proposed in earlier posts, the other way could be to schedule all primitive calls which currently don't support multi-threading onto a single 'main' thread. Then we don't need the GIL.
I had missed that. Yes, that would work just as well.
Cheers, - Andreas
On 31/10/2007, Andreas Raab andreas.raab@gmx.de wrote:
But this isn't quite true. One of the reasons for the proxy is to abstract from the actual implementation, since C doesn't look up struct members by name but rather by index. And so, if you happen to add or remove a method from that struct, your plugins will be screwed ;-)
You mean for dynamically linked plugins? Yes, that can be a problem. But again, the struct which is generated contains not only the address of each entry but its name too (in string literals):
a single entry for a function: accessibleObjectAfter, "accessibleObjectAfter", "sqInt (*accessibleObjectAfter)(sqInt oop)",
a single entry for a variable: &activeContext, "activeContext", "<var>".
So there is enough info to get everything you need, even with dynamic linkage. And even without linkage, you can parse the function prototypes and use some FFI to call them :) All you need is a pointer to that struct and the number of entries.
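Something like the following would capture those entries (a sketch inferred from the examples above; the actual generated layout may differ):

    typedef long sqInt;  /* simplified */

    struct vmExport {
        void       *addr;   /* address of the function or variable */
        const char *name;   /* symbol name, so plugins can look it up as a string */
        const char *proto;  /* C prototype as text, usable via FFI; "<var>" for variables */
    };

    sqInt accessibleObjectAfter(sqInt oop);
    extern sqInt activeContext;

    static struct vmExport exports[] = {
        { (void *)accessibleObjectAfter, "accessibleObjectAfter", "sqInt (*accessibleObjectAfter)(sqInt oop)" },
        { (void *)&activeContext,        "activeContext",         "<var>" },
    };
    static const int exportCount = sizeof(exports) / sizeof(exports[0]);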
Igor Stasenko wrote:
Or, as I proposed in earlier posts, the other way could be to schedule all primitive calls which currently don't support multi-threading onto a single 'main' thread.
I thought about this a little more today and decided this is indeed the better way to go. It preserves all the semantics that currently exist in Squeak with regard to running in the UI thread etc. It is also trivial to implement, since we'll need some sort of thread identifier per VM instance anyway, so the test can be done without any additional synchronization overhead.
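A sketch of that test (pthreads assumed; the per-VM primaryThread field and the scheduling call are hypothetical):

    #include <pthread.h>

    typedef long sqInt;  /* simplified */
    struct VM { pthread_t primaryThread; /* ... interpreter state ... */ };
    typedef sqInt (*PrimitiveFn)(struct VM *vm);

    sqInt scheduleOnPrimaryThread(struct VM *vm, PrimitiveFn prim);  /* hypothetical */

    sqInt callPrimitive(struct VM *vm, PrimitiveFn prim, int isThreadSafe) {
        if (!isThreadSafe && !pthread_equal(pthread_self(), vm->primaryThread)) {
            /* unconverted primitive invoked off the primary thread:
               enqueue it and let the primary thread run it */
            return scheduleOnPrimaryThread(vm, prim);
        }
        return prim(vm);  /* safe to run directly; no extra synchronization */
    }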
So the (updated) steps to the multi-threaded VM would then be:
1) Objectify the Interpreter by making the code generator generate the appropriate code.
2) Implement the "failure mode" for calling primitives from non-primary threads and possibly implement the first few plugins that are multi-threaded (files, sockets, ffi come to mind).
And then, either:
3a) For message passing concurrency: Implement a "load image" primitive which allows loading and running another image in parallel.
3b) For shared state concurrency: Implement allocation and GC sync points.
Any interest in actually doing this? I think I might even be able to find some funding for a project like this if someone has a bit of spare time to work on it. If you are interested, let me know off-list.
Cheers, - Andreas
On 01/11/2007, Andreas Raab andreas.raab@gmx.de wrote:
I thought about this a little more today and decided this is indeed the better way to go. It preserves all the semantics that currently exist in Squeak with regard to running in the UI thread etc. It is also trivial to implement, since we'll need some sort of thread identifier per VM instance anyway, so the test can be done without any additional synchronization overhead.
So the (updated) steps to the multi-threaded VM would then be:
1) Objectify the Interpreter by making the code generator generate the appropriate code.
2) Implement the "failure mode" for calling primitives from non-primary threads and possibly implement the first few plugins that are multi-threaded (files, sockets, ffi come to mind).
Writing a generic threading framework comes to mind. A brief description:
- each object in the system should have a way to get its assigned thread id (and to assign it, of course);
- if we see that an object has an assigned thread, that means the object is 'captured' by that thread for processing, and we need to schedule new messages to that thread.
Earlier I proposed this approach only for Smalltalk objects and their active contexts, but the concept can easily be expanded to virtually any object in the system (Interpreter/GC/primitives).
For instance, a concurrent GC mark phase could be implemented as follows:
- we have a single thread dedicated to marking;
- for each object we want to mark, we send it a 'mark' message. If the object is not 'busy' (no thread id set), we simply set the id and begin marking its references; if it is already set, we schedule a mark message to the thread it is assigned to and continue with our own queue (marking other objects).
You can see that with this concept we can mix interpreter 'sends' with VM low-level 'sends'. Thinking along these lines, we might not even need a dedicated marking thread: just start marking in some thread (when a GC criterion is met) and let it run until it converges. Eventually, if we push further, we can end up with parallel marking; then we don't need 'sync' points and never need a GIL.
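A sketch of that mark step (all names hypothetical; one thread-id slot per object as described above):

    typedef long sqInt;                       /* simplified oop type */
    #define NO_THREAD 0

    sqInt threadIdOf(sqInt oop);              /* hypothetical per-object accessors */
    void  setThreadIdOf(sqInt oop, sqInt tid);
    void  markReferencesOf(sqInt oop);        /* recurse into the object's slots */
    void  scheduleMark(sqInt tid, sqInt oop); /* enqueue a 'mark' message for that thread */

    void mark(sqInt oop, sqInt myThreadId) {
        sqInt owner = threadIdOf(oop);
        if (owner == NO_THREAD) {
            setThreadIdOf(oop, myThreadId);   /* capture the object */
            markReferencesOf(oop);
        } else if (owner != myThreadId) {
            scheduleMark(owner, oop);         /* its owner marks it; we continue with our queue */
        }
    }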
And then, either:
3a) For message passing concurrency: Implement a "load image" primitive which allows loading and running another image in parallel.
3b) For shared state concurrency: Implement allocation and GC sync points.
Any interest in actually doing this? I think I might even be able to find some funding for a project like this if someone has a bit of spare time to work on it. If you are interested, let me know off-list.
I'm on board :)
Cheers,
- Andreas
----- Original Message ----- From: "Igor Stasenko" siguctua@gmail.com To: "The general-purpose Squeak developers list" squeak-dev@lists.squeakfoundation.org Sent: Thursday, November 01, 2007 8:21 AM Subject: Re: Thoughts on a concurrent Squeak VM
On 01/11/2007, Andreas Raab andreas.raab@gmx.de wrote:
- Implement the "failure mode" for calling primitives from non-primary
threads and possibly implement the first few plugins that are multi-threaded (files, sockets, ffi come to mind).
Writing a generic threading framework comes to mind. A brief description:
- each object in the system should have a way to get its assigned thread id (and to assign it, of course);
- if we see that an object has an assigned thread, that means the object is 'captured' by that thread for processing, and we need to schedule new messages to that thread.
Earlier I proposed this approach only for Smalltalk objects and their active contexts, but the concept can easily be expanded to virtually any object in the system (Interpreter/GC/primitives).
+1.
Please expose the ability for a user to do a scheduled move of an object from one thread to another thread.
You can see that with this concept we can mix interpreter 'sends' with VM low-level 'sends'.
Yes, indeed.
On 01/11/2007, Rob Withers reefedjib@yahoo.com wrote:
Please expose the ability for a user to do a scheduled move of an object from one thread to another thread.
I'm now writing a small code snippet in ST to show how it will look.
This report is for Squeak3.10beta #7158
I ran into problems when I tried to load code into a fresh image. The method
CharacterSetComplement>> hash
sends the message bitXOr:, which is nowhere defined. I think the correct spelling of the message is bitXor:
Greetings Boris
On 11/1/07 2:46 PM, "Boris.Gaertner" Boris.Gaertner@gmx.net wrote:
This report is for Squeak3.10beta #7158
I ran into problems when I tried to load code into a fresh image. The method
CharacterSetComplement>> hash
sends the message bitXOr:, which is nowhere defined. I think the correct spelling of the message is bitXor:
Greetings Boris
Many thanks, Boris. I confirm the method has four implementors in 7130 (Integer, LargePositiveInteger, SmallInteger, ThirtyTwoBitRegister), so I'll fix it tomorrow. Edgar
The fix is in the new image: http://ftp.squeak.org/3.10/Squeak3.10.1beta.7141.zip
Edgar
Boris.Gaertner a écrit :
This report is for Squeak3.10beta #7158
I ran into problems when I tried to load code into a fresh image. The method
CharacterSetComplement>> hash
sends the message bitXOr:, which is nowhere defined. I think the correct spelling of the message is bitXor:
Greetings Boris
Thanks!
Surely I clicked one bit off in the correction menu... Obviously this class lacks tests; among its friends in Collections-Support, only Association has an AssociationTest!
Too bad... because WideCharacterSet also has a buggy hash.
Nicolas
Ok, here is some code to illustrate what I have in mind. There's nothing sophisticated, and the VM can be modified relatively easily to support it. We can do the same for primitives and inter-image sends.
We can have the same simple struct in C to encapsulate context state, like:

    struct context {
        void (*fn)(void *param);   /* function pointer */
        void *param;               /* arbitrary parameter */
        struct context *sender;    /* context which will receive the result and be activated after fn is done */
        void *result;              /* can be present to store fn's result, if fn returns non-void */
    };
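A sketch of how a scheduler might drive such contexts (the queue operations are hypothetical):

    struct context *dequeue(void);   /* next context ready to run; NULL when idle */
    void enqueue(struct context *ctx);

    void runScheduler(void) {
        struct context *ctx;
        while ((ctx = dequeue()) != NULL) {
            ctx->fn(ctx->param);        /* execute the scheduled action; fn may fill ctx->result */
            if (ctx->sender) {
                ctx->sender->param = ctx->result;  /* hand the result to the waiting sender */
                enqueue(ctx->sender);              /* ...and activate it next */
            }
        }
    }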
Forgot to add: futures/promises fit well with this. A future is just an action (a message send) which should be enqueued after receiving the result of some operation. This means that a future can simply be a special kind of context which activates after the original send and then uses the result to send the new message to it. So futures can be handled seamlessly at the VM level.
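One way to picture that, building on the struct context sketch above (performSend and the wiring are assumptions):

    typedef long sqInt;  /* simplified */
    sqInt performSend(sqInt receiver, sqInt selector);  /* hypothetical VM-level send */

    struct futureSend {
        sqInt receiver;   /* unknown at first; filled in with the result of the original send */
        sqInt selector;   /* the message the future should send */
    };

    /* used as the fn of a 'future' context: by the time the scheduler
       activates it, the original send has resolved and its result has
       been stored as the receiver */
    void futureSendAction(void *param) {
        struct futureSend *fs = (struct futureSend *)param;
        performSend(fs->receiver, fs->selector);
    }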
Hi Igor,
As I wait for SystemTracer to do its thing...
I've been reading Mark Miller's thesis on E, over at erights.org. It's very interesting. He is a proponent of non-shared-memory event loops. He describes each Vat as being a stack, a queue and a heap: a stack of immediate calls for the currently executing event, a queue of pending events, and a heap of objects the Vat controls. On the other hand, he talks about a BootStrapComm (I think this is what it is called) system which allows Vats in the same address space to pass references back and forth, so E supports shared-memory event loops as well. I thought you'd find this interesting.
You have yourself a queue and a stack (as you activate a pending context). I think of a future/promise more as a pending action that gets scheduled before it has a value in its continuation. It just so happens that the resolve: action for activating the continuation of an eventual message send is also an eventual message send, but with no continuation of its own.
This means that a future can simply be a special kind of context which activates after the original send and then uses the result to send the new message to it.
That sounds right, although I am unclear on what "uses the result to send the new message to it" means.
The other thought I had was that garbage collection in Squeak seems to happen immediately, when needed. Things may be so dire for memory that it has to do or die. That would give us a problem making GC scheduled as just another event, wouldn't it?
cheers, Rob
I found another thing, which may be interesting: http://www.stackless.com/
That sounds right, although I am unclear on what "uses the result to send the new message to it" means.
I meant that a future send needs a receiver to send the message to. So when it activates, the receiver (the result) is now known and the message send can be performed.
The other thought I had was that garbage collection in Squeak seems to happen immediately, when needed. Things may be so dire for memory that it has to do or die. That would give us a problem making GC scheduled as just another event, wouldn't it?
What makes you think that futures will die upon GC? For this to work properly, a reference to the future is placed as the 'sender context' in the context of interest. So when that context is done working and returns its result, the sender context (our future message send) will be activated.
I'm personally much more worried about non-local returns.
Suppose we build a chain of future message sends:
object future message1 message2 message3 ... messageN.
then if an error occurs in message1 (or there is a non-local return), the whole chain of futures awaiting activation (message2 ... messageN) has to be thrown overboard. It seems building long chains of futures is impractical.
Of course, in this case it's better to ask the developer why he uses futures with methods that can do non-local returns. :)
cheers, Rob
----- Original Message ----- From: "Igor Stasenko" siguctua@gmail.com
I found another thing, which may be interesting: http://www.stackless.com/
Ok, I took a look and I decided I do not like Python. :) What horrible syntax. On the topic of microstacks in Stackless, I figure we already do this in Squeak: each method has a stack, and message sending is like the channel comms.
The other thought I had was that garbage collection in Squeak seems to happen immediately, when needed. Things may be so dire for memory that it has to do or die. That would give us a problem making GC scheduled as just another event, wouldn't it?
What makes you think that futures will die upon GC? For this to work properly, a reference to the future is placed as the 'sender context' in the context of interest. So when that context is done working and returns its result, the sender context (our future message send) will be activated.
Ok, you are sending messages to futures, as am I. No, the comments about GC had to do with the suggestion that GC actions be scheduled along with normal image activities; I thought they might not get executed when needed.
I'm personally much more worried about non-local returns.
Me too.
Suppose we build a chain of future message sends:
object future message1 message2 message3 ... messageN.
then if an error occurs in message1 (or there is a non-local return), the whole chain of futures awaiting activation (message2 ... messageN) has to be thrown overboard.
Actually, the error should propagate through all future msgs, not be thrown overboard.
It seems building long chains of futures is impractical.
Of course, in this case it's better to ask the developer why he uses futures with methods that can do non-local returns. :)
He wants to because the capability is there. He must use them everywhere. :)
Rob
On 04/11/2007, Rob Withers reefedjib@yahoo.com wrote:
----- Original Message ----- From: "Igor Stasenko" siguctua@gmail.com
I found another thing, which may be interesting: http://www.stackless.com/
Ok, I took a look and I decided I do not like Python. :) What horrible syntax.
I don't like Python myself; two weeks of programming in it was enough for the rest of my life ;) But I was pointing more at the concepts which allow concurrency in the Python VM, and I think we can learn from their experience.
On the topic of microstacks in Stackless, I figure we already do this in Squeak: each method has a stack, and message sending is like the channel comms.
Not exactly. See, the one main difference, I think, is that we can't look up a method before all arguments are evaluated, even for a known receiver.
I mean that in code:
object1 method1: object2 method2
we can't do the lookup for #method1 before we are done evaluating method2, because method2 can do things which change the class of object1. Thus, we need to push the args on the stack, or create an argument vector to collect all arguments, and only after having all of them can we actually do the lookup and create a context for the receiver's method.
Ok, you are sending messages to futures, as am I. No, the comments about GC had to do with the suggestion that GC actions be scheduled along with normal image activities; I thought they might not get executed when needed.
Ah, I see. A run-time GC should be designed in a way that guarantees marking will be done before the application allocates N more bytes of memory.
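A sketch of that pacing guarantee (the names and the proportionality are assumptions): interleave mark work with allocation so the mark phase converges before N more bytes are handed out.

    #include <stddef.h>

    struct VM;
    int   gcMarkInProgress(struct VM *vm);             /* hypothetical */
    void  markSomeObjects(struct VM *vm, size_t work); /* incremental mark step */
    void *rawAllocate(struct VM *vm, size_t nbytes);

    void *allocateBytes(struct VM *vm, size_t nbytes) {
        if (gcMarkInProgress(vm)) {
            /* do marking work proportional to the allocation, scaled so
               that all live objects are marked before N bytes are allocated */
            markSomeObjects(vm, nbytes);
        }
        return rawAllocate(vm, nbytes);
    }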
I'm personally much more worried about non-local returns.
Me too.
Actually, the error should propagate through all future msgs, not be thrown overboard.
Err, why? IIRC, an error (exception) initiates stack unwinding, looking for a context which can handle it.
----- Original Message ----- From: "Igor Stasenko" siguctua@gmail.com To: "The general-purpose Squeak developers list" squeak-dev@lists.squeakfoundation.org Sent: Saturday, November 03, 2007 4:57 PM Subject: Re: Thoughts on a concurrent Squeak VM
Not exactly. See, the one main difference, I think, is that we can't look up a method before all arguments are evaluated, even for a known receiver.
I mean that in code:
object1 method1: object2 method2
we can't do the lookup for #method1 before we are done evaluating method2, because method2 can do things which change the class of object1. Thus, we need to push the args on the stack, or create an argument vector to collect all arguments, and only after having all of them can we actually do the lookup and create a context for the receiver's method.
Unless you are using futures. :) In that case the invocation schedules the message for later, and we move on to sending method1: before method2 is evaluated. Even if method2 changes the class of object1, it just happened too late to be of importance.
Actually, the error should propagate through all future msgs, not be thrown overboard.
Err, why? IIRC, an error (exception) initiates stack unwinding, looking for a context which can handle it.
object future message1 throws an error. message2 will now be sent to this error and will itself break with an error, and so on up the line. It is another way of propagating errors.
Rob
On 04/11/2007, Rob Withers reefedjib@yahoo.com wrote:
object future message1 throws an error. message2 will now be sent to this error and will itself break with an error, and so on up the line. It is another way of propagating errors.
True, but not for non-local returns.
----- Original Message ----- From: "Igor Stasenko" siguctua@gmail.com
I'm personally much more worried about non-local returns.
As an example of the problem of non-local return, let's look at this simple method:
foo: bar
    bar ifTrue: [^ 1].
    self snafu.
    ^ 0
If bar is eventual, we don't know at the time of invocation whether the method will exit through the non-local return or through the return at the end of the method.
How best to deal with this?
My thought is that the context needs to become eventual, be sent as a lambda to the bar promise, and the context returns a promise. The return will check to see if there is a resolver on the context and #resolve: it with the appropriate return value. The lambda needs to be all the sequential code from the point where a message is sent to the promise that includes a block as an argument, to the end of the context. So the lambda would be:
f := [:barVow | barVow ifTrue: [^1]. self snafu. ^ 0]
and we would send: ^ bar whenResolved: f.
So the challenge here would be in capturing the lambda and, in effect, rewriting the code for this method on the fly. The lambda is just a one-arg BlockContext, where the ip points to the pushTemp: 0.
17 <10> pushTemp: 0
18 <99> jumpFalse: 21
19 <76> pushConstant: 1
20 <7C> returnTop
21 <70> self
22 <D0> send: snafu
23 <87> pop
24 <75> pushConstant: 0
25 <7C> returnTop
and the two return bytecodes would each need to resolve the promise returned. This needs to become something like the following, where lines 28-36 are the original block with three rewrites: first, pushTemp: 1 instead of pushTemp: 0, since the block arg is a separate temp; second and third, the two returnTops become a jumpTo: and a blockReturn, so we can resolve the promise.
21 <10> pushTemp: 0
22 <89> pushThisContext:
23 <76> pushConstant: 1
24 <C8> send: blockCopy:
25 <A4 0A> jumpTo: 37
27 <69> popIntoTemp: 1
28 <11> pushTemp: 1
29 <99> jumpFalse: 32
30 <76> pushConstant: 1
31 <93> jumpTo: 36
32 <70> self
33 <D1> send: snafu
34 <87> pop
35 <75> pushConstant: 0
36 <7D> blockReturn
37 <E0> send: whenResolved:
38 <7C> returnTop
Of course, we don't want to modify the original method, since another call to it may not involve a promise. So we are looking at creating a new MethodContext, which we rewrite to create a new BlockContext defined by lines 22-25 through 36 above, for every non-local-return method when it is called with a promise.
What do you think?
Rob
Hmm.. I think you've given a bad example. It depends on the 'knowledge' that #ifTrue: is optimized by the compiler as an execution branch, not a regular message send. For the general case we should consider passing a block as an argument, like:
bar someMethod: [^1].
But this again doesn't include the cases where blocks are assigned to temps/ivars within a context and never passed as arguments to messages.
I thought a bit about marking a method context containing blocks with non-local returns with a special flag, which would force the context not to activate until all free variables in the method (arguments/temps) are resolved. This means that before any activation (entering the method, and each send in the method), it would wait for all the promises it contains to resolve. But I still doubt this gives us anything.
There are many methods which operate on blocks by sending #value... to them. And these methods actually don't care whether those blocks contain non-local returns or not. They simply do their job.
----- Original Message ----- From: "Igor Stasenko" siguctua@gmail.com
Hmm.. I think you've given a bad example. It depends on the 'knowledge' that #ifTrue: is optimized by the compiler as an execution branch, not a regular message send.
My example was specifically about what to do when we have ifTrue: optimized. But you are right to consider the general case first.
For the general case we should consider passing a block as an argument, like:
bar someMethod: [^1].
But this again doesn't include the cases where blocks are assigned to temps/ivars within a context and never passed as arguments to messages.
But we do know that these blocks are actually created in the home context (unlike the optimized example I gave, where no blocks are created). So if it were possible to mark a block context as containing an explicit return, then those are the blocks that should cause synchronization with promises. We can know at compile time that a block has a return, so we would just need a new block-creation bytecode that marks the block: #bytecodePrimBlockCopyWithReturn.
I thought a bit about marking a method context containing blocks with non-local returns with a special flag, which would force the context not to activate until all free variables in the method (arguments/temps) are resolved. This means that before any activation (entering the method, and each send in the method), it would wait for all the promises it contains to resolve. But I still doubt this gives us anything.
Where does a block return from if it has an explicit return? The home context or the calling context? I think it is the calling context.
So, about what you just said on marking a method context if it contains blocks with non-local returns: how do you know the block has a non-local return? As I mentioned, we could mark the blocks. We could mark the methods that CREATE blocks that return. At runtime, we could look at the method flag for non-local return, look at all args and temps for blocks with the flag, and synchronize. I agree with you that it will have to be before every send.
There are many methods which operate on blocks by sending #value... to them. And these methods actually don't care whether those blocks contain non-local returns or not. They simply do their job.
What do you mean here?
Cheers, Rob
On Nov 2, 2007, at 5:53 PM, Rob Withers wrote:
The other thought I had was that garbage collection in Squeak seems to happen immediately, when needed. Things may be so dire for memory that it has to do or die. That would give us a problem making GC scheduled as just another event, wouldn't it?
cheers, Rob
Well, it happens when there is no free space to allocate something, or when some limit is reached, like free space becoming too low or the number of allocated objects exceeding some limit. Certainly, running out of memory entirely is critical; the others are softer conditions.
however...
"GC scheduled as another event" I know of one smalltalk implementation that would signal when memory was becoming low, thus letting some process deal with the situation, however as machines got really fast the amount of head room became too small, and the VM memory allocator would run out of memory before the GC process could wake up... Usually increasing the headroom by as little as 10K would solve the problem.
--
John M. McIntosh johnmci@smalltalkconsulting.com
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
Found it using google-fu :)
http://www.slideshare.net/Arbow/stackless-python-in-eve
I'd like to hear your comments about the concepts presented in Stackless Python. I still haven't had time to study it deeply, but the slides at the link above speak for themselves.
"> that leads to interesting experiments without sacrificing the
practical artifact. A VM built like described in the above would be strictly a superset of the current VM so it would be able to run any current images and leave room for further experiments."
Hi Andreas,
Earthquakes apart, your proposal seems remarkably valuable to me for the short and medium term. It is not a deadly earthquake for the VM; curiously, it seems like just a little shake, with insignificant damage to the current environment, accommodating things so as to keep it working ;-)
best regards,
Sebastian Sastre
Andreas,
What about using C++? There would be some degradation of performance. However, there would be the benefit of structuring the VM classes, of not having to add VM as an argument everywhere, and it may even be possible to subclass Thread so we know where the thread-local storage is.
Rob
Rob Withers wrote:
What about using C++? There would be some degradation of performance. However, there would be the benefit of structuring the VM classes, of not having to add VM as an argument everywhere, and it may even be possible to subclass Thread so we know where the thread-local storage is.
For the VM internally, I don't really care; since this is generated code, there is really no difference to me. For plugins it is not feasible to use C++, since name mangling is not standardized, so you can't link reliably against C++ APIs.
Cheers, - Andreas
----- Original Message ----- From: "Andreas Raab" andreas.raab@gmx.de To: "The general-purpose Squeak developers list" squeak-dev@lists.squeakfoundation.org Sent: Wednesday, October 31, 2007 9:37 AM Subject: Re: Thoughts on a concurrent Squeak VM
Rob Withers wrote:
What about using C++? There would be some degradation of performance. However, there would be the benefit of structuring the VM classes, of not having to add VM as an argument everywhere, and it may even be possible to subclass Thread so we know where the thread-local storage is.
For the VM internally, I don't really care. Since this is generated code there is really no difference to me. For plugins it is not feasible to use C++, since name mangling is not standardized, so you can't link reliably to C++ APIs.
That's true that it's internal to the VM, so it shouldn't matter. I suppose the benefit of structuring the classes was more of an in-image issue for me. Even using C, we could separate the primitives off into a Primitives class and compile with ObjectMemory, Interpreter, and Primitives so they are all generated in the same file. Then we would just need to make sure the InterpreterSimulator knew about the Primitives class. The same issue would apply if ObjectMemory and Interpreter were no longer part of the same hierarchy.
It makes sense that primitives would have a problem with name mangling, so named primitives can't be in C++ classes... indexed ones could be, though, as long as the primitive table were initialized with the mangled names.
cheers, Rob
On Oct 31, 2007, at 17:57 , Rob Withers wrote:
That's true that it's internal to the VM, so it shouldn't matter. I suppose the benefit of structuring the classes was more of an in-image issue for me. Even using C, we could separate the primitives off into a Primitives class and compile with ObjectMemory, Interpreter, and Primitives so they are all generated in the same file. Then we would just need to make sure the InterpreterSimulator knew about the Primitives class. The same issue would apply if ObjectMemory and Interpreter were no longer part of the same hierarchy.
It makes sense that primitives would have a problem with name mangling, so named primitives can't be in C++ classes... indexed ones could be, though, as long as the primitive table were initialized with the mangled names.
I don't see any point in switching to C++.
- Bert -
----- Original Message ----- From: "Bert Freudenberg" bert@freudenbergs.de To: "The general-purpose Squeak developers list" squeak-dev@lists.squeakfoundation.org Sent: Wednesday, October 31, 2007 10:16 AM Subject: Re: Thoughts on a concurrent Squeak VM
I don't see any point in switching to C++.
I'm convinced. It was a little hard to let go, since I like an OO representation, but as Andreas observed, the VM being generated means I don't really need to look at it too closely. For me it is more about the class representation of the VM in the image. Interpreter is a busy class, and some of its methods could be broken out into separate Squeak classes.
cheers, Rob
On 31/10/2007, Rob Withers reefedjib@yahoo.com wrote:
Andreas,
What about using C++? There would be some degradation of performance. However, there would be the benefit of structuring the VM classes, of not having to add VM as an argument everywhere, and it may even be possible to subclass Thread so we know where the thread-local storage is.
I'd rather prefer to modify Slang to be able to generate VM sources for any target language/platform, and to keep platform-dependent code in the image instead of in separate file(s). This is all to simplify the build process and to keep all things together.
----- Original Message ----- From: "Andreas Raab" andreas.raab@gmx.de To: "The general-purpose Squeak developers list" squeak-dev@lists.squeakfoundation.org Sent: Tuesday, October 30, 2007 8:53 PM Subject: Thoughts on a concurrent Squeak VM (was: Re: Concurrent Futures)
Igor Stasenko wrote:
If you have any ideas how such a VM would look, I'm glad to hear them.
Okay, so Josh convinced me to write up the ideas. The main problem as I see it with a *practical* solution to the problem is that all of the solutions so far require huge leaps and can't be implemented step-by-step (which almost certainly dooms them to failure).
So what do we know and what do we actually all pretty much agree on? It's that we need to be able to utilize multiple cores and that we need a practical way to get there (if you disagree with the latter, this message is not meant for you ;-) Running multiple processes is one option but it is not always sufficient. For example, some OSes would have trouble firing off a couple of thousand processes, whereas the same OS may have no problem at all with a couple of thousand threads in one process. To give an example, starting a thread on Windows costs somewhere in the range of a millisecond, which is admittedly slow, but still orders of magnitude faster than creating a new process. Then there are issues with resource sharing (like file handles) which are practically guaranteed not to work across process boundaries, etc. So while there are perfectly good reasons to run multiple processes, there are reasons just as good for wanting to run multiple threads in one process.
The question then is, can we find an easy way to extend the Squeak VM to run multiple threads, and if so, how? Given the simplistic nature of the Squeak interpreter, there is actually very little global state that is not encapsulated in objects on the Squeak heap - basically all the variables in class Interpreter. So if we put them into state that is local to each thread, we could trivially run multiple instances of the bytecode interpreter in the same VM. This gets us to the two major questions:
- How do we encapsulate the interpreter state?
- How do we deal with primitives and plugins?
Let's start with the first one. Obviously, the answer is "make it an object". The way I would go about it is by modifying the CCodeGenerator such that it generates all functions with an argument of type "struct VM", that variable accesses prefix things properly, and that all function calls pass the extra argument along. In short, what used to be translated as:
sqInt primitiveAdd(void) {
    integerResult = stackIntegerValue(1) + stackIntegerValue(0);
    /* etc. */
}

will then become something like this:

sqInt primitiveAdd(struct VM *vm) {
    integerResult = stackIntegerValue(vm, 1) + stackIntegerValue(vm, 0);
    /* etc. */
}
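To make the shape concrete, the generated state record might look roughly like this (a sketch; the field names are illustrative stand-ins for whatever variables class Interpreter actually has):

struct VM {
    sqInt stackPointer;          /* formerly file-scope globals       */
    sqInt instructionPointer;    /* of the generated interpreter.c    */
    sqInt method;
    sqInt receiver;
    struct ObjectMemory *memory; /* shared or private, depending on
                                    which concurrency model is chosen */
    /* ... every other variable of class Interpreter ... */
};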
----- Original Message ----- From: "Igor Stasenko" siguctua@gmail.com To: "The general-purpose Squeak developers list" squeak-dev@lists.squeakfoundation.org Sent: Wednesday, October 31, 2007 9:39 AM Subject: Re: Thoughts on a concurrent Squeak VM (was: Re: Concurrent Futures)
I'd rather prefer to modify Slang to be able to generate VM sources for any target language/platform, and to keep platform-dependent code in the image instead of in separate file(s). This is all to simplify the build process and to keep all things together.
You mean subclassing a Thread class? Is that platform-dependent? If so, I didn't know that, and I agree with you - it should be out in a separate file, if used at all.
cheers, Rob
On 31/10/2007, Rob Withers reefedjib@yahoo.com wrote:
You mean subclassing a Thread class? Is that platform-dependent? If so, I didn't know that, and I agree with you - it should be out in a separate file, if used at all.
No, I mean to keep ALL plugin code in the corresponding methods, and never use external sources. For example, a SocketPlugin can have subclasses Win32SocketPlugin and UnixSocketPlugin, and in these subclasses we keep the code for the different platforms. But not in .c sources.
I agree with Igor. Slang is a powerful concept that has helped Squeak a lot.
On Wed, Oct 31, 2007 at 08:10:25PM +0200, Igor Stasenko wrote:
No, I mean to keep ALL plugin code in the corresponding methods, and never use external sources. For example, a SocketPlugin can have subclasses Win32SocketPlugin and UnixSocketPlugin, and in these subclasses we keep the code for the different platforms. But not in .c sources.
OSProcessPlugin is organized like this, primarily because I did not want to have external file dependencies and platform code for OSPP. The approach works well in the case where all or most of the code can be done in Slang (there is no external C support code for OSPP). I don't know that it helps in the case where the intent of the plugin is to wrap some external library.
There are certainly other areas of the VM and plugins where external support code could be moved back into Slang, although I think this is largely a matter of preference for the folks doing the platform code support.
Dave
On 31-Oct-07, at 9:39 AM, Igor Stasenko wrote:
I'd rather prefer to modify Slang to be able to generate VM sources for any target language/platform, and to keep platform-dependent code in the image instead of in separate file(s). This is all to simplify the build process and to keep all things together.
Hah!
Once upon a time we had platform code 'files' in the image. In particular, the Mac files. Everyone else had to go through irritating and time-consuming gymnastics to build a VM.
You couldn't edit these 'files' sensibly because Squeak's text editing and GC have conniptions when you try to edit large strings. The changes log got polluted by massive chunks of text with each edit. It was not fun. People complained incessantly about not having the VM code in a source code control system. Then they still bitch about not having all the VM code that way, about how dreadful it is to have to actually *run Squeak* in order to build a VM.
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Strange OpCodes: EIV: Erase IPL Volume
On 01/11/2007, tim Rowledge tim@rowledge.org wrote:
Hah!
Once upon a time we had platform code 'files' in the image. In particular, the Mac files. Everyone else had to go through irritating and time-consuming gymnastics to build a VM.
You couldn't edit these 'files' sensibly because Squeak's text editing and GC have conniptions when you try to edit large strings.
Heh, then don't put in large strings. We all know that such code stinks :)
The changes log got polluted by massive chunks of text with each edit. It was not fun. People complained incessantly about not having the VM code in a source code control system.
Huh? What is MC for? I could complain about keeping one plugin's code divided between two different source control systems. It seems very impractical.
Then they still bitch about not having all the VM code that way, about how dreadful it is to have to actually *run Squeak* in order to build a VM.
Yeah, and I'm one of them. ;) I wrote a small plugin which requires platform-dependent code. In my experience it's much easier to write it in an external editor and, at some point after debugging, put it in some method as a string and forget about it. Then it can be shipped with the plugin code rather than shipped in two different parts with tons of instructions on how and where to unpack the sources, what to modify in the makefiles, etc., etc.
Ohh, I really don't know... Maybe we should step aside and use idst for building the new VM? Ian mentioned previously that he would welcome efforts in this direction. But this means refactoring all the VM code from scratch...
I'm just asking myself: when can we finally get rid of the burden of C?
On 31-Oct-07, at 5:37 PM, Igor Stasenko wrote:
Heh, then don't put in large strings. We all know that such code stinks :)
Do feel free to travel back in time ten years and try to persuade SqC to change things.
Huh? What is MC for?
And similarly, feel free to travel back in time so that MC can be available ten years ago.
I'm just asking myself: when can we finally get rid of the burden of C?
Never. Not going to happen. Probably against the lore.
Yes, we could probably rewrite a lot of the code currently in C source files and put it into Slang methods. Yes, we could probably improve Slang (we tried to get some of that done at Interval but ran out of time) to be more friendly. Yes, we could do lots of things. Got time to do them? Or money to pay me/Craig/Andreas/Ian/John/Mike/Bryce/whoever to do it full-time? That's the kind of thing that would be required to be able to make any major changes.
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Strange OpCodes: ESC: Emulate Small Child
tim Rowledge wrote:
Yes, we could probably rewrite a lot of the code currently in C source files and put it into Slang methods. Yes, we could probably improve Slang (we tried to get some of that done at Interval but ran out of time) to be more friendly. Yes, we could do lots of things. Got time to do them? Or money to pay me/Craig/Andreas/Ian/John/Mike/Bryce/whoever to do it full-time? That's the kind of thing that would be required to be able to make any major changes.
Well, but let's not throw out the baby with the bath water. Improvements would be welcome, in particular if they are easy to review and to integrate. It is probably unwise to start this as a let's-rewrite-the-VM-and-its-tools-from-scratch approach, but there are plenty of things that we could do better. For example, I would welcome a patch that enables the code generator to optionally build the entire VM as an object. That'd be a very nice stepping stone toward a multi-threaded VM and can probably be done in a fairly incremental way.
Cheers, - Andreas
A Slang to Squeak refactoring tool, perhaps...
On Oct 31, 2007, at 6:00 PM, Andreas Raab wrote:
Well, but let's not throw out the baby with the bath water. Improvements would be welcome, in particular if they are easy to review and to integrate. It is probably unwise to start this as a let's-rewrite-the-VM-and-its-tools-from-scratch approach, but there are plenty of things that we could do better. For example, I would welcome a patch that enables the code generator to optionally build the entire VM as an object. That'd be a very nice stepping stone toward a multi-threaded VM and can probably be done in a fairly incremental way.
Well, building the interpreter.c file to use a single structure for globals and then setting up a pointer to that allocated structure somewhere is trivial compared to the other ideas kicked about. However, to complete this concept you need to move the support code and plugins to use memory hung off the interpreter structure for their globals; that is a more time-consuming process, but certainly not difficult, unlike, oh, concurrent multi-threaded GC algorithms.
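A sketch of what "memory hung off the interpreter structure" might look like for one plugin's globals; the names here are invented for illustration:

/* Former file-scope globals of a plugin, gathered into one record. */
struct SocketPluginState {
    int   netInitialized;
    void *resolverSemaphore;
    /* ... the rest of the plugin's globals ... */
};

struct VM {
    /* ... interpreter globals ... */
    struct SocketPluginState *socketState;   /* one slot per plugin */
};

/* Plugin code then refers to vm->socketState->netInitialized
   instead of a global variable. */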
-- John M. McIntosh <johnmci@smalltalkconsulting.com> Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
On Nov 1, 2007, at 2:10 , John M McIntosh wrote:
Well, building the interpreter.c file to use a single structure for globals and then setting up a pointer to that allocated structure somewhere is trivial compared to the other ideas kicked about. However, to complete this concept you need to move the support code and plugins to use memory hung off the interpreter structure for their globals; that is a more time-consuming process, but certainly not difficult, unlike, oh, concurrent multi-threaded GC algorithms.
The idea was to use a GIL for that problem.
- Bert -
On Oct 31, 2007, at 6:00 PM, Andreas Raab wrote:
Well, but let's not throw out the baby with the bath water. Improvements would be welcome, in particular if they are easy to review and to integrate. It is probably unwise to start this as a let's-rewrite-the-VM-and-its-tools-from-scratch approach, but there are plenty of things that we could do better. For example, I would welcome a patch that enables the code generator to optionally build the entire VM as an object. That'd be a very nice stepping stone toward a multi-threaded VM and can probably be done in a fairly incremental way.
I had some code a while ago that generated the entire VM as an Objective-C object, with the actual class hierarchy (ObjectMemory -> Interpreter) still intact. It interpreted a couple of thousand bytecodes until it hit a function that had been generated from literal C code embedded in the Smalltalk code for the VM. I gave up at that point because there turned out to be quite a bit of that literal code that would need to be rewritten, as it would probably need to be for any such effort, unless these C snippets have since been replaced.
Now that we seem to be seriously discussing turning the VM into an object, I would suggest at least having a very close look at leveraging Objective-C, and yes, I can hear the virtual groans ;-) Anyhow, here it goes:
First, unlike the C++ approach there are no problems in terms of linkage compatibility: Objective-C behaves just like C in that respect. I also don't see any speed issues, because Objective-C methods are just C functions with some extra metadata and some implicit arguments. So functions can still be called and inlined pretty much exactly the way they are now. (This would entail either generating the functions and adding the metadata to make them known to the runtime, or generating methods and getting function pointers from the runtime. Either approach would work, with the former possibly retaining better C compatibility.)
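That point can be seen from the C side of the Objective-C runtime; a sketch, assuming the standard runtime headers (the "interpret" selector is hypothetical):

#include <objc/runtime.h>
#include <objc/message.h>

/* An Objective-C method is invoked through a plain C entry point, so
   there is no C++-style name mangling to link against. */
void runInterpreter(id interpreter) {
    SEL sel = sel_registerName("interpret");
    ((void (*)(id, SEL))objc_msgSend)(interpreter, sel);
}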
Speaking of compatibility: despite the rumors, Objective-C is not an Apple/MacOS-X thing. It comes with gcc and, AFAIK, on most every 32-bit (and 64-bit?) platform gcc supports. Furthermore, as I alluded to before, and unlike C++, Objective-C isn't really a new language per se; it really is just C with an object-runtime library that works just as well from C, plus largely optional syntactic sugar.
So if there's so little difference, what's the upside? Well, the runtime does come with all or most of the niggly details you need for an OO system. So on the purely practical side, you can instantiate multiple objects (interpreters, plugins) and load classes (plugins again). If you have multiple instances of the interpreter, you can also have subclasses of the interpreter, override methods on the fly, do introspection, etc. Plugins can be just objects, with the plugin interface being regular message sends. Etc.
Then there's all the syntactic and semantic similarity, passing messages between the VM and the programs it is running (nice metacircularity...), and so on and so forth.
Anyway, just my € 0.02
Marcel
On 10/31/07, Igor Stasenko siguctua@gmail.com wrote:
Then I wonder why they don't drop the idea of having shared memory at all?
Oh they will. If you read up on all the incredible hoops they jump through to make this model work on 2+ cores now, it has to be obvious that this can't scale indefinitely.
Intel has a lot of resources, and much invested in this model, so they may try to push it further than it should go. But I don't see that having a good end for them.
Igor Stasenko wrote:
Then I wonder why they don't drop the idea of having shared memory at all?
The major reason is cost, not performance. With a single shared memory subsystem you can allocate memory dynamically to the cores as you need it. Not using shared memory at all means you need to pre-allocate memory for each core. Which leaves you with two options: Either over-allocate memory for each core (expensive) or assume that the programmer can keep relatively small caches utilized effectively. The PS2 had that approach and failed miserably (this is one of the reasons why it took so long before the games could actually utilize its full power - keeping those caches filled was a major pain in the neck despite the bandwidth and computational power available).
The same effect can be seen with GPUs - the cheapest (usually Intel) GPUs utilize shared (main) memory to drive cost down. But that's all. It doesn't mean that just because Intel likes cheap graphics they're the fastest (in fact, the precise opposite is true - lots of VRAM and a fast bus outperforms shared memory by far).
Each CPU then could have its own memory, and they could interact by sending messages in network-style fashion. And we would then write code which uses such an architecture in the best way. But while this is not the case, should we assume that such code will work faster than code which 'knows' that there is a single shared memory for all CPUs and uses that knowledge in the best way?
No. But the opposite isn't necessarily true either. We shouldn't assume either way, we should measure and compare. And not only cycles but also programming effort, correctness and robustness.
I thought the goals were pretty clear. We have a single image, and we want to run multiple native threads upon it to utilize all the cores of multi-core CPUs. What we currently have is a VM which can't do that. So, I think, any other one which can, even naively implemented, is better than nothing. If you have any ideas how such a VM would look, I'm glad to hear them.
See my other post.
Cheers, - Andreas
On 31/10/2007, Andreas Raab andreas.raab@gmx.de wrote:
No. But the opposite isn't necessarily true either. We shouldn't assume either way, we should measure and compare. And not only cycles but also programming effort, correctness and robustness.
If we look at the multi-core problem as a networking problem then, as I see it, shared memory helps us minimize traffic between cores. We don't need to spend time serializing data and transferring it between cores if it is located in shared memory and can be easily accessed from both ends. But the share-nothing model proposes not to use shared memory, which in turn means that there will be much higher traffic between cores compared to a model which uses shared memory. So a balance should be found between network load and the use of shared resources. We can't win if we choose one of the opposite sides, only something in the middle. Am I still wrong here?
On Oct 31, 2007, at 6:09 AM, Igor Stasenko wrote:
If we look at the multi-core problem as a networking problem then, as I see it, shared memory helps us minimize traffic between cores.
Shared memory is an abstraction that pretends that there is no traffic between cores, but of course there really is. Letting hardware threads access objects "at random" (i.e. with no regard to their location in memory) will certainly not help us minimize traffic between cores; why do you think it will?
We don't need to spend time serializing data and transferring it between cores if it is located in shared memory and can be easily accessed from both ends. But the share-nothing model proposes not to use shared memory, which in turn means that there will be much higher traffic between cores compared to a model which uses shared memory.
It implies nothing of the sort. The shared-nothing model gives you control over this traffic. The model that you propose gives you no control; I think it will probably give degenerate results in practice, with lots of needless cache overhead. Do you think that the performance will scale linearly w/ each processor added? It seems unlikely to me. If you disagree, please explain why.
BTW, the time spent serializing data is completely irrelevant when considering traffic between cores. Also, I think it will be a small overhead on overall performance. The reason is that, in practice, the amount of data sent between cores/images will be small. It will be trivial for the application programmer to measure the number and size of messages sent between images, and to design the computation so that the overhead is low (i.e. lots of computation happens in-image for each message between images).
So a balance should be found between network load and the use of shared resources. We can't win if we choose one of the opposite sides, only something in the middle.
There are some cases where it doesn't make sense to serialize data into a message. If I have a large video "file" in a ByteArray in one image, and I want to play it (decode, upload to OpenGL, etc.), I don't want to serialize the whole thing. It would be much more efficient to ensure that GC won't move it, and then just pass a pointer to the data. I don't think that this sort of thing should be disallowed.
I think we agree on this point.
Thanks, Josh
On 10/31/07, Joshua Gargus schwa@fastmail.us wrote:
It is unreasonable to assume that ad-hoc, fine-grained sharing of objects between processors will give you the fastest performance on the upcoming machines with 100s and 1000s of cores. What about memory locality and cache coherency? It is not cheap to juggle an object between processors now, and it will become more expensive as the number of cores increases.
Great point.
On 10/29/07, Andreas Raab andreas.raab@gmx.de wrote:
Not "all messages sends". Only messages between concurrent entities (islands). This is the main difference to the all-out actors model (where each object is its own unit of concurrency) and has the advantage that you can reuse all of todays single-threaded code.
I hope you, or anyone else, are not under the impression that I am pushing for an "all-out" actors model. I want actor sending to be as explicit as Croquet futures are. And this would mean the same: all of today's single-threaded code continues to work.
The similarity is striking. Both in terms of tradeoffs (trade low-level control for better productivity) as well as the style of arguments made against it ;-)
I see the similarity! :)
Not that I mind by the way, I find these discussions necessary.
Ah good, I assumed you had kill-filed me for this one, if not before. :)
On Oct 29, 2007, at 1:50 PM, Igor Stasenko wrote:
Let's see what happens if we have only future sends. Then, given the code: a print. b print.
this will not guarantee that a will be printed before b. Now you must ensure that you preserve imperative semantics, which may be done like the following: futureA := a print. futureA whenComplete: [ b print ].
No, that's not true. Croquet and E both ensure that the future messages are processed in the order that they are sent. No #whenComplete: is required, since the second print expression does not depend on the first.
Yes, we can make 'futureA whenComplete:' check implicitly (by modifying the VM); then we can preserve old code. But do we really need futures everywhere?
We don't need futures everywhere. Croquet has chosen to make futures explicit; your example would be written:
a future print. b future print.
At least in Croquet (I don't know what a "typical" E program looks like), future sends are used sparingly... the vast majority of messages are regular immediate sends.
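One way to see why no whenComplete: is needed for ordering: an event-loop vat appends eventual sends to a FIFO queue and runs them to completion one at a time. A toy sketch in C (all names invented; this is not Croquet's or E's actual machinery):

#include <stdio.h>

#define QUEUE_MAX 64
typedef void (*Message)(void);

static Message queue[QUEUE_MAX];
static int head = 0, tail = 0;

/* 'a future print' just enqueues; nothing runs yet. */
void eventualSend(Message m) { queue[tail++ % QUEUE_MAX] = m; }

/* The vat drains the queue in FIFO order, one message at a time. */
void runVat(void) {
    while (head != tail)
        queue[head++ % QUEUE_MAX]();
}

static void printA(void) { printf("a\n"); }
static void printB(void) { printf("b\n"); }

int main(void) {
    eventualSend(printA);
    eventualSend(printB);
    runVat();   /* always prints a before b; send order is preserved */
    return 0;
}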
Josh
----- Original Message ----- From: "Ralph Johnson" johnson@cs.uiuc.edu To: "The general-purpose Squeak developers list" squeak-dev@lists.squeakfoundation.org Sent: Saturday, October 27, 2007 9:00 PM Subject: Re: Concurrent Futures
What are all the classes of possible deadlocks?
The first one I thought of was the Dining Philosophers problem. Google showed me http://www.erights.org/e/satan/index.html
This is the first time I've read that link.
This is an interesting article. It uses capabilities to ensure freedom from deadlock. If a philosopher holds a fork for too long, the system will revoke his capability for the fork.
This problem is much more than a deadlock problem. It is a resource-sharing problem which includes a liveness problem. It is classically solved with a semaphore representing each fork, plus a strategy to ensure liveness. See the section on page 93 in http://www.greenteapress.com/semaphores/downey05semaphores.pdf
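For reference, the classical fork-per-semaphore solution can be sketched in C with POSIX semaphores; the asymmetric pickup order is one of the standard deadlock-avoidance strategies (it does not by itself address starvation, which is why Downey adds more machinery):

#include <pthread.h>
#include <semaphore.h>

#define N 5
static sem_t forks[N];   /* one semaphore per fork, initialized to 1 */

static void *philosopher(void *arg) {
    long i = (long)arg;
    long first = i, second = (i + 1) % N;
    if (i == N - 1) {            /* the last philosopher reverses the */
        first = (i + 1) % N;     /* pickup order, breaking the cycle  */
        second = i;              /* that causes circular wait         */
    }
    sem_wait(&forks[first]);
    sem_wait(&forks[second]);
    /* eat for a finite time */
    sem_post(&forks[second]);
    sem_post(&forks[first]);
    return NULL;
}

int main(void) {
    pthread_t t[N];
    for (long i = 0; i < N; i++) sem_init(&forks[i], 0, 1);
    for (long i = 0; i < N; i++) pthread_create(&t[i], NULL, philosopher, (void *)i);
    for (long i = 0; i < N; i++) pthread_join(t[i], NULL);
    return 0;
}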
If E was automatically free from deadlock, why was this solution so complex?
There is no danger of deadlock. It is impossible, since there are no Semaphores. They still run into the problem of shared resource utilization and livelock/starvation/liveness issues. E doesn't claim to address those in its language. Perhaps its claim to avoid deadlock is reduced in importance since it can experience other concurrency problems, but I still think it's a neat feature.
I agree it seems complex, but they are revoking the fork after a delay. The classical problem states that each philosopher eats for a finite time (see above reference). They also provide an implementation for the philosopher and the act of eating, by getting shrimp from the plate with the forks, and they prevent unauthorized access. Finally, they restrict access to references to the classes, like with the protections for access to the plate: only the fork knows the plate, not the philosopher.
Note that they say this was done with a previous version of E, so it may be somewhat different in current E. I really don't know the language that well, so I couldn't say.
Here is a pared down version, that ignores a lot of other stuff and just shows the resource acquisition, but also includes the revocation after timeout. It is 5 methods, 3 on the Philosopher and 2 on the ForkDispenserMaker, which could easily be seen as methods in Smalltalk. I don't see how a language could make it easier, but a class library could.
They are doing these jobs it seems:
ResourceDispenser ("ForkDispenserMaker")
- resourceRequestCallback ("serveOther")
- resourceRequest ("forkPlease")
ResourceConsumer ("Philosopher")
- resourceAcquisition ("startEating")
- acquisitionCallback ("hereIsYourFork")
- useResources ("eat")
Oh yes, they also have the following referenced in the code:
Resource
- revoke
Below is my scratch code (it is incomplete, because I eliminated state variables and other methods).
Regards, Rob
def ForkDispenserMaker(mySource) {
    to forkPlease(who) {
        if (nowServing == null) {
            nowServing := who
            nowServing <- hereIsYourFork(theFork)
        } else if (nowServing == who) {
            nowServing <- hereIsYourFork(theFork)
        } else {
            myTimer after(10_000, def listener noticeTimeout() {
                theFork <- revoke()
                serveOther()
            })
        }
    }
    def serveOther() {
        if (nowServing == firstPhilosopher) {
            nowServing := secondPhilosopher
        } else {
            nowServing := firstPhilosopher
        }
        nowServing <- hereIsYourFork(theFork)
    }
}
def NicePhilosopherMaker() {
    to startEating {
        firstForkDispenser <- forkPlease(self)
        secondForkDispenser <- forkPlease(self)
    }
    to hereIsYourFork(theFork) {
        if (firstFork == null) {
            firstFork := theFork
        } else {
            secondFork := theFork
            eat()
        }
    }
    def eat() {
        ***
    }
}
On 10/28/07, Rob Withers reefedjib@yahoo.com wrote:
There is no danger of deadlock. It is impossible, since there are no Semaphores.
This does not compute. It is like saying "There is no danger of murder, since there are no guns." There are many ways of having deadlock without semaphores. Deadlock is possible in an asynchronous messaging system. An algorithm might be waiting for an event that will never occur.
I think the real answer is that there is no way to have a deadlock, because there is no way to wait. Programs can only busy-wait, so deadlock problems all get converted into starvation problems. This might not be very efficient on a machine with a large number of processes running on a small number of processors, but it becomes efficient as the process/processor ratio drops.
In Andreas' solution, a philosopher who can't pick up a fork will drop one that he already has, and will wait a random amount of time. This has the possibility of starvation, but with probability 0 as time goes to infinity.
-Ralph
----- Original Message ----- From: "Ralph Johnson" johnson@cs.uiuc.edu
This does not compute. It is like saying "There is no danger of murder, since there are no guns." There are many ways of having deadlock without semaphores. Deadlock is possible in an asynchronous messaging system. An algorithm might be waiting for an event that will never occur.
My assumption is that the only mechanism of waiting is using a semaphore. Is this incorrect?
I know that E talks about converting control flow problems into data flow problems, using the continuation passing style, and that deadlocks go away but datalocks become possible. It seems to me that an algorithm pending an event is in a datalock situation.
I think the real answer is that there is no way to have a deadlock, because there is no way to wait.
This is a clearer way of stating it.
Rob
Ralph Johnson wrote:
I think the real answer is that there is no way to have a deadlock, because there is no way to wait. Programs can only busy-wait, so deadlock problems all get converted into starvation problems.
The first sentence is correct, but the second is wrong. It's wrong because there is no wait ;-) so even "busy-waiting" becomes meaningless. (Simulation) time, for example, does not advance in Croquet while a message is being executed. In this sense busy-waiting (here being the idea of performing computations until a measurable amount of time has passed) can't be done.
As for "deadlock problems all get converted into starvation problems", let's not forget that the problem we were looking at is over-constrained by definition. I *chose* to accept starvation mostly because this is the solution that is closest to the original. There are most certainly other (but more complex) solutions which could avoid starvation. The simplest one (which dodges the question of "and how exactly would that work") is to post requests for pairs of forks to the table and have the table implement a scheduling algorithm for handing out the forks.
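That table idea could be sketched along these lines, ignoring Croquet's no-blocking constraint and using ordinary pthreads just to show the scheduling policy (all names invented). Granting both forks atomically means no circular wait can form, though a fair hand-out policy would still be needed to rule out starvation:

#include <pthread.h>
#include <stdbool.h>

#define N 5
static pthread_mutex_t table = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t forksFreed = PTHREAD_COND_INITIALIZER;
static bool forkBusy[N];

/* A philosopher asks the table for both forks at once. */
void requestForkPair(int i) {
    pthread_mutex_lock(&table);
    while (forkBusy[i] || forkBusy[(i + 1) % N])
        pthread_cond_wait(&forksFreed, &table);
    forkBusy[i] = forkBusy[(i + 1) % N] = true;   /* granted atomically */
    pthread_mutex_unlock(&table);
}

void returnForkPair(int i) {
    pthread_mutex_lock(&table);
    forkBusy[i] = forkBusy[(i + 1) % N] = false;
    pthread_cond_broadcast(&forksFreed);
    pthread_mutex_unlock(&table);
}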
Cheers, - Andreas
Hi Peter, It struck me that I should clarify something I said, that probably misled you.
Rob Withers wrote:
That's a nice example. The difference with E's promises is that a Promise doesn't block for value, it sends the new msg eventually and returns a second promise. The way I have this implemented in SqueakElib, is you send #eventual to get an eventual ref, then you can send a msg to the eventual ref, returning a promise, which you can then send a second msg to, returning a second promise.
| promise1 promise2 | promise1 := anObject eventual foo. promise2 := promise1 bar. promise1 whenResolved: [:val | Transcript show: ' foo: ', val printString]. promise2 whenResolved: [:val | Transcript show: ' bar: ', val printString]. Transcript show: 'sent foo bar...'
This prevents deadlocks.
From the first paragraph, I talk about both E and SqueakElib. When I make this claim about eventual refs and promises preventing deadlocks, I mean it does this for E, not for SqueakElib. Since E processes msg sends in an event loop, it has no Processes or Semaphores, and since it restricts scope when compiling and executing code, you can't even reference them. There is no way to block in E.
This is what I aspire to with SqueakElib, but it won't make it, except perhaps in a very constrained environment, namely a restricted-scope Island: an Island that likewise includes neither Processes nor Semaphores (nor lots of other stuff that may use them internally). In the general case, which I am interested in, I want to integrate refs and promises in the general image, with its Processes and Semaphores. I may need to block for immediate values, when processing primitives with promises and when processing method contexts which include promises with multiple points of return. I am going to be blocking, and I make, or I intend to make, no claim that I don't block.
Sorry for the confusion, Rob