We often talk about making the VM faster. How about making it slower? In 1980, there were some optimizations that were needed for Smalltalk to be even usable, but now:
- Moore's Law has theoretically given us 131072x more computing power (2^((2014-1980)/2))
- Cog runs up to 3x slower than C [1]
- Ruby, which is widely accepted, seems to be much slower than Cog [2]
For example, inlined functions can be baffling for new users. I just ran into this myself when writing an #ifNil:ifNotNil: that was not picked up by the system [3], and Ungar and Smith describe several cases in the History of Self (pg. 9-5).
How many of these are premature optimizations that can be eliminated, or at least turned off by default until they're actually needed? I know Clement mentioned in [3] that some make a big difference, but it would certainly make the system more uniform and easy to understand.
[1] http://lists.gforge.inria.fr/pipermail/pharo-project/2011-February/042489.ht...
[2] http://benchmarksgame.alioth.debian.org/u32/benchmark.php?test=all&lang=...
[3] https://www.mail-archive.com/pharo-dev@lists.pharo.org/msg11694.html
----- Cheers, Sean -- View this message in context: http://forum.world.st/Making-a-Slower-VM-tp4742391.html Sent from the Squeak VM mailing list archive at Nabble.com.
Hello Sean,
It's true that the Ruby interpreter and CPython are around 20x slower than Cog. But the use cases are different.
Firstly, their tool suites are not written in Ruby/Python, so they do not need speed to have a good IDE. For example, look at the new SqueakJS VM: because it is slower, Morphic is hardly usable, so they had to fall back on the old but fast MVC UI. We do not want to have to do that in Pharo/Squeak.
In addition, Ruby/Python work well thanks to their good integration with C, because a Ruby/Python programmer needs to bind performance-critical methods to C functions. In most cases we do not bind performance-critical methods in Pharo/Squeak to C functions, because we don't need to, and I don't think we want to start.
So I wouldn't say that we can have Pharo/Squeak running 20x slower and still be happy.
One thing that you didn't mention is the Stack VM. This interpreter-based VM is less efficient than Cog (2x-10x slower) but much more flexible IMO. For example, overriding the interpretation of each message send or adding new bytecodes is quite easy. So in a sense we already have a slower, more flexible VM.
In addition, Opal's compiler options allow disabling some of the optimized constructs you mention, but this is static information specified with pragmas.
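A hypothetical sketch of what such a pragma looks like; the pragma and option names (compilerOptions:, optionInlineIf) are from memory of Pharo's Opal compiler and may differ in your version, so treat them as assumptions, not as Clément's exact mechanism:

```smalltalk
demo
	"With inlining of conditionals disabled for this method only, the
	ifTrue:ifFalse: below compiles to a real message send instead of
	jump bytecodes, so user-level overrides would be honoured."
	<compilerOptions: #(- optionInlineIf)>
	^ self isReady
		ifTrue: ['go']
		ifFalse: ['wait']
```

The point of the pragma approach is precisely what Clément calls "static infos": the choice is baked in at compile time per method, rather than decided adaptively at run time.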
Disabling the inlining of conditionals would decrease Pharo/Squeak's performance by 2.5x according to Urs Hölzle's PhD thesis, but recent attempts showed that the speed problem is even worse, because the kernel was optimized with the knowledge that these constructs were inlined.
*Solution for this problem*
As you may have seen, the Self VM does not have these optimized constructs, not because the Self people were willing to be slower but because they have an adaptive recompiler. Currently I am working with Eliot on speculative inlining and other optimizations for Cog. You can see a description of the project here: http://clementbera.wordpress.com/2014/01/09/the-sista-chronicles-i-an-introd.... I wrote that article quickly, so there might be some typos and English errors, but overall it should be OK. This is a big project, so we will have a production-ready result in several months at the earliest, perhaps even a few years.
This will allow us both to increase Cog's performance and to reduce the code complexity caused by the inlined constructs.
Precise solutions need to be discussed and benchmarked, but since the performance impact will be lowered, we could have:
- ifNil:/ifNotNil: not inlined.
- all the special messages sent as regular message sends in all cases (including #==): #(#+ 1 #- 1 #< 1 #> 1 #<= 1 #>= 1 #= 1 #~= 1 #* 1 #/ 1 #\ 1 #@ 1 #bitShift: 1 #// 1 #bitAnd: 1 #bitOr: 1 #at: 1 #at:put: 2 #size 0 #next 0 #nextPut: 1 #atEnd 0 #== 1 nil 0 #blockCopy: 1 #value 0 #value: 1 #do: 1 #new 0 #new: 1 #x 0 #y 0)
Loop messages (whileTrue:, to:do:) are usually not a problem; the only constraint is that you cannot override these four methods (SmallInteger>>#to:do:, SmallInteger>>#to:by:do:, BlockClosure>>#whileTrue:, BlockClosure>>#whileFalse:), though you can still implement these selectors in your own objects. If you want to override one of these methods, there's no simple solution without a performance cost (one option is to rewrite them as primitives and stop inlining them in the compiler, but even then there will be some performance cost that needs to be measured).
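A minimal sketch of the constraint described above, using only stock Squeak/Pharo selectors (this is an illustration, not Clément's code): an ordinary to:do: loop is compiled into jump bytecodes, so an override of SmallInteger>>#to:do: is never consulted, whereas forcing a real send with perform: goes through the normal lookup:

```smalltalk
"Inlined by the compiler into a counting loop with jump bytecodes;
no #to:do: message is ever sent, so an override of
SmallInteger>>#to:do: would have no effect here."
| sum |
sum := 0.
1 to: 5 do: [:i | sum := sum + i].

"Not inlined: perform: forces a real message send, which does go
through method lookup and would reach an override if one existed."
1 perform: #to:do: with: 5 with: [:i | sum := sum + i]
```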
ifTrue:ifFalse: is the most complex case. I know Eliot has a plan for it. You can look at the video at the bottom of the Sista article, where at the end Eliot explains AOStA (the ancestor of Sista) and mentions some things about #mustBeBoolean.
Best,
Clément
2014-02-09 5:37 GMT+01:00 Sean P. DeNigris sean@clipperadams.com:
Sean P. DeNigris wrote:
For example, inlined functions can be baffling for new users.
Not VM related but it sparks a random idea - how about syntax highlighting inlined messages with a different colour? cheers -ben
On 09.02.2014, at 05:37, Sean P. DeNigris sean@clipperadams.com wrote:
How many of these are premature optimizations that can be eliminated, or at least turned off by default until they're actually needed? I know Clement mentioned in [3] that some make a big difference, but it would certainly make the system more uniform and easy to understand.
This is not a VM problem. The compiler is doing the inlining, not the VM. The VM just executes what it is told to.
If the VM encounters an #ifNil:ifNotNil: send, it will faithfully do a method lookup and execute that. It will even do that if it sees an #ifTrue:. There is no short-circuiting of actual message sends in the VM.
What *does* happen is that the Compiler replaces an #ifNil:ifNotNil: send with "== nil ifTrue:ifFalse:" and then compiles the latter into jump bytecodes. That means the VM never sees the original #ifNil:ifNotNil: message.
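Bert's description can be checked in a stock Squeak/Pharo image. A sketch (assuming the standard Behavior>>#compile: and CompiledMethod>>#symbolic bytecode printer; the method name #demo is made up for the illustration):

```smalltalk
"Compile a throwaway method that uses an inlined construct, then print
its bytecodes. The printout should show a comparison against nil and
jump bytecodes, but no send of #ifNil:ifNotNil: - exactly because the
Compiler rewrote the send before the VM ever saw it."
Object compile: 'demo ^ self yourself ifNil: [0] ifNotNil: [:v | v]'.
Transcript showln: (Object >> #demo) symbolic.
Object removeSelector: #demo.
```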
It is pretty simple to turn off the Compiler's inlining of ifNil:ifNotNil:. It should also be pretty simple to make ifTrue:/ifFalse: actual message sends, although I would expect a pretty big slowdown since they would need real blocks. But at least their Smalltalk implementation would be "executable". It's harder for whileTrue:/whileFalse:, because if you wanted to implement them with real messages you would need tail-call optimization, which Smalltalk VMs don't usually do. Hence the implementation in the image that relies on compiler inlining.
- Bert -
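For reference, the circularity Bert describes is visible in the image itself: the stock Squeak definition of BlockClosure>>#whileTrue: (quoted from memory, so the comment wording is mine) is written in terms of itself and only terminates because the compiler inlines the construct into jump bytecodes:

```smalltalk
whileTrue: aBlock
	"Evaluate aBlock repeatedly while the receiver block evaluates to
	true. The body below is itself a whileTrue: send; without the
	compiler inlining it, this would be infinite recursion (or, with
	real sends, would require tail-call elimination to run in
	constant stack space)."
	^ [self value] whileTrue: [aBlock value]
```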
On Sun, Feb 9, 2014 at 9:08 AM, Bert Freudenberg bert@freudenbergs.de wrote:
Well, since we're talking about de-optimizing here, you *could* do #whileTrue: without optimizing tail calls. It's just that it would be really slow, especially if you wanted to guard against run-away memory use for loops with lots of iterations. If you want to make things slower, the sky's the limit!
Colin
On 09.02.2014, at 17:49, Colin Putney colin@wiresong.com wrote:
True. But in any case you would have to touch the implementation, otherwise you just get an infinite recursion :)
- Bert -
On Sat, Feb 08, 2014 at 08:37:46PM -0800, Sean P. DeNigris wrote:
We often talk about making the VM faster. How about making it slower?
We do not usually get too many requests to make the VM slower, what a refreshing change of perspective ;-)
http://www.ispot.tv/ad/Y94D/xfinity-internet-traffic-featuring-bill-and-karo...
Joking aside, there actually is one legitimate reason for wanting a slow VM. With high performance VMs and with ever faster hardware, it is very easy to implement sloppy things in the image that go unnoticed until someone runs the image on an old machine or on limited hardware. It is sometimes useful to test on old hardware or on a slow VM to check for this.
I think someone mentioned it earlier, but a very easy way to produce an intentionally slow VM is to generate the sources from VMMaker with the inlining step disabled. The Slang inliner is extremely effective, and turning it off produces impressively sluggish results.
Dave
On 09-02-2014, at 10:07 AM, David T. Lewis lewis@mail.msen.com wrote:
Joking aside, there actually is one legitimate reason for wanting a slow VM. With high performance VMs and with ever faster hardware, it is very easy to implement sloppy things in the image that go unnoticed until someone runs the image on an old machine or on limited hardware. It is sometimes useful to test on old hardware or on a slow VM to check for this.
The cheapest and easiest way to do it these days is to buy a Raspberry Pi. You’ll learn very quickly where you have used crappy algorithms or poor technique… though of course you do have to put up with X windows as well. Unless you try RISC OS, which although not able to make the raw compute performance faster at least has a window system that doesn’t send every pixel to the screen via Deep Space Network to the relay on Sedna.
I think someone mentioned it earlier, but a very easy way to produce an intentionally slow VM is to generate the sources from VMMaker with the inlining step disabled. The slang inliner is extremely effective, and turning it off produces impressively sluggish results.
Does that actually work these days? Last I remember was that turning inlining off wouldn’t produce a buildable interp.c file. If someone has had the patience to make it work then I’m impressed.
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Strange OpCodes: SDLI: Shift Disk Left Immediate
On Sun, Feb 09, 2014 at 10:23:37AM -0800, tim Rowledge wrote:
On 09-02-2014, at 10:07 AM, David T. Lewis lewis@mail.msen.com wrote:
Does that actually work these days? Last I remember was that turning inlining off wouldn't produce a buildable interp.c file. If someone has had the patience to make it work then I'm impressed.
Dang it, you're right, it's not working. I guess I have not tried this in a while, though I know that it used to work. Making things go slower seems like a worthwhile thing to do on a Sunday afternoon, so I think I'll see if I can fix it.
Dave
On Sun, Feb 9, 2014 at 11:46 AM, David T. Lewis lewis@mail.msen.com wrote:
I *think* the issue is the internal/external split brought about by the introduction of the localFoo variables, such as localSP and localIP. This optimization absolutely depends on inlining. Which reminds me: anyone who is interested in creating a StackInterpreter or CoInterpreter that *doesn't* use the internal methods and uses only stackPointer, framePointer and instructionPointer would have my full support. I'm very curious to see what the performance of stack+internal vs stack-internal, and cog+internal vs cog-internal, will be. I'm hoping that the performance of the -internal versions is good enough that we could eliminate all that duplication.
On 10-02-2014, at 11:53 AM, Eliot Miranda eliot.miranda@gmail.com wrote:
I *think* the issue is the internal/external split brought about by the introduction of the localFoo variables, such as localSP and localIP.
It’s really hard to be sure but I suspect that this isn’t the (only) issue. IIRC we used to be able to make non-inlined VMs at one point and that was well after the internalFoo code was added.
OK, some quick email searching reveals some work done in ’03 by johnMcI, Craig & me. Craig found the following code helped -
!'From Squeak3.6alpha of ''17 March 2003'' [latest update: #5325] on 21 July 2003 at 1:11:25 pm'!

!Interpreter methodsFor: 'contexts' stamp: 'crl 7/19/2003 15:59'!
primitiveFindNextUnwindContext
	"Primitive. Search up the context stack for the next method context
	marked for unwind handling from the receiver up to but not including
	the argument. Return nil if none found."
	| thisCntx nilOop aContext isUnwindMarked header meth pIndex |
	aContext _ self popStack.
	thisCntx _ self fetchPointer: SenderIndex ofObject: self popStack.
	nilOop _ nilObj.
	[(thisCntx = aContext) or: [thisCntx = nilOop]] whileFalse: [
		header _ self baseHeader: aContext.
		(self isMethodContextHeader: header)
			ifTrue: [
				meth _ self fetchPointer: MethodIndex ofObject: aContext.
				pIndex _ self primitiveIndexOf: meth.
				isUnwindMarked _ pIndex == 198]
			ifFalse: [isUnwindMarked _ false].
		isUnwindMarked ifTrue: [self push: thisCntx. ^nil].
		thisCntx _ self fetchPointer: SenderIndex ofObject: thisCntx].
	^self push: nilOop! !

!Interpreter methodsFor: 'interpreter shell' stamp: 'crl 7/19/2003 15:33'!
interpret
	"This is the main interpreter loop. It normally loops forever, fetching
	and executing bytecodes. When running in the context of a browser plugin
	VM, however, it must return control to the browser periodically. This
	should done only when the state of the currently running Squeak thread
	is safely stored in the object heap. Since this is the case at the
	moment that a check for interrupts is performed, that is when we return
	to the browser if it is time to do so. Interrupt checks happen quite
	frequently."

	"record entry time when running as a browser plug-in"
	"self browserPluginInitialiseIfNeeded"
	self internalizeIPandSP.
	self fetchNextBytecode.
	[true] whileTrue: [self dispatchOn: currentBytecode in: BytecodeTable].
	localIP _ localIP - 1. "undo the pre-increment of IP before returning"
	self externalizeIPandSP.! !

!Interpreter methodsFor: 'return bytecodes' stamp: 'crl 7/19/2003 16:05'!
returnValueTo
	"Note: Assumed to be inlined into the dispatch loop."
	| nilOop thisCntx contextOfCaller localCntx localVal isUnwindMarked header meth pIndex |
	self inline: true.
	self sharedCodeNamed: 'commonReturn' inCase: 120.
	nilOop _ nilObj. "keep in a register"
	thisCntx _ activeContext.
	localCntx _ cntx.
	localVal _ val.

	"make sure we can return to the given context"
	((localCntx = nilOop)
		or: [(self fetchPointer: InstructionPointerIndex ofObject: localCntx) = nilOop]) ifTrue: [
			"error: sender's instruction pointer or context is nil; cannot return"
			^self internalCannotReturn: localVal].

	"If this return is not to our immediate predecessor (i.e. from a method
	to its sender, or from a block to its caller), scan the stack for the
	first unwind marked context and inform this context and let it deal with
	it. This provides a chance for ensure unwinding to occur."
	thisCntx _ self fetchPointer: SenderIndex ofObject: activeContext.
	"Just possibly a faster test would be to compare the homeContext and
	activeContext - they are of course different for blocks. Thus we might
	be able to optimise a touch by having a different returnTo for the block
	return (since we know that must return to caller) and then if active ~=
	home we must be doing a non-local return. I think. Maybe."
	[thisCntx = localCntx] whileFalse: [
		thisCntx = nilObj ifTrue: [
			"error: sender's instruction pointer or context is nil; cannot return"
			^self internalCannotReturn: localVal].
		"Climb up stack towards localCntx. Break out to a send of
		#aboutToReturn:through: if an unwind marked context is found"
		header _ self baseHeader: thisCntx.
		(self isMethodContextHeader: header)
			ifTrue: [
				meth _ self fetchPointer: MethodIndex ofObject: thisCntx.
				pIndex _ self primitiveIndexOf: meth.
				isUnwindMarked _ pIndex == 198]
			ifFalse: [isUnwindMarked _ false].
		isUnwindMarked ifTrue: [
			"context is marked; break out"
			^self internalAboutToReturn: localVal through: thisCntx].
		thisCntx _ self fetchPointer: SenderIndex ofObject: thisCntx].

	"If we get here there is no unwind to worry about. Simply terminate the
	stack up to the localCntx - often just the sender of the method"
	thisCntx _ activeContext.
	[thisCntx = localCntx] whileFalse: [
		"climb up stack to localCntx"
		contextOfCaller _ self fetchPointer: SenderIndex ofObject: thisCntx.
		"zap exited contexts so any future attempted use will be caught"
		self storePointerUnchecked: SenderIndex ofObject: thisCntx withValue: nilOop.
		self storePointerUnchecked: InstructionPointerIndex ofObject: thisCntx withValue: nilOop.
		reclaimableContextCount > 0 ifTrue: [
			"try to recycle this context"
			reclaimableContextCount _ reclaimableContextCount - 1.
			self recycleContextIfPossible: thisCntx].
		thisCntx _ contextOfCaller].
	activeContext _ thisCntx.
	(thisCntx < youngStart) ifTrue: [self beRootIfOld: thisCntx].
	self internalFetchContextRegisters: thisCntx. "updates local IP and SP"
	self fetchNextBytecode.
	self internalPush: localVal.! !
Shortly after that I released VMMaker 3.6 with a note that it couldn't produce a completely non-inlined VM because of a problem in fetchByte if globalstruct was enabled, and some odd problems in B2DPlugin. When VMMaker 3.7 was released a year later (March '04) I apparently thought it could make the core VM non-inlined. Since this is all a bazillion years ago I can't remember any context to help extend the history.
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Science is imagination equipped with grappling hooks.
I was looking at the trunk VMM yesterday and found that most of the issues were just caused by accessor methods, where #foo and #foo: generate conflicting foo(void) and foo(aParameter). In most cases, a convention of #setFoo: rather than #foo: takes care of the problem. There were a few other miscellaneous issues as well, but nothing that looked serious.
The variable 'memory' is a challenge because it is used extensively both directly and through #memory and #memory:. I was considering changing the variable name to something like memoryBase, and leaving the accessors alone though I'm not sure that would be a very good idea.
I ran out of time yesterday and did not pursue it beyond this.
Dave
Hi David, do you realize that Eliot is (ab)using this in Cog in order to eliminate some direct cCode: '...' inclusions? So #setFoo: may not be an option (or I misunderstood something).
2014-02-10 21:51 GMT+01:00 David T. Lewis lewis@mail.msen.com:
I was looking at the trunk VMM yesterday and found that most of the issues were just caused by accessor methods, where #foo and #foo: generate conflicting foo(void) and foo(aParameter). In most cases, a convention of #setFoo: rather than #foo: takes care of the problem. There were a few other miscellaneous issues as well, but nothing that looked serious.
The variable 'memory' is a challenge because it is used extensively both directly and through #memory and #memory:. I was considering changing the variable name to something like memoryBase, and leaving the accessors alone though I'm not sure that would be a very good idea.
I ran out of time yesterday and did not pursue it beyond this.
Dave
On 10-02-2014, at 11:53 AM, Eliot Miranda eliot.miranda@gmail.com
wrote:
I *think* the issue is the internal/external split brought abut by the introduction of the localFoo variables, such as localSP and localIP.
It's really hard to be sure but I suspect that this isn't the (only) issue. IIRC we used to be able to make non-inlined VMs at one point and that was well after the internalFoo code was added.
OK, some quick email searching reveals some work done in '03 by johnMcI, Craig & me. Craig found the following code helped -
!'From Squeak3.6alpha of ''17 March 2003'' [latest update: #5325] on 21 July 2003 at 1:11:25 pm'!

!Interpreter methodsFor: 'contexts' stamp: 'crl 7/19/2003 15:59'!
primitiveFindNextUnwindContext
	"Primitive. Search up the context stack for the next method context marked for unwind handling from the receiver up to but not including the argument. Return nil if none found."
	| thisCntx nilOop aContext isUnwindMarked header meth pIndex |
	aContext _ self popStack.
	thisCntx _ self fetchPointer: SenderIndex ofObject: self popStack.
	nilOop _ nilObj.
	[(thisCntx = aContext) or: [thisCntx = nilOop]] whileFalse: [
		header _ self baseHeader: aContext.
		(self isMethodContextHeader: header)
			ifTrue: [
				meth _ self fetchPointer: MethodIndex ofObject: aContext.
				pIndex _ self primitiveIndexOf: meth.
				isUnwindMarked _ pIndex == 198]
			ifFalse: [isUnwindMarked _ false].
		isUnwindMarked ifTrue: [self push: thisCntx. ^nil].
		thisCntx _ self fetchPointer: SenderIndex ofObject: thisCntx].
	^self push: nilOop! !

!Interpreter methodsFor: 'interpreter shell' stamp: 'crl 7/19/2003 15:33'!
interpret
	"This is the main interpreter loop. It normally loops forever, fetching and executing bytecodes. When running in the context of a browser plugin VM, however, it must return control to the browser periodically. This should be done only when the state of the currently running Squeak thread is safely stored in the object heap. Since this is the case at the moment that a check for interrupts is performed, that is when we return to the browser if it is time to do so. Interrupt checks happen quite frequently."

	"record entry time when running as a browser plug-in"
	"self browserPluginInitialiseIfNeeded"
	self internalizeIPandSP.
	self fetchNextBytecode.
	[true] whileTrue: [self dispatchOn: currentBytecode in: BytecodeTable].
	localIP _ localIP - 1. "undo the pre-increment of IP before returning"
	self externalizeIPandSP.! !

!Interpreter methodsFor: 'return bytecodes' stamp: 'crl 7/19/2003 16:05'!
returnValueTo
	"Note: Assumed to be inlined into the dispatch loop."
	| nilOop thisCntx contextOfCaller localCntx localVal isUnwindMarked header meth pIndex |
	self inline: true.
	self sharedCodeNamed: 'commonReturn' inCase: 120.
	nilOop _ nilObj. "keep in a register"
	thisCntx _ activeContext.
	localCntx _ cntx.
	localVal _ val.
	"make sure we can return to the given context"
	((localCntx = nilOop) or: [(self fetchPointer: InstructionPointerIndex ofObject: localCntx) = nilOop]) ifTrue: [
		"error: sender's instruction pointer or context is nil; cannot return"
		^self internalCannotReturn: localVal].
	"If this return is not to our immediate predecessor (i.e. from a method to its sender, or from a block to its caller), scan the stack for the first unwind marked context and inform this context and let it deal with it. This provides a chance for ensure unwinding to occur."
	thisCntx _ self fetchPointer: SenderIndex ofObject: activeContext.
	"Just possibly a faster test would be to compare the homeContext and activeContext - they are of course different for blocks. Thus we might be able to optimise a touch by having a different returnTo for the blockReturn (since we know that must return to caller) and then if active ~= home we must be doing a non-local return. I think. Maybe."
	[thisCntx = localCntx] whileFalse: [
		thisCntx = nilObj ifTrue: [
			"error: sender's instruction pointer or context is nil; cannot return"
			^self internalCannotReturn: localVal].
		"Climb up stack towards localCntx. Break out to a send of #aboutToReturn:through: if an unwind marked context is found"
		header _ self baseHeader: thisCntx.
		(self isMethodContextHeader: header)
			ifTrue: [
				meth _ self fetchPointer: MethodIndex ofObject: thisCntx.
				pIndex _ self primitiveIndexOf: meth.
				isUnwindMarked _ pIndex == 198]
			ifFalse: [isUnwindMarked _ false].
		isUnwindMarked ifTrue: [
			"context is marked; break out"
			^self internalAboutToReturn: localVal through: thisCntx].
		thisCntx _ self fetchPointer: SenderIndex ofObject: thisCntx].
	"If we get here there is no unwind to worry about. Simply terminate the stack up to the localCntx - often just the sender of the method"
	thisCntx _ activeContext.
	[thisCntx = localCntx] whileFalse: [
		"climb up stack to localCntx"
		contextOfCaller _ self fetchPointer: SenderIndex ofObject: thisCntx.
		"zap exited contexts so any future attempted use will be caught"
		self storePointerUnchecked: SenderIndex ofObject: thisCntx withValue: nilOop.
		self storePointerUnchecked: InstructionPointerIndex ofObject: thisCntx withValue: nilOop.
		reclaimableContextCount > 0 ifTrue: [
			"try to recycle this context"
			reclaimableContextCount _ reclaimableContextCount - 1.
			self recycleContextIfPossible: thisCntx].
		thisCntx _ contextOfCaller].
	activeContext _ thisCntx.
	(thisCntx < youngStart) ifTrue: [self beRootIfOld: thisCntx].
	self internalFetchContextRegisters: thisCntx. "updates local IP and SP"
	self fetchNextBytecode.
	self internalPush: localVal.! !
On Mon, Feb 10, 2014 at 10:12:32PM +0100, Nicolas Cellier wrote:
Hi David, do you realize that Eliot is (ab)using this in COG in order to eliminate some direct cCode: '...' inclusion? So setFoo: is not an option (or i misunderstood something)
Hi Nicolas,
Actually I am not sure what you are referring to here, so probably I am missing something. Can you explain why setFoo: would be a problem in Cog? I cannot check it myself right now but I am interested to know if I am missing something important.
Thanks, Dave
Hi David, I wanted to say that Cog depends on (self malloc: n) being translated to malloc(n), and not setMalloc(n), for example (you can find many others by browsing unimplemented calls). But maybe foo was not a generic identifier in your case?
OK thank you, I am aware of that trick so not a problem.
(But you should not blame Eliot, I think I started abusing slang that way in OSProcessPlugin many years ago, so you can blame me just as well)
Thanks a lot, Dave
On Tue, Feb 11, 2014 at 11:05 AM, David T. Lewis lewis@mail.msen.com wrote:
OK thank you, I am aware of that trick so not a problem.
(But you should not blame Eliot, I think I started abusing slang that way in OSProcessPlugin many years ago, so you can blame me just as well)
Personally I find it far less of an abuse than the horrible cCode: 'aString...' idiom. With "self malloc: n" I can look for senders etc., but more importantly I can actually implement it in the simulator. You'll see in the Cog branch working implementations of str:n:cmp:, mem:mo:ve:, etc., which are actually required by the simulator. Let me plead for those of you writing VM code to avoid cCode: as much as possible. Use it to include code that only the simulator should use by all means, but please try and generate your C calls from Smalltalk code.
Here's the kind of thing I mean. This coerces an address into a simulator's CogMethod:
printCogMethod: cogMethod
	<api>
	<var: #cogMethod type: #'CogMethod *'>
	| address primitive |
	self cCode: '' inSmalltalk:
		[self transcript ensureCr.
		 cogMethod isInteger ifTrue:
			[^self printCogMethod: (self cCoerceSimple: cogMethod to: #'CogMethod *')]].
	address := cogMethod asInteger.
	self printHex: address;
		print: ' <-> ';
		printHex: address + cogMethod blockSize.
	cogMethod cmType = CMMethod ifTrue: ...
Here's the kind of thing to be avoided:
interpreterProxy success:
	((interpreterProxy isBytes: oop)
	 and: [(interpreterProxy slotSizeOf: oop) = (self cCode: 'sizeof(AsyncFile)')]).
It could be written as (and, if so, simulated!):
interpreterProxy success:
	((interpreterProxy isBytes: oop)
	 and: [(interpreterProxy slotSizeOf: oop) = (self sizeof: #AsyncFile)]).
cheers!
Hi David, I wanted to say that COG depends on (self malloc: n) to be translated malloc(n); and not setMalloc(n); for example (you can have many others by browsing unimplemented calls), but maybe foo was not a generic ID in your case?
2014-02-11 15:05 GMT+01:00 David T. Lewis lewis@mail.msen.com:
On Mon, Feb 10, 2014 at 10:12:32PM +0100, Nicolas Cellier wrote:
Hi David, do you realize that Eliot is (ab)using this in COG in order to
eliminate
some direct cCode: '...' inclusion? So setFoo: is not an option (or i misunderstood something)
Hi Nicolas,
Actually I am not sure what you are referring to here, so probably I am missing something. Can you explain why setFoo: would be a problem in Cog? I cannot check it myself right now but I am interested to know if I am missing something important.
Thanks, Dave
Let me plead for those of you writing VM code to avoid cCode: as much as possible. Use it to include code that only the simulator should use by all means, but please try and generate your C calls from Smalltalk code.
Hear, hear!
-C
-- Craig Latta www.netjam.org/resume +31 6 2757 7177 (SMS ok) + 1 415 287 3547 (no SMS)
On Mon, Feb 10, 2014 at 12:51 PM, David T. Lewis lewis@mail.msen.comwrote:
I was looking at the trunk VMM yesterday and found that most of the issues were just caused by accessor methods, where #foo and #foo: generate conflicting foo(void) and foo(aParameter). In most cases, a convention of #setFoo: rather than #foo: takes care of the problem. There were a few other miscellaneous issues as well, but nothing that looked serious.
There's a more convenient hack:
memory
	<cmacro: '() GIV(memory)'>
	^memory

memory: aValue
	^memory := aValue
The variable 'memory' is a challenge because it is used extensively both directly and through #memory and #memory:. I was considering changing the variable name to something like memoryBase, and leaving the accessors alone though I'm not sure that would be a very good idea.
See above.
I ran out of time yesterday and did not pursue it beyond this.
Dave
On 10-02-2014, at 11:53 AM, Eliot Miranda eliot.miranda@gmail.com
wrote:
I *think* the issue is the internal/external split brought about by the introduction of the localFoo variables, such as localSP and localIP.
It's really hard to be sure but I suspect that this isn't the (only) issue. IIRC we used to be able to make non-inlined VMs at one point and that was well after the internalFoo code was added.
OK, some quick email searching reveals some work done in '03 by johnMcI, Craig & me. Craig found the following code helped -
!'From Squeak3.6alpha of ''17 March 2003'' [latest update: #5325] on 21 July 2003 at 1:11:25 pm'!
!Interpreter methodsFor: 'contexts' stamp: 'crl 7/19/2003 15:59'!
primitiveFindNextUnwindContext
	"Primitive. Search up the context stack for the next method context
	marked for unwind handling from the receiver up to but not including
	the argument. Return nil if none found."
	| thisCntx nilOop aContext isUnwindMarked header meth pIndex |
	aContext _ self popStack.
	thisCntx _ self fetchPointer: SenderIndex ofObject: self popStack.
	nilOop _ nilObj.
	[(thisCntx = aContext) or: [thisCntx = nilOop]] whileFalse: [
		header _ self baseHeader: aContext.
		(self isMethodContextHeader: header)
			ifTrue: [
				meth _ self fetchPointer: MethodIndex ofObject: aContext.
				pIndex _ self primitiveIndexOf: meth.
				isUnwindMarked _ pIndex == 198]
			ifFalse: [isUnwindMarked _ false].
		isUnwindMarked ifTrue: [
			self push: thisCntx.
			^nil].
		thisCntx _ self fetchPointer: SenderIndex ofObject: thisCntx].
	^self push: nilOop! !
!Interpreter methodsFor: 'interpreter shell' stamp: 'crl 7/19/2003 15:33'!
interpret
	"This is the main interpreter loop. It normally loops forever, fetching
	and executing bytecodes. When running in the context of a browser plugin
	VM, however, it must return control to the browser periodically. This
	should done only when the state of the currently running Squeak thread is
	safely stored in the object heap. Since this is the case at the moment
	that a check for interrupts is performed, that is when we return to the
	browser if it is time to do so. Interrupt checks happen quite frequently."

	"record entry time when running as a browser plug-in"
	"self browserPluginInitialiseIfNeeded"
	self internalizeIPandSP.
	self fetchNextBytecode.
	[true] whileTrue: [self dispatchOn: currentBytecode in: BytecodeTable].
	localIP _ localIP - 1. "undo the pre-increment of IP before returning"
	self externalizeIPandSP.
! !
!Interpreter methodsFor: 'return bytecodes' stamp: 'crl 7/19/2003 16:05'!
returnValueTo
	"Note: Assumed to be inlined into the dispatch loop."
	| nilOop thisCntx contextOfCaller localCntx localVal isUnwindMarked header meth pIndex |
	self inline: true.
	self sharedCodeNamed: 'commonReturn' inCase: 120.
	nilOop _ nilObj. "keep in a register"
	thisCntx _ activeContext.
	localCntx _ cntx.
	localVal _ val.
	"make sure we can return to the given context"
	((localCntx = nilOop) or:
			[(self fetchPointer: InstructionPointerIndex ofObject: localCntx) = nilOop])
		ifTrue: [
			"error: sender's instruction pointer or context is nil; cannot return"
			^self internalCannotReturn: localVal].
	"If this return is not to our immediate predecessor (i.e. from a method
	to its sender, or from a block to its caller), scan the stack for the
	first unwind marked context and inform this context and let it deal with
	it. This provides a chance for ensure unwinding to occur."
	thisCntx _ self fetchPointer: SenderIndex ofObject: activeContext.
	"Just possibly a faster test would be to compare the homeContext and
	activeContext - they are of course different for blocks. Thus we might be
	able to optimise a touch by having a different returnTo for the
	blockreteurn (since we know that must return to caller) and then if
	active ~= home we must be doing a non-local return. I think. Maybe."
	[thisCntx = localCntx] whileFalse: [
		thisCntx = nilObj ifTrue: [
			"error: sender's instruction pointer or context is nil; cannot return"
			^self internalCannotReturn: localVal].
		"Climb up stack towards localCntx. Break out to a send of
		#aboutToReturn:through: if an unwind marked context is found"
		header _ self baseHeader: thisCntx.
		(self isMethodContextHeader: header)
			ifTrue: [
				meth _ self fetchPointer: MethodIndex ofObject: thisCntx.
				pIndex _ self primitiveIndexOf: meth.
				isUnwindMarked _ pIndex == 198]
			ifFalse: [isUnwindMarked _ false].
		isUnwindMarked ifTrue: [
			"context is marked; break out"
			^self internalAboutToReturn: localVal through: thisCntx].
		thisCntx _ self fetchPointer: SenderIndex ofObject: thisCntx].
	"If we get here there is no unwind to worry about. Simply terminate the
	stack up to the localCntx - often just the sender of the method"
	thisCntx _ activeContext.
	[thisCntx = localCntx] whileFalse: [
		"climb up stack to localCntx"
		contextOfCaller _ self fetchPointer: SenderIndex ofObject: thisCntx.
		"zap exited contexts so any future attempted use will be caught"
		self storePointerUnchecked: SenderIndex ofObject: thisCntx withValue: nilOop.
		self storePointerUnchecked: InstructionPointerIndex ofObject: thisCntx withValue: nilOop.
		reclaimableContextCount > 0 ifTrue: [
			"try to recycle this context"
			reclaimableContextCount _ reclaimableContextCount - 1.
			self recycleContextIfPossible: thisCntx].
		thisCntx _ contextOfCaller].
	activeContext _ thisCntx.
	(thisCntx < youngStart) ifTrue: [self beRootIfOld: thisCntx].
	self internalFetchContextRegisters: thisCntx. "updates local IP and SP"
	self fetchNextBytecode.
	self internalPush: localVal.
! !
Shortly after that I released the VMMaker3.6 with a note that it couldn't produce a completely non-inlined VM because of a problem in fetchByte if globalstruct was enabled, and some odd problems in B2DPlugin. When VMMaker3.7 was released a year later (March '04) I apparently thought it could make the core vm non-inlined. Since this is all a bazillion years ago I can't remember any context to help extend the history.
tim
tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Science is imagination equipped with grappling hooks.
On Sun, Feb 09, 2014 at 10:23:37AM -0800, tim Rowledge wrote:
On 09-02-2014, at 10:07 AM, David T. Lewis lewis@mail.msen.com wrote:
I think someone mentioned it earlier, but a very easy way to produce an intentionally slow VM is to generate the sources from VMMaker with the inlining step disabled. The slang inliner is extremely effective, and turning it off produces impressively sluggish results.
Does that actually work these days? Last I remember was that turning inlining off wouldn't produce a buildable interp.c file. If someone has had the patience to make it work then I'm impressed.
You're right about one thing, it required a lot of patience ;-)
I did manage to get it working though, and the results are in VMMaker-dtl.342.
This turned out to be a useful exercise, as I flushed out a couple of type declaration bugs along the way.
The major issue was that the refactoring of object memory and interpreter into separate class hierarchies (which is a very good thing IMHO) requires the use of accessor methods, and this leads to name conflicts in the generated code if those accessor methods are not fully inlined.
I went with the approach of naming the accessors getFoo and setFoo: as well as, for the case of array access, fooAt: and fooAt:put:. This is not very pleasing from a readability point of view, but it is simple and it works.
If I compile a VM with inlining disabled and compiler optimization turned off, the result is about 1/8th the speed of the same interpreter VM built normally.
Dave
Hi David,
On Feb 23, 2014, at 8:22 AM, "David T. Lewis" lewis@mail.msen.com wrote:
On Sun, Feb 09, 2014 at 10:23:37AM -0800, tim Rowledge wrote:
On 09-02-2014, at 10:07 AM, David T. Lewis lewis@mail.msen.com wrote:
I think someone mentioned it earlier, but a very easy way to produce an intentionally slow VM is to generate the sources from VMMaker with the inlining step disabled. The slang inliner is extremely effective, and turning it off produces impressively sluggish results.
Does that actually work these days? Last I remember was that turning inlining off wouldn't produce a buildable interp.c file. If someone has had the patience to make it work then I'm impressed.
You're right about one thing, it required a lot of patience ;-)
I did manage to get it working though, and the results are in VMMaker-dtl.342.
This turned out to be a useful exercise, as I flushed out a couple of type declaration bugs along the way.
The major issue was that the refactoring of object memory and interpreter into separate class hierarchies (which is a very good thing IMHO) requires the use of accessor methods, and this leads to name conflicts in the generated code if those accessor methods are not fully inlined.
I went with the approach of naming the accessors getFoo and setFoo: as well as, for the case of array access, fooAt: and fooAt:put:. This is not very pleasing from a readability point of view, but it is simple and it works.
If I compile a VM with inlining disabled and compiler optimization turned off, the result is about 1/8th the speed of the same interpreter VM built normally.
But more to the point, what's the speed with the same level of optimization as the normal VM?
and does this affect the internalFoo inlining? Does this VM have everything that uses localSP & localIP inlined in interpret or are localSP & localIP no longer local to interpret?
Dave
Eliot (phone)
On Sun, Feb 23, 2014 at 08:45:16AM -0800, Eliot Miranda wrote:
Hi David,
On Feb 23, 2014, at 8:22 AM, "David T. Lewis" lewis@mail.msen.com wrote:
On Sun, Feb 09, 2014 at 10:23:37AM -0800, tim Rowledge wrote:
On 09-02-2014, at 10:07 AM, David T. Lewis lewis@mail.msen.com wrote:
I think someone mentioned it earlier, but a very easy way to produce an intentionally slow VM is to generate the sources from VMMaker with the inlining step disabled. The slang inliner is extremely effective, and turning it off produces impressively sluggish results.
Does that actually work these days? Last I remember was that turning inlining off wouldn't produce a buildable interp.c file. If someone has had the patience to make it work then I'm impressed.
You're right about one thing, it required a lot of patience ;-)
I did manage to get it working though, and the results are in VMMaker-dtl.342.
This turned out to be a useful exercise, as I flushed out a couple of type declaration bugs along the way.
The major issue was that the refactoring of object memory and interpreter into separate class hierarchies (which is a very good thing IMHO) requires the use of accessor methods, and this leads to name conflicts in the generated code if those accessor methods are not fully inlined.
I went with the approach of naming the accessors getFoo and setFoo: as well as, for the case of array access, fooAt: and fooAt:put:. This is not very pleasing from a readability point of view, but it is simple and it works.
If I compile a VM with inlining disabled and compiler optimization turned off, the result is about 1/8th the speed of the same interpreter VM built normally.
But more to the point, what's the speed with the same level of optimization as the normal VM?
I did not test this very carefully, but I saw this:
Normal interpreter VM:
  0 tinyBenchmarks. '906194690 bytecodes/sec; 25262862 sends/sec'
  0 tinyBenchmarks. '905393457 bytecodes/sec; 25413364 sends/sec'
  0 tinyBenchmarks. '906997342 bytecodes/sec; 25786444 sends/sec'

No slang inlining, normal gcc optimization:
  0 tinyBenchmarks. '452696728 bytecodes/sec; 15353518 sends/sec'
  0 tinyBenchmarks. '459192825 bytecodes/sec; 15759973 sends/sec'
  0 tinyBenchmarks. '458370635 bytecodes/sec; 15639770 sends/sec'

No slang inlining, no gcc optimization:
  0 tinyBenchmarks. '205457463 bytecodes/sec; 7075541 sends/sec'
  0 tinyBenchmarks. '206451612 bytecodes/sec; 7182476 sends/sec'
  0 tinyBenchmarks. '206952303 bytecodes/sec; 7218843 sends/sec'
This is less of a difference than I expected for turning off the slang inlining. Either the gcc optimization has gotten better, or my memory has gotten worse, because I thought I remembered getting a bigger difference the last time I tried this (a long time ago).
I could slow the VM down quite a bit more if I use the MemoryAccess package. By itself, MemoryAccess will have no performance impact, but if you turn off slang inlining it should slow things down considerably. Perhaps that is what I am remembering from the earlier test. Unfortunately some bit rot has set in on MemoryAccess, so I'll have to fix that before I can confirm.
and does this affect the internalFoo inlining? Does this VM have everything that uses localSP & localIP inlined in interpret or are localSP & localIP no longer local to interpret?
There is no inlining in the interpret() loop, and the gnuification step is skipped. I believe that the localSP and localIP usage is unaffected, so yes they would still be local to interpret().
Dave
David T. Lewis wrote:
The major issue was that the refactoring of object memory and interpreter into separate class hierarchies (which is a very good thing IMHO) requires the use of accessor methods, and this leads to name conflicts in the generated code if those accessor methods are not fully inlined.
YAY!!!!
I did a half-assed attempt at that ten years ago....
My thinking was to run an Interpreter for each physical CPU (I had a SMP machine back then).
Now that we're starting the process of moving away from x86, there could be specialized interpreters for different processors in the system and, possibly, several Object Memories for things such as the GPU memory area... Then you get Xeon Phi or NUMA type machines...
On Sun, Feb 23, 2014 at 12:20:02PM -0500, Alan Grimes wrote:
David T. Lewis wrote:
The major issue was that the refactoring of object memory and interpreter into separate class hierarchies (which is a very good thing IMHO) requires the use of accessor methods, and this leads to name conflicts in the generated code if those accessor methods are not fully inlined.
YAY!!!!
I did a half-assed attempt at that ten years ago....
Eliot gets credit for the original refactoring in the Cog VMMaker, although I have since extended his work somewhat in VMM trunk to more clearly separate the "classic" object memory and NewObjectMemory, and the stack and context interpreters.
Dave
My thinking was to run an Interpreter for each physical CPU (I had a SMP machine back then).
Now that we're starting the process of moving away from x86, there could be specialized interpreters for different processors in the system and, possibly, several Object Memories for things such as the GPU memory area... Then you get Xeon Phi or NUMA type machines...
-- IQ is a measure of how stupid you feel.
Powers are not rights.
On Sun, Feb 09, 2014 at 10:23:37AM -0800, tim Rowledge wrote:
On 09-02-2014, at 10:07 AM, David T. Lewis lewis@mail.msen.com wrote:
Joking aside, there actually is one legitimate reason for wanting a slow VM. With high performance VMs and with ever faster hardware, it is very easy to implement sloppy things in the image that go unnoticed until someone runs the image on an old machine or on limited hardware. It is sometimes useful to test on old hardware or on a slow VM to check for this.
The cheapest and easiest way to do it these days is to buy a Raspberry Pi. You'll learn very quickly where you have used crappy algorithms or poor technique... though of course you do have to put up with X windows as well. Unless you try RISC OS, which although not able to make the raw compute performance faster at least has a window system that doesn't send every pixel to the screen via Deep Space Network to the relay on Sedna.
I think someone mentioned it earlier, but a very easy way to produce an intentionally slow VM is to generate the sources from VMMaker with the inlining step disabled. The slang inliner is extremely effective, and turning it off produces impressively sluggish results.
Does that actually work these days? Last I remember was that turning inlining off wouldn't produce a buildable interp.c file. If someone has had the patience to make it work then I'm impressed.
OK, now that some inlining issues are fixed, and the MemoryAccess replacement for sqMemoryAccess.h macros is working again, I can now report that I am able to make a VM that runs at about 1/25th the speed of Cog. Maybe even worse, although I don't want to overstate my claim.
After observing the popularity of certain PC based software products for a number of years, I am convinced that there is vast and continuing demand for software that is slower and worse, so I anticipate an enthusiastic response to this announcement. Early adopters can obtain their own Extremely Slow Virtual Machine (ESVM) as follows:
1) Make sure that MemoryAccess is loaded. Evaluate "MemoryAccess enable" to instruct the code generator to use the slang implementations rather than the C macros. Normally these MemoryAccess methods would be fully inlined and would not affect performance, but we are going to turn off inlining, so they will end up creating a very large number of function calls at the lowest levels of the interpreter loop.
2) Open your VMMakerTool and inspect its VMMaker instance. Set the 'inline' instance variable to false. This will completely disable all method inlining in the C code generation.
3) When compiling, set the CFLAGS such that C compiler optimization is disabled (for gcc, this would be -O0 in the CFLAGS).
There you have it. Sit back, brew a warm cup of chai tea, relax and enjoy the experience.
;-)
Dave
Thank you for all the work needed to combat the dastardly Eliot and confound his Evil Schemes For Fast VMs.
After observing the popularity of certain PC based software products for a number of years, I am convinced that there is vast and continuing demand for software that is slower and worse, so I anticipate an enthusiastic response to this announcement.
The tragic thing is that your humour only conceals slightly the truth. Just take a look around at commercial software and sadly too much OSS. Word? OpenOffice?
A truly valuable result of your work is that we will be able to test performance of new and clever UI widgets and make sure they (should) actually work ok for older machines. There’s a lot of them around. Obviously I’m concerned about a couple of million Raspberry Pis, for example.
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Meets quality standards: It compiles without errors.
vm-dev@lists.squeakfoundation.org