Hi Ralph,
...
I was aware of caseOf: in Squeak. I always found it awkward to use and felt a true case statement would be simpler. Alas, it's impossible to have a true case statement added to Smalltalk now I think.
So what's a "true" case statement? For me, at least, the Squeak one *is*, and is more general than one limited to purely integer keys, as for example is C's switch statement. A number of languages provide case statements that are like Squeak's. What do you consider a "true" case statement?
I mean that: caseOf: is not part of the language itself but rather part of the standard library or set of packages that one finds in the IDE. To be part of the language it would need to be something the compiler is aware of. That is to say, the Smalltalk language is not very much. Smalltalk (Squeak) the language would not include Sets or Dictionaries but would include (some) Array classes because some aspects of Arrays are dealt with directly by the compiler. Selectors such as ifTrue: and to:do: are part of the language because they are inlined by the compiler. Put another way, if I could get my doBlockAt: method incorporated into the Squeak IDE it would nevertheless NOT be part of Squeak the language. The consequence of caseOf: not being part of the language is that the compiler/VM cannot perform optimizations when caseOf: is run into but must treat it as user written code.
Squeak's caseOf: is more general than C's switch statement but it could be more general in that there is a hard coded message (=). I would like to be able to replace the '=' message by an arbitrary binary operator such as includes: or '>'.
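To make the wish above concrete, here is a rough model of a case construct whose comparison is a parameter rather than a hard-coded '='. It is written in Python rather than Smalltalk so that it can be run as is, and it is only a sketch: the name case_of, the test parameter, and the argument order (subject first, case key second) are assumptions made for illustration, not Squeak's caseOf: API.

```python
import operator


def case_of(subject, cases, test=operator.eq, otherwise=None):
    """Run the action of the first case whose key satisfies test(subject, key).

    cases is a sequence of (key, action) pairs, where action is a
    zero-argument callable (standing in for a Smalltalk block).
    test defaults to equality, mirroring the hard-coded '='.
    """
    for key, action in cases:
        if test(subject, key):
            return action()
    if otherwise is None:
        raise LookupError(f"no case matched {subject!r}")
    return otherwise()


def classify(ch):
    # An includes:-style test: the case key is a collection and the
    # subject must be one of its elements.
    return case_of(
        ch,
        [("aeiou", lambda: "vowel"), ("0123456789", lambda: "digit")],
        test=lambda subject, key: subject in key,
        otherwise=lambda: "other",
    )


def grade(score):
    # A '>'-style test: the first threshold the score exceeds wins.
    return case_of(
        score,
        [(90, lambda: "A"), (70, lambda: "B")],
        test=operator.gt,
        otherwise=lambda: "C",
    )
```

Note that this model still searches the cases sequentially; it only generalizes the comparison.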
I have to backtrack here: I looked at the code and it looks like the compiler inlines caseOf: and caseOf:otherwise:. If so then these selectors are part of the language by my definition.
...
But I wouldn't want to be forced to implement my FSMs this way. It might be acceptable for small FSMs. I want to avoid sequential search and even binary search might be rather expensive. I look at computed gotos as the solution but, as you pointed out, computed gotos pose problems for JIT. Admittedly, for large FSM's, it might be best or necessary to use a FSM simulator anyway, as I do now.
Nah. One should always be able to map it down somehow. This will be easier with the Spur instruction set, which lifts the limits on the number of literals and the length of branches.
Good to hear.
Again, for my FSM case, this would often be considered good. But if the state transition tables are sparse then Dictionaries might be preferable to Arrays.
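The Array versus Dictionary trade-off for transition tables can be illustrated with a toy FSM. This is a Python sketch, not Smalltalk, and everything in it (the three states, the four input symbols, the ERROR marker, the function names) is invented for the example.

```python
ERROR = -1  # hypothetical marker for "no transition"

# Dense form: one row per state, one column per input symbol.
# Most entries are ERROR, which is what makes this table sparse.
DENSE = [
    [1, ERROR, ERROR, ERROR],          # state 0: symbol 0 -> state 1
    [ERROR, 2, ERROR, ERROR],          # state 1: symbol 1 -> state 2
    [ERROR, ERROR, ERROR, ERROR],      # state 2: no outgoing transitions
]

# Sparse form: only the transitions that exist, keyed by (state, symbol).
SPARSE = {
    (state, symbol): target
    for state, row in enumerate(DENSE)
    for symbol, target in enumerate(row)
    if target != ERROR
}


def step_dense(state, symbol):
    # Direct indexing: constant time, but stores every ERROR entry.
    return DENSE[state][symbol]


def step_sparse(state, symbol):
    # Hashed lookup: stores only real transitions.
    return SPARSE.get((state, symbol), ERROR)


def run(step, symbols, start=0):
    """Feed symbols through the FSM; return the final state or ERROR."""
    state = start
    for symbol in symbols:
        state = step(state, symbol)
        if state == ERROR:
            return ERROR
    return state
```

Here the dense table holds 12 entries for 2 real transitions, while the dictionary holds just those 2; both avoid a sequential or binary search.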
Yes, but that is getting to the limit of what the VM can reasonably interpret. Better would be an Array of value, pc pairs, where the keys are the values the switch bytecode compares top of stack against, and the pcs are where to jump to on a match. The JIT can therefore implement the table as it sees fit, whereas the interpreter can just do a linear search through the Array.
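As a rough illustration of the value, pc table just described, the following Python sketch models the two consumers. It is not VM code: the table contents, the pc numbers, the fall-through pc and both function names are hypothetical, and a real JIT could choose a quite different representation (a jump table, a binary search, and so on).

```python
# Hypothetical switch table: (value, pc) pairs in source order.
TABLE = [(10, 40), (20, 52), (30, 64)]
FALLTHROUGH_PC = 99  # hypothetical pc of the code after the switch


def interpret_switch(table, top_of_stack, fallthrough_pc):
    """What a simple interpreter might do: linear search through the pairs."""
    for value, pc in table:
        if value == top_of_stack:
            return pc
    return fallthrough_pc


def jit_switch(table, fallthrough_pc):
    """One choice a JIT could make: precompute a hash map from the same pairs."""
    lookup = {}
    for value, pc in table:
        # Keep the first pc for a repeated value so the result matches
        # the linear search above.
        lookup.setdefault(value, pc)
    return lambda top_of_stack: lookup.get(top_of_stack, fallthrough_pc)
```

The point of the design is that the same table serves both: the interpreter pays for a scan, while the JIT is free to trade setup time for faster dispatch.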
I am looking at this from the point of view of a compiler writer/generator and consider your proposal as inadequate for my needs. You, I think, are looking at this from the point of view of a VM writer and what can reasonably be delivered. I don't think what I want is overly difficult for the interpreter to deliver but as you pointed out, and you know much better than I, what I want causes serious problems for the VM.
My expectation is that at: be sent to the collection object to get the address to go to. Knowing that the collection is an array though makes it easier for the compiler/VM to ensure that the addresses stored in the collection are valid. Actually, the compiler will be generating the addresses. Does the VM have absolute trust in the compiler to generate valid addresses?
Yes. Generate bad bytecode and the VM crashes.
This is what I expected to hear but wanted it to be clear for compilers generated by my parser generator tool as you did.
Ralph
On Sat, Nov 8, 2014 at 11:21 AM, Ralph Boland rpboland@gmail.com wrote:
Hi Ralph,
...
I was aware of caseOf: in Squeak. I always found it awkward to use and felt a true case statement would be simpler. Alas, it's impossible to have a true case statement added to Smalltalk now I think.
So what's a "true" case statement? For me, at least, the Squeak one *is*, and is more general than one limited to purely integer keys, as for example is C's switch statement. A number of languages provide case statements that are like Squeak's. What do you consider a "true" case statement?
I mean that: caseOf: is not part of the language itself but rather part of the standard library or set of packages that one finds in the IDE. To be part of the language it would need to be something the compiler is aware of.
Ah OK. I see what you mean. But you're wrong on a few counts. First, there are *no* control structures in the language beyond closures and polymorphism. ifTrue:, to:do:, and:, whileTrue: et al are all defined in the library, not by the compiler. Second, these structures, /including/ caseOf:, are understood by the compiler and compiled to non-message-sending code. So none of the blocks in caseOf:, ifTrue:, and:, whileTrue: et al, the optimized selectors, are created and all are inlined by the compiler. So a) by your criterion of being in the compiler caseOf: is in the language, but b) all control structures in Smalltalk are defined in the library, and some are optimized by the compiler.
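The claim that conditionals need nothing beyond closures and polymorphism can be modelled in a few lines. This is a Python sketch of the idea, not the Squeak source: the class and method names are invented stand-ins for True, False and the ifTrue: family, and lambdas stand in for blocks.

```python
class TrueObject:
    """Stand-in for Smalltalk's true: it evaluates the 'true' block."""

    def if_true(self, block):
        return block()

    def if_false(self, block):
        return None  # models answering nil

    def if_true_if_false(self, true_block, false_block):
        return true_block()


class FalseObject:
    """Stand-in for Smalltalk's false: it evaluates the 'false' block."""

    def if_true(self, block):
        return None  # models answering nil

    def if_false(self, block):
        return block()

    def if_true_if_false(self, true_block, false_block):
        return false_block()


def less_than(a, b):
    # Stands in for a primitive comparison that answers one of the
    # two boolean objects.
    return TrueObject() if a < b else FalseObject()


def smaller(a, b):
    # No conditional here: the choice is made purely by which class
    # receives the message.
    return less_than(a, b).if_true_if_false(lambda: a, lambda: b)
```

In this model every conditional costs two closure creations and a method call, which is the overhead the compiler's inlining removes.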
That is to say, the Smalltalk language is not very much. Smalltalk (Squeak) the language would not include Sets or Dictionaries but would include (some) Array classes because some aspects of Arrays are dealt with directly by the compiler.
There is a syntactic form for creating Arrays, but really the notion that the Smalltalk compiler defines the language is a limited one. It's fair to say that the language is defined by a small set of variables, return, blocks, an object representation (ability to create classes that define a sequence of named inst vars and inherit from other classes), message lookup rules (normal sends and super sends), a small number of literal forms (Array, Integer, Float, Fraction, ByteArray, String and Symbol literals), and a method syntax. The rest is in the library. What this really means is that Smalltalk can't be reduced to a language, because the language doesn't define enough. Instead it is a small language and a large library.
Selectors such as ifTrue: and to:do: are part of the language because they are inlined by the compiler.
No. One can change the compiler to not inline them. This is merely an optimization.
Put another way, if I could get my doBlockAt: method incorporated into the Squeak IDE it would nevertheless NOT be part of Squeak the language. The consequence of caseOf: not being part of the language is that the compiler/VM cannot perform optimizations when caseOf: is run into but must treat it as user written code.
Squeak's caseOf: is more general than C's switch statement but it could be more general in that there is a hard coded message (=). I would like to be able to replace the '=' message by an arbitrary binary operator such as includes: or '>'.
I have to backtrack here: I looked at the code and it looks like the compiler inlines caseOf: and caseOf:otherwise:. If so then these selectors are part of the language by my definition.
Well, live and learn :-)
...
But I wouldn't want to be forced to implement my FSMs this way. It might be acceptable for small FSMs. I want to avoid sequential search and even binary search might be rather expensive. I look at computed gotos as the solution but, as you pointed out, computed gotos pose problems for JIT. Admittedly, for large FSM's, it might be best or necessary to use a FSM simulator anyway, as I do now.
Nah. One should always be able to map it down somehow. This will be easier with the Spur instruction set, which lifts the limits on the number of literals and the length of branches.
Good to hear.
Again, for my FSM case, this would often be considered good. But if the state transition tables are sparse then Dictionaries might be preferable to Arrays.
Yes, but that is getting to the limit of what the VM can reasonably interpret. Better would be an Array of value, pc pairs, where the keys are the values the switch bytecode compares top of stack against, and the pcs are where to jump to on a match. The JIT can therefore implement the table as it sees fit, whereas the interpreter can just do a linear search through the Array.
I am looking at this from the point of view of a compiler writer/generator and consider your proposal as inadequate for my needs. You, I think, are looking at this from the point of view of a VM writer and what can reasonably be delivered. I don't think what I want is overly difficult for the interpreter to deliver but as you pointed out, and you know much better than I, what I want causes serious problems for the VM.
My expectation is that at: be sent to the collection object to get the address to go to. Knowing that the collection is an array though makes it easier for the compiler/VM to ensure that the addresses stored in the collection are valid. Actually, the compiler will be generating the addresses. Does the VM have absolute trust in the compiler to generate valid addresses?
Yes. Generate bad bytecode and the VM crashes.
This is what I expected to hear but wanted it to be clear for compilers generated by my parser generator tool as you did.
Ralph
Reviewing the code for the following is enlightening: True>>ifTrue:, True>>ifFalse:, False>>ifTrue: and False>>ifFalse:. They show the original implementation, but remember that as an optimization these are inlined, so that code is currently not executed.
Eliot, Would I be right to presume that the Interpreter does execute those methods without optimisation?
cheers -ben
Hi Ben,
On Nov 8, 2014, at 3:35 PM, Ben Coman btc@openInWorld.com wrote:
Reviewing the code for the following is enlightening: True>>ifTrue:, True>>ifFalse:, False>>ifTrue: and False>>ifFalse:. They show the original implementation, but remember that as an optimization these are inlined, so that code is currently not executed.
Eliot, Would I be right to presume that the Interpreter does execute those methods without optimisation?
The interpreter directly executes the bytecode produced by the compiler. Go look. So it depends on how the code base is compiled. Right now the interpreter does *not*, because inlined blocks, conditional branches and jumps are much faster than closure creation and message sends. The interpreter benefits a lot from this; early Smalltalk implementations were interpreted, hence the optimisation in the first place. However, with adaptive optimisation one can allow the JIT to perform the optimisation in context, allowing alternative implementations of ifTrue: et al in classes other than the Booleans. In Sista we've chosen not to do that, keeping inlining and using conditional branches as our performance counters. But it may allow the compiler to be smart and optimize these forms in fewer cases.
Ahh. I was thinking about it the wrong way. To check: inlined means inlined-bytecode, not inlined-machine-code? And the result of compilation is the same bytecode to run on the VM regardless of whether that VM is the Interpreter or Cog?
(And indeed the compiler is itself running in-image on top of the VM). cheers -ben
Hi Ben!
On Nov 8, 2014, at 4:20 PM, Ben Coman btc@openInWorld.com wrote:
Ahh. I was thinking about it the wrong way. To check: inlined means inlined-bytecode, not inlined-machine-code? And the result of compilation is the same bytecode to run on the VM regardless of whether that VM is the Interpreter or Cog?
Exactly.
(And indeed the compiler is itself running in-image on top of the VM). cheers -ben
Right. And in Sista even the adaptive optimizer runs in-image whereas in almost every other adaptive optimizing/speculative inlining VM the optimizer is in-VM.
cheers -ben
That is to say the Smalltalk language is not very much. Smalltalk (Squeak) the language would not include Sets or Dictionaries but would include (some) Array classes because some aspects of Arrays are dealt with directly by the compiler.
There is a syntactic form for creating Array, but really the notion that the Smalltalk compiler defines the language is a limited one. It's fair to say that the language is defined by a small set of variables, return, blocks, an object representation (ability to create classes that define a sequence of named inst vars and inherit from other classes), message lookup rules (normal sends and super sends), a small number of literal forms (Array, Integer, Float, Fraction, ByteArray, String and Symbol literals), and a method syntax. The rest is in the library. What this really means is that Smalltalk can't be reduced to a language, because the language doesn't define enough. Instead it is a small language and a large library.
Selectors such as ifTrue: and to:do: are part of the language because they are inlined by the compiler.
No. One can change the compiler to not inline them. This is merely an optimization.
Put another way, if I could get my doBlockAt: method incorporated into the Squeak IDE it would nevertheless NOT be part of Squeak the language. The consequence of caseOf: not being part of the language is that the compiler/VM cannot perform optimizations when caseOf: is run into but must treat it as user written code.
Squeak's caseOf: is more general than C's switch statement but it could be more general in that there is a hard coded message (=). I would like to be able to replace the '=' message by an arbitrary binary operator such as includes: or '>'.
I have to backtrack here: I looked at the code and it looks like the compiler inlines caseOf: and caseOf:otherwise:. If so then these selectors are part of the language by my definition.
Well, live and learn :-) ...
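To make the point about the hard-coded '=' concrete, here is a small Python sketch of a caseOf:-style dispatch in which the comparison is a parameter. It is an illustration only and not Squeak's implementation: the function case_of, its op parameter, and the argument order op(key, subject) are assumptions chosen for this example. Each case is a pair of zero-argument closures, tried sequentially, and the first key that satisfies the operator wins.

```python
import operator


def case_of(subject, cases, otherwise=None, op=operator.eq):
    """Try each (key_block, action_block) pair in order.

    The first pair for which op(key_block(), subject) is true has its
    action evaluated. `op` defaults to equality, mimicking a fixed '='.
    """
    for key_block, action_block in cases:
        if op(key_block(), subject):
            return action_block()
    if otherwise is None:
        raise LookupError(f"no case matched {subject!r}")
    return otherwise()


def describe(n):
    # Default behaviour: keys are compared with equality.
    return case_of(
        n,
        [(lambda: 1, lambda: "one"), (lambda: 2, lambda: "two")],
        otherwise=lambda: "many",
    )


def classify(ch):
    # Generalised behaviour: operator.contains(key, subject) is
    # `subject in key`, playing the role of includes:.
    return case_of(
        ch,
        [(lambda: "0123456789", lambda: "digit"),
         (lambda: "+-*/", lambda: "operator")],
        otherwise=lambda: "other",
        op=operator.contains,
    )
```

Note that this remains a sequential search over the cases, which is the cost raised in the FSM discussion that follows.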
But I wouldn't want to be forced to implement my FSMs this way. It might be acceptable for small FSMs. I want to avoid sequential search and even binary search might be rather expensive. I look at computed gotos as the solution but, as you pointed out, computed gotos pose problems for JIT. Admittedly, for large FSMs, it might be best or necessary to use an FSM simulator anyway, as I do now.
Nah. One should always be able to map it down somehow. This will be easier with the Spur instruction set, which lifts the limits on number of literals and length of branches.
Good to hear.
Again, for my FSM case, this would often be considered to be good. But if the state transition tables are sparse then Dictionaries might be preferable to Arrays.
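The Array versus Dictionary trade-off for transition tables can be sketched as follows. This is a made-up three-state machine in Python, purely for illustration: the names DENSE, SPARSE, ERROR, step_dense, step_sparse and run are hypothetical and the table contents are arbitrary.

```python
ERROR = -1  # hypothetical marker meaning "no transition defined"

# Dense form: one row per state, one column per input symbol.
# Most cells are ERROR, so most of the storage is wasted.
DENSE = [
    [1, ERROR, ERROR],      # state 0
    [ERROR, 2, ERROR],      # state 1
    [ERROR, ERROR, 0],      # state 2
]

# Sparse form: only the transitions that actually exist are stored.
SPARSE = {(0, 0): 1, (1, 1): 2, (2, 2): 0}


def step_dense(state, symbol):
    """Look up the next state by direct indexing."""
    return DENSE[state][symbol]


def step_sparse(state, symbol):
    """Look up the next state by hashing; missing keys mean ERROR."""
    return SPARSE.get((state, symbol), ERROR)


def run(step, symbols, state=0):
    """Drive the machine over the input, stopping at the first ERROR."""
    for symbol in symbols:
        state = step(state, symbol)
        if state == ERROR:
            break
    return state
```

Both lookups avoid a sequential search over cases; the dense table pays in space, the sparse one in hashing cost.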
Yes, but getting to the limit of what the VM can reasonably interpret. Better would be an Array of value, pc pairs, where the keys are the values the switch bytecode compares top of stack against, and the pcs are where to jump to on a match. The JIT can therefore implement the table as it sees fit, whereas the interpreter can just do a linear search through the Array.
I am looking at this from the point of view of a compiler writer/generator and consider your proposal as inadequate for my needs. You, I think, are looking at this from the point of view of a VM writer and what can reasonably be delivered. I don't think what I want is overly difficult for the interpreter to deliver but as you pointed out, and you know much better than I, what I want causes serious problems for the VM.
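The value, pc pair table described above might be modelled along these lines. This is only a Python sketch under stated assumptions, not an actual bytecode design: SWITCH_TABLE, interpret_switch, jit_compile_switch and the fallthrough_pc parameter are invented names, and the pc values are arbitrary integers.

```python
# Hypothetical literal attached to a switch bytecode: (value, pc) pairs.
SWITCH_TABLE = [(10, 3), (20, 5), (30, 7)]


def interpret_switch(table, top_of_stack, fallthrough_pc):
    """Interpreter strategy: linear search through the pairs."""
    for value, pc in table:
        if value == top_of_stack:
            return pc
    return fallthrough_pc


def jit_compile_switch(table, fallthrough_pc):
    """'JIT' strategy: free to pick another representation, here a hash table.

    Building the dict from the reversed table keeps the first matching
    pair when a value is repeated, matching the linear search.
    """
    lookup = dict(reversed(table))

    def dispatch(top_of_stack):
        return lookup.get(top_of_stack, fallthrough_pc)

    return dispatch


dispatch = jit_compile_switch(SWITCH_TABLE, fallthrough_pc=9)
```

The same table serves both executors: the interpreter pays for a scan, while the compiled form can choose hashing, binary search or a jump table without the bytecode changing.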
My expectation is that at: be sent to the collection object to get the address to go to. Knowing that the collection is an array though makes it easier for the compiler/VM to ensure that the addresses stored in the collection are valid. Actually, the compiler will be generating the addresses. Does the VM have absolute trust in the compiler to generate valid addresses?
Yes. Generate bad bytecode and the VM crashes.
This is what I expected to hear but wanted it to be clear for compilers generated by my parser generator tool as you did.
Ralph
--
best, Eliot
Eliot (phone)
vm-dev@lists.squeakfoundation.org