Hi,
I have a question that I think the "original" Smalltalk designers could help me understand:
I have always wondered why Smalltalk method signatures (not keywords like self, super, etc.) are CaSe SENSITIVE. Was this by design or due to implementation constraints? I can understand why static languages are case sensitive (e.g. more efficient compiling), but for dynamic languages like Smalltalk, I can think of more advantages to being case insensitive than case sensitive. Some obvious advantages of making Smalltalk case insensitive that I can think of are:
- fewer runtime errors caused by typos
- easier polymorphism due to fewer constraints (especially when the method signature is long)
- fewer ambiguous methods that have the same name but different case
etc.
Of course, the Smalltalk compiler must not allow a method to be defined if its name differs from an existing one only in case. Implementation-wise, I think it should not be that hard (I prototyped it in IBM Smalltalk a while ago, e.g. changing >>doesNotUnderstand: as a quick hack). Therefore, I think I must have missed something. I sure hate seeing different Smalltalk dialects implement the same method with the same name but different case. I think one of the classic examples is >>asUppercase versus >>asUpperCase. This creates portability problems between multi-vendor dialects. Maybe this should be considered for the ANSI standard too.
I appreciate any comments.
Thanks.
--
Mark Wai
Frontier Systems Architecture Inc.
mailto:mwai@ibm.net or mwai@frontiersa.com
> I can understand why static languages are case sensitive (e.g. more efficient compiling), but for dynamic languages like Smalltalk, I can think of more advantages to being case insensitive than case sensitive.
From my point of view (I am _not_ one of the "original" Smalltalk designers) it's just the other way around. In a static language you have all the time you need for compiling and linking and doing case-insensitive comparisons. Smalltalk, however, does lots of things with its selectors. You can "perform:" them, which is basically a lookup of the signature. In principle, this happens every time a message is sent. I'd say if the method lookup takes longer for case-insensitive comparisons, this is a very good reason to stay case insensitive.
Just my thoughts, Andreas
> From my point of view (I am _not_ one of the "original" Smalltalk designers) it's just the other way around. In a static language you have all the time you need for compiling and linking and doing case-insensitive comparisons. Smalltalk, however, does lots of things with its selectors. You can "perform:" them, which is basically a lookup of the signature. In principle, this happens every time a message is sent. I'd say if the method lookup takes longer for case-insensitive comparisons, this is a very good reason to stay case insensitive.
I am not advocating case-insensitivity, but in response to the message above, if one were to adopt this policy:
(1) It would seem to me selectors could be replaced by all upper- or lower-case selectors at compile (i.e. "accept") time. Then all lookups would be canonical.
(2) Common Lisp and other Lisps are case-insensitive and work fine in an interactive environment. Lisp syntax is better suited to this, though, since it can easily use - as a separator rather than _, which requires a <shift> key.
(3) WithoutASeparatorLike_or-IdGuessMostPeopleWouldStillTypeLikeThis
-- Patrick Logan mailto:patrickl@gemstone.com Voice 503-533-3365 Fax 503-629-8556 Gemstone Systems, Inc http://www.gemstone.com
Andreas Raab writes:
> From my point of view (I am _not_ one of the "original" Smalltalk designers) it's just the other way around. In a static language you have all the time you need for compiling and linking and doing case-insensitive comparisons. Smalltalk, however, does lots of things with its selectors. You can "perform:" them, which is basically a lookup of the signature. In principle, this happens every time a message is sent. I'd say if the method lookup takes longer for case-insensitive comparisons, this is a very good reason to stay case insensitive.
(That last word is probably supposed to be "sensitive".)
Method lookup speed should not depend on case (in)sensitivity at all. What is compared during lookup are the *identities* of selectors. Those selectors, Symbols, are created (or fetched from the Symbol table) at compilation time and stored in the CompiledMethods' literal frames and message dictionaries. Lookup machinery works with those Symbols' object pointers (and probably the pointers' hash values). Case does not matter at all.
Case insensitivity can be trivially introduced by converting all relevant symbols (those identified as selectors by the Parser) to upper or lower case during compilation. This is how most Lisp and Pascal implementations solve the issue. All it would take to make Smalltalk case-insensitive would likely be a single "asLowercase" in the right place in the Parser. (OK, that doesn't mean you can just go ahead and do that in an existing Smalltalk system, for obvious reasons). The price of insensitivity would thus be one #asLowercase per symbol during compilation, which is minuscule. So, if there actually is/was a reason, it is not performance.
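A minimal workspace sketch of the two points above, assuming a Squeak-style image in which asSymbol interns strings in the Symbol table. The folding step is purely hypothetical; it only imitates what a case-folding Parser might do and is not taken from any existing compiler.

```smalltalk
| fromLiteral fromString folded |

"Symbols are unique: the same characters always yield the same object,
 so lookup can compare object identity instead of comparing characters."
fromLiteral := #asUppercase.
fromString := 'asUppercase' asSymbol.
Transcript show: (fromLiteral == fromString) printString; cr.

"Hypothetical compile-time folding: canonicalize the spelling once,
 then intern it. Differently cased spellings collapse to one Symbol."
folded := 'asUpperCase' asLowercase asSymbol.
Transcript show: (folded == 'asUppercase' asLowercase asSymbol) printString; cr.
```

Both lines should print true. After folding, the lookup machinery would still see only one Symbol per selector, which is why the cost would fall on compilation rather than on message sends.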
Note, however, that case does play a semantic role in variable names: capitalization determines a variable's scope. In a language with case-sensitive variable names, it is quite natural to expect case-sensitive func... oops, selectors -- if only for the sake of consistency.
Finally, case sensitivity might simply be a matter of someone's personal concept of The Right Thing (tm).
--Vassili
Mark:
I'm not one of the original guys of whom you asked the question, but I can't resist commenting anyway.
I've always hated systems that take what I write and change it to something else. If I write #examineSituationAndTakePositiveAction I will not be happy trying to read #examinesituationandtakepositiveaction, both because it is nearly unreadable and because it is not what I wrote. WinDoze drives me nuts with its insistence that it knows just how file names should be written, no matter what I wrote.
I suspect that in the overall scheme of things there are relatively few runtime errors due to typos, and those get caught right quickly. The #asLowercase versus #asLowerCase difference is trivial to fix when porting: just add one new method that invokes the other one.
I don't see how polymorphism would be easier. As it now works, the method name is a symbol, and when a real lookup is done it simply has to find a matching symbol, which is really cheap since symbols have unique object pointers. The length of the name is not an issue, and you may note that IBM Smalltalk can inline the comparison of two symbols by just comparing two 32-bit object pointers.
If case didn't matter, and various cased names were considered equivalent, then it would seem that lookup has to do a character-by-character comparison of each method name.
If all names are mashed to some upper or lower case equivalent, and then converted to symbols, it won't be any faster; it'll just be harder to read.
If you hide the mashing so that the code still shows the uppercase version but the 'real' name is unicase, then what should this answer?
#aSymbolJustInCase == #asymboljustincase
What if I pass #aSymbolJustInCase and the receiver does a #perform:? Should perform create a lowercase symbol and then do the lookup? (Creating new symbols is usually quite expensive).
My guess is that your IBM Smalltalk implementation is really slow because it has to create new symbols. I've just been through tuning a program that uses a lot of symbols (but not as method selectors) and allows mixed case. I finally had to preprocess all the symbols rather than convert them as I ran into them (often many times). I also wrote a special method (IBM Smalltalk only - it uses private junk) for converting symbols to lower case that only does the conversion when it has to:
!Symbol publicMethods !

asLowercaseSymbol
    "Answer a symbol converted to lower case. Since this is a
     very expensive operation the obvious code:
         aSymbol asLowercase asSymbol
     is not recommended. This method first checks to see if it
     is necessary to perform the conversion and does so only
     if an uppercase character is found."

    [ :element :index | | newChar |
        newChar := CurrentLCCType asLowercase: element.
        newChar ~= element ifTrue: [ ^ self asLowercase asSymbol ] ]
            applyWithIndex: self from: 1 to: self size.

    ^ self! !
I just tried a case (in Squeak 1.3) where the doesNotUnderstand: method was:
doesNotUnderstand: aMessage
    aMessage selector == #ASDF
        ifTrue: [ ^ self perform: #asdf withArguments: aMessage arguments ].
    ^ self perform: aMessage selector asLowercase asSymbol
        withArguments: aMessage arguments!
This was in a class with an #asdf method but no #ASDF or #aSdF methods. I then tried two cases, one sending #ASDF and one sending #aSdF. The first case took 133 microseconds for 10,000 sends and the second took 9833 microseconds. (I did not measure an empty test so the true ratio is probably even worse.)
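For anyone who wants to try a comparison of this kind, here is a rough workspace sketch. It is not the measurement described above: it skips doesNotUnderstand: and simply times 10,000 sends with an already-interned selector against 10,000 sends that fold and intern the selector each time. It assumes Time millisecondsToRun: as found in Squeak; the receiver 5 and the selector #factorial are arbitrary stand-ins, and the numbers will vary by image and machine.

```smalltalk
| direct converted |

"Selector is already a canonical Symbol: no conversion per send."
direct := Time millisecondsToRun: [
    10000 timesRepeat: [ 5 perform: #factorial ] ].

"Selector arrives in mixed case: fold it and look it up in the
 Symbol table on every send before performing it."
converted := Time millisecondsToRun: [
    10000 timesRepeat: [ 5 perform: 'Factorial' asLowercase asSymbol ] ].

Transcript
    show: 'direct: ', direct printString, ' ms'; cr;
    show: 'converted: ', converted printString, ' ms'; cr.
```

Both loops do the same work once the selector is in hand, so any difference between the two figures comes from the per-send string conversion and Symbol lookup.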
I won't guess what The Designer really had in mind; I've guessed wrong several times. It might be that He'll speak to us. :-)
Dave
At 16:19 -0500 1/27/98, Mark Wai wrote:
> Hi,
> I have a question that I think the "original" Smalltalk designers could help me understand:
> I have always wondered why Smalltalk method signatures (not keywords like self, super, etc.) are CaSe SENSITIVE. Was this by design or due to implementation constraints? I can understand why static languages are case sensitive (e.g. more efficient compiling), but for dynamic languages like Smalltalk, I can think of more advantages to being case insensitive than case sensitive. Some obvious advantages of making Smalltalk case insensitive that I can think of are:
> - fewer runtime errors caused by typos
> - easier polymorphism due to fewer constraints (especially when the method signature is long)
> - fewer ambiguous methods that have the same name but different case
> etc.
> Of course, the Smalltalk compiler must not allow a method to be defined if its name differs from an existing one only in case. Implementation-wise, I think it should not be that hard (I prototyped it in IBM Smalltalk a while ago, e.g. changing >>doesNotUnderstand: as a quick hack). Therefore, I think I must have missed something. I sure hate seeing different Smalltalk dialects implement the same method with the same name but different case. I think one of the classic examples is >>asUppercase versus >>asUpperCase. This creates portability problems between multi-vendor dialects. Maybe this should be considered for the ANSI standard too.
> I appreciate any comments.
> Thanks.
> --
> Mark Wai
> Frontier Systems Architecture Inc.
> mailto:mwai@ibm.net or mwai@frontiersa.com
_______________________________
David N. Smith
IBM T J Watson Research Center
Hawthorne, NY
_______________________________
Any opinions or recommendations herein are those of the author and not of his employer.
squeak-dev@lists.squeakfoundation.org