A fine idea. Are you volunteering?
Well, here you are :)
The following rules use EBNF notation. Terminal symbols are enclosed in double quotes. [ ] means optional occurrence, { } means zero or more occurrences, and | is an alternative. These are the main rules for Squeak's variant of Smalltalk, as implemented in the class Parser.
TODO: Define word, binary, string, number, scannerLiteral.
start = method.
method = pattern temporaries primitive statements. NOTE: Compiler ensures that no other tokens follow.
pattern = unaryPattern | binaryPattern | keywordPattern.
unaryPattern = unarySelector.
unarySelector = word.
binaryPattern = binarySelector argument.
binarySelector = (binary | "|"). NOTE: | isn't a binary and || isn't allowed at all; this might be an error.
keywordPattern = keyword argument {keyword argument}.
keyword = word ":". NOTE: Actually, this is already detected by the scanner.
argument = word. NOTE: The argument must not already have been declared. Write access is not allowed for arguments.
temporaries = [ "|" {variable} "|" ]. NOTE: The variable must not already have been declared. || (without whitespace) is allowed because || isn't a binarySelector.
variable = word.
primitive = [ "<" primitiveDecl ">" ].
primitiveDecl = [ "primitive:" number ]. NOTE: You can omit the complete declaration. In this case, a primitive nil is returned. You can also use floats for the number. Both seem to be errors in the compiler.
statements = returnStmt | expressionStmt. NOTE: If there's another statement after a return, the compiler notifies us that it expects the block end, regardless of whether we're in a block or not.
returnStmt = "^" expression [ "." ].
expressionStmt = [ expression {"." statements} [ "." ] ]. NOTE: If no expression is available, method bodies are replaced with "self" and empty blocks with "nil".
expression = (assignment | braceAssign | primaryExpr) messagePart.
assignment = variable assigner expression. NOTE: variable must be bound.
assigner = "_" | ":=". NOTE: "_" is actually the left arrow.
braceAssign = brace assigner expression. NOTE: I don't understand this.
primaryExpr = varExpr | blockExpr | braceExpr | "(" expression ")" | literal.
varExpr = variable. NOTE: Variable must be bound or one of the following special variables: false, nil, self, super, thisContext, true and homeContext.
blockExpr = "[" [ ":" argument {":" argument} "|" ] statements "]". NOTE: The arguments may shadow other declarations. Write access is not allowed for arguments.
braceExpr = "{" [ expression {"." expression} ] "}".
literal = string | number | ("-" number) | scannerLiteral. NOTE: ScannerLiterals include symbols, characters, literal arrays, etc.
messagePart = [ messagePart3 {messagePart3} cascade ].
messagePart3 = keywordMsgPart | messagePart2. NOTE: This somewhat complex scheme is used to express the precedence rules of different message types.
keywordMsgPart = keywordPart {keywordPart}. NOTE: All keywordParts are concatenated to one message send.
keywordPart = keyword primaryExpr messagePart2 {messagePart2}.
messagePart2 = binaryMsgPart | messagePart1.
binaryMsgPart = binarySelector messagePart1 {messagePart1}.
messagePart1 = unaryMsgPart.
unaryMsgPart = unarySelector.
cascade = {";" messagePart3}. NOTE: Compiler checks whether cascade is allowed. You cannot cascade super sends and sends to special selectors like ifTrue: which are replaced by the compiler.
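The precedence scheme behind messagePart1/2/3 can be sketched outside Squeak. What follows is a minimal Python sketch, not a transcription of the class Parser: it assumes the conventional Smalltalk rule (unary sends bind tightest, then binary sends left to right, then a single keyword send built from all keyword parts), and it ignores assignment, cascades, blocks, braces and most literals. The names TOKEN, MiniParser and parse are made up for this illustration. It prints each expression fully parenthesized so the grouping is visible.

```python
import re

# Hypothetical token pattern: keywords/words, integers, binary selectors, parens.
TOKEN = re.compile(r"[A-Za-z]\w*:?|\d+|[-+*/<>=~@%&?,|\\]+|[()]")
BINARY_CHARS = "-+*/<>=~@%&?,|\\"


def is_keyword(tok):
    return tok is not None and tok.endswith(":")


def is_word(tok):
    return tok is not None and tok[0].isalpha() and not tok.endswith(":")


def is_binary(tok):
    return tok is not None and tok[0] in BINARY_CHARS


class MiniParser:
    def __init__(self, src):
        self.toks = TOKEN.findall(src)
        self.pos = 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def next(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def primary(self):
        # primaryExpr, reduced to: variable | number | "(" expression ")"
        tok = self.next()
        if tok == "(":
            inner = self.expression()
            if self.next() != ")":
                raise SyntaxError("expected )")
            return inner
        if tok is None or tok == ")" or is_keyword(tok) or is_binary(tok):
            raise SyntaxError("expected a primary, got %r" % (tok,))
        return tok

    def unary(self, recv):
        # Level 1: unary sends bind tightest.
        while is_word(self.peek()):
            recv = f"({recv} {self.next()})"
        return recv

    def binary(self, recv):
        # Level 2: binary sends, strictly left to right; the argument
        # may carry its own unary sends.
        while is_binary(self.peek()):
            sel = self.next()
            arg = self.unary(self.primary())
            recv = f"({recv} {sel} {arg})"
        return recv

    def expression(self):
        # Level 3: all keyword parts are concatenated into one send.
        recv = self.binary(self.unary(self.primary()))
        parts = []
        while is_keyword(self.peek()):
            kw = self.next()
            arg = self.binary(self.unary(self.primary()))
            parts.append(f"{kw} {arg}")
        if parts:
            recv = f"({recv} {' '.join(parts)})"
        return recv


def parse(src):
    p = MiniParser(src)
    result = p.expression()
    if p.peek() is not None:
        raise SyntaxError("unexpected token %r" % (p.peek(),))
    return result


print(parse("3 + 4 factorial"))       # (3 + (4 factorial))
print(parse("2 + 3 * 4"))             # ((2 + 3) * 4)
print(parse("a max: b + c negated"))  # (a max: (b + (c negated)))
print(parse("d at: 1 put: 2 + 3"))    # (d at: 1 put: (2 + 3))
```

Note that 2 + 3 * 4 groups as ((2 + 3) * 4): there is no arithmetic precedence among binary selectors, only the three message levels.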
bye -- Stefan Matthias Aust // Are you ready to discover the twilight zone?
<RANT>
But... but... but... this is only syntax. Surely you need to give a denotational semantics too. Better dust off that lambda calculus. But wait! Can't I tweak the compiler internals too? Even at run time? My brain hurts!
Look! Squeak is a marvelous and fluid experiment from which we have a very nice interactive environment (the Squeak whole) and a nice new graphical environment (Morphic), even if it's in the early stages. As a fluid experiment, the nature of the beast will continue to change in substantial and perhaps surprising ways. This is the art that precedes the engineering.
Don't get me wrong; I think when you do have the answer, you need to be able to unambiguously specify its implementation, but I think folks are trying to discover new answers rather than hash out known ones. Think about how much we learned from Lisp, with many distinct implementations with varied properties cropping up. I at least enjoy the glimpse into more-or-less unfettered exploration, and I am quite satisfied with hacking and reading the code to learn about the state of things. Let's not march with blinders in one direction quite yet. When Squeak sets (petrifies?) or even slows, then someone can mathematize it. Up to that point, mathematizing it chills radical invention.
Hard specification means play time is over.
</RANT>
squeak-dev@lists.squeakfoundation.org