I also would like to see a standard Squeak parser that is clearer and easier to work with. Another starting point might be the Refactoring Browser's parser. Why not?! It is under scrutiny by a large number of people, and it has proven useful for refactoring, rule-based code rewriting, and "lint" checking. I haven't looked very closely at the RB's parser, but from the outside it sounds promising.
But anyway, what would people really like out of a parser? Here are some possibilities:
1. A clear parse-tree structure and symbol table. This was my primary motivation for finally writing a custom parser for Lucid: Squeak's parser appears to be heavily geared towards compilation, and it loves to use encoded integers that end up embedded in bytecodes. Also, for analyzing code in the abstract, it is nice to have explicit nested symbol tables that can be inspected after the parser has finished.
2. Ability to generate parse trees from code in other languages, not just Smalltalk. This is useful for parsing alternative languages such as Prolog.
3. Ability to execute statements straight out of a parse tree. Or at the least, the ability to generate a block from a parse tree which can later be evaluated. (blocks and methods are very similar in the abstract, but they aren't in Squeak). It just seems like this should be possible, due to the nature of what a parse tree is.
4. Ability to make simple pattern-based modifications to a parse tree. This is useful for partial evaluation, as well as anywhere that the refactoring browser's rewrite tool would be useful.
5. Ability to clone a parse tree into a different format, so that, e.g., the CCodeGenerator can continue to work.
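To make items 1, 3, and 4 concrete, here is a toy sketch in Python rather than Smalltalk. Every name in it (Scope, Literal, Var, Send, Block, evaluate, fold_constants) is hypothetical and is not Squeak's or Lucid's API. It shows a parse tree whose blocks keep an inspectable, nested symbol table after parsing, a tree-walking evaluator that turns a block node into something callable later, and a simple pattern-based rewrite (constant folding of arithmetic sends on literals).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Union


@dataclass
class Scope:
    """Explicit symbol table; each block gets one, chained to its parent."""
    names: List[str]
    parent: Optional[Scope] = None

    def defines(self, name: str) -> bool:
        if name in self.names:
            return True
        return self.parent is not None and self.parent.defines(name)


@dataclass
class Literal:
    value: Any


@dataclass
class Var:
    name: str


@dataclass
class Send:
    receiver: Node
    selector: str
    args: List[Node] = field(default_factory=list)


@dataclass
class Block:
    scope: Scope  # kept after parsing so analysis tools can inspect it
    body: List[Node]


Node = Union[Literal, Var, Send, Block]

# A few primitive selectors stand in for real message lookup.
PRIMITIVES = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}


def evaluate(node: Node, env: Dict[str, Any]) -> Any:
    """Execute a parse tree directly, without generating bytecodes."""
    if isinstance(node, Literal):
        return node.value
    if isinstance(node, Var):
        return env[node.name]
    if isinstance(node, Send):
        receiver = evaluate(node.receiver, env)
        args = [evaluate(a, env) for a in node.args]
        return PRIMITIVES[node.selector](receiver, *args)
    if isinstance(node, Block):
        # A block node evaluates to a callable that can be run later.
        def run(*values: Any) -> Any:
            local = dict(env)
            local.update(zip(node.scope.names, values))
            result = None
            for statement in node.body:
                result = evaluate(statement, local)
            return result
        return run
    raise TypeError(f"unknown node: {node!r}")


def fold_constants(node: Node) -> Node:
    """Pattern-based rewrite: (literal op literal) becomes a single literal."""
    if isinstance(node, Send):
        receiver = fold_constants(node.receiver)
        args = [fold_constants(a) for a in node.args]
        if isinstance(receiver, Literal) and all(isinstance(a, Literal) for a in args):
            return Literal(PRIMITIVES[node.selector](receiver.value, *(a.value for a in args)))
        return Send(receiver, node.selector, args)
    if isinstance(node, Block):
        return Block(node.scope, [fold_constants(s) for s in node.body])
    return node


# [:x | x * (2 + 3)], whose scope is nested inside an outer scope declaring y
outer = Scope(["y"])
block = Block(Scope(["x"], parent=outer),
              [Send(Var("x"), "*", [Send(Literal(2), "+", [Literal(3)])])])
print(evaluate(block, {})(4))  # prints 20
print(fold_constants(block).body[0].args[0])  # prints Literal(value=5)
```

A real Smalltalk version would dispatch sends to actual methods instead of a PRIMITIVES table; the point is only that once symbol tables and blocks are explicit tree nodes, evaluation and rewriting fall out naturally.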
-Lex
Dwight Hughes <dwighth@ipa.net> wrote:
Hal's Squeak compiler stuff is at http://www.hellblazer.com/personal/squeak/
Hal approached the design of the new compiler in a very "engineered" fashion; the result is somewhat larger and structurally more complex than the current design. While it might be more easily extended in some ways, I found it rather unsatisfying, since it added overall complexity in an area I would most like to see simplified and clarified instead.
-- Dwight
Les Tyrrell wrote:
About the compiler: once upon a time, Hal Hildebrand wrote a parser/compiler for Squeak which he called "Loki". I meant to give you a nice tidy web reference to it earlier, but I have not been able to find one. I could give you a link to the source files, but the laptop I have them on is dead right now.
So, perhaps someone out there might have a link or more information?
- les
Lex -
- Ability to execute statements straight out of a parse tree. Or at the least, the ability to generate a block from a parse tree which can later be evaluated. (blocks and methods are very similar in the abstract, but they aren't in Squeak). It just seems like this should be possible, due to the nature of what a parse tree is.
I presume you already use this in your type inferencer, since it allows you to execute an existing method in the space of the types of its parameters and receiver, to produce the type of the result. So I'm strongly in favor of this capability.
- Dan
Lex Spoon wrote:
- A clear parse-tree structure and symbol table. This was my primary motivation behind finally writing a custom parser for Lucid: Squeak's parser appears to be highly geared towards compilation, and it loves to use encoded integers that end up embedded in byte codes.
In my view, the biggest problem is that optimization is applied as part of parsing, i.e., you can't get a parse tree without the transformations of ifTrue:ifFalse:, to:do:, etc. already applied to it. There should be a "pure parsing" switch somewhere; it shouldn't be too hard to add, but so far I've been able to work around the problem.
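A "pure parsing" switch could be as simple as a flag consulted wherever a send node is built. The Python sketch below is only an illustration of that idea: the names (LiteralBlock, PlainSend, InlinedIf, build_send, optimize) are hypothetical and do not correspond to Squeak's Parser or MessageNode code. With the flag off, `ifTrue:ifFalse:` stays an ordinary message send with two block arguments; with it on, the send is replaced by an inlined conditional that no longer contains the blocks.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class LiteralBlock:
    statements: List[Any]


@dataclass
class PlainSend:
    """An ordinary message send, exactly as written in the source."""
    receiver: Any
    selector: str
    arguments: List[Any] = field(default_factory=list)


@dataclass
class InlinedIf:
    """Compiler-oriented form: the two blocks are gone, only their statements remain."""
    condition: Any
    then_statements: List[Any]
    else_statements: List[Any]


def build_send(receiver: Any, selector: str, arguments: List[Any], optimize: bool) -> Any:
    """Build the node for a send; inline ifTrue:ifFalse: only when optimize is True."""
    if (optimize and selector == "ifTrue:ifFalse:"
            and all(isinstance(a, LiteralBlock) for a in arguments)):
        then_block, else_block = arguments
        return InlinedIf(receiver, then_block.statements, else_block.statements)
    return PlainSend(receiver, selector, arguments)


branches = [LiteralBlock(["'positive'"]), LiteralBlock(["'other'"])]
pure = build_send("x > 0", "ifTrue:ifFalse:", branches, optimize=False)
compiled = build_send("x > 0", "ifTrue:ifFalse:", branches, optimize=True)
print(type(pure).__name__, type(compiled).__name__)  # prints PlainSend InlinedIf
```

A formatter or refactoring tool would parse with the flag off and see exactly what the programmer wrote; the compiler would turn it on. The same check would extend to to:do:, whileTrue:, and the other inlined selectors.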
(blocks and methods are very similar in the abstract, but they aren't in Squeak).
I've also wondered why MethodNode isn't a subclass of BlockNode; it would make parse-tree analysis simpler in many cases. Would it cause trouble for some aspect of compilation that hasn't occurred to me? Or perhaps it just didn't matter much for compilation as such, since methods and blocks compile rather differently after all.
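The analysis benefit can be illustrated with a small Python sketch. The classes below merely borrow Squeak's names BlockNode and MethodNode; their fields and the declared_names method are hypothetical and do not mirror Squeak's actual parse-node hierarchy. Because MethodNode inherits from BlockNode here, an analysis written once against blocks (collecting declared names, recursing into nested blocks) works on methods unchanged.

```python
from typing import Any, List


class BlockNode:
    """Hypothetical block node: arguments, temporaries, and statements."""

    def __init__(self, arguments: List[str], temporaries: List[str], statements: List[Any]):
        self.arguments = arguments
        self.temporaries = temporaries
        self.statements = statements

    def declared_names(self) -> List[str]:
        """Names declared here, then in blocks found among the statements (recursively)."""
        names = self.arguments + self.temporaries
        for statement in self.statements:
            if isinstance(statement, BlockNode):
                names += statement.declared_names()
        return names


class MethodNode(BlockNode):
    """A method modeled as a block that also carries a selector."""

    def __init__(self, selector: str, arguments: List[str],
                 temporaries: List[str], statements: List[Any]):
        super().__init__(arguments, temporaries, statements)
        self.selector = selector


method = MethodNode("squaresOf:", ["aCollection"], ["result"],
                    ["result := OrderedCollection new",
                     BlockNode(["each"], [], ["each * each"])])
print(method.declared_names())  # prints ['aCollection', 'result', 'each']
```

The compiler could still treat the two differently by overriding code-generation methods in MethodNode, so the shared superclass would mainly serve analysis tools.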
Mats Nygren wrote:
The Parser will not inherit from Scanner but will refer to a Scanner. This adds flexibility, which is sometimes good to have, and it is in accordance with my general view of how to handle syntax. I don't think it will run slower, but I haven't measured it.
In general, I wouldn't worry about compiler speed too much or too early. Most of the time is taken by reading from and writing to disk (I think it was about 60%); this is true even for my partial evaluator. (Using the hack of generating parse trees by decompiling bytecodes instead of parsing source from disk gave me a ~500% speed increase in partial evaluation, at the sole cost of losing the comments.)
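Mats's plan to have the Parser refer to a Scanner rather than inherit from one can be sketched as follows. This is a Python illustration, not his code: StringScanner, ListScanner, BinaryParser, and the next_token protocol are all hypothetical, and the parser handles only binary messages. Because the parser only asks its scanner for the next token, the same parser runs over source text or over tokens produced elsewhere. It also follows Smalltalk's rule that binary messages have no precedence and associate left to right.

```python
import re
from typing import Iterable, Optional, Tuple

Token = Tuple[str, str]  # (kind, text)


class StringScanner:
    """Tokenizes numbers, identifiers, and binary selectors from source text."""
    TOKEN = re.compile(r"\s*(?:(\d+)|([A-Za-z]\w*)|([-+*/<>=~]+))")

    def __init__(self, source: str):
        self.source = source
        self.pos = 0

    def next_token(self) -> Optional[Token]:
        if not self.source[self.pos:].strip():
            return None  # only whitespace left
        match = self.TOKEN.match(self.source, self.pos)
        if match is None:
            raise SyntaxError(f"bad character at offset {self.pos}")
        self.pos = match.end()
        number, identifier, binary = match.groups()
        if number is not None:
            return ("number", number)
        if identifier is not None:
            return ("identifier", identifier)
        return ("binary", binary)


class ListScanner:
    """Replays tokens that were produced elsewhere (e.g. by another tool)."""

    def __init__(self, tokens: Iterable[Token]):
        self.tokens = iter(tokens)

    def next_token(self) -> Optional[Token]:
        return next(self.tokens, None)


class BinaryParser:
    """Refers to a scanner instead of inheriting from one."""

    def __init__(self, scanner):
        self.scanner = scanner
        self.lookahead = scanner.next_token()

    def advance(self) -> Optional[Token]:
        token, self.lookahead = self.lookahead, self.scanner.next_token()
        return token

    def primary(self):
        token = self.advance()
        if token is None or token[0] == "binary":
            raise SyntaxError(f"expected an operand, got {token!r}")
        return int(token[1]) if token[0] == "number" else token[1]

    def expression(self):
        # Smalltalk binary messages have no precedence: they associate left to right.
        tree = self.primary()
        while self.lookahead is not None and self.lookahead[0] == "binary":
            selector = self.advance()[1]
            tree = (selector, tree, self.primary())
        return tree


print(BinaryParser(StringScanner("3 + 4 * 2")).expression())
# prints ('*', ('+', 3, 4), 2)
print(BinaryParser(ListScanner([("identifier", "x"), ("binary", "-"), ("number", "1")])).expression())
# prints ('-', 'x', 1)
```

The extra indirection is one method call per token, which fits the observation above that disk I/O, not parsing, tends to dominate.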
Henrik wrote:
In my view, the biggest problem is that optimization is applied as part of parsing, i.e., you can't get a parse tree without the transformations of ifTrue:ifFalse:, to:do:, etc. already applied to it. There should be a "pure parsing" switch somewhere; it shouldn't be too hard to add, but so far I've been able to work around the problem.
Back when I was working on my source code formatter, I cloned Parser and modified the clone so that it would not do these optimizations. I've been playing around with this again recently, so if you like I can make it, as well as my visitor framework, available with a better naming scheme than the one I used the first time around. (The old one is still at the address I mentioned long ago.)
Gotta run,
- les
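Les's visitor framework is not reproduced in this thread, so the following is only a generic sketch of the Visitor pattern over a parse tree, written in Python with hypothetical names (Node, Variable, Message, Visitor, SelectorCollector). Each node dispatches to a visit method named after its class; the base Visitor supplies a default traversal, and a client such as a formatter or lint checker overrides only the node types it cares about.

```python
from typing import Any, List


class Node:
    def accept(self, visitor: "Visitor") -> Any:
        # Double dispatch: call visit_<ClassName> on the visitor.
        return getattr(visitor, "visit_" + type(self).__name__)(self)


class Variable(Node):
    def __init__(self, name: str):
        self.name = name


class Message(Node):
    def __init__(self, receiver: Node, selector: str, arguments: List[Node]):
        self.receiver = receiver
        self.selector = selector
        self.arguments = arguments


class Visitor:
    """Default traversal: visit the children and do nothing else."""

    def visit_Variable(self, node: Variable) -> None:
        pass

    def visit_Message(self, node: Message) -> None:
        node.receiver.accept(self)
        for argument in node.arguments:
            argument.accept(self)


class SelectorCollector(Visitor):
    """Example client: record every selector sent, outermost first."""

    def __init__(self) -> None:
        self.selectors: List[str] = []

    def visit_Message(self, node: Message) -> None:
        self.selectors.append(node.selector)
        super().visit_Message(node)


# (a max: b) printString
tree = Message(Message(Variable("a"), "max:", [Variable("b")]), "printString", [])
collector = SelectorCollector()
tree.accept(collector)
print(collector.selectors)  # prints ['printString', 'max:']
```

Keeping traversal in the base class means a new analysis is one small subclass, and the parse-node classes never need to change.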
squeak-dev@lists.squeakfoundation.org