Hi,
I'm using SmaCC to parse a line-oriented file format. I'm only interested in some lines that begin with known keywords, and I want to ignore all other lines without knowing their format.
At the moment I have a scanner rule like this: File : ((InterestingLine | JunkLine) <newline>)* ;
but how do I define JunkLine to match anything BUT interesting lines?
Have a look at Pier (or SmallWiki); they do line-based parsing using SmaCC.
Also make sure that you are using a 3.9 image, because earlier versions had some bugs in Character that made it impossible to write a line-based parser with SmaCC.
Cheers, Lukas
2006/12/30, Lukas Renggli renggli@gmail.com:
> Have a look at Pier (or SmallWiki); they do line-based parsing using SmaCC.
Thanks, but in Pier's case there are simple patterns for all possible line beginnings, e.g. ++ matches a non-link.
In my case, if I want to match lines that don't begin with abc, I have to write ([^a] | a[^b] | ab[^c]) .* and I actually have 4 keywords, each about 8 chars long, so the pattern is already tedious to write...
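To make the tedium concrete, here is a minimal Python sketch (not SmaCC syntax) that generates the hand-expanded "does not begin with the keyword" pattern for a single keyword. The function name `not_prefixed_pattern` is hypothetical and only used for illustration. Note one assumption it exposes: the expansion does not cover lines that are shorter than the keyword and happen to be a prefix of it (for example "ab").

```python
import re


def not_prefixed_pattern(keyword: str) -> str:
    """Expand 'line does not start with keyword' into plain alternatives.

    For "abc" this yields ([^a]|a[^b]|ab[^c]).* which is the form quoted
    above. Lines that are a proper prefix of the keyword ("", "a", "ab")
    are NOT matched by this expansion.
    """
    alternatives = []
    for i, ch in enumerate(keyword):
        prefix = re.escape(keyword[:i])          # characters that still agree
        alternatives.append(f"{prefix}[^{re.escape(ch)}]")  # first mismatch
    return "(" + "|".join(alternatives) + ").*"


pattern = not_prefixed_pattern("abc")
junk = re.compile(pattern)

print(pattern)
print(bool(junk.fullmatch("abd something")))  # differs at the third character
print(bool(junk.fullmatch("abc something")))  # starts with the keyword
```

With several keywords that share leading characters, the alternatives have to be merged by hand (or generated from a trie), which is where the pattern becomes hard to maintain.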
Damien Pollet wrote:
> but how do I define JunkLine to match anything BUT interesting lines?
You should be able to define a priority between rules.
2006/12/30, Damien Cassou damien.cassou@laposte.net:
> > but how do I define JunkLine to match anything BUT interesting lines?
> You should be able to define a priority between rules.
I don't know how, besides ordering them. But does it really solve my problem, since the two rules still match (and the junk one possibly matches a longer input since it is much more generic)?
In fact I'm getting an error while compiling the parser: "A block compiles more than 1K bytes of code"
> > You should be able to define a priority between rules.
> I don't know how besides ordering them. But does it really solve my problem, since the two rules still match (and the junk one possibly matches a longer input since it is much more generic)?
Yes, rules defined first match first. As far as I have experienced, this is only true for the parser and not for the scanner, but I don't know the details.
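The concern about the generic rule matching a longer input can be illustrated with a small Python sketch. It assumes the convention used by many scanner generators: the longest match wins, and rule order only breaks ties. Whether SmaCC's scanner behaves exactly like this is an assumption here, not something established in this thread; `longest_match` and the two rule lists are hypothetical names.

```python
import re

NOT_EOL = r"[^\r\n]"


def longest_match(rules, text, pos=0):
    """Return (rule_name, lexeme) for the longest non-empty match at pos.

    Ties go to the rule listed first, because a later rule only replaces
    the current best when its match is strictly longer. Returns None when
    no rule matches at least one character.
    """
    best_name, best_end = None, pos
    for name, pattern in rules:
        m = pattern.match(text, pos)
        if m and m.end() > best_end:
            best_name, best_end = name, m.end()
    if best_name is None:
        return None
    return best_name, text[pos:best_end]


# Keyword rule covers only the keyword: the generic rule matches more and wins.
SHORT_KEYWORD_RULES = [
    ("keyword", re.compile(r"abc")),
    ("junk_line", re.compile(NOT_EOL + "+")),
]

# Keyword rule covers the whole line: both match equally, so the first wins.
WHOLE_LINE_RULES = [
    ("keyword_line", re.compile(r"abc" + NOT_EOL + "*")),
    ("junk_line", re.compile(NOT_EOL + "+")),
]

print(longest_match(SHORT_KEYWORD_RULES, "abc foo"))
print(longest_match(WHOLE_LINE_RULES, "abc foo"))
```

Under that assumption, ordering helps only when the interesting rule consumes as much input as the junk rule does.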
> In fact I'm getting an error while compiling the parser: "A block compiles more than 1K bytes of code"
This means that one of your scanner regular expressions is too complicated to be compiled into a single method. VisualWorks doesn't have this problem, as its methods work up to a couple of GB.
To come back to your original problem, I think that the following code should do what you request (not actually tried out):
Scanner:
    <newline> : \r \n | \n | \r ;
    <any> : [^\r\n]+ ;

Parser:
    Start : | Line | Start <newline> Line | Start <newline> ;
    Line : "abc" <any> { Transcript show: '--> '; show: '2' value; cr }
         | <any> { Transcript show: '1' value; cr } ;
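The intent of the grammar above (split the input into lines, then act only on lines that start with a known keyword) can be sketched in Python as follows. This is not a translation of what SmaCC generates; `KEYWORDS` and `classify_lines` are hypothetical names, and "abc" stands in for the real keywords, which are not given in this thread.

```python
import re

# Hypothetical keyword list; the thread only uses "abc" as a placeholder.
KEYWORDS = ("abc",)

# Same alternatives as the <newline> token: \r\n, \n, or \r.
NEWLINE = re.compile(r"\r\n|\n|\r")


def classify_lines(text, keywords=KEYWORDS):
    """Return (keyword, rest_of_line) for each line starting with a keyword.

    Every other line, including empty ones, is skipped without looking at
    its format.
    """
    interesting = []
    for line in NEWLINE.split(text):
        for kw in keywords:
            if line.startswith(kw):
                interesting.append((kw, line[len(kw):]))
                break  # first matching keyword wins
    return interesting


sample = "abc one\r\njunk\nabcd\r\rxyz abc"
print(classify_lines(sample))
```

Checking the keyword after the line has been isolated avoids having to write a "junk" pattern at all, which is the point of the suggestion.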
squeak-dev@lists.squeakfoundation.org