negative matching using Smacc

Damien Pollet

30 Dec 2006

5:56 a.m.

Hi,

I'm using SmaCC to parse a line-oriented file format. I'm only interested in some lines that begin with known keywords, and I want to ignore all other lines without knowing their format.

At the moment I have a scanner rule like this:

    File : ((InterestingLine | JunkLine) <newline>)* ;

but how do I define JunkLine to match anything BUT interesting lines?

-- Damien Pollet type less, do more [ | ] http://typo.cdlm.fasmz.org

Lukas Renggli

30 Dec

9:02 a.m.

> I'm using SmaCC to parse a line-oriented file format. I'm only interested in some lines that begin with known keywords, and I want to ignore all other lines without knowing their format.
>
> At the moment I have a scanner rule like this: File : ((InterestingLine | JunkLine) <newline>)* ;

Have a look at Pier (or SmallWiki); they do line-based parsing using SmaCC.

Also make sure that you are using a 3.9 image, because earlier versions had some bugs in Character that made it impossible to write a line-based parser with SmaCC.

Cheers, Lukas

-- Lukas Renggli http://www.lukas-renggli.ch

Damien Pollet

3:37 p.m.

2006/12/30, Lukas Renggli <renggli@gmail.com>:

> Have a look at Pier (or SmallWiki); they do line-based parsing using SmaCC.

Thanks, but in Pier's case there are simple patterns for all possible line beginnings, e.g. ++ matches a non-link.

In my case, if I want to match lines that don't begin with abc, I have to write:

    ([^a] | a[^b] | ab[^c]) .*

I actually have 4 keywords, each about 8 characters long, so the pattern is already tedious to write...
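The hand-written pattern above can be generated mechanically instead of typed out. The following is a minimal sketch in Python, not SmaCC: the function name `junk_line_pattern` is hypothetical, and the output uses Python `re` syntax (including `(?:...)` groups), so it would need translating before being pasted into a SmaCC scanner definition. Unlike the hand-written version, it also accepts lines that stop partway through a keyword, such as "ab".

```python
import re


def junk_line_pattern(keywords):
    """Return regex source matching any line that starts with none of `keywords`."""
    if "" in keywords:
        raise ValueError("an empty keyword would make every line interesting")
    # Keep only keywords that do not extend a shorter keyword.
    minimal = []
    for kw in sorted(set(keywords), key=len):
        if not any(kw.startswith(m) for m in minimal):
            minimal.append(kw)
    if not minimal:
        return ".*"
    # Map each proper prefix of a keyword to the characters that continue it.
    followers = {}
    for kw in minimal:
        for i in range(len(kw)):
            followers.setdefault(kw[:i], set()).add(kw[i])
    alternatives = []
    for prefix in sorted(followers, key=lambda p: (len(p), p)):
        excluded = "".join(re.escape(c) for c in sorted(followers[prefix]))
        # The line is exactly `prefix`, or it diverges right after `prefix`.
        alternatives.append(f"{re.escape(prefix)}(?:[^{excluded}].*)?")
    return "|".join(alternatives)
```

The result is meant to be applied to one whole line at a time, for example with `re.fullmatch(junk_line_pattern(["abc"]), line)`. The number of alternatives grows with the total length of the keywords, which is one reason a generated scanner for this pattern can become large.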

Damien Cassou

11:01 a.m.

Damien Pollet wrote:

> I'm using SmaCC to parse a line-oriented file format. I'm only interested in some lines that begin with known keywords, and I want to ignore all other lines without knowing their format.
>
> At the moment I have a scanner rule like this: File : ((InterestingLine | JunkLine) <newline>)* ;
>
> but how do I define JunkLine to match anything BUT interesting lines?

You should be able to define a priority between rules.

Damien Pollet

3:45 p.m.

2006/12/30, Damien Cassou <damien.cassou@laposte.net>:

>> but how do I define JunkLine to match anything BUT interesting lines?
>
> You should be able to define a priority between rules.

I don't know how, other than by ordering them. But does that really solve my problem, since the two rules still both match (and the junk one possibly matches a longer input, since it is much more generic)?
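The concern about the longer match can be made concrete with a small Python sketch. It assumes a scanner policy in which the longest match wins and ties go to the rule listed first; whether SmaCC's scanner follows exactly this policy is not established in this thread. The rule names and the `pick_rule` helper are hypothetical.

```python
import re


def pick_rule(rules, text):
    """Return (name, lexeme) for the winning rule at the start of `text`.

    Assumed policy: the longest match wins; on a tie, the earlier rule wins.
    """
    best = None
    for name, pattern in rules:
        m = re.match(pattern, text)
        # Strict comparison keeps the earlier rule when lengths are equal.
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best


# Both rules consume the whole line, so rule order decides.
whole_line_rules = [
    ("InterestingLine", r"abc[^\r\n]*"),
    ("JunkLine", r"[^\r\n]*"),
]

# The keyword rule is shorter than the generic rule, so the generic rule wins.
keyword_only_rules = [
    ("Keyword", r"abc"),
    ("JunkLine", r"[^\r\n]*"),
]
```

Under this assumed policy, ordering only helps when the interesting rule also consumes the whole line; a rule that matches just the keyword loses to the more generic junk rule.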

In fact I'm getting an error while compiling the parser: "A block compiles more than 1K bytes of code"

Lukas Renggli

31 Dec

10:52 a.m.

>> You should be able to define a priority between rules.
>
> I don't know how besides ordering them. But does it really solve my problem, since the two rules still match (and the junk one possibly matches a longer input since it is much more generic)?

Yes, rules defined first match first. In my experience this is only true for the parser and not for the scanner, but I don't know the details.

> In fact I'm getting an error while compiling the parser: "A block compiles more than 1K bytes of code"

This means that one of your scanner regular expressions is too complicated to be compiled into a single method. VisualWorks doesn't have this problem, as its methods can be up to a couple of GB in size.

To come back to your original problem, I think that the following code should do what you request (not actually tried out):

Scanner:

    <newline> : \r \n | \n | \r ;
    <any> : [^\r\n]+ ;

Parser:

    Start
        :
        | Line
        | Start <newline> Line
        | Start <newline> ;
    Line
        : "abc" <any> { Transcript show: '--> '; show: '2' value; cr }
        | <any> { Transcript show: '1' value; cr } ;
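The idea behind this grammar is to let the scanner accept every line generically and to decide which lines are interesting only afterwards. A minimal Python sketch of that same idea, outside SmaCC, follows; the `KEYWORDS` tuple and the `classify_lines` function are hypothetical names, and the sketch says nothing about how SmaCC itself resolves the "abc" token against <any>.

```python
import re

KEYWORDS = ("abc",)  # hypothetical keyword list; the thread mentions 4 real ones


def classify_lines(text, keywords=KEYWORDS):
    """Yield (kind, line) pairs, where kind is "interesting" or "junk"."""
    # Same line terminators as the <newline> token: \r\n, \n, or \r.
    for line in re.split(r"\r\n|\n|\r", text):
        if not line:
            continue  # <any> needs at least one character, so skip empty lines
        kind = "interesting" if line.startswith(tuple(keywords)) else "junk"
        yield kind, line
```

With this split, no negative pattern is needed at all: the junk case is simply whatever is left after the keyword check.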

squeak-dev@lists.squeakfoundation.org