Manual diff again ...
Best,
Christoph
==================== Summary ====================
Name: Regex-Core-ct.74
Author: ct
Time: 3 November 2021, 2:56:37.143587 pm
UUID: f1beae70-97a0-9e4f-9e5a-3235c0a6e43a
Ancestors: Regex-Core-mt.61
Adds support for Unicode backslash syntax in pieces and character sets.
Some examples:
    'Squeak is the perfect language' allRegexMatches: '\w*\u0061\w*'. "--> #('Squeak' 'language')"
    'Squeak is beautiful' allRegexMatches: '\w*\x75\w*'. "--> #('Squeak' 'beautiful')"
    (WebUtils jsonDecode: '"$1.00 = \u20AC0.86 = \u00A30.84"' readStream) allRegexMatches: '\p{Sc}\d+\.[\x31-\u{ar57}]+'. "--> #('€0.86' '£0.84')"
    'Carpe Squeak!' allRegexMatches: '\p{L}+'. "--> #('Carpe' 'Squeak')"
    (WebUtils jsonDecode: '" get rid of \u2007all these nonsense\nseparators"' readStream) allRegexMatches: '\P{Z}+'. "--> #('get' 'rid' 'of' 'all' 'these' 'nonsense
separators')"
This is a replacement for Regex-Core-ct.68 (which can be moved to the treated inbox) updated with support for the new syntax inside character sets, inspired by Regex-Core-tobe.62 (this is a counterproposal to Regex-Core-tobe.62). See Regex-Tests-Core-ct.28. The following changes have been made since Regex-Core-ct.68:
- Factored out common parser logic from RxParser and RxCharSetParser into new common superclass RxAbstractParser. Apart from deduplication, this is crucial to use #[uni]codePoint and #unicodeCategory specials in both parsers. (I also considered invoking another RxParser from RxCharSetParser but eventually found this solution more elegant.)
- Split up BackslashSpecials into BackslashPredicates (on RxAbstractParser) and BackslashConditions (only available on RxParser). Escaped uppercase characters (such as '\D') now automatically map to the negation of the lowercase special (see #backslashSpecial:). Deprecated RxsPredicate class >> #forEscapedLetter.
- Cleaned up & deduplicated RxCharSetParser to match the functional style of RxParser. RxCharSetParser now handles BackslashConstants itself.
- Made sure to parse escape characters at the end of a char set range, i.e., allow '[2-\x38]' asRegex and reject '[2-\d]' asRegex (as most other parsers do, too).
- Correctly maintain the source position in a RegexSyntaxError that was signaled while parsing a char set. See #testRegexSyntaxErrorPosition.
- Enabled and fixed matching against composed RxCharSets, which can happen now in the case of a pattern like '[\P{L}a]' asRegex.
* Honor case-(in)sensitive matching in nested char sets by appending a #IgnoringCase: argument to #predicate[s|Negation|PartPredicate] on RxsCharSet resp. RxsPredicate.
- Added support for Squeak-style codepoints such as '\x{2r100000}' asRegex matches: ' '.
- Removed superfluous spaces from error messages.
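
A few of the changes listed above can be tried out in a workspace. This is only a sketch: it assumes that this version and its prerequisites are loaded, and it prints the results to the Transcript instead of claiming them here. The test strings ('a1b2', '5') are made up for illustration.

```smalltalk
"Workspace sketch; assumes Regex-Core-ct.74 and its prerequisites are loaded."
| rejected |
"Escaped uppercase letters map to the negation of the lowercase special."
Transcript show: ('a1b2' allRegexMatches: '\D') printString; cr.
"An escaped code point may end a char set range (\x38 is the digit 8) ..."
Transcript show: ('5' matchesRegex: '[2-\x38]') printString; cr.
"... whereas a predicate such as \d at the end of a range should be rejected."
rejected := ['[2-\d]' asRegex. false]
	on: RegexSyntaxError
	do: [:ex | ex return: true].
Transcript show: rejected printString; cr.
"Squeak-style radix notation inside braces (2r100000 = 32, the space character)."
Transcript show: (' ' matchesRegex: '\x{2r100000}') printString; cr.
```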
Reuploaded once more, superseding Regex-Core-ct.71, to fix a small slip in RxsCharSet >> #predicatePartPredicateIgnoringCase:. Thanks to Tom (tobe) for testing! :-)
Requires Kernel-ct.1419 (NumberParser >> #defaultBase:) and Multilingual-ct.259 (Unicode class >> #generalTagOf:).
=============== Diff against Regex-Core-mt.61 ===============
RxAbstractParser
+ Object subclass: #RxAbstractParser
+     instanceVariableNames: 'source lookahead'
+     classVariableNames: 'BackslashConstants BackslashPredicates'
+     poolDictionaries: ''
+     category: 'Regex-Core'
+
+ RxAbstractParser class
+     instanceVariableNames: ''
+
+ "I provide general parsing facilities for all kinds of regex parsers.
+
+ Instance variables:
+     source        <Stream> A stream with the expression being parsed.
+     lookahead    <Character>    The current lookahead character."
RxAbstractParser class>>doShiftingSyntaxExceptionPositions:from: {exception signaling} · ct 10/28/2021 03:18
+ doShiftingSyntaxExceptionPositions: aBlock from: start
+     "When invoking a nested parser, make sure to update the positions of any syntax exception raised by this nested parser."
+     ^ aBlock
+         on: RegexSyntaxError
+         do: [:ex | ex resignalAs: (ex copy
+             position: ex position + start - 1;
+             yourself)]
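
To observe the effect of the position shifting, one could catch the error raised for a bad range inside a char set and inspect its position. This is a hedged workspace sketch: the pattern 'ab[z-a]' is an arbitrary example, and the exact position value is deliberately not asserted here (see #testRegexSyntaxErrorPosition for the expected values).

```smalltalk
"Workspace sketch; assumes this version is loaded. The pattern is illustrative."
| position |
position := ['ab[z-a]' asRegex. nil]
	on: RegexSyntaxError
	do: [:ex | ex return: ex position].
"With the shifting in place, the position should refer to the whole pattern,
not to the char set specification that the nested parser saw."
Transcript show: position printString; cr.
```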
RxAbstractParser class>>initialize {class initialization} · ct 10/27/2021 23:30
+ initialize
+     "self initialize"
+     self
+         initializeBackslashConstants;
+         initializeBackslashPredicates
RxAbstractParser class>>initializeBackslashConstants {class initialization} · ct 10/27/2021 07:50
+ initializeBackslashConstants
+     "self initializeBackslashConstants"
+
+     (BackslashConstants := Dictionary new)
+         at: $e put: Character escape;
+         at: $n put: Character lf;
+         at: $r put: Character cr;
+         at: $f put: Character newPage;
+         at: $t put: Character tab
RxAbstractParser class>>initializeBackslashPredicates {class initialization} · ct 10/27/2021 20:57
+ initializeBackslashPredicates
+     "The keys are characters that normally follow a $\, the values are either associations of classes and initialization selectors on their instance side, or evaluables that will be evaluated on the current parser instance."
+     "self initializeBackslashPredicates"
+
+     (BackslashPredicates := Dictionary new)
+         at: $d put: RxsPredicate -> #beDigit;
+         at: $p put: #unicodeCategory;
+         at: $s put: RxsPredicate -> #beSpace;
+         at: $u put: #unicodePoint;
+         at: $w put: RxsPredicate -> #beWordConstituent;
+         at: $x put: #codePoint.
RxAbstractParser class>>signalSyntaxException: {exception signaling} · avi 11/30/2003 13:25
+ signalSyntaxException: errorString
+     RegexSyntaxError new signal: errorString
RxAbstractParser class>>signalSyntaxException:at: {exception signaling} · CamilloBruni 10/7/2012 22:50
+ signalSyntaxException: errorString at: errorPosition
+     RegexSyntaxError signal: errorString at: errorPosition
RxAbstractParser>>backslashConstant {parsing} · ct 10/27/2021 07:48
+ backslashConstant
+
+     | character node |
+     character := BackslashConstants at: lookahead ifAbsent: [^ nil].
+     self next.
+     node := RxsCharacter with: character.
+     ^ node
RxAbstractParser>>backslashNode {parsing} · ct 10/28/2021 03:09
+ backslashNode
+
+     | char |
+     lookahead ifNil: [ self signalParseError: 'bad quotation' ].
+
+     self basicBackslashNode ifNotNil: [:node | ^node].
+
+     char := lookahead.
+     self next.
+     ^ RxsCharacter with: char
RxAbstractParser>>backslashPredicate {parsing} · ct 10/27/2021 07:49
+ backslashPredicate
+
+     ^ self backslashSpecial: BackslashPredicates
RxAbstractParser>>backslashSpecial: {private} · ct 10/28/2021 02:56
+ backslashSpecial: specials
+
+     | negate specialSelector node |
+     negate := false.
+     specialSelector := specials at: lookahead ifAbsent: [
+         (lookahead isLetter and: [lookahead isUppercase]) ifTrue: [
+             negate := true.
+             specialSelector := specials at: lookahead asLowercase ifAbsent: []].
+         specialSelector ifNil: [^ nil]].
+     self next.
+
+     node := specialSelector isVariableBinding
+         ifTrue: [specialSelector key new perform: specialSelector value]
+         ifFalse: [specialSelector value: self].
+     negate ifTrue: [node := node negated].
+     ^ node
RxAbstractParser>>basicBackslashNode {parsing} · ct 10/28/2021 03:03
+ basicBackslashNode
+
+     self backslashConstant ifNotNil: [:node | ^ node].
+     self backslashPredicate ifNotNil: [:node | ^ node].
+     ^ nil
RxAbstractParser>>codePoint {parsing} · ct 10/27/2021 20:48
+ codePoint
+
+     ^ self codePoint: 2
RxAbstractParser>>codePoint: {parsing} · ct 10/27/2021 22:47
+ codePoint: size
+     "Matches a character that has the given codepoint with the specified size of hex digits, unless braced.
+     <codePoint> ::= \x ({<hex>} '|' <hex>[size])"
+
+     | braced codeString codePoint codeStream |
+     braced := self tryMatch: ${.
+     codeString := braced
+         ifFalse: [self
+             input: size
+             errorMessage: 'invalid codepoint']
+         ifTrue: [self
+             inputUpTo: $}
+             errorMessage: 'no terminating "}"'].
+
+     codeStream := codeString readStream.
+     codePoint := ((ExtendedNumberParser on: codeStream)
+         defaultBase: 16;
+         nextInteger) ifNil: [
+             self signalParseError: 'invalid codepoint'].
+     codeStream atEnd ifFalse: [
+         self signalParseError: 'invalid codepoint'].
+
+     braced ifTrue: [
+         self match: $}].
+
+     ^ RxsCharacter with: (Character codePoint: codePoint)
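
The accepted notations can be compared side by side in a workspace. The snippet below is a sketch under the assumption that this version is loaded. All patterns are meant to denote the letter A (code point 65, hex 41), using the fixed-width forms (\x with two digits, \u with four), the braced forms, and the Squeak-style radix notation. The pattern list is illustrative and not taken from the tests.

```smalltalk
"Workspace sketch; assumes this version is loaded. Each pattern should denote $A."
#('\x41' '\x{41}' '\u0041' '\u{41}' '\x{2r1000001}') do: [:pattern |
	Transcript
		show: pattern;
		show: ' -> ';
		show: ('A' matchesRegex: pattern) printString;
		cr].
```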
RxAbstractParser>>initialize: {initialize-release} · ct 10/27/2021 07:24
+ initialize: aStream
+
+     source := aStream.
+     self next.
RxAbstractParser>>input:errorMessage: {private} · ct 10/27/2021 20:52
+ input: anInteger errorMessage: aString
+     "Accumulate anInteger characters from the input stream and answer them as a String. Raise an error with the specified message if there are not enough characters available."
+
+     | accumulator |
+     accumulator := WriteStream on: (String new: 20).
+     anInteger timesRepeat: [
+         lookahead ifNil: [self signalParseError: aString].
+         accumulator nextPut: lookahead.
+         self next].
+     ^ accumulator contents
RxAbstractParser>>inputUpTo:errorMessage: {private} · ul 9/24/2015 08:25
+ inputUpTo: aCharacter errorMessage: aString
+     "Accumulate input stream until <aCharacter> is encountered
+     and answer the accumulated chars as String, not including
+     <aCharacter>. Signal error if end of stream is encountered,
+     passing <aString> as the error description."
+
+     | accumulator |
+     accumulator := WriteStream on: (String new: 20).
+     [ lookahead == aCharacter or: [lookahead == nil ] ]
+         whileFalse: [
+             accumulator nextPut: lookahead.
+             self next].
+     lookahead ifNil: [ self signalParseError: aString ].
+     ^accumulator contents
RxAbstractParser>>inputUpToAny:errorMessage: {private} · ul 9/24/2015 08:24
+ inputUpToAny: aDelimiterString errorMessage: aString
+     "Accumulate input stream until any character from <aDelimiterString> is encountered
+     and answer the accumulated chars as String, not including the matched characters from the
+     <aDelimiterString>. Signal error if end of stream is encountered,
+     passing <aString> as the error description."
+
+     | accumulator |
+     accumulator := WriteStream on: (String new: 20).
+     [ lookahead == nil or: [ aDelimiterString includes: lookahead ] ]
+         whileFalse: [
+             accumulator nextPut: lookahead.
+             self next ].
+     lookahead ifNil: [ self signalParseError: aString ].
+     ^accumulator contents
RxAbstractParser>>match: {parsing} · ct 10/27/2021 22:37
+ match: aCharacter
+     "<aCharacter> MUST match the current lookahead. If this is the case, advance the input. Otherwise, blow up."
+
+     aCharacter = lookahead ifTrue: [ ^self next ].
+     self signalParseError: (lookahead
+         ifNil: ['unexpected end']
+         ifNotNil: ['unexpected character: ', lookahead asString])
RxAbstractParser>>next {private} · ct 10/27/2021 07:15
+ next
+
+     ^ lookahead := source next
RxAbstractParser>>signalParseError {private} · ct 10/27/2021 07:16
+ signalParseError
+
+     self class
+         signalSyntaxException: 'Regex syntax error'
+         at: source position
RxAbstractParser>>signalParseError: {private} · ct 10/27/2021 07:16
+ signalParseError: aString
+
+     self class signalSyntaxException: aString at: source position
RxAbstractParser>>tryMatch: {private} · ct 8/23/2021 21:01
+ tryMatch: aCharacter
+     "If <aCharacter> matches the current lookahead, advance the input. Answer whether it matched."
+
+     ^ lookahead == aCharacter
+         ifTrue: [self next];
+         yourself
RxAbstractParser>>unicodeCategory {parsing} · ct 10/27/2021 22:01
+ unicodeCategory
+     "Matches a character that belongs to the given unicode category.
+     <unicodeCategory> ::= \p '{' <categoryName> '}'"
+
+     | category |
+     self match: ${.
+     category := self inputUpTo: $} errorMessage: 'no terminating "}"'.
+     self match: $}.
+     ^ RxsPredicate new beUnicodeCategory: category
RxAbstractParser>>unicodePoint {parsing} · ct 10/27/2021 20:49
+ unicodePoint
+
+     ^ self codePoint: 4
RxCharSetParser (changed)
- Object subclass: #RxCharSetParser
-     instanceVariableNames: 'source lookahead elements'
+ RxAbstractParser subclass: #RxCharSetParser
+     instanceVariableNames: 'elements'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Regex-Core'
RxCharSetParser class
    instanceVariableNames: ''
"-- Regular Expression Matcher v 1.1 (C) 1996, 1999 Vassili Bykov
--
I am a parser created to parse the insides of a character set ([...]) construct. I create and answer a collection of "elements", each being an instance of one of: RxsCharacter, RxsRange, or RxsPredicate.
Instance Variables:
-     source    <Stream>    open on whatever is inside the square brackets we have to parse.
-     lookahead    <Character>    The current lookahead character
    elements    <Collection of: <RxsCharacter|RxsRange|RxsPredicate>> Parsing result"
RxCharSetParser>>add: {parsing} · ct 10/28/2021 02:24
+ add: nodeOrNodes
+
+     nodeOrNodes isCollection
+         ifFalse: [elements add: nodeOrNodes]
+         ifTrue: [elements addAll: nodeOrNodes]
RxCharSetParser>>addChar: {parsing} · vb 4/11/09 21:56 (removed)
- addChar: aChar
-
-     elements add: (RxsCharacter with: aChar)
RxCharSetParser>>addRangeFrom:to: {parsing} · CamilloBruni 10/7/2012 22:52 (removed)
- addRangeFrom: firstChar to: lastChar
-
-     firstChar asInteger > lastChar asInteger ifTrue:
-         [RxParser signalSyntaxException: ' bad character range' at: source position].
-     elements add: (RxsRange from: firstChar to: lastChar)
RxCharSetParser>>char {parsing} · ct 10/28/2021 03:06
+ char
+
+     | char |
+     lookahead == $\ ifTrue:
+         [self match: $\.
+         ^self backslashNode
+             ifNil: [RxsCharacter with: lookahead]].
+
+     char := RxsCharacter with: lookahead.
+     self next.
+     ^char
RxCharSetParser>>char: {parsing} · ct 10/28/2021 02:20
+ char: aCharacter
+
+     ^ RxsCharacter with: aCharacter
RxCharSetParser>>charOrRange {parsing} · ct 10/28/2021 02:50
+ charOrRange
+
+     | firstChar lastChar |
+     firstChar := self char.
+     lookahead == $- ifFalse:
+         [^firstChar].
+
+     self next ifNil:
+         [^{firstChar. self char: $-}].
+
+     lastChar := self char.
+     firstChar isRegexCharacter ifFalse:
+         [self signalParseError: 'range must start with a single character'].
+     lastChar isRegexCharacter ifFalse:
+         [self signalParseError: 'range must end with a single character'].
+     ^self rangeFrom: firstChar character to: lastChar character
RxCharSetParser>>element {parsing} · ct 10/28/2021 02:48
+ element
+
+     (lookahead == $[ and: [source peek == $:]) ifTrue:
+         [^self namedSet].
+     ^self charOrRange
RxCharSetParser>>initialize: {initialize-release} · ct 10/27/2021 07:24 (changed)
initialize: aStream
-     source := aStream.
-     lookahead := aStream next.
+     super initialize: aStream.
    elements := OrderedCollection new
RxCharSetParser>>match: {parsing} · ul 5/24/2015 22:01 (removed)
- match: aCharacter
-
-     aCharacter = lookahead ifTrue: [ ^self next ].
-     RxParser
-         signalSyntaxException: 'unexpected character: ', (String with: lookahead)
-         at: source position
RxCharSetParser>>namedSet {parsing} · ct 10/28/2021 02:19
+ namedSet
+
+     | name |
+     self match: $[; match: $:.
+     name := (String with: lookahead), (source upTo: $:).
+     self next.
+     self match: $].
+     ^ RxsPredicate forNamedClass: name
RxCharSetParser>>next {parsing} · ul 5/24/2015 21:19 (removed)
- next
-
-     ^lookahead := source next
RxCharSetParser>>parse {accessing} · ct 10/28/2021 02:49 (changed)
parse
-     lookahead == $- ifTrue: [
-         self addChar: $-.
-         self next ].
-     [ lookahead == nil ] whileFalse: [ self parseStep ].
+     [ lookahead == nil ] whileFalse: [ self add: self element ].
    ^elements
RxCharSetParser>>parseCharOrRange {parsing} · ul 5/24/2015 21:20 (removed)
- parseCharOrRange
-
-     | firstChar |
-     firstChar := lookahead.
-     self next == $- ifFalse: [ ^self addChar: firstChar ].
-     self next ifNil: [ ^self addChar: firstChar; addChar: $- ].
-     self addRangeFrom: firstChar to: lookahead.
-     self next
RxCharSetParser>>parseEscapeChar {parsing} · tobe 8/12/2021 08:56 (removed)
- parseEscapeChar
-
-     | first |
-     self match: $\.
-     first := (RxsPredicate forEscapedLetter: lookahead)
-         ifNil: [ RxsCharacter with: lookahead ].
-     self next == $- ifFalse: [^ elements add: first].
-     self next ifNil: [
-         elements add: first.
-         ^ self addChar: $-].
-     self addRangeFrom: first character to: lookahead.
-     self next
RxCharSetParser>>parseNamedSet {parsing} · ul 5/24/2015 22:00 (removed)
- parseNamedSet
-
-     | name |
-     self match: $[; match: $:.
-     name := (String with: lookahead), (source upTo: $:).
-     self next.
-     self match: $].
-     elements add: (RxsPredicate forNamedClass: name)
RxCharSetParser>>parseStep {parsing} · ul 5/24/2015 21:14 (removed)
- parseStep
-
-     lookahead == $[ ifTrue:
-         [source peek == $:
-             ifTrue: [^self parseNamedSet]
-             ifFalse: [^self parseCharOrRange]].
-     lookahead == $\ ifTrue:
-         [^self parseEscapeChar].
-     lookahead == $- ifTrue:
-         [RxParser signalSyntaxException: 'invalid range' at: source position].
-     self parseCharOrRange
RxCharSetParser>>rangeFrom:to: {parsing} · ct 10/28/2021 02:20
+ rangeFrom: firstChar to: lastChar
+
+     firstChar asInteger > lastChar asInteger ifTrue:
+         [self signalParseError: 'bad character range'].
+     ^ RxsRange from: firstChar to: lastChar
RxMatchOptimizer>>syntaxCharSet: {double dispatch} · ct 10/27/2021 08:55 (changed)
syntaxCharSet: charSetNode
    "All these (or none of these) characters is the prefix."
    (charSetNode enumerableSetIgnoringCase: ignoreCase) ifNotNil: [ :enumerableSet |
        charSetNode isNegated
            ifTrue: [ self addNonPrefixes: enumerableSet ]
            ifFalse: [ self addPrefixes: enumerableSet ] ].
-     charSetNode predicates ifNotNil: [ :charsetPredicates |
+     (charSetNode predicatesIgnoringCase: ignoreCase) ifNotNil: [ :charsetPredicates |
        charSetNode isNegated
            ifTrue: [
                charsetPredicates do: [ :each | self addNonPredicate: each ] ]
            ifFalse: [
                charsetPredicates do: [ :each | self addPredicate: each ] ] ]
RxMatchOptimizer>>syntaxPredicate: {double dispatch} · ct 10/27/2021 08:54 (changed)
syntaxPredicate: predicateNode
-     self addPredicate: predicateNode predicate
+     self addPredicate: (predicateNode predicateIgnoringCase: ignoreCase)
RxMatcher>>syntaxPredicate: {double dispatch} · ct 10/27/2021 08:54 (changed)
syntaxPredicate: predicateNode
    "Double dispatch from the syntax tree.
    A character set is a few characters, and we either match any of them,
    or match any that is not one of them."
-     ^RxmPredicate with: predicateNode predicate
+     ^RxmPredicate with: (predicateNode predicateIgnoringCase: ignoreCase)
RxParser (changed)
- Object subclass: #RxParser
-     instanceVariableNames: 'input lookahead'
-     classVariableNames: 'BackslashConstants BackslashSpecials'
+ RxAbstractParser subclass: #RxParser
+     instanceVariableNames: ''
+     classVariableNames: 'BackslashConditions'
    poolDictionaries: ''
    category: 'Regex-Core'
RxParser class
    instanceVariableNames: ''
"-- Regular Expression Matcher v 1.1 (C) 1996, 1999 Vassili Bykov
--
- The regular expression parser. Translates a regular expression read from a stream into a parse tree. ('accessing' protocol). The tree can later be passed to a matcher initialization method. All other classes in this category implement the tree. Refer to their comments for any details.
-
- Instance variables:
-     input        <Stream> A stream with the regular expression being parsed.
-     lookahead    <Character>"
+ The regular expression parser. Translates a regular expression read from a stream into a parse tree. ('accessing' protocol). The tree can later be passed to a matcher initialization method. All other classes in this category implement the tree. Refer to their comments for any details."
RxParser class>>initialize {class initialization} · ct 10/27/2021 07:50 (changed)
initialize
    "self initialize"
-     self
-         initializeBackslashConstants;
-         initializeBackslashSpecials
+     self initializeBackslashConditions
RxParser class>>initializeBackslashConditions {class initialization} · ct 10/27/2021 08:17
+ initializeBackslashConditions
+     "The keys are characters that normally follow a $\, the values are either associations of classes and initialization selectors on their instance side, or evaluables that will be evaluated on the current parser instance."
+     "self initializeBackslashConditions"
+
+     (BackslashConditions := Dictionary new)
+         at: $b put: RxsContextCondition -> #beWordBoundary;
+         at: $B put: RxsContextCondition -> #beNonWordBoundary;
+         at: $< put: RxsContextCondition -> #beBeginningOfWord;
+         at: $> put: RxsContextCondition -> #beEndOfWord.
RxParser class>>initializeBackslashConstants {class initialization} · lr 11/4/2009 22:14 (removed)
- initializeBackslashConstants
-     "self initializeBackslashConstants"
-
-     (BackslashConstants := Dictionary new)
-         at: $e put: Character escape;
-         at: $n put: Character lf;
-         at: $r put: Character cr;
-         at: $f put: Character newPage;
-         at: $t put: Character tab
RxParser class>>initializeBackslashSpecials {class initialization} · vb 4/11/09 21:56 (removed)
- initializeBackslashSpecials
-     "Keys are characters that normally follow a \, the values are
-     associations of classes and initialization selectors on the instance side
-     of the classes."
-     "self initializeBackslashSpecials"
-
-     (BackslashSpecials := Dictionary new)
-         at: $w put: (Association key: RxsPredicate value: #beWordConstituent);
-         at: $W put: (Association key: RxsPredicate value: #beNotWordConstituent);
-         at: $s put: (Association key: RxsPredicate value: #beSpace);
-         at: $S put: (Association key: RxsPredicate value: #beNotSpace);
-         at: $d put: (Association key: RxsPredicate value: #beDigit);
-         at: $D put: (Association key: RxsPredicate value: #beNotDigit);
-         at: $b put: (Association key: RxsContextCondition value: #beWordBoundary);
-         at: $B put: (Association key: RxsContextCondition value: #beNonWordBoundary);
-         at: $< put: (Association key: RxsContextCondition value: #beBeginningOfWord);
-         at: $> put: (Association key: RxsContextCondition value: #beEndOfWord)
RxParser class>>signalSyntaxException: {exception signaling} · avi 11/30/2003 13:25 (removed)
- signalSyntaxException: errorString
-     RegexSyntaxError new signal: errorString
RxParser class>>signalSyntaxException:at: {exception signaling} · CamilloBruni 10/7/2012 22:50 (removed)
- signalSyntaxException: errorString at: errorPosition
-     RegexSyntaxError signal: errorString at: errorPosition
RxParser>>atom {recursive descent} · ct 10/28/2021 03:03 (changed)
atom
    "An atom is one of a lot of possibilities, see below."
    | atom |
    (lookahead == nil
    or: [ lookahead == $|
    or: [ lookahead == $)
    or: [ lookahead == $*
    or: [ lookahead == $+
    or: [ lookahead == $? ]]]]])
        ifTrue: [ ^RxsEpsilon new ].

    lookahead == $(
        ifTrue: [
            "<atom> ::= '(' <regex> ')' "
            self match: $(.
            atom := self regex.
            self match: $).
            ^atom ].

    lookahead == $[
        ifTrue: [
            "<atom> ::= '[' <characterSet> ']' "
            self match: $[.
            atom := self characterSet.
            self match: $].
            ^atom ].

    lookahead == $:
        ifTrue: [
            "<atom> ::= ':' <messagePredicate> ':' "
            self match: $:.
            atom := self messagePredicate.
            self match: $:.
            ^atom ].

    lookahead == $.
        ifTrue: [
            "any non-whitespace character"
            self next.
            ^RxsContextCondition new beAny].

    lookahead == $^
        ifTrue: [
            "beginning of line condition"
            self next.
            ^RxsContextCondition new beBeginningOfLine].

    lookahead == $$
        ifTrue: [
            "end of line condition"
            self next.
            ^RxsContextCondition new beEndOfLine].

    lookahead == $\
        ifTrue: [
-             "<atom> ::= '\' <character>"
-             self next ifNil: [ self signalParseError: 'bad quotation' ].
-             (BackslashConstants includesKey: lookahead) ifTrue: [
-                 atom := RxsCharacter with: (BackslashConstants at: lookahead).
-                 self next.
-                 ^atom].
-             self ifSpecial: lookahead
-                 then: [:node | self next. ^node]].
-
+             "<atom> ::= '\' <node>"
+             self match: $\.
+             ^self backslashNode].
+
    "If passed through the above, the following is a regular character."
    atom := RxsCharacter with: lookahead.
    self next.
    ^atom
RxParser>>backslashCondition {recursive descent} · ct 10/27/2021 07:38
+ backslashCondition
+
+     ^ self backslashSpecial: BackslashConditions
RxParser>>basicBackslashNode {recursive descent} · ct 10/28/2021 03:03
+ basicBackslashNode
+
+     ^ super basicBackslashNode ifNil: [self backslashCondition]
RxParser>>characterSet {recursive descent} · ct 10/28/2021 03:19 (changed)
characterSet
    "Match a range of characters: something between `[' and `]'.
    Opening bracked has already been seen, and closing should
    not be consumed as well. Set spec is as usual for
    sets in regexes."
-     | spec errorMessage |
-     errorMessage := ' no terminating "]"'.
+     | start spec errorMessage |
+     errorMessage := 'no terminating "]"'.
+     start := source position.
    spec := self inputUpTo: $] nestedOn: $[ errorMessage: errorMessage.
    (spec isEmpty
    or: [spec = '^'])
        ifTrue: [
            "This ']' was literal."
            self next.
            spec := spec, ']', (self inputUpTo: $] nestedOn: $[ errorMessage: errorMessage)].
-     ^self characterSetFrom: spec
+     ^self class
+         doShiftingSyntaxExceptionPositions: [self characterSetFrom: spec]
+         from: start
RxParser>>ifSpecial:then: {private} · vb 4/11/09 21:56 (removed)
- ifSpecial: aCharacter then: aBlock
-     "If the character is such that it defines a special node when follows a $\,
-     then create that node and evaluate aBlock with the node as the parameter.
-     Otherwise just return."
-
-     | classAndSelector |
-     classAndSelector := BackslashSpecials at: aCharacter ifAbsent: [^self].
-     ^aBlock value: (classAndSelector key new perform: classAndSelector value)
RxParser>>inputUpTo:errorMessage: {private} · ul 9/24/2015 08:25 (removed)
- inputUpTo: aCharacter errorMessage: aString
-     "Accumulate input stream until <aCharacter> is encountered
-     and answer the accumulated chars as String, not including
-     <aCharacter>. Signal error if end of stream is encountered,
-     passing <aString> as the error description."
-
-     | accumulator |
-     accumulator := WriteStream on: (String new: 20).
-     [ lookahead == aCharacter or: [lookahead == nil ] ]
-         whileFalse: [
-             accumulator nextPut: lookahead.
-             self next].
-     lookahead ifNil: [ self signalParseError: aString ].
-     ^accumulator contents
RxParser>>inputUpTo:nestedOn:errorMessage: {private} · ct 10/27/2021 08:06 (changed)
inputUpTo: aCharacter nestedOn: anotherCharacter errorMessage: aString
-     "Accumulate input stream until <aCharacter> is encountered
-     and answer the accumulated chars as String, not including
-     <aCharacter>. Signal error if end of stream is encountered,
-     passing <aString> as the error description."
+     "Accumulate input stream until <aCharacter> is encountered without escaping and answer the accumulated chars as String, not including <aCharacter>. Signal error if end of stream is encountered, passing <aString> as the error description."
    | accumulator nestLevel |
    accumulator := WriteStream on: (String new: 20).
    nestLevel := 0.
    [ lookahead == aCharacter and: [ nestLevel = 0 ] ] whileFalse: [
        lookahead ifNil: [ self signalParseError: aString ].
        lookahead == $\
            ifTrue: [
                self next ifNil: [ self signalParseError: aString ].
-                 BackslashConstants
-                     at: lookahead
-                     ifPresent: [ :unescapedCharacter | accumulator nextPut: unescapedCharacter ]
-                     ifAbsent: [
-                         accumulator
-                             nextPut: $\;
-                             nextPut: lookahead ] ]
+                 accumulator
+                     nextPut: $\;
+                     nextPut: lookahead ]
            ifFalse: [
                accumulator nextPut: lookahead.
                lookahead == anotherCharacter ifTrue: [ nestLevel := nestLevel + 1 ].
                lookahead == aCharacter ifTrue: [ nestLevel := nestLevel - 1 ] ].
        self next ].
    ^accumulator contents
RxParser>>inputUpToAny:errorMessage: {private} · ul 9/24/2015 08:24 (removed)
- inputUpToAny: aDelimiterString errorMessage: aString
-     "Accumulate input stream until any character from <aDelimiterString> is encountered
-     and answer the accumulated chars as String, not including the matched characters from the
-     <aDelimiterString>. Signal error if end of stream is encountered,
-     passing <aString> as the error description."
-
-     | accumulator |
-     accumulator := WriteStream on: (String new: 20).
-     [ lookahead == nil or: [ aDelimiterString includes: lookahead ] ]
-         whileFalse: [
-             accumulator nextPut: lookahead.
-             self next ].
-     lookahead ifNil: [ self signalParseError: aString ].
-     ^accumulator contents
RxParser>>match: {private} · ul 5/16/2015 01:51 (removed)
- match: aCharacter
-     "<aCharacter> MUST match the current lookeahead.
-     If this is the case, advance the input. Otherwise, blow up."
-
-     aCharacter == lookahead ifFalse: [ ^self signalParseError ]. "does not return"
-     self next
RxParser>>messagePredicate {recursive descent} 路 ct 10/27/2021 22:01 (changed)
messagePredicate
聽聽聽聽"Match a message predicate specification: a selector (presumably
聽聽聽聽understood by a Character) enclosed in :'s ."
聽聽聽聽| spec negated |
- 聽聽聽聽spec := self inputUpTo: $: errorMessage: ' no terminating ":"'.
+ 聽聽聽聽spec := self inputUpTo: $: errorMessage: 'no terminating ":"'.
+ 聽聽聽聽spec ifEmpty: [self signalParseError ].
聽聽聽聽negated := false.
聽聽聽聽spec first = $^
聽聽聽聽聽聽聽聽ifTrue: [
聽聽聽聽聽聽聽聽聽聽聽聽negated := true.
聽聽聽聽聽聽聽聽聽聽聽聽spec := spec copyFrom: 2 to: spec size].
聽聽聽聽^RxsMessagePredicate new
聽聽聽聽聽聽聽聽initializeSelector: spec asSymbol
聽聽聽聽聽聽聽聽negated: negated
RxParser>>next {private} 路 ul 9/25/2015 10:02 (removed)
- next
- 聽聽聽聽"Advance the input storing the just read character
- 聽聽聽聽as the lookahead."
-
- 聽聽聽聽^lookahead := input next
RxParser>>parseStream: {accessing} · ct 10/27/2021 07:24 (changed)
parseStream: aStream
    "Parse an input from a character stream <aStream>.
    On success, answers an RxsRegex -- parse tree root.
    On error, raises `RxParser syntaxErrorSignal' with the current
    input stream position as the parameter."
    | tree |
-     input := aStream.
-     self next.
+     self initialize: aStream.
    tree := self regex.
    self match: nil.
    ^tree
RxParser>>quantifiedAtom: {recursive descent} · ct 10/27/2021 22:01 (changed)
quantifiedAtom: atom
    "Parse a quanitifer expression which can have one of the following forms
        {<min>,<max>} match <min> to <max> occurences
        {<minmax>} which is the same as with repeated limits: {<number>,<number>}
        {<min>,} match at least <min> occurences
        {,<max>} match maximally <max> occurences, which is the same as {0,<max>}"
    | min max |
    self next.
    lookahead == $,
        ifTrue: [ min := 0 ]
        ifFalse: [
-             max := min := (self inputUpToAny: ',}' errorMessage: ' no terminating "}"') asUnsignedInteger ].
+             max := min := (self inputUpToAny: ',}' errorMessage: 'no terminating "}"') asUnsignedInteger ].
    lookahead == $,
        ifTrue: [
            self next.
-             max := (self inputUpToAny: ',}' errorMessage: ' no terminating "}"') asUnsignedInteger ].
+             max := (self inputUpToAny: ',}' errorMessage: 'no terminating "}"') asUnsignedInteger ].
    self match: $}.
    atom isNullable
        ifTrue: [ self signalNullableClosureParserError ].
    (max notNil and: [ max < min ])
        ifTrue: [ self signalParseError: ('wrong quantifier, expected ', min asString, ' <= ', max asString) ].
    ^ RxsPiece new
        initializeAtom: atom
        min: min
        max: max
RxParser>>signalNullableClosureParserError {private} · ct 10/27/2021 22:00 (changed)
signalNullableClosureParserError
-     self signalParseError: ' nullable closure'.
+     self signalParseError: 'nullable closure'.
RxParser>>signalParseError {private} · CamilloBruni 10/7/2012 22:50 (removed)
- signalParseError
-
-     self class
-         signalSyntaxException: 'Regex syntax error' at: input position
RxParser>>signalParseError: {private} · CamilloBruni 10/7/2012 22:49 (removed)
- signalParseError: aString
-
-     self class signalSyntaxException: aString at: input position
RxsCharSet>>basicMaximumCharacterCodeIgnoringCase: {accessing} · ct 10/27/2021 08:59
+ basicMaximumCharacterCodeIgnoringCase: aBoolean
+
+     ^ elements inject: -1 into: [ :max :each |
+         (each maximumCharacterCodeIgnoringCase: aBoolean) max: max ]
RxsCharSet>>enumerableSetIgnoringCase: {privileged} · ct 10/27/2021 08:59 (changed)
enumerableSetIgnoringCase: aBoolean
    "Answer a collection of characters that make up the portion of me that can be enumerated, or nil if there are no such characters. The case check is only used to determine the type of set to be used. The returned set won't contain characters of both cases, because this way the senders of this method can create more efficient checks."
    | highestCharacterCode set |
-     highestCharacterCode := elements inject: -1 into: [ :max :each |
-         (each maximumCharacterCodeIgnoringCase: aBoolean) max: max ].
+     highestCharacterCode := self basicMaximumCharacterCodeIgnoringCase: aBoolean.
    highestCharacterCode = -1 ifTrue: [ ^nil ].
    set := highestCharacterCode <= 255
        ifTrue: [ CharacterSet new ]
        ifFalse: [ WideCharacterSet new ].
    elements do: [ :each | each enumerateTo: set ].
    ^set
RxsCharSet>>enumerateTo: {accessing} · ct 10/27/2021 08:36
+ enumerateTo: aSet
+
+     negated ifTrue: [^ self "Not enumerable"].
+     ^ elements do: [:each | each enumerateTo: aSet]
RxsCharSet>>isEnumerable {testing} · ct 10/27/2021 08:50 (changed)
isEnumerable
+     negated ifTrue: [^ false].
    ^elements anySatisfy: [:some | some isEnumerable ]
RxsCharSet>>maximumCharacterCodeIgnoringCase: {accessing} · ct 10/27/2021 08:59
+ maximumCharacterCodeIgnoringCase: aBoolean
+     "Return the largest character code among the characters I represent."
+
+     negated ifTrue: [^ -1 "not enumerable"].
+     ^ self basicMaximumCharacterCodeIgnoringCase: aBoolean
RxsCharSet>>negated {converting} · ct 10/27/2021 08:35
+ negated
+
+     ^ self class new
+         initializeElements: elements
+         negated: negated not
RxsCharSet>>predicateIgnoringCase: {accessing} · ct 10/27/2021 08:52 (changed)
predicateIgnoringCase: aBoolean
    | enumerable predicate |
    enumerable := self enumerablePartPredicateIgnoringCase: aBoolean.
-     predicate := self predicatePartPredicate ifNil: [
+     predicate := (self predicatePartPredicateIgnoringCase: aBoolean) ifNil: [
        "There are no predicates in this set."
        ^enumerable ifNil: [
            "This set is empty."
            [ :char | negated ] ] ].
    enumerable ifNil: [ ^predicate ].
    negated ifTrue: [
        "enumerable and predicate already negate the result, that's why #not is not needed here."
        ^[ :char | (enumerable value: char) and: [ predicate value: char ] ] ].
    ^[ :char | (enumerable value: char) or: [ predicate value: char ] ]
RxsCharSet>>predicatePartPredicate {privileged} · ul 5/16/2015 01:37 (removed)
- predicatePartPredicate
-     "Answer a predicate that tests all of my elements that cannot be enumerated, or nil if such elements don't exist."
-
-     | predicates size |
-     predicates := elements reject: [ :some | some isEnumerable ].
-     (size := predicates size) = 0 ifTrue: [
-         "We could return a real predicate block - like [ :char | negated ] - here, but it wouldn't be used anyway. This way we signal that this character set has no predicates."
-         ^nil ].
-     size = 1 ifTrue: [
-         negated ifTrue: [ ^predicates first predicateNegation ].
-         ^predicates first predicate ].
-     predicates replace: [ :each | each predicate ].
-     negated ifTrue: [ ^[ [: char | predicates noneSatisfy: [ :some | some value: char ] ] ] ].
-     ^[ :char | predicates anySatisfy: [ :some | some value: char ] ]
-
RxsCharSet>>predicatePartPredicateIgnoringCase: {privileged} · ct 11/3/2021 14:54
+ predicatePartPredicateIgnoringCase: aBoolean
+     "Answer a predicate that tests all of my elements that cannot be enumerated, or nil if such elements don't exist."
+
+     | predicates size |
+     predicates := elements reject: [ :some | some isEnumerable ].
+     (size := predicates size) = 0 ifTrue: [
+         "We could return a real predicate block - like [ :char | negated ] - here, but it wouldn't be used anyway. This way we signal that this character set has no predicates."
+         ^nil ].
+     size = 1 ifTrue: [
+         negated ifTrue: [ ^predicates first predicateNegationIgnoringCase: aBoolean ].
+         ^predicates first predicateIgnoringCase: aBoolean ].
+     predicates replace: [ :each | each predicateIgnoringCase: aBoolean ].
+     negated ifTrue: [ ^[: char | predicates noneSatisfy: [ :some | some value: char ] ] ].
+     ^[ :char | predicates anySatisfy: [ :some | some value: char ] ]
RxsCharSet>>predicates {accessing} · ul 5/16/2015 01:29 (removed)
- predicates
-
-     | predicates |
-     predicates := elements reject: [ :some | some isEnumerable ].
-     predicates isEmpty ifTrue: [ ^nil ].
-     ^predicates replace: [ :each | each predicate ]
RxsCharSet>>predicatesIgnoringCase: {accessing} · ct 10/27/2021 08:55
+ predicatesIgnoringCase: aBoolean
+
+     | predicates |
+     predicates := elements reject: [ :some | some isEnumerable ].
+     predicates isEmpty ifTrue: [ ^nil ].
+     ^predicates replace: [ :each | each predicateIgnoringCase: aBoolean ]
RxsCharacter>>isRegexCharacter {testing} · ct 10/27/2021 20:38
+ isRegexCharacter
+
+     ^ true
RxsCharacter>>negated {converting} · ct 10/27/2021 08:32
+ negated
+
+     ^ RxsCharSet new
+         initializeElements: {self}
+         negated: true
RxsNode>>isRegexCharacter {testing} · ct 10/27/2021 20:38
+ isRegexCharacter
+
+     ^ false
RxsPredicate (changed)
RxsNode subclass: #RxsPredicate
    instanceVariableNames: 'predicate negation'
-     classVariableNames: 'EscapedLetterSelectors NamedClassSelectors'
+     classVariableNames: 'NamedClassSelectors'
    poolDictionaries: ''
    category: 'Regex-Core'
RxsPredicate class
    instanceVariableNames: ''
"-- Regular Expression Matcher v 1.1 (C) 1996, 1999 Vassili Bykov
--
This represents a character that satisfies a certain predicate.
Instance Variables:
    predicate    <BlockClosure>    A one-argument block. If it evaluates to the value defined by <negated> when it is passed a character, the predicate is considered to match.
    negation    <BlockClosure>    A one-argument block that is a negation of <predicate>."
RxsPredicate class>>forEscapedLetter: {instance creation} · ct 10/27/2021 08:16 (changed)
forEscapedLetter: aCharacter
    "Return a predicate instance for the given character, or nil if there's no such predicate."
-     ^EscapedLetterSelectors
-         at: aCharacter
-         ifPresent: [ :selector | self new perform: selector ]
+     self deprecated.
+     ^ RxParser new
+         initialize: {aCharacter} readStream;
+         backslashPredicate
RxsPredicate class>>initialize {class initialization} · ct 10/27/2021 08:16 (changed)
initialize
    "self initialize"
-     self
-         initializeNamedClassSelectors;
-         initializeEscapedLetterSelectors
+     self initializeNamedClassSelectors
RxsPredicate class>>initializeEscapedLetterSelectors {class initialization} · ul 9/25/2015 09:25 (removed)
- initializeEscapedLetterSelectors
-     "self initializeEscapedLetterSelectors"
-
-     EscapedLetterSelectors := Dictionary new
-         at: $w put: #beWordConstituent;
-         at: $W put: #beNotWordConstituent;
-         at: $d put: #beDigit;
-         at: $D put: #beNotDigit;
-         at: $s put: #beSpace;
-         at: $S put: #beNotSpace;
-         yourself
RxsPredicate>>beUnicodeCategory: {initialize-release} · ct 8/23/2021 20:50
+ beUnicodeCategory: categoryName
+
+     self predicate: [:char |
+         (Unicode generalTagOf: char asUnicode) beginsWith: categoryName].
RxsPredicate>>predicate {accessing} · vb 4/11/09 21:56 (removed)
- predicate
-
-     ^predicate
RxsPredicate>>predicate: {initialize-release} · ct 8/23/2021 20:50
+ predicate: aBlock
+
+     predicate := aBlock.
+     negation := [:char | (predicate value: char) not].
RxsPredicate>>predicateIgnoringCase: {accessing} · ct 10/27/2021 08:53
+ predicateIgnoringCase: aBoolean
+
+     ^predicate
RxsPredicate>>predicateNegation {accessing} · vb 4/11/09 21:56 (removed)
- predicateNegation
-
-     ^negation
RxsPredicate>>predicateNegationIgnoringCase: {accessing} · ct 10/27/2021 08:53
+ predicateNegationIgnoringCase: aBoolean
+
+     ^negation
Regex-Core package postscript (changed)
- RxsPredicate initializeEscapedLetterSelectors.
+ RxParser initialize.
---
Sent from Squeak Inbox Talk
["Regex-Core-ct.74.mcz"]