I have a ReadStream and I want to detect some substrings in it.
This works, but it is ugly.
((self match:'{|') |
(self match:'|-') |
(self match:'|}') |
(self match:'{{') |
(self match:'}}') |
(self match:'[[') |
(self match:']]') |
(self match:'__') |
(self match:'==') |
(self match:'::') |
(self match:'**') |
(self match:'##') |
(self match:'''') )
Is anybody aware of an elegant approach to this?
Something along the lines of
self matchAny: { '{|' . '|-' . '|}' . '{{' . '}}' . '[[' . ']]' . '__' . '==' . '::' . '**' . '##' . '''' }
thx in advance
What about
{ '{|' . '|-' . '|}' . '{{' . '}}' . '[[' . ']]' . '__' . '==' . '::' . '**' . '##' . '''' } anySatisfy: [:pattern | self match: pattern]
?
Maybe also this one if the identity of the matching pattern is of interest:
{ '{|' . '|-' . '|}' . '{{' . '}}' . '[[' . ']]' . '__' . '==' . '::' . '**' . '##' . '''' }
detect: [:pattern | self match: pattern]
ifFound: [:pattern | self inform: 'Matched pattern: ' , pattern]
ifNone: [self inform: 'no match']
Best,
Christoph
PS: Don't use #| unless you explicitly want every operand to be evaluated every time. Use #or:... instead; it stops at the first true result and is faster.
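To make the PS concrete, here is a minimal workspace sketch of the difference. The variable name log and the logged strings are made up for illustration; only #|, #or: and OrderedCollection come from the standard image.

```smalltalk
| log |
log := OrderedCollection new.

"#| is an ordinary binary message: its argument is evaluated before the send,
 so the second expression runs even though the first already answered true."
([log add: 'first'. true] value) | ([log add: 'second'. true] value).

"#or: takes a block and evaluates it only when the receiver is false,
 so 'fourth' is never logged."
([log add: 'third'. true] value) or: [log add: 'fourth'. true].

log asArray	"#('first' 'second' 'third')"
```

With a stream this matters for more than speed: #match: moves the read position, so which sends actually run can change what the later ones see.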
________________________________
From: Squeak-dev squeak-dev-bounces@lists.squeakfoundation.org on behalf of gettimothy via Squeak-dev squeak-dev@lists.squeakfoundation.org
Sent: Saturday, 27 November 2021 17:30:38
To: squeak-dev
Subject: [squeak-dev] Anybody got an elegent construct for this functional monstrosity?
On Sat, Nov 27, 2021 at 8:36 AM Thiede, Christoph < Christoph.Thiede@student.hpi.uni-potsdam.de> wrote:
PS: Don't use #| unless you explicitly want every method to be invoked always. Use #or:... instead, this is faster.
And just as importantly, never use a brace construct when a literal array will do. { '{|' . '|-' . '|}' . '{{' . '}}' . '[[' . ']]' . '__' . '==' . '::' . '**' . '##' . '''' } is created at run-time. The equivalent #('{|' '|-' '|}' '{{' '}}' '[[' ']]' '__' '==' '::' '**' '##' '''') is created at compile-time. Inspect the method in the browser or the debugger and have a look at the bytecode.
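A small workspace sketch of that difference, for anyone who would rather see it than read bytecode. The names literalBlock and braceBlock are made up for illustration, and the pattern list is shortened.

```smalltalk
| literalBlock braceBlock |
literalBlock := [#('{|' '|-' '|}')].
braceBlock := [{'{|'. '|-'. '|}'}].

"The literal array lives in the compiled method's literals,
 so every evaluation answers the very same object."
literalBlock value == literalBlock value.	"true"

"The brace expression builds a fresh Array on every evaluation."
braceBlock value == braceBlock value.	"false"

"Both forms answer equal contents."
literalBlock value = braceBlock value.	"true"
```

One thing to keep in mind: because the literal is a single shared object, it should be treated as read-only.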
If performance is important you'll construct a parser of some form. For example, the simplest optimization here is to check whether the first character is a candidate and then whether the second character is a candidate. In a parser you'd have different code executed for each first-character candidate. But the code below avoids doing a match until we know both characters are in the set. I've written it as a doit, but I'm imagining Firsts and Seconds are class or instance variables (the point being that, provided the matcher is called often, we want Firsts and Seconds to be computed precisely once).
| patterns first second Firsts Seconds |
"'''''' is two quote characters; the one-character '''' would make #second fail"
patterns := #('{|' '|-' '|}' '{{' '}}' '[[' ']]' '__' '==' '::' '**' '##' '''''').
Firsts ifNil: [
	Firsts := (patterns collect: #first) as: String.
	Seconds := (patterns collect: #second) as: String].
self size >= 2 and: [
	(Firsts includes: (first := self first)) and: [
		(Seconds includes: (second := self second)) and: [
			patterns includes: (ByteString with: first with: second)]]]
On Sat, Nov 27, 2021 at 9:47 AM Eliot Miranda eliot.miranda@gmail.com wrote:
Oops. I meant of course
| patterns first second Firsts Seconds |
"'''''' is two quote characters; the one-character '''' would make #second fail"
patterns := #('{|' '|-' '|}' '{{' '}}' '[[' ']]' '__' '==' '::' '**' '##' '''''').
Firsts ifNil: [
	Firsts := ((patterns collect: #first) as: Set) as: String.
	Seconds := ((patterns collect: #second) as: Set) as: String].
self size >= 2 and: [
	(Firsts includes: (first := self first)) and: [
		(Seconds includes: (second := self second)) and: [
			patterns includes: (ByteString with: first with: second)]]]
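As a rough illustration of the "different code for each first character" idea, one could go a step further and key a table by first character, so the second character is only checked against the seconds that actually follow that first. This is only a sketch, not Eliot's code: the names table, stream, found and char are hypothetical, the sample text is invented, and it assumes the last pattern is meant to be two quote characters.

```smalltalk
| patterns table stream found char |
patterns := #('{|' '|-' '|}' '{{' '}}' '[[' ']]' '__' '==' '::' '**' '##' '''''').

"Map each first character to the set of second characters that may follow it."
table := Dictionary new.
patterns do: [:each |
	(table at: each first ifAbsentPut: [Set new]) add: each second].

stream := ReadStream on: 'some text with [[a link]] in it'.
found := false.
[found or: [stream atEnd]] whileFalse: [
	char := stream next.
	"peek answers nil at the end of the stream, which no set includes"
	found := (table at: char ifAbsent: [#()]) includes: stream peek].
found	"true, because of the [["
```

In real use the table would be built once, for example in a class variable, for the same reason Firsts and Seconds are.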
-- _,,,^..^,,,_ best, Eliot
heh.
Awesome!
Thank you.
I will test this in the coming days and work to understand it.
For my immediate task this would be premature optimization within the larger scope of work I am still cobbling together.
I will definitely add this to the SqueakHOWTOs later this week, possibly by tomorrow.
t
Beautiful, Eliot. :-)
(I was just wondering why our regex matcher is not capable of handling this kind of query efficiently - afaik it uses a DFS approach. BFS regex optimization could be just another interesting side-project ... :D)
Best,
Christoph
Here is a stab at Eliot's approach, adapted to a workspace:
| ios patterns firsts seconds first second |
ios := ReadStream on: 'Neque porro quisquam est qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit...|-'.
patterns := #('{|' '|-' '|}' '{{' '}}' '[[' ']]' '__' '==' '::' '**' '##' '''''').
firsts := (patterns collect: #first) as: Set.
seconds := (patterns collect: #second) as: Set.
"Walk the stream one character at a time; peek answers nil at the end."
[ios atEnd] whileFalse: [
	((firsts includes: (first := ios next))
		and: [(seconds includes: (second := ios peek))
		and: [patterns includes: (ByteString with: first with: second)]])
			ifTrue: [^ true]].
^ false
cordially
Thank you! That is a keeper.
However, I had to add a reset to get the first one to work, because it appears the stream runs to the end at each iteration over the Array.
|ios|
ios := ReadStream on: 'Neque porro quisquam est qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit...|-'.
"reset first, so that the block still answers the Boolean from match:"
{ '{|' . '|-' . '|}' . '{{' . '}}' . '[[' . ']]' . '__' . '==' . '::' . '**' . '##' . '''' } anySatisfy: [:pattern | ios reset. ios match: pattern].
The second form is not detecting the '|-' at the end, as far as I can tell.
|ios|
ios := ReadStream on: 'Neque porro quisquam est qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit...|-'.
{ '{|' . '|-' . '|}' . '{{' . '}}' . '[[' . ']]' . '__' . '==' . '::' . '**' . '##' . '''' }
detect: [:pattern | ios match: pattern]
ifFound: [:pattern | self inform: 'Matched pattern: ' , pattern.]
ifNone: [self inform: 'no match'. ].
ios reset
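A possible fix for the second form, sketched under the assumption that the cause is the same as for the first one: #match: leaves the stream at its end when a pattern is not found, so every later pattern starts searching from there. Resetting inside the detect: block gives each pattern the whole stream. The variable found and the shortened sample text are mine.

```smalltalk
| ios patterns found |
ios := ReadStream on: 'dolor sit amet, consectetur, adipisci velit...|-'.
patterns := #('{|' '|-' '|}' '{{' '}}' '[[' ']]' '__' '==' '::' '**' '##' '''').

found := patterns
	detect: [:pattern |
		"rewind before every attempt; match: answers the Boolean"
		ios reset.
		ios match: pattern]
	ifNone: [nil].
ios reset.
found	"'|-'"
```

Note that this answers the first pattern in list order that occurs anywhere in the stream, which is not necessarily the one that occurs earliest in the text.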
cheers, t
Hi Christoph
The non-optimized version of your first example is here:
https://github.com/gettimothy/Doc-SqueakHOWTO/blob/master/SqueakHOWTO.md#org...
I want to add in Eliot's work and a possible fix of the second detect: version when I get time.
After that, I can import it into the CustomHelp using the Doc converters.
I am busy with a larger task at the moment, though.
cheers,
t