Re: [Pharo-dev] String >> #=

Eliot Miranda

27 May 2014

6:45 a.m.

Hi Philippe,

On Mon, May 26, 2014 at 12:51 AM, Philippe Marschall <philippe.marschall@netcetera.ch> wrote:

Hi

I have been investigating why Dictionary lookup performance with String keys is not as good as I would have expected. Something I noted is that String >> #= is implemented in terms of #compare:with:collated:. There is no short circuit if the Strings are not the same size. In my case some Strings have the same prefix but a different length, e.g. 'Content-Type' and 'Content-Length'. In that case a #compare:with:collated: is performed even though we know in advance that the answer will be false because they have different sizes.

Why not rewrite

    String>>= aString
        "Answer whether the receiver sorts equally as aString.
        The collation order is simple ascii (with case differences)."
        aString isString ifFalse: [ ^ false ].
        ^ (self compare: self with: aString collated: AsciiOrder) = 2

as

    String>>= aString
        "Answer whether the receiver sorts equally as aString.
        The collation order is simple ascii (with case differences)."
        (aString isString
            and: [self size = aString size]) ifFalse: [^false].
        ^ (self compare: self with: aString collated: AsciiOrder) = 2

?

One /could/ add a replacement compare:with:collated: primitive primitiveCompareString which took the sizes as arguments to avoid asking twice. But it wouldn't be safe. One could abuse the primitive and lie about the size. So I suspect it is best to add the size check to String>>#= and accept the duplication of the primitive finding the sizes of the two strings. The cost in the primitive is minimal. A WideString version of the primitive might pay its way, but if Spur and Sista arrive soon the primitive shouldn't be faster than the optimised Smalltalk code.
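To make the motivating case concrete, here is a minimal workspace sketch using the two header names from Philippe's report. It is an illustration rather than code from the thread; the temporaries a and b are arbitrary, and only standard String protocol (size, first:, =) is assumed.

```smalltalk
| a b |
a := 'Content-Type'.
b := 'Content-Length'.
a size.	"12"
b size.	"14"
(a first: 8) = (b first: 8).	"true: both start with 'Content-'"
a size = b size.	"false: a size-guarded #= can stop here"
```

Without the guard, the comparison has to walk the shared prefix before it reaches the first differing character.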

-- best, Eliot

J. Vuletich (mail lists)

27 May

3:54 p.m.

New subject: [Pharo-dev] String >> #=

BTW, any good reason for not prefixing all the implementors of #= with this?

    "Any object is equal to itself"
    self == argument ifTrue: [ ^ true ].

Cheers, Juan Vuletich

Eliot Miranda

8:59 p.m.

On Tue, May 27, 2014 at 6:54 AM, J. Vuletich (mail lists) < juanlists@jvuletich.org> wrote:

Quoting Eliot Miranda eliot.miranda@gmail.com:

Why not rewrite

    String>>= aString
        "Answer whether the receiver sorts equally as aString.
        The collation order is simple ascii (with case differences)."
        aString isString ifFalse: [ ^ false ].
        ^ (self compare: self with: aString collated: AsciiOrder) = 2

as

    String>>= aString
        "Answer whether the receiver sorts equally as aString.
        The collation order is simple ascii (with case differences)."
        (aString isString
            and: [self size = aString size]) ifFalse: [^false].
        ^ (self compare: self with: aString collated: AsciiOrder) = 2

?

This makes a huge difference, over 3 times faster:

    | bs t1 t2 |
    bs := ByteString allInstances first: 10000.
    t1 := [bs do: [:a| bs do: [:b| a = b]]] timeToRun.
    (FileStream fileNamed: '/Users/eliot/Squeak/Squeak4.5/String-=.st') fileIn.
    t2 := [bs do: [:a| bs do: [:b| a = b]]] timeToRun.
    { t1. t2 } #(13726 4467)
    4467 - 13726 / 137.26 -67.46%
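One way to see where such a speed-up can come from is to count how many of the compared pairs have the same size, since only those still reach #compare:with:collated: once the size check is in place. The following is a sketch under the same assumption as the benchmark (an image with at least 10000 ByteString instances). The temporary sameSize is introduced here for illustration, and no result is claimed because it depends on the image.

```smalltalk
| bs sameSize |
bs := ByteString allInstances first: 10000.
sameSize := 0.
"Count the ordered pairs that would pass the size check."
bs do: [:a |
	bs do: [:b |
		a size = b size ifTrue: [sameSize := sameSize + 1]]].
"Fraction of pairs that still need the character-by-character comparison."
sameSize / (bs size * bs size) asFloat
```

The smaller this fraction, the more comparisons are answered by the size check alone.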

BTW, any good reason for not prefixing all the implementors of #= with this?

    "Any object is equal to itself"
    self == argument ifTrue: [ ^ true ].

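It doesn't make much difference:

    | bs t1 t2 |
    bs := ByteString allInstances first: 10000.
    t1 := [bs do: [:a| bs do: [:b| a = b]]] timeToRun.
    (FileStream fileNamed: '/Users/eliot/Squeak/Squeak4.5/String-=.st') fileIn.
    t2 := [bs do: [:a| bs do: [:b| a = b]]] timeToRun.
    { t1. t2 } #(4628 4560)

    4560 - 4628 / 46.28 -1.47%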
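For orientation, the order of the checks under discussion can be sketched as a workspace block, which leaves String>>#= itself untouched. This is an illustration only: guardedEquals is a hypothetical name, and the final a = b stands in for the #compare:with:collated: call of the real method.

```smalltalk
| guardedEquals s |
guardedEquals := [:a :b |
	a == b	"identity shortcut"
		or: [(b isString and: [a size = b size])	"type and size check"
			and: [a = b]]].	"full comparison, stand-in for compare:with:collated:"
s := 'Content-Type' copy.
guardedEquals value: s value: s.	"true, decided by the identity check"
guardedEquals value: s value: 'Content-Length'.	"false, decided by the size check"
guardedEquals value: s value: s copy.	"true, decided by the full comparison"
```

The identity shortcut only helps when a string is compared with itself, which is one possible reading of why the measured difference is so small.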

So is it worth it? If you feel it is I've no objection other than it feels a little kludgey for such little benefit. And there are the Symbols if one needs quick comparison and can bear the cost of slow interning.
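The remark about Symbols can be illustrated with a short workspace sketch (the temporaries s1 and s2 are illustrative). Two equal Strings are distinct objects, whereas interning them with asSymbol answers the same unique Symbol, so an identity test suffices afterwards.

```smalltalk
| s1 s2 |
s1 := 'Content-Type' copy.
s2 := 'Content-Type' copy.
s1 = s2.	"true: equal contents, found by comparing characters"
s1 == s2.	"false: two distinct String objects"
s1 asSymbol == s2 asSymbol.	"true: interning answers the one unique Symbol"
```

The cost moves to asSymbol, which has to look the string up in the symbol table.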

-- best, Eliot

Andres Valloud

28 May

3:10 a.m.

What is going to happen when one compares two general Unicode series of characters that represent the same string but differ in normalization? Wouldn't the size test result in false negatives?

http://unicode.org/reports/tr15/

I'm asking because I haven't seen any discussion on the subject, and the decision to change the code as proposed could have side effects.

Eliot Miranda

3:53 a.m.

Hi Andres,

On Tue, May 27, 2014 at 6:10 PM, Andres Valloud <avalloud@smalltalk.comcastbiz.net> wrote:

What is going to happen when one compares two general Unicode series of characters that represent the same string but differ in normalization? Wouldn't the size test result in false negatives?

http://unicode.org/reports/tr15/

I'm asking because I haven't seen any discussion on the subject, and the decision to change the code as proposed could have side effects.

The issue is whether String supports variable-sized encodings such as UTF-8, where there is no fixed relationship between the number of bytes in a string and the number of characters in a string. Right now we have ByteString and WideString. ByteString has 1 byte per character. WideString has 4 bytes per character. So 'hello' asByteString contains 5 bytes and has size 5, but 'hello' asWideString contains 20 bytes and also has size 5. Hence the size check is fine, since size answers the number of characters, not the number of bytes. If we were to add a UTF8String we'd have to delete the size check. But I think for now we're not going to do that.
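The point about size can be checked in a workspace. This is a minimal sketch using the same two conversions named above; the temporaries narrow and wide are illustrative.

```smalltalk
| narrow wide |
narrow := 'hello' asByteString.
wide := 'hello' asWideString.
narrow class.	"ByteString"
wide class.	"WideString"
narrow size.	"5"
wide size.	"5"
narrow size = wide size.	"true: the size check does not reject this pair"
```

Both sizes count characters, so the different storage widths do not affect the check.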

A ByteString can contain some characters that comprise a UTF-8 string (see UTF8TextConverter) but that's a convention of usage. If you print some ByteString containing the UTF-8 encoding of a string containing characters that take more than one byte to encode, that string won't print as the input, it'll print treating each byte as a character, and so will scramble the string. It is up to the user to handle these ByteStrings that happen by convention to contain UTF-8 correctly.

Note that there is nothing to stop us adding a UTF8String *provided* that class implements size to answer the number of characters, not the number of bytes. My understanding is that VW takes this approach also. File streams expose the encoding, since position is a byte position, not a character position, and so it is up to the file stream client to cope with the positioning complexities that this introduces, not the stream.

OK?

-- best, Eliot

Andres Valloud

4:23 a.m.

String encoding is perpendicular to my point. I'm referring to canonical equivalence as defined in section 1.1 of the document referenced by the URL I sent. For instance, the Hangul example in the first table shows that a combination of two characters (regardless of encoding) is to be considered canonically equivalent to a single character. From the document (which claims to be Unicode Standard Annex #15),

"Canonical equivalence is a fundamental equivalency between characters or sequences of characters that represent the same abstract character, and when correctly displayed should always have the same visual appearance and behavior."

How do you propose that a size check is appropriate in the presence of canonical equivalence? What is string equivalence supposed to mean? I think more attention should be given to those questions.
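A small workspace sketch shows the kind of pair in question. It uses a Latin example rather than the Hangul one from the annex: U+00E9 and the sequence U+0065 U+0301 are canonically equivalent, yet they differ in size. The temporaries composed and decomposed are illustrative.

```smalltalk
| composed decomposed |
"U+00E9: e with acute accent as a single code point"
composed := WideString with: (Character value: 16rE9).
"U+0065 followed by U+0301, the combining acute accent"
decomposed := WideString with: $e with: (Character value: 16r301).
composed size.	"1"
decomposed size.	"2"
composed size = decomposed size.	"false: the size test alone rejects the pair"
```

Whether such a pair should count as equal is exactly the question of what string equivalence is supposed to mean.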

Yoshiki Ohshima

4:50 a.m.

At Tue, 27 May 2014 19:23:09 -0700, Andres Valloud wrote:

String encoding is perpendicular to my point. I'm referring to canonical equivalence as defined in section 1.1 of the document referenced by the URL I sent. For instance, the Hangul example in the first table shows that a combination of two characters (regardless of encoding) is to be considered canonically equivalent to a single character. From the document (which claims to be Unicode Standard Annex #15),

"Canonical equivalence is a fundamental equivalency between characters or sequences of characters that represent the same abstract character, and when correctly displayed should always have the same visual appearance and behavior."

How do you propose that a size check is appropriate in the presence of canonical equivalence? What is string equivalence supposed to mean? I think more attention should be given to those questions.

I think that the single equal message (=) in the Smalltalk language should not really worry about canonical equivalence. For those who need it, it would be fine to define a new selector that does the real work, and such a method could track the Unicode standard revisions and do the right thing. But something as fundamental as String>>#= does not need a dependency on an external standard.

-- Yoshiki

Nicolas Cellier

8:14 a.m.

2014-05-28 4:50 GMT+02:00 Yoshiki Ohshima Yoshiki.Ohshima@acm.org:

I think that the single equal message (=) in the Smalltalk language should not really worry about canonical equivalence. For those who need it, it would be fine to define a new selector that does the real work, and such a method could track the Unicode standard revisions and do the right thing. But something as fundamental as String>>#= does not need a dependency on an external standard.

-- Yoshiki

If the internal representation is not canonical, we are heading down a path of maximum complexity. All comparison functions (= < > <= >= hash) will have to canonicalize first. So I tend to agree with Yoshiki: let these kernel methods perform their dumb task, and push this complexity outside.

Well, even apart from the complexity of Unicode, the cr-lf mess already creates the same problem. There is no semantic difference between cr and cr-lf, yet I had to insert a few withSqueakLineEndings sends in Monticello when playing with GitFileTree.
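The line-ending case can be sketched in a workspace as follows. The temporaries crText and crlfText are illustrative; String cr, String crlf and withSqueakLineEndings are assumed to behave as in a Squeak 4.x image.

```smalltalk
| crText crlfText |
crText := 'line one', String cr, 'line two'.
crlfText := 'line one', String crlf, 'line two'.
crText size = crlfText size.	"false: the cr-lf version is one character longer"
crText = crlfText.	"false"
crText = crlfText withSqueakLineEndings.	"true once cr-lf has been converted to cr"
```

The caller normalizes before comparing, which keeps #= itself simple.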

Chris Muller

3:49 a.m.

New subject: [Vm-dev] Re: Re: [Pharo-dev] String >> #=

On Tue, May 27, 2014 at 6:54 AM, J. Vuletich (mail lists) juanlists@jvuletich.org wrote:

BTW, any good reason for not prefixing all the implementors of #= with this?

    "Any object is equal to itself"
    self == argument ifTrue: [ ^ true ].

It doesn't make much difference:

    | bs t1 t2 |
    bs := ByteString allInstances first: 10000.
    t1 := [bs do: [:a| bs do: [:b| a = b]]] timeToRun.
    (FileStream fileNamed: '/Users/eliot/Squeak/Squeak4.5/String-=.st') fileIn.
    t2 := [bs do: [:a| bs do: [:b| a = b]]] timeToRun.
    { t1. t2 } #(4628 4560)

    4560 - 4628 / 46.28 -1.47%

So is it worth it? If you feel it is I've no objection other than it feels a little kludgey for such little benefit.

That is a very common convention in the image, not kludgey. String>>#= should advertise, by its implementation, its desire and intent for maximum performance. Not covering this check conveys a lack of thoroughness and/or dedication to performance. Future readers of the method will wonder why such a simple avoidance of unnecessary processing for that case wasn't taken. By the time they wonder whether it was an oversight they've already expended too much thought about it.

We should really consider Andres' question before popping this into trunk, though.

And there are the Symbols if one needs quick comparison and can bear the cost of slow interning.

-- best, Eliot

List: squeak-dev@lists.squeakfoundation.org

Participants (6): Andres Valloud, Chris Muller, Eliot Miranda, J. Vuletich (mail lists), Nicolas Cellier, Yoshiki Ohshima