Hi,
I think we found a bug, but I'm interested in your opinion before "fixing" it. Some TextConverters (e.g. ISO88592TextConverter) implement #leadingChar. The problem is that this #leadingChar is added to all decoded characters. Since character equality takes leadingChar into account, these decoded characters will never be equal to their Unicode counterparts. The following example returns false, because the carriage return (13) will be decoded as (Character value: 58720269):
(String cr convertFromWithConverter: ISO88592TextConverter new) = String cr
The current system (Collections, Compiler, etc.) assumes that the first 256 characters are unique and doesn't care about the variants of these characters which have a non-zero leadingChar.
So, I think we should change Character class >> #leadingChar:code: to ignore its first argument when the second is less than 256.
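The arithmetic behind the value 58720269 and the effect of the proposed change can be modelled outside the image. The sketch below is in Python, not Smalltalk, and is not the actual Squeak implementation. It assumes that leadingChar is stored above the low 22 bits of the character value, which is consistent with the number quoted above (58720269 = 14 * 2^22 + 13). The function names and the constant 14 are illustrative assumptions.

```python
# Assumption: the character code occupies the low 22 bits of the value
# and leadingChar sits above them.
LEADING_CHAR_SHIFT = 22


def leading_char_code(leading_char: int, code: int) -> int:
    """Model of the current behaviour: leadingChar is always packed in."""
    return (leading_char << LEADING_CHAR_SHIFT) + code


def leading_char_code_proposed(leading_char: int, code: int) -> int:
    """Model of the proposal: ignore leading_char when code is below 256."""
    if code < 256:
        return code
    return (leading_char << LEADING_CHAR_SHIFT) + code


CR = 13
# Assumption: 14 is inferred from 58720269 >> 22, not read from the converter.
LATIN2_LEADING_CHAR = 14

print(leading_char_code(LATIN2_LEADING_CHAR, CR))                  # 58720269
print(leading_char_code(LATIN2_LEADING_CHAR, CR) == CR)            # False
print(leading_char_code_proposed(LATIN2_LEADING_CHAR, CR) == CR)   # True
```

With the proposed rule, every code below 256 maps to a single value regardless of which converter produced it, while codes of 256 and above keep their leadingChar.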
Also, I think only TextConverters of CJKV languages should implement #leadingChar, because AFAIK only the characters of those languages are unified.
What do you think?
Cheers, Levente
No response, so I uploaded Collections-ul.440 and Multilingual-ul.141 to the Inbox. In addition to the previously described ideas, I implemented the various copy methods for Character, because characters have not been unique since Squeak 3.8. The tests are green.
Levente
On Fri, 22 Apr 2011, Levente Uzonyi wrote:
[...]
There were no objections, so the code is in the Trunk now.
Levente
On Tue, 26 Apr 2011, Levente Uzonyi wrote:
[...]
squeak-dev@lists.squeakfoundation.org