Wouldn't that be a pretty big speed impact given how much strings are used?
From: "Alan Lovejoy" squeak-dev.sourcery@forum-mail.net
Reply-To: The general-purpose Squeak developers list squeak-dev@lists.squeakfoundation.org
To: "'The general-purpose Squeak developers list'" squeak-dev@lists.squeakfoundation.org
Subject: RE: UTF8 Squeak
Date: Thu, 7 Jun 2007 11:55:02 -0700
Each String object should specify its encoding scheme. UTF-8 should be the default, but all commonly encountered encodings should be supported, and should all be usable at once (in different String instances). When a Character is reified from a String, it should use the Unicode code point value (the full 32-bit value). Ideally, the encoding of a String should be a function of an associated Strategy object, and not be based on having different subclasses of String.
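To make the Strategy idea concrete, here is a minimal sketch in Python rather than Squeak. Every name in it (EncodingStrategy, EncodedString, code_point_at, recoded) is hypothetical and chosen for illustration only; it is not existing Squeak code, just one way the proposal could look under the assumption that a string stores raw bytes plus a reference to its strategy.

```python
class EncodingStrategy:
    """Hypothetical strategy: converts between stored bytes and Unicode code points."""

    codec = None  # name of a Python codec, set by each subclass

    def decode(self, data):
        # Bytes in this encoding -> list of integer code points.
        return [ord(ch) for ch in data.decode(self.codec)]

    def encode(self, code_points):
        # List of integer code points -> bytes in this encoding.
        return "".join(chr(cp) for cp in code_points).encode(self.codec)


class Utf8Strategy(EncodingStrategy):
    codec = "utf-8"


class Latin1Strategy(EncodingStrategy):
    codec = "latin-1"


class Utf32Strategy(EncodingStrategy):
    codec = "utf-32-be"  # big-endian, no byte-order mark


class EncodedString:
    """One string class; the encoding lives in the strategy, not in a subclass."""

    def __init__(self, data, strategy=None):
        self.data = data
        self.strategy = strategy or Utf8Strategy()  # UTF-8 as the default

    @classmethod
    def from_text(cls, text, strategy=None):
        strategy = strategy or Utf8Strategy()
        return cls(strategy.encode([ord(ch) for ch in text]), strategy)

    def __len__(self):
        # Length in code points, not in bytes.
        return len(self.strategy.decode(self.data))

    def code_point_at(self, index):
        # Always answers a full Unicode code point, whatever the storage.
        return self.strategy.decode(self.data)[index]

    def recoded(self, strategy):
        # Same characters, different storage encoding.
        return EncodedString(strategy.encode(self.strategy.decode(self.data)), strategy)
```

Two instances with different encodings can then coexist and still answer the same code points: EncodedString.from_text("na\u00efve") and EncodedString.from_text("na\u00efve", Latin1Strategy()) store 6 and 5 bytes respectively, yet code_point_at(2) is 0xEF for both. Decoding the whole string on every access is deliberately naive here; it keeps the sketch short and says nothing about how a real implementation should perform.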
<Alan L>UTF-8 should be the default</Alan L>
<J J (Jason)>Wouldn't that be a pretty big speed impact given how much strings are used?</J J (Jason)>
Now that I think about it, that could very well be the case. There might be clever ways to make the impact much less than one might otherwise expect (for example, RunArrays were a clever way to make Text objects reasonably efficient)--but I haven't actually implemented it, so there's no guarantee.
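One likely source of the speed concern is indexed access: in UTF-8 the n-th character can only be found by scanning from the start, while a fixed-width encoding can jump straight to it. The following Python sketch illustrates that difference; the function names are made up for this example, and it assumes well-formed input and big-endian UTF-32 without a byte-order mark.

```python
def utf8_offset_of(data, index):
    """Byte offset where code point number `index` starts (linear scan)."""
    seen = 0
    for offset, byte in enumerate(data):
        # Continuation bytes look like 10xxxxxx; anything else starts a code point.
        if byte & 0xC0 != 0x80:
            if seen == index:
                return offset
            seen += 1
    raise IndexError(index)


def utf8_code_point_at(data, index):
    """Code point number `index` of UTF-8 bytes; cost grows with `index`."""
    start = utf8_offset_of(data, index)
    end = start + 1
    while end < len(data) and data[end] & 0xC0 == 0x80:
        end += 1
    return ord(data[start:end].decode("utf-8"))


def utf32_code_point_at(data, index):
    """Code point number `index` of big-endian UTF-32 bytes; constant time."""
    if not 0 <= index < len(data) // 4:
        raise IndexError(index)
    return int.from_bytes(data[4 * index:4 * index + 4], "big")
```

Whether this matters in practice depends on how often code indexes into strings rather than streaming over them, which is not something this sketch measures.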
So, perhaps the default internal String encoding should be UTF-32, instead of UTF-8 or UTF-16, in order to avoid the performance issue. But that raises a memory usage issue--which is the primary reason I don't think a "one size fits all" approach is sufficient.
--Alan
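For a rough sense of the memory trade-off between the three encodings discussed above, this small Python sketch counts the bytes each one needs for the same text. The helper name encoded_sizes is made up for the example, and the little-endian codec variants are used only so that no byte-order mark is counted.

```python
def encoded_sizes(text):
    """Bytes needed to store `text` in each encoding, without a byte-order mark."""
    codecs = (("UTF-8", "utf-8"), ("UTF-16", "utf-16-le"), ("UTF-32", "utf-32-le"))
    return {name: len(text.encode(codec)) for name, codec in codecs}


ascii_sizes = encoded_sizes("Squeak")               # plain ASCII text
cjk_sizes = encoded_sizes("\u65e5\u672c\u8a9e")      # three CJK characters
```

For the ASCII string the sizes are 6, 12 and 24 bytes; for the three CJK characters they are 9, 6 and 12 bytes. UTF-32 costs four times as much as UTF-8 for ASCII-heavy text, while for other scripts the ranking between UTF-8 and UTF-16 can flip.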