Hi Squeakers, I have some problems with the ASCII code of Squeak 2.0
Try a 'print it' of this:
| file a |
file _ StandardFileStream newFileNamed: 'ascii.txt'.
1 to: 256 do: [:i | file nextPut: (Character characterTable at: i)].
file close.
file _ StandardFileStream fileNamed: 'ascii.txt'.
file reset.
a _ file contentsOfEntireFile.
And after this, try to read the 'ascii.txt' file with BlockNotes (Win95).
If you compare the content of BlockNotes to the content of a, you will find a big difference. WHY?
How can I avoid this difference and all the consequent problems?
Thanks for your help
Alessandro
Alessandro,
Try a 'print it' of this:
[...]
And after this, try to read the 'ascii.txt' file with BlockNotes (Win95).
If you compare the content of BlockNotes to the content of a, you will find a big difference. WHY?
Because of the differences in the character sets used (Note: This is determined by the *look* of the bitmaps describing certain characters). Squeak uses a Mac Roman character set whereas Windows uses an ANSI (Latin-1 based) character set. While the first 128 characters (codes 0 to 127) are the same, the mapping of everything from 128 upward is fairly different.
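To make the mismatch concrete, here is a small sketch in Python rather than Squeak. It assumes that Python's built-in `mac_roman` and `cp1252` codecs are reasonable stand-ins for the Squeak and Windows character sets described above; the byte value 0x8E is just an illustrative choice.

```python
# Assumption: Python's 'mac_roman' and 'cp1252' codecs approximate the
# Squeak and Windows character sets discussed in this thread.
ascii_bytes = bytes(range(128))

# Codes 0..127 decode to the same characters in both sets.
same_below_128 = ascii_bytes.decode("mac_roman") == ascii_bytes.decode("cp1252")

# Above 127 the same byte value names a different character in each set.
byte = bytes([0x8E])
as_mac = byte.decode("mac_roman")   # U+00E9, e with acute accent
as_win = byte.decode("cp1252")      # U+017D, Z with caron

# ascii() keeps the output printable on any console.
print(same_below_128, ascii(as_mac), ascii(as_win))
```

A file written under one convention and read under the other keeps its bytes unchanged; only the characters those bytes are taken to mean differ.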
How can I avoid this difference and all the consequent problems?
You can use a character translation table. There is one in sqWin32Window.c mapping from Win to Squeak. It looks as follows:
{
    0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
   16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
   32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
   48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
   64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79,
   80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95,
   96, 97, 98, 99,100,101,102,103,104,105,106,107,108,109,110,111,
  112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,
  173,176,226,196,227,201,160,224,246,228,178,220,206,179,182,183,
  184,212,213,210,211,165,208,209,247,170,185,221,207,186,189,217,
  202,193,162,163,219,180,195,164,172,169,187,199,194,197,168,248,
  161,177,198,215,171,181,166,225,252,218,188,200,222,223,240,192,
  203,231,229,204,128,129,174,130,233,131,230,232,237,234,235,236,
  245,132,241,238,239,205,133,249,175,244,242,243,134,250,251,167,
  136,135,137,139,138,140,190,141,143,142,144,145,147,146,148,149,
  253,150,152,151,153,155,154,214,191,157,156,158,159,254,255,216
};
and is used to translate the input characters in the VM. The reverse table (i.e., mapping from Squeak to Win) can be created by
| squeakToWinTable |
squeakToWinTable _ ByteArray new: 256.
1 to: 256 do: [:i |
    squeakToWinTable at: (winToSqueakTable at: i) + 1 put: i - 1].
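The same inversion can be sketched outside Squeak. The Python below is an illustration only, not code from the VM: the names `UPPER`, `win_to_squeak`, `squeak_to_win` and `translate` are made up for this example, and the numbers are copied from the table quoted above (the lower half is the identity mapping, so only entries 128..255 are listed).

```python
# Entries 128..255 of the Win-to-Squeak table quoted above.
UPPER = [
    173,176,226,196,227,201,160,224,246,228,178,220,206,179,182,183,
    184,212,213,210,211,165,208,209,247,170,185,221,207,186,189,217,
    202,193,162,163,219,180,195,164,172,169,187,199,194,197,168,248,
    161,177,198,215,171,181,166,225,252,218,188,200,222,223,240,192,
    203,231,229,204,128,129,174,130,233,131,230,232,237,234,235,236,
    245,132,241,238,239,205,133,249,175,244,242,243,134,250,251,167,
    136,135,137,139,138,140,190,141,143,142,144,145,147,146,148,149,
    253,150,152,151,153,155,154,214,191,157,156,158,159,254,255,216,
]
win_to_squeak = list(range(128)) + UPPER

# Inversion only works if every code appears exactly once in the table.
assert sorted(win_to_squeak) == list(range(256))

# If Win code w maps to Squeak code s, the reverse table maps s back to w.
squeak_to_win = [0] * 256
for win_code, squeak_code in enumerate(win_to_squeak):
    squeak_to_win[squeak_code] = win_code


def translate(data: bytes, table) -> bytes:
    """Map every byte of data through a 256-entry table."""
    return bytes(table[b] for b in data)


all_bytes = bytes(range(256))
round_trip = translate(translate(all_bytes, win_to_squeak), squeak_to_win)
print(round_trip == all_bytes)
```

Python indexes from 0, so the `+ 1` and `- 1` adjustments in the Smalltalk version are not needed here.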
Hope this helps. Andreas
I'd like to suggest that Squeak switches to Unicode. The Unicode subset with a high byte of zero is equal to the ISO 8859-1 encoding, which is the default for Windows and most X platforms (have the HP weirdos changed their minds about hp-roman8 recently? :-)
Hans-Martin
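The point about the zero-high-byte subset can be checked with a short sketch, assuming Python's `latin_1` codec is a faithful implementation of ISO 8859-1:

```python
# ISO 8859-1 maps byte value n straight to the code point with the same
# number, so decoding any single byte should give chr(n).
latin1_matches_unicode = all(
    bytes([n]).decode("latin_1") == chr(n) for n in range(256)
)
print(latin1_matches_unicode)
```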
On Fri, 24 Jul 1998, Hans-Martin Mosner wrote:
I'd like to suggest that Squeak switches to Unicode.
I think that this would be a good idea, as well. There are a couple of Squeak/Smalltalk-specific ramifications I'd like to point out.
In Unicode U+005F is the LOW LINE or SPACING UNDERSCORE rather than the left arrow used as the assignment operator in Squeak. The Unicode standard doesn't even provide a cross-reference to the left arrow for that code point. (Anybody remember the ASR33 teletypes?) The LEFTWARDS ARROW is U+2190.
Although I prefer seeing the LEFTWARDS ARROW over COLON + EQUALS, I prefer COLON + EQUALS over SPACING UNDERSCORE. Viewing code snippets outside of the Squeak environment is often annoying.
There is a similar, but less annoying problem with the return operator. U+005E is the CIRCUMFLEX ACCENT. U+2191 is the UPWARDS ARROW.
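For anyone who wants to look these code points up, a sketch using Python's standard `unicodedata` module follows. The role descriptions are my own labels, and the names reported are those of the Unicode version bundled with the Python in use, which may differ from older names such as SPACING UNDERSCORE.

```python
import unicodedata

# Code points from the discussion above, with the role each plays in Squeak.
code_points = {
    0x005F: "assignment as stored in Squeak source",
    0x2190: "the left arrow Squeak displays for assignment",
    0x005E: "return as stored in Squeak source",
    0x2191: "the up arrow Squeak displays for return",
}

# Look up the current Unicode character name for each code point.
names = {cp: unicodedata.name(chr(cp)) for cp in code_points}

for cp, role in code_points.items():
    print(f"U+{cp:04X} {names[cp]}: {role}")
```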
The only ASCII control code that has specified semantics in Unicode is U+0009 HORIZONTAL TAB. Smalltalk conventionally uses CR as a line-end character, and this often causes headaches when moving between OSes with differing conventions. Unicode provides
U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR.
Since Smalltalk treats all white space equivalently, this is not really a language issue, but I thought I'd point out their existence as a stepping stone to treating line-end-convention issues.
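As an illustration of treating all of these line-end conventions uniformly, here is a sketch in Python, whose `str.splitlines` already recognises CR, LF, CR LF and the two Unicode separators. The sample text is invented for the example.

```python
# One string mixing the Mac/Smalltalk (CR), Unix (LF) and DOS (CR LF)
# conventions with the two Unicode separators mentioned above.
text = "first\rsecond\nthird\r\nfourth\u2028fifth\u2029sixth"

# splitlines() treats every one of these as a line boundary.
lines = text.splitlines()
print(lines)
```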
Several of the characters used as binary operators have ambiguous semantics in ASCII and are disambiguated in Unicode.
For example:
U+002D is the HYPHEN-MINUS
U+2212 is the MINUS SIGN
We also have U+00D7 MULTIPLICATION SIGN, not to mention the Zapf dingbat U+2715 MULTIPLICATION X.
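Going the other way, from character name to code point, can be sketched with `unicodedata.lookup` from Python's standard library. The list of names is simply the set of operator characters mentioned above.

```python
import unicodedata

operator_names = [
    "HYPHEN-MINUS",
    "MINUS SIGN",
    "MULTIPLICATION SIGN",
    "MULTIPLICATION X",
]

# Resolve each character name to its code point in U+XXXX notation.
found = {
    name: f"U+{ord(unicodedata.lookup(name)):04X}" for name in operator_names
}

for name, code_point in found.items():
    print(code_point, name)
```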
Once we are using multi-byte characters, we would need to choose an encoding (UTF-8 would be a good choice).
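To give a feel for what choosing UTF-8 would mean in bytes, here is a sketch in Python. The sample characters are picked from this thread, and the labels are only for the example.

```python
# A few characters from this discussion and their UTF-8 byte sequences.
samples = {
    "underscore": "\u005f",
    "e acute": "\u00e9",
    "left arrow": "\u2190",
    "element of": "\u2208",
}

encoded = {label: ch.encode("utf-8") for label, ch in samples.items()}

for label, data in encoded.items():
    # ASCII characters stay one byte; the others take two or three.
    print(f"{label}: {len(data)} byte(s) {data.hex(' ')}")
```

Plain ASCII source would be unchanged under this encoding, while each arrow or operator symbol would cost three bytes.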
Heading off into the blue plane (or perhaps an APL-induced hallucination), we now have hundreds of new characters for binary operators.
U+2208 ELEMENT OF (looks sorta like an E -- from set theory) could be implemented as
<U+2208> aCollection
    ^aCollection includes: self
Or perhaps we could embed Morphs into source code that would represent mathematical expressions in a much more readable fashion (sort of like the new graphing calculator on the Macintosh -- definitely check this out if you have never seen it).
But to take off into the blue, we need to build up speed in the pink. Does anybody know where to get a font that encodes all visible Unicode characters? It sure would be handy.
-- Mike Klein
squeak-dev@lists.squeakfoundation.org