ASCII code


Alessandro Manunza

24 Jul 1998

9:50 a.m.

Hi Squeakers,

I have some problems with the ASCII code of Squeak 2.0.

Try a 'print it' of this:

| file a |
file _ StandardFileStream newFileNamed: 'ascii.txt'.
1 to: 256 do: [:i | file nextPut: (Character characterTable at: i)].
file close.

file _ StandardFileStream fileNamed: 'ascii.txt'.
file reset.
a _ file contentsOfEntireFile.

And after this, try to read the 'ascii.txt' file with BlockNotes (Win95).

If you compare the content of BlockNotes to the content of a, you will find a big difference. WHY?

How can I avoid this difference and all the consequent problems?

Thanks for your help

Alessandro


Andreas Raab

24 Jul

10:54 a.m.

Alessandro,

> Try a 'print it' of this:
> [...]
> And after this, try to read the 'ascii.txt' file with BlockNotes (Win95).
> If you compare the content of BlockNotes to the content of a, you will find a big difference. WHY?

Because of the differences in the character sets used (Note: This is determined by the *look* of the bitmaps describing certain characters). Squeak uses the Mac Roman character set, whereas Windows uses its own ANSI code page (Windows-1252 on Western systems). While the first 128 characters (codes 0 to 127, the ASCII range) are the same, the mapping of everything from 128 upward is quite different.
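To make the mismatch concrete, here is a minimal Python sketch. It assumes that Python's built-in `mac_roman` and `cp1252` codecs are reasonable stand-ins for the Squeak and Windows character sets; neither codec is mentioned in the thread, and Squeak's own font tables may differ in detail.

```python
# Sketch: one byte, two interpretations.
# Assumption: "mac_roman" approximates Squeak's character set and
# "cp1252" approximates the Windows one.
raw = bytes([0x8E])
as_mac = raw.decode("mac_roman")   # e with acute accent
as_win = raw.decode("cp1252")      # capital Z with caron

# Codes 0..127 decode to the same characters under both codecs.
ascii_range_agrees = all(
    bytes([b]).decode("mac_roman") == bytes([b]).decode("cp1252")
    for b in range(128)
)

print(repr(as_mac), repr(as_win), ascii_range_agrees)
```

The same stored byte therefore shows up as a different glyph depending on which character set the viewer assumes, which is the difference seen between Squeak and the Windows editor.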

> How can I avoid this difference and all the consequent problems?

You can use a character translation table. There is one in sqWin32Window.c mapping from Win to Squeak. It looks as follows:

{
    0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
   16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
   32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
   48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
   64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79,
   80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95,
   96, 97, 98, 99,100,101,102,103,104,105,106,107,108,109,110,111,
  112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,
  173,176,226,196,227,201,160,224,246,228,178,220,206,179,182,183,
  184,212,213,210,211,165,208,209,247,170,185,221,207,186,189,217,
  202,193,162,163,219,180,195,164,172,169,187,199,194,197,168,248,
  161,177,198,215,171,181,166,225,252,218,188,200,222,223,240,192,
  203,231,229,204,128,129,174,130,233,131,230,232,237,234,235,236,
  245,132,241,238,239,205,133,249,175,244,242,243,134,250,251,167,
  136,135,137,139,138,140,190,141,143,142,144,145,147,146,148,149,
  253,150,152,151,153,155,154,214,191,157,156,158,159,254,255,216
};

and is used to translate the input characters in the VM. The reverse table (i.e., mapping from Squeak to Win) can be created by

| squeakToWinTable |
squeakToWinTable _ ByteArray new: 256.
1 to: 256 do: [:i |
    squeakToWinTable at: (winToSqueakTable at: i) + 1 put: i - 1].
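For readers without a Squeak image at hand, the same inversion can be sketched in Python. The table values are copied from the message above; the names `WIN_TO_SQUEAK`, `SQUEAK_TO_WIN`, `invert`, and `squeak_bytes_to_win` are hypothetical and exist only for this illustration. The inversion works because the table is a permutation of 0..255, so each Squeak code has exactly one Windows code mapping to it.

```python
# Win -> Squeak table from the message above (0-based; first 128 are identity).
WIN_TO_SQUEAK = list(range(128)) + [
    173, 176, 226, 196, 227, 201, 160, 224, 246, 228, 178, 220, 206, 179, 182, 183,
    184, 212, 213, 210, 211, 165, 208, 209, 247, 170, 185, 221, 207, 186, 189, 217,
    202, 193, 162, 163, 219, 180, 195, 164, 172, 169, 187, 199, 194, 197, 168, 248,
    161, 177, 198, 215, 171, 181, 166, 225, 252, 218, 188, 200, 222, 223, 240, 192,
    203, 231, 229, 204, 128, 129, 174, 130, 233, 131, 230, 232, 237, 234, 235, 236,
    245, 132, 241, 238, 239, 205, 133, 249, 175, 244, 242, 243, 134, 250, 251, 167,
    136, 135, 137, 139, 138, 140, 190, 141, 143, 142, 144, 145, 147, 146, 148, 149,
    253, 150, 152, 151, 153, 155, 154, 214, 191, 157, 156, 158, 159, 254, 255, 216,
]


def invert(table):
    """Return the inverse of a 256-entry permutation table."""
    inverse = [0] * len(table)
    for win_code, squeak_code in enumerate(table):
        inverse[squeak_code] = win_code
    return inverse


SQUEAK_TO_WIN = invert(WIN_TO_SQUEAK)


def squeak_bytes_to_win(data: bytes) -> bytes:
    """Translate bytes written by Squeak into their Windows equivalents."""
    return data.translate(bytes(SQUEAK_TO_WIN))


# Squeak code 0x8E (e with acute in Mac Roman) becomes Windows code 0xE9.
print(squeak_bytes_to_win(bytes([0x8E])).hex())
```

Applying `squeak_bytes_to_win` to each buffer before writing the file (or the forward table after reading) would be one way to keep file contents consistent with what a Windows editor displays.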

Hope this helps.
Andreas

--
Linear algebra is your friend - Trigonometry is your enemy.
+===== Andreas Raab ============= (raab@isg.cs.uni-magdeburg.de) =====+
I Department of Simulation and Graphics       Phone: +49 391 671 8065 I
I University of Magdeburg, Germany            Fax:   +49 391 671 1164 I
+=============< http://isgwww.cs.uni-magdeburg.de/~raab >=============+

Hans-Martin Mosner

8:37 p.m.

I'd like to suggest that Squeak switches to Unicode. The Unicode subset with a high byte of zero is equal to the ISO 8859-1 encoding, which is the default for Windows and most X platforms (have the HP weirdos changed their minds about hp-roman8 recently? :-)
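The claim about the zero high byte can be checked with a short Python sketch. It relies on Python's `latin-1` codec as the implementation of ISO 8859-1; nothing here is specific to Squeak.

```python
# Every ISO 8859-1 byte value b decodes to the Unicode code point U+00bb
# with the same number, so the first 256 code points mirror the encoding.
latin1_matches_unicode = all(
    bytes([b]).decode("latin-1") == chr(b) for b in range(256)
)

# Example: e with acute is byte 0xE9 in ISO 8859-1 and U+00E9 in Unicode.
e_acute_byte = "\u00e9".encode("latin-1")

print(latin1_matches_unicode, e_acute_byte.hex())
```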

Hans-Martin

Mike Klein

25 Jul

2:20 p.m.

On Fri, 24 Jul 1998, Hans-Martin Mosner wrote:

> I'd like to suggest that Squeak switches to Unicode.

I think that this would be a good idea, as well. There are a couple of Squeak/Smalltalk-specific ramifications I'd like to point out.

In Unicode U+005F is the LOW LINE or SPACING UNDERSCORE rather than the left arrow used as the assignment operator in Squeak. The Unicode standard doesn't even provide a cross-reference to the left arrow for that code point. (Anybody remember the ASR33 teletypes?) The LEFTWARDS ARROW is U+2190.

Although I prefer seeing the LEFTWARDS ARROW over COLON + EQUALS, I prefer COLON + EQUALS over SPACING UNDERSCORE. Viewing code snippets outside of the Squeak environment is often annoying.

There is a similar, but less annoying problem with the return operator. U+005E is the CIRCUMFLEX ACCENT. U+2191 is the UPWARDS ARROW.
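The code point names mentioned here can be looked up with Python's standard `unicodedata` module. This is only an illustrative check against the Unicode character database bundled with Python, which is newer than the version current when this was written, although these particular names are unchanged.

```python
import unicodedata

# Code points that Squeak draws as arrows, next to the real Unicode arrows.
names = {
    cp: unicodedata.name(chr(cp))
    for cp in (0x005F, 0x2190, 0x005E, 0x2191)
}

for cp, name in names.items():
    print(f"U+{cp:04X} {name}")
```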

The only ASCII control code that has specified semantics in Unicode is U+0009 HORIZONTAL TAB. Smalltalk conventionally uses CR as a line-end character, and this often causes headaches when moving between OSes with differing conventions. Unicode provides

U+2028 LINE SEPARATOR and
U+2029 PARAGRAPH SEPARATOR

Since Smalltalk treats all white-space equivalently, this is not really a language issue, but I thought I'd point out their existence as a stepping-stone to treating line-end-convention issues.
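As an illustration of treating the different line-end conventions uniformly, Python's `str.splitlines` already recognizes CR, LF, CR+LF, and the two Unicode separators. The sample text and the choice of LF as the normalized form are assumptions made for this sketch, not something the thread proposes.

```python
# One string mixing five line-end conventions:
# CR (Squeak/Mac), LF (Unix), CR+LF (Windows), U+2028, U+2029.
mixed = "a\rb\nc\r\nd\u2028e\u2029f"

# splitlines() breaks on all of them without keeping the terminators.
lines = mixed.splitlines()

# Normalizing to a single convention is then a simple join.
normalized = "\n".join(lines)

print(lines)
```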

Several of the characters used as binary operators have ambiguous semantics in ASCII and are disambiguated in Unicode.

For example:

U+002D is the HYPHEN-MINUS
U+2212 is the MINUS SIGN

We also have U+00D7 MULTIPLICATION SIGN not to mention the Zapf dingbat U+2715 MULTIPLICATION X
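A quick way to confirm which code point carries which name is Python's `unicodedata` module. The sketch below only reads the character database shipped with Python; it includes U+2012 as well as U+2212 because the two are easy to confuse.

```python
import unicodedata

# Candidate operator characters and the names Unicode assigns to them.
operator_names = {
    cp: unicodedata.name(chr(cp))
    for cp in (0x002D, 0x2012, 0x2212, 0x00D7, 0x2715)
}

# The ASCII hyphen-minus and the true minus sign are distinct characters,
# so a parser would have to decide whether to accept both as subtraction.
hyphen_equals_minus = "\u002d" == "\u2212"

for cp, name in operator_names.items():
    print(f"U+{cp:04X} {name}")
```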

Once we are using multi-byte characters, we would need to choose an encoding (UTF-8 would be a good choice).
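To show what that choice would mean for source text, here is a small Python sketch of how a few of the characters discussed in this thread encode in UTF-8. The selection of characters is arbitrary and purely illustrative.

```python
# ASCII characters stay one byte; others take two or three bytes.
samples = {
    "underscore": "_",             # U+005F
    "e_acute": "\u00e9",           # U+00E9
    "leftwards_arrow": "\u2190",   # U+2190
    "element_of": "\u2208",        # U+2208
}

encoded = {name: ch.encode("utf-8") for name, ch in samples.items()}

for name, data in encoded.items():
    print(f"{name}: {len(data)} byte(s), hex {data.hex()}")
```

Existing ASCII-only source would be unchanged byte for byte under UTF-8, which is one practical argument in its favor.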

Heading off into the blue-plane (or, perhaps, an APL-induced hallucination), we now have hundreds of new characters for binary operators.

U+2208 ELEMENT OF (looks sorta like an E -- from set theory) could be implemented as

<U+2208> aCollection
    ^aCollection includes: self

Or perhaps we could embed Morphs into source code that would represent mathematical expressions in a much more readable fashion (sort of like the new graphing calculator on the Macintosh -- definitely check this out if you have never seen it).

But to take off into the blue, we need to build up speed in the pink. Does anybody know where to get a font that encodes all visible Unicode characters? It sure would be handy.

-- Mike Klein

squeak-dev@lists.squeakfoundation.org
