Re: [BUG] Unicode in multilingualized Squeak


Daniel Vainsencher

27 Mar 2003

6:46 p.m.

Yoshiki,

What is stopping you from running the scripts on a 3.4 system and working on that?

Can you elaborate a little on what the significance of that 3.2 image is? It might be easier to move that content forward than to find and fix the problem in the 3.2 code.

If that is impractical, could you elaborate on what exactly doesn't work? If it causes a walkback, a bug report from the debugger might help someone spot the problem.

Daniel

Yoshiki.Ohshima@acm.org wrote:

...

Daniel,

Thank you for your suggestion. The issue is that the trick from Andreas doesn't work (for me) in the 3.2 image that the current m17n effort is based on.

If someone is willing to scan all the changes from 3.2 to 3.4 and resolve the conflicts with the m17n scripts, that would be great.

-- Yoshiki


Yoshiki.Ohshima@acm.org

27 Mar

7:23 p.m.

New subject: [BUG] Unicode in multilingualized Squeak

Daniel,

...

What is stopping you from running the scripts on a 3.4 system and working on that?

Oh, well. It must be my other jobs, like installing a PostgreSQL database, learning SQL (duh), and such. Fortunately, I'm allowed to use Squeak for the frontend. I haven't tried Jim Menard's PostgreSQL interface yet, but I guess I need to modify it a bit so that it can handle Japanese characters. (Which is good. If I can do it quickly enough, I will have time to work on the m17n stuff.)

...

Can you elaborate a little on what the significance of that 3.2 image is? It might be easier to move that content forward than to find and fix the problem in the 3.2 code.

There is nothing special in 3.2. It was just the official release at the time I started the m17n work.

...

If that is impractical, could you elaborate on what exactly doesn't work? if it causes a walk back, a bug report from the debugger might help someone spot the problem.

I haven't tried this. The problem is that if I see a debugger pop up, I can't help but look into it :-)

Also, I would rather check the conflicts. The change sorter has a function to do this, so it would not be hard. It can just be tedious...

-- Yoshiki

Yoshiki.Ohshima@acm.org

2 Apr

8:09 p.m.

New subject: Squeak multilingualization on SqueakMap

Hello,

I found it awkward to keep saying "I have no time to do this," so I made an SAR package for the multilingualization.

I have only tested it on a vanilla 3.4 image, and some features are not implemented. Also, I anticipate format changes to the .changes file, so I don't know if I can keep going with this SAR-style installer. (Well, it is also true that we can do whatever we want in Squeak, so there will always be a workaround for this.)

The fixes from Boris are included and the workspace that appears at the end of installation shows the example code from him. Thank you Boris.

As always, any comments and suggestions are welcome,

-- Yoshiki

Boris Gaertner

6 Apr

9:40 p.m.

New subject: Squeak multilingualization on SqueakMap

Yoshiki.Ohshima@acm.org wrote: (on Wednesday, April 02, 2003 8:09 PM)

...

Hello,

I found it awkward to keep saying "I have no time to do this," so I made an SAR package for the multilingualization.

I have only tested it on a vanilla 3.4 image, and some features are not implemented. Also, I anticipate format changes to the .changes file, so I don't know if I can keep going with this SAR-style installer. (Well, it is also true that we can do whatever we want in Squeak, so there will always be a workaround for this.)

The fixes from Boris are included and the workspace that appears at the end of installation shows the example code from him. Thank you Boris.

As always, any comments and suggestions are welcome,

-- Yoshiki

The move to 3.4 is very welcome progress: in Squeak 3.2, the debugger does not work properly in MVC. This is a serious problem for MVC users, and it was fixed in 3.3. The SAR installation package works excellently. Thank you for making it available.

Now some words about my plans to experiment with and to hopefully contribute to your work:

My short-term interests are additional fonts and additions to Scamper. At the moment I am trying to adapt a font editor to your font representations, and I think that I will finish this soon.

Fonts: As to the fonts, there are some really good free BDF fonts available on the Internet. It is entirely possible to find all glyphs of the blocks 'CJK Unified Ideographs' and 'Hangul Syllables' on the web (in one size only, but for a start that is sufficient). At http://www.bgaertner.gmxhome.de/UnicodeResources.htm you will find details and code that can be used to load large BDF fonts into a 3.4 image. I loaded the ClearlyU font and the cmex24m.bdf font into a Squeak 3.4 image. To do that, I used code that splits these large fonts into many StrikeFonts. Glyphs from U+4E00 to U+4EFF are placed into one StrikeFont, glyphs from U+4F00 to U+4FFF into a different StrikeFont, and so on. This is not what we really need, but at least I can use my font editor to look at the fonts.
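The splitting scheme described above (one StrikeFont per run of 256 code points) can be sketched roughly as follows. This is only an illustration in Python, not the Squeak code from the linked page: the function name `split_into_pages` and the use of plain strings as stand-ins for glyph bitmaps are assumptions made for the example.

```python
from collections import defaultdict


def split_into_pages(glyphs):
    """Group a {code_point: glyph} mapping into pages of 256 code points.

    Each page is keyed by its first code point (e.g. 0x4E00) and maps the
    offset within the page (0..255) to the glyph, mimicking one small
    font per 256-code-point range.
    """
    pages = defaultdict(dict)
    for code_point, glyph in glyphs.items():
        page_start = code_point & ~0xFF  # clear the low 8 bits
        pages[page_start][code_point - page_start] = glyph
    return dict(pages)


# Hypothetical sample data: strings stand in for real glyph bitmaps.
sample = {0x4E00: "glyph-4E00", 0x4EFF: "glyph-4EFF", 0x4F00: "glyph-4F00"}
pages = split_into_pages(sample)
for start in sorted(pages):
    print(f"U+{start:04X}..U+{start + 0xFF:04X}: {len(pages[start])} glyph(s)")
```

With the sample above, U+4E00 and U+4EFF land in the same page and U+4F00 starts the next one, matching the ranges given in the message.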

What I want to do next is loading these fonts into multilingualized Squeak. I think that I will need some additional subclasses of class Unicode to do this.

Currently the class Unicode does not have subclasses for these glyph blocks:

UnicodeHangulSyllables
UnicodeKangXiRadicals
UnicodeCJKRadicalsSupplement
UnicodeBoPoMoFo
UnicodeHangulJamoCompatibility
UnicodeCJKUnifiedIdeographs
UnicodeCJKUnifiedIdeographsExtensionA
UnicodeCJKUnifiedIdeographsExtensionB

The absence of classes for these blocks is not a surprise, because your support for these writings is currently based on encodings like GB2312 and KSX1001.

A few words about the usefulness of these blocks:

CJK Unified Ideographs - obvious!
Hangul Syllables - obvious!
KangXi Radicals - useful for support tools that show CJK ideographs in a "radical + additional strokes" order. All data for this ordering can be found in the Unicode support file UniHan.txt.
HangulJamoCompatibility - useful for support tools that allow the selection of a Hangul syllable by its choseong, its jungseong and a jongseong. The algorithm needed to do this is described in chapter 3.11 of the Unicode documentation.
BoPoMoFo - useful for support tools that allow the selection of an ideograph based on its Mandarin pronunciation. Pronunciations can be found in the file UniHan.txt.
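For reference, the Hangul syllable composition arithmetic from the Unicode standard can be sketched as below. This is a minimal Python illustration, not code from the m17n package: the function names are invented for the example, and the indexes are positions in the standard's choseong (19 entries), jungseong (21 entries) and jongseong (27 entries plus "none") tables.

```python
S_BASE = 0xAC00                            # first precomposed Hangul syllable
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28     # T_COUNT includes "no jongseong"


def compose_hangul(l_index, v_index, t_index=0):
    """Return the precomposed syllable for the given jamo indexes.

    l_index: choseong (leading consonant) index, 0..18
    v_index: jungseong (vowel) index, 0..20
    t_index: jongseong (trailing consonant) index, 0..27; 0 means none
    """
    if not (0 <= l_index < L_COUNT
            and 0 <= v_index < V_COUNT
            and 0 <= t_index < T_COUNT):
        raise ValueError("jamo index out of range")
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


def decompose_hangul(syllable):
    """Inverse of compose_hangul: return (l_index, v_index, t_index)."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < L_COUNT * V_COUNT * T_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    l_index, rest = divmod(s_index, V_COUNT * T_COUNT)
    v_index, t_index = divmod(rest, T_COUNT)
    return l_index, v_index, t_index


# Leading index 18, vowel index 0, trailing index 4 give U+D55C.
print(f"U+{ord(compose_hangul(18, 0, 4)):04X}")
```

A selection tool of the kind described would let the user pick the three jamo and then call something like `compose_hangul` to obtain the code point to display.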

Now my questions:

1. Will we have subclasses for these Unicode blocks?
2. What leading chars will be assigned to these blocks?

-- Boris

Yoshiki.Ohshima@acm.org

7 Apr

8:49 a.m.

New subject: Squeak multilingualization on SqueakMap

Boris,

...

The move to 3.4 is very welcome progress: in Squeak 3.2, the debugger does not work properly in MVC. This is a serious problem for MVC users, and it was fixed in 3.3. The SAR installation package works excellently. Thank you for making it available.

You're welcome and thank you for trying!

...

As to the fonts, there are some really good free BDF fonts available on the Internet. It is entirely possible to find all glyphs of the blocks 'CJK Unified Ideographs' and 'Hangul Syllables' on the web (in one size only, but for a start that is sufficient). At http://www.bgaertner.gmxhome.de/UnicodeResources.htm you will find details and code that can be used to load large BDF fonts into a 3.4 image.

...

I loaded the ClearlyU font and the cmex24m.bdf font into a Squeak 3.4 image. To do that, I used code that splits these large fonts into many StrikeFonts. Glyphs from U+4E00 to U+4EFF are placed into one StrikeFont, glyphs from U+4F00 to U+4FFF into a different StrikeFont and so on. This is not what we really need, but at least I can use my font editor to look at the fonts.

Well. I haven't looked at it carefully, but it sounds like cmex24m is based on Big5 glyphs?

...

Currently the class Unicode does not have subclasses for these glyph blocks:

UnicodeHangulSyllables
UnicodeKangXiRadicals
UnicodeCJKRadicalsSupplement
UnicodeBoPoMoFo
UnicodeHangulJamoCompatibility
UnicodeCJKUnifiedIdeographs
UnicodeCJKUnifiedIdeographsExtensionA
UnicodeCJKUnifiedIdeographsExtensionB

There are UnicodeJapanese, UnicodeTraditionalChinese, UnicodeKorean, etc., which are what I'm planning to move to. So far, I don't see a real need for the blocks you mentioned.

...

The absence of classes for these blocks is not a surprise, because your support for these writings is currently based on encodings like GB2312 and KSX1001.

*Currently* based on those encodings, yes.

...

Now my questions:

Will we have subclasses for these Unicode blocks?

What leading chars will be assigned to these blocks?

The answer is:

To overcome the Han unification problem, we'd like to assign different leading chars to the same Unicode code point of an ideograph, depending on the language. UnicodeJapanese etc. exist for this purpose. You can add TraditionalChinese support with a BIG5-based font, but such glyphs simply are not usable for Japanese.
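One way to picture the leading-char idea is as a language tag stored in the bits above the code point, so that the same ideograph can select a Japanese or a Chinese glyph. The Python sketch below is only an illustration under stated assumptions: the 22-bit split and the numeric tag values are hypothetical and are not taken from the m17n code.

```python
CODE_POINT_BITS = 22  # assumption: low 22 bits hold the Unicode code point
CODE_POINT_MASK = (1 << CODE_POINT_BITS) - 1

# Hypothetical leading-char values, chosen only for this example.
LEADING_CHARS = {"Japanese": 1, "TraditionalChinese": 2, "Korean": 3}


def tag(code_point, language):
    """Combine a code point with the leading char of a language."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    return (LEADING_CHARS[language] << CODE_POINT_BITS) | code_point


def untag(value):
    """Split a tagged value back into (leading_char, code_point)."""
    return value >> CODE_POINT_BITS, value & CODE_POINT_MASK


# U+76F4 is one unified ideograph whose preferred glyph shape differs
# between Japanese and Chinese typography.
japanese = tag(0x76F4, "Japanese")
chinese = tag(0x76F4, "TraditionalChinese")
print(japanese != chinese, untag(japanese)[1] == untag(chinese)[1])
```

Under this scheme a font lookup can dispatch on the leading char first and on the code point second, which is what lets two languages share a code point without sharing glyphs.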

This explanation may be too terse. If you want to know more about what I'm thinking, please let me know and I'll elaborate.

Thank you again,

-- Yoshiki


squeak-dev@lists.squeakfoundation.org
