On Mon, 29 Aug 2011, Gonzalo Romano wrote:
Hi, I've been working on a script to fix some XML files for a web application, and I'm having some trouble with character encoding. It seems there are some characters that Squeak does not recognize, like "..." (U+2026) and U+2014, which MS Word uses in its text files. Could anyone confirm this, and maybe provide a workaround? Thanks in advance!
Would you like to display those documents in Squeak, or just process the files with a program you wrote? In the first case you have to install and use a font that contains the missing characters (the default font doesn't contain them). In the second case you have to make sure that you're using the right text converter for your document.
Levente
-- Gonzalo, Romano
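To illustrate why the text converter matters when the files are only being processed, here is a minimal sketch in Python (not Squeak code; the sample string and variable names are made up for illustration). It shows that U+2026 and U+2014 each take three bytes in UTF-8, that decoding with the matching converter restores the text, and that decoding the same bytes with a single-byte converter such as Latin-1 produces a different, longer string.

```python
# Hypothetical sample text containing the two characters from the question.
text = "wait\u2026 done\u2014ok"

# In UTF-8, each of these characters is encoded as three bytes.
ellipsis_bytes = "\u2026".encode("utf-8")   # bytes E2 80 A6
em_dash_bytes = "\u2014".encode("utf-8")    # bytes E2 80 94

raw = text.encode("utf-8")

# Matching converter: the round trip restores the original string.
decoded_ok = raw.decode("utf-8")

# Mismatched single-byte converter: every byte becomes its own character,
# so each three-byte sequence turns into three unrelated characters.
decoded_wrong = raw.decode("latin-1")

# Neither character exists in ASCII, so a strict ASCII conversion fails.
try:
    text.encode("ascii")
    fits_ascii = True
except UnicodeEncodeError:
    fits_ascii = False

print(decoded_ok == text, len(text), len(raw), len(decoded_wrong), fits_ascii)
```

The same reasoning should apply to any environment: the bytes on disk are only meaningful together with the converter used to read and write them.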
On Tue, 30 Aug 2011, Gonzalo Romano wrote:
Hi Levente, thanks for your answer. The idea was just to process the files. I'm sure the files are in UTF-8, and I've used squeakToUtf8 to convert the string and write the files, but no luck.
That converter should be fine if your files really have UTF-8 encoding.
Am I using the right text converter? I'm doing some stuff with regexps and rewriting the file. These characters have no translation to ASCII or ISO; could that be the problem?
Which regular expression library do you use? How are you opening the file you're writing the output into?
Levente
On Fri, 2 Sep 2011, Gonzalo Romano wrote:
I'm using "RePlugin" by Andrew Greenberg, and I'm opening the file like this: "aFileEntry readWriteStream", where aFileEntry is a DirectoryEntryFile.
Okay, I guess RePlugin is responsible for the problem. It was written for pre-3.8 Squeak, so it's pretty likely that it doesn't support WideStrings. PCRE supports UTF-8 encoded strings, so if it's possible to tell RePlugin that the text is using that encoding, then you should be able to make it work.
Levente
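As a rough illustration of the difference between a byte-oriented regular expression engine and one that understands the text's encoding, here is a small sketch using Python's `re` module as a stand-in (this is not RePlugin or PCRE itself, and the sample string is made up). Matching against decoded text treats U+2026 as one character, while matching against the raw UTF-8 bytes sees three separate bytes, and a byte-level replacement can leave an invalid UTF-8 sequence behind.

```python
import re

text = "a\u2026b"             # hypothetical sample: 'a', U+2026, 'b'
raw = text.encode("utf-8")    # five bytes: 61 E2 80 A6 62

# Character-aware matching: "." consumes the whole character U+2026.
char_match = re.fullmatch(r"a.b", text)

# Byte-oriented matching: "." consumes a single byte, so the same
# pattern fails on the UTF-8 bytes and three dots are needed instead.
byte_match_one = re.fullmatch(rb"a.b", raw)
byte_match_three = re.fullmatch(rb"a...b", raw)

# A byte-level substitution that hits only part of the three-byte
# sequence leaves bytes that are no longer valid UTF-8.
broken = re.sub(rb"\xa6", b"?", raw)
try:
    broken.decode("utf-8")
    still_valid = True
except UnicodeDecodeError:
    still_valid = False

print(char_match is not None, byte_match_one is None,
      byte_match_three is not None, still_valid)
```

Whether RePlugin behaves like the byte-oriented case here is only the guess made above; the sketch just shows what that kind of mismatch looks like.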
squeak-dev@lists.squeakfoundation.org