Hi All,
does anyone know how Windows maps Unicode text to the CF_UNICODE format used in the clipboard? It seems to me that CF_UNICODE might simply be two-byte characters, excluding any codes beyond 16rFFFF. Is it in fact UTF-16?
If it is UTF-16, has anyone fixed our UTF16TextConverter? I don't see any of the conveniences that exist for the UTF8TextConverter such as decodeString: etc.
_,,,^..^,,,_
best, Eliot
On 18. Nov 2022, at 22:52, Eliot Miranda <eliot.miranda@gmail.com> wrote:
Hi All,
does anyone know how Windows maps Unicode text to the CF_UNICODE format used in the clipboard? It seems to me that CF_UNICODE might simply be two-byte characters, excluding any codes beyond 16rFFFF. Is it in fact UTF-16?
Windows being Windows, this ought to be UTF-16. When MS adopted Unicode in the 90s, Unicode was still "small" enough to fit in 16 bits, and the encoding was, in fact, UCS-2. It got "upgraded" to UTF-16 around Windows 2000.
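To make the UCS-2 vs. UTF-16 difference concrete, here is a minimal Python sketch (not Squeak code, and not tied to the clipboard API). The helper name `utf16_code_units` is made up for illustration; it only counts how many 16-bit code units a string needs when encoded as UTF-16.

```python
# Sketch: count the 16-bit code units a string occupies in UTF-16.
# utf16_code_units is a hypothetical helper, not part of any library.

def utf16_code_units(text: str) -> int:
    # "utf-16-le" emits no byte order mark, so bytes // 2 is the unit count.
    return len(text.encode("utf-16-le")) // 2

bmp_char = "\u20ac"         # U+20AC, fits in 16 bits
astral_char = "\U0001F600"  # U+1F600, beyond 16rFFFF

print(utf16_code_units(bmp_char))     # 1
print(utf16_code_units(astral_char))  # 2
```

A reader that assumes "one 16-bit unit per character" would see the second string as two separate units rather than one character.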
See: https://en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows
NOTE: UTF-16 has a lot of fun with "surrogate pairs", which make it possible to represent the code points beyond 16rFFFF (up to 16r10FFFF). This is quite messy, and surrogate code points encoded as UTF-8 are invalid, go figure.
Side note: This is the reason why https://simonsapin.github.io/wtf-8/ exists.
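For reference, the surrogate-pair arithmetic can be sketched in a few lines of Python. This is an illustration of the encoding rule only, not a proposed fix for UTF16TextConverter; the function names `to_surrogate_pair` and `from_surrogate_pair` are hypothetical.

```python
# Sketch of UTF-16 surrogate-pair arithmetic (function names are made up).

def to_surrogate_pair(code_point: int) -> tuple:
    """Split a code point above 0xFFFF into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point does not need a surrogate pair")
    offset = code_point - 0x10000        # 20 significant bits remain
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low

def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a (high, low) surrogate pair into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

high, low = to_surrogate_pair(0x1F600)
print(hex(high), hex(low))                  # 0xd83d 0xde00
print(hex(from_surrogate_pair(high, low)))  # 0x1f600

# A lone surrogate is not valid UTF-8, so Python refuses to encode it.
try:
    "\ud83d".encode("utf-8")
except UnicodeEncodeError:
    print("lone surrogate rejected by the UTF-8 encoder")
```

A converter that handles only one 16-bit unit at a time would hand back the two surrogates as separate characters, which is the case the recombination step covers.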
Best regards -Tobias
If it is UTF-16, has anyone fixed our UTF16TextConverter? I don't see any of the conveniences that exist for the UTF8TextConverter such as decodeString: etc.
_,,,^..^,,,_
best, Eliot
squeak-dev@lists.squeakfoundation.org