On Fri, Jan 18, 2019 at 08:58:07AM -0500, David T. Lewis wrote:
On Fri, Jan 18, 2019 at 01:40:26PM +0100, Sven Van Caekenberghe wrote:
On 18 Jan 2019, at 01:54, David T. Lewis via Pharo-dev <pharo-dev@lists.pharo.org> wrote:
On Thu, Jan 17, 2019 at 04:57:18PM +0100, Sven Van Caekenberghe wrote:
Right, bytes are always uninterpreted; otherwise they would be something else. We have ByteArray>>#decodedWith: and ByteArray>>#utf8Decoded, and our ByteArray inspector decodes automatically if it can.
Hi Sven,
I am the author of the getenv primitives, and I am also sadly uninformed about matters of character sets and strings in a multilingual environment.
The primitives answer environment variable values as ByteString rather than ByteArray. This made sense to me at the time I wrote them, because a ByteString is easy to display in an inspector and is easily converted to a ByteArray.
For an American English speaker this seems like a good choice, but I wonder now if it is a bad decision. After all, it is also trivially easy to convert a ByteArray to ByteString for display in the image.
Would it be helpful to have getenv primitives that answer ByteArray instead, and to let all conversion (including in OSProcess) be done in the image?
Thanks, Dave
Normally, the correct way to represent uninterpreted bytes is with a ByteArray. Decoding these bytes as characters is the specific task of a character encoder/decoder, with a deliberate choice as to which to use.
Since the getenv() C library function uses simple C strings, it is understandable that this was carried over. It is probably not worth changing, or too risky to change, as long as the receiver understands that it is a raw OS string that needs more work.
Like with file path encoding/decoding, environment variable encoding/decoding is plain messy and complex. IMHO it is better to manage that at the image level where we are more agile and can better handle that complexity.
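To illustrate why the choice of decoder has to be deliberate, here is a minimal sketch that decodes the same bytes two ways. It assumes a Pharo image, where ByteArray>>#utf8Decoded and ByteArray>>#decodedWith: are available as mentioned above; the encoding identifier 'latin1' is an assumption about what the image's character encoders accept, and the byte values are made up for the example.

```smalltalk
"The same five bytes, read with two different decoders."
| bytes asUtf8 asLatin1 |
bytes := #[99 97 102 195 169].            "UTF-8 encoding of 'cafe' with an acute accent on the e"
asUtf8 := bytes utf8Decoded.               "195 169 decode to one character: 4 characters total"
asLatin1 := bytes decodedWith: 'latin1'.   "assumed identifier; each byte is one character: 5 characters total"
```

Neither result is wrong as far as the bytes are concerned; only knowledge of how the OS string was encoded says which one is right, which is the argument for doing the decoding in the image.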
Thanks Sven, that makes perfect sense to me.
I added some new primitives to OSProcessPlugin that answer ByteArray instead of ByteString.
For Unix (Linux, OS X):
  <primitive: 'primitiveGetCurrentWorkingDirectoryAsBytes' module: 'UnixOSProcessPlugin'>
  <primitive: 'primitiveArgumentAtAsBytes' module: 'UnixOSProcessPlugin'>
  <primitive: 'primitiveEnvironmentAtAsBytes' module: 'UnixOSProcessPlugin'>
  <primitive: 'primitiveEnvironmentAtSymbolAsBytes' module: 'UnixOSProcessPlugin'>
  <primitive: 'primitiveRealpathAsBytes' module: 'UnixOSProcessPlugin'>
For Windows:
  <primitive: 'primitiveGetCurrentWorkingDirectoryAsBytes' module: 'Win32OSProcessPlugin'>
  <primitive: 'primitiveGetEnvironmentStringsAsBytes' module: 'Win32OSProcessPlugin'>
These should be in the latest VM builds now.
If you are using OSProcess, update it to the latest version to get accessor methods for the new primitives. For example, OSProcess accessor primGetCurrentWorkingDirectory calls the original primitive that answers a ByteString, and to get raw bytes you can use OSProcess accessor primGetCurrentWorkingDirectoryAsBytes instead.
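A minimal sketch of how the bytes accessor might be used from the image. It assumes a VM and OSProcess version recent enough to include the new primitive, an image that provides ByteArray>>#utf8Decoded (as in Pharo), and a working directory path that happens to be UTF-8 encoded; none of these is guaranteed.

```smalltalk
"Fetch the working directory as raw bytes, then decode it explicitly in the image."
| rawBytes path |
rawBytes := OSProcess accessor primGetCurrentWorkingDirectoryAsBytes.
"Assumption: the OS encodes this path as UTF-8; substitute another decoder if it does not."
path := rawBytes utf8Decoded.
```

The decoding step is now visible in image code, so a different decoder can be used on platforms where the UTF-8 assumption does not hold.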
Dave
vm-dev@lists.squeakfoundation.org