Whilst testing out Seaside in the 6.1 trunk image we discovered that LargePositiveInteger needs an #asByteArray method to enable the entity tag method to provide all that nice formatting stuff.
For the specific case of LPI it's trivial: "self as: ByteArray" works fine, and sure, maybe it could be optimised.
However, there is actual usage of #asByteArray in the main image that appears to be unsatisfied. For example, Magnitude>>#putOn: for binary streams. It appears no subclass of Magnitude implements #asByteArray, which seems unfortunate. #putOn: is used a fair bit in writing to streams. There's also some clashing with potentially important behaviour where collections of (very) small integers can be converted to ByteArrays so long as all the numeric values are < 256. If we were to make an #asByteArray for SmallInteger that produced up to 8 bytes, how might that affect some of these usages? And what is a good byte array representation of a LargeNegativeInteger? And should every conversion be reversible?
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Strange OpCodes: EF: Emulate Fireworks
It does not generally make sense for a number, or any other kind of object, to know how to represent itself as a byte array.
If you know in advance that your object is e.g. a LargePositiveInteger and if you know that you want to serialize that object into a sequence of bytes, and if you know that each byte should be 8 bits in size, and if you know the endianness of the target byte order representation, and if you know that the desired numeric representation is e.g. two's complement (as opposed to, say, one's complement) integer, and you know that the register size of the numeric representation is 8 bytes (or 4 bytes, or 2 bytes, or whatever), then you might reasonably want to offer a convenience method to make that happen. Large positive integers are the simple case for which most people will know what to expect from #asByteArray, so offering this as a convenience method makes sense. For other kinds of numbers (or objects) it does not make sense and should not be implemented.
Dave
On 2023-10-30 19:01, Tim Rowledge wrote:
[...]
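As a rough illustration of the choices Dave lists (width and byte order), here is a workspace sketch that is not code from the thread. The block name encode and its three parameters are made up for the example, and it only considers non-negative integers.

```smalltalk
"Workspace sketch: encode a non-negative integer into a fixed number of bytes.
 The block name and its parameters are illustrative, not existing Squeak API."
| encode |
encode := [:n :width :bigEndian | | bytes |
    bytes := ByteArray new: width.
    1 to: width do: [:i | | byte |
        "i counts bytes starting from the least significant one"
        byte := (n bitShift: -8 * (i - 1)) bitAnd: 255.
        bytes at: (bigEndian ifTrue: [width - i + 1] ifFalse: [i]) put: byte].
    bytes].

encode value: 258 value: 2 value: true.    "#[1 2]"
encode value: 258 value: 2 value: false.   "#[2 1]"
encode value: 258 value: 4 value: true.    "#[0 0 1 2]"
encode value: 258 value: 4 value: false.   "#[2 1 0 0]"
```

The same number 258 yields four different byte arrays, which is why a bare #asByteArray leaves the reader guessing which one is meant.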
An interesting thought. Kinda suggests we should get rid of the usages like #putOn:
On 2023-10-31, at 7:43 PM, lewis@mail.msen.com wrote:
It does not generally make sense for a number, or any other kind of object, to know how to represent itself as a byte array.
I see your point about the range of plausible options.
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Strange OpCodes: NOP: Randomize the PSW and then branch
Hmm... not sure that #asByteArray makes sense without specifying encoding rules such as little-vs-big-endian, signed-vs-unsigned, max-num-bytes, etc ... hmm....
Best, Marcel
On 01.11.2023 04:33:31, Tim Rowledge tim@rowledge.org wrote:
[...]
IMO, it absolutely makes sense to not require specifying that in cases where the other side is Squeak. Particularly when working with a set of cryptographic primitives that support String, ByteArray or Integer scalars interchangeably, not having to worry about the internal byte order of ByteArrays, other than that it be consistent with the other side, which a default would cover, is nice.
Therefore, IMO there's no reason to force the cognitive burden of byte order on users by excluding #asByteArray; instead, provide #asByteArray: aBoolean, which #asByteArray can call with a default choice.
On Fri, Nov 3, 2023 at 5:36 AM Marcel Taeumel via Squeak-dev <squeak-dev@lists.squeakfoundation.org> wrote:
[...]
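The "explicit variant plus default" shape that Chris proposes might look roughly like the following workspace sketch. Both block names are hypothetical stand-ins for #asByteArray: and #asByteArray, the big-endian default is an arbitrary assumption for the example, and only non-negative integers are handled.

```smalltalk
"Workspace sketch of an explicit byte-order variant plus a defaulting convenience form.
 Both block names are hypothetical, not existing Squeak API."
| asByteArrayBigEndian asByteArray |
asByteArrayBigEndian := [:n :bigEndian | | count bytes |
    "Use as many bytes as the value needs, but at least one."
    count := 1.
    [(n bitShift: -8 * count) > 0] whileTrue: [count := count + 1].
    bytes := ByteArray new: count.
    1 to: count do: [:i |
        bytes
            at: (bigEndian ifTrue: [count - i + 1] ifFalse: [i])
            put: ((n bitShift: -8 * (i - 1)) bitAnd: 255)].
    bytes].
"The convenience form only supplies the default (assumed here: big-endian)."
asByteArray := [:n | asByteArrayBigEndian value: n value: true].

asByteArray value: 65536.                         "#[1 0 0]"
asByteArrayBigEndian value: 65536 value: false.   "#[0 0 1]"
```

Callers who do not care about byte order use the short form; anyone talking to a non-Squeak peer can still spell the order out.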
So, maybe all variable-byte subclasses then?
Just wondering that, when working with integers, this would work for LPI but not for small integers...
How would we implement #asByteArray in SmallInteger? Like this?
asByteArray
    | result ws |
    result := ByteArray new: (ws := Smalltalk wordSize).
    ws = 4
        ifTrue: [result longAt: 1 put: self bigEndian: Smalltalk endianness == #big]
        ifFalse: [result long64At: 1 put: self bigEndian: Smalltalk endianness == #big].
    ^ result
Best, Marcel
On 11.11.2023 00:15:55, Chris Muller asqueaker@gmail.com wrote:
[...]
1 asByteArray  #[1 0 0 0 0 0 0 0]
1152921504606846975 asByteArray  #[255 255 255 255 255 255 255 15]
1152921504606846976 asByteArray  #[0 0 0 0 0 0 0 16] "Large Positive Int"
On 13.11.2023 13:33:31, Marcel marcel.taeumel@hpi.uni-potsdam.de wrote:
[...]
A Number should not respond to #asByteArray.
There are lots of possible binary encodings for numbers, which means that there is no such thing as a correct answer to the question of how to represent a number as an array of bytes.
Dave
On 2023-11-13 12:53, Taeumel, Marcel via Squeak-dev wrote:
[...]
On 2024-02-11 19:54, Eliot Miranda wrote:
Hi Dave,
On Nov 13, 2023, at 9:33 AM, lewis@mail.msen.com wrote:
A Number should not respond to #asByteArray.
Agreed. Ignore my previous reply which was specific to the integers. More accurately we could say "In general, a Number should not respond to #asByteArray. But for some kinds of number, a byte array representation may make sense."
Well put, and I agree.
I don't want to ignore your previous reply with respect to integers. It makes good sense when explained that way. I do think that the general notion of Integer>>asByteArray is a bit questionable, but a good method comment might go a long way toward clearing that up :-)
Dave
If any #asByteArray is adopted into the trunk, I hope it'll be the one that's been in the Cryptography package since 2003 (method fileout attached) and respects Network Byte Order (i.e., big endian). Unlike the #as: implementation, it works equally with SmallIntegers, too.
On Sun, Feb 11, 2024 at 2:36 PM lewis@mail.msen.com wrote:
[...]
On 2023-11-13, at 4:53 AM, Taeumel, Marcel via Squeak-dev squeak-dev@lists.squeakfoundation.org wrote:
1152921504606846976 asByteArray
That example is hilariously confusing at first glance. 16! How can it be 16! Oh no it's all screwed up! And then you remember that the 16 is the *highest* byte...
I'd suggest that this is one of those cases where a too-general name got chosen, then used in some places where 'other code' needed a ByteArray (consider the conversion of bitmaps to bytearrays to allow using the compression code), and then mis-used in other places where it isn't really appropriate.
The case that triggered this thread is in Seaside where a hash of a string is used as a key into some javascript thing.
entityTagFor: aStringOrByteArray
    | hash base64 |
    hash := GRPlatform current secureHashFor: aStringOrByteArray.
    "etags have to be delimited by double quotes"
    base64 := GRPlatform current base64Encode: hash asByteArray.
    ^ String new: base64 size + 2 streamContents: [ :stream |
        stream
            nextPut: $";
            nextPutAll: base64;
            nextPut: $"]
I can't help thinking it might be more helpful to have a method to directly convert a hash value into net-transportable base64 format.
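One way such a direct conversion could look is sketched below; this is not Seaside or Grease code. The name hashToTag is hypothetical, the 20-byte width assumes a SHA-1 sized hash, and the sketch assumes that Base64MimeConverter class>>mimeEncode: answers a stream of characters in the image at hand.

```smalltalk
"Workspace sketch: turn an integer hash into a quoted base64 tag via a fixed-width
 byte array. hashToTag is a hypothetical helper; 20 bytes assumes a SHA-1 sized hash."
| hashToTag |
hashToTag := [:hash | | bytes base64 |
    bytes := ByteArray new: 20.
    "Big-endian and zero-padded, so SmallInteger and LargePositiveInteger
     hashes go through exactly the same path."
    1 to: 20 do: [:i |
        bytes at: 21 - i put: ((hash bitShift: -8 * (i - 1)) bitAnd: 255)].
    "Assumption: mimeEncode: takes a read stream and answers a character stream."
    base64 := (Base64MimeConverter mimeEncode: bytes readStream) contents.
    '"', base64, '"'].

hashToTag value: 12345.
```

Because the width and byte order are fixed inside the helper, the caller never needs a general-purpose Integer>>asByteArray.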
And then there are methods like CanvasEncoder>>#drawString:from:to:in:font:color: where an argument is converted to a String, copied from (so we know the result is a String), then if it is a WideString converted to a byte array and then to a String again. I suspect it is because StringSocket doesn't have a way to handle WideStrings in #addToOutBuf: and so on. So any user has to do the conversion dance just in case.
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Do you like me for my brain or my baud?
Tim, is this something that could be handled in the Grease package for Seaside? I am not up to speed on Seaside things, but I'm guessing that the notion of LPI>>asByteArray might have been one of those Pharo "enhancements".
I like your idea of having a method to directly convert a hash value into net-transportable base64 format, that seems much better than trying to do a generic asByteArray for integers.
Dave
On 2023-11-13 18:24, Tim Rowledge wrote:
[...]
Hi all,
I'm a bit late to the party but support the idea of having more conversion methods between different immediate classes and raw bits array classes. Some use cases that I have been having recently are:
* convert a SecureHashAlgorithm hash to a ByteArray (which is faster to compare, hash, and file out/in than the LargePositiveInteger and consumes less space on disk, as we only need 20 bytes)
For 99% of hashes, we can just send #asByteArray to the result of #hashMessage:, but if the hash happens to be a SmallInteger, this will not work. This is confusing; IMHO all integers should implement #asByteArray. By the way, Pharo 5+ and Squot also implement this on Integer.
* decode a base64 string into a Float32Array
Currently I use this:
(Base64MimeConverter mimeDecodeToBytes: 'AAAAANsPSUDzBLU/AACAvw==' readStream) contents changeClassTo: Float32Array
But I wonder whether this also works in big endian images (my expectation would be that it answers an equal FloatArray)? Is #changeClassTo: the most efficient we have for this, i.e., will the VM copy all the data or just flip a few bits or so?
One could also add further decode variants to the converters (e.g., I wrote a Base64MimeConverter>>#mimeDecodeToWordArray), but that feels like redundant work if we can efficiently convert ByteArrays to WordArrays etc.
By the way, I also dislike the duplication between #mimeDecode and #mimeDecodeToByteArray where the only difference is "nextPut: byte" vs "nextPut: byte asCharacter". Would it be reasonable to have #veryBasicNext and #veryBasicNextPut: on Stream and subclasses that map to #basicAt: and #basicAt:put: of the underlying collection? I tested it out; it works for me.
Regarding the naming, I only want to have this functionality available in the trunk. We can also rename it to #basicAsByteArray or whatever if we want to avoid confusion with #asArray & Co.
Best, Christoph
--- Sent from Squeak Inbox Talk
On 2023-11-13T18:39:06+00:00, lewis@mail.msen.com wrote:
[...]
Hi Christoph,
On Wed, 7 Feb 2024 at 17:12, christoph.thiede@student.hpi.uni-potsdam.de wrote:
Hi all,
I'm a bit late to the party but support the idea of having more conversion methods between different immediate classes and raw bits array classes. Some use cases that I have been having recently are:
- convert a SecureHashAlgorithm hash to a ByteArray (which is faster to compare, hash, and file out/in than the LargePositiveInteger and consumes less space on disk, as we only need 20 bytes)
For 99% of hashes, we can just send #asByteArray to the result of #hashMessage:, but if the hash happens to be a SmallInteger, this will not work. This is confusing; IMHO all integers should implement #asByteArray. By the way, Pharo 5+ and Squot also implement this on Integer.
Note that LargePositive/NegativeInteger convert the magnitude as ByteArray. I presume we should do the same for negative SmallInteger... But the fact that two different (Small)Integer convert to the same ByteArray is a smell, isn't it?
- decode a base64 string into a Float32Array
Currently I use this:
(Base64MimeConverter mimeDecodeToBytes: 'AAAAANsPSUDzBLU/AACAvw==' readStream) contents changeClassTo: Float32Array
But I wonder whether this also works in big endian images (my expectation would be that it answers an equal FloatArray)? Is #changeClassTo: the most efficient we have for this, i.e., will the VM copy all the data or just flip a few bits or so?
The VM won't flip any bytes (except at image startup if we restore on a platform with different endianness - untested, because we currently have no such platform), because all arrays in the image are presumed to share the same endianness (that of the host VM).
You must consider that a ByteArray is an uninterpreted sequence of bytes. The order (endianness) is in the eye of the beholder.
We have a convenience, ArrayedCollection>>restoreEndianness that would do the job for Word (convert from BigEndian to image endianness - we only have littleEndian VM right now). But not for double-byte nor double-word (I don't think that it should because DataStream did store element-wise). Pfff...
I use BinaryStream from http://squeaksource.com/STEM.html for changing endianness element-wise at read/write time. You make me think that I should use some BitBlt trick for Bulk Byte Swapping when we read/write an array!
I suggest that we implement SwapEndianness in RawBitsArray.
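To make the element-wise swap concrete, here is a plain workspace sketch of what such an operation does to the underlying bytes. The block name swapEndianness and its elementSize parameter are hypothetical, nothing here is existing RawBitsArray API, and the input size is assumed to be a multiple of the element size. It is a simple loop, not the BitBlt trick mentioned above.

```smalltalk
"Workspace sketch: reverse the bytes of each fixed-size element in a ByteArray.
 swapEndianness is a hypothetical name used only for this example."
| swapEndianness |
swapEndianness := [:bytes :elementSize | | result |
    result := ByteArray new: bytes size.
    1 to: bytes size by: elementSize do: [:start |
        0 to: elementSize - 1 do: [:offset |
            "byte at position offset comes from the mirrored position in the same element"
            result
                at: start + offset
                put: (bytes at: start + elementSize - 1 - offset)]].
    result].

swapEndianness value: #[1 2 3 4 5 6 7 8] value: 4.   "#[4 3 2 1 8 7 6 5]"
swapEndianness value: #[1 2 3 4 5 6 7 8] value: 2.   "#[2 1 4 3 6 5 8 7]"
```

Applying the block twice with the same element size gives back the original bytes.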
I have also played with raw bits copying for bulk data transfer in some binary format reading/writing (HDF5, National Instruments TDMS). But it's more complex because more capable (extract a sub-region of a multidimensional array). Experiments are in package BulkDataTransfer. Probably overkill.
Nicolas
This discussion started a while back (last October!) when I was having a problem with Seaside entity tag creation. I thought we pretty much concluded that #asByteArray wasn't a very good method name that had become misused in way too many ways.
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Useful random insult:- All foam, no beer.
On Wed, 7 Feb 2024 at 17:12, christoph.thiede@student.hpi.uni-potsdam.de wrote:
IMHO all integers should implement #asByteArray. By the way, Pharo 5+ and Squot also implement this on Integer.
To be more precise, the extension comes from FileSystem-Git and I had to copy the method from Pharo to get this to work without more changes. It is used when writing the Git repository files. Actually I cannot tell from the name of the message how many bytes it will produce for a small number. It will produce three bytes for 65536 and I wonder when that is the right thing to do...
By the way, I also dislike the duplication between #mimeDecode and #mimeDecodeToByteArray where the only difference is "nextPut: byte" vs "nextPut: byte asCharacter". Would it be reasonable to have #veryBasicNext and #veryBasicNextPut: on Stream and subclasses that map to #basicAt: and #basicAt:put: of the underlying collection? I tested it out; it works for me.
veryBasicNext sounds like a leaky abstraction to me. Isn't there any Stream implementation that will use #basicAt: on #next and #basicAt:put: on #nextPut:?
Kind regards, Jakob
Hi Nicolas, Tim, Jakob,
Note that LargePositive/NegativeInteger convert the magnitude as ByteArray. I presume we should do the same for negative SmallInteger... But the fact that two different (Small)Integer convert to the same ByteArray is a smell, isn't it?
Could you give me an example?
The VM won't do any flip of bytes (except at image startup if we restore on a platform with different endianness - untested, because we currently have no such platform).
Sorry, I should have expressed my question more clearly: will ByteArray adoptInstance: #(1 2 3 100 255 256) asWordArray perform any malloc/memcpy of all the data or will the VM just change the class membership of the existing object? Asking as a VM noob. :-)
I suggest that we implement SwapEndianness in RawBitsArray.
Sounds useful!
You make me think that I should use some BitBlt trick for Bulk Byte Swapping when we read/write an array!
It's funny to see how we have been choosing the same route as mainstream HPC: First implementing a dedicated graphics module (i.e., GPU/BitBlt) and making it fast, then (ab)using it for non-graphic operations that benefit from the same kind of optimizations (gzip compression, sound processing, etc.). :D But at least this approach properly modularizes a certain kind of computing-intensive operations, so when we adapt the BitBlt plugin to new GPU generations, gzip will become faster as well. :-)
This discussion started a while back (last October!) when I was having a problem with Seaside entity tag creation. I thought we pretty much concluded that #asByteArray wasn't a very good method name that had become misused in way too many ways.
As mentioned before, I do not insist on the name #asByteArray, but I would like to have the general functionality under whatever name in the trunk.
veryBasicNext sounds like a leaky abstraction to me. Isn't there any Stream implementation that will use #basicAt: on #next and #basicAt:put: on #nextPut:?
Apparently not! Can we add that please? :-)
Best, Christoph
On 2024-02-07T19:45:16+01:00, jakres+squeak@gmail.com wrote:
[...]
On Fri, Feb 9, 2024 at 11:27 AM christoph.thiede@student.hpi.uni-potsdam.de wrote:
Hi Nicolas, Tim, Jakob,
Note that LargePositive/NegativeInteger convert the magnitude as ByteArray.
I presume we should do the same for negative SmallInteger... But the fact that two different (Small)Integer convert to the same ByteArray is a smell, isn't it?
Could you give me an example?
1 and -1
Arguably negative numbers should throw an error.
The VM won't do any flip of bytes (except at image startup if we restore on a platform with different endianness - untested, because we currently have no such platform).
Sorry, I should have expressed my question more clearly: will ByteArray adoptInstance: #(1 2 3 100 255 256) asWordArray perform any malloc/memcpy of all the data or will the VM just change the class membership of the existing object? Asking as a VM noob. :-)
It will only rewrite the object header, no reallocation. That's what makes it much more efficient than using #become:, but it only works if the format is compatible and the storage size identical.
Vanessa
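As a loose analogy from outside Smalltalk (it says nothing about how the VM implements #adoptInstance: internally), Python's `memoryview.cast` also reinterprets an existing buffer of words as bytes without copying the data. The variable names below are illustrative only.

```python
from array import array

# Six unsigned machine words, mirroring #(1 2 3 100 255 256) asWordArray.
words = array("I", [1, 2, 3, 100, 255, 256])

# Reinterpret the same memory as unsigned bytes; no data is copied.
as_bytes = memoryview(words).cast("B")

# The byte view covers exactly the storage of the word array.
assert as_bytes.nbytes == len(words) * words.itemsize

# Writing through the byte view changes the original array,
# which shows that both names refer to one buffer.
as_bytes[0] = 7
print(words[0] != 1)
```

Which word value results from the write depends on the host byte order, so the sketch only checks that the word changed at all.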
On 2024-02-09 19:01, christoph.thiede@student.hpi.uni-potsdam.de wrote:
This discussion started a while back (last October!) when I was having a problem with Seaside entity tag creation. I thought we pretty much concluded that #asByteArray wasn't a very good method name that had become misused in way too many ways.
As mentioned before, I do not insist on the name #asByteArray, but I would like to have the general functionality under whatever name in the trunk.
-1
Regarding "the general functionality":
As discussed before (I think), there are many ways in which you might want to convert an integer into a sequence of bytes, and there is no such thing as a single correct way to do it. Maybe you want 32 bit twos complement little endian, or maybe you want 48 bit ones complement in network byte order, or maybe you want EBCDIC decimal bytes. You can implement any of these things and a dozen others, but none of them should be named #asByteArray.
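To make the point concrete, here is a small sketch in Python (chosen only because its `int.to_bytes` makes every choice explicit; it is not a proposal for Squeak). The same integer yields different byte sequences depending on width, byte order and signedness, and none of them is obviously "the" byte array.

```python
n = 65536

# Minimal-length magnitude, least significant byte first (3 bytes for 65536).
minimal_le = n.to_bytes((n.bit_length() + 7) // 8, "little")

# Fixed 4-byte register, little endian and big endian ("network order").
fixed_le = n.to_bytes(4, "little")
fixed_be = n.to_bytes(4, "big")

# A negative value needs an explicit signed (two's complement) encoding.
minus_one = (-1).to_bytes(4, "little", signed=True)

print(minimal_le.hex(), fixed_le.hex(), fixed_be.hex(), minus_one.hex())
# prints: 000001 00000100 00010000 ffffffff
```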
Regarding "under whatever name in the trunk":
Just don't. It is very easy to put something into trunk, and it is extremely difficult to get rid of it a few years later. So if there is no real use case and no clear understanding of the functionality, it is best not to add it.
There are other ways to handle this. In FileSystem-Git, Integer>>asByteArray is an extension method, and the method comment explains that it is "copied from Pharo 5". This may not be the preferred method name, but the reason for it is explained and it is kept separate from Squeak trunk. This means that if the method causes problems later on, the reader will know why it is there and maybe what to do about it.
I'm not sure what Tim did to handle the original Seaside issue, but there is a compatibility package called "Grease" for Seaside that is designed to handle dialect-specific differences like this, so that might be a good place to handle "convert an integer to a byte array for Seaside". Maybe it would be the same thing as the Pharo 5 method, or maybe it is something else, but either way these seem like things that are best handled by the external packages (Seaside or FileSystem-Git or whatever).
Dave
Hi Dave,
Regarding "the general functionality":
As discussed before (I think), there are many ways in which you might want to convert an integer into a sequence of bytes, and there is no such thing as a single correct way to do it. Maybe you want 32 bit twos complement little endian, or maybe you want 48 bit ones complement in network byte order, or maybe you want EBCDIC decimal bytes.
I appreciate this point, however, it's also true that often it isn't the particular internal representation itself that the developer cares about, only that the producing function is compatible with the consuming function.
You can implement any of these things and a dozen others, but none of them should be named #asByteArray.
Which is why being forced to always have to "look up" the other side's otherwise irrelevant byte order, just to be able to write this side correctly, feels like an unnecessary burden in those situations.
I certainly agree that all the API specifying the byte order should exist and be the primary API, but I feel that Christoph's request kinda shows us that there is a desire for a universal, friendly, #asByteArray and #fromByteArray: pair(s) that "just work" together for the cases when only compatibility matters, and not the particular byte order, and is therefore worthy of consideration.
A seeming irony about it is, not defining it creates a sort of feedback that supports the critique that there might be different, incompatible definitions, whereas, defining it would actually allow that argument to simply evaporate...
Best, Chris
On Fri, Feb 9, 2024 at 9:41 PM Chris Muller asqueaker@gmail.com wrote:
I feel that Christoph's request kinda shows us that there is a desire for a universal, friendly, #asByteArray and #fromByteArray: pair(s) that "just work" together for the cases when only compatibility matters, and not the particular byte order, and is therefore worthy of consideration.
Wouldn't that imply that #asByteArray and #fromByteArray: would have to be the inverse of each other? Currently that's not possible because, as stated before, at least for large integers #asByteArray currently answers the same array for different numbers.
Vanessa
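The invertibility problem can be sketched as follows. This is Python rather than Smalltalk, and `magnitude_bytes`, `signed_bytes` and `from_signed_bytes` are hypothetical helpers written for this illustration, not existing Squeak or Pharo methods. A magnitude-only encoding maps 1 and -1 to the same bytes, so no decoder can recover the sign, whereas a sign-preserving encoding round-trips.

```python
def magnitude_bytes(n: int) -> bytes:
    """Hypothetical encoder: bytes of abs(n), sign discarded."""
    m = abs(n)
    return m.to_bytes(max(1, (m.bit_length() + 7) // 8), "little")

def signed_bytes(n: int) -> bytes:
    """Hypothetical encoder: two's complement, sign preserved."""
    length = n.bit_length() // 8 + 1  # always leaves room for the sign bit
    return n.to_bytes(length, "little", signed=True)

def from_signed_bytes(data: bytes) -> int:
    """Inverse of signed_bytes."""
    return int.from_bytes(data, "little", signed=True)

# Two different integers collapse to one byte array...
print(magnitude_bytes(1) == magnitude_bytes(-1))            # True
# ...while the signed variant can be inverted.
print(from_signed_bytes(signed_bytes(-1)) == -1)            # True
```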
On Fri, Feb 9, 2024 at 11:55 PM Vanessa Freudenberg vanessa@codefrau.net wrote:
On Fri, Feb 9, 2024 at 9:41 PM Chris Muller asqueaker@gmail.com wrote:
I feel that Christoph's request kinda shows us that there is a desire for a universal, friendly, #asByteArray and #fromByteArray: pair(s) that "just work" together for the cases when only compatibility matters, and not the particular byte order, and is therefore worthy of consideration.
Wouldn't that imply that #asByteArray and #fromByteArray: would have to be the inverse of each other? Currently that's not possible because, as stated before, at least for large integers #asByteArray currently answers the same array for different numbers.
ByteArrays only support element values between 0 and 255. Attempting to convert -1 to a ByteArray rightly produces an error, as I thought you mentioned before.
The version copied from Pharo 5 simply turns -1 into #[1]. So if yours raises an error Chris, the different variants that are around already have different behavior.
For what it's worth, FileSystem-Git just seems to need it in two methods involved with Git pack file writing. One of them seems to have no senders... The other one uses asByteArray to collect the bytes of all the sha1 hashes. Hmm, this may have a bug if one hash starts with '00' and thus the underlying LargePositiveInteger only has 19 bytes. That might even be the explanation of bug #1 https://github.com/hpi-swa/Squot/issues/1 :-P
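The suspected leading-zero problem is easy to reproduce in isolation. The following Python sketch uses a made-up 20-byte digest and big-endian order purely for illustration; it does not reproduce the FileSystem-Git code, and the byte order used there may differ.

```python
# A made-up 20-byte digest whose first byte happens to be zero.
digest = bytes([0x00]) + bytes(range(1, 20))

# Storing the hash as an integer discards the leading zero byte.
as_integer = int.from_bytes(digest, "big")
minimal = as_integer.to_bytes((as_integer.bit_length() + 7) // 8, "big")

# Writing with an explicit width restores the full 20 bytes.
fixed = as_integer.to_bytes(20, "big")

print(len(minimal), len(fixed), fixed == digest)
# prints: 19 20 True
```

A conversion that takes the target width as an argument avoids the problem, which is one more reason a width-less #asByteArray is awkward for hashes.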
The version copied from Pharo 5 simply turns -1 into #[1].
So, in Pharo, I should expect:
-1 asByteArray asInteger = -1 "false"
I mean, how is that not a bug?
So if yours raises an error Chris, the different variants that are around already have different behavior.
True, but is that an issue? If an external framework brings in its own #asByteArray as a normal override, it should work with no changes.
I just wonder whether it's better to continue to let the new variants proliferate, or establish a baseline implementation.
Best, Chris
On Sat, Feb 10, 2024 at 07:58 Chris Muller asqueaker@gmail.com wrote:
On Fri, Feb 9, 2024 at 11:55 PM Vanessa Freudenberg vanessa@codefrau.net wrote:
On Fri, Feb 9, 2024 at 9:41 PM Chris Muller asqueaker@gmail.com wrote:
I feel that Christoph's request kinda shows us that there is a desire for a universal, friendly, #asByteArray and #fromByteArray: pair(s) that "just work" together for the cases when only compatibility matters, and not the particular byte order, and is therefore worthy of consideration.
Wouldn't that imply that #asByteArray and #fromByteArray: would have to be the inverse of each other? Currently that's not possible because, as stated before, at least for large integers #asByteArray currently answers the same array for different numbers.
ByteArrays only support element values between 0 and 255. Attempting to convert -1 to a ByteArray rightly produces an error, as I thought you mentioned before.
I suggested it should raise an error, which would make the behavior better defined. Right now e.g. -1e100 and 1e100 produce the same byte array.
I guess we could start deprecating it for negative integers, then again, I don’t really know the use case.
Vanessa
If I were to try to reverse engineer the design intent of the Integer>>asByteArray as copied from Pharo 5 and used in FileSystem-Git, I might guess that it was:
"The internal implementation of LargePositiveInteger and LargeNegativeInteger is similar to a ones-complement integer representation, with the magnitude represented as indexable bytes, and the sign bit implicit in the class. The magnitude is thus implemented as bytes and can naturally be converted to a ByteArray. Therefore it makes sense to think of #asByteArray as the conversion of the magnitude bytes to a ByteArray, independent of sign. If we then want SmallInteger to behave the same way, so that all integers respond similarly to #asByteArray, we should pretend that all integers are implemented with magnitude as an array of bytes and sign implemented in some other way. Therefore, Integer>>asByteArray should answer a byte array representation of the absolute value of the integer."
Hopefully it is obvious that I am not advocating this, just trying to explain how such an oddity might have come into being. There is a reason that ones-complement encoding fell out of favor for the binary representation of integers.
Dave
Am So., 11. Feb. 2024 um 21:07 Uhr schrieb Eliot Miranda eliot.miranda@gmail.com:
On Feb 10, 2024, at 7:38 PM, lewis@mail.msen.com wrote:
There is a reason that ones-complement encoding fell out of favor for the binary representation of integers.
But it /hasn’t/ fallen out of favour, it just isn’t used that much, because few systems provide arbitrary precision integers. In systems that do, it’s a reasonable choice, isn’t it? Smalltalk’s implementations are existence proofs. Isn’t it the case that the GNU multiple precision package uses one’s complement internally for negative integers?
If it is any indication for falling out of favor, C23 no longer allows signed integer representations other than two's complement. Of course that doesn't affect what you said about Smalltalk's large integers.
Hi Christoph,
On Feb 7, 2024, at 8:12 AM, christoph.thiede@student.hpi.uni-potsdam.de wrote:
Hi all,
I'm a bit late to the party but support the idea of having more conversion methods between different immediate classes and raw bits array classes. Some use cases that I have been having recently are:
- convert a SecureHashAlgorithm to a ByteArray (which is faster to compare, hash, and file out/in than the LargePositiveInteger and consumes less space on disk (as we only need 20 bytes))
For 99% of hashes, we can just send #asByteArray to the result of #hashMessage:, but if the hash happens to be a SmallInteger, this will not work. This is confusing; IMHO all integers should implement #asByteArray. By the way, Pharo 5+ and Squot also implement this on Integer.
- decode a base64 string into a Float32Array
Currently I use this:
(Base64MimeConverter mimeDecodeToBytes: 'AAAAANsPSUDzBLU/AACAvw==' readStream) contents changeClassTo: Float32Array
But I wonder whether this also works in big endian images (my expectation would be that it answers an equal FloatArray)? Is #changeClassTo: the most efficient we have for this, i.e., will the VM copy all the data or just flip a few bits or so?
For the time being this is a hypothetical, because Spur has never been implemented on a big endian machine. Little endian has won (thank goodness; at lower levels big endian is a PITA). In writing Spur I tried to flag everywhere where endianness matters. Look for “self flag:#endianness” in the sources.
It would be a great project to try and realise Spur on a big endian machine but I don’t see any interesting candidates currently. Do you?
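For readers who want to see what endianness does to the base64 example quoted above, here is a sketch in Python (standard library only; not Squeak code). It decodes the same string and interprets the 16 bytes as four 32-bit floats in both byte orders.

```python
import base64
import math
import struct

# 16 bytes, i.e. four 32-bit floats.
raw = base64.b64decode("AAAAANsPSUDzBLU/AACAvw==")

# Interpret the same bytes with an explicit byte order each time.
little = struct.unpack("<4f", raw)
big = struct.unpack(">4f", raw)

print(little)  # roughly (0.0, pi, sqrt(2), -1.0)
print(big)     # unrelated values: the bytes are little-endian data
```

Read as little endian, the data is approximately 0.0, pi, sqrt(2) and -1.0. A reinterpretation that simply adopts the host byte order would therefore only answer these values on a little-endian host, unless the bytes are swapped first.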
One could also add further decode variants to the converters (e.g., I wrote a Base64MimeConverter>>#mimeDecodeToWordArray), but that feels like redundant work if we can efficiently convert ByteArrays to WordArrays etc.
By the way, I also dislike the duplication between #mimeDecode and #mimeDecodeToByteArray where the only difference is "nextPut: byte" vs "nextPut: byte asCharacter". Would it be reasonable to have #veryBasicNext and #veryBasicNextPut: on Stream and subclasses that map to #basicAt: and #basicAt:put: of the underlying collection? I tested it out; it works for me.
Regarding the naming, I only want to have this functionality available in the trunk. We can also rename it to #basicAsByteArray or whatever if we want to avoid confusion with #asArray & Co.
Best, Christoph
Eliot _,,,^..^,,,_ (phone)
Sent from Squeak Inbox Talk
On 2023-11-13T18:39:06+00:00, lewis@mail.msen.com wrote:
Tim, is this something that could be handled in the Grease package for Seaside? I am not up to speed on Seaside things, but I'm guessing that the notion of LPI>>asByteArray might have been one of those Pharo "enhancements".
I like your idea of having a method to directly convert a hash value into net-transportable base64 format, that seems much better than trying to do a generic asByteArray for integers.
Dave
On 2023-11-13 18:24, Tim Rowledge wrote:
On 2023-11-13, at 4:53 AM, Taeumel, Marcel via Squeak-dev <squeak-dev(a)lists.squeakfoundation.org> wrote:
1152921504606846976 asByteArray
That example is hilariously confusing at first glance. 16! How can it be 16! Oh no it's all screwed up! And then you remember that the 16 is the *highest* byte...
I'd suggest that this is one of those cases where a too-general name got chosen, then used in some places where 'other code' needed a ByteArray (consider the conversion of bitmaps to bytearrays to allow using the compression code), and then mis-used in other places where it isn't really appropriate.
The case that triggered this thread is in Seaside where a hash of a string is used as a key into some javascript thing.

    entityTagFor: aStringOrByteArray
        | hash base64 |
        hash := GRPlatform current secureHashFor: aStringOrByteArray.
        "etags have to be delimited by double quotes"
        base64 := GRPlatform current base64Encode: hash asByteArray.
        ^ String new: base64 size + 2 streamContents: [ :stream |
            stream nextPut: $"; nextPutAll: base64; nextPut: $" ]
I can't help thinking it might be more helpful to have a method to directly convert a hash value into net-transportable base64 format.
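A direct hash-to-base64 conversion could look roughly like the sketch below. It is written in Python for illustration; `entity_tag_for` is a hypothetical name, and the use of SHA-1 is an assumption based on the 20-byte hashes mentioned elsewhere in this thread. The point is that the digest stays a fixed-width byte string from start to finish and never passes through an integer.

```python
import base64
import hashlib

def entity_tag_for(data: bytes) -> str:
    """Hypothetical helper: quoted base64 of the SHA-1 digest of data."""
    digest = hashlib.sha1(data).digest()  # always 20 bytes, leading zeros kept
    return '"' + base64.b64encode(digest).decode("ascii") + '"'

tag = entity_tag_for(b"hello")
print(tag)
```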
And then there are methods like CanvasEncoder>>#drawString:from:to:in:font:color: where an argument is converted to a String, copied from (so we know the result is a String), then if it is a WideString converted to a byte array and then to a String again. I suspect it is because StringSocket doesn't have a way to handle WideStrings in #addToOutBuf: and so on. So any user has to do the converse dance just in case.
tim
tim Rowledge; tim(a)rowledge.org; http://www.rowledge.org/tim Do you like me for my brain or my baud?
Am So., 11. Feb. 2024 um 20:59 Uhr schrieb Eliot Miranda eliot.miranda@gmail.com:
It would be a great project to try and realise Spur on a big endian machine but I don’t see any interesting candidates currently. Do you?
Define interesting. :-) The ones I know that are also used in the business world are the Power machines of IBM, running either AIX or Linux.
On 2024-02-11, at 12:08 PM, Jakob Reschke jakres+squeak@gmail.com wrote:
Am So., 11. Feb. 2024 um 20:59 Uhr schrieb Eliot Miranda eliot.miranda@gmail.com:
It would be a great project to try and realise Spur on a big endian machine but I don’t see any interesting candidates currently. Do you?
Define interesting. :-) The ones I know that are also used in the business world are the Power machines of IBM, running either AIX or Linux.
Apparently IBM is committed to transitioning to little-endian (for power, probably no way to change mainframe short of rebooting the entire continuum of space time and a probability. So, not before the Dr gets involved) -
https://developer.ibm.com/articles/l-power-little-endian-faq-trs/
But the hardware is bi. And in fact, so is ARM (post v3 or 4, I forget). Imagine the 'fun' one could have implementing the bitblt code to use that fact, given that our bitmaps are pretty much the only thing I can think of where big-endian still causes trouble.
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Useful random insult:- Only playing with 51 cards.
Am So., 11. Feb. 2024 um 21:17 Uhr schrieb Tim Rowledge tim@rowledge.org:
On 2024-02-11, at 12:08 PM, Jakob Reschke jakres+squeak@gmail.com wrote:
Am So., 11. Feb. 2024 um 20:59 Uhr schrieb Eliot Miranda eliot.miranda@gmail.com:
It would be a great project to try and realise Spur on a big endian machine but I don’t see any interesting candidates currently. Do you?
Define interesting. :-) The ones I know that are also used in the business world are the Power machines of IBM, running either AIX or Linux.
Apparently IBM is committed to transitioning to little-endian (for power, probably no way to change mainframe short of rebooting the entire continuum of space time and a probability. So, not before the Dr gets involved) -
https://developer.ibm.com/articles/l-power-little-endian-faq-trs/
But the hardware is bi.
Ok, then only AIX with respect to big endian.
https://www.ibm.com/support/pages/just-faqs-about-little-endian
Does this transition affect application ecosystems for AIX or IBM i? No, there will be no effect on AIX or IBM i application environments as a result of this change.
squeak-dev@lists.squeakfoundation.org