I've made a 'quick-n-dirty' fix to DataStream's behavior. The DataStream>>readString method uses PositionableStream>>nextString to do the job, and the latter only handles Strings that are shorter than 16384 bytes. Further, it appears that when a String is saved, DataStream writes the String signature first, and only then checks whether the String is longer than 16K. If so, it saves the contents as a ByteArray.
I changed the PositionableStream>>nextString method, and my program immediately recovered the seemingly lost data. The change is as follows: I read the byte that designates the length. If it is zero, I peek at the next byte; if that one is zero as well, I assume that this String is longer than 16K, so the length is encoded in 4 bytes (the one already read plus the next three). It has not been tested extensively with other applications or with Squeak itself (e.g., this could break if the stream legitimately contained two consecutive zero bytes at that point).
Please let me know if I am playing with fire.
Bolot
---DO NOT FILE IT IN YET! LET'S HEAR WHAT SQUEAK CENTRAL SAYS---
PositionableStream>>nextString
    "Read a string from the receiver. The first byte is the length of the string, unless it is greater than 192, in which case the first two bytes encode the length."

    | aString char length |
    length _ self next.  "first byte."
    length >= 192 ifTrue: [length _ (length - 192) * 256 + self next].
    " I added these lines "
    (length = 0 and: [self peek = 0]) ifTrue: [
        length _ (length bitShift: 24) + (self next bitShift: 16)
            + (self next bitShift: 8) + self next].
    aString _ String new: length.
    1 to: length do: [:i | aString at: i put: self next asCharacter].
    ^aString
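
The length handling in the method above can be tried in a workspace without modifying the image. What follows is only a sketch: it assumes a ReadStream over a ByteArray is a fair stand-in for the real stream, and the block name decodeLength and the sample bytes are made up for illustration; they are not part of the posted change.

```smalltalk
"Workspace sketch; decodeLength is a hypothetical helper, not part of the fix."
| decodeLength length |
decodeLength := [:stream |
    length := stream next.    "first byte"
    length >= 192 ifTrue: [length := (length - 192) * 256 + stream next].
    "two leading zero bytes are taken to mean a 4-byte big-endian length"
    (length = 0 and: [stream peek = 0]) ifTrue: [
        length := (length bitShift: 24) + (stream next bitShift: 16)
            + (stream next bitShift: 8) + stream next].
    length].

(Array
    "one-byte form: 5"
    with: (decodeLength value: (ReadStream on: #(5 104 101 108 108 111) asByteArray))
    "two-byte form: (200 - 192) * 256 + 10"
    with: (decodeLength value: (ReadStream on: #(200 10) asByteArray))
    "four-byte form: 78 * 256 + 32"
    with: (decodeLength value: (ReadStream on: #(0 0 78 32) asByteArray))
    "the caveat: an empty string followed by data that starts with a zero byte"
    with: (decodeLength value: (ReadStream on: #(0 0 65 66) asByteArray)))
```

Printing the selection should show the four decoded lengths 5, 2058, 20000 and 16706. The last case is the risk mentioned earlier: a zero length byte (an empty string) followed by unrelated data beginning with a zero byte is misread as a 16706-character string, and four bytes are consumed instead of one.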
squeak-dev@lists.squeakfoundation.org