At 03:32 PM 10/31/98 -0500, Lex wrote:
"R. A. Harmon" harmonra@webname.com wrote:
[snip]
I don't think there is any reason for "guessLineEndConvention" in the approach I propose, and if it guesses wrong (especially on an already anomalous file), CrLfFileStream seems to produce anomalies that I don't think are caused just by cut-and-paste.
The purpose of this method is to pick a convention for *new* files. The idea is that if you create a text file on Windows, it should have CRLF line endings, and if you create a file on Unix, it should have LF line endings. That way you can view Squeak files using other applications on your operating system without having to convert the files first.
If you don't run this method during startup, then CrLfFileStream won't notice that it is operating on a new platform. It will continue using whatever convention it was using when the image was saved, even if the image was saved on a different platform.
Yes, I agree a default should be set at startup. I think all that is needed is a defaultLineTermString class variable, set at startup for the platform Squeak is running on, and a lineTermString instance variable on each stream that is initialized to the default and can be reset manually to something else if you want. If I remember correctly, this is how Smalltalk Express (SE) does it. I assume Squeak can determine at startup what platform it is running on (or should be changed to do so, since that is useful in other areas too).
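A minimal Python sketch of that arrangement, for illustration only: a class-wide default chosen once at startup from the native conventions listed in this thread (CrLf for DOS/Windows, Lf for UNIX, Cr for Mac), plus a per-stream terminator that starts from the default but can be reset by hand. The names `LineEndStream`, `default_line_term`, `set_default_for_platform`, `use_line_term`, and the platform keys are all hypothetical; they are not Squeak or Smalltalk Express identifiers.

```python
# Hypothetical sketch: a class-wide default line terminator chosen at
# startup, plus a per-stream terminator that can be overridden manually.
# None of these names come from Squeak or Smalltalk Express.

CR, LF = "\r", "\n"

# Native conventions as listed in this thread (classic Mac OS used CR).
NATIVE_LINE_TERMS = {
    "windows": CR + LF,
    "unix": LF,
    "mac": CR,
}


class LineEndStream:
    # Class-level default, playing the role of defaultLineTermString.
    default_line_term = CR  # Smalltalk's internal convention until startup runs

    @classmethod
    def set_default_for_platform(cls, platform):
        """Called once at startup with the detected platform name."""
        cls.default_line_term = NATIVE_LINE_TERMS[platform]

    def __init__(self):
        # Per-instance lineTermString: starts as the current default...
        self.line_term = type(self).default_line_term

    def use_line_term(self, line_term):
        # ...but can be reset by hand for a particular file.
        self.line_term = line_term

    def encode(self, text):
        """Translate internal CRs into this stream's line terminator."""
        return text.replace(CR, self.line_term)
```

Changing the default later only affects streams created afterwards; a stream that was explicitly reset keeps its own terminator.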
I think the CrLfFileStream approach to appending applies only to files, and not to streams that may be ports, sockets, or something equally exotic (meaning I don't understand it).
Now, there's a second "guess" going on in CrLfFileStream, and that's when a specific file is opened. This guess is to ensure that new data written to the file will have the same convention as the data that's already in the file.
I think this will produce a mixed-convention file if it is given a mixed-convention file.
If you have a CRLF-delimited file on Unix, then you should keep writing CRLF endings, and not start appending lines with LF endings. The point is debatable, I suppose, but that's the purpose of this one.
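To make the open-time guess concrete, here is a hedged Python sketch. It assumes (this is not taken from CrLfFileStream's source) that the guess looks only at the first line ending found in a leading sample of the file and falls back to the new-file default when none appears. `guess_line_end` and `look_ahead` are hypothetical names, and the sample size of 2048 is arbitrary.

```python
# Hypothetical sketch of guessing an existing file's line-end convention.
# Only the first line ending within the first `look_ahead` characters is
# examined; a real implementation might sample differently.

def guess_line_end(text, default, look_ahead=2048):
    """Return the line terminator used first in text, else default."""
    for i, ch in enumerate(text[:look_ahead]):
        if ch == "\r":
            # Peek one past the sample so a CRLF split at the boundary still counts.
            return "\r\n" if text[i + 1:i + 2] == "\n" else "\r"
        if ch == "\n":
            return "\n"
    # No line ending seen: fall back to the convention for new files.
    return default
```

For a mixed-convention file this simply adopts whichever ending happens to come first, so appended lines follow that one convention and the file stays mixed.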
[snip]
I can see where this would be quite useful, especially for someone working in both UNIX and Windows (dual boot). I propose making it an option that one can select as the default behavior for appending, for new files, or for both.
I think the convention I propose works in all of the following cases:
- Writing a new file: Crs are appropriately transformed to Cr, CrLf, or Lf according to lineTermString.
- Appending to an existing file: Crs are appropriately transformed to Cr, CrLf, or Lf according to lineTermString.
- Reading an existing file: Cr, CrLf, Lf, or a mixture of these are appropriately transformed internally to Crs.
On Windows, some applications gracefully handle Cr, CrLf, Lf, or a mixture of these, while others make it difficult to work with any file that doesn't conform to their line termination convention. A mixture of conventions makes no difference to the latter, so I would prefer a Squeak default that produces mixed conventions on append. This approach doesn't require exceptional handling for append.
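The three cases above reduce to two conversion functions. This Python sketch illustrates the proposed convention, not actual Squeak code, and `to_external`/`to_internal` are hypothetical names. Writing and appending share the same outbound conversion, which is why append needs no special handling; reading collapses Cr, CrLf, Lf, or any mixture to Cr.

```python
# Hypothetical sketch of the proposed convention.
# Internally, every line ends with a single CR (the Smalltalk convention).

def to_external(text, line_term):
    """New file or append: each internal CR becomes line_term."""
    return text.replace("\r", line_term)


def to_internal(data):
    """Read: CR, CRLF, LF, or any mixture becomes a single CR."""
    # Replace CRLF first so its CR and LF are not counted as two line ends.
    return data.replace("\r\n", "\r").replace("\n", "\r")
```

Appending CrLf lines to an Lf file yields a mixed file, and reading it back still produces plain Crs, which is the behavior argued for above.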
So I sent a patch around a few days ago that did just this.
Yes, I saw it.
Now, I've been using this setup for a week or so now with no troubles. However, I've not messed with any *really* strange files....
I'm not sure why I did. Somebody else said they were getting strange stuff too. Have you tried cut-and-paste operations? I think these might confuse CrLfFileStream, and I think they are valid operations that the fix to line termination should handle.
I appreciate that you did CrLfFileStream. I would probably have spent a fair amount of time trying out the same idea, but not doing it as well.
At 03:28 PM 10/31/98 -0800, Michael S. Klein wrote:
I don't think there is any reason for "guessLineEndConvention" in the approach I propose, and if it guesses wrong (especially on an already anomalous file), CrLfFileStream seems to produce anomalies that I don't think are caused just by cut-and-paste.
Sometimes you may want to guess, sometimes you may want a rigid line end policy.
After some reflection, I agree (see above).
The native platform line termination conventions I know of are as follows:
DOS/Windows on x86   CrLf
UNIX                 Lf
Mac                  Cr
Smalltalks use Cr. There is also Unicode, which has explicitly distinct line separators and paragraph separators (U+2028 and U+2029).
[snip]
Adding Unicode (is that the same as double-byte characters?) will require overriding some behavior, I suspect, and line termination will be part of it. I don't know enough to join in that conversation.
This works sometimes, but there are some of us who actually use FFs and VTs placed in text by other people.
My question is: what do the Ff and Vt characters in Text instances mean? If they mean line termination, I suggest that the code using them be changed to follow the convention, if it is adopted.
If they are used for some other purpose (say, optimization because of the sorting order, to give a simple-minded example), I think that is one of those case-by-case exceptions. If an Ff is put into a Text instance, something is done with the instance, and then it is thrown away, that seems reasonable and won't break any code. If the instance is used by other code, then the additional Ff behavior should probably go in a new subclass of Text (or a new wrapper class, etc.), with conversion methods added. External strings terminated with a binary zero are good examples of why one might want a string-like object that breaks the convention. SE has conversion methods to add and remove the trailing zero.
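Whether FF and VT count as line terminators matters on the reading side. A reader that recognizes only Cr and Lf leaves them untouched, while a broader notion of line boundary does not: Python's built-in `str.splitlines`, for example, also treats VT (`\x0b`), FF (`\x0c`), and the Unicode separators U+2028 and U+2029 as line boundaries. The `split_cr_lf_only` helper below is hypothetical and only illustrates the contrast.

```python
# Contrast a CR/LF-only line splitter with Python's str.splitlines,
# which also breaks on VT, FF, U+2028, U+2029 (and a few other characters).

def split_cr_lf_only(text):
    # Normalize CRLF and LF to CR, then split on CR alone.
    return text.replace("\r\n", "\r").replace("\n", "\r").split("\r")


sample = "page one\x0cpage two\r\nnext line"
print(split_cr_lf_only(sample))  # the FF stays inside the first line
print(sample.splitlines())       # the FF is treated as a line boundary
```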
I envision the line termination convention not as coercive, but as liberating us a little bit. One doesn't need to code for all the odd cases everywhere. I think this is especially important in a cooperative endeavor like Squeak. That's why I add the last rule:
- If you run into something that doesn't follow the convention, send in a fix or at least point it out.
I think we need to backstop each other like baseball players do. One player will get behind another player who is trying to catch a ball, so that if he misses it the second player is there to help. Someone might not know of the convention, or might have ported code from another platform and not gotten around to changing it to the convention (it happened to me, anyway). I'd appreciate any back-stopping I could get. I guess I approach the Squeak community as a team rather than as a group of "rugged individuals".
This does require reading external text a character at a time, but doesn't seem prohibitively expensive.
First make it work.... then make it fast
[snip] I heartily agree.
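Reading a character at a time matters mostly because a Cr can be the last character of one buffer and its Lf the first character of the next. A hedged Python sketch of an incremental decoder that carries that one bit of state across chunks; `LineEndDecoder` and `feed` are hypothetical names.

```python
# Hypothetical incremental decoder: converts CR, CRLF, and LF to CR one
# character at a time, remembering a trailing CR between chunks so that a
# CRLF split across two reads is still treated as one line end.

class LineEndDecoder:
    def __init__(self):
        self.after_cr = False  # was the previous character a CR?

    def feed(self, chunk):
        out = []
        for ch in chunk:
            if ch == "\n" and self.after_cr:
                # Second half of a CRLF: the CR was already emitted.
                self.after_cr = False
                continue
            self.after_cr = ch == "\r"
            out.append("\r" if ch in "\r\n" else ch)
        return "".join(out)
```

In the spirit of "first make it work", the per-character loop favors simplicity over speed.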
As far as line end convention goes, I think the important thing to do is to factor out the handling into a Policy object. Otherwise the streaming code just gets all crufted up with different cases.
If somebody wants a different policy, they add a new class instead of futzing with the convoluted code.
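A sketch of what factoring the handling into policy objects might look like, written in Python under assumptions of our own: each policy answers which terminator to use when appending to some existing text, a fixed policy gives the rigid behavior, and a guessing policy follows whatever the file already uses. The class names, `line_term_for`, and `append_text` are hypothetical.

```python
# Hypothetical line-end policies (Strategy pattern). The stream code asks
# its policy which terminator to use and never special-cases conventions.

class FixedLineEndPolicy:
    """Rigid policy: always write the same terminator."""

    def __init__(self, line_term):
        self.line_term = line_term

    def line_term_for(self, existing_text):
        return self.line_term


class GuessingLineEndPolicy:
    """Guessing policy: follow the first line end already in the text."""

    def __init__(self, default):
        self.default = default

    def line_term_for(self, existing_text):
        for i, ch in enumerate(existing_text):
            if ch == "\r":
                return "\r\n" if existing_text[i + 1:i + 2] == "\n" else "\r"
            if ch == "\n":
                return "\n"
        return self.default  # no line end yet: use the default


def append_text(existing, new_text, policy):
    """Append CR-terminated internal text, encoded per the policy."""
    return existing + new_text.replace("\r", policy.line_term_for(existing))
```

A new convention means a new class with a `line_term_for` method; `append_text` itself never changes.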
Would this be similar to the External Stream Decorator suggestion I copped from the preview of the book "The Design Patterns Smalltalk Companion"?
Your idea sounds promising. Could you explain it a little more concretely for me?
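A decorator in the Design Patterns sense would wrap an existing stream, present the same write interface, and convert internal Crs on the way out, so the wrapped stream (file, socket, or in-memory buffer) needs no line-end code of its own. A minimal Python sketch; `LineEndWriter` is a hypothetical name, and `io.StringIO` merely stands in for any stream with a `write` method.

```python
import io


class LineEndWriter:
    """Hypothetical decorator: wraps any object with write(str) and
    translates internal CR line ends to line_term before delegating."""

    def __init__(self, stream, line_term):
        self.stream = stream
        self.line_term = line_term

    def write(self, text):
        return self.stream.write(text.replace("\r", self.line_term))

    def __getattr__(self, name):
        # Anything not defined here (flush, close, ...) goes to the wrapped stream.
        return getattr(self.stream, name)


buffer = io.StringIO(newline="")  # newline="" so StringIO does no translation itself
out = LineEndWriter(buffer, "\r\n")
out.write("line one\rline two\r")
```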
--
Richard A. Harmon          "The only good zombie is a dead zombie"
harmonra@webname.com                              E. G. McCarthy
squeak-dev@lists.squeakfoundation.org