At 04:23 PM 10/29/98 -0500, Lex wrote:
"R. A. Harmon" harmonra@webname.com wrote:
I think with the following proposed set of conventions that PILT could be effectively achieved and the odd "bite" handled on a case by case basis in some standard way.
[snip]
This sounds basically like what CrLfFileStream does
[snip]
So in my opinion, the only things stopping adoption of CrLfFileStream as a
default are:
- a decision on the file positions issue (I vote for "it's illegal", migrating towards "it's expensive and unadvised")
- automatically choosing output line endings based on what the current platform is. This could be done by putting a "CrLfFileStream >
guessLineEndConvention" in SystemDictionary.processStartupList. [snip]
I don't think there is any reason for "guessLineEndConvention" in the approach I propose, and if it guesses wrong (especially on an already anomalous file), CrLfFileStream seems to produce anomalies that I don't think are caused just by cut-and-paste.
The native platform line termination conventions I know of are as follows:
    DOS/Windows on x86    CrLf
    UNIX                  Lf
    Mac                   Cr
So when reading external text, all interline spacing (carriage return, form feed, line feed, and vertical tab) characters are handled as follows:
    Cr        - add Cr to internal collection; if followed by Lf, then read and ignore the Lf.
    Lf        - if preceded by Cr then ignore the Lf, else add Cr to internal collection.
    Ff and VT - ignore.
This does require reading external text a character at a time, but doesn't seem prohibitively expensive.
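A minimal sketch of the reading rule above, in Python rather than Squeak. The name normalize_line_ends and the choice to accept any iterable of single characters are assumptions made for illustration; this is not CrLfFileStream's code.

```python
CR, LF, FF, VT = "\r", "\n", "\f", "\v"

def normalize_line_ends(chars):
    """Map CrLf, Lf and Cr to a single Cr; drop Ff and VT.

    `chars` is any iterable that yields one character at a time,
    for example a string or a file read character by character.
    """
    out = []
    prev_was_cr = False
    for ch in chars:
        if ch == CR:
            out.append(CR)          # Cr is kept as the internal line end
        elif ch == LF:
            if not prev_was_cr:     # Lf right after Cr is the tail of a CrLf pair
                out.append(CR)
        elif ch in (FF, VT):
            pass                    # ignored, as proposed above
        else:
            out.append(ch)
        prev_was_cr = (ch == CR)
    return "".join(out)

print(repr(normalize_line_ends("dos\r\nunix\nmac\rend")))
# prints 'dos\runix\rmac\rend'
```

Only one character of lookbehind is needed, so the per-character cost is a comparison and a flag update.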
CrLfFileStream also doesn't deal with problems such as runs breaking in Text instances. It seemed to me that a lot of this kind of anomaly exists when just about any of the classes start reading and writing to external devices -- ports, disks, etc.
I didn't really dig into the code to see for certain, so I could be wrong. The more I looked, the more things looked a little shaky in a number of classes.
At 03:15 PM 10/30/98 -0800, Michael S. Klein wrote:
At 03:15 PM 10/30/98 -0800, Hans-Martin Mosner wrote:
I'd support the first alternative with the reasoning that file positions for ASCII files should be treated as opaque 'cookies', that is, you can get the file position and set it to get back to a point where you were before, but you should not do arithmetic with them.
[other less pleasurable options deleted]
You have to do something like this, anyway, to support multi-byte characters, so you may as well do lineEndConvention this way, as well.
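A rough Python sketch of the "opaque cookie" idea, under stated assumptions: the class names Cookie and TranslatingReader, and the single-byte decoding via chr, are hypothetical and chosen only for illustration. The point it tries to show is that once line ends are translated, the count of internal characters read no longer matches the external byte offset, so a position is only useful for getting back to where you were.

```python
class Cookie:
    """Opaque position token: it can be saved and handed back, nothing more."""
    __slots__ = ("_offset",)

    def __init__(self, offset):
        self._offset = offset   # external byte offset, private by convention


class TranslatingReader:
    """Reads external bytes and yields internal characters with Cr line ends."""

    def __init__(self, data):
        self._data = data
        self._pos = 0           # offset into the external bytes

    def position(self):
        return Cookie(self._pos)

    def set_position(self, cookie):
        self._pos = cookie._offset

    def next_char(self):
        """Return the next internal character, or None at end of data."""
        if self._pos >= len(self._data):
            return None
        b = self._data[self._pos]
        self._pos += 1
        if b == 0x0D:           # Cr: swallow an immediately following Lf
            if self._pos < len(self._data) and self._data[self._pos] == 0x0A:
                self._pos += 1
            return "\r"
        if b == 0x0A:           # lone Lf becomes Cr
            return "\r"
        return chr(b)           # assumption: one byte per character


reader = TranslatingReader(b"ab\r\ncd")
for _ in range(3):
    reader.next_char()          # 'a', 'b', '\r': 3 characters, 4 bytes consumed
mark = reader.position()
reader.next_char()              # 'c'
reader.set_position(mark)
print(reader.next_char())       # prints c
```

Because Cookie defines no arithmetic, an expression like mark + 1 raises TypeError, which is roughly the "it's illegal" option. The same shape would carry over to multi-byte characters, where the mismatch between character count and byte offset is even larger.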
I've also been thinking about how to handle multi-byte character support. I agree with Hans-Martin Mosner and Michael S. Klein.
--
Richard A. Harmon          "The only good zombie is a dead zombie"
harmonra@webname.com       E. G. McCarthy
I don't think there is any reason for "guessLineEndConvention" in the approach I propose, and if it guesses wrong (especially on an already anomalous file), CrLfFileStream seems to produce anomalies that I don't think are caused just by cut-and-paste.
Sometimes you may want to guess, sometimes you may want a rigid line end policy.
The native platform line termination conventions I know of are as follows:
    DOS/Windows on x86    CrLf
    UNIX                  Lf
    Mac                   Cr
Smalltalks use cr. There is also Unicode, which has explicitly different line separators and paragraph separators (U+2028 and U+2029). Personally, I think the whole idea of "Control Characters" is perverse. Line end conventions are just the best-known symptom of this perversity.
So when reading external text, all interline spacing (carriage return, form feed, line feed, and vertical tab) characters are handled as follows:
    Cr        - add Cr to internal collection; if followed by Lf, then read and ignore the Lf.
    Lf        - if preceded by Cr then ignore the Lf, else add Cr to internal collection.
    Ff and VT - ignore.
This works sometimes, but there are some of us who actually use ff & vt's placed in text by other people.
This does require reading external text a character at a time, but doesn't seem prohibitively expensive.
First make it work.... then make it fast
CrLfFileStream also doesn't deal with problems such as runs breaking in Text instances. It seemed to me that a lot of this kind of anomaly exists when just about any of the classes start reading and writing to external devices -- ports, disks, etc.
Yeah, strings are deceptively easy to externalize.
As far as line end convention goes, I think the important thing to do is to factor out the handling into a Policy object. Otherwise the streaming code just gets all krufted up with different cases.
If somebody wants a different policy, they add a new class instead of futzing with the convoluted code.
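One way the Policy-object factoring might look, sketched in Python rather than Squeak. All names here (LineEndPolicy, CrPolicy, LfPolicy, CrLfPolicy, PolicyWriter) are hypothetical; the sketch assumes the internal convention is Cr and that reading is tolerant while writing is fixed per policy.

```python
class LineEndPolicy:
    """How external line ends map to and from the internal Cr."""
    external = "\r"             # what is written for each internal Cr

    def to_internal(self, text):
        # Tolerant reading: accept CrLf, Lf or Cr and produce Cr.
        return text.replace("\r\n", "\r").replace("\n", "\r")

    def to_external(self, text):
        # Writing: internal text is assumed to contain only Cr line ends.
        return text.replace("\r", self.external)


class CrPolicy(LineEndPolicy):
    external = "\r"


class LfPolicy(LineEndPolicy):
    external = "\n"


class CrLfPolicy(LineEndPolicy):
    external = "\r\n"


class PolicyWriter:
    """A stream-like writer that delegates line-end handling to a policy."""

    def __init__(self, policy):
        self.policy = policy
        self.chunks = []

    def write(self, internal_text):
        self.chunks.append(self.policy.to_external(internal_text))

    def contents(self):
        return "".join(self.chunks)


writer = PolicyWriter(CrLfPolicy())
writer.write("one\rtwo\r")
print(repr(writer.contents()))  # prints 'one\r\ntwo\r\n'
```

The writer itself contains no line-end cases. A guessing policy or a strict one that rejects foreign line ends would be another subclass, with the streaming code left alone.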
-- Mike Klein
squeak-dev@lists.squeakfoundation.org