Dear Richard, Dear All;
I've just read parts of the ANSI Smalltalk Standard, and now I have the impression, that indeed there are gaps...
What does it say?
<standard>
5.7.5.5 Message: removeAll: oldElements
Synopsis For each element in oldElements, remove the first element from the receiver which is equivalent to this element. Definition: <extensibleCollection> This message is used to remove each element of a given collection from the receiver's elements. The operation is defined to be equivalent to removing each element of oldElements from the receiver using the #remove: message with the element as the parameter. The behavior is undefined if any element of oldElements is not found. Parameters oldElements <collection> uncaptured Return Values UNSPECIFIED Errors: none
</standard>
UNSPECIFIED return values is clear, but what means Parameters oldElements <collection> uncaptured ? I try to make it short, the longer version is below...
<standard>
Essential aliasing of parameters is described using a parameter aliasing attribute:
*captured* The receiver always retains a reference to the parameter, directly or indirectly, as a result of this message. *uncaptured* The receiver never retains a reference to the parameter, directly or indirectly, as a result of this message. *unspecified* It is unspecified as to whether or not a reference is retained as a result of this message i.e. either case may occur.
</standard>
Since we have an 'uncaptured' parameter, the receiver *has* *to* never retain a reference to the parameter, directly or indirectly, as a result of this message.
Further: The parameter *is* the receiver in the case of a removeAll: a . Has this receiver a direct reference to itself? I don't think so. But the *coupling* between receiver and parameter here is much stronger - identity! - than a mere reference. And the intent of the requirement here - as *I* read the standard - has been to have *no* coupling between parameter and receiver here.
My conclusions: - I think this supports Allen Wirfs-Brock's view, that this special case has been overlooked. - The intent of the standard seems to be to *forbid* a removeAll: a . - At least I think this is a reasonable possible interpretation.
Greetings,
Stephan
Therefrom I have snipped (after a first rough snipping):
+--------------------------------------------+ | NCITS J20 DRAFT of ANSI Smalltalk Standard | | December, 1997 revision 1.9 | +--------------------------------------------+
<draft> <snipped>
5.1.2.2 Parameter Specification
A parameter specification is defined by a parameter name, a parameter interface definition, and a parameter aliasing attribute.
A parameter specification places constraints on the parameter in terms of protocol conformance, and provides information concerning how the parameter is used by implementations of the message. The parameter name is the name of a formal parameter and is used to identify the parameter with a parameter specification, and to refer to the parameter in textual descriptions.
A parameter interface definition is defined as either: - A single protocol name <P>. - A logical OR of two or more protocols, written as <P1> | <P2> | ... | <Pn>
The parameter interface definition identifies the behavioral assumptions the message makes concerning the parameter. A client must supply an appropriate actual parameter. An OR of protocols means that the parameter must conform to at least one of the protocols in the disjunction. This is required to describe cases where a message accepts objects with diverse behavior and tests their behavior by sending messages in order to determine the action to be taken. Note that this is different from the case where a message accepts objects with diverse behavior, but only makes use of common shared behavior. In the latter case, the message is not really dealing with diverse cases of behavior.
When a message specifies that a given formal parameter must conform to a protocol <P>, it is making a commitment to use only behavior which is defined in <P> in the message implementation. In this sense, the conformance statement is a maximal behavioral requirement-at most all of the behavior described by <P> will be used, and no more.
Aliasing information (for example, whether a parameter is stored, or whether a returned value is new or returned state) is specified to avoid having implementors use defensive programming techniques which result in unnecessary object creation and copying, incurring a performance penalty. We differentiate between incidental aliasing and essential aliasing, both for parameters and for return values. Essential aliasing forms a critical part of the behavior of the interface, and as such it must be specified by the interface designer. Incidental aliasing should not be specified since it is a side effect of implementation choices, and is not fundamental to the specified functionality of the interface.
Essential aliasing of parameters is described using a parameter aliasing attribute:
*captured* The receiver always retains a reference to the parameter, directly or indirectly, as a result of this message.
*uncaptured* The receiver never retains a reference to the parameter, directly or indirectly, as a result of this message.
*unspecified* It is unspecified as to whether or not a reference is retained as a result of this message i.e. either case may occur.
5.1.2.3 Return value specification
A return value specification is defined by a return value protocol and a return value aliasing attribute. Whereas the parameter description is prescriptive in that it states requirements to which the parameters must conform, the return value information is descriptive in that it provides information about the result being returned. Whereas a protocol makes a conformance requirement statement about parameters, it makes a conformance commitment concerning the return value. The specification guarantees that the return value will conform to the specified protocol.
A message specification may have multiple distinct return value specifications. Conversely, a single return value specification may describe multiple return values if the return value specification applies to all such values. Multiple return value specifications are required for cases where a message is defined to return objects conforming to different protocols, on a case-specific basis. These are conveniently described with separate conformance statements and aliasing annotations. In order to establish correspondence between sets of return value specifications, we do not permit two distinct return value specifications which promise conformance to the same protocol.
If a message specification has no return value specification (that is, the return value is not specified), then it is not prepared to guarantee anything about the behavior of the returned object. In this case we denote the return value as UNSPECIFIED. This can be used to separate procedural messages from functional messages; to allow for inconsequential differences in implementations; or to allow conforming implementations which return different results but are otherwise operationally equivalent.
In order to relate return values through conformance, we define the return value interface definition for a message specification to be the single return value protocol, or the logical OR of the protocols in each distinct return value specification.
Information concerning retained references to return values (by the message receiver) is described using a return value aliasing attribute, which is one of the following identifiers:
*state* The receiver retains a reference (direct or indirect) to the returned object after the method returns i.e. the object is returned state.
*new* The object is newly created in the method invocation and no reference (direct or indirect) is retained by the receiver after the method returns.
*unspecified* No information is provided as to the origin or retained references to the object (Note this is different from saying that the return value itself is UNSPECIFIED. Here we are committing that the return value conforms to some protocol, but making no commitment about the aliasing behavior).
Note that we do not attempt to describe the aliasing of the state variables of the return value itself-the attribute applies only to the first level returned object. The implication is that second and subsequent level aliasing of the return value is always unspecified. An exception occurs in the case where the returned state is an object which the client originally gave the service provider for safekeeping. This occurs with element retrieval in collections, for example. In such cases only the client knows the implications of modifying second level state of the return value.
<snipped>
5.7.5.2 Message: addAll: newElements
Synopsis Add each element of newElements to the receiver's elements. Definition: <extensibleCollection> This message adds each element of newElements to the receiver.
The operation is equivalent to adding each element of newElements to the receiver using the #add: message with the element as the parameter. The newElements are traversed in the order specified by the #do: message for newElements. Parameters newElements <collection> unspecified Return Values UNSPECIFIED Errors none
<snipped>
5.7.5.5 Message: removeAll: oldElements
Synopsis For each element in oldElements, remove the first element from the receiver which is equivalent to this element. Definition: <extensibleCollection> This message is used to remove each element of a given collection from the receiver's elements. The operation is defined to be equivalent to removing each element of oldElements from the receiver using the #remove: message with the element as the parameter. The behavior is undefined if any element of oldElements is not found. Parameters oldElements <collection> uncaptured Return Values UNSPECIFIED Errors: none
<snipped> </draft>
Stephan Rudlof wrote:
[...] Since we have an 'uncaptured' parameter, the receiver *has* *to* never retain a reference to the parameter, directly or indirectly, as a result of this message.
Further: The parameter *is* the receiver in the case of a removeAll: a . Has this receiver a direct reference to itself? I don't think so. But the *coupling* between receiver and parameter here is much stronger - identity! - than a mere reference. And the intent of the requirement here - as *I* read the standard - has been to have *no* coupling between parameter and receiver here.
This interpretation is clearly unreasonable. I can see how one might reasonably interpret the identity coupling as a sort of indirect reference (though I suspect that what is meant instead is a reference chain), but you're ignoring the very important clause, "as a result of this message." This coupling is not a result of the message at all. Furthermore, consider the following:
a := OrderedCollection with: 1. b := OrderedCollection with: 1 with: a. b removeAll: a.
I would certainly hope that any reasonable programmer would consider this perfectly sensible, and should remove 1 from b, leaving it with the single element a. This is a direct reference that your reading of the standard would require to be illegal. Fortunately, since it is clear that this reference is not a result of sending #removeAll:, it is, in fact, permitted by the standard.
However, I'm not sure how much weight to put on the fact that the standard doesn't say anything about the argument to #removeAll: being invariant during the call. On a quick browse, I didn't see any explicit caveats about the receiver of a #do: being modified by the argument block, either. Richard, would you say that the standard requires #do: to work properly when the receiver is modified during the iteration, as well?
-Jesse
Jesse Welton wrote:
Stephan Rudlof wrote:
[...] Since we have an 'uncaptured' parameter, the receiver *has* *to* never retain a reference to the parameter, directly or indirectly, as a result of this message.
Further: The parameter *is* the receiver in the case of a removeAll: a . Has this receiver a direct reference to itself? I don't think so. But the *coupling* between receiver and parameter here is much stronger - identity! - than a mere reference. And the intent of the requirement here - as *I* read the standard - has been to have *no* coupling between parameter and receiver here.
This interpretation is clearly unreasonable. I can see how one might reasonably interpret the identity coupling as a sort of indirect reference (though I suspect that what is meant instead is a reference chain),
but you're ignoring the very important clause, "as a result of this message." This coupling is not a result of the message at all.
This is true: I've overlooked that.
Furthermore, consider the following:
a := OrderedCollection with: 1. b := OrderedCollection with: 1 with: a. b removeAll: a.
I would certainly hope that any reasonable programmer would consider this perfectly sensible, and should remove 1 from b, leaving it with the single element a. This is a direct reference that your reading of the standard would require to be illegal. Fortunately, since it is clear that this reference is not a result of sending #removeAll:, it is, in fact, permitted by the standard.
Good example.
Thank you for the correction.
Greetings,
Stephan
<snipped>
Jesse Welton wrote: <snipped>
However, I'm not sure how much weight to put on the fact that the standard doesn't say anything about the argument to #removeAll: being invariant during the call. On a quick browse, I didn't see any explicit caveats about the receiver of a #do: being modified by the argument block, either.
Richard, would you say that the standard requires #do: to work properly when the receiver is modified during the iteration, as well?
What should be the semantics then? E.g.:
| a num | num _ 1. a := OrderedCollection with: num. a do: [:e | a size < 100 ifTrue: [a add: (num _ num+1)]]. a
Should this give 100 elements in a (iterating over added elements, too)? Or 2 (copy semantics)? Like with
| a num | num _ 1. a := OrderedCollection with: num. a copy do: [:e | a size < 100 ifTrue: [a add: (num _ num+1)]]. a
BTW: it gives 9...
Greetings,
Stephan
From the standard:
NCITS J20 DRAFT of ANSI Smalltalk Standard December, 1997 revision 1.9
5.7.1.13 Message: do: operation Synopsis Evaluate operation with each element of the receiver. Definition: <collection> For each element of the receiver, operation is evaluated with the element as the parameter. Unless specifically refined, the elements are not traversed in a particular order. Each element is visited exactly once. Conformant protocols may refine this message to specify a particular ordering. Parameters operation <monadicValuable> uncaptured Return Values UNSPECIFIED Errors If the elements of the receiver are inappropriate for use as arguments to operation.
-Jesse
I wrote:
Jesse Welton wrote:
<snipped>
However, I'm not sure how much weight to put on the fact that the standard doesn't say anything about the argument to #removeAll: being invariant during the call. On a quick browse, I didn't see any explicit caveats about the receiver of a #do: being modified by the argument block, either.
Richard, would you say that the standard requires #do: to work properly when the receiver is modified during the iteration, as well?
What should be the semantics then? E.g.:
| a num | num _ 1. a := OrderedCollection with: num. a do: [:e | a size < 100 ifTrue: [a add: (num _ num+1)]]. a
Should this give 100 elements in a (iterating over added elements, too)? Or 2 (copy semantics)? Like with
| a num | num _ 1. a := OrderedCollection with: num. a copy do: [:e | a size < 100 ifTrue: [a add: (num _ num+1)]]. a
BTW: it gives 9...
This is one of the reasons, why I don't like to allow things like a removeAll: a . It is clear, that there has to be some iteration over the argument *while* changing it. Allowing this would give the impression that the #do: above could/should also work (resulting in 100 elements). If this isn't enough: try implementing the functionality of the #do: above for Sets...
As a general rule: Don't change a Collection while iterating over it.
Greetings,
Stephan
Greetings,
Stephan
From the standard:
NCITS J20 DRAFT of ANSI Smalltalk Standard December, 1997 revision 1.9
5.7.1.13 Message: do: operation Synopsis Evaluate operation with each element of the receiver. Definition: <collection> For each element of the receiver, operation is evaluated with the element as the parameter. Unless specifically refined, the elements are not traversed in a particular order. Each element is visited exactly once. Conformant protocols may refine this message to specify a particular ordering. Parameters operation <monadicValuable> uncaptured Return Values UNSPECIFIED Errors If the elements of the receiver are inappropriate for use as arguments to operation.
-Jesse
I frankly find all the spec-lawyering quite disheartening. (This coming from a patent lawyer.) But as long as folks seem to be overwhelming compelled to viewing this as a black-or-white answer-or-no-answer issue, let me add, at least, a few more shades of gray to these discussions.
Much lip-service has been given to "accepting" my prior proposition that an iterand Collection should never be modified without expecting poor results. "Yes, yes of course," others would say. And then go on to argue, "but the parameter to removeAll: is not an iterand -- indeed, there is not even an implied enumeration -- the #removeAll is in the 'remove' protocol, not the 'enumeration' protocol. There is no iteration in #removeAll:, express or implied, except perhaps because of a few naive implementations." (Paraphrases of course, but I think a fair summary of this sub-line of argument)
But the ANSI standard does not, in fact, admit such a dodge. Indeed, it DEFINES removeAll: as "the equivalent" to an iteration over the elements of the parameter, as a call of #remove for each such element. Specifically, it says that, "[t]he operation is defined to be equivalent to removing each element of oldElements from the receiver." In this sense, I stand by my construction of the standard, to wit:
1) An enumeration over an iterand that changes during the enumeration is undefined. 2) If element each is in a receiver, r, then "r remove: x" changes r. 3) The ANSI standard section 5.7.5.5 explicitly defines "removeAll: oldElements" as an iteration over the elements of oldElements, sending for each such element, say 'each', the message #remove: each. 4) At least for non-empty collections, "x removeAll: x" is undefined.
While one can quibble over the truth of proposition 3, one must accept, at least, that it is a fair argument, at least in the gray region. (I, for one, find it compelling, but reasonable people, apparently, may differ.) Accordingly, it seems to me that the argument that "failure" of "x removeAll: x" is clearly and in black-and-white a bug is, at best, overstated.
While I am sensible to some of the discussion on both sides of this argument, I find it simply indefensible to suggest that it is in any sense clear and black-and-white how, if at all, the ANSI standard actually defines the result in the class of cases where such an iteration results in changes to oldCollection, and particularly the special case where the reciver and parameter are identical.
On Friday, August 30, 2002, at 10:46 AM, Stephan Rudlof wrote:
<standard>
5.7.5.5 Message: removeAll: oldElements
Synopsis For each element in oldElements, remove the first element from the receiver which is equivalent to this element. Definition: <extensibleCollection> This message is used to remove each element of a given collection from the receiver's elements. The operation is defined to be equivalent to removing each element of oldElements from the receiver using the #remove: message with the element as the parameter. The behavior is undefined if any element of oldElements is not found. Parameters oldElements <collection> uncaptured Return Values UNSPECIFIED Errors: none
</standard>
squeak-dev@lists.squeakfoundation.org