#withoutDuplicates on Collection?

christoph.thiede@student.hpi.uni-potsdam.de

10 Jun 2023

5:06 p.m.

Hi all,

I was just wondering why we only define #withoutDuplicates on SequenceableCollection, since it does not depend on the order of the receiver and could be applied to other types of collections as well. For instance, Set could override it with ^self copy, Bags would be naturally covered as well, etc.

What do you think?

Best, Christoph

--- Sent from Squeak Inbox Talk

Marcel Taeumel

12 Jun

9:02 a.m.

Hi --

I think that it is very important to not mess up the order in the receiver when removing duplicates. This is what the algorithm in SequenceableCollection >> #withoutDuplicates does. Still, having #select: available for any kind of Collection, we might be able to move it up. Then, other kinds could optimize the implementation, like Set where no duplicates exist in the first place.

+1 for moving #withoutDuplicates up to Collection
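
A minimal workspace sketch of that idea, assuming only that every collection involved understands #select: (as stated above). The block name `dedupe` is made up for illustration, and this is not the actual Trunk source of SequenceableCollection >> #withoutDuplicates:

```smalltalk
"Sketch only: deduplicate any collection through #select: plus a Set of seen elements."
| dedupe |
dedupe := [:aCollection | | seen |
	seen := Set new.
	aCollection select: [:each |
		(seen includes: each)
			ifTrue: [false]                    "already seen, drop it"
			ifFalse: [seen add: each. true]]]. "first occurrence, keep it"

dedupe value: #(3 1 3 2 1).                       "an Array, first occurrences in order: 3, 1, 2"
dedupe value: #(3 1 3 2 1) asOrderedCollection.   "same elements, as an OrderedCollection"
dedupe value: #(3 1 3 2 1) asBag.                 "should answer a Bag holding 1, 2 and 3 once each"
```

Because #select: answers a collection like the receiver, the order is preserved wherever the receiver has one. A Set could skip the bookkeeping entirely and answer `self copy`, as suggested in the original post.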

Best, Marcel

Vanessa Freudenberg

13 Jun

12:10 a.m.

I don’t think I’ve ever needed it on a non-sequential collection. And the semantics would not be obvious on e.g. a dictionary.

So I don't really see the point of moving it up.

Vanessa

Chris Muller

5:21 a.m.

Hi Christoph,

I was just wondering why we only define #withoutDuplicates on SequenceableCollection, since it does not depend on the order of the receiver and could be applied to other types of collections as well. For instance, Set could override it with ^self copy, Bags would be naturally covered as well, etc.

What do you think?

+1. IMO, the *implementation* of methods as semantically abstract as "collections" should drive the decision of where they reside. #size and #do: are the core methods of Collection. Any operation that can rely solely on those should reside in Collection.

When useful abstract operations are needlessly stuck in subclasses like SequenceableCollection, that, itself, becomes a question of semantics. And of design and usability, too. Especially when one discovers they have a use for it in a non-Sequenceable, "Why?" becomes the always-distracting first question. Often, people will simply re-implement something else rather than take a detour to lobby to move it up or modify the core library, thus continuing to reinforce a false notion of, "See? It's not needed except for Sequenceables."

I lost a similar argument several years ago with #joinSeparatedBy:. I was generating and processing hundreds of thousands of query objects, each having a collection of named arguments (key / value pairs), whose names (keys) simply had to be unique. And because it absolutely *did not matter* what order the arguments were printed on the output stream, a Dictionary was the obvious choice for that collection. Nevertheless, the naysayers argued that their own personal lack of context to such a use-case meant, "random result order makes such a feature questionable".

It is, until it isn't. The only reason #do: on Dictionary makes sense (e.g., "should it enumerate the 'values', or the 'associations'?") is due to our own experience as Smalltalkers of using it. Every operation on Dictionaries has non-obvious semantics to anyone not familiar with Smalltalk. This is why the *implementation* should weigh most heavily in such decisions, with semantics and personal experiences being secondary weights.

Best, Chris

Marcel Taeumel

9:20 a.m.

Hi all --

Chris (cmm) is arguing for a kind of "polymorphic convenience", focusing on a single operation where programmers do not care about specific properties of the collection at hand. The definition of "being a duplicate" is probably something like "a = b" and thus relying on the implementation of #=. I think that Christoph (ct) has a similar perspective here.

Vanessa (codefrau) is arguing for considering a broader context of what programmers are trying to achieve and how they think about their collections at hand. Here, "being sequenceable" seems to be somehow connected to figuring out what "being a duplicate in a container" means. Thus, it would be strange if #withoutDuplicates worked but a following "uniqueStuff first" would raise an error in certain cases. Additionally, "being a duplicate" seems to be tricky for more complex structures such as a Dictionary. There, "duplicate keys" are typically forbidden while "duplicate values" are okay. This property makes #withoutDuplicates a little bit more challenging to understand and use, depending on the kind of collection.

At the moment, programmers have to convert their collection to a sequenceable one to then be able to use #withoutDuplicates. This requires extra knowledge about the collection at hand. We have checks such as #isSequenceable for that to avoid unnecessary conversions. However, following Vanessa's train of thought, that extra knowledge would also be required if #withoutDuplicates were moved to the top. Specifically, programmers would have to think about the next operations they want to perform. Then, a conversion might be necessary anyway.

As I cannot see any improvement from changing the status quo in this regard, I would argue not to move #withoutDuplicates to the top but keep it where it is.

Best, Marcel

Marcel Taeumel

11:39 a.m.

That said, I am not "super against" moving #withoutDuplicates up to Collection. In contrast to #joinSeparatedBy:, I do not see the issue of "immediate surprise" here when having a non-sequenceable collection at hand. :-) Only maybe "probable surprise" or "eventual surprise" ^^

Best, Marcel

Vanessa Freudenberg

7:18 p.m.

I like Chris' argument and example.

And even for Dictionaries the semantics are pretty clear – the keys already have no duplicates after all, so obviously it would apply to the values.

No harm in moving it up.

Vanessa

Stéphane Rollandin

8:24 p.m.

And even for Dictionaries the semantics are pretty clear – the keys already have no duplicates after all, so obviously it would apply to the values.

Yes but to remove duplicates you then need to keep only one key among the existing ones - how do you choose?
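
To make the question concrete, here is a small workspace sketch. It assumes that Dictionary >> #select: hands each value to the block and answers a new Dictionary, which is how current Squeak images behave as far as I know; the variable names are only for illustration:

```smalltalk
"Sketch only: deduplicating the values of a Dictionary with #select:."
| dict seen unique |
dict := Dictionary new.
dict at: #a put: 1; at: #b put: 1; at: #c put: 2.

seen := Set new.
unique := dict select: [:value |
	(seen includes: value)
		ifTrue: [false]
		ifFalse: [seen add: value. true]].

unique size.               "2: one entry with value 1, one with value 2"
unique includesKey: #c.    "true, since 2 occurs only under #c"
"Whether #a or #b survives for the value 1 depends on the enumeration order."
```

The values end up unique, but the surviving key for a duplicated value is simply whichever one is enumerated first, which is the ambiguity raised here.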

Stef

Marcel Taeumel

14 Jun

9:28 a.m.

Maybe for Dictionaries a #withoutDuplicatesDo: would make more sense. Could be useful for all kinds of collections.
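
One possible reading of that suggestion, sketched as a workspace block because no such method is defined in this thread. The name `withoutDuplicatesDo` and its two-argument shape are assumptions for illustration only:

```smalltalk
"Hypothetical sketch: enumerate each distinct element once, relying only on #do:."
| withoutDuplicatesDo result |
withoutDuplicatesDo := [:aCollection :aBlock | | seen |
	seen := Set new.
	aCollection do: [:each |
		(seen includes: each) ifFalse: [
			seen add: each.
			aBlock value: each]]].

result := OrderedCollection new.
withoutDuplicatesDo value: #(3 1 3 2 1) value: [:each | result add: each].
result.    "an OrderedCollection(3 1 2)"
```

An enumeration like this never has to decide what kind of collection to answer, which sidesteps the question of result type for dictionaries and other non-sequenceable receivers.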

Best, Marcel

Tobias Pape

11:58 a.m.

On 13. Jun 2023, at 19:18, Vanessa Freudenberg vanessa@codefrau.net wrote:

I like Chris' argument and example.

And even for Dictionaries the semantics are pretty clear – the keys already have no duplicates after all, so obviously it would apply to the values.

I'm not on board with this one. I'd expect #withoutDuplicates to be a '^self'. That said, most messages without an explicit *Key operate on the values in a dict, so there's that.

-t

Thiede, Christoph

15 Jun

8:38 p.m.

Hi all,

I originally brought up this question while revising a paragraph in Squeak by Example:

Collection>>asSet offers us a convenient way to eliminate duplicates from a collection:

{Color black . Color white . (Color red + Color blue + Color green)} asSet size --> 2

The result of the example is 2, as the color white was included twice in the collection: once as the result of Color white, and once as the result of the combination of red, blue, and green.

However, if you are working with a sequenceable collection and want to preserve its type or order, you can use SequenceableCollection>>withoutDuplicates instead.
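
A short workspace comparison of the two approaches named in that paragraph. The sample data is made up, and the comments describe the behavior as it is characterized in this thread (order and type preserved by #withoutDuplicates):

```smalltalk
"Sketch only: #asSet versus #withoutDuplicates on a sequenceable collection."
| items |
items := #(3 1 3 2 1).

items asSet size.                              "3; the result is a Set, so type and order are gone"
items withoutDuplicates.                       "#(3 1 2); still an Array, first occurrences kept in order"
items asOrderedCollection withoutDuplicates.   "an OrderedCollection(3 1 2)"
```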

It struck me that I had to write "if you are working with a sequenceable collection" here, and it still strikes me today. #asSet should not be the only efficient way to deduplicate a bag, heap, or whatever. If I want deduplication, I should not have to know the type of my collection if it's a variant type.

The property of having duplicates is completely independent of the property of being sequenceable, unless you think about what an implementation of eliminating duplicates might look like.

I think that this discussion is being overshadowed by another issue, the iteration semantics of dictionaries. Because Dictionary>>#do: ignores the keys, there is always a sharp break when applying iteration logic from another collection to dictionaries. Thus, I think we should not discuss Collection>>#withoutDuplicates through the example of dictionaries. But given their existing iteration semantics, I'm with Vanessa on the behavior of Dictionary>>#withoutDuplicates: Keys are only metadata not relevant for iteration, so they should also not influence deduplication.

Yes but to remove duplicates you then need to keep only one key among the existing ones - how do you choose?

That's as undefined as the iteration order of #do: or the answer of #anyOne for non-sequenceable collections. If you want to iterate or deduplicate your bag, order it first. :-)

Best, Christoph

David T. Lewis

11:44 p.m.

On Thu, Jun 15, 2023 at 06:38:36PM +0000, Thiede, Christoph wrote:

I think that this discussion is being overshadowed by another issue, the iteration semantics of dictionaries. Because Dictionary>>#do: ignores the keys, there is always a sharp break when applying iteration logic from another collection to dictionaries. Thus, I think we should not discuss Collection>>#withoutDuplicates through the example of dictionaries. But given their existing iteration semantics, I'm with Vanessa on the behavior of Dictionary>>#withoutDuplicates: Keys are only metadata not relevant for iteration, so they should also not influence deduplication.

To summarize what I have read so far in this discussion:

It sounds like a good idea, but

1) There is no known use case for it.
2) It does not solve any problems.
3) It leads to confusion.

Dave

Rein, Patrick

16 Jun

7 a.m.

Sorry for adding to the discussion after your summary Dave, but I am also in favor of moving it up and I disagree that there is no use case. Just imagine the following:

You have some code collecting stuff in an OrderedCollection. The collection is used to generate a list of filter items, which only uses the unique items, so you used #withoutDuplicates. Now, you have to optimize it and realize you only care for the counts and thus use a Bag (not unusual in my experience)...
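
That scenario might look roughly like the following workspace sketch. The variable names and sample symbols are invented, and it assumes, as stated in this thread, that #withoutDuplicates is defined only on SequenceableCollection:

```smalltalk
"Sketch only: the same filter code before and after switching to a Bag."
| collected filterItems counts |

"Before: an OrderedCollection, deduplicated with #withoutDuplicates."
collected := OrderedCollection new.
collected add: #apple; add: #pear; add: #apple.
filterItems := collected withoutDuplicates.   "unique items, original order kept"

"After: a Bag, because only the counts matter."
counts := Bag new.
counts add: #apple; add: #pear; add: #apple.
counts occurrencesOf: #apple.                 "2"

"counts withoutDuplicates would not be understood under the assumption above,
 so the filter code has to change as well:"
filterItems := counts asSet.                  "unique items, now as a Set"
```

Changing the collection class thus forces a second, unrelated change at every place that deduplicates, which is the inconsistency described here.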

Same as with the ExceptionSet discussion, I am in favor of this change (in general, no opinion on the specifics) due to my experience with tutoring several hundred university students starting with Squeak. These situations are what get them frustrated (yes, they get frustrated easily, but some things are more justified than others): protocols that are inconsistent across classes for no special reason and thus require them to learn special ways of doing things. Christoph's struggle in arguing for why there are different ways to deduplicate a collection in Squeak by Example also hints at that usability problem. :)

Cheers, Patrick

David T. Lewis

3:32 p.m.

Patrick,

Thank you for the clarification, and apologies to Christoph for my misunderstanding.

I'll go back to lurking mode now :-)

Dave

Marcel Taeumel

20 Jun 20 Jun

11:58 a.m.

Hi Dave --

...

I'll go back to lurking mode now :-)

No worries =) Feel free to un-lurk any time ;-)

Best, Marcel

On 16.06.2023 15:32:43, David T. Lewis lewis@mail.msen.com wrote:

Patrick,

Thank you for the clarification, and apologies to Christoph for my misunderstanding.

I'll go back to lurking mode now :-)

Dave

Marcel Taeumel

2 Aug 2 Aug

10:51 a.m.

Hi all --

See the attached changeset or (most of it) via Collections-mt.1048 in inbox. Tests are here in the changeset only for now. I found several implementations of Collections that I think should not offer #withoutDuplicates: Bitset, CharacterSet, DependentsArray, Heap, Matrix, WeakRegistry.

Move to Trunk?

Best, Marcel

Tobias Pape

11:06 a.m.

...

On 2. Aug 2023, at 10:51, Marcel Taeumel via Squeak-dev squeak-dev@lists.squeakfoundation.org wrote:

Hi all --

See the attached changeset or (most of it) via Collections-mt.1048 in inbox. Tests are here in the changeset only for now. I found several implementations of Collections that I think should not offer #withoutDuplicates: Bitset, CharacterSet, DependentsArray, Heap, Matrix, WeakRegistry.

Move to Trunk?

I'm curiously ok with that…

One quirk:
Set>>withoutDuplicates should ^self
Set>>copyWithoutDuplicates should ^self copy

Best regards -Tobias

<without-duplicates.1.cs>

Marcel Taeumel

8 Aug 8 Aug

2:37 p.m.

I think that we should not exploit that special case in Set to not answer a copy in #withoutDuplicates ... clients could never know ... better keep it a copy in all cases.
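A small workspace sketch of the difference a client would observe. The two variants stand in for the two proposed answers of Set>>#withoutDuplicates (the receiver itself versus a copy); neither is taken from the changeset, and the variable names are hypothetical:

```smalltalk
| original answeredSelf answeredCopy |

"Variant A (hypothetical): the method answers the receiver itself."
original := Set withAll: #(1 2 3).
answeredSelf := original.
answeredSelf add: 4.
original includes: 4.   "true: modifying the result also changed the receiver"

"Variant B: the method answers a copy, as it does for other collections."
original := Set withAll: #(1 2 3).
answeredCopy := original copy.
answeredCopy add: 4.
original includes: 4.   "false: the receiver is untouched"
```

Code written against a sequenceable collection can safely modify the result; with variant A the same code would silently alias the receiver once a Set is passed in.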

Other thoughts? Move it to Trunk?

Best, Marcel

squeak-dev@lists.squeakfoundation.org

participants (9)

Chris Muller
christoph.thiede＠student.hpi.uni-potsdam.de
David T. Lewis
Marcel Taeumel
Rein, Patrick
Stéphane Rollandin
Thiede, Christoph
Tobias Pape
Vanessa Freudenberg