On Wed, 24 Mar 2010, Igor Stasenko wrote:
On a board's meeting, Chris mentioned my implementation, and he says that it outperforms the LargeIdentitySet. I'm not sure how Chris compared it, because i implemented this scheme only for dictionaries , not sets.
It's really hard to believe that a dictionary can be faster than LargeIdentitySet (I even doubt that a set could be faster), so I'm waiting for the numbers.
As you pointed out it's unfair to compare sets to dictionaries, so I wrote LargeIdentityDictionary (http://leves.web.elte.hu/LargeIdentityDictionary/LargeIdentityDictionary.st ) which works just like LargeIdentitySet, but it also stores the values besides the keys. I modified Chris' benchmark just like you suggested: store nil as the value (http://leves.web.elte.hu/LargeIdentityDictionary/LargeIdentityDictionaryBenc... ). The benchmark compares LargeIdentityDictionary to IdentityDictionary. It uses two diffent lookup methods (#at: and #includesKey:). I ran the benchmarks and copied the results to a single file (http://leves.web.elte.hu/LargeIdentityDictionary/LargeIdentityDictionaryBenc... ), but I'm sure the nice charts are more interesting, so here they are: http://leves.web.elte.hu/LargeIdentityDictionary/LargeIdentityDictionary.png http://leves.web.elte.hu/LargeIdentityDictionary/LargeIdentityDictionary2.pn...
The first chart shows that IdentityDictionary has a big spike at 970000, probably because of clustering (we need even better primes). So the second chart is the same as the first, but it only shows the 0-5000ms range. What we can see is #at: and #includesKey: have similar performance for IdentityDictionary (both use #scanFor:). LargeIdentityDictionary >> #at: is ~3.5-5x faster than IdentityDictionary >> #at:. And what's more interresing is that #includesKey: is so fast that it seems to be totally flat (though it's not). That's because #includesKey: can use the #pointsTo: primitive, but #at: can't, because it has to return the value for the key. So if a dictionary is faster than LargeIdentitySet, than it's graph has to be even flatter than #includesKey:'s graph (it's really hard to believe that such dictionary exists).
I wonder how LargeIdentityDictionary compares to your dictionaries'.
Levente
P.S. LargeIdentityDictionary is hardly tested and doesn't implement every "public" method that IdentityDictionary does.
If you interested in details, you can check it in Magma. Brief description: i using a linked list for storing associations which keys having same hash value so, in addition to key/value pair there is next ivar. And dictionary's array entries look like this: ... ... ... e1 -> e2 -> e3 ... ...
where e1,e2,e3 is associations , which having same hash value, but different keys. So, when it doing lookup for a key, say e3, initially by doing hash lookup it finds e1, and then since #= for e1 answers false, it looks for the next linked association, until e3 is found.
If i remember, the last time i benched this implementation, it almost same in speed as an original implementation for small sizes, and outperforms on a large sizes, especially, when you got more than 4096 elements.
So, Chris, can you give us details how you compared the speed between LargeIdentitySet and my dictionaries? I don't think its correct to compare them, because dictionaries need to handle more data than sets, and we should expect some slowdown because of it. But sure, you can imitate the Set behavior with dicts, by simply ignoring the value field of association (set it to nil), and use only keys, i.e. set add: object is like: dict at: object put: nil.
-- Best regards, Igor Stasenko AKA sig.