On 8 Jun 2018, at 9:30, Max Leske wrote:
(Sorry, hit send accidentally).
On 8 Jun 2018, at 9:02, Marcus Denker wrote:
On 8 Jun 2018, at 08:46, Max Leske maxleske@gmail.com wrote:
Hi,
I was messing around with SmallDictionary when I suddenly realised that I can't find a single reason to use it over a normal Dictionary. While its name and class comment imply that it is somehow an optimised Dictionary, I don't see any measurement where that actually holds up. The following was run in a Pharo 7 image on a recent VM (see below):
| d | d := SmallDictionary new. d sizeInMemory. "24" [100000 timesRepeat: [ 1 to: 100 do: [ :i | d at:i put: i] ] ] timeToRun. "0:00:00:05.226"
[100000 timesRepeat: [ d at: 48 ifAbsent: [] ] ] timeToRun. "0:00:00:00.041"
| d | d := Dictionary new. d sizeInMemory. "16" [100000 timesRepeat: [ 1 to: 100 do: [ :i | d at:i put: i] ] ] timeToRun. "0:00:00:00.385" [100000 timesRepeat: [ d at: 48 ifAbsent: [] ] ] timeToRun. "0:00:00:00.006"
can you try with <10 elements?
The results are in! Note that I had to increase the repetitions by a factor or 100.
| d | d := SmallDictionary new. [10000000 timesRepeat: [ 1 to: 5 do: [ :i | d at:i put: i] ] ] timeToRun. "0:00:00:02.532"
[10000000 timesRepeat: [ d at: 4 ifAbsent: [] ] ] timeToRun. "0:00:00:00.686"
| d | d := Dictionary new. [10000000 timesRepeat: [ 1 to: 5 do: [ :i | d at:i put: i] ] ] timeToRun. "0:00:00:02.102"
[10000000 timesRepeat: [ d at: 4 ifAbsent: [] ] ] timeToRun. "0:00:00:00.567"
So SmallDictionary is indeed very fast for only few elements. Once the SmallDictionary has to grow its performance decreases rapidly, or at least the grow operation takes a long time. Dictionary still outperforms SmallDictionary though, so I guess the only reason left is space efficency.
Is anyone aware of a reason for hanging on to SmallDictionary? I'm also curious to know how SmallDictionary came to be. There must have been some advantage over Dictionary at some point in the past.
It came from RB… the idea was that there are (in the Refactoring engine) a lot of very small dictionaries with <10 elements. The idea is that for such dictionaries, the overhead of hashing was higher than just linear search.
I am sure I did a benchmark back then, but of course this could have changed in the meantime.
It would be nice to redo the benchmark.
That would be very nice, as I just realised, that the type of object used as key has an effect on the hashing time... There are a lot of variables here, unfortunately.
Max
For size: does #sizeInMemory take the internal data into account? e.g for Dictionary there is an Array that contains Associations, you need to count those, too.
No, I didn't check for that initially. When I did, it became apparent that the larger the dictionary the more space you save with SmallDictionary.
In the end this is a hack anyway: if needed Dictionary could do things like this transparently… without the user having to decide. And the question is if this is not just a bad side effect of the particular implementation.
Good point. Sounds like what Stefan Marr wrote in the paper where he performed a field study of collection libraries ;)
Max
Marcus