Re: [squeak-dev] [Pharo-dev] Why do we have SmallDictionary?

8 Jun 2018


      On 8 Jun 2018, at 9:30, Max Leske wrote:
...
(Sorry, hit send accidentally).
On 8 Jun 2018, at 9:02, Marcus Denker wrote:
...
...
On 8 Jun 2018, at 08:46, Max Leske maxleske@gmail.com wrote:
Hi,
I was messing around with SmallDictionary when I suddenly realised 
that I can't find a single reason to use it over a normal 
Dictionary. While its name and class comment imply that it is 
somehow an optimised Dictionary, I don't see any measurement where 
that actually holds up. The following was run in a Pharo 7 image on 
a recent VM (see below):
| d |
d := SmallDictionary new.
d sizeInMemory. "24"
[100000 timesRepeat: [
   1 to: 100 do: [ :i | d at:i put: i] ] ] timeToRun. "0:00:00:05.226"
[100000 timesRepeat: [
   d at: 48 ifAbsent: [] ] ] timeToRun. "0:00:00:00.041"
| d |
d := Dictionary new.
d sizeInMemory. "16"
[100000 timesRepeat: [
   1 to: 100 do: [ :i | d at:i put: i] ] ] timeToRun. "0:00:00:00.385"
[100000 timesRepeat: [
   d at: 48 ifAbsent: [] ] ] timeToRun.  "0:00:00:00.006"
can you try with <10 elements?
The results are in!
Note that I had to increase the repetitions by a factor or 100.
| d |
d := SmallDictionary new.
[10000000 timesRepeat: [
   1 to: 5 do: [ :i | d at:i put: i] ] ] timeToRun. "0:00:00:02.532"
[10000000 timesRepeat: [
   d at: 4 ifAbsent: [] ] ] timeToRun.  "0:00:00:00.686"
| d |
d := Dictionary new.
[10000000 timesRepeat: [
   1 to: 5 do: [ :i | d at:i put: i] ] ] timeToRun. "0:00:00:02.102"
[10000000 timesRepeat: [
   d at: 4 ifAbsent: [] ] ] timeToRun. "0:00:00:00.567"
So SmallDictionary is indeed very fast for only few elements. Once the 
SmallDictionary has to grow its performance decreases rapidly, or at 
least the grow operation takes a long time.
Dictionary still outperforms SmallDictionary though, so I guess the 
only reason left is space efficency.
...
...
Is anyone aware of a reason for hanging on to SmallDictionary? I'm 
also curious to know how SmallDictionary came to be. There must have 
been some advantage over Dictionary at some point in the past.
It came from RB… the idea was that there are (in the Refactoring 
engine) a lot of very small dictionaries with <10 elements.
The idea is that for such dictionaries, the overhead of hashing was 
higher than just linear search.
I am sure I did a benchmark back then, but of course this could have 
changed in the meantime.
It would be nice to redo the benchmark.
That would be very nice, as I just realised, that the type of object 
used as key has an effect on the hashing time... There are a lot of 
variables here, unfortunately.
Max
...
...
For size: does #sizeInMemory take the internal data into account? e.g 
for Dictionary there is an Array that contains Associations, you need 
to count those, too.
No, I didn't check for that initially. When I did, it became apparent 
that the larger the dictionary the more space you save with 
SmallDictionary.
...
In the end this is a hack anyway: if needed Dictionary could do 
things like this transparently… without the user having to decide. 
And the question is if this is not just a bad side effect of the 
particular implementation.
Good point. Sounds like what Stefan Marr wrote in the paper where he 
performed a field study of collection libraries ;)
Max
...
Marcus