Hi.
Colors are implemented by keeping the RGB components into 10 bits each, and all of them fit in a SmallInteger called rgb. The hash of colors is ^rgb. The instance variable rgb stores all what the color is, in terms of the RGB space model. So, rgb is pretty much like this:
[ 10 bits red | 10 bits green | 10 bits blue ] [sign] [SmallInteger mark bit]
And so is the hash of colors. Let's put a ton (let's say 1024 to make round numbers) of colors in a Dictionary as keys. When stored there, the keys are put inside an Array and the index at which they will be is determined from the hash value of such objects. It's pretty like looking up in a real dictionary by saying "well I'm looking for a word that starts with A, I won't binary search the whole dictionary, just a subsection of the whole A section".
To determine such indexes, Dictionaries get the size of the key Array and then take the remainder of dividing the hash of the key by that size. Then what happens?
Colors are just hashed by their red component.
And that's bad. Example.
When the nearest color from an array of colors to a given color is computer, the result gets cached because once it's determined it won't change and the computation is quite hard on the cpu cycles. The operation that takes most time to complete, when rendering colors from a Form to another one with a different color map, is precisely to access the cache with at:, because the cache becomes a Dictionary. About 20 something % of the time is spent doing #at:ifAbsent:. Here are the leaves of a tally on the whole process, worth 4m 16s 547ms.
24.7 Dictionary>>scanFor: 7.0 Color>>distanceTo: 6.4 BitBlt>>pixelAt: 6.0 BitBlt>>pixelAt:put: 5.4 Form class>>extent:depth: 5.3 BitBlt class>>destForm:sourceForm:hal...rigin:extent:clipRect: 5.0 Form>>setExtent:depth: 4.5 Rectangle class>>origin:extent: 2.5 Color class>>r:g:b:range: 2.1 ColorReducer>>closestColorIndexFrom:to:
In this case, the dictionary in question has 630 keys and 256 different value colors. So, to make 512x384 = 196608 accesses to a dictionary takes roughly a minute and change, which gives around 3200 accesses per second. I wonder how well a sorted collection of colors will perform (sorting by hash which would be lexicographical order). In this case Dictionaries perform slowly since there are a lot of colors that are very close to each other and so have similar red components and so hash collisions.
Then, what to do? It says that Color was implemented that way on purpose to save space. Is it ok to compute a slower hash? What slower hash? IMHO, it should be representative of the color in question more evenly through the bits of the hash.
What, if anything, should be done with the hash?
Andres.
Andres Valloud wrote:
To determine such indexes, Dictionaries get the size of the key Array and then take the remainder of dividing the hash of the key by that size. Then what happens?
Colors are just hashed by their red component.
And that's bad.
One might interleave the bits acccording to an rgbrgbrgbrgbrgbrgbrgbrgbrgbrgb sign SmallInteger_mark scheme; colors that are near in the RGB space will then also be near numerically.
The same principle applies when one wants to index according to multiple dimensions, e.g. in a geographical database.
Bit interleaving is an expensive operation. To improve speed without losing all interleaving one could, for example, interleave less bits in a blockwise manner: rrrrggggbbbbrrrrggggbbbbrrggbb sign SmallInteger_mark
If "nearness" of colors in RBG space doesn't matter (such as in hashing), we just move the most significant bits of the Blue and Green parts of the color into the most significant bits of the number, e.g. by XORing them into the most significant bit of the total SmallInteger object, like:
rrrrrrrr rrgggggg ggggbbbb bbbbbbsS | | | XOR -------+ | | | XOR ----------------+ | Hashcode result
This is not perfect (the Green and Blue bits are XORed into the high byte at a lower significance than one would like), but the hash key quality should be significantly better than when hashing just by the red pixels.
Regards, Joachim
squeak-dev@lists.squeakfoundation.org