I was doing some work in Squeak 3.0 at home, and found that (for the particular thing I was doing) most of the time was going in String>>hash.
Looking in Squeak 3.2, I see that String>>hash has been replaced by code that calls a primitive. It has made an improvement to the time, no two ways about that.
It hasn't made as much of an improvement as I expected.
Number of instances of String : 37,074 The sum of #size of each: 996,905
Time to hash all of them: 3.648 sec Average time to hash one: 98.5 usec
Time to hash all of them in C: 0.078 sec Average time to hash one in C: 2.1 usec
Measuring the time in Squeak:
s := String allInstances. t0 := [s do: [:each | each]] timeToRun. t1 := [s do: [:each | each hash]] timeToRun. (t1 - t0) * 1.0e-3 "Print It" (t1 - t0) * 1.0e3 / s size "Print It"
Measuring the time in C:
f := StandardFileStream newFileNamed: 'strpool.dat'. s do: [:each | f print: each size; space; nextPutAll: each]. f close.
Then a C program loads the strings, hashes them all 400 times, and writes out the times.
Controlling for hardware:
Squeak and the C program were both run on the same machine.
How do I know the C code was doing the same thing as the Squeak hash code?
String>>hash ^String stringHash: self initialHash: self species hash
String class>>stringHash: aString initialHash: speciesHash |stringSize hash low| <primitive: 'primitiveStringHash' module: 'MiscPrimitivePlugin'> self var: #aHash declareC: 'int speciesHash'. self var: #aString declareC: 'unsigned char *aString'. stringSize := aString size. hash := speciesHash bitAnd: 16rFFFFFFF. 1 to: stringSize do: [:pos | hash := hash + (aString at: pos) asciiValue. "Begin hashMultiply" low := hash bitAnd: 16383. hash := (16r260D * low + ((16r260D * (hash bitShift: -14) + (16r0065 * low) bitAnd: 16383)) * 16384)) bitAnd: 16r0FFFFFFF. ]. ^hash
I turned that into C, that's how. I don't actually understand the rules for Slang; I presume that (aString at: pos) asciiValue turns into aString[pos] (because at any rate that's the number that always used to be used).
So if #stringHash:initialHash: is compiled into C, how come C that does the same thing is running about 47 times faster?
Where is the time going?
What have I misunderstood?