I was doing some work in Squeak 3.0 at home, and found that (for the particular thing I was doing) most of the time was going in String>>hash.
Looking in Squeak 3.2, I see that String>>hash has been replaced by code that calls a primitive. It has made an improvement to the time, no two ways about that.
It hasn't made as much of an improvement as I expected.
Number of instances of String:   37,074
Sum of #size over all of them:   996,905
Time to hash all of them:        3.648 sec
Average time to hash one:        98.5 usec
Time to hash all of them in C:   0.078 sec
Average time to hash one in C:   2.1 usec
Measuring the time in Squeak:
    s := String allInstances.
    t0 := [s do: [:each | each]] timeToRun.
    t1 := [s do: [:each | each hash]] timeToRun.
    (t1 - t0) * 1.0e-3 "Print It"
    (t1 - t0) * 1.0e3 / s size "Print It"
Measuring the time in C:
    f := StandardFileStream newFileNamed: 'strpool.dat'.
    s do: [:each | f print: each size; space; nextPutAll: each].
    f close.
Then a C program loads the strings, hashes them all 400 times, and writes out the times.
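The C program itself is not shown in the message. As a rough sketch only, the records written by the dump above ("<decimal size><space><size raw bytes>", back to back with no terminator) could be walked and timed along these lines. Every name here (next_record, seconds_per_pass, byte_sum, hash_fn) is hypothetical, the file is assumed to have already been read into memory, and byte_sum is a placeholder rather than the Squeak hash.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef unsigned (*hash_fn)(const unsigned char *s, size_t n);

/* Placeholder hash (plain byte sum) so the driver can be exercised;
   it is NOT the Squeak hash. */
unsigned byte_sum(const unsigned char *s, size_t n)
{
    unsigned h = 0;
    for (size_t i = 0; i < n; i++)
        h += s[i];
    return h;
}

/* Parse one record at *cursor. On success, point *str at the string
   bytes, store their count in *len, advance *cursor, and return 1.
   Return 0 at end of data or on a malformed record.
   Sketch only: the size field is not checked for overflow. */
int next_record(const unsigned char **cursor, const unsigned char *end,
                const unsigned char **str, size_t *len)
{
    const unsigned char *p = *cursor;
    size_t n = 0;

    if (p >= end || *p < '0' || *p > '9')
        return 0;
    while (p < end && *p >= '0' && *p <= '9')
        n = n * 10 + (size_t)(*p++ - '0');
    if (p >= end || *p != ' ')
        return 0;
    p++;                               /* exactly one separator space */
    if ((size_t)(end - p) < n)
        return 0;
    *str = p;
    *len = n;
    *cursor = p + n;
    return 1;
}

/* Hash every record `repeats` times and return the average CPU seconds
   per pass. Parsing cost is included in the measurement; indexing the
   records once up front would remove it. */
double seconds_per_pass(const unsigned char *buf, size_t size, int repeats,
                        hash_fn hash, size_t *count, unsigned *sink)
{
    const unsigned char *end = buf + size;
    clock_t start;

    assert(repeats > 0);
    start = clock();
    for (int r = 0; r < repeats; r++) {
        const unsigned char *cur = buf, *str;
        size_t len;
        *count = 0;
        while (next_record(&cur, end, &str, &len)) {
            *sink += hash(str, len);   /* keep the result live */
            (*count)++;
        }
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC / repeats;
}
```

Accumulating into sink is there only to stop an optimizing compiler from discarding the hash calls. Dividing the returned value by the record count gives a per-string figure comparable to the Squeak one.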
Controlling for hardware:
Squeak and the C program were both run on the same machine.
How do I know the C code was doing the same thing as the Squeak hash code?
    String>>hash
        ^String stringHash: self initialHash: self species hash

    String class>>stringHash: aString initialHash: speciesHash
        |stringSize hash low|
        <primitive: 'primitiveStringHash' module: 'MiscPrimitivePlugin'>
        self var: #aHash declareC: 'int speciesHash'.
        self var: #aString declareC: 'unsigned char *aString'.
        stringSize := aString size.
        hash := speciesHash bitAnd: 16rFFFFFFF.
        1 to: stringSize do: [:pos |
            hash := hash + (aString at: pos) asciiValue.
            "Begin hashMultiply"
            low := hash bitAnd: 16383.
            hash := (16r260D * low + ((16r260D * (hash bitShift: -14) + (16r0065 * low) bitAnd: 16383) * 16384)) bitAnd: 16r0FFFFFFF.
        ].
        ^hash
I turned that into C, that's how. I don't actually understand the rules for Slang; I presume that (aString at: pos) asciiValue turns into aString[pos] (because at any rate that's the number that always used to be used).
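The C translation is not included in the message. One possible rendering of the Smalltalk loop, offered as a sketch and not as the Slang-generated code, is below. It assumes 0-based indexing into an unsigned char buffer and unsigned 32-bit arithmetic; both function names are hypothetical. The second function relies on the fact that 16r0065 * 16384 + 16r260D = 1664525, so the 14-bit split is a multiplication by 1664525 kept to 28 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Literal translation of the 14-bit split used in the Smalltalk loop. */
uint32_t string_hash_split(const unsigned char *s, size_t n,
                           uint32_t species_hash)
{
    uint32_t hash = species_hash & 0x0FFFFFFFu;

    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {   /* 0-based; Smalltalk at: is 1-based */
        uint32_t low;
        hash += s[i];
        /* Begin hashMultiply */
        low = hash & 16383u;
        hash = (0x260Du * low
                + (((0x260Du * (hash >> 14) + 0x0065u * low) & 16383u)
                   * 16384u))
               & 0x0FFFFFFFu;
    }
    return hash;
}

/* Same result with a single multiply: unsigned overflow wraps modulo
   2^32, and the mask keeps only the low 28 bits. */
uint32_t string_hash_direct(const unsigned char *s, size_t n,
                            uint32_t species_hash)
{
    uint32_t hash = species_hash & 0x0FFFFFFFu;

    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        hash = ((hash + s[i]) * 1664525u) & 0x0FFFFFFFu;
    return hash;
}
```

Comparing the two functions over the dumped strings is a cheap way to check a hand translation before trusting any timing taken with it.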
So if #stringHash:initialHash: is compiled into C, how come C that does the same thing is running about 47 times faster?
Where is the time going?
What have I misunderstood?
"Richard A. O'Keefe" ok@cs.otago.ac.nz is claimed by the authorities to have written:
> How do I know the C code was doing the same thing as the Squeak hash code?
[snip]
> What have I misunderstood?
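First likely possibility is that your copy of MiscPrimitivePlugin doesn't have the prim, in which case the primitive call fails and the Smalltalk code in the method body runs instead. Do you have an absolutely up to date vm?
Easy check: comment out the <prim.....> line, recompile and check times again. If the times don't change, the primitive wasn't being used in the first place.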
You could of course use VMMakerTool (see instructions on the swiki) to regenerate the MiscPrimitivePlugin and check the C code to see what is going on. Time is also spent on the fetching and saving of results to the stack and it might possibly swamp the time to do the hash work. Shouldn't, but I've seen sillier situations.
tim
squeak-dev@lists.squeakfoundation.org