I don't know if this is of any general interest, but attached is a gprof output listing of a Squeak VM with the memory access routines from sqMemoryAccess.h recoded in Slang.
I now have the Slang inlining working so that it can fully inline all of these methods. With the Slang inlining activated, performance is essentially identical to that of the normal macros in sqMemoryAccess.h. With the Slang inlining turned off, the functions are all called individually, which is what I used to generate the attached profile.
The host is 64-bit Linux AMD. The profile was run by opening a largish image, running "0 tinyBenchmarks" a half dozen times, and exiting the image without saving.
So far, the advantages of putting the memory access routines into the image as Slang seem to be:
- You can step into the methods in a debugger
- The methods can be profiled
- Exposes type declaration problems previously hidden by the macros
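To make the last point concrete, here is a minimal C sketch of the difference between a cast-based macro and a typed function. These are not the actual definitions from sqMemoryAccess.h or the code Slang generates; `sqInt_sketch`, `LONG_AT_MACRO` and `long_at_fn` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the VM's word-sized integer type. */
typedef intptr_t sqInt_sketch;

/* Macro style: the cast accepts a pointer of any type (or even an
 * integer) without complaint, so a wrong declaration at the call
 * site goes unnoticed. There is also no function for a debugger to
 * step into or for gprof to count. */
#define LONG_AT_MACRO(ptr) (*(sqInt_sketch *)(ptr))

/* Function style: the parameter has a declared type, so the compiler
 * can diagnose a caller that passes the wrong kind of argument. When
 * it is not inlined, it shows up as its own entry in a profile. */
static sqInt_sketch long_at_fn(const char *ptr)
{
    sqInt_sketch value;

    assert(ptr != NULL);
    memcpy(&value, ptr, sizeof value);  /* read one word from memory */
    return value;
}
```

With the function form, a call such as `long_at_fn(someInteger)` draws a compiler diagnostic, whereas `LONG_AT_MACRO(someInteger)` is silently cast.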
Is anyone interested in this?
Dave
On Sun, Jul 20, 2008 at 11:49 PM, David T. Lewis lewis@mail.msen.com wrote:
[...]
Yeah, I'm interested in this. I'm always interested in speeding up the interpreter.
What I find interesting is that there's no particular hot spot in the profile. I find that profiling typically reveals a single function taking up 50+% of the runtime, but that isn't the case here. There's a fairly even tail.
Even if we *doubled* the speed of interpret() and pointerForOop(), we'd still only gain a 20% speed-up.
So there isn't any easy improvement, unless I misunderstand the data (which is quite possible!). To gain significant speed-ups, we'd have to make hundreds of micro-optimisations throughout the code-base, which would probably complicate the code (which is pretty clean at the moment).
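The estimate above follows the shape of Amdahl's law. As a rough sketch in C: the profile figures are not reproduced in this message, so the one-third share used in the note below is an assumption picked to match a roughly 20% gain, and `amdahl_speedup` is a hypothetical helper name.

```c
#include <assert.h>

/* Overall speed-up when `fraction` of the total runtime is made
 * `factor` times faster and the rest is left unchanged. */
static double amdahl_speedup(double fraction, double factor)
{
    assert(fraction >= 0.0 && fraction <= 1.0 && factor > 0.0);
    return 1.0 / ((1.0 - fraction) + fraction / factor);
}
```

If two functions together accounted for about a third of the runtime, `amdahl_speedup(1.0 / 3.0, 2.0)` would give roughly 1.2, i.e. about a 20% speed-up from doubling their speed. With a flat profile, every further function contributes an even smaller share.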
Andrew