On Sun, Feb 23, 2014 at 08:45:16AM -0800, Eliot Miranda wrote:
Hi David,
On Feb 23, 2014, at 8:22 AM, "David T. Lewis" lewis@mail.msen.com wrote:
On Sun, Feb 09, 2014 at 10:23:37AM -0800, tim Rowledge wrote:
On 09-02-2014, at 10:07 AM, David T. Lewis lewis@mail.msen.com wrote:
I think someone mentioned it earlier, but a very easy way to produce an intentionally slow VM is to generate the sources from VMMaker with the inlining step disabled. The slang inliner is extremely effective, and turning it off produces impressively sluggish results.
Does that actually work these days? Last I remember was that turning inlining off wouldn't produce a buildable interp.c file. If someone has had the patience to make it work then I'm impressed.
You're right about one thing, it required a lot of patience ;-)
I did manage to get it working though, and the results are in VMMaker-dtl.342.
This turned out to be a useful exercise, as I flushed out a couple of type declaration bugs along the way.
The major issue was that the refactoring of object memory and interpreter into separate class hierarchies (which is a very good thing IMHO) requires the use of accessor methods, and this leads to name conflicts in the generated code if those accessor methods are not fully inlined.
I went with the approach of naming the accessors getFoo and setFoo: as well as, for the case of array access, fooAt: and fooAt:put:. This is not very pleasing from a readability point of view, but it is simple and it works.
If I compile a VM with inlining disabled and compiler optimization turned off, the result is about 1/8th the speed of the same interpreter VM built normally.
But more to the point, what's the speed with the same level of optimization as the normal VM?
I did not test this very carefully, but I saw this:
Normal interpreter VM:
0 tinyBenchmarks. '906194690 bytecodes/sec; 25262862 sends/sec'
0 tinyBenchmarks. '905393457 bytecodes/sec; 25413364 sends/sec'
0 tinyBenchmarks. '906997342 bytecodes/sec; 25786444 sends/sec'

No slang inlining, normal gcc optimization:
0 tinyBenchmarks. '452696728 bytecodes/sec; 15353518 sends/sec'
0 tinyBenchmarks. '459192825 bytecodes/sec; 15759973 sends/sec'
0 tinyBenchmarks. '458370635 bytecodes/sec; 15639770 sends/sec'

No slang inlining, no gcc optimization:
0 tinyBenchmarks. '205457463 bytecodes/sec; 7075541 sends/sec'
0 tinyBenchmarks. '206451612 bytecodes/sec; 7182476 sends/sec'
0 tinyBenchmarks. '206952303 bytecodes/sec; 7218843 sends/sec'
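For reference, the slowdown factors implied by these figures (a rough back-of-the-envelope computation using the first bytecode run of each configuration):

```python
# Bytecodes/sec, first run of each configuration reported above.
normal    = 906_194_690   # normal interpreter VM
no_inline = 452_696_728   # no Slang inlining, normal gcc optimization
no_opt    = 205_457_463   # no Slang inlining, no gcc optimization

print(round(normal / no_inline, 1))  # slowdown from disabling Slang inlining alone
print(round(normal / no_opt, 1))     # slowdown with inlining and gcc optimization both off
```

So disabling Slang inlining alone roughly halves bytecode speed, and additionally disabling gcc optimization brings the total to about a 4.4x slowdown.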
This is less of a difference than I expected for turning off the slang inlining. Either the gcc optimization has gotten better, or my memory has gotten worse, because I thought I remembered getting a bigger difference the last time I tried this (a long time ago).
I could slow the VM down quite a bit more if I use the MemoryAccess package. By itself, MemoryAccess will have no performance impact, but if you turn off slang inlining it should slow things down considerably. Perhaps that is what I am remembering from the earlier test. Unfortunately some bit rot has set in on MemoryAccess, so I'll have to fix that before I can confirm.
and does this affect the internalFoo inlining? Does this VM have everything that uses localSP & localIP inlined in interpret, or are localSP & localIP no longer local to interpret?
There is no inlining in the interpret() loop, and the gnuification step is skipped. I believe that the localSP and localIP usage is unaffected, so yes, they would still be local to interpret().
Dave