[ENH][VM] Improved code generation (hopefully ;)

7 Jul 2003

From:   "Andreas Raab" < andreas.raab@g... >
Hi Guys,
I was always suspicious about the way CCodeGenerator handled #interpret
with respect to temps (e.g., inlining all temps into interpret and
randomly renaming them t1 ... tN) as it completely spoils life-time
analysis for the C compiler (which has to assume that temps may be read
in other code branches and may even "optimize" them into wasting
unneeded registers across code branches).
At first I was using
Change Set:		CGeneratorEnhancements-ajh
Date:			12 February 2002
Author:			Anthony Hannan (ajh18@cornell.edu)
which localized the variables in interpret(), but your change set is a
cleaner solution.
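A minimal sketch of the difference being discussed, in C. This is a
hypothetical toy dispatch loop, not output from CCodeGenerator; the
function names, the opcodes and the temps t1/t2 are made up for
illustration. The first version declares every temp at function scope,
the second declares each temp inside the branch that uses it, so its
scope ends with that branch. How much this helps depends on the
compiler.

```c
/* Hypothetical toy interpreter loop; not generated VM code. */

/* All temps hoisted to function scope, as with t1 ... tN. */
int run_function_scoped(const int *code, int n) {
    int t1, t2;   /* visible in every branch below */
    int acc = 0;
    for (int i = 0; i < n; i++) {
        switch (code[i]) {
        case 0: t1 = acc + 1; acc = t1 * 2; break;
        case 1: t2 = acc - 3; acc = t2;     break;
        default: break;
        }
    }
    return acc;
}

/* Same logic, but each temp is local to the branch that uses it. */
int run_block_scoped(const int *code, int n) {
    int acc = 0;
    for (int i = 0; i < n; i++) {
        switch (code[i]) {
        case 0: { int t = acc + 1; acc = t * 2; break; }
        case 1: { int t = acc - 3; acc = t;     break; }
        default: break;
        }
    }
    return acc;
}
```

Both functions compute the same result; only the declared scope of the
temps differs.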
I downloaded and set up a new image with SM & loaded the latest VMMaker
(or so I think/thought/believe).
I ran into some issues between the version of VMMaker you used and the
current one.
Tim and you can sort out what's happening.
TMethod lost an instance variable, globalStructureBuildMethodHasFoo,
and your change set overwrote a change in
TMethod>>setSelector: args: locals: block: primitive:
For these two I'm unsure who's at fault:
a) Interpreter lost the class variable BlockMethodIndex
b) and the method isUnwindMarked: is missing
{Isn't that the block closure stuff?}
Also, the two variables in interpret(), localReturnContext &
localReturnValue, end up with no declaration.
Well, because I have been using the Hannan changeset in earlier work,
since 3.2.7b1, the difference is too small/difficult to measure.
For the GCC flavor I don't think there was any difference in the code
size. (The entire VM was 40 bytes smaller, but I was missing the
isUnwindMarked: method, so I think that accounts for the 40 bytes.)
For CodeWarrior on OS 9 there was a 46 byte difference for the
interpret() function, but any improvement is lost in measurement noise.
In the past, the reason I used the Hannan changeset was that
CodeWarrior obviously just gave up doing any useful local variable
analysis, stuck the first couple of vars into registers, and was
stupid...
The Hannan changeset also greatly improved how the 68K version worked
with GCC on OpenBSD 3.x.
From a note of mine to the list on April 9th, 2002 talking about this:
...
on a 68k BSD box with GCC the new numbers are {Hannan changeset }
1,614,205 bytecodes/sec and 57,652 sends/sec
versus my previous one using the jumptable modification
1,550,387 bytecodes/sec and 55,080 sends/sec
versus what I started with
1,439,884 bytecodes/sec and 51,098 sends/sec
So yes the change is good.
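For reference, the relative gains implied by the figures above can be
worked out with a few lines of C. The helper name percent_gain is
hypothetical; the inputs would be the bytecodes/sec and sends/sec
numbers quoted above.

```c
#include <assert.h>

/* Percentage improvement of `after` relative to `before`.
   Hypothetical helper, only meant for the benchmark figures above. */
double percent_gain(double before, double after) {
    assert(before > 0.0);   /* a zero baseline has no meaningful gain */
    return (after - before) / before * 100.0;
}
```

For example, percent_gain(1439884, 1614205) compares the starting point
with the Hannan changeset on bytecodes/sec. With the numbers above, the
changeset comes out roughly 12 to 13% ahead of the starting point and
roughly 4 to 5% ahead of the jumptable build, on both measures.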
----------
PS Another topic: in my measurements of the macrobenchmark I see
55.9% is interpret()
4.5% is sweepPhase
5.0% is markPhase
3.0% is UpdatePointers (spelling?)
0.9% is incCompMove
Thus roughly 10% lurks in the mark/sweep phases of the GC.
Fiddling with ObjectMemory>>startField can be measured in the
tinybenchmarks.
I'm considering checking for type 0 first, then type = 2, and otherwise
it's a SmallInteger. That becomes a load with set condition, a branch
on condition, a check against 2 and a branch on condition. This
improves the macrobenchmark by 2%, but degrades the tinybenchmark
because of the integers it creates. Mmm, a case statement might be
useful here...
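A rough C sketch of the two shapes of that test, assuming the type is
the low two bits of the word and that any value other than 0 or 2 means
a SmallInteger, as described above. The names TAG_MASK, word_kind,
classify_if_chain and classify_switch are hypothetical; this is not the
generated startField code.

```c
#include <stdint.h>

/* Hypothetical names; only the tag values 0 and 2 come from the text. */
enum { TAG_MASK = 3 };

typedef enum { KIND_TYPE0, KIND_TYPE2, KIND_SMALL_INTEGER } word_kind;

/* Test order proposed above: type 0 first, then 2, else SmallInteger. */
static word_kind classify_if_chain(uintptr_t word) {
    uintptr_t tag = word & TAG_MASK;
    if (tag == 0) return KIND_TYPE0;
    if (tag == 2) return KIND_TYPE2;
    return KIND_SMALL_INTEGER;   /* tags 1 and 3: low bit set */
}

/* The same decision written as a case statement. */
static word_kind classify_switch(uintptr_t word) {
    switch (word & TAG_MASK) {
    case 0:  return KIND_TYPE0;
    case 2:  return KIND_TYPE2;
    default: return KIND_SMALL_INTEGER;
    }
}
```

Whether the compiler turns the switch into a jump table or into the
same compare-and-branch sequence is up to the compiler, so either form
would need to be measured.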
--
===========================================================================
John M. McIntosh johnmci@smalltalkconsulting.com 1-800-477-2659
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================