On Wed, Dec 02, 2009 at 05:28:33PM -0800, John M McIntosh wrote:
So you sit there smug about the fact you built a 64bit VM, likely for hosting on your 64bit Linux OS. {Or the unix one for Darwin, or that newfangled cocoa one}
However it's possible that it's running 1/3 the performance of the 32bit VM. Did you check? Thought not...
So let's talk.
Are you using the gnuified version of interp.c? If you don't know, well, go check. Are you using GCC 4.1 or higher?
The interpreter loop is a highly tuned monster that suffers from compiler optimization issues. With careful tuning parameters, as found in the Macintosh Xcode build project for the carbon VM using gcc 4.0, you'll get the best performance.
GCC 4.2+ ?
Michael Rueger and I spent a few days attempting to get good performance out of GCC 4.2 WITHOUT success. I think that can account for at least a 33% slowdown.
So where does the other 33% slowdown come from?
Well, when we compile the VM in 64bit to use a 32bit image, each reference to an oop requires us to add a 64bit memory start address to the 32bit oop number to resolve to a 64bit memory address. Unfortunately GCC 4.2 growls, and produces the lousiest code possible to do this. Maybe higher versions of GCC are better? Anyone care to test?
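To make the translation step concrete, here is a minimal sketch of the add described above. It is not taken from the VM sources: the type oop32 and the function names oop_to_pointer and pointer_to_oop are made up for illustration, and only the name sqMemoryBase comes from this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: a 32-bit object reference as stored in a 32-bit image. */
typedef uint32_t oop32;

/* Start of the object memory block; a full 64-bit address in a 64-bit VM. */
static char *sqMemoryBase;

/* Every object access pays for this add: 64-bit base + 32-bit oop. */
static inline char *oop_to_pointer(oop32 oop)
{
    return sqMemoryBase + (uintptr_t)oop;
}

/* Inverse mapping; the offset must fit in 32 bits to be a valid oop. */
static inline oop32 pointer_to_oop(const char *p)
{
    ptrdiff_t offset = p - sqMemoryBase;
    assert(offset >= 0 && (uint64_t)offset <= UINT32_MAX);
    return (oop32)offset;
}
```

How well the compiler folds that add into the surrounding addressing code is the part that apparently varies between GCC versions.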
So some solutions.
(a) Ensure the squeak oops memory block loads within the 0-4GB address space. See pagezero size for Darwin. Then alter the logic a bit so that sqMemoryBase is zero and the squeak memory accessors skip the add of sqMemoryBase=0 to the oops address. Even if you have to use GCC 4.2, you'll run 100% faster.
(b) Use the (non-free) Intel compiler
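A rough sketch of what option (a) amounts to, under the assumption that the object memory can be placed entirely below 4GB. The helper names block_fits_below_4gb and oop_to_pointer_zero_base are hypothetical and not from the VM sources; how the block actually gets mapped that low is platform specific and not shown.

```c
#include <stdbool.h>
#include <stdint.h>

#define LOW_4GB_LIMIT ((uint64_t)1 << 32)

/* True if [start, start + size) lies entirely inside the 0-4GB range,
 * so that every address in the block is representable as a 32-bit oop. */
static bool block_fits_below_4gb(uint64_t start, uint64_t size)
{
    return start < LOW_4GB_LIMIT && size <= LOW_4GB_LIMIT - start;
}

/* With the memory base fixed at zero, the oop already is the address:
 * the accessor is a plain widening cast with no add. */
static inline char *oop_to_pointer_zero_base(uint32_t oop)
{
    return (char *)(uintptr_t)oop;
}
```

The check would run once at startup, after the block is mapped; if it fails, the VM would have to fall back to the accessor that adds the base.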
Hi John,
I get very different results, but they certainly support your observation that newer GCC compilers are a problem.
If I compare a 64-bit VM built on my computer to a 32-bit VM downloaded from Ian's site, running both on the same hardware and OS (AMD Turion, 64 bit Linux), the 64-bit VM is running about twice as fast as the 32-bit VM (roughly 2.5x on bytecodes/sec and 1.5x on sends/sec in the tinyBenchmarks runs below).
In the past (over several years), I have never measured this carefully, but I have the general impression that 64-bit and 32-bit VMs run at similar speeds on my hardware and OS (I guess I should figure out how to compile in 32-bit mode so I can really find out).
I would guess that the difference I am seeing now is due to compiler version. Ian's VM was compiled with gcc 4.3.3 and I am using an older gcc 4.1.2 compiler.
For the record, here are the results I got (copied from CommandShell windows in a Squeak trunk image).
For a 64-bit VM that I compiled locally, installed in /usr/local:
$ cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 36
model name      : AMD Turion(tm) 64 Mobile Technology ML-34
stepping        : 2
cpu MHz         : 1600.000
cache size      : 1024 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow up pni lahf_lm
bogomips        : 3203.59
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc

$ cat /proc/version
Linux version 2.6.18.2-34-default (geeko@buildhost) (gcc version 4.1.2 20061115 (prerelease) (SUSE Linux)) #1 SMP Mon Nov 27 11:46:27 UTC 2006
$ /usr/local/bin/squeak -version
SQUEAK_ENCODING=UTF-8
SQUEAK_PATHENC=UTF-8
SQUEAK_PLUGINS=/usr/local/lib/squeak/3.11.9-2145
+ exec /usr/local/lib/squeak/3.11.9-2145/squeakvm -version
3.11.9-2145 #1 XShm Thu Dec 3 10:54:44 EST 2009 gcc 4.1.2
Linux linux-6xfc 2.6.18.2-34-default #1 SMP Mon Nov 27 11:46:27 UTC 2006 x86_64 x86_64 x86_64 GNU/Linux
plugin path: /usr/local/lib/squeak/3.11.9-2145 [default: /usr/local/lib/squeak/3.11.9-2145/]
$ strings /usr/local/lib/squeak/3.11.9-2145/squeakvm | grep gcc
gcc 4.1.2
$ 0 tinyBenchmarks
154031287 bytecodes/sec; 5145368 sends/sec
$ 0 tinyBenchmarks
153201675 bytecodes/sec; 5183202 sends/sec
$ 0 tinyBenchmarks
151658767 bytecodes/sec; 5268426 sends/sec
$
For a 32-bit VM from Ian's site, running the same image from a local directory:
$ cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 36
model name      : AMD Turion(tm) 64 Mobile Technology ML-34
stepping        : 2
cpu MHz         : 1600.000
cache size      : 1024 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow up pni lahf_lm
bogomips        : 3203.59
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc

$ cat /proc/version
Linux version 2.6.18.2-34-default (geeko@buildhost) (gcc version 4.1.2 20061115 (prerelease) (SUSE Linux)) #1 SMP Mon Nov 27 11:46:27 UTC 2006
$ pwd
/home/lewis/squeak/VMM-Ian/Squeak-3.11.3.2135-linux_i386/lib/squeak/3.11.3-2135
$ ls -l squeakvm
-rwxr-xr-x 1 lewis users 2376017 2009-09-16 17:46 squeakvm
$ strings squeakvm | grep gcc
gcc 4.3.3
$ ./squeakvm -version
3.11.3-2135 #1 XShm Wed Sep 16 14:25:10 PDT 2009 gcc 4.3.3
Linux ubuntu 2.6.28-15-generic #49-Ubuntu SMP Tue Aug 18 18:40:08 UTC 2009 i686 GNU/Linux
plugin path: /usr/local/lib/squeak/3.11.9-2145 [default: /home/lewis/squeak/VMM-Ian/Squeak-3.11.3.2135-linux_i386/lib/squeak/3.11.3-2135/]
$ 0 tinyBenchmarks
62135922 bytecodes/sec; 3330746 sends/sec
$ 0 tinyBenchmarks
62256809 bytecodes/sec; 3425013 sends/sec
$ 0 tinyBenchmarks
62317429 bytecodes/sec; 3346096 sends/sec
$
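For anyone who wants the ratio rather than the raw numbers, here is a small sketch that averages the three tinyBenchmarks runs from each session above and divides. The figures are copied from the output above; the helper names mean, speedup, and report_speedups are just for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* tinyBenchmarks figures copied from the two sessions above. */
static const double bytecodes_64[3] = {154031287, 153201675, 151658767};
static const double sends_64[3]     = {5145368, 5183202, 5268426};
static const double bytecodes_32[3] = {62135922, 62256809, 62317429};
static const double sends_32[3]     = {3330746, 3425013, 3346096};

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* Ratio of the mean of one set of runs to the mean of another. */
static double speedup(const double *fast, const double *slow, size_t n)
{
    return mean(fast, n) / mean(slow, n);
}

/* Print the 64-bit (gcc 4.1.2) over 32-bit (gcc 4.3.3) ratios. */
static void report_speedups(void)
{
    printf("bytecodes/sec ratio: %.2f\n", speedup(bytecodes_64, bytecodes_32, 3));
    printf("sends/sec ratio:     %.2f\n", speedup(sends_64, sends_32, 3));
}
```

Calling report_speedups() from main gives about 2.46 for bytecodes/sec and about 1.54 for sends/sec. Three runs per VM on one machine is a small sample, so treat these as rough figures.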
Dave
vm-dev@lists.squeakfoundation.org