2009/8/7 Andreas Raab andreas.raab@gmx.de:
Eliot Miranda wrote:
The first incarnation of the Cog JIT is complete (for x86 only) and in use at Qwaq. We are gearing up for a new server release and the Cog VM is the Vm beneath it. The next client release will include it also. This VM has a naive code generator (every push or pop in the bytecode results in a push or pop in machine code) but good inline cacheing. Performance is as high as 5x the current interpreter for certain computer-language-shootout benchmarks. The naive code generator means there is poor loop performance (1 to: n do: ... style code can be 4 times slower than VisualWorks) and the object model means there is no machine code instance creation and no machine code at:put: primitive. But send performance is good and block activation almost as fast as VisualWorks. In our real-world experience we were last week able to run almost three times as many Qwaq Forums clients against a QF server running on the Cog VM than we were able to above the interpreters. So the Cog JIT is providing significant speedups in real-world use.
Indeed. Here are some numbers that I took earlier this year:
VM version         bc/sec       sends/sec    Macro1  Macro2   Macro5  Total
Closure (3.11.2)   198,295,894  5,801,773    3124ms  79333ms  9935ms  92411ms
Stack (2.0.10)     178,521,617  8,141,165    2136ms  43081ms  6874ms  52117ms
It was always confusing to me how it is possible to have a higher send rate and a lower bytecode execution rate at the same time. The way tinyBenchmarks calculates these numbers is tricky.
Cog (current)      199,221,789  17,509,420    982ms  29392ms  4053ms  34445ms

Stack vs. Closure  0.9          1.4          1.46    1.84     1.45    1.77
Cog vs. Stack      1.12         2.16         2.17    1.46     1.69    1.51
Cog vs. Closure    1.0          3.0          3.18    2.7      2.45    2.68
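One way to see how a higher send rate can coexist with a lower bytecode rate: as I understand it, tinyBenchmarks times two separate workloads (a bytecode-heavy sieve loop and a send-heavy recursive benchFib) and derives each rate from its own run, so a VM that makes sends much faster need not move the bytecode figure at all. A rough Python sketch of that structure follows; the function names and the per-run bytecode estimate are hypothetical stand-ins, not Squeak's actual constants:

```python
import time

def bytecode_heavy(n):
    # Sieve-style loop, analogous in spirit to Integer>>benchmark:
    # dominated by arithmetic and indexing ("bytecodes"), few sends.
    flags = [True] * n
    count = 0
    for i in range(2, n):
        if flags[i]:
            count += 1
            for j in range(i * i, n, i):
                flags[j] = False
    return count

def send_heavy(n):
    # Recursive fib, analogous in spirit to Integer>>benchFib:
    # the return value equals the number of calls ("sends") made.
    if n < 2:
        return 1
    return send_heavy(n - 1) + send_heavy(n - 2) + 1

def rates(sieve_n=100_000, fib_n=22):
    # Each rate is measured from its own workload, independently.
    t0 = time.perf_counter()
    bytecode_heavy(sieve_n)
    t1 = time.perf_counter()
    sends = send_heavy(fib_n)
    t2 = time.perf_counter()
    bytecodes = sieve_n * 10  # hypothetical fixed per-run cost estimate
    return bytecodes / (t1 - t0), sends / (t2 - t1)
```

Because the two figures come from different code shapes, a JIT with good inline caches can triple sends/sec while barely changing bytecodes/sec, which is exactly the pattern in the table above.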
Overall, Cog comes in at approx. 2.7x faster on the macro benchmarks than what we started from. That's a pretty decent bit of speedup for real-world applications.
Compare this (for example) with j3 [1], which, despite a 6x speedup in microbenchmarks, provided only a 2x speedup in the macros.
[1] http://aspn.activestate.com/ASPN/Mail/Message/squeak-list/2369033:
"Of course, that was 2001. Revisiting the benchmarks is kind of interesting...
Interp: '43805612 bytecodes/sec; 1325959 sends/sec'
J3:     '135665076 bytecodes/sec; 8100691 sends/sec'
Today: (PowerBookG4 1.5GHz), interp:
'114387846 bytecodes/sec; 5152891 sends/sec'
But the microBenchmarks don't tell the whole story: even with a speedup of factor 6 in sends, we only saw the performance doubled on real-world benchmarks (e.g. the MacroBenchmarks)."
Cheers, - Andreas