2009/8/7 Andreas Raab andreas.raab@gmx.de:
Eliot Miranda wrote:
The first incarnation of the Cog JIT is complete (for x86 only) and in use at Qwaq. We are gearing up for a new server release and the Cog VM is the Vm beneath it. The next client release will include it also. This VM has a naive code generator (every push or pop in the bytecode results in a push or pop in machine code) but good inline cacheing. Performance is as high as 5x the current interpreter for certain computer-language-shootout benchmarks. The naive code generator means there is poor loop performance (1 to: n do: ... style code can be 4 times slower than VisualWorks) and the object model means there is no machine code instance creation and no machine code at:put: primitive. But send performance is good and block activation almost as fast as VisualWorks. In our real-world experience we were last week able to run almost three times as many Qwaq Forums clients against a QF server running on the Cog VM than we were able to above the interpreters. So the Cog JIT is providing significant speedups in real-world use.
Indeed. Here are some numbers that I took earlier this year:
VM version         bc/sec       sends/sec    Macro1  Macro2   Macro5  Total
Closure (3.11.2)   198,295,894  5,801,773    3124ms  79333ms  9935ms  92411ms
Stack (2.0.10)     178,521,617  8,141,165    2136ms  43081ms  6874ms  52117ms
It was always confusing to me how it is possible to have a higher send rate and a lower bytecode execution rate at the same time. The way tinyBenchmarks calculates these numbers is tricky.
Cog (current)      199,221,789  17,509,420    982ms  29392ms  4053ms  34445ms

Stack vs. Closure  0.9          1.4          1.46    1.84     1.45    1.77
Cog vs. Stack      1.12         2.16         2.17    1.46     1.69    1.51
Cog vs. Closure    1.0          3.0          3.18    2.7      2.45    2.68
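One way to see how a higher send rate can coexist with a lower bytecode rate: as I understand it, tinyBenchmarks times two separate workloads (a bytecode-heavy sieve loop and a send-heavy recursive benchFib) and derives each rate from its own run, so a VM that makes sends much faster need not move the bytecode figure at all. A rough Python sketch of that structure follows; the function names and the per-run bytecode estimate are hypothetical stand-ins, not Squeak's actual constants:

```python
import time

def bytecode_heavy(n):
    # Sieve-style loop, analogous in spirit to Integer>>benchmark:
    # dominated by arithmetic and indexing ("bytecodes"), few sends.
    flags = [True] * n
    count = 0
    for i in range(2, n):
        if flags[i]:
            count += 1
            for j in range(i * i, n, i):
                flags[j] = False
    return count

def send_heavy(n):
    # Recursive fib, analogous in spirit to Integer>>benchFib:
    # the return value equals the number of calls ("sends") made.
    if n < 2:
        return 1
    return send_heavy(n - 1) + send_heavy(n - 2) + 1

def rates(sieve_n=100_000, fib_n=22):
    # Each rate is measured from its own workload, independently.
    t0 = time.perf_counter()
    bytecode_heavy(sieve_n)
    t1 = time.perf_counter()
    sends = send_heavy(fib_n)
    t2 = time.perf_counter()
    bytecodes = sieve_n * 10  # hypothetical fixed per-run cost estimate
    return bytecodes / (t1 - t0), sends / (t2 - t1)
```

Because the two figures come from different code shapes, a JIT with good inline caches can triple sends/sec while barely changing bytecodes/sec, which is exactly the pattern in the table above.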
Overall, Cog comes in at approx. 2.7x faster on the macro benchmarks than what we started from. That's a pretty decent bit of speedup for real-world applications.
Compare this (for example) with j3 [1], which, despite a 6x speedup in microbenchmarks, provided only a 2x speedup in the macros.
[1] http://aspn.activestate.com/ASPN/Mail/Message/squeak-list/2369033:
"Of course, that was 2001. Revisiting the benchmarks is kind of interesting...
Interp: '43805612 bytecodes/sec; 1325959 sends/sec'
J3:     '135665076 bytecodes/sec; 8100691 sends/sec'
Today: (PowerBookG4 1.5GHz), interp:
'114387846 bytecodes/sec; 5152891 sends/sec'
But the microBenchmarks don't tell the whole story: even with a speedup of factor 6 in sends, we only saw the performance doubled on real-world benchmarks (e.g. the MacroBenchmarks)."
Cheers, - Andreas