0 tinyBenchmarks '1,708,090,075 bytecodes/sec; 199,734,789 sends/sec'
[] bench '131,000,000 per second.'
On 29.08.2014, at 05:03, Chris Muller ma.chris.m@gmail.com wrote:
0 tinyBenchmarks '1,708,090,075 bytecodes/sec; 199,734,789 sends/sec'
Nice.
[] bench '131,000,000 per second.'
Hmm, this mostly measures the millisecondClockValue primitive.
How about we replace this
count := 0.
endTime := Time millisecondClockValue + 5000.
startTime := Time millisecondClockValue.
[ Time millisecondClockValue > endTime ]
	whileFalse: [ self value. count := count + 1 ].
endTime := Time millisecondClockValue.
with
count := 0.
repeat := true.
[ (Delay forSeconds: 5) wait. repeat := false ]
	forkAt: Processor activePriority + 1.
startTime := Time millisecondClockValue.
[ self value. count := count + 1. repeat ] whileTrue.
endTime := Time millisecondClockValue.
which on my machine makes it go from
'70,800,000 per second.'
to
'168,000,000 per second.'
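For readers outside Squeak, the difference between the two loop shapes can be sketched in Python (a loose analogue, not Squeak code; the function names and the duration argument are illustrative): the original shape pays a clock read on every iteration, while the proposed shape only tests a plain boolean that a background timer flips.

```python
import threading
import time

def bench_polling(work, seconds=0.2):
    # Original #bench shape: read the clock on every iteration, so each
    # loop pays for the work *plus* a clock call.
    count = 0
    end = time.monotonic() + seconds
    while time.monotonic() <= end:
        work()
        count += 1
    return count

def bench_flag(work, seconds=0.2):
    # Proposed shape: the hot loop only tests a boolean; a timer thread
    # (standing in for the forked Delay) clears it after the interval.
    state = {"repeat": True}
    timer = threading.Timer(seconds, lambda: state.update(repeat=False))
    timer.start()
    count = 0
    while state["repeat"]:
        work()
        count += 1
    timer.join()
    return count
```

How much the flag variant helps depends on how expensive a clock read is relative to the measured work, which is exactly the point being made here.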
- Bert -
Wow, I didn't realize millisecondClockValue had that much of an impact! Yours is definitely less intrusive; we should update #bench.
Do you actually have persistent bench results? That sounds interesting. Please, do tell.
I'm curious how you would be able to keep a consistent baseline with improving hardware, VMs, and even image performance improvements.
On Sun, Aug 31, 2014 at 7:30 PM, Ben Coman btc@openinworld.com wrote:
Would you leave #bench as it is to avoid invalidating comparisons with previous results, and add some kind of #bench2 ? cheers -ben
On 01.09.2014, at 02:30, Ben Coman btc@openInWorld.com wrote:
Would you leave #bench as it is to avoid invalidating comparisons with previous results, and add some kind of #bench2 ? cheers -ben
I wouldn't think that's necessary. #bench itself is supposed to have a negligible impact on the numbers, so keeping it as low as possible seems appropriate.
There is an argument to be made that if this change impacts the numbers, then we're not measuring anything useful anyway. For example, the cost of the block activation is still in there for each iteration, so maybe it's not worth changing after all:
[3+4] bench ==> '150,000,000 per second.'
[1 to: 150000000 do: [:i | 3 + 4]] timeToRun ==> 386
... which suggests that the block activation has an almost 200% overhead in this case. But that is a fallacy in itself:
[1 to: 150000000 do: [:i | 3 + 4. 3 + 4]] timeToRun ==> 373
... which suggests that the iteration has a 3000% overhead. At least in current Cog, whereas an optimizing JIT might reduce the whole thing into a no-op.
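For those following the arithmetic, both percentages can be reconstructed from the three measurements quoted above (a quick Python check; the step-by-step interpretation is my reading of the argument, not text from the thread):

```python
# Numbers quoted above:
bench_rate = 150_000_000          # [3+4] bench: 150M evaluations per second
bench_ms = 150_000_000 / bench_rate * 1000   # so 1000 ms for 150M via #bench
loop_ms = 386                     # explicit 1 to: 150000000 do: loop, same body
loop2_ms = 373                    # same loop with the body doubled (3+4. 3+4)

# Extra cost of #bench's per-iteration machinery relative to the bare loop:
activation_overhead = (bench_ms - loop_ms) / loop_ms
print(round(activation_overhead * 100))   # 159, i.e. "almost 200%"

# Doubling the body barely moves the time, so the body itself costs at most
# about loop_ms - loop2_ms = 13 ms out of 373; nearly all of the remaining
# time is loop overhead, hence the "3000%" figure:
iteration_overhead = loop2_ms / (loop_ms - loop2_ms)
print(round(iteration_overhead * 100))    # 2869, i.e. roughly 3000%
```

The second figure is noise-dominated (13 ms is within run-to-run variation), which is part of why the conclusion is that such micro benchmarks tell us little.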
Yes, micro benchmarks are pretty meaningless.
Optimizing #bench does not make them more meaningful, but since it reduces the measurement error, it might still be worth doing?
- Bert -
Hi guys!
While on the subject of tinyBenchmarks (toying with comparing to LuaJIT2), can someone explain a few things to me:
- Why do we take "500000 / <the-time-to-run-benchmark>" to mean bytecodes/sec? I presume it's because someone made a count at some point that it takes 500000 bytecodes to find those primes? Is that still a correct estimation/presumption?
- Why is benchFib not a correct Fibonacci sequence? The implementation as it stands (seems to have been like this ever since 1998 when John Maloney (?) wrote it - I checked in a Squeak 2.5) is not a correct Fibonacci: #(1 1 3 5 9 15 25 41 67 109 177)
...while correct Fibonacci is (returning self, not 1, and not adding 1 in the recursion): #(0 1 1 2 3 5 8 13 21 34 55)
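Both sequences are easy to reproduce; here is a Python transcription of the two definitions as described above (not the Squeak source itself):

```python
def bench_fib(n):
    # benchFib as described: answers 1 below 2, and adds 1 at every
    # recursive step rather than only at the leaves.
    if n < 2:
        return 1
    return bench_fib(n - 1) + bench_fib(n - 2) + 1

def fib(n):
    # The textbook Fibonacci for comparison: returns n below 2.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print([bench_fib(n) for n in range(11)])
# [1, 1, 3, 5, 9, 15, 25, 41, 67, 109, 177]
print([fib(n) for n in range(11)])
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```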
It almost seems like an odd optimization gone wrong: returning 1 instead of "self" when < 2, then trying to compensate for the fact that "0 benchFib" should actually be 0 by adding 1 to the result, but missing the fact that this will add 1 on every recursive call?
I presume there is something smart going on here - that makes this count "sends" better this way?
And if we just want to count sends - isn't there a better way?
Come on Bert - enlighten me! :)
Curious.
regards, Göran
Hi Göran,
On Sep 1, 2014, at 6:06 AM, Göran Krampe goran@krampe.se wrote:
- Why do we take "500000 / <the-time-to-run-benchmark>" to mean bytecodes/sec? I presume its because someone made a count at some point that it takes 500000 bytecodes to find those primes? Is that still a correct estimation/presumption?
That's right. It's probably still close. One can count the actual number by simulating the expression using (IIRC) run:atEachStep:, which is on the class side of ContextPart.
- Why is benchFib not a correct Fibonacci sequence?
BTW Fibonacci sequences have been generalized. See Lucas numbers on Wikipedia. For example, the classic one is close to 2^N, but one which added the previous three results would be close to 3^N, etc. ("tribonacci").
But the point of benchFib is that it adds one for each and every invocation, whereas the classic one adds one for each leaf activation. Hence benchFib's result is the number of activations required to evaluate it, and dividing the result by the time in seconds taken to compute it gives a rough measure of activations per second. This really should be in the comment.
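That property can be checked mechanically: a benchFib transcribed into Python with an explicit activation counter (the counter harness is my addition, not part of the Squeak method) answers exactly its own activation count.

```python
def bench_fib(n, counter):
    # benchFib as described: 1 below 2, and + 1 at every recursive step.
    counter[0] += 1                 # record one activation of this method
    if n < 2:
        return 1
    return bench_fib(n - 1, counter) + bench_fib(n - 2, counter) + 1

# The result and the activation count satisfy the same recurrence with the
# same base cases, so they coincide for every n:
for n in range(16):
    counter = [0]
    assert bench_fib(n, counter) == counter[0]
```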
Eliot (phone)
Hi Eliot!
On 09/01/2014 03:39 PM, Eliot Miranda wrote:
Hi Göran,
That's right. It's probably still close. One can count the actual number by simulating the expression using (IIRC) run:atEachStep:, which is on the class side of ContextPart.
Ah, good. So I am not entirely stupid. :)
- Why is benchFib not a correct Fibonacci sequence?
But the point of benchFib is that it adds one for each and every invocation, whereas the classic one adds one for each leaf activation. Hence benchFib's result is the number of activations required to evaluate it, and dividing the result by the time in seconds taken to compute it gives a rough measure of activations per second. This really should be in the comment.
Ah, great! Thank you, I knew it must be something "smart" :)
regards, Göran
squeak-dev@lists.squeakfoundation.org