Andreas Raab wrote:
On 4/14/2010 1:08 PM, Juan Vuletich wrote:
Profiling is indeed your friend. There is some serious inefficiency there. Quickly hacking this (warning: will only work for 1bpp):
[... snip ...]
gives over a 30x speed increase (from 10 seconds down to 310 msec) on my system. This is not a solution, just some food for thought.
Heh, heh. Very good. But now I'm gonna get serious ...
<pokerface on>
I see your 30x improvement and raise you another ... 6x for a total of 200x speedup (from 10 secs to 50 msecs). There! Take that! :-)
(but if Igor pulls out some asm I may have to fold :-)
<pokerface off>
Cheers,
- Andreas
So, let's play!
<pokerface on>
Mh... Hard challenge. Let's see. I take your technique, but keep all your objects in new instance variables, and I go from 67 msecs down to 31 msecs. A 110% speed increase over yours!
Who wins now? :)
<pokerface off>
Actually, my Smalltalk version is not that bad in many situations. For example, on 'plogo.png', taken from http://palmzlib.sourceforge.net/images/dir.html, and evaluating:
    (1 to: 100) collect: [ :i | Smalltalk garbageCollect. [ 1 timesRepeat: [ Form fromFileNamed: 'test.png' ]] timeToRun ]
Yours gives an array with:
    self max -> 117
    self min -> 107
    self average*1.0 -> 108.13
Yoshiki's gives:
    self max -> 118
    self min -> 108
    self average*1.0 -> 110.63
And mine gives:
    self max -> 109
    self min -> 95
    self average*1.0 -> 97.68
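For anyone who wants to repeat the measurement, the three figures can be collected in one go. This is only a sketch: it assumes a Squeak image in which timeToRun answers milliseconds, it assumes the timings are summarised with the standard collection messages max, min and average, and the temporary name times is made up for the example.

```smalltalk
| times |
"Load the same PNG 100 times, forcing a full GC before each timed run
 so that garbage left over from one load is not charged to the next."
times := (1 to: 100) collect: [:i |
    Smalltalk garbageCollect.
    [Form fromFileNamed: 'test.png'] timeToRun].

"Worst, best and mean load time, in milliseconds."
{times max. times min. times average * 1.0}
```

Evaluating this with "print it" shows the three numbers directly, instead of sending each message to the array in an inspector.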
So, the bitblt technique only makes sense if we avoid creating the objects for each scanline, as this is expensive when scanlines are small. This variant gives:
    self max -> 103
    self min -> 91
    self average * 1.0 -> 95.16
The gain is not as big as with test.png, because this image has fewer scanlines. But it is still the winner.
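To make the point about per-scanline objects concrete, here is a minimal sketch of the idea. It is not the code from the snipped message. It assumes Squeak's Form and BitBlt protocol, and the 64x64 size, the temporary names and the gray fill that stands in for decoding a row are all invented for the illustration.

```smalltalk
| dest rowForm blt |
"Hypothetical size; a real reader would take it from the PNG header."
dest := Form extent: 64@64 depth: 32.

"Created once and reused for every scanline, instead of once per row."
rowForm := Form extent: 64@1 depth: 32.
blt := BitBlt toForm: dest.
blt
    sourceForm: rowForm;
    combinationRule: Form over;
    sourceOrigin: 0@0;
    width: 64;
    height: 1.

0 to: 63 do: [:y |
    "Stand-in for decoding one scanline into the row buffer."
    rowForm fillColor: (Color gray: y / 63.0).
    "Only the destination row changes between copies."
    blt destX: 0; destY: y; copyBits].
dest
```

In a reader class the row Form and the BitBlt would live in instance variables rather than temporaries, so that the per-row work is reduced to setting the destination row and calling copyBits.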
Cheers, Juan Vuletich