At depth 1, the resulting bits will always be 0.
It's not a big problem, because rgbMul is just a bitAnd operation at this depth.
So a quick workaround would be to detect this case in BitBltSimulation:
destDepth = 1 ifTrue: [^self bitAnd: sourceWord with: destinationWord].
That would also accelerate the BitBlt (bit block transfer) operation, so it's a good hack.
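To illustrate why the shortcut is safe, here is a minimal sketch in Python (not the VMMaker Smalltalk; the function name `rgb_mul_depth1` is made up for this example). With one bit per pixel the scale is 1, so the product of two pixel values is simply their logical AND, and a whole word of pixels can be handled in one operation.

```python
def rgb_mul_depth1(source_word: int, destination_word: int) -> int:
    """Multiply one-bit pixels pairwise; at depth 1 this is a plain AND."""
    return source_word & destination_word

# Per-bit check: the product of two 1-bit values equals their AND.
per_bit_ok = all((s * d) == (s & d) for s in (0, 1) for d in (0, 1))
```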
But there is more. What we want is to multiply ratios in the interval [0,1]:
dstRatio * srcRatio
Our implementation uses scaled ratios (scaled by `1 << nBits - 1`):
src := (srcRatio * scale) rounded.
dst := (dstRatio * scale) rounded.
So what we want is:
((dst/scale) * (src/scale) * scale) rounded
that is:
(dst*src / (1<<nBits-1)) rounded
Unfortunately, that's the other grief with the current implementation used for rounding:
(dst+1)*(src+1) - 1 >> nBits
It only equals correctly rounded operation for depths 2 and 4.
For rounding we might use:
((dst/scale) * (src/scale) * scale + 0.5) truncated.
which, expressed with truncated division, is:
dst*src + (scale // 2) // scale
(scale is odd, so there are no ties, and `scale // 2` is the offset that rounds to nearest.)
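As a sanity check on rounding with truncated division, here is a small Python sketch (my own helper names, not code from BitBlt). It assumes the offset added before the division is `scale // 2`, which is the value that gives round-to-nearest for an odd scale, and compares the integer formula against exact rational arithmetic.

```python
import math
from fractions import Fraction

def round_div(x: int, scale: int) -> int:
    """Round x / scale to nearest using integer arithmetic only (scale is odd)."""
    return (x + scale // 2) // scale

def exact_round(x: int, scale: int) -> int:
    """Reference: floor(x / scale + 1/2) computed on exact rationals."""
    return math.floor(Fraction(x, scale) + Fraction(1, 2))

def check(n_bits: int) -> bool:
    """Compare both roundings for every pair of channel values at n_bits."""
    scale = (1 << n_bits) - 1
    return all(round_div(s * d, scale) == exact_round(s * d, scale)
               for s in range(scale + 1) for d in range(scale + 1))
```

Calling `check(n_bits)` for small channel widths exercises every possible product at that width.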
So here is a nicer formulation that does the job at any depth (including the 5-bit rgb channels of 16-bit depth) with correctly rounded division:
aux := src * dst + (1 << (nBits - 1)). "add mid-scale for rounding"
result := aux >> nBits + aux >> nBits. "divide by scale"
This works because, instead of dividing by scale, we can multiply by its shifted inverse (a sort of double precision) and then shift right:
(2 to: 32) allSatisfy: [:nBits | (1 << (nBits * 2) / (1 << nBits - 1)) rounded = (1 << nBits + 1)].
Multiplying by this inverse is easy and cheap:
x * (1 << nBits + 1) = (x << nBits + x).
And then applying the right shift `>> (2 * nBits)` is equivalent to:
x >> nBits + x >> nBits.
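The equivalence between multiplying by the shifted inverse and the two-shift form can be checked exhaustively for small widths. The sketch below is in Python with explicit parentheses (Smalltalk evaluates binary operators strictly left to right); the function names are mine.

```python
def via_inverse(x: int, n_bits: int) -> int:
    """Multiply by the shifted inverse (1 << n_bits) + 1, then shift right by 2 * n_bits."""
    return (x * ((1 << n_bits) + 1)) >> (2 * n_bits)

def via_two_shifts(x: int, n_bits: int) -> int:
    """Same value computed with two shifts by n_bits and one addition."""
    return ((x >> n_bits) + x) >> n_bits
```

The two-shift form never needs more than `2 * nBits + 1` bits of intermediate storage, whereas the multiplied value would need about `3 * nBits`.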
We must first add 0.5 (scaled), that is `src * dst + (1 << (nBits -1))` - our formulation of aux, and we're done.
We verify:
{
(0 to: 1<<20-1) allSatisfy: [:i | (1<<9+i)>>10+ (1<<9+i)>>10 = (i/1023) rounded].
(0 to: 1<<18-1) allSatisfy: [:i | (1<<8+i)>>9+ (1<<8+i)>>9 = (i/511) rounded].
(0 to: 1<<16-1) allSatisfy: [:i | (1<<7+i)>>8+ (1<<7+i)>>8 = (i/255) rounded].
(0 to: 1<<14-1) allSatisfy: [:i | (1<<6+i)>>7+ (1<<6+i)>>7 = (i/127) rounded].
(0 to: 1<<12-1) allSatisfy: [:i | (1<<5+i)>>6+ (1<<5+i)>>6 = (i/63) rounded].
(0 to: 1<<10-1) allSatisfy: [:i | (1<<4+i)>>5+ (1<<4+i)>>5 = (i/31) rounded].
(0 to: 1<<8-1) allSatisfy: [:i | (1<<3+i)>>4+ (1<<3+i)>>4 = (i/15) rounded].
(0 to: 1<<6-1) allSatisfy: [:i | (1<<2+i)>>3+ (1<<2+i)>>3 = (i/7) rounded].
(0 to: 1<<4-1) allSatisfy: [:i | (1<<1+i)>>2+ (1<<1+i)>>2 = (i/3) rounded].
} allSatisfy: #yourself.
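The same verification can be written against channel pairs rather than raw products. This is a hedged Python transcription of the `aux`/`result` formulation (names `scaled_mul` and `reference_mul` are mine), compared with an exact integer reference for round-to-nearest.

```python
def scaled_mul(src: int, dst: int, n_bits: int) -> int:
    """Rounded src * dst / scale using only shifts and adds for the division."""
    aux = src * dst + (1 << (n_bits - 1))      # add mid-scale for rounding
    return ((aux >> n_bits) + aux) >> n_bits   # divide by scale = (1 << n_bits) - 1

def reference_mul(src: int, dst: int, n_bits: int) -> int:
    """Exact reference: floor(src * dst / scale + 1/2) in integer arithmetic."""
    scale = (1 << n_bits) - 1
    return (2 * src * dst + scale) // (2 * scale)

def agrees(n_bits: int) -> bool:
    """True when both functions agree on every pair of channel values."""
    top = 1 << n_bits
    return all(scaled_mul(s, d, n_bits) == reference_mul(s, d, n_bits)
               for s in range(top) for d in range(top))
```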
The nice thing is that the down-scaling operation above can be multiplexed.
Suppose that we have p groups `M` of 2*nBits each, holding the squared-scale product of each channel, concatenated in a double word `doubleWordMul`:
doubleWordMul = Mp .... M5 M3 M1
Note we arrange to have odd channels in low word, and even channels in high word.
We first form a `groupMask` on a word with (p+1)/2 groups of nBits alternating all one `i` and all zero `o`, `oioi...ioi`:
channelMask := 1 << nBits - 1.
groupMask := 0.
1 to: wordBits // (2 * nBits) do: [:i |
groupMask := groupMask << (2 * nBits) + channelMask].
Here wordBits is the number of bits in a word (usually we want to operate on 32-bit words in BitBlt).
We form the `doubleGroupMask` on a double-word with p groups of 2*nBits `oi`:
doubleGroupMask := groupMask << wordBits + groupMask.
And we perform the division by scale:
doubleWordMul := ((doubleWordMul >> nBits bitAnd: doubleGroupMask) + doubleWordMul) >> nBits bitAnd: doubleGroupMask.
At this stage we obtain a double word containing the scaled products interleaved with groups of nBits zeros:
o mp ... o m3 o m1
Now the final result can be obtained by shifting back:
doubleWordMul >> (wordBits - nBits) + (doubleWordMul bitAnd: groupMask)
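To make the multiplexed down-scaling concrete, here is a Python sketch under my own assumptions: a 32-bit word, a `doubleGroupMask` that repeats `oi` across the whole double word, and a hypothetical helper `pack_products` that lays out already-computed products (each including the mid-scale rounding term) with odd channels in the low word and even channels in the high word. It is an illustration, not the BitBltSimulation code.

```python
WORD_BITS = 32  # assumed word size

def make_masks(n_bits, word_bits=WORD_BITS):
    """Build groupMask (oi...oi on one word) and doubleGroupMask (same on a double word)."""
    channel_mask = (1 << n_bits) - 1
    group_mask = 0
    for _ in range(word_bits // (2 * n_bits)):
        group_mask = (group_mask << (2 * n_bits)) + channel_mask
    double_group_mask = (group_mask << word_bits) + group_mask
    return group_mask, double_group_mask

def pack_products(products, n_bits, word_bits=WORD_BITS):
    """Place M1, M3, ... in the low word and M2, M4, ... in the high word."""
    packed = 0
    for index, m in enumerate(products):  # index 0 holds channel 1
        shift = (index // 2) * 2 * n_bits + (index % 2) * word_bits
        packed |= m << shift
    return packed

def downscale(double_word_mul, n_bits, word_bits=WORD_BITS):
    """Divide every group by scale at once, then fold the double word back into one word."""
    group_mask, double_group_mask = make_masks(n_bits, word_bits)
    high_halves = (double_word_mul >> n_bits) & double_group_mask
    scaled = ((high_halves + double_word_mul) >> n_bits) & double_group_mask
    return (scaled >> (word_bits - n_bits)) + (scaled & group_mask)

# Example with four 8-bit channels, listed from channel 1 (lowest bits) upward.
n_bits = 8
half = 1 << (n_bits - 1)
src_channels = [3, 64, 128, 255]
dst_channels = [200, 100, 255, 255]
products = [s * d + half for s, d in zip(src_channels, dst_channels)]
result = downscale(pack_products(products, n_bits), n_bits)
```

The addition of `high_halves` cannot carry into a neighbouring group, because each product plus its own high half stays below `1 << (2 * n_bits)`.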
The only remaining problem is how to obtain the squared-scale products. It would be easy to form the alternating even-odd channels for each of the src and dst operands:
doubleWordSrc := src >> nBits bitAnd: groupMask.
doubleWordSrc := doubleWordSrc << wordBits + (src bitAnd: groupMask).
doubleWordDst := dst >> nBits bitAnd: groupMask.
doubleWordDst := doubleWordDst << wordBits + (dst bitAnd: groupMask).
We now get `o sp ... o s3 o s1` and `o dp ... o d3 o d1`, but we would now need a SIMD integer multiplication operating on groups of 2*nBits in parallel... We don't have that, at least in portable C code. So we still have to emulate it with a loop:
half := 1 << (nBits - 1).
shift := 0.
doubleWordMul := 0.
0 to: nChannels - 1 do: [:i |
doubleWordMul := doubleWordMul + (((doubleWordSrc >> shift bitAnd: channelMask) * (doubleWordDst >> shift bitAnd: channelMask) + half) << shift).
shift := shift + nBits + nBits].
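Here is the operand splitting and the emulation loop as a Python sketch, assuming a 32-bit word and taking `nChannels` to be `wordBits // nBits` (that value is my assumption; the names `split_channels` and `multiply_groups` are also mine). Each group of 2*nBits ends up holding `src * dst + half` for one channel.

```python
WORD_BITS = 32  # assumed word size

def split_channels(word, n_bits, word_bits=WORD_BITS):
    """Spread channels over a double word: odd channels low, even channels high."""
    channel_mask = (1 << n_bits) - 1
    group_mask = 0
    for _ in range(word_bits // (2 * n_bits)):
        group_mask = (group_mask << (2 * n_bits)) + channel_mask
    high = (word >> n_bits) & group_mask
    return (high << word_bits) + (word & group_mask)

def multiply_groups(src, dst, n_bits, word_bits=WORD_BITS):
    """Emulate a SIMD multiply: one product (plus rounding term) per group of 2 * n_bits."""
    channel_mask = (1 << n_bits) - 1
    half = 1 << (n_bits - 1)
    double_src = split_channels(src, n_bits, word_bits)
    double_dst = split_channels(dst, n_bits, word_bits)
    double_word_mul = 0
    shift = 0
    for _ in range(word_bits // n_bits):  # assumed nChannels
        s = (double_src >> shift) & channel_mask
        d = (double_dst >> shift) & channel_mask
        double_word_mul += (s * d + half) << shift
        shift += 2 * n_bits
    return double_word_mul
```

For a 32-bit word with 8-bit channels, the groups at shifts 0, 16, 32 and 48 hold channels 1, 3, 2 and 4 respectively.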
We know that each operation cannot overflow into the upper neighbouring group of 2*nBits, because the maximum value is:
(1<<nBits-1) squared + (1 << (nBits-1)) = 1 << (2*nBits) - (2*(1<<nBits)) + (1 << (nBits-1)) + 1
< (1 << (2*nBits) - 1)
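The bound is easy to check numerically. The Python sketch below (helper names are mine) computes the largest value a group can hold after the rounding term is added and compares it with the capacity of a group of 2*nBits.

```python
def max_group_value(n_bits: int) -> int:
    """Largest possible src * dst + half for n_bits-wide channels."""
    scale = (1 << n_bits) - 1
    return scale * scale + (1 << (n_bits - 1))

def expanded_form(n_bits: int) -> int:
    """The same maximum written out as powers of two."""
    return (1 << (2 * n_bits)) - 2 * (1 << n_bits) + (1 << (n_bits - 1)) + 1

def fits_in_group(n_bits: int) -> bool:
    """True when the maximum stays below the all-ones value of a 2 * n_bits group."""
    return max_group_value(n_bits) < (1 << (2 * n_bits)) - 1
```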
There remains the odd case of 16-bit depth, which has 3 groups of 5 bits and a leading zero.
I believe that above algorithm works without splitting in two half-words...
To be tested.
We have gathered the pieces for a correctly rounded almost-multiplexed rgbMul.
Somehow have our cake and eat it too.
--
Reply to this email directly or view it on GitHub:
https://github.com/OpenSmalltalk/opensmalltalk-vm/issues/651
Branch: refs/heads/virtend
Home: https://github.com/OpenSmalltalk/opensmalltalk-vm
Commit: 579d9ed8df21afa7c0b724635e30e6c4cf71a9ed
https://github.com/OpenSmalltalk/opensmalltalk-vm/commit/579d9ed8df21afa7c0…
Author: Eliot Miranda <eliot.miranda(a)gmail.com>
Date: 2023-02-26 (Sun, 26 Feb 2023)
Changed paths:
M platforms/win32/plugins/CameraPlugin/winCameraOps.cpp
Log Message:
-----------
Fix a regression in unbuffered modes of the win32 CameraPlugin introduced in
commit d51c1ac377067d3e469c111a15d85d4aaa187393
Author: Eliot Miranda <eliot.miranda(a)gmail.com>
Date: Mon Feb 13 16:53:35 2023 -0800
CogVM source as per VMMaker.oscog-eem.3306
Add support for mirroring a frame in the CameraPlugin.
Unimplemented on unix.
Branch: refs/heads/Cog
Home: https://github.com/OpenSmalltalk/opensmalltalk-vm
Commit: 7fc935470cc93eefb5457348460ff27a31f16a43
https://github.com/OpenSmalltalk/opensmalltalk-vm/commit/7fc935470cc93eefb5…
Author: Eliot Miranda <eliot.miranda(a)gmail.com>
Date: 2023-02-26 (Sun, 26 Feb 2023)
Changed paths:
M platforms/win32/plugins/CameraPlugin/winCameraOps.cpp
Log Message:
-----------
Fix a regression in unbuffered modes of the win32 CameraPlugin introduced in
commit d51c1ac377067d3e469c111a15d85d4aaa187393
Author: Eliot Miranda <eliot.miranda(a)gmail.com>
Date: Mon Feb 13 16:53:35 2023 -0800
CogVM source as per VMMaker.oscog-eem.3306
Add support for mirroring a frame in the CameraPlugin.
Unimplemented on unix.
Branch: refs/heads/virtend
Home: https://github.com/OpenSmalltalk/opensmalltalk-vm
Commit: 06f14e7c3cee25876ba2b672bf144ddecb44dbfd
https://github.com/OpenSmalltalk/opensmalltalk-vm/commit/06f14e7c3cee25876b…
Author: Eliot Miranda <eliot.miranda(a)gmail.com>
Date: 2023-02-23 (Thu, 23 Feb 2023)
Changed paths:
M src/spur32.cog.lowcode/cogit.h
M src/spur32.cog.lowcode/cogitARMv5.c
M src/spur32.cog.lowcode/cogitIA32.c
M src/spur32.cog/cogit.h
M src/spur32.cog/cogitARMv5.c
M src/spur32.cog/cogitIA32.c
M src/spur32.sista/cogit.h
M src/spur32.sista/cogitARMv5.c
M src/spur32.sista/cogitIA32.c
M src/spur64.cog.lowcode/cogit.h
M src/spur64.cog.lowcode/cogitARMv8.c
M src/spur64.cog.lowcode/cogitX64SysV.c
M src/spur64.cog.lowcode/cogitX64WIN64.c
M src/spur64.cog/cogit.h
M src/spur64.cog/cogitARMv8.c
M src/spur64.cog/cogitX64SysV.c
M src/spur64.cog/cogitX64WIN64.c
M src/spur64.sista/cogit.h
M src/spur64.sista/cogitARMv8.c
M src/spur64.sista/cogitX64SysV.c
M src/spur64.sista/cogitX64WIN64.c
M src/v3.cog/cogit.h
M src/v3.cog/cogitARMv5.c
M src/v3.cog/cogitIA32.c
Log Message:
-----------
CogVM source as per VMMaker.oscog-eem.3308
Oops! Fix 32-bit builds by providing the relevant null genPrimitiveHighResClock32
Branch: refs/heads/Cog
Home: https://github.com/OpenSmalltalk/opensmalltalk-vm
Commit: 5cb56813a7eb7585ccc7d7621d1cfa589eeaa523
https://github.com/OpenSmalltalk/opensmalltalk-vm/commit/5cb56813a7eb7585cc…
Author: Eliot Miranda <eliot.miranda(a)gmail.com>
Date: 2023-02-23 (Thu, 23 Feb 2023)
Changed paths:
M src/spur32.cog.lowcode/cogit.h
M src/spur32.cog.lowcode/cogitARMv5.c
M src/spur32.cog.lowcode/cogitIA32.c
M src/spur32.cog/cogit.h
M src/spur32.cog/cogitARMv5.c
M src/spur32.cog/cogitIA32.c
M src/spur32.sista/cogit.h
M src/spur32.sista/cogitARMv5.c
M src/spur32.sista/cogitIA32.c
M src/spur64.cog.lowcode/cogit.h
M src/spur64.cog.lowcode/cogitARMv8.c
M src/spur64.cog.lowcode/cogitX64SysV.c
M src/spur64.cog.lowcode/cogitX64WIN64.c
M src/spur64.cog/cogit.h
M src/spur64.cog/cogitARMv8.c
M src/spur64.cog/cogitX64SysV.c
M src/spur64.cog/cogitX64WIN64.c
M src/spur64.sista/cogit.h
M src/spur64.sista/cogitARMv8.c
M src/spur64.sista/cogitX64SysV.c
M src/spur64.sista/cogitX64WIN64.c
M src/v3.cog/cogit.h
M src/v3.cog/cogitARMv5.c
M src/v3.cog/cogitIA32.c
Log Message:
-----------
CogVM source as per VMMaker.oscog-eem.3308
Oops! Fix 32-bit builds by providing the relevant null genPrimitiveHighResClock32
Branch: refs/heads/virtend
Home: https://github.com/OpenSmalltalk/opensmalltalk-vm
Commit: bc49ac6581ea1621e9820127f6a00328c180a80b
https://github.com/OpenSmalltalk/opensmalltalk-vm/commit/bc49ac6581ea1621e9…
Author: Eliot Miranda <eliot.miranda(a)gmail.com>
Date: 2023-02-21 (Tue, 21 Feb 2023)
Changed paths:
M platforms/iOS/plugins/CameraPlugin/AVFoundationVideoGrabber.m
Log Message:
-----------
Oops! the ErrorCode parameter for the CameraPlugin is not needed/should not
be implemented (on macos).
Commit: 885b07470d762993a290888eabc847354eb16289
https://github.com/OpenSmalltalk/opensmalltalk-vm/commit/885b07470d762993a2…
Author: Eliot Miranda <eliot.miranda(a)gmail.com>
Date: 2023-02-22 (Wed, 22 Feb 2023)
Changed paths:
M src/plugins/UnixOSProcessPlugin/UnixOSProcessPlugin.c
M src/spur32.cog.lowcode/cogit.h
M src/spur32.cog.lowcode/cogitARMv5.c
M src/spur32.cog.lowcode/cogitIA32.c
M src/spur32.cog.lowcode/cointerp.c
M src/spur32.cog.lowcode/cointerp.h
M src/spur32.cog.lowcode/gcc3x-cointerp.c
M src/spur32.cog/cogit.h
M src/spur32.cog/cogitARMv5.c
M src/spur32.cog/cogitIA32.c
M src/spur32.cog/cointerp.c
M src/spur32.cog/cointerp.h
M src/spur32.cog/cointerpmt.c
M src/spur32.cog/cointerpmt.h
M src/spur32.cog/gcc3x-cointerp.c
M src/spur32.cog/gcc3x-cointerpmt.c
M src/spur32.sista/cogit.h
M src/spur32.sista/cogitARMv5.c
M src/spur32.sista/cogitIA32.c
M src/spur32.sista/cointerp.c
M src/spur32.sista/cointerp.h
M src/spur32.sista/gcc3x-cointerp.c
M src/spur32.stack.lowcode/gcc3x-interp.c
M src/spur32.stack.lowcode/interp.c
M src/spur32.stack/gcc3x-interp.c
M src/spur32.stack/interp.c
M src/spur32.stack/validImage.c
M src/spur64.cog.lowcode/cogit.h
M src/spur64.cog.lowcode/cogitARMv8.c
M src/spur64.cog.lowcode/cogitX64SysV.c
M src/spur64.cog.lowcode/cogitX64WIN64.c
M src/spur64.cog.lowcode/cointerp.c
M src/spur64.cog.lowcode/cointerp.h
M src/spur64.cog.lowcode/gcc3x-cointerp.c
M src/spur64.cog/cogit.h
M src/spur64.cog/cogitARMv8.c
M src/spur64.cog/cogitX64SysV.c
M src/spur64.cog/cogitX64WIN64.c
M src/spur64.cog/cointerp.c
M src/spur64.cog/cointerp.h
M src/spur64.cog/cointerpmt.c
M src/spur64.cog/cointerpmt.h
M src/spur64.cog/gcc3x-cointerp.c
M src/spur64.cog/gcc3x-cointerpmt.c
M src/spur64.sista/cogit.h
M src/spur64.sista/cogitARMv8.c
M src/spur64.sista/cogitX64SysV.c
M src/spur64.sista/cogitX64WIN64.c
M src/spur64.sista/cointerp.c
M src/spur64.sista/cointerp.h
M src/spur64.sista/gcc3x-cointerp.c
M src/spur64.stack.lowcode/gcc3x-interp.c
M src/spur64.stack.lowcode/interp.c
M src/spur64.stack/gcc3x-interp.c
M src/spur64.stack/interp.c
M src/spur64.stack/validImage.c
M src/v3.cog/cogit.h
M src/v3.cog/cogitARMv5.c
M src/v3.cog/cogitIA32.c
M src/v3.cog/cointerp.c
M src/v3.cog/cointerp.h
M src/v3.cog/gcc3x-cointerp.c
M src/v3.stack/gcc3x-interp.c
M src/v3.stack/interp.c
Log Message:
-----------
CogVM source as per VMMaker.oscog-eem.3307/VMConstruction-Plugins-OSProcessPlugin.oscog-eem.76
Implement a machine code primitive for primitiveHighResClock on 64-bit platforms.
This is faster than the fast-call interpreter primitive by 15% on x86_64 and by
480% on ARMv8 (Apple M1).
Have compilePrimitive use methodOrBlockNumArgs rather than argumentCountOf: methodObj.
Fix the type of semaIndices. This should be int. unsigned char risks corruption
if there are more than 256 external semaphores, which could be the case in a
very complex application.
Fix a few sends of stackIntegerValue: that were not followed by the obligatory
interpreterProxy failed check. I lost patience/time, but this really should be
done. In all VMs, proceeding after a primitive has failed without checking for
primitive failure can have disastrous effects (arguments are popped off the
stack that shouldn't be).
Compare: https://github.com/OpenSmalltalk/opensmalltalk-vm/compare/9ac04b3eab5d...88…
Branch: refs/heads/Cog
Home: https://github.com/OpenSmalltalk/opensmalltalk-vm
Commit: 75b01dac3ee745bab2867f1c56fb2790d57ceee2
https://github.com/OpenSmalltalk/opensmalltalk-vm/commit/75b01dac3ee745bab2…
Author: Eliot Miranda <eliot.miranda(a)gmail.com>
Date: 2023-02-22 (Wed, 22 Feb 2023)
Changed paths:
M src/plugins/UnixOSProcessPlugin/UnixOSProcessPlugin.c
M src/spur32.cog.lowcode/cogit.h
M src/spur32.cog.lowcode/cogitARMv5.c
M src/spur32.cog.lowcode/cogitIA32.c
M src/spur32.cog.lowcode/cointerp.c
M src/spur32.cog.lowcode/cointerp.h
M src/spur32.cog.lowcode/gcc3x-cointerp.c
M src/spur32.cog/cogit.h
M src/spur32.cog/cogitARMv5.c
M src/spur32.cog/cogitIA32.c
M src/spur32.cog/cointerp.c
M src/spur32.cog/cointerp.h
M src/spur32.cog/cointerpmt.c
M src/spur32.cog/cointerpmt.h
M src/spur32.cog/gcc3x-cointerp.c
M src/spur32.cog/gcc3x-cointerpmt.c
M src/spur32.sista/cogit.h
M src/spur32.sista/cogitARMv5.c
M src/spur32.sista/cogitIA32.c
M src/spur32.sista/cointerp.c
M src/spur32.sista/cointerp.h
M src/spur32.sista/gcc3x-cointerp.c
M src/spur32.stack.lowcode/gcc3x-interp.c
M src/spur32.stack.lowcode/interp.c
M src/spur32.stack/gcc3x-interp.c
M src/spur32.stack/interp.c
M src/spur32.stack/validImage.c
M src/spur64.cog.lowcode/cogit.h
M src/spur64.cog.lowcode/cogitARMv8.c
M src/spur64.cog.lowcode/cogitX64SysV.c
M src/spur64.cog.lowcode/cogitX64WIN64.c
M src/spur64.cog.lowcode/cointerp.c
M src/spur64.cog.lowcode/cointerp.h
M src/spur64.cog.lowcode/gcc3x-cointerp.c
M src/spur64.cog/cogit.h
M src/spur64.cog/cogitARMv8.c
M src/spur64.cog/cogitX64SysV.c
M src/spur64.cog/cogitX64WIN64.c
M src/spur64.cog/cointerp.c
M src/spur64.cog/cointerp.h
M src/spur64.cog/cointerpmt.c
M src/spur64.cog/cointerpmt.h
M src/spur64.cog/gcc3x-cointerp.c
M src/spur64.cog/gcc3x-cointerpmt.c
M src/spur64.sista/cogit.h
M src/spur64.sista/cogitARMv8.c
M src/spur64.sista/cogitX64SysV.c
M src/spur64.sista/cogitX64WIN64.c
M src/spur64.sista/cointerp.c
M src/spur64.sista/cointerp.h
M src/spur64.sista/gcc3x-cointerp.c
M src/spur64.stack.lowcode/gcc3x-interp.c
M src/spur64.stack.lowcode/interp.c
M src/spur64.stack/gcc3x-interp.c
M src/spur64.stack/interp.c
M src/spur64.stack/validImage.c
M src/v3.cog/cogit.h
M src/v3.cog/cogitARMv5.c
M src/v3.cog/cogitIA32.c
M src/v3.cog/cointerp.c
M src/v3.cog/cointerp.h
M src/v3.cog/gcc3x-cointerp.c
M src/v3.stack/gcc3x-interp.c
M src/v3.stack/interp.c
Log Message:
-----------
CogVM source as per VMMaker.oscog-eem.3307/VMConstruction-Plugins-OSProcessPlugin.oscog-eem.76
Implement a machine code primitive for primitiveHighResClock on 64-bit platforms.
This is faster than the fast-call interpreter primitive by 15% on x86_64 and by
480% on ARMv8 (Apple M1).
Have compilePrimitive use methodOrBlockNumArgs rather than argumentCountOf: methodObj.
Fix the type of semaIndices. This should be int. unsigned char risks corruption
if there are more than 256 external semaphores, which could be the case in a
very complex application.
Fix a few sends of stackIntegerValue: that were not followed by the obligatory
interpreterProxy failed check. I lost patience/time, but this really should be
done. In all VMs, proceeding after a primitive has failed without checking for
primitive failure can have disastrous effects (arguments are popped off the
stack that shouldn't be).