Hi,
I've pushed a change to the BitBlt simulation code to the Inbox (VMMaker-tfel.358), because I didn't know where else to put it. With these changes, we are able to run a current 4.5 image with VMMaker loaded on our RSqueakVM, with BitBlt entirely run from within the image.
The goal is to have the VM run as many plugins as possible from pure Smalltalk, so there will be more slight changes and maybe the odd performance improvement for the simulation forthcoming. Is this something that would be ok with everyone?
Here's the diff:
BitBltSimulation>>loadColorMap: (changed)

  loadColorMap
      "ColorMap, if not nil, must be longWords, and 2^N long, where N = sourceDepth for 1, 2, 4, 8 bits, or N = 9, 12, or 15 (3, 4, 5 bits per color) for 16 or 32 bits."
      | cmSize oldStyle oop cmOop |
      <inline: true>
      cmFlags := cmMask := cmBitsPerColor := 0.
      cmShiftTable := nil.
      cmMaskTable := nil.
      cmLookupTable := nil.
      cmOop := interpreterProxy fetchPointer: BBColorMapIndex ofObject: bitBltOop.
      cmOop = interpreterProxy nilObject ifTrue: [^ true].
      cmFlags := ColorMapPresent. "even if identity or somesuch - may be cleared later"
      oldStyle := false.
      (interpreterProxy isWords: cmOop)
          ifTrue: ["This is an old-style color map (indexed only, with implicit RGBA conversion)"
              cmSize := interpreterProxy slotSizeOf: cmOop.
              cmLookupTable := interpreterProxy firstIndexableField: cmOop.
-             oldStyle := true.
-             self
-                 cCode: ''
-                 inSmalltalk: [self assert: cmLookupTable unitSize = 4]]
+             oldStyle := true]
          ifFalse: ["A new-style color map (fully qualified)"
              ((interpreterProxy isPointers: cmOop)
                      and: [(interpreterProxy slotSizeOf: cmOop) >= 3])
                  ifFalse: [^ false].
              cmShiftTable := self loadColorMapShiftOrMaskFrom: (interpreterProxy fetchPointer: 0 ofObject: cmOop).
              cmMaskTable := self loadColorMapShiftOrMaskFrom: (interpreterProxy fetchPointer: 1 ofObject: cmOop).
              oop := interpreterProxy fetchPointer: 2 ofObject: cmOop.
              oop = interpreterProxy nilObject
                  ifTrue: [cmSize := 0]
                  ifFalse: [(interpreterProxy isWords: oop) ifFalse: [^ false].
                      cmSize := interpreterProxy slotSizeOf: oop.
                      cmLookupTable := interpreterProxy firstIndexableField: oop].
              cmFlags := cmFlags bitOr: ColorMapNewStyle.
              self
                  cCode: ''
                  inSmalltalk: [self assert: cmShiftTable unitSize = 4.
                      self assert: cmMaskTable unitSize = 4.
                      self assert: cmLookupTable unitSize = 4]].
      (cmSize bitAnd: cmSize - 1) = 0 ifFalse: [^ false].
      cmMask := cmSize - 1.
      cmBitsPerColor := 0.
      cmSize = 512 ifTrue: [cmBitsPerColor := 3].
      cmSize = 4096 ifTrue: [cmBitsPerColor := 4].
      cmSize = 32768 ifTrue: [cmBitsPerColor := 5].
      cmSize = 0
          ifTrue: [cmLookupTable := nil. cmMask := 0]
          ifFalse: [cmFlags := cmFlags bitOr: ColorMapIndexedPart].
      oldStyle ifTrue: ["needs implicit conversion" self setupColorMasks].
      "Check if colorMap is just identity mapping for RGBA parts"
      (self isIdentityMap: cmShiftTable with: cmMaskTable)
          ifTrue: [cmMaskTable := nil. cmShiftTable := nil]
          ifFalse: [cmFlags := cmFlags bitOr: ColorMapFixedPart].
      ^ true

BitBltSimulator>>halftoneAt: (added)

+ halftoneAt: idx
+     ^ halftoneBase + (idx \\ halftoneHeight * 4) long32At: 0
-- View this message in context: http://forum.world.st/VMMaker-tfel-358-in-Inbox-Fixes-for-BitBlt-simulation-... Sent from the Squeak VM mailing list archive at Nabble.com.
Hi Tim,
On Thu, Feb 12, 2015 at 8:55 AM, timfelgentreff timfelgentreff@gmail.com wrote:
Hi,
I've pushed a change to the BitBlt simulation code to the Inbox (VMMaker-tfel.358), because I didn't know where else to put it. With these changes, we are able to run a current 4.5 image with VMMaker loaded on our RSqueakVM, with BitBlt entirely run from within the image.
The goal is to have the VM run as many plugins as possible from pure Smalltalk, so there will be more slight changes and maybe the odd performance improvement for the simulation forthcoming. Is this something that would be ok with everyone?
It's certainly good for me; thanks. Since you're looking at BitBlt code, let me try to rope you into a problem I'm having with 64-bit Spur. Right now a number of tests (e.g. those for ShortIntegerArray) fail on 64-bit Spur because of the byte-swapping of bits data. The swapping is done with BitBlt; see ShortIntegerArray>>restoreEndianness. Apart from the fact that this is an absurd way to do things (*), it should work, and right now it doesn't. Would you be interested in taking a look and trying to figure out why? If so, you'll need a 64-bit Linux for the real VM, and I'll put together a simulator image and a 64-bit test image for you to play with.
(*) More generally: a) using 6 BitBlt invocations instead of a single byte-reversal primitive is, um, diplomatically, a waste of cycles; but more seriously, b) we're paying for needless byte reversals to keep things in big-endian format. Little-endian has essentially won, with most ARM deployments being little-endian and x86 & x64 being little-endian. Shouldn't we be looking to eliminate all this unnecessary overhead? It's in image segment load/store and sound processing, and it's unnecessary.
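To make point (a) concrete, here is a minimal workspace sketch of what a single-pass byte reversal over 16-bit elements computes. It is plain Smalltalk for illustration only: it is neither the code in ShortIntegerArray>>restoreEndianness nor an existing VM primitive, and the sample bytes are made up.

```smalltalk
"Hypothetical sketch: swap the two bytes of every 16-bit element in one pass.
 Not ShortIntegerArray>>restoreEndianness, and not an existing primitive."
| bytes |
bytes := #(16r12 16r34 16rAB 16rCD) asByteArray.
1 to: bytes size - 1 by: 2 do: [:i | | first |
	first := bytes at: i.
	bytes at: i put: (bytes at: i + 1).
	bytes at: i + 1 put: first].
bytes "each adjacent pair is now exchanged: 16r34 16r12 16rCD 16rAB"
```

One linear pass touches every byte exactly once, which is the work a dedicated primitive would be expected to do.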
I'm interested in getting 64-bit to run properly for the RSqueakVM as well. I'll see if I can make time to investigate.
I've pushed another update to the BitBltSimulator to the inbox as VMMaker-tfel.359. This makes initialiseModule be called only once when we're simulating an entire image, which makes the simulation of various BitBlt operations around 200x faster for me on Cog, and around 500x faster on RSqueakVM. This is only in the Simulator class, so it won't affect the plugin. Can someone take a look and, if it's ok, move it to the VMMaker repository?
Hi Tim,
On Wed, Mar 11, 2015 at 10:47 AM, timfelgentreff timfelgentreff@gmail.com wrote:
I've pushed another update to the BitBltSimulator to the inbox as VMMaker-tfel.359. This makes initialiseModule be called only once when we're simulating an entire image, which makes the simulation of various BitBlt operations around 200x faster for me on Cog, and around 500x faster on RSqueakVM. This is only in the Simulator class, so it won't affect the plugin. Can someone take a look and, if it's ok, move it to the VMMaker repository?
Thanks! How does initialiseModule get called so often? I don't see how this happens in the Cog simulator. AFAICT initialiseModule gets called once when the plugin is loaded. What am I missing?
Also, could you explain the changes in VMMaker-tfel.358? What was the bug?
Great to have you on board!
Hi Eliot,
The bugs in 358 were:
For loadColorMap:, the assertion was simply failing. Since no assertion is generated in C, Tobias and I figured it may simply not be needed, and everything seemed to work without it.
And halftoneAt: was simply missing. BitBltSimulator overrides dstLongAt: and srcLongAt:, but didn't include an override for halftoneAt:, so we ran into a debugger when trying to simulate. Adding this method fixes that.
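For readers unfamiliar with the indexing in halftoneAt:, the following workspace sketch shows only the offset arithmetic, idx \\ halftoneHeight * 4, which wraps the row index around the halftone height and scales it to a byte offset of 4-byte words. The height of 4 and the name offsets are assumptions made for the example; the real method adds the result to halftoneBase and reads a 32-bit word there.

```smalltalk
"Hypothetical sketch of the offset arithmetic only; halftoneHeight := 4 is an assumed value."
| halftoneHeight offsets |
halftoneHeight := 4.
"Binary operators evaluate left to right, so this is (idx \\ halftoneHeight) * 4."
offsets := (0 to: 5) collect: [:idx | idx \\ halftoneHeight * 4].
offsets "rows 4 and 5 wrap back to the offsets of rows 0 and 1"
```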
Regarding how initialiseModule was called so often: we're using the code in the plugins a little unconventionally. We have a VM without the BitBlt plugin, and when the named primitive comes up, we instead dispatch to BitBlt>>copyBitsSimulated and then simulate _only_ the BitBlt part, not the entire image. But that entails creating a new InterpreterProxy and initialising it from the current context, and thus also creating a new instance of the BitBltSimulator. That's how initialiseModule ends up being called often. 359 doesn't remove those calls; it just caches those constant tables on the class side.
I've just pushed VMMaker-tfel.360 to the inbox, which adds methods so we can do the same with Balloon (BalloonEngine gains a #simulateBalloonPrimitive:args:). The idea is that we can run the VM without the BitBlt and Balloon plugins and just run the Slang code (on RSqueakVM with the changes from 359, we get about 50% of the BitBlt performance running the simulation compared to the C plugin).
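The caching idea can be sketched roughly as follows. This is a workspace illustration under stated assumptions, not the code from VMMaker-tfel.359: a Dictionary stands in for class-side state, the block tableFor stands in for the table setup done during initialiseModule, and the table contents are made up.

```smalltalk
"Hypothetical sketch: build a constant table once and reuse it on later requests.
 classSideCache, tableFor and #exampleTable are illustrative names only."
| classSideCache buildCount tableFor first second |
classSideCache := Dictionary new.
buildCount := 0.
tableFor := [:name |
	classSideCache
		at: name
		ifAbsentPut: [
			buildCount := buildCount + 1.
			(0 to: 31) collect: [:i | 1 bitShift: i]]].
first := tableFor value: #exampleTable.   "first request builds the table"
second := tableFor value: #exampleTable.  "later requests reuse the cached one"
buildCount "counts how many times the table was actually built"
```

Each new simulator instance still asks for its tables, but only the first request pays for building them.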
vm-dev@lists.squeakfoundation.org