Hello everyone,
When I run a Squeak image on the Raspberry Pi, the UI is much faster with the default VM shipped on the Raspberry Pi than with a VM compiled from the Cog or Pharo branch.
I heard that this is because Tim Rowledge changed the BitBlt implementation in the Pi VM / Pi image, reimplementing it image-side rather than VM-side, resulting in a faster BitBlt.
I have a few questions:
- Is it true?
- Is Tim Rowledge's BitBlt code open source? If so, where is it, and what exactly is its license?
- Would it make sense to port that, or do something similar, to the Intel VM? Would we see a performance gain or loss?
I am asking because the last time I discussed this with Bert, he said it would be fun to have a Smalltalk-implemented BitBlt combined with a JIT compiler doing automatic vectorization, in order to have vector graphics implemented as bit-based graphics. The more I think about it, the more I think this makes sense. Even if it's in the far future (I'm stabilizing inlining and SmallInteger range optimizations in the JIT first), I would like to go in that direction.
Thanks for any answer.
Clement
Hi Clément, it sounds like somebody explained things to you very badly.
On 05-02-2015, at 6:02 AM, Clément Bera bera.clement@gmail.com wrote:
Hello everyone,
When I run a Squeak image on the Raspberry Pi, the UI is much faster with the default VM shipped on the Raspberry Pi than with a VM compiled from the Cog or Pharo branch.
Depending on what exact version of Raspbian you have on your Pi, the default VM may well be a Cog/Stack vm. The most recent releases since (I think) mid-December have nuScratch and stackvm as the defaults. The older plain interpreter is also there in case we find problems.
If VMs you build are any slower than the default one, you have a problem in your build setup. I don’t do anything clever that isn’t already in the repository, and don’t know enough about makefile stuff to be *able* to do anything very clever.
I heard that this is because Tim Rowledge changed the BitBlt implementation in the Pi VM / Pi image, reimplementing it image-side rather than VM-side, resulting in a faster BitBlt.
Goodness me, that needs explaining. Firstly, I don’t get anywhere near all the credit. I did the specification and integration into the BitBltPlugin but the really clever stuff was done by Ben Avison over in Cambridge - the real Cambridge in the UK. It’s mostly *very* cleverly written ARM assembler with the interface done by perfectly normal Slang code in the plugin. I did do a JitBlt self-compiling ARM blitter 25 or so years ago but that was for monochrome screens (because that was all we had then) and ARM3 level cpus where there was no complicated futzing with nasty unix memory gibberish.
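For readers unfamiliar with what a blitter's inner loop actually does, here is a deliberately simplified C sketch; all names are invented, and the real BitBltPlugin additionally handles word alignment, clipping, depth conversion, halftones, and dozens of combination rules (it is this loop that Ben's hand-written ARM assembler replaces):

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical sketch of the innermost loop of a 32 bpp BitBlt:
   combine one scan line of source pixels into the destination
   according to a combination rule. Real implementations also handle
   alignment, partial words, clipping, and many more rules. */
enum CombinationRule { RULE_OVER, RULE_AND, RULE_XOR };

static void blitLine(uint32_t *dst, const uint32_t *src,
                     size_t nPixels, enum CombinationRule rule)
{
    for (size_t i = 0; i < nPixels; i++) {
        switch (rule) {
        case RULE_OVER: dst[i] = src[i];          break; /* source copy */
        case RULE_AND:  dst[i] = src[i] & dst[i]; break;
        case RULE_XOR:  dst[i] = src[i] ^ dst[i]; break;
        }
    }
}
```

In the plugin the rule dispatch is hoisted out of the loop (one specialized loop per rule), which is part of why hand-tuned assembler versions pay off.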
I have questions:
- Is it true?
- Is the BitBlt code of Tim Rowledge open source? If so, where is it and what exactly is its license?
It’s not merely open source, it’s *in the VM code repository* and has been for 18 months. If you build a stack VM on ARM Linux it gets included by default. Or at least, it should, though to be honest the autoconf/make stuff is sufficiently confusing that I’d never guarantee it will produce anything. There are also a couple of bitblt extensions to speed up pixel value testing and pixel-touches-pixel testing for sprite collisions.
- Would it make sense to port that, or do something similar, to the Intel VM? Would we see a performance gain or loss?
I suspect it wouldn’t be worth the effort on a full desktop machine with fast memory buses and vast caches. You could certainly consider improving the algorithms in some parts of bitblt, but I’m not sure it would really result in much faster blits. There may be opportunities to use the media-related instructions that can do sort-of parallel processing on 32/16/8 bpp data (that’s effectively what Ben did for ARM v6 and might re-do for v7 with NEON later, if I’m very lucky).
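The flavor of trick those media instructions perform can be imitated in plain C with "SWAR" (SIMD within a register). A hedged sketch, and not Ben's actual code: blend two rows of 8 bpp pixels four at a time by averaging packed bytes inside one 32-bit word, without ever unpacking them.

```c
#include <stdint.h>

/* SWAR byte average: (a & b) keeps the bits common to both inputs;
   (a ^ b) >> 1 halves the differing bits, masked with 0x7F7F7F7F so
   that carries cannot leak across the four byte lanes. The result is
   the per-byte floor average, computed for all four pixels at once. */
static uint32_t averageBytes(uint32_t a, uint32_t b)
{
    return (a & b) + (((a ^ b) >> 1) & 0x7F7F7F7Fu);
}
```

Real media/NEON instructions do the same lane-wise arithmetic in hardware, on wider registers and with saturation variants.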
I am asking because the last time I discussed this with Bert, he said it would be fun to have a Smalltalk-implemented BitBlt combined with a JIT compiler doing automatic vectorization, in order to have vector graphics implemented as bit-based graphics.
I must be misunderstanding. When discussing vectorization and bitblt one would normally be referring to using the parallel instructions I mentioned above.
Doing vector graphic operations is a quite different thing and I’d suggest a much more interesting project for most people. Having a Canvas class that can use vector graphics libraries such as Cairo could be a massive speed up in rendering the UI. Obviously the Smalltalk code would need to be written to be able to make use of it, but I think quite a lot is already in place. You only need to see some of the videos from VPRI showing the Nile graphics work to see how interesting it could be.
Implementing all the clever vector graphics stuff in terms of bitblts would be doable (of course), and some parts already exist… but I think it much better to hook up to the ferocious GPUs we have available these days. The hardest and probably slowest part is that a lot of them seem to want to output only directly to a screen, which rather gets in the way and requires strange configurations and copying bitmaps back to ‘our’ space to do more work. Clearly, we need a custom Squeak GPU. Who will offer me US$100m to fund the development? Anyone?
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Useful random insult:- Not all his dogs are barking.
Hello Tim,
Thanks for your answers. Yeah, I definitely misunderstood what you did.
I thought my VM builds got slower on the Pi because they lacked the fast BitBlt you did with Ben Avison. Apparently that is not the reason, so I'll check why; it's probably another issue (I'm very busy right now, but I'll look into it in a few weeks).
I'm glad you contributed your code to the main branch. Anyone improving the Cog should do that, so we share the improvements and it gets better. I look at your commits on the CogARMCompiler from time to time, and I can't wait to have an ARM JIT too.
About vector graphics, I know many people are using a Cairo binding made by Igor in the Pharo community, and they're quite happy with it. A guy (Ronie Salgado) also implemented some OpenCL support for the GPU, but there are not that many users.
In any case, I was just wondering whether a good JIT compiler could translate bit-based graphics to vector-based graphics. But based on my previous discussion with Eliot and on your work, it looks like it won't happen out of the box; it requires quite some work.
Best.
On 05-02-2015, at 12:43 PM, Clément Bera bera.clement@gmail.com wrote:
Hello Tim,
Thanks for your answers. Yeah I definitely had misunderstood what you did.
That’s ok. I generally misunderstand what I did, too.
I thought my VM builds got slower on the Pi because they lacked the fast BitBlt you did with Ben Avison. Apparently that is not the reason, so I'll check why; it's probably another issue (I'm very busy right now, but I'll look into it in a few weeks).
A possible problem is the gcc optimiser doing weird stuff. I’m not a fan.
I'm glad you contributed your code to the main branch. Anyone improving the Cog should do that so we share the improvements and it'll get better. I'm looking from time to time to your commits on the CogARMCompiler and I can't wait to have an ARM JIT too.
Soon, young man, soon.
About vector graphics, I know many people are using a Cairo binding made by Igor in the Pharo community, and they're quite happy with it. A guy (Ronie Salgado) also implemented some OpenCL support for the GPU, but there are not that many users.
I use a very simple connection to Cairo/Pango in nuScratch on the Pi to render text, since it does mostly the right thing for NAAWIUT[1] script. It works pretty well and is a lot simpler than trying to do the whole job in bitblt…
In any case, I was just wondering if a good JIT compiler could translate Bit based graphics to vector based graphics.
Ah, now that sounds like the opposite of what I thought you said originally. Taking a bitblt and converting it to calls to a vector lib (like Cairo?) would be interesting for some important cases, but I can’t help thinking that doing the job at the higher level makes a lot more sense. An easy case would be a bitblt that was about to fill a rectangular area with a simple pattern and a simple combination rule; sure, we could trap that and convert it to a GrungoLibv2.1a call to drawboxThingExtended(left, bottom, width, height-1, borderwidth, &pattern, &clut[x*3]), but why not make a Canvas that goes more directly? Canvases - for all that they frequently confuse and exasperate me when I use them - are a good way of abstracting out the intent of a drawing operation.
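The "trap the easy case" idea above could look something like this sketch; the struct layout and field names are invented for illustration, but rule 3 really is the classic BitBlt store-source ("over") rule:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical peephole check run before the general BitBlt loop:
   recognize "fill a rectangle with a solid color, rule = store source"
   and route it to a vector library's fill primitive instead of the
   per-pixel loop. Field names here are invented, not the plugin's. */
typedef struct {
    const uint32_t *sourceForm;  /* NULL when filling with a solid color */
    uint32_t fillColor;          /* the solid fill value */
    int combinationRule;         /* 3 == store source in classic BitBlt */
    int x, y, width, height;
} BltArgs;

static bool isSolidFill(const BltArgs *b)
{
    return b->sourceForm == NULL && b->combinationRule == 3;
}

static int dispatchBlt(const BltArgs *b)
{
    if (isSolidFill(b))
        return 1;   /* would call e.g. a Cairo rectangle + fill here */
    return 0;       /* fall back to the general pixel loop */
}
```

As Tim says, a Canvas that expresses the intent directly avoids needing this kind of pattern recovery at all.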
Actually this is possibly a good place to point to Gezira/Nile http://www.vpri.org/vp_wiki/index.php/Gezira
I want that on my Pi. A Pi 2 has 4 cores. Making use of them to render like that would be just lovely.
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Who is General Failure and why is he reading my disk?
I have a very cursory understanding of the architecture direction of some of these, but here goes... I searched around for Cairo on RasPi and found not much. The in-image interface to Cairo is Athens, which was designed to have changeable back ends. I see SDL is on RasPi, and I thought Ronie was at some point working on SDL for Pharo (I think more focused on I/O and sound than graphics)? Maybe a later step would be Athens using SDL as its back end, to accelerate things with OpenGL? cheers -ben
Hi Clément,
You should check Cuis and Morphic 3. Cuis is a lightweight fork of Squeak, originally motivated by the desire for a major simplification and refactoring of Morphic. This has evolved to the point where each morph defines its own local coordinate system, all coordinates are potentially Floats (allowing for subpixel-precision graphics), and the canvas protocol has evolved to be closer to vector graphics.
Standard Cuis is still running BitBlt. The other half of the Morphic 3 project is a new vector graphics engine fully written in Smalltalk / Slang, which has been made into a plugin a few times. This engine can do subpixel-precision vector graphics with subpixel anti-aliasing of very high quality. It is based on prefiltering, not postfiltering, so it bears no relation to any patented technique. There's no specific rasterizer for text: the general vector graphics rasterizer is so good that it draws unhinted text better than most dedicated text libraries.
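The core idea of prefiltering is that a pixel's value comes from the exact area a shape covers, not from point sampling. A toy one-dimensional version of that coverage computation (only an illustration of the principle, not Juan's engine):

```c
/* Coverage-based anti-aliasing in one dimension: for a horizontal
   span [x0, x1) given in subpixel coordinates, compute the fraction
   of integer pixel px that the span covers (0.0 .. 1.0). A 2D
   rasterizer accumulates such exact-area coverage per pixel instead
   of supersampling. */
static double pixelCoverage(int px, double x0, double x1)
{
    double left  = px > x0 ? (double)px : x0;          /* clip span to pixel */
    double right = (px + 1) < x1 ? (double)(px + 1) : x1;
    return right > left ? right - left : 0.0;
}
```

A span from 0.25 to 2.5 thus paints pixel 0 at 75% intensity, pixel 1 fully, and pixel 2 at 50%, which is what produces smooth unhinted edges.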
This is the way to go for Vector Graphics in Smalltalk.
Regards, Juan Vuletich
References:
- http://www.cuis-smalltalk.org
- https://dl.dropboxusercontent.com/u/13285702/Morphic3-TimesNewRomanSample.pn... (compare with http://blog.typekit.com/2013/05/01/adobe-contributes-cff-rasterizer-to-freet... )
- http://jvuletich.org/pipermail/cuis_jvuletich.org/attachments/20140915/68e86...
- http://jvuletich.org/pipermail/cuis_jvuletich.org/2014-September/001692.html
- http://www.defensivepublications.org/publications/prefiltering-antialiasing-...
- http://www.jvuletich.org/Morphic3/Morphic3-201006.html
- http://www.jvuletich.org/Morphic3/Morphic3-200911.html
- https://dl.dropboxusercontent.com/u/13285702/Morphic3-Demo-2014-11-14.zip
- https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev/tree/master/Experimenta...
On 05.02.2015, at 15:02, Clément Bera bera.clement@gmail.com wrote:
I am asking because the last time I discussed this with Bert, he said it would be fun to have a Smalltalk-implemented BitBlt combined with a JIT compiler doing automatic vectorization, in order to have vector graphics implemented as bit-based graphics. The more I think about it, the more I think this makes sense. Even if it's in the far future (I'm stabilizing inlining and SmallInteger range optimizations in the JIT first), I would like to go in that direction.
I'm not exactly sure what we talked about (IIRC there was very nice beer involved) but with "vectorization" I guess I did not mean vector graphics, but using CPU vector instructions, which operate on multiple machine words at once.
About running the Smalltalk Slang code directly instead of transpiling the BitBlt plugin to C: that is what Lars Wassermann and Tim Felgentreff are doing in the RSqueak VM, and quite successfully; it's certainly fast enough to be usable.
- Bert -
On 09-02-2015, at 9:44 AM, Bert Freudenberg bert@freudenbergs.de wrote:
I'm not exactly sure what we talked about (IIRC there was very nice beer involved) but with "vectorization" I guess I did not mean vector graphics, but using CPU vector instructions, which operate on multiple machine words at once.
That potentially makes sense; I did a very simple jitblt for the original ARM desktop machines and the Active Book waaaaaay back, 1990 or so. Given the very much more sophisticated translator stuff we now have access to, I could see it being both easier to deal with and more useful. There are quite a few instruction sets one would have to investigate though. There are probably four different sets of simd/vector/media extensions just in the ARM world. In x86 land there may be an uncountable infinity by now.
The BenBlt code we are using on the Pi is pretty much a cached set of compiled-to-ARM-simd cases. It certainly improves things, so it can be considered a proof by example of the value of the concept.
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Useful random insult:- A paperless office is about as likely as a paperless bathroom.