I noticed this a few times, and today I observed it a bit: After resuming my Win10 message from sleep, my two open Squeak.exe processes both made up each ~14% CPU load. I suspect this performance gap is caused by the VM because, in one image, a nearly empty Morphic world was open (two non-stepping windows only) and in the other one, a completely empty MVC world; both images were responsive; and according to Squeak's CPU watcher, only ~20% of time were spent in the UI process but 80% in idle. After a few minutes, both VM instances fall back to ~0.1% of total CPU usage.
Is there any chance to find an explanation for this behavior in the VM implementation, are there any hidden background operations (maybe certain WM_MESSAGES) that are triggered after resuming from sleep? Does anyone else notice similar behavior?
For sake of completeness, I should mention that my RAM and disk usage are chronically near maximum (90% of RAM in use, 25 GB on SSD free. Squeak images are so large 😐). But this does not change after a few minutes so I don't think it explains the observed slowdown completely.
VM build 202010232046; but I could already have experienced similar issues a few months ago.
--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/OpenSmalltalk/opensmalltalk-vm/issues/537
Hi all,
ideas for a faster SqueakJS JIT have been swirling around in my head for
many years. This weekend I took the time to write some of them down:
https://squeak.js.org/docs/jit.md.html
Feedback welcome!
Vanessa
> On 2021-03-24, at 6:15 PM, David T. Lewis <lewis(a)mail.msen.com> wrote:
>
> I am not certain of the status in the VM, but I think these links provide the background:
>
> https://github.com/OpenSmalltalk/opensmalltalk-vm/issues/505
>
> http://source.squeak.org/VMMaker/VMMaker.oscog-nice.2909.diff
OK, passed them on.
Ben is clearly having a fun time with this. His test suite has 524288 tests! Right now he's down to just 6 failing, all relating to addWord/subWord stuff, specifically when the line length takes up less than a word.
Hell, rather than trying to edit I'll just include his comments -
> The next problem I hit appears to be a genuine bug in the existing code - specifically the function rgbComponentAlphawith() - one which has simply become more visible due to a buffer overrun fix in copyLoop().
>
> What's happening is that source and destination images are 8 bpp, but the source and destination positions differ in 32-bit word alignment. Yet 32-bit blocks of pixels are passed in and out of rgbComponentAlphawith(), aligned to words at the destination. Therefore in a case like the following:
>
> Word boundaries at source | | |
> Pixels X X X X X X X
> Word boundaries at dest | | | |
>
> previously, copyLoop() would read from the word following the last source pixel when constructing the final source word to pass to rgbComponentAlphawith() - but this is unsafe if the last source pixel is at the end of the image and the image ends at memory page alignment, because the following page might not be mapped in. The new version of copyLoop() correctly skips this load; however, it arbitrarily chooses to zero the relevant bits instead, and this triggers a shortcut in rgbComponentAlphawith() to be taken when it wasn't before.
>
> This shortcut leaves the destination word unchanged if the source word was all 0. At first glance this seems OK, since even at < 32bpp, the source pixels are made up of bitfields, and for a component alpha operation, each field is used as an alpha weight to blend a planar colour into the destination. The problem is in the way the blend calculation is done: it's (dest * (0xFF - alpha) + planar_source * alpha) >> 8. Thus when alpha == 0, it will subtract 1 from each colour component value. It was particularly noticeable in my test because it also utilised random gamma tables, so the off-by-one value was very stark.
>
> Ideally, the blending algorithm would be fixed so that an alpha of 0 leaves the destination unchanged. However, in all but a straight-through 1:1 gamma LUT, there will still be at least some incidences of a many-to-one mapping, so a round trip through gamma and ungamma could cause colour component changes even with an alpha of 0 and a better blending algorithm. So the simple solution is to remove the invalid shortcut in rgbComponentAlphawith(), and once you do so, the old and new builds produce the same result for this test. The change does impact the results of some other tests too, so I needed to recalculate the correct CRCs with this change in place.
>
> By now I was down to 190 failing tests out of 524288. Surveying the first 10, their combinationRules are all either addWord or subWord. Bearing the above in mind, I immediately suspected what the problem might be. Since pixels are stored left-to-right in most-significant-to-least-significant order, and carry/borrow propagate from less-significant to more-significant bits, that means that memory locations after the nominal end of the source row can impact the result. That means that the change that zeroes the remaining bits rather than loading them has the potential to change the results whenever word-relative alignment differs between source and destination.
>
> This time I tried backporting the buffer overflow fix to the old build (it's a good thing not to read off the end of the input buffer in any case) and recalculating the CRCs again.
>
> Now we were down to 6 failures out of 524288 tests. They were all addWord or subWord again, but the extra distinguishing feature this time was that the line lengths were very short - no more than 1 pixel for 16bpp, 3 pixels for 8bpp and (though there weren't enough examples to prove it) presumably no more than 7 pixels for 4bpp and so on.
>
> After a bit more digging, I think this change is responsible:
> https://github.com/OpenSmalltalk/opensmalltalk-vm/issues/426
>
> in the sense that it also causes bits that are outside the positions that are masked into the destination to be zeroed when they weren't before, but in such a way that they affect whether or not a carry or borrow into bit positions that *do* matter happens.
>
> Unfortunately the patch from the resolution of that issue doesn't cleanly apply back onto the old source tree - it seems there were other intervening changes in how the "preload" flag was calculated. However, I'm reasonably confident that since there are now only 6 failures, they're probably all caused by this one remaining issue. It's very reassuring that I now get the same results from the new source tree irrespective of whether the 32-bit ARM assembly fast paths are enabled or not. So the easiest thing is to simply recalculate the CRCs again from the new source tree and use them as a baseline for the AArch64 conversion.
>
> One thing that occurs to me is that, given that there are some combinationRules that have this "leaking across pixel boundaries" behaviour (and maybe it might be deliberately added one day - say for a "blur" operation) it arguably should be the case that the source and destination words are masked to only include the bits relating to pixels included in the bounding box *before* calling the combinationRule as well as afterwards. For example, assuming an operation on a single 8bpp pixel:
>
> Words at source |AA AA AA AA|BB BB BB BB|
> Pixel **
> Words at destination |FF FF FF FF|
>
> In the old Squeak VM, addWordwith() would have been passed 0xAAAAAABB and 0xFFFFFFFF. New VMs skip the load of the second source word and pass 0xAAAAAA00 and 0xFFFFFFFF. What I'm saying is, perhaps it should be 0x00AA0000 and 0x00FF0000 instead. (Had this rule been in place already, none of the issues I hit today would have shown up.)
Anything anyone can think of that might help would be appreciated. Although the intent here is to make the ARM64 system as fast as possible for Pi benefit, it will have probably benefits for other machine because some of his improvements are just C code. Bug fixes should be passed up to all systems easily enough.
tim
--
tim Rowledge; tim(a)rowledge.org; http://www.rowledge.org/tim
At the end of the day, a cliché walks into a bar -- fresh as a daisy, cute as a button, and sharp as a tack.
Hi Bruce, all
> On 25. Mar 2021, at 10:32, Bruce O'Neel <bruce.oneel(a)pckswarms.ch> wrote:
>
> Hi,
>
> I've attached sqUnixMain.i from the command below.
Thanks, that helps.
The problem is here:
# 1 "/home/edoneel/tmp/opensmalltalk-vm/platforms/unix/vm/sqUnixMain.c" 2
# 36 "/home/edoneel/tmp/opensmalltalk-vm/platforms/unix/vm/sqUnixMain.c"
# 1 "/home/edoneel/tmp/opensmalltalk-vm/platforms/Cross/vm/sq.h" 1
# 12 "/home/edoneel/tmp/opensmalltalk-vm/platforms/Cross/vm/sq.h"
# 1 "/usr/include/stdio.h" 1 3 4
# 27 "/usr/include/stdio.h" 3 4
# 1 "/usr/include/x86_64-linux-gnu/bits/libc-header-start.h" 1 3 4
# 33 "/usr/include/x86_64-linux-gnu/bits/libc-header-start.h" 3 4
# 1 "/usr/include/features.h" 1 3 4
# 439 "/usr/include/features.h" 3 4
# 1 "/usr/include/stdc-predef.h" 1 3 4
This is due to aafcb78371c7e576073a8dbf2f1f32667568e05e, Where the include of "sqConfig.h" was demoted to _after_ stdio.h and such.
This seems to deal with an interdependency between EXPORT and sqPlatformSpecific.h
but I don't understand what that has to do with sqConfig.h
I see that _only_ win32 is doing a lot of stuff in sqConfig.h (via sqWin32.h) which _most probably_ would belong in sqPlatformSpecific.h in the first place.
Since Eliot strongly-worded to not change that, I wont commit a fix.
best
-tobias
>
> And yes it is in config.h
>
>
> #ifndef _GNU_SOURCE
> # define _GNU_SOURCE 1
> #endif
>
> Thanks.
>
> bruce
>
>
>
> 25 March 2021 10:14 Tobias Pape <Das.Linux(a)gmx.de> wrote:
> Hi Bruce and all
>
> > On 25. Mar 2021, at 09:43, Bruce O'Neel wrote:
> >
> > Hi,
> >
> > Yea that's right, but, I think that is not where the problem is.
> >
> > What seems to have happened is that someone got clever with which is included in the beginning of and which must have changed between different versions of Linux. So I'm on what is basically Ubuntu 20.4 and they have played around what __USE_GNU means.
>
>
> > Now it seams that the magic #define is _GNU_SOURCE.
>
> This is so quite a long time already.
> I just checked a system frozen in time in 2012, it is the same there.
> All __USE_* get undefined in features.h and only set with the respective _*_SOURCE options.
>
> The __USE_GNU dance in
> https://github.com/OpenSmalltalk/opensmalltalk-vm/blob/4f6b75b56b314c20b184…
>
> is hence sadly ineffective.
>
>
> >
> > The easiest patch I found is below and attached above. I also attached features.h just so you can see the nightmare that has become.
> >
> > diff --git a/build.linux64x64/squeak.cog.spur/build/mvm b/build.linux64x64/squeak.cog.spur/build/mvm
> > index 29b710460..eb67f677e 100755
> > --- a/build.linux64x64/squeak.cog.spur/build/mvm
> > +++ b/build.linux64x64/squeak.cog.spur/build/mvm
> > @@ -34,7 +34,7 @@ test -f config.h || ../../../platforms/unix/config/configure --without-npsqueak
> > --with-src=spur64src \
> > TARGET_ARCH="-m64" \
> > CC=clang \
> > - CFLAGS="$CFLAGS" \
> > + CFLAGS="$CFLAGS -D_GNU_SOURCE=1" \
> > LIBS="$LIBS" \
> > LDFLAGS="$LDFLAGS"
> > rm -f vm/sqUnixMain.o # nuke version info
> >
> Configure should define that!
>
> AC_USE_SYSTEM_EXTENSIONS in
> https://github.com/OpenSmalltalk/opensmalltalk-vm/blob/4f6b75b56b314c20b184…
> does exactly that: if GNU Extensions are avaialbe, define _GNU_SORUCE.
>
> This define should be in config.h
>
> This is the unconfigured file: https://github.com/OpenSmalltalk/opensmalltalk-vm/blob/Cog/platforms/unix/c…
>
> Once you run ./configure, it _will_ define _GNU_SOURCE.
>
> However, it must be sur that config.h is included.
> This is done unconditionally in sqMemoryAccess.h and conditionally (HAVE_CONFIG_H) in sqConfig.h (which is pulled by sq.h)
> (HAVE_CONFIG_H is set as I see in your email from 2 days ago)
>
>
> Based on your command in your email, can yous please send the output of
>
> clang -Wall -g -O2 -DNDEBUG -DDEBUGVM=0 -msse2 -DCOGMTVM=0 -pthread -DLSB_FIRST=1 -m64 -Wno-missing-braces -Wno-unknown-pragmas -Wno-unused-value -Wno-unused-label -Wno-unused-function -Wno-unused-variable -DHAVE_CONFIG_H -DSQUEAK_BUILTIN_PLUGIN -I/home/edoneel/tmp/opensmalltalk-vm/build.linux64x64/squeak.cog.spur/build -I/home/edoneel/tmp/opensmalltalk-vm/build.linux64x64/squeak.cog.spur/build -I/home/edoneel/tmp/opensmalltalk-vm/platforms/unix/vm -I/home/edoneel/tmp/opensmalltalk-vm/platforms/Cross/vm -I/home/edoneel/tmp/opensmalltalk-vm/spur64src/vm -I/usr/local/include -I/home/edoneel/tmp/opensmalltalk-vm/platforms/Cross/vm -I/home/edoneel/tmp/opensmalltalk-vm/platforms/unix/vm -I/home/edoneel/tmp/opensmalltalk-vm/spur64src/vm -I/home/edoneel/tmp/opensmalltalk-vm/platforms/Cross/plugins/FilePlugin -I/home/edoneel/tmp/opensmalltalk-vm/platforms/unix/plugins/B3DAcceleratorPlugin -m64 -Wno-missing-braces -Wno-unknown-pragmas -Wno-unused-value -Wno-unused-label -Wno-unused-function -Wno-unused-variable -E -o sqUnixMain.i /home/edoneel/tmp/opensmalltalk-vm/platforms/unix/vm/sqUnixMain.c
>
> and the resulting sqUnixMain.i?
>
>
> Best regards
> -Tobias
>
>
> > I tried to play the same ifdef games currently with __USE_GNU in include_ucontext.h but I was not successful.
> >
> > This builds both with clang and with gcc on Ubuntu 20.4 on x86_64.
> >
> > cheers
> >
> > bruce
> >
> > 24 March 2021 20:58 "David T. Lewis" wrote:
> >
> > On Wed, Mar 24, 2021 at 11:54:55AM -0700, Eliot Miranda wrote:
> > >
> > > Hi David,
> > >
> > > On Mon, Mar 22, 2021 at 4:14 PM David T. Lewis wrote:
> > >
> > > >
> > > > On Mon, Mar 22, 2021 at 11:46:27AM +0100, Bruce O'Neel wrote:
> > > > >
> > > > >
> > > > > Hi,
> > > > >
> > > > > That fixed one problem I was tripping over on the ARMv8 build.
> > > > >
> > > > > There still seems to be a problem with the x86-64 build.????
> > > > >
> > > > > Thanks.
> > > > >
> > > > > bruce
> > > >
> > > > I see the same issue on my Linux x86-64 PC. I cannot spot the exact cause
> > > > of
> > > > the problem, but it seems to have been introduced in this commit:
> > > >
> > > > commit 6d74c4b652c7780110fe327f97e98aaef5242fbc
> > > > Author: Eliot Miranda
> > > > Date: Wed Feb 10 21:41:31 2021 -0800
> > > >
> > > > Revisions to core include files to get EXPORT defined correctly at the
> > > > relevant points.
> > > > Essentially move the default EXPORT definitions from sq.h to
> > > > sqMemoryAccess.h.
> > > > Include fbdev support in build.linux32ARMv6 builds and add -m32 to
> > > > their CFLAGS
> > > > to attempt to get the 32-bit VMs to be compilable on 64-bit ARM. [ci
> > > > skip]
> > > > cuz new cogit files will arrive soon.
> > > >
> > > >
> > > > The changes seem straightforward, but for some reason the header files must
> > > > not be getting included properly.
> > > >
> > > > Dave
> > > >
> > > > >
> > > > /home/edoneel/tmp/opensmalltalk-vm/platforms/unix/vm/sqUnixMain.c:926:35:
> > > > error: use of undeclared identifier 'REG_RBP'
> > > >
> > >
> > > If you have a look at the code here the file is trying to
> > > - find out what platform it is on
> > > - pull in the relevant version of ucontext.h containing the relevant
> > > defines for extracting register values from an interrupt context
> > > - print out the registers for a crash report
> > > So please look to
> > > - find out if platforms/unix/vm/include_ucontext.h is working correctly on
> > > this platform
> > > - find out what are the platform defines (-E -dM is your friend here)
> > > - find out how ucontext.h defines the registers in a ucontext
> > >
> >
> > Hi Eliot,
> >
> > I think that platforms/unix/vm/include_ucontext.h is working correctly.
> > Using -E -dM, I can confirm:
> >
> > #define __linux__ 1
> > #define __x86_64 1
> >
> > Also, include_ucontext.h was not modified in commit 924a24bb54870a621c5283f23631261d5a792000
> > which is where I think the issue originates(?).
> >
> > So there is something else going wrong with the includes for Linux, but
> > I am blind and cannot spot it.
> >
> > Dave
> >
> >
> >
>
>
>
> <sqUnixMain.i>
Branch: refs/heads/Cog
Home: https://github.com/OpenSmalltalk/opensmalltalk-vm
Commit: b4afeac5f51acafd24035e14dc370eb3954c57f1
https://github.com/OpenSmalltalk/opensmalltalk-vm/commit/b4afeac5f51acafd24…
Author: Tobias Pape <tobias(a)netshed.de>
Date: 2021-03-25 (Thu, 25 Mar 2021)
Changed paths:
M platforms/unix/vm/include_ucontext.h
Log Message:
-----------
Play the features.h dance
Since configure takes care that `_GNU_SOURCE` is there
whenever supported, _not_ having that means we cannot reasonably
ensure that we can access reg names, as defined via `__USE_GNU`.
The `features.h` makes sure that `__USE_GNU` is set if and only if
`_GNU_SOURCE` is set. So Bailing when not having it is the right choice.
Hi Bruce and all
> On 25. Mar 2021, at 09:43, Bruce O'Neel <bruce.oneel(a)pckswarms.ch> wrote:
>
> Hi,
>
> Yea that's right, but, I think that is not where the problem is.
>
> What seems to have happened is that someone got clever with <features.h> which is included in the beginning of <sys/ucontext.h> and which must have changed between different versions of Linux. So I'm on what is basically Ubuntu 20.4 and they have played around what __USE_GNU means.
> Now it seams that the magic #define is _GNU_SOURCE.
This is so quite a long time already.
I just checked a system frozen in time in 2012, it is the same there.
All __USE_* get undefined in features.h and only set with the respective _*_SOURCE options.
The __USE_GNU dance in
https://github.com/OpenSmalltalk/opensmalltalk-vm/blob/4f6b75b56b314c20b184…
is hence sadly ineffective.
>
> The easiest patch I found is below and attached above. I also attached features.h just so you can see the nightmare that has become.
>
> diff --git a/build.linux64x64/squeak.cog.spur/build/mvm b/build.linux64x64/squeak.cog.spur/build/mvm
> index 29b710460..eb67f677e 100755
> --- a/build.linux64x64/squeak.cog.spur/build/mvm
> +++ b/build.linux64x64/squeak.cog.spur/build/mvm
> @@ -34,7 +34,7 @@ test -f config.h || ../../../platforms/unix/config/configure --without-npsqueak
> --with-src=spur64src \
> TARGET_ARCH="-m64" \
> CC=clang \
> - CFLAGS="$CFLAGS" \
> + CFLAGS="$CFLAGS -D_GNU_SOURCE=1" \
> LIBS="$LIBS" \
> LDFLAGS="$LDFLAGS"
> rm -f vm/sqUnixMain.o # nuke version info
>
Configure should define that!
AC_USE_SYSTEM_EXTENSIONS in
https://github.com/OpenSmalltalk/opensmalltalk-vm/blob/4f6b75b56b314c20b184…
does exactly that: if GNU Extensions are avaialbe, define _GNU_SORUCE.
This define should be in config.h
This is the unconfigured file: https://github.com/OpenSmalltalk/opensmalltalk-vm/blob/Cog/platforms/unix/c…
Once you run ./configure, it _will_ define _GNU_SOURCE.
However, it must be sur that config.h is included.
This is done unconditionally in sqMemoryAccess.h and conditionally (HAVE_CONFIG_H) in sqConfig.h (which is pulled by sq.h)
(HAVE_CONFIG_H is set as I see in your email from 2 days ago)
Based on your command in your email, can yous please send the output of
clang -Wall -g -O2 -DNDEBUG -DDEBUGVM=0 -msse2 -DCOGMTVM=0 -pthread -DLSB_FIRST=1 -m64 -Wno-missing-braces -Wno-unknown-pragmas -Wno-unused-value -Wno-unused-label -Wno-unused-function -Wno-unused-variable -DHAVE_CONFIG_H -DSQUEAK_BUILTIN_PLUGIN -I/home/edoneel/tmp/opensmalltalk-vm/build.linux64x64/squeak.cog.spur/build -I/home/edoneel/tmp/opensmalltalk-vm/build.linux64x64/squeak.cog.spur/build -I/home/edoneel/tmp/opensmalltalk-vm/platforms/unix/vm -I/home/edoneel/tmp/opensmalltalk-vm/platforms/Cross/vm -I/home/edoneel/tmp/opensmalltalk-vm/spur64src/vm -I/usr/local/include -I/home/edoneel/tmp/opensmalltalk-vm/platforms/Cross/vm -I/home/edoneel/tmp/opensmalltalk-vm/platforms/unix/vm -I/home/edoneel/tmp/opensmalltalk-vm/spur64src/vm -I/home/edoneel/tmp/opensmalltalk-vm/platforms/Cross/plugins/FilePlugin -I/home/edoneel/tmp/opensmalltalk-vm/platforms/unix/plugins/B3DAcceleratorPlugin -m64 -Wno-missing-braces -Wno-unknown-pragmas -Wno-unused-value -Wno-unused-label -Wno-unused-function -Wno-unused-variable -E -o sqUnixMain.i /home/edoneel/tmp/opensmalltalk-vm/platforms/unix/vm/sqUnixMain.c
and the resulting sqUnixMain.i?
Best regards
-Tobias
> I tried to play the same ifdef games currently with __USE_GNU in include_ucontext.h but I was not successful.
>
> This builds both with clang and with gcc on Ubuntu 20.4 on x86_64.
>
> cheers
>
> bruce
>
> 24 March 2021 20:58 "David T. Lewis" <lewis(a)mail.msen.com> wrote:
>
> On Wed, Mar 24, 2021 at 11:54:55AM -0700, Eliot Miranda wrote:
> >
> > Hi David,
> >
> > On Mon, Mar 22, 2021 at 4:14 PM David T. Lewis wrote:
> >
> > >
> > > On Mon, Mar 22, 2021 at 11:46:27AM +0100, Bruce O'Neel wrote:
> > > >
> > > >
> > > > Hi,
> > > >
> > > > That fixed one problem I was tripping over on the ARMv8 build.
> > > >
> > > > There still seems to be a problem with the x86-64 build.????
> > > >
> > > > Thanks.
> > > >
> > > > bruce
> > >
> > > I see the same issue on my Linux x86-64 PC. I cannot spot the exact cause
> > > of
> > > the problem, but it seems to have been introduced in this commit:
> > >
> > > commit 6d74c4b652c7780110fe327f97e98aaef5242fbc
> > > Author: Eliot Miranda
> > > Date: Wed Feb 10 21:41:31 2021 -0800
> > >
> > > Revisions to core include files to get EXPORT defined correctly at the
> > > relevant points.
> > > Essentially move the default EXPORT definitions from sq.h to
> > > sqMemoryAccess.h.
> > > Include fbdev support in build.linux32ARMv6 builds and add -m32 to
> > > their CFLAGS
> > > to attempt to get the 32-bit VMs to be compilable on 64-bit ARM. [ci
> > > skip]
> > > cuz new cogit files will arrive soon.
> > >
> > >
> > > The changes seem straightforward, but for some reason the header files must
> > > not be getting included properly.
> > >
> > > Dave
> > >
> > > >
> > > /home/edoneel/tmp/opensmalltalk-vm/platforms/unix/vm/sqUnixMain.c:926:35:
> > > error: use of undeclared identifier 'REG_RBP'
> > >
> >
> > If you have a look at the code here the file is trying to
> > - find out what platform it is on
> > - pull in the relevant version of ucontext.h containing the relevant
> > defines for extracting register values from an interrupt context
> > - print out the registers for a crash report
> > So please look to
> > - find out if platforms/unix/vm/include_ucontext.h is working correctly on
> > this platform
> > - find out what are the platform defines (-E -dM is your friend here)
> > - find out how ucontext.h defines the registers in a ucontext
> >
>
> Hi Eliot,
>
> I think that platforms/unix/vm/include_ucontext.h is working correctly.
> Using -E -dM, I can confirm:
>
> #define __linux__ 1
> #define __x86_64 1
>
> Also, include_ucontext.h was not modified in commit 924a24bb54870a621c5283f23631261d5a792000
> which is where I think the issue originates(?).
>
> So there is something else going wrong with the includes for Linux, but
> I am blind and cannot spot it.
>
> Dave
>
>
> <diff><features.h>