New subject: GCC optimization levels [was new Cog VMs available [please read]]

26 Sep 2011

Hi All,
    responding to Andrew here because this is generally of interest to the
vm-list.
On Mon, Sep 26, 2011 at 11:06 AM, Andrew Gaylard apg@4dst.com wrote:
...
Hmmm.  Thanks for the advice -- we now build with -O3, and all's well.
I've run the VM at full load (mostly compiling) for 30 hours without a
hiccup.  Interesting that -O2 is problematic, but -O3 isn't; I assumed
that higher optimisations would make things less stable, not more so.
And we get a 17% speed increase.
   My GCC is:
   $ gcc --version
   gcc (Ubuntu 4.4.3-4ubuntu5) 4.4.3

So this really surprises me since we see exactly the same thing with gcc
version 3.4.6 20060404 (Red Hat 3.4.6-3).  If we compile with -O1 or -O3 we
get functional Cog VMs, but -O2 crashes on start-up or soon thereafter.
I'm surprised that two very different versions of gcc show the same
behaviour but I guess I shouldn't be.  At some point some of us (me included)
really should put the effort into understanding what the issue is.  It
could be a gcc bug or it could be that we're generating C code with
ill-defined behaviour.  I have to say that I suspect the latter given how
different gcc 3.4.x and gcc 4.4.x are (BTW Andrew also sees the same issue
with gcc 4.1.x).
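
To make "ill-defined behaviour" concrete, here is a minimal C sketch of one well-known kind, a strict-aliasing violation, next to a well-defined alternative. It is an illustration only: nothing in this thread establishes that aliasing is what breaks the Cog VM at -O2, and float_bits is a hypothetical name, not a function from the VM sources. GCC documents -fstrict-aliasing as enabled at -O2 and above, which is one reason code of this kind can appear to work at -O0 or -O1 and misbehave at higher levels. The sketch assumes a 32-bit float.

```c
#include <stdint.h>
#include <string.h>

/*
 * Ill-defined version, shown only as a comment:
 *
 *     uint32_t bits = *(uint32_t *)&f;
 *
 * It reads a float object through an lvalue of an incompatible type,
 * so the compiler may assume the two accesses never alias.
 */

/*
 * Well-defined version (hypothetical helper): copy the object
 * representation with memcpy. Assumes sizeof(float) == sizeof(uint32_t).
 */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Building suspect code with -O2 -fno-strict-aliasing is one cheap experiment for checking whether this particular class of problem is involved.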
...

Andrew

On 2011.09.25 23:12:50 -0700, Eliot Miranda eliot.miranda@gmail.com
wrote:
...
On Sat, Sep 24, 2011 at 9:02 AM, Andrew Gaylard apg@4dst.com wrote:
...
Actually, it looks like I was wrong.  After rebuilding everything from
scratch, I've been unable to reproduce these crashes, except for the
one with unix-4.4.7.image.
Sorry for the false alarm.  r2495 looks pretty good, at both -O0 and
-O1.  It still crashes at -O2, but that's not a huge concern.
Which gcc are you using?  Here at Cadence on a much older 32-bit machine
using gcc 3.4.x we see crashes at -O2 but no crashes at -O0 -O1 & -O3 :)
...
On 2011.09.24 08:07:47 +0200, Andrew Gaylard apg@4dst.com wrote:
...
On 2011.09.23 13:26:06 -0700, Eliot Miranda eliot.miranda@gmail.com
wrote:
...
...
Thank you, Andrew, you nailed it.  I've found the bug via your stack trace
below.  Huge relief.  Thanks!  New VMs and explanation to the list soon.
...
Alas, we spoke too soon.  -2495 exhibits the same symptoms; traces and
gdb transcripts are attached.

vm-*-2495.0.txt are from our basic.image, running the test-runner.
vm-*-2495.1.txt are from Squeak4.2-10966.image, running the test-runner.
vm-*-2495.2.txt are from unix-4.4.7.image, having just started up the VM.
...
The first two of these appear to be the same problem I encountered
with -2493.  The backtraces certainly look very similar.
The third one is rather different. Looking at the stack trace, the
'rcvr' variable in ceSendsupertonumArgs is 17039140, which is
de-referenced in line 10733, causing a SEGV; the handler duly confirms
the faulting address as si_addr = 0x103ff24:
  $ perl -e 'print 0x103ff24'
  17039140
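
For readers wondering where si_addr comes from: a handler installed with sigaction and SA_SIGINFO receives a siginfo_t whose si_addr field holds the faulting address for SIGSEGV. The following is a minimal sketch of such a handler, not the Cog VM's actual one; format_hex, report_segv and install_segv_reporter are hypothetical names chosen for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write v as "0x<hex>" into buf and return the
 * string length. buf needs at least 2 + 2*sizeof v + 1 bytes. It avoids
 * stdio so that it can be called from a signal handler. */
size_t format_hex(uintptr_t v, char *buf)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[2 * sizeof v];
    size_t n = 0, len = 0;

    do {                          /* collect digits, least significant first */
        tmp[n++] = digits[v & 0xf];
        v >>= 4;
    } while (v != 0);

    buf[len++] = '0';
    buf[len++] = 'x';
    while (n > 0)                 /* emit most significant digit first */
        buf[len++] = tmp[--n];
    buf[len] = '\0';
    return len;
}

/* SIGSEGV handler: report the faulting address, then exit. */
static void report_segv(int sig, siginfo_t *info, void *context)
{
    char buf[2 + 2 * sizeof(uintptr_t) + 2];
    size_t len;
    ssize_t ignored;

    (void)context;
    len = format_hex((uintptr_t)info->si_addr, buf);
    buf[len++] = '\n';            /* replace the terminator with a newline */
    ignored = write(STDERR_FILENO, "si_addr = ", 10);
    ignored = write(STDERR_FILENO, buf, len);
    (void)ignored;
    _exit(128 + sig);
}

/* Install the handler; returns 0 on success, -1 on failure. */
int install_segv_reporter(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = report_segv;
    sa.sa_flags = SA_SIGINFO;     /* request the three-argument handler form */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Calling install_segv_reporter() early in main and then dereferencing a bad pointer would be expected to print a line of the form "si_addr = 0x..." on stderr before the process exits.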

-- 
best,
Eliot