Hi All,

    I'm responding to Andrew here because this is of general interest to the vm-list.

On Mon, Sep 26, 2011 at 11:06 AM, Andrew Gaylard <apg@4dst.com> wrote:
Hmmm.  Thanks for the advice -- we now build with -O3, and all's well.
I've run the VM at full load (mostly compiling) for 30 hours without a
hiccup.  Interesting that -O2 is problematic, but -O3 isn't; I assumed
that higher optimisations would make things less stable, not more so.
And we get a 17% speed increase.

       My GCC is:
       $ gcc --version
       gcc (Ubuntu 4.4.3-4ubuntu5) 4.4.3

So this really surprises me, since we see exactly the same thing with gcc version 3.4.6 20060404 (Red Hat 3.4.6-3).  If we compile with -O1 or -O3 we get functional Cog VMs, but -O2 crashes on start-up or soon thereafter.  I'm surprised that two very different versions of gcc show the same behaviour, but I guess I shouldn't be.  At some point some of us (me included) really should put in the effort to understand what the issue is.  It could be a gcc bug, or it could be that we're generating C code with ill-defined behaviour.  I have to say that I suspect the latter, given how different gcc 3.4.x and gcc 4.4.x are (BTW Andrew also sees the same issue with gcc 4.1.x).



- Andrew

On 2011.09.25 23:12:50 -0700, Eliot Miranda <eliot.miranda@gmail.com> wrote:
> On Sat, Sep 24, 2011 at 9:02 AM, Andrew Gaylard <apg@4dst.com> wrote:
>
> > Actually, it looks like I was wrong.  After rebuilding everything from
> > scratch, I've been unable to reproduce these crashes, except for the
> > one with unix-4.4.7.image.
> >
> > Sorry for the false alarm.  r2495 looks pretty good, at both -O0 and
> > -O1.  It still crashes at -O2, but that's not a huge concern.
> >
>
> Which gcc are you using?  Here at Cadence on a much older 32-bit machine
> using gcc 3.4.x we see crashes at -O2 but no crashes at -O0 -O1 & -O3 :)
>
>
> >
> >
> > On 2011.09.24 08:07:47 +0200, Andrew Gaylard <apg@4dst.com> wrote:
> > > On 2011.09.23 13:26:06 -0700, Eliot Miranda <eliot.miranda@gmail.com> wrote:
> > > > Thank you, Andrew, you nailed it.  I've found the bug via your stack trace
> > > > below.  Huge relief.  Thanks!  New VMs and explanation to the list soon.
> > >
> > > Alas, we spoke too soon.  -2495 exhibits the same symptoms; traces and
> > > gdb transcripts are attached.
> > >
> > > - vm-*-2495.0.txt are from our basic.image, running the test-runner.
> > > - vm-*-2495.1.txt are from Squeak4.2-10966.image, running the test-runner.
> > > - vm-*-2495.2.txt are from unix-4.4.7.image, having just started up the VM.
> > >
> > > The first two of these appear to be the same problem I encountered
> > > with -2493.  The backtraces certainly look very similar.
> > >
> > > The third one is rather different. Looking at the stack trace, the
> > > 'rcvr' variable in ceSendsupertonumArgs is 17039140, which is
> > > de-referenced in line 10733, causing a SEGV; the handler duly confirms
> > > the faulting address as si_addr = 0x103ff24:
> > >
> > >       $ perl -e 'print 0x103ff24'
> > >       17039140



--
best,
Eliot