Anyone here interested in this crash? Is there a newer VM I should test with?
Thanks,
-Martin
-------- Forwarded Message --------
Subject: [Pharo-dev] Pharo 7 Repeatable crash: XIO: fatal IO error 14 (Bad address) on X server ":0"
Date: Wed, 21 Feb 2018 10:37:18 -0800
From: Martin McClure <martin@hand2mouse.com>
Reply-To: Pharo Development List <pharo-dev@lists.pharo.org>
To: Pharo Development List <pharo-dev@lists.pharo.org>
On the current Linux32 Pharo 7.0, (https://get.pharo.org/70+vm as of 30 minutes ago) the VM crashes with the message
XIO: fatal IO error 14 (Bad address) on X server ":0"
whenever I try to enlarge the Pharo window by dragging the corner, and get to a size of about 2Mpixels. This is with a virgin image with no windows open inside the main window. A workaround seems to be to open a System Browser before resizing the main window.
This does not reproduce on Pharo 6.1 (https://get.pharo.org/).
This *does* reproduce when running the Pharo7 image on the Pharo 6.1 VM, so it may not be entirely a VM problem.
Most times, no crash dump is created, and nothing is logged to PharoDebug.log.
Occasionally, especially when rapidly changing the size of the window, it segfaults and logs a dump. This may or may not be the same problem.
This is less annoying since I found the workaround, but still seems to be something that should be fixed. Does anyone want further information from me in order to fix it?
Regards,
-Martin
Hi Martin,
On Thu, Feb 22, 2018 at 10:45 AM, Martin McClure <martin@hand2mouse.com> wrote:
> Anyone here interested in this crash? Is there a newer VM I should test with?
Certainly try the most up-to-date VM you can find. But this looks like a Linux-specific, 32-bit-specific bug, because no one else is reporting crashes like this. So I recommend running the VM from the command line under gdb, so that you can get a stack trace and maybe dig a little further. A crash like this should be due to something obvious, such as a null pointer or a buffer overrun, and running under gdb should let you narrow in on the bug quite quickly. If the Pharo VM is compiled with symbols, use that VM; otherwise build your own, because you'll need symbols. If and when the bug is easy to reproduce, you can switch to the debug VM (again, you'll have to build it yourself, but builds these days are easy: check out, cd, run a build script) and get more information.
HTH
On 02/24/2018 01:15 PM, Eliot Miranda wrote:
> Hi Martin,
> On Thu, Feb 22, 2018 at 10:45 AM, Martin McClure <martin@hand2mouse.com> wrote:
>> Anyone here interested in this crash? Is there a newer VM I should test with?
> Certainly try the most up-to-date VM you can find. But this looks like a Linux-specific, 32-bit-specific bug, because no one else is reporting crashes like this. So I recommend running the VM from the command line under gdb, so that you can get a stack trace and maybe dig a little further. A crash like this should be due to something obvious, such as a null pointer or a buffer overrun, and running under gdb should let you narrow in on the bug quite quickly. If the Pharo VM is compiled with symbols, use that VM; otherwise build your own, because you'll need symbols. If and when the bug is easy to reproduce, you can switch to the debug VM (again, you'll have to build it yourself, but builds these days are easy: check out, cd, run a build script) and get more information.
Thanks for the hints.
I can reproduce the problem on the latest VM (pharo.cog.spur_linux32x86_201802232356.tar.gz).
The readily-reproducible problem isn't caught by GDB, the VM just exits out from under GDB after printing
XIO: fatal IO error 14 (Bad address) on X server ":0"
But once I did get a segv instead of the usual error and exit. Stack is below. I don't know when I'll have time to build a debug VM, and don't know whether it would help given that the reproducible problem isn't caught by GDB.
Any more hints on how to diagnose?
Regards,
-Martin
Reading symbols from ./pharo...done.
(gdb) run ~/apps/Pharo7Builds/2018-02-21/scratch.image
Starting program: /home/martin/Downloads/OpenSmalltalk/phcogspurlinuxht/lib/pharo/5.0-201802232356/pharo ~/apps/Pharo7Builds/2018-02-21/scratch.image
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0xf7883b40 (LWP 7472)]
Thread 1 "pharo" received signal SIGSEGV, Segmentation fault.
0xf7e7df4f in __memcpy_sse2_unaligned () from /lib32/libc.so.6
(gdb) where
#0  0xf7e7df4f in __memcpy_sse2_unaligned () from /lib32/libc.so.6
#1  0xf7985e01 in NoSwap () from /usr/lib32/libX11.so.6
#2  0xf79867e6 in PutSubImage () from /usr/lib32/libX11.so.6
#3  0xf7985f65 in PutSubImage () from /usr/lib32/libX11.so.6
#4  0xf7986d7a in XPutImage () from /usr/lib32/libX11.so.6
#5  0xf7fc7a40 in stXPutImage (h=<optimized out>, w=0x48c, dst_y=0x0, dst_x=0x0, src_y=0x0, src_x=0x0, image=0x81ed0a0, gc=0x81ea5d8, window=<optimized out>, display=<optimized out>)
    at /home/travis/build/OpenSmalltalk/opensmalltalk-vm/platforms/unix/vm-display-X11/sqUnixX11.c:5383
#6  display_ioShowDisplay (dispBitsIndex=0xc200010, width=0x490, height=0x71d, depth=0x20, affectedL=0x0, affectedR=0x48c, affectedT=0x0, affectedB=<optimized out>)
    at /home/travis/build/OpenSmalltalk/opensmalltalk-vm/platforms/unix/vm-display-X11/sqUnixX11.c:5758
#7  0xf7fc312c in redrawDisplay (b=<optimized out>, t=0x0, r=0x48c, l=<optimized out>)
    at /home/travis/build/OpenSmalltalk/opensmalltalk-vm/platforms/unix/vm-display-X11/sqUnixX11.c:1339
#8  handleEvent (evt=evt@entry=0xffff415c)
    at /home/travis/build/OpenSmalltalk/opensmalltalk-vm/platforms/unix/vm-display-X11/sqUnixX11.c:3873
#9  0xf7fc3ffd in handleEvents ()
    at /home/travis/build/OpenSmalltalk/opensmalltalk-vm/platforms/unix/vm-display-X11/sqUnixX11.c:3956
#10 0xf7fc4a24 in xHandler (fd=fd@entry=0x3, data=0x0, flags=flags@entry=0x2)
    at /home/travis/build/OpenSmalltalk/opensmalltalk-vm/platforms/unix/vm-display-X11/sqUnixX11.c:3964
#11 0x080d23cf in aioPoll (microSeconds=microSeconds@entry=0x0)
    at /home/travis/build/OpenSmalltalk/opensmalltalk-vm/platforms/unix/vm/aio.c:292
#12 0x0805efbf in ioProcessEvents ()
    at /home/travis/build/OpenSmalltalk/opensmalltalk-vm/platforms/unix/vm/sqUnixMain.c:652
#13 0x080958c5 in checkForEventsMayContextSwitch (mayContextSwitch=<optimized out>)
    at /home/travis/build/OpenSmalltalk/opensmalltalk-vm/spursrc/vm/gcc3x-cointerp.c:60739
#14 0x08096f9a in activateCoggedNewMethod (inInterpreter=inInterpreter@entry=0x0)
    at /home/travis/build/OpenSmalltalk/opensmalltalk-vm/spursrc/vm/gcc3x-cointerp.c:14059
#15 0x08097170 in executeNewMethod ()
    at /home/travis/build/OpenSmalltalk/opensmalltalk-vm/spursrc/vm/gcc3x-cointerp.c:17329
#16 0x08098b84 in ceSendsupertonumArgs (selector=0x8f8a988, superNormalBar=0x1, rcvr=0x957d638, numArgs=0x0)
    at /home/travis/build/OpenSmalltalk/opensmalltalk-vm/spursrc/vm/gcc3x-cointerp.c:16371
#17 0x082002dc in ?? ()
#18 0x0829d7a4 in ?? ()
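The hex arguments in frame #6 (display_ioShowDisplay) of the SIGSEGV backtrace can be decoded and compared with the "about 2Mpixels" size mentioned in the original report. A minimal Python sketch, assuming that depth=0x20 means 32 bits per pixel and that rows are tightly packed with no padding; the helper name `framebuffer_stats` is hypothetical and is not part of the VM sources:

```python
# Values copied from frame #6 (display_ioShowDisplay) of the SIGSEGV backtrace.
WIDTH = 0x490
HEIGHT = 0x71d
DEPTH_BITS = 0x20


def framebuffer_stats(width, height, depth_bits):
    """Return (pixel count, byte count) for a tightly packed framebuffer.

    Assumption: no per-row padding, so bytes = pixels * depth_bits / 8.
    """
    pixels = width * height
    return pixels, pixels * depth_bits // 8


if __name__ == "__main__":
    pixels, nbytes = framebuffer_stats(WIDTH, HEIGHT, DEPTH_BITS)
    print(f"{WIDTH} x {HEIGHT} at {DEPTH_BITS} bits per pixel")
    print(f"{pixels} pixels, {nbytes} bytes")
```

Run as a script, this reports 1168 x 1821 at 32 bits per pixel, i.e. 2126928 pixels and 8507712 bytes, which is consistent with the crash showing up at roughly 2 Mpixels.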
On 02/25/2018 05:11 PM, Martin McClure wrote:
> I can reproduce the problem on the latest VM (pharo.cog.spur_linux32x86_201802232356.tar.gz).
> The readily-reproducible problem isn't caught by GDB, the VM just exits out from under GDB after printing
> XIO: fatal IO error 14 (Bad address) on X server ":0"
> But once I did get a segv instead of the usual error and exit. Stack is below. I don't know when I'll have time to build a debug VM, and don't know whether it would help given that the reproducible problem isn't caught by GDB.
> Any more hints on how to diagnose?
I built a debug VM, and as expected, running it under GDB produced no new info: the process just prints the error and exits.
Also as conjectured, I can not reproduce the problem on a 64-bit VM.
Regards,
-Martin
On Feb 27, 2018, at 8:25 PM, Martin McClure <martin@hand2mouse.com> wrote:
> On 02/25/2018 05:11 PM, Martin McClure wrote:
>> I can reproduce the problem on the latest VM (pharo.cog.spur_linux32x86_201802232356.tar.gz).
>> The readily-reproducible problem isn't caught by GDB, the VM just exits out from under GDB after printing
>> XIO: fatal IO error 14 (Bad address) on X server ":0"
>> But once I did get a segv instead of the usual error and exit. Stack is below. I don't know when I'll have time to build a debug VM, and don't know whether it would help given that the reproducible problem isn't caught by GDB.
>> Any more hints on how to diagnose?
> I built a debug VM, and as expected running under GDB produced no new info, the process just prints the error and exits.
That's strange. Can you put a breakpoint in write or exit so that gdb does stop rather than exit? Martin, if I were trying to debug this I would be trying to get the error to occur within gdb so that I could poke around. I don't know any better way of solving problems like this than by first being able to examine the exception in situ. I get that it's frustrating, but there's no magic bullet. One has to keep trying until one can find out what caused the crash.
> Also as conjectured, I can not reproduce the problem on a 64-bit VM.
> Regards,
> -Martin
On 02/28/2018 07:08 AM, Eliot Miranda wrote:
>> I built a debug VM, and as expected running under GDB produced no new info, the process just prints the error and exits.
> That's strange. Can you put a breakpoint in write or exit so that gdb does stop rather than exit? Martin, if I were trying to debug this I would be trying to get the error to occur within gdb so that I could poke around. I don't know any better way of solving problems like this than by first being able to examine the exception in situ. I get that it's frustrating, but there's no magic bullet. One has to keep trying until one can find out what caused the crash.
By putting a breakpoint in exit I was able to get the stack below. I hope this gives you a clue as to where to look next. Once again, what I'm doing at the point of failure is dragging the corner of the X window to resize it larger.
Regards,
-Martin
(gdb) break exit
Breakpoint 1 at 0x1c2d0
(gdb) run ~/apps/Pharo7Builds/2018-02-26-32bit/scratch.image
Starting program: /home/martin/Repositories/opensmalltalk-vm/build.linux32x86/pharo.cog.spur/build.debug/squeak ~/apps/Pharo7Builds/2018-02-26-32bit/scratch.image
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0xf7833b40 (LWP 26198)]
XIO: fatal IO error 14 (Bad address) on X server ":0"
after 2906 requests (2872 known processed) with 0 events remaining.
Thread 1 "squeak" hit Breakpoint 1, 0xf7d4b470 in exit () from /lib32/libc.so.6
(gdb) where
#0  0xf7d4b470 in exit () from /lib32/libc.so.6
#1  0xf7950688 in _XDefaultIOError () from /usr/lib32/libX11.so.6
#2  0xf79508ed in _XIOError () from /usr/lib32/libX11.so.6
#3  0xf794df16 in _XEventsQueued () from /usr/lib32/libX11.so.6
#4  0xf793f652 in XPending () from /usr/lib32/libX11.so.6
#5  0xf7fc0743 in handleEvents ()
    at /home/martin/Repositories/opensmalltalk-vm/platforms/unix/vm-display-X11/sqUnixX11.c:3952
#6  0xf7fc077c in xHandler (fd=0x3, data=0x0, flags=0x2)
    at /home/martin/Repositories/opensmalltalk-vm/platforms/unix/vm-display-X11/sqUnixX11.c:3964
#7  0x5663f51c in aioPoll (microSeconds=0x0)
    at /home/martin/Repositories/opensmalltalk-vm/platforms/unix/vm/aio.c:292
#8  0x5657271d in ioProcessEvents ()
    at /home/martin/Repositories/opensmalltalk-vm/platforms/unix/vm/sqUnixMain.c:652
#9  0x565e9d7f in checkForEventsMayContextSwitch (mayContextSwitch=0x1)
    at /home/martin/Repositories/opensmalltalk-vm/spursrc/vm/gcc3x-cointerp.c:60739
#10 0x565f0836 in handleStackOverflowOrEventAllowContextSwitch (mayContextSwitch=0x1)
    at /home/martin/Repositories/opensmalltalk-vm/spursrc/vm/gcc3x-cointerp.c:63988
#11 0x56591a1c in activateCoggedNewMethod (inInterpreter=0x0)
    at /home/martin/Repositories/opensmalltalk-vm/spursrc/vm/gcc3x-cointerp.c:14059
#12 0x56598fc4 in executeNewMethod ()
    at /home/martin/Repositories/opensmalltalk-vm/spursrc/vm/gcc3x-cointerp.c:17329
#13 0x56597216 in ceSendsupertonumArgs (selector=0x5758a480, superNormalBar=0x1, rcvr=0x57b7e788, numArgs=0x0)
    at /home/martin/Repositories/opensmalltalk-vm/spursrc/vm/gcc3x-cointerp.c:16371
#14 0x5680034a in ?? ()
#15 0x5657789d in interpret ()
    at /home/martin/Repositories/opensmalltalk-vm/spursrc/vm/gcc3x-cointerp.c:2706
#16 0x56576175 in main (argc=0x2, argv=0xffffc8c4, envp=0xffffc8d0)
    at /home/martin/Repositories/opensmalltalk-vm/platforms/unix/vm/sqUnixMain.c:2099
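Frames #0 to #2 of the breakpoint backtrace show the process leaving through Xlib's _XDefaultIOError and exit(), i.e. an ordinary process exit rather than a signal, which would explain why gdb only stops once a breakpoint is set on exit. The "14 (Bad address)" in the XIO message appears to be an errno value followed by its text; on Linux, errno 14 is EFAULT. A minimal Python sketch for checking that mapping on the local machine (the helper name `describe_errno` is hypothetical, and errno numbering is platform-dependent):

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message text) for an errno value on this platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # 14 is the number printed in the "XIO: fatal IO error 14" message.
    name, message = describe_errno(14)
    print(f"errno 14 = {name}: {message}")
```

If the mapping holds, the value reported is EFAULT, which the kernel returns when a system call is handed a pointer outside the process's accessible address space. That is only a hint about where to look, not a diagnosis.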
Hi Martin,
I'm sorry, I have no specific ideas as I don't know squeak specifics.
But generally speaking, when debugging X11, I usually do the following:
1) run the X client in "synchronous mode", i.e., XSynchronize(display, True)
2) trace and log requests/responses to/from the X server; I usually use `xtrace`.
Then you should be able to pinpoint the exact request that generated the error. Once you know which request it is, you can make an educated guess about which Xlib function may have generated it. Then put a breakpoint in Xlib and collect both the C and the Smalltalk backtrace. This makes a good start for the debugging.
Laborious indeed. Worked for me a couple of times.
HTH, Jan
P.S.: Are you running by chance under XWayland? If so, watch out especially for XGetImage() which does not work under XWayland. But I doubt this is the problem here.
On 03/02/2018 12:49 AM, Jan Vrany wrote:
> Hi Martin,
> I'm sorry, I have no specific ideas as I don't know squeak specifics.
> But generally speaking, when debugging X11, I usually do the following:
> - run the X client in "synchronous mode", i.e., XSynchronize(display, True)
> - trace and log requests/responses to/from the X server; I usually use `xtrace`.
> Then you should be able to pinpoint the exact request that generated the error. Once you know which request it is, you can make an educated guess about which Xlib function may have generated it. Then put a breakpoint in Xlib and collect both the C and the Smalltalk backtrace. This makes a good start for the debugging.
> Laborious indeed. Worked for me a couple of times.
> HTH, Jan
> P.S.: Are you running by chance under XWayland? If so, watch out especially for XGetImage() which does not work under XWayland. But I doubt this is the problem here.
Thanks for the hints, Jan. I'm not sure when I'll have time to dig in that deeply, but I'll try what you suggest if/when I do.
I probably *am* running under Wayland. It's a Gentoo KDE system, and it does seem to have the package kde-plasma/kwayland-integration installed, along with some other Wayland-related packages. So it seems entirely likely that the window manager, which is what I'm interacting with when dragging the corner of the outer Pharo window, is now written for Wayland.
-Martin
vm-dev@lists.squeakfoundation.org