Build Update for OpenSmalltalk/opensmalltalk-vm
-------------------------------------
Build: #1814
Status: Errored
Duration: 11 mins and 44 secs
Commit: 352de13 (Cog)
Author: Eliot Miranda
Message: Ensure that sigaltstack is used to establish an alternative signal stack on
Unix platforms, and that the SIGIO handler (forceInterruptCheck) runs on that
stack. Although we don't have absolute proof we have strong evidence to suggest
that on recent macOS versions (e.g. 10.13) the first delivery of SIGIO to the
VM causes corruption of the code zone if the VM is in or transitioning to
machine code. This is similar to crashes seen in the Newspeak VM on Linux using
the ITIMER heartbeat. There the issue was that the dynamic linker would be
called within the signal handler on first invocation, and that this would cause
the dynamic linker to traverse the Smalltalk JIT code stack, misinterpret
Smalltalk stack frames as ABI-compliant stack frames and cause corruption as
a result.
Since the code is now system wide on Unix, not merely confined to the ITIMER VM,
move the sigaltstack initialization to platforms/unix/vm/aio.c and delete the
duplications in the ITIMER heartbeat variants.
View the changeset: https://github.com/OpenSmalltalk/opensmalltalk-vm/compare/c114ece5b581...35…
View the full build log and details: https://travis-ci.org/OpenSmalltalk/opensmalltalk-vm/builds/594102107?utm_m…
Branch: refs/heads/Cog
Home: https://github.com/OpenSmalltalk/opensmalltalk-vm
Commit: 352de13869ef1aefeae1c6f863eeb41111db7ffe
https://github.com/OpenSmalltalk/opensmalltalk-vm/commit/352de13869ef1aefea…
Author: Eliot Miranda <eliot.miranda(a)gmail.com>
Date: 2019-10-05 (Sat, 05 Oct 2019)
Changed paths:
M platforms/unix/vm/aio.c
M platforms/unix/vm/sqUnixITimerHeartbeat.c
M platforms/unix/vm/sqUnixITimerTickerHeartbeat.c
Log Message:
-----------
Ensure that sigaltstack is used to establish an alternative signal stack on
Unix platforms, and that the SIGIO handler (forceInterruptCheck) runs on that
stack. Although we don't have absolute proof we have strong evidence to suggest
that on recent macOS versions (e.g. 10.13) the first delivery of SIGIO to the
VM causes corruption of the code zone if the VM is in or transitioning to
machine code. This is similar to crashes seen in the Newspeak VM on Linux using
the ITIMER heartbeat. There the issue was that the dynamic linker would be
called within the signal handler on first invocation, and that this would cause
the dynamic linker to traverse the Smalltalk JIT code stack, misinterpret
Smalltalk stack frames as ABI-compliant stack frames and cause corruption as
a result.
Since the code is now system wide on Unix, not merely confined to the ITIMER VM,
move the sigaltstack initialization to platforms/unix/vm/aio.c and delete the
duplications in the ITIMER heartbeat variants.
Hi All,
there is a VM bug in 64-bit Spur with the Sista V1 bytecode set and
full blocks. The symptom is that when waiting for a remote Monticello
repository to update and/or deliver a package version the system crashes in
JITTED code after what appears to be some kind of wait.
This is a reliably occurring bug but maddeningly difficult to reproduce.
The bug reliably occurs when interacting with a remote repository (e.g.
http://source.squeak.org/VMMaker) when the server is "cold", and hence
makes the image wait. Every time I have tried to repeat the failing
sequence the crash has not occurred, I think because the server is now
"hot" and serves up the version quickly. Today I even tried shutting down
my machine for over an hour and rebooting. But I could not get the crash
to occur even though it seems to me that every time I try it the first time
in the day it does crash.
This is an important bug to fix. If it cannot be fixed then full blocks
and Sista V1 are not ready for use in the upcoming Squeak release. I am
looking for help in debugging this.
- is anyone else using the 64-bit VM with full blocks and Sista V1 who
sees hard VM crashes? If so, under what circumstances?
- is it possible to flush caches in the http://source.squeak.org/VMMaker
server, or could people tolerate me rebooting the server?
- is there a way of introducing network delays in Mac OS that might help me
induce the bug?
- can anyone think of any other strategies I might take to try and
reproduce this?
I may have to try and reproduce the bug in the simulator to have a chance
of identifying the bug. Does anyone have a good enough mental model of the
Monticello server interaction and have energy to help me figure this one
out?
Here is some information from the last crash I did see in the debugger
(alas it is incomplete; there are a number of additional pieces of info I
could have collected).
(lldb) thr b
* thread #1, queue = 'com.apple.main-thread', stop reason =
EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
* frame #0: 0x000000010de5700a
frame #1: 0x000000010dd7b174
frame #2: 0x000000010dd45f1c
frame #3: 0x000000010dd44534
frame #4: 0x000000010dd44c60
(lldb) x/10i 0x000000010de5700a
(lldb) call printStackCallStackOf($rbp)
0x7ffeefbdfc30 M Heap>upHeap: 0x11273ca90: a(n) Heap
0x7ffeefbdfc68 M Heap>add: 0x11273ca90: a(n) Heap
0x7ffeefbdfca0 M Delay class>scheduleDelay:from: 0x1123ebfb8: a(n)
Delay class
0x7ffeefbdfcf0 M Delay class>handleTimerEvent 0x1123ebfb8: a(n) Delay
class
0x7ffeefbdfd20 M Delay class>runTimerEventLoop 0x1123ebfb8: a(n) Delay
class
(lldb) x/10i 0x000000010dd7b174
0x10dd7b174: 48 8b 55 10 movq 0x10(%rbp), %rdx
0x10dd7b178: 48 89 ec movq %rbp, %rsp
0x10dd7b17b: 5d popq %rbp
0x10dd7b17c: c2 10 00 retq $0x10
0x10dd7b17f: cc int3
0x10dd7b180: cc int3
0x10dd7b181: cc int3
0x10dd7b182: cc int3
0x10dd7b183: cc int3
0x10dd7b184: cc int3
(lldb) print whereIs(0x000000010dd7b174)
(char *) $0 = 0x00000001000f83ff " is in generated methods"
(lldb) call printCogMethodFor((void *)0x000000010dd7b174)
0x10dd7afc0 <-> 0x10dd7b198: method: 0x112f23c10
selector: 0x112232c20 add:
(lldb) print whereIs(0x000000010de5700a)
(char *) $1 = 0x00000001000f83ff " is in generated methods"
(lldb) call printCogMethodFor((void *)0x000000010de5700a)
0x10de56ba0 <-> 0x10de57078: method: 0x1126ec218 prim
23856 selector: 0x7ffeefbf3d20
This method ends up being the jitted version of Delay class>>
startTimerEventLoop
_,,,^..^,,,_
best, Eliot