Hey Eliot,
Here is what I have found. I never saw any output from the leak checker. I was able to generate seg faults both in the original echat-server.image, which is doing socket work, AND in an image that just loops on garbage collection. In the original echat-server image, I have a listening socket, a "Vat" that has installed a subclass of Process and is looping, and the RFB server running. In the looping GC image, I turned off the listening socket, the Vat is not running, and I stopped the RFB server. I run headless and supply a script to run. I supplied the following script:
[Smalltalk garbageCollect] repeat.
It took eight attempts, but I eventually got a seg fault.
I have attached the log files for both the echat-server scenario and the looping GC scenario. Search for #SIGUSR1 to find each process dump section, and for #SEGFAULT to find the section at the bottom where the seg fault occurred. Search for #PREVSTACK to find the Processes in the SEGFAULT sections that contain garbage, together with what the corresponding stack was doing in an earlier, healthy section.
Note that, of these corrupted Processes, #PREVSTACK (Delay class>handleTimerEvent) and #PREVSTACK (EventSensor>eventTickler) are corrupted in both scenarios.
HTH, Rob
From: Rob Withers
Sent: Tuesday, July 20, 2010 4:51 AM
To: Eliot Miranda
Cc: Squeak VM Dev
Subject: Re: [Vm-dev] Re: Cog segmentation fault on Linux
--------------------------------------------------------------------------------
Hi Eliot,
(I forgot to CC the mailing list - added)
I got a few things done.
First, I found that the argument to -leakcheck is an integer that gets masked to determine whether to leak-check an incremental or a full GC. I passed '-leakcheck 7'.
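The masking described above can be sketched in C. The bit values below are assumptions for illustration only (the real ones live in the Cog VM sources and may differ); the point is that 7 sets the low three bits, so any check keyed to one of those bits is enabled.

```c
#include <stdbool.h>

/* Hypothetical bit assignments, NOT taken from the Cog sources:
 * bit 0 selects full GCs, bit 1 selects incremental GCs. */
enum {
    LEAK_CHECK_FULL_GC        = 1 << 0,
    LEAK_CHECK_INCREMENTAL_GC = 1 << 1
};

/* Value from the command line, e.g. "-leakcheck 7". */
static int check_for_leaks = 0;

/* True if the heap should be verified around a GC of the given kind. */
static bool should_check_leaks(int gc_kind_bit)
{
    return (check_for_leaks & gc_kind_bit) != 0;
}
```

With this scheme, '-leakcheck 1' would check only full GCs, '-leakcheck 2' only incremental ones, and '-leakcheck 7' both.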
Second, I added a -leakcheck clause to the COGVM section:
#if COGVM
else if (!strcmp(argv[0], "-leakcheck")) {
    extern sqInt checkForLeaks;
    checkForLeaks = atoi(argv[1]);
    return 2;
}
#endif /* COGVM */
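The same option-parsing pattern can be sketched as a standalone function. This is a hypothetical rewrite, not the code in sqUnixMain.c: it uses strtol instead of atoi so that a malformed value is rejected rather than silently read as 0, and it checks that a value was actually supplied. The return convention (number of argv entries consumed, 0 if unrecognized) follows the snippet above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the VM global set by -leakcheck. */
static long check_for_leaks = 0;

/* Parse one VM option at argv[0]. Returns how many argv entries were
 * consumed: 2 for an option with a value, 0 if the option is not
 * recognized or its value is missing or malformed. */
static int parse_vm_option(int argc, char **argv)
{
    if (!strcmp(argv[0], "-leakcheck")) {
        char *end;
        if (argc < 2)
            return 0;               /* missing value: caller reports it */
        check_for_leaks = strtol(argv[1], &end, 0);
        if (end == argv[1] || *end != '\0')
            return 0;               /* empty or not a number */
        return 2;
    }
    return 0;
}
```

Passing base 0 to strtol also accepts hexadecimal masks such as '-leakcheck 0x7'.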
I compiled and ran it. I am not sure where the leak checker's output goes. If it goes to stdout or stderr, I've forgotten the magic incantation to redirect them to files. I think it is '2> stderr.txt 1> stdout.txt' for /bin/sh. Is that right?
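For /bin/sh, '2> stderr.txt 1> stdout.txt' does send the two streams to separate files ('> out.txt 2>&1' would put both in one file). One caveat when capturing output from a program that crashes: once stdout refers to a file, the C library makes it fully buffered, so lines still sitting in the buffer are lost if the process dies from SIGSEGV. stderr is never fully buffered, so it is the safer channel. A minimal sketch, assuming you can add a call at the very start of main (the function name is hypothetical):

```c
#include <stdio.h>

/* Make stdout unbuffered so every write reaches the file immediately and
 * survives a later crash. setvbuf must be called before any other
 * operation on the stream. Returns 0 on success, as setvbuf does. */
static int make_stdout_crash_safe(void)
{
    return setvbuf(stdout, NULL, _IONBF, 0);
}
```

Unbuffered output is slower; _IOLBF (line buffering) is a common compromise when diagnostics are newline-terminated.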
When I run it (the new image with the stepping button bar, which takes 30% CPU), it runs. But when I send 'kill -USR1 <pid>', it is guaranteed to seg fault. This may or may not be the original seg fault; it may be caused by the leak checker.
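How Cog implements its SIGUSR1 stack-printing handler is not shown in this thread. For comparison, the following is a generic POSIX sketch (all names hypothetical) of the usual safe pattern, in which the handler only sets a flag that the main loop later polls. Code that instead walks and prints the heap directly from the handler can observe objects that the interrupted code, for example a GC, is halfway through moving; whether that is relevant to this crash is speculation.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set asynchronously by the handler; sig_atomic_t makes the store safe. */
static volatile sig_atomic_t dump_requested = 0;

/* Async-signal-safe handler: it only records the request. The actual
 * dump would happen later, at a safe point in the main loop. */
static void on_sigusr1(int signo)
{
    (void)signo;
    dump_requested = 1;
}

/* Install the handler; returns 0 on success, -1 on failure. */
static int install_dump_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, NULL);
}
```

A main loop would check dump_requested between iterations, clear it, and print the stacks only when no collection is in progress.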
The only thing I am doing that calls out of the image is socket work, and a socket call may or may not be in progress when it seg faults. I will turn off all the socket activity and see whether it still seg faults.
Am I activating the leakchecker ok?
Regards, Rob
From: Rob Withers
Sent: Monday, July 19, 2010 9:22 PM
To: Eliot Miranda
Subject: Re: [Vm-dev] Re: Cog segmentation fault on Linux
hey Eliot,
It looks like this command line argument, -leakcheck, is for the STACKVM, not the COGVM. Is this an issue?
Thanks, Rob
#if STACKVM
else if (!strcmp(argv[0], "-eden")) {
    extern sqInt desiredEdenBytes;
    desiredEdenBytes = strtobkm(argv[1]);
    return 2;
}
else if (!strcmp(argv[0], "-leakcheck")) {
    extern sqInt checkForLeaks;
    checkForLeaks = atoi(argv[1]);
    return 2;
}
else if (!strcmp(argv[0], "-stackpages")) {
    extern sqInt desiredNumStackPages;
    desiredNumStackPages = atoi(argv[1]);
    return 2;
}
else if (!strcmp(argv[0], "-breaksel")) {
    extern void setBreakSelector(char *);
    setBreakSelector(argv[1]);
    return 2;
}
else if (!strcmp(argv[0], "-noheartbeat")) {
    extern sqInt suppressHeartbeatFlag;
    suppressHeartbeatFlag = 1;
    return 1;
}
#endif /* STACKVM */
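The -eden option above passes its value through strtobkm(). Judging by the name, it parses a size with an optional b/k/m suffix; the sketch below is a hypothetical parser in that spirit, not the real strtobkm(), which may accept different suffixes or handle errors differently. Overflow checking is omitted for brevity.

```c
#include <stdlib.h>

/* Hypothetical size parser: a decimal count with an optional suffix
 * b (bytes), k (KiB) or m (MiB). Returns -1 on malformed input. */
static long parse_size_bkm(const char *text)
{
    char *end;
    long value = strtol(text, &end, 10);
    if (end == text || value < 0)
        return -1;
    switch (*end) {
    case '\0':
    case 'b': case 'B': break;
    case 'k': case 'K': value *= 1024L; break;
    case 'm': case 'M': value *= 1024L * 1024L; break;
    default: return -1;
    }
    /* Reject trailing junk after the suffix, e.g. "4kx". */
    if (*end != '\0' && end[1] != '\0')
        return -1;
    return value;
}
```

Under this sketch, '-eden 4m' would request 4194304 bytes.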
From: Rob Withers
Sent: Monday, July 19, 2010 9:19 PM
To: Eliot Miranda
Cc: Squeak VM Dev
Subject: [Vm-dev] Re: Cog segmentation fault on Linux
--------------------------------------------------------------------------------
Hi Eliot,
Got home from my new job and started looking into this. This morning I found that I had a button bar that was stepping, and part of each step was a Smalltalk garbageCollect to force collection before checking for instances. I may not need to do that anymore, but it helps expose this seg fault: both stack dumps were in garbageCollect. I removed the button bar, uploaded the image, and ran it. CPU% dropped from 33% to 2%. I let it run all day. At some point it exited for an unknown reason; it was gone when I returned tonight.
I have reinstated the button bar, to help this bug occur, and uploaded it to the server.
Now I just need to enable -leakcheck. From sqUnixMain.c it looks like it takes an argument. What is that argument?
Thanks, Rob
From: Eliot Miranda
Sent: Monday, July 19, 2010 1:47 PM
To: Rob Withers
Cc: Squeak Virtual Machine Development Discussion
Subject: Re: Cog segmentation fault on Linux
Hi Rob,
On Mon, Jul 19, 2010 at 3:10 AM, Rob Withers reefedjib@yahoo.com wrote:
Eliot,
I am getting a segmentation fault running Cog headless on linux. Here is the stack dump. Below is a second stack dump that looks different.
While the heap corruption might be a bug in Cog, it might also be heap corruption from external code (e.g. objects passed through FFI calls to external code that overwrites those objects' bounds).
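To make the external-code scenario concrete: if external code is handed a pointer into the heap and writes past the end of an object, the damage lands in whatever follows, typically the next object's header, and often nothing fails until a later GC walks the heap and misreads that header. A toy simulation (the layout and names are hypothetical, not Cog's object format):

```c
#include <stdint.h>
#include <string.h>

/* A toy heap: each object is a header word holding its slot count,
 * followed by that many slots, packed back to back. */
#define HEAP_WORDS 16
static uintptr_t heap[HEAP_WORDS];

/* Create an object of n slots whose header is at word index at;
 * returns the index of its first slot. No bounds checking. */
static size_t make_object(size_t at, uintptr_t n)
{
    heap[at] = n;
    memset(&heap[at + 1], 0, n * sizeof heap[0]);
    return at + 1;
}

/* Stand-in for external code with an off-by-one bug: it is told the
 * buffer holds n slots but writes n + 1 of them. */
static void buggy_external_fill(uintptr_t *slots, size_t n)
{
    for (size_t i = 0; i <= n; i++)   /* should be i < n */
        slots[i] = 0xDEAD;
}
```

Filling a 3-slot object at the start of the heap this way overwrites the header of the object that follows it with 0xDEAD, which a heap walker would then misread as a slot count.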
There's a leak checker in Cog (see the -leakcheck argument in platforms/unix/vm/sqUnixMain.c) that can help you localise this. It's best to distrust your code before you distrust the VM, simply because thinking it's the VM can blind-side you to potential bugs in your own code or other parts of the system. The goal here is a reproducible case. If you get a reproducible case that doesn't use any external code, then the bug is in the VM.
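The general idea behind a heap checker can be shown with a much simplified sketch: walk every object and confirm that each pointer field refers to the start of a real object. The toy layout, tagging scheme and names below are assumptions for illustration and do not describe Cog's implementation. A reference corrupted by external code shows up as a nonzero count.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy heap: word 0 of each object is its slot count. A slot holds
 * either a tagged integer ((v << 1) | 1) or a reference (index << 1)
 * to another object's header word. */
#define HEAP_WORDS 32
static uintptr_t heap[HEAP_WORDS];
static size_t heap_end = 0;            /* first free word */

/* Allocate an object with nslots integer-zero slots; no bounds check,
 * so the caller must stay within HEAP_WORDS. Returns header index. */
static size_t alloc_object(size_t nslots)
{
    size_t at = heap_end;
    heap[at] = nslots;
    for (size_t s = 1; s <= nslots; s++)
        heap[at + s] = 1;              /* tagged integer 0 */
    heap_end += 1 + nslots;
    return at;
}

static uintptr_t ref(size_t obj) { return (uintptr_t)obj << 1; }

/* True if index is the header of some object in the heap. */
static bool is_object_start(size_t index)
{
    size_t i = 0;
    while (i < heap_end) {
        if (i == index)
            return true;
        if (heap[i] > heap_end - i - 1)
            return false;              /* corrupt header: stop walking */
        i += 1 + heap[i];
    }
    return false;
}

/* Count references that do not point at an object header. */
static size_t check_heap(void)
{
    size_t bad = 0, i = 0;
    while (i < heap_end) {
        size_t n = heap[i];
        if (n > heap_end - i - 1)
            return bad + 1;            /* header claims too many slots */
        for (size_t s = 1; s <= n; s++) {
            uintptr_t w = heap[i + s];
            if ((w & 1) == 0 && !is_object_start(w >> 1))
                bad++;
        }
        i += 1 + n;
    }
    return bad;
}
```

Running such a check before and after each collection narrows down when the corruption first appears, which is what makes a reproducible case easier to find.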
HTH Eliot
HTH, Rob
FIRST STACK DUMP
vawhigso@vawhigs.org [~/public_html/squeakelib/Cog]# kill -USR1 30247
vawhigso@vawhigs.org [~/public_html/squeakelib/Cog]# Received user signal, printing active stack:
0xff940ab8 I SmalltalkImage>garbageCollect -1207282732: a(n) SmalltalkImage
0xff940ad0 M Introducer class>areVatsRunning -1144872564: a(n) Introducer class
0xff940ae8 M PluggableButtonMorph>getModelState -1134952004: a(n) PluggableButtonMorph
0xff940b00 M PluggableButtonMorph>update: -1134952004: a(n) PluggableButtonMorph
0xff940b1c M StepMessage(MessageSend)>value -1134947088: a(n) StepMessage
0xff940b38 M StepMessage(MorphicAlarm)>value: -1134947088: a(n) StepMessage
0xff940b64 M WorldState>runLocalStepMethodsIn: -1215450852: a(n) WorldState
0xff940b90 M WorldState>runStepMethodsIn: -1215450852: a(n) WorldState
0xff940bac M PasteUpMorph>runStepMethods -1215450600: a(n) PasteUpMorph
0xff940bc8 M WorldState>doOneCycleNowFor: -1215450852: a(n) WorldState
0xff940be4 M WorldState>doOneCycleFor: -1215450852: a(n) WorldState
0xff940c00 M PasteUpMorph>doOneCycle -1215450600: a(n) PasteUpMorph
0xff940c20 I [] in Project class>spawnNewProcess -1138060792: a(n) Project class
-1133485544 s [] in
Segmentation fault
Smalltalk stack dump:
0xff940ab8 I SmalltalkImage>garbageCollect -1207282732: a(n) SmalltalkImage
0xff940ad0 M Introducer class>areVatsRunning -1144872564: a(n) Introducer class
0xff940ae8 M PluggableButtonMorph>getModelState -1134952004: a(n) PluggableButtonMorph
0xff940b00 M PluggableButtonMorph>update: -1134952004: a(n) PluggableButtonMorph
0xff940b1c M StepMessage(MessageSend)>value -1134947088: a(n) StepMessage
0xff940b38 M StepMessage(MorphicAlarm)>value: -1134947088: a(n) StepMessage
0xff940b64 M WorldState>runLocalStepMethodsIn: -1215450852: a(n) WorldState
0xff940b90 M WorldState>runStepMethodsIn: -1215450852: a(n) WorldState
0xff940bac M PasteUpMorph>runStepMethods -1215450600: a(n) PasteUpMorph
0xff940bc8 M WorldState>doOneCycleNowFor: -1215450852: a(n) WorldState
0xff940be4 M WorldState>doOneCycleFor: -1215450852: a(n) WorldState
0xff940c00 M PasteUpMorph>doOneCycle -1215450600: a(n) PasteUpMorph
0xff940c20 I [] in Project class>spawnNewProcess -1138060792: a(n) Project class
SECOND STACK DUMP
vawhigso@vawhigs.org [~/public_html/squeakelib/Cog]# kill -USR1 7340
Received user signal, printing active stack:
vawhigso@vawhigs.org [~/public_html/squeakelib/Cog]#
0xffaf3398 I SmalltalkImage>garbageCollect -1207897132: a(n) SmalltalkImage
0xffaf33b0 M Introducer class>areVatsRunning -1145486964: a(n) Introducer class
0xffaf33c8 M PluggableButtonMorph>getModelState -1135564980: a(n) PluggableButtonMorph
0xffaf33e0 M PluggableButtonMorph>update: -1135564980: a(n) PluggableButtonMorph
0xffaf33fc M StepMessage(MessageSend)>value -1135561416: a(n) StepMessage
0xffaf3418 M StepMessage(MorphicAlarm)>value: -1135561416: a(n) StepMessage
0xffaf3444 M WorldState>runLocalStepMethodsIn: -1216065252: a(n) WorldState
0xffaf3470 M WorldState>runStepMethodsIn: -1216065252: a(n) WorldState
0xffaf348c M PasteUpMorph>runStepMethods -1216065000: a(n) PasteUpMorph
0xffaf34a8 M WorldState>doOneCycleNowFor: -1216065252: a(n) WorldState
0xffaf34c4 M WorldState>doOneCycleFor: -1216065252: a(n) WorldState
0xffaf34e0 M PasteUpMorph>doOneCycle -1216065000: a(n) PasteUpMorph
0xffaf3500 I [] in Project class>spawnNewProcess -1138675192: a(n) Project class
-1134099944 s [] in BlockClosure>newProcess
Received user signal, printing all processes:
Process 0xbc670278 priority 40
0xffaf3398 I SmalltalkImage>garbageCollect -1207897132: a(n) SmalltalkImage
0xffaf33b0 M Introducer class>areVatsRunning -1145486964: a(n) Introducer class
0xffaf33c8 M PluggableButtonMorph>getModelState -1135564980: a(n) PluggableButtonMorph
0xffaf33e0 M PluggableButtonMorph>update: -1135564980: a(n) PluggableButtonMorph
0xffaf33fc M StepMessage(MessageSend)>value -1135561416: a(n) StepMessage
0xffaf3418 M StepMessage(MorphicAlarm)>value: -1135561416: a(n) StepMessage
0xffaf3444 M WorldState>runLocalStepMethodsIn: -1216065252: a(n) WorldState
0xffaf3470 M WorldState>runStepMethodsIn: -1216065252: a(n) WorldState
0xffaf348c M PasteUpMorph>runStepMethods -1216065000: a(n) PasteUpMorph
0xffaf34a8 M WorldState>doOneCycleNowFor: -1216065252: a(n) WorldState
0xffaf34c4 M WorldState>doOneCycleFor: -1216065252: a(n) WorldState
0xffaf34e0 M PasteUpMorph>doOneCycle -1216065000: a(n) PasteUpMorph
0xffaf3500 I [] in Project class>spawnNewProcess -1138675192: a(n) Project class
-1134099944 s [] in BlockClosure>newProcess
Process 0xbc97fc44 priority 50
0xffafa4c0 M WeakArray class>finalizationProcess -1210174624: a(n) WeakArray class
0xffafa4e0 I [] in WeakArray class>restartFinalizationProcess -1210174624: a(n) WeakArray class
0xffafa500 I [] in BlockClosure>newProcess -1130890396: a(n) BlockClosure
Process 0xb8125518 priority 80 widowed caller frame
EventualProcess 0xbbbd6e10 priority 60
-1134095516 s [] in Delay>wait
-1134049404 s BlockClosure>ifCurtailed:
-1134095648 s Delay>wait
-1134049312 s [] in VatTPManager class>finalizationLoop
-1145210696 s BlockClosure>repeat
-1145213336 s VatTPManager class>finalizationLoop
-1145213520 s [] in VatTPManager class>?
EventualProcess 0xbc5c9174 priority 30
-1134783048 s SharedQueue>next
-1134783140 s [] in Vat>processSends
-1134751716 s BlockClosure>ifCurtailed:
-1134783276 s Vat>processSends
-1134783984 s [] in EventualProcess>setupContext
Process 0xbc86c01c priority 60
0xffaf44c0 I RFBEventSensor(InputSensor)>userInterruptWatcher -1130865472: a(n) RFBEventSensor
0xffaf44e0 I [] in RFBEventSensor(InputSensor)>installInterruptWatcher -1130865472: a(n) RFBEventSensor
0xffaf4500 I [] in BlockClosure>newProcess -1132019908: a(n) BlockClosure
Process 0xbc86c1dc priority 60 widowed caller frame [unprintable garbage bytes]
Process 0xbc86c3c8 priority 60
0xffaf74c0 I SmalltalkImage>lowSpaceWatcher -1207897132: a(n) SmalltalkImage
0xffaf74e0 I [] in SmalltalkImage>installLowSpaceWatcher -1207897132: a(n) SmalltalkImage
0xffaf7500 I [] in BlockClosure>newProcess -1132018968: a(n) BlockClosure
Process 0xbc985ef0 priority 60 widowed caller frame [unprintable garbage bytes]
Segmentation fault
Can't dump Smalltalk stack. Not in VM thread
Most recent primitives wait signal millisecondClockValue wait signal at:put: at:put: at:put: at:put: at:put: at:put: at:put: at:put: perform:with: basicNew: basicNew value: millisecondClockValue basicNew basicNew new: at:put: at:put: at:put: basicNew basicNew basicNew basicNew: at:put: replaceFrom:to:with:startingAt: replaceFrom:to:with:startingAt: basicNew: at:put: replaceFrom:to:with:startingAt: replaceFrom:to:with:startingAt: species basicNew: replaceFrom:to:with:startingAt: compare:with:collated: at:put: at:put: at:put: at:put: at:put: at:put: at:put: at:put: perform:withArguments: perform: species basicNew: basicAt:put: basicAt:put: basicAt:put: basicAt:put: basicAt:put: basicAt:put: basicAt:put: basicAt:put: basicAt:put: species basicNew: basicReplaceFrom:to:with:startingAt: species basicNew: basicAt:put: basicAt:put: species basicNew: basicReplaceFrom:to:with:startingAt: species basicNew: basicAt:put: species basicNew: basicReplaceFrom:to:with:startingAt: new: basicNew at:put: at:put: at:put: new: basicNew at:put: at:put: at:put: new: basicNew at:put: at:put: at:put: new: basicNew at:put: at:put: at:put: primitiveGarbageCollect millisecondClockValue signal at:put: at:put: at:put: at:put: at:put: at:put: suspend primitiveResume at:put: at:put: at:put: at:put: suspend primitiveResume at:put: at:put: primSignal:atMilliseconds: millisecondClockValue wait millisecondClockValue millisecondClockValue wait signal at:put: at:put: millisecondClockValue primSignal:atMilliseconds: millisecondClockValue wait value wait signal wait value signal millisecondClockValue primSignal:atMilliseconds: millisecondClockValue wait signal primSocketConnectionStatus: millisecondClockValue basicNew: byteAt:put: byteAt:put: species basicNew: replaceFrom:to:with:startingAt: replaceFrom:to:with:startingAt: species basicNew: replaceFrom:to:with:startingAt: replaceFrom:to:with:startingAt: basicNew findNextHandlerContextStarting tempAt: tempAt: tempAt:put: valueNoContextSwitch tempAt: 
valueWithArguments: findNextUnwindContextUpTo: tempAt: tempAt:put: tempAt: terminateTo: value tempAt:put: findNextUnwindContextUpTo: terminateTo: primSocketConnectionStatus: value value millisecondClockValue primSocketConnectionStatus: millisecondClockValue millisecondClockValue basicNew valueNoContextSwitch millisecondClockValue wait signal at:put: at:put: at:put: millisecondClockValue primSignal:atMilliseconds: millisecondClockValue wait signal wait basicNew new: someInstance nextInstance at:put: species new: replaceFrom:to:with:startingAt: at:put: at:put: at:put: at:put: at:put: at:put: at:put: at:put: perform:withArguments: perform: species basicNew: basicAt:put: basicAt:put: basicAt:put: basicAt:put: basicAt:put: basicAt:put: basicAt:put: basicAt:put: basicAt:put: species basicNew: basicReplaceFrom:to:with:startingAt: species basicNew: basicAt:put: basicAt:put: species basicNew: basicReplaceFrom:to:with:startingAt: species basicNew: basicAt:put: species basicNew: basicReplaceFrom:to:with:startingAt: new: basicNew at:put: at:put: at:put: new: basicNew at:put: at:put: at:put: new: basicNew at:put: at:put: at:put: new: basicNew at:put: at:put: at:put: primitiveGarbageCollect