On Sep 1, 2009, at 10:12 AM, Andreas Raab wrote:
David Farber wrote:
On Sep 1, 2009, at 9:28 AM, Ian Piumarta wrote:
Is enabling IMAGE_DUMP in sqUnixMain.c sufficient for what you need? We could also have an option that names an image dump file, disabling the dump if no name is given but still printing the stack.
I was looking at this code earlier this year. I couldn't convince myself that the resulting image would actually be usable. If the VM just dumps the image, then won't you (at the very least) lose all your file handles, including the handle to the changes file? And if you've lost your handle to the changes file, then the image won't, for any practical purposes, be usable. Am I missing something?
The simulator. It can be used to inspect the contents of an image file regardless of whether you can run it or not. Useful for post-mortem analysis.
So how would you use the simulator to do a post-mortem?
Here is what happened to me back at the end of March. I was running a Seaside/Pier site headless [1] on CentOS 5. For persistence, Pier was supposed to snapshot the image after any relevant changes. Somewhere, somehow, an error arose in the snapshot code path, so Pier stopped snapshotting the image (even though the rest of the app ran fine and was accumulating data). I tried to manually snapshot the image (and save precious data) but, because the error was in the snapshot code path, all I managed to do was hang the web interface. Out of the box, Seaside doesn't do any error logging [2], and I wasn't running a VM that had IMAGE_DUMP or stack printing enabled [3]. So I was stuck with an image that was running (with unsaved data) but completely incommunicado.
I was able to core dump the running image [4] and manually reconstruct an image file (i.e. what I would have had if my VM had IMAGE_DUMP enabled). I loaded the image into the simulator, but I wasn't able to really do anything with it. Specifically, I didn't see any way to debug why the image was failing to snapshot. I was able to recover the data in the image [5]. But I still don't know what killed my image. How could the simulator help me figure out what went wrong?
David
[1] I wasn't running the image under any kind of remote X setup, which seems to be popular amongst Seaside deployers.
[2] I won't make the mistake of deploying a Seaside app without error logging again.
[3] I won't make the mistake of deploying a Seaside app on a VM without stack printing again.
[4] On Linux, gcore will give you a core dump without terminating the process. A version for OS X is at http://osxbook.com/book/bonus/chapter8/core/
[5] I wrote code that will recover an object tree and write it to a ReferenceStream. At some point I should package and release it.
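
For anyone wanting to script the gcore step from footnote [4], here is a minimal Python sketch. It assumes the gcore script that ships with gdb is on the PATH and that it accepts "-o prefix", writing the dump to "prefix.pid". The function names and the "squeak" prefix are made up for illustration; nothing here comes from the VM sources, and it does not cover reconstructing an image file from the core.

```python
import shutil
import subprocess


def gcore_command(pid, prefix="core"):
    """Build the gcore argument list for a live process (assumes gdb's gcore)."""
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    return ["gcore", "-o", prefix, str(pid)]


def expected_core_path(pid, prefix="core"):
    """gdb's gcore names its output '<prefix>.<pid>'."""
    return f"{prefix}.{pid}"


def dump_core(pid, prefix="core"):
    """Dump a core of the running process without terminating it."""
    if shutil.which("gcore") is None:
        raise RuntimeError("gcore not found on PATH (it ships with gdb)")
    # check=True raises CalledProcessError if gcore exits with an error,
    # e.g. when we lack permission to attach to the process.
    subprocess.run(gcore_command(pid, prefix), check=True)
    return expected_core_path(pid, prefix)
```

Calling dump_core(vm_pid, "squeak") with the VM's process id would, under these assumptions, leave a file named squeak.<pid> in the current directory while the VM keeps running.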