Hi Christoph,

> in technical terms it is still an abuse of the stack
Indeed, that's what it is :) Given the extremely short lifespan of the abused context - literally one return till VM raising an exception I thought it acceptable. There's a note "backup the pc before nilling for the sake of debugging." in the comment but maybe not clear and visible enough.

> Or would it be fine to just store the pc in an unused tempvar?
I wanted to associate the information about the original pc with the problematic context but it can't be known in advance which one will be problematic (I assume you meant a tempvar in the #cannotReturn method - tried but didn't like much) hence the simple push.

> To me, this more looks like a bug in the simulator. The simulator should mimic the behavior of the VM, so if the VM manages to put a #cannotReturn: context on the stack, so should the simulator. Example [...]
Even better... I suspected it may be a more general debugger issue. I was just looking for such an example - thanks!

In that case Context>>#step and #stepToSendOrReturn are hot candidates for placing the changes. What would you think, however, about this: It looks like #interpretNextInstructionFor: may be a single channel where all debugging requests must go through but I'm still just skimming the surface of the Debugger code:

```
interpretNextInstructionFor: client
    "Send to the argument, client, a message that specifies the type of the next instruction."

    (self isContext and: [self sender notNil and: [self sender isDead]]) ifTrue:
        [^Processor debugWithTitle: 'Illegal return to a dead context' translated full: false].
    ^self method encoderClass interpretNextInstructionFor: client in: self
```

I tried sending #cannotReturn instead of Processor debugWithTitle: but it created infinite loops.

What do you think?
best,
Jaromir

PS: re tests - I'll reserve some time this weekend :)
--

Jaromir Matas



------ Original Message ------
From christoph.thiede@student.hpi.uni-potsdam.de
To squeak-dev@lists.squeakfoundation.org
Date 11/19/2023 8:57:44 PM
Subject [squeak-dev] Re: Resuming on BlockCannotReturn exception

Hi Jaromir,
 
On 2023-11-19T18:56:54+00:00, mail@jaromir.net wrote:
 
Hi Christoph,
all good points, thanks!
 
> Do we already have tests for that expectation?
I haven't figured out yet how to test it :)
 
Hm, maybe via the DebuggerTests again? When try to resume from your second (ifTrue:) example, it should open a "cannot return" debugger. Tricky, I know. :-)
 
 
> Do I understand correctly that the pushed pc is only visible when
manually inspecting the context?
Yes
 
> Given the expected rareness [...] I wonder whether this information is
actually required. If yes, why don't store it in the BlockCannotReturn
exception instead?
Definitely not required, just though it might come handy :) And
precisely because of the rareness I just pushed it on the problematic
context's stack in case you inspect it and wonder what the instruction
where the BCR started from was. We could push more verbose info like
`push: 'original pc: ', pc` or store in BCR exception but KISS :)
 
Fair ... in technical terms it is still an abuse of the stack, I think, so maybe at least add a comment to that "push: pc" to avoid confusion in the reader? Or would it be fine to just store the pc in an unused tempvar?
 
 
> Can you give me an example of where this extra notification adds any
value for the user?
I should have given it in the message, sorry. You can e.g. run
 
[[true ifTrue: [^ 1]] on: BlockCannotReturn do: [:ex | self halt. ex
resume] ] fork
 
highlight Context>>cannotReturn: line, go three times Over and then dive
in via Into, Over or Through :) You get a strange error without the fix
and 'Illegal attempt...' with the fix.
 
I by no means understand exactly how the Debugger works so you may have
a way better idea where to place a fix or maybe even fix the pc = nil
debugging situation cleanly instead of my ugly patch.
 
 
Ah, I see! To me, this more looks like a bug in the simulator. The simulator should mimic the behavior of the VM, so if the VM manages to put a #cannotReturn: context on the stack, so should the simulator.
 
Example:
 
process :=
[(c := thisContext) pc: nil.
2+3] newProcess.
process runUntil: [:ctx | ctx selector = #cannotReturn:].
self assert: process suspendedContext sender = c.
self assert: process suspendedContext arguments = {c}.
 
... if my assumptions are correct.
 
I think Context>>#step and Context>>#stepToSendOrReturn should be made aware of a nil pc and push a #cannotReturn: context on the stack, like in Context>>#return:from:. Do you maybe feel like submitting an inbox version (plus ideally tests) for this? :-) If we get that to work, this should be a much cleaner solution than adding another check in the debugger to explain our own simulator issues. :-)
 
 
Thanks for your input!
 
--
 
Jaromir Matas
 
 
 
 
------ Original Message ------
>From christoph.thiede(a)student.hpi.uni-potsdam.de
To squeak-dev(a)lists.squeakfoundation.org
Date 11/19/2023 7:04:35 PM
Subject [squeak-dev] Re: Resuming on BlockCannotReturn exception
 
>Hi Jaromir,
>
>thanks for looking into this!
>
>> [[^ 1] on: BlockCannotReturn do: #resume ] fork. "VM crash"
>
>Indeed, this could and should be handled more robustly in the image!
>
>> [[true ifTrue: [^ 1]] on: BlockCannotReturn do: #resume ] fork. "Illegal return from ^1"
>
>Is this really illegal? Remember that resuming an exception is generally something special, and as this exception refers to a return instruction, resuming it is something I would definitely consider metaprogramming or worse. Thinking of ^ 1 as a syntactic sugar for "thisContext home return: 1", I would actually find it plausible that resuming from the exception would skip the #return: send. However, I have not read the entire conversation, so if Eliot says something different, please ignore this objection. :-)
>
>Kernel-jar.1535:
>
>> + closureOrNil ifNotNil: [^self cannotReturn: result to: self home sender; push: pc; pc: nil].
>
>Do I understand correctly that the pushed pc is only visible when manually inspecting the context? This is something very unexpected to me, it just feels as if you did not find any better place to store this information. Especially if there is no ready path available to access it later. Or am I wrong? Given the expected rareness of those events and the even greater rareness of multiple non-local returns from a single block context that has survived their home context, I wonder whether this information is actually required. If yes, why don't store it in the BlockCannotReturn exception instead?
>
>> Tools-jar.1240
>
>Can you give me an example of where this extra notification adds any value for the user? If I run any of your two examples without the change in the Debugger, I just get a Computation has been terminated error while the #cannotReturn: stack is still available, which already seems to give me all information required to identify the origin of the error, in particular due to the extensive comment in this method.
>
>Best,
>Christoph
>
>---
>Sent from Squeak Inbox Talk
>
>On 2023-11-19T17:44:24+00:00, mail(a)jaromir.net wrote:
>
>> Hi Eliot, all,
>>
>> I've sent Kernel-jar.1535 with a suggested fix dealing with the
>> following two examples discussed in this thread:
>>
>> [[^ 1] on: BlockCannotReturn do: #resume ] fork. "VM crash"
>>
>> [[true ifTrue: [^ 1]] on: BlockCannotReturn do: #resume ] fork.
>> "Illegal return from ^1"
>>
>> Now both examples work correctly.
>>
>> In addition, Tools-jar.1240 is an attempt to facilitate debugging the
>> resumption from the BCR exception. Currently the debugger opens with a
>> confusing error as a consequence of decoding an instruction with nil
>> index. In case you have a better idea where to place the fix, please let
>> me know.
>>
>> Thanks for your help,
>>
>> best,
>> Jaromir
>>
>> --
>> Jaromir Matas
>>
>>
>>
>> ------ Original Message ------
>> >From "Jaromir Matas" <mail(a)jaromir.net>
>> To "Eliot Miranda" <eliot.miranda(a)gmail.com>;
>> squeak-dev(a)lists.squeakfoundation.org
>> Date 11/18/2023 12:53:18 PM
>> Subject [squeak-dev] Re: Resuming on BlockCannotReturn exception
>>
>> >Hi Eliot,
>> >
>> >
>> >------ Original Message ------
>> >From "Jaromir Matas" <mail(a)jaromir.net>
>> >To "Eliot Miranda" <eliot.miranda(a)gmail.com>;
>> >squeak-dev(a)lists.squeakfoundation.org
>> >Date 11/18/2023 12:14:31 AM
>> >Subject [squeak-dev] Re: Resuming on BlockCannotReturn exception
>> >
>> >>Hi Eliot,
>> >>
>> >>How about to nil the pc just before making the return:
>> >>```
>> >>Context >> #cannotReturn: result
>> >>
>> >> self push: self pc. "backup the pc for the sake of debugging"
>> >> closureOrNil ifNotNil: [^self cannotReturn: result to: self home
>> >>sender; pc: nil].
>> >> Processor debugWithTitle: 'Computation has been terminated!'
>> >>translated full: false
>> >>```
>> >>The nilled pc should not even potentially interfere with the #isDead
>> >>now.
>> >>
>> >>I hope this is at least a step in the right direction :)
>> >>
>> >>However, there's still a problem when debugging the resumption of
>> >>#cannotReturn because the encoders expect a reasonable index. I
>> >>haven't figured out yet where to place a nil check - #step,
>> >>#stepToSendOrReturn... ?
>> >
>> >This debugger issue has nothing to do with the proposed change... the
>> >debugging fails in the current trunk image too:
>> >
>> >Run `[[^ 1] on: BlockCannotReturn do: [:ex | self halt. ex resume] ]
>> >fork`
>> >Click on the highlighted context
>> >Step into
>> >
>> >
>> >
>> >So, if the suggested solution makes sense to you, it would make
>> >'playing with fire' a bit safer :)
>> >
>> >best,
>> >Jaromir
>> >
>> >
>> >>
>> >>
>> >>Thanks again,
>> >>Jaromir
>> >>
>> >>
>> >>------ Original Message ------
>> >>From "Eliot Miranda" <eliot.miranda(a)gmail.com>
>> >>To "Jaromir Matas" <mail(a)jaromir.net>
>> >>Date 11/17/2023 8:36:50 PM
>> >>Subject Re: [squeak-dev] Re: Resuming on BlockCannotReturn exception
>> >>
>> >>>Hi Jaromir,
>> >>>
>> >>>>On Nov 17, 2023, at 7:05 AM, Jaromir Matas <mail(a)jaromir.net> wrote:
>> >>>>
>> >>>>
>> >>>>Eliot, hi again,
>> >>>>
>> >>>>Please disregard my previous comment about nilling the contexts that
>> >>>>have returned. We are indeed talking about the context directly
>> >>>>under the #cannotReturn context which is totally different from the
>> >>>>home context in another thread that's gone.
>> >>>>
>> >>>>I may still be confused but would nilling the pc of the context
>> >>>>directly under the cannotReturn context help? Here's what I mean:
>> >>>>```
>> >>>>Context >> #cannotReturn: result
>> >>>>
>> >>>> closureOrNil ifNotNil: [^self pc: nil; cannotReturn: result to:
>> >>>>self home sender].
>> >>>> Processor debugWithTitle: 'Computation has been terminated!'
>> >>>>translated full: false.
>> >>>>```
>> >>>>Instead of crashing the VM invokes the debugger with the
>> >>>>'Computation has been terminated!' message.
>> >>>>
>> >>>>Does this make sense?
>> >>>
>> >>>Nearly. But it loses the information on what the pc actually is, and
>> >>>that’s potentially vital information. So IMO the ox should only be
>> >>>nilled between the BlockCannotReturn exception being created and
>> >>>raised.
>> >>>
>> >>>[But if you try this don’t be surprised if it causes a few temporary
>> >>>problems. It looks to me that without a little refactoring this
>> >>>could easily cause an infinite recursion around the sending of
>> >>>isDead. I’m sure you’ll be able to fix the code to work correctly]
>> >>>
>> >>>>Thanks,
>> >>>>Jaromir
>> >>>>
>> >>>>
>> >>>>------ Original Message ------
>> >>>>From "Jaromir Matas" <mail(a)jaromir.net>
>> >>>>To "Eliot Miranda" <eliot.miranda(a)gmail.com>; "The general-purpose
>> >>>>Squeak developers list" <squeak-dev(a)lists.squeakfoundation.org>
>> >>>>Date 11/17/2023 10:15:17 AM
>> >>>>Subject [squeak-dev] Re: Resuming on BlockCannotReturn exception
>> >>>>
>> >>>>>Hi Eliot,
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>------ Original Message ------
>> >>>>>From "Eliot Miranda" <eliot.miranda(a)gmail.com>
>> >>>>>To "Jaromir Matas" <mail(a)jaromir.net>
>> >>>>>Cc "The general-purpose Squeak developers list"
>> >>>>><squeak-dev(a)lists.squeakfoundation.org>
>> >>>>>Date 11/16/2023 11:52:45 PM
>> >>>>>Subject Re: Re[2]: [squeak-dev] Re: Resuming on BlockCannotReturn
>> >>>>>exception
>> >>>>>
>> >>>>>>Hi Jaromir,
>> >>>>>>
>> >>>>>>On Thu, Nov 16, 2023 at 2:22 PM Jaromir Matas <mail(a)jaromir.net>
>> >>>>>>wrote:
>> >>>>>>>Hi Nicolas, Eliot,
>> >>>>>>>
>> >>>>>>>here's what I understand is happening (see the enclosed
>> >>>>>>>screenshot):
>> >>>>>>>
>> >>>>>>>1) we fork a new process to evaluate [^1]
>> >>>>>>>2) the new process evaluates [^1] which means instruction 18 is
>> >>>>>>>being evaluated, hence pc points to instruction 19 now
>> >>>>>>>3) however, the home context where ^1 should return to is gone by
>> >>>>>>>this time (the process that executed the fork has already
>> >>>>>>>returned - notice the two up arrows in the debugger screenshot)
>> >>>>>>>4) the VM can't finish the instruction and returns control to the
>> >>>>>>>image via placing the #cannotReturn: context on top of the [^1]
>> >>>>>>>context
>> >>>>>>>5) #cannotReturn: evaluation results in signalling the BCR
>> >>>>>>>exception which is then handled by the #resume handler
>> >>>>>>> (in our debugged case the [:ex | self halt. ex resume]
>> >>>>>>>handler)
>> >>>>>>>6) ex resume is evaluated, however, this means requesting the VM
>> >>>>>>>to evaluate instruction 19 of the [^1] context - which is past
>> >>>>>>>the last instruction of the context and the crash ensues
>> >>>>>>>
>> >>>>>>>I wonder whether such situations could/should be prevented inside
>> >>>>>>>the VM or whether such an expectation is wrong for some reason.
>> >>>>>>
>> >>>>>>As Nicolas says, IMO this is best done at the image level.
>> >>>>>>
>> >>>>>>It could be prevented in the VM, but at great cost, and only
>> >>>>>>partially. The performance issue is that the last bytecode in a
>> >>>>>>method is not marked in any way, and that to determine the last
>> >>>>>>bytecode the bytecodes must be symbolically evaluated from the
>> >>>>>>start of the method. See implementors of endPC at the image level
>> >>>>>>(which defer to the method trailer) and implementors of endPCOf:
>> >>>>>>in the VMMaker code. Doing this every time execution commences is
>> >>>>>>prohibitively expensive. The "only partially" issue is that
>> >>>>>>following the return instruction may be other valid bytecodes, but
>> >>>>>>these are not a continuation.
>> >>>>>>
>> >>>>>>
>> >>>>>>Consider the following code in some block:
>> >>>>>> [self expression ifTrue:
>> >>>>>> [^1].
>> >>>>>> ^2
>> >>>>>>
>> >>>>>>The bytecodes for this are
>> >>>>>> pushReceiver
>> >>>>>> send #expression
>> >>>>>> jumpFalse L1
>> >>>>>> push 1
>> >>>>>> methodReturnTop
>> >>>>>>L1
>> >>>>>> push 2
>> >>>>>> methodReturnTop
>> >>>>>>
>> >>>>>>Clearly if expression is true these should be *no* continuation in
>> >>>>>>which ^2 is executed.
>> >>>>>
>> >>>>>Well, in that case there's a bug because the computation in the
>> >>>>>following example shouldn't continue past the [^1] block but it
>> >>>>>silently does:
>> >>>>>`[[true ifTrue: [^ 1]] on: BlockCannotReturn do: #resume ] fork`
>> >>>>>
>> >>>>>The bytecodes are
>> >>>>> push true
>> >>>>> jumpFalse L1
>> >>>>> push 1
>> >>>>> returnTop
>> >>>>>L1
>> >>>>> push nil
>> >>>>> blockReturn
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>>
>> >>>>>>So even if the VM did try and detect whether the return was at the
>> >>>>>>last block method, it would only work for special cases.
>> >>>>>>
>> >>>>>>
>> >>>>>>It seems to me the issue is simply that the context that cannot be
>> >>>>>>returned from should be marked as dead (see Context>>isDead) by
>> >>>>>>setting its pc to nil at some point, presumably after copying the
>> >>>>>>actual return pc into the BlockCannotReturn exception, to avoid
>> >>>>>>ever trying to resume the context.
>> >>>>>
>> >>>>>Does this mean, in other words, that every context that returns
>> >>>>>should nil its pc to avoid being "wrongly" reused/executed in the
>> >>>>>future, which concerns primarily those being referenced somewhere
>> >>>>>hence potentially executable in the future, is that right?
>> >>>>>Hypothetical question: would nilling the pc during returns "fix"
>> >>>>>the example?
>> >>>>>Thanks a lot for helping me understand this.
>> >>>>>Best,
>> >>>>>Jaromir
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>>
>> >>>>>>>Thanks,
>> >>>>>>>Jaromir
>> >>>>>>>
>> >>>>>>><bdxuqalu.png>
>> >>>>>>>
>> >>>>>>>------ Original Message ------
>> >>>>>>>From "Eliot Miranda" <eliot.miranda(a)gmail.com>
>> >>>>>>>To "Jaromir Matas" <mail(a)jaromir.net>; "The general-purpose
>> >>>>>>>Squeak developers list" <squeak-dev(a)lists.squeakfoundation.org>
>> >>>>>>>Date 11/16/2023 6:48:43 PM
>> >>>>>>>Subject Re: [squeak-dev] Re: Resuming on BlockCannotReturn
>> >>>>>>>exception
>> >>>>>>>
>> >>>>>>>>Hi Jaromir,
>> >>>>>>>>
>> >>>>>>>>>On Nov 16, 2023, at 3:23 AM, Jaromir Matas <mail(a)jaromir.net>
>> >>>>>>>>>wrote:
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>Hi Nicloas,
>> >>>>>>>>>No no, I don't have any practical scenario in mind, I'm just
>> >>>>>>>>>trying to understand why the VM is implemented like this,
>> >>>>>>>>>whether there were a reason to leave this possibility of a
>> >>>>>>>>>crash, e.g. it would slow down the VM to try to prevent such a
>> >>>>>>>>>dumb situation (who would resume from BCR in his right mind? :)
>> >>>>>>>>>) - or perhaps I have overlooked some good reason to even keep
>> >>>>>>>>>this behavior in the VM. That's all.
>> >>>>>>>>
>> >>>>>>>>Let’s first understand what’s really happening. Presumably at
>> >>>>>>>>tone point a context is resumed those pc is already at the block
>> >>>>>>>>return bytecode (effectively, because it crashes in JITted code,
>> >>>>>>>>but I bet the stack vm will crash also, but not as cleanly - it
>> >>>>>>>>will try and execute the bytes in the encoded method trailer).
>> >>>>>>>>So which method actually sends resume, and to what, and what
>> >>>>>>>>state is resume’s receiver when resume is sent?
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>Thanks for your reply.
>> >>>>>>>>>Regards,
>> >>>>>>>>>Jaromir
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>------ Original Message ------
>> >>>>>>>>>From "Nicolas Cellier" <nicolas.cellier.aka.nice(a)gmail.com>
>> >>>>>>>>>To "Jaromir Matas" <mail(a)jaromir.net>; "The general-purpose
>> >>>>>>>>>Squeak developers list" <squeak-dev(a)lists.squeakfoundation.org>
>> >>>>>>>>>Date 11/16/2023 7:20:20 AM
>> >>>>>>>>>Subject Re: [squeak-dev] Resuming on BlockCannotReturn
>> >>>>>>>>>exception
>> >>>>>>>>>
>> >>>>>>>>>>Hi Jaromir,
>> >>>>>>>>>>Is there a scenario where it would make sense to resume a
>> >>>>>>>>>>BlockCannotReturn?
>> >>>>>>>>>>If not, I would suggest to protect at image side and override
>> >>>>>>>>>>#resume.
>> >>>>>>>>>>
>> >>>>>>>>>>Le mer. 15 nov. 2023, 23:42, Jaromir Matas <mail(a)jaromir.net>
>> >>>>>>>>>>a écrit :
>> >>>>>>>>>>>Hi Eliot, Christoph, All,
>> >>>>>>>>>>>
>> >>>>>>>>>>>It's known the following example crashes the VM. Is this an
>> >>>>>>>>>>>intended behavior or a "tolerated bug"?
>> >>>>>>>>>>>
>> >>>>>>>>>>>`[[^ 1] on: BlockCannotReturn do: #resume] fork`
>> >>>>>>>>>>>
>> >>>>>>>>>>>I understand why it crashes: the non-local return has nowhere
>> >>>>>>>>>>>to return to and so resuming the computation leads to a
>> >>>>>>>>>>>crash. But why not raise another BCR exception to prevent the
>> >>>>>>>>>>>crash? Potential infinite loop? Perhaps I'm just missing the
>> >>>>>>>>>>>purpose of this behavior...
>> >>>>>>>>>>>
>> >>>>>>>>>>>Thanks for an explanation.
>> >>>>>>>>>>>
>> >>>>>>>>>>>Best,
>> >>>>>>>>>>>Jaromir
>> >>>>>>>>>>>
>> >>>>>>>>>>>--
>> >>>>>>>>>>>
>> >>>>>>>>>>>Jaromir Matas
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>--
>> >>>>>>_,,,^..^,,,_
>> >>>>>>best, Eliot
>> >>>><Context-cannotReturn.st>
>
 
---
Sent from Squeak Inbox Talk