Re: [Vm-dev] Exploring the simulator (was Re: REPL image for simulation)

12 Jun 2016

Hi Ben,
On Sun, Jun 12, 2016 at 10:36 AM, Ben Coman btc@openinworld.com wrote:
...
On Sun, Jun 12, 2016 at 10:59 PM, Clément Bera bera.clement@gmail.com
wrote:
...
Hi again,
On Sun, Jun 12, 2016 at 10:44 AM, Clément Bera bera.clement@gmail.com
wrote:
...
...
Hi Ben,
I'm glad you're now looking into the JIT. If you have some blog or
something, please write an experience report about you looking into the
simulator. It's helpful for us to have noise around the VM.
Cool. I'll have a go.
...
...
On Sun, Jun 12, 2016 at 8:35 AM, Ben Coman btc@openinworld.com wrote:
...
I am stepping for the first time through the CogVM, having [set break
selector...] forkAt:
After stepping in a few times I get to #activateCoggedNewMethod.
  CogVMSimulatorLSB(CoInterpreter)>>dispatchOn:in:
  CogVMSimulatorLSB(CoInterpreter)>>sendLiteralSelector1ArgBytecode
  CogVMSimulatorLSB(CoInterpreter)>>commonSendOrdinary
  CogVMSimulatorLSB(CoInterpreter)>>internalExecuteNewMethod
  CogVMSimulatorLSB(CoInterpreter)>>activateCoggedNewMethod
Here from the code at the top.
    methodHeader := self rawHeaderOf: newMethod.
    self assert: (self isCogMethodReference: methodHeader).
    cogMethod := self cCoerceSimple: methodHeader to: #'CogMethod *'.
    methodHeader := cogMethod methodHeader.
I guess methodHeader's double assignment above is related to the
machine code frame having two addresses as Clement described...
Errr... I don't really fancy the way you say it but I think yes that's
it.
...
...
A method can have 2 addresses, the address of the bytecoded version in
the heap and the address of its jitted version in the machine code zone. In
the machine code frame printing, the simulator displays the 2 addresses.
But the frame has a single pointer to the method.
...
...
So what you're looking at is the dispatch logic from the bytecoded
method to the jitted method. When the JIT compiles a bytecoded method to
machine code, it replaces the bytecoded method's compiled method header
(first literal) with a pointer to the jitted version. The machine code
version of the method keeps the compiled method header, so accessing it
differs between methods compiled to machine code and methods not compiled
to machine code.
...
...
#rawHeaderOf: answers the first literal of the bytecoded method which
is a pointer to the jitted version of the method if the method has a jitted
version, else is the compiled method header. In the code you show, the VM
ensures the method has a jitted version with the assertion, hence the
compiled method header is fetched from the jitted version.
I think I've got it. So upon JITing, CompiledMethod and its literals
and bytecodes don't move.
Only its bytecodeHeader is manipulated and re-purposed.
Before JIT...
compiledMethod := { bytecodeHeader, literals, bytecodes }.
bytecodeHeader := compiledMethod at: 1
After JIT something like...
cogMethod := { cogMethodHeader, bytecodeHeader, machineCode }
compiledMethod := { pointerTo_cogMethod, literals, bytecodes }.
rawHeader := compiledMethod at: 1
cogMethodHeader := dereferenced(rawHeader) at: 1.
Right.  When a method is cogged (jitted) its header is set to point to the
Cog method (machine code method), and the actual header is stashed inside
the Cog method.  This is invisible to the image because only the objectAt:
primitive accesses CompiledMethod literals and this primitive checks.  In
the VM all points where methodHeader is accessed must check for a normal
method (header is a SmallInteger) and a cogged method (header is not a
SmallInteger).
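The header swap described above can be modelled in a few lines. This is only a sketch in Python, not Slang or VM code: the class names, the slots list and the jit function are hypothetical stand-ins, and the "is it a SmallInteger" check is approximated by a type test. The header value 16r208000B is the mthhdr shown later in this thread.

```python
from dataclasses import dataclass


@dataclass
class CogMethod:
    """Hypothetical stand-in for a machine-code method."""
    method_header: int   # the original header, stashed at JIT time
    machine_code: bytes  # placeholder for the generated code


class CompiledMethod:
    """Hypothetical stand-in: slot 0 is the header, then the literals."""
    def __init__(self, header, literals, bytecodes):
        self.slots = [header, *literals]
        self.bytecodes = bytecodes


def raw_header_of(method):
    # Whatever sits in the header slot: an integer, or a CogMethod reference.
    return method.slots[0]


def is_cog_method_reference(raw):
    # Assumption: the real VM tests "header is not a SmallInteger" instead.
    return isinstance(raw, CogMethod)


def method_header_of(method):
    # Every reader of the header has to handle both cases.
    raw = raw_header_of(method)
    return raw.method_header if is_cog_method_reference(raw) else raw


def jit(method):
    # Only the header slot changes; literals and bytecodes stay in place.
    cog = CogMethod(method_header=raw_header_of(method), machine_code=b"")
    method.slots[0] = cog
    return cog
```

Before jit the raw header and the method header are the same integer; afterwards the raw header is the CogMethod and method_header_of still answers the original integer.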
...
...
...
...
...
...
On Mon, May 30, 2016 at 4:12 PM, Clément Bera <
bera.clement@gmail.com> wrote:
...
...
...
...
...
> Now that you've printed the frame, you can see the method addresses
in this line:
...
...
...
...
...
> 16r103144:      method:    16r51578  16r102BDD0 16r102BDD0: a(n)
CompiledMethod.
...
...
...
...
...
> This is a machine code frame, so the method has two addresses:
> 16r51578 => in generated method, so you need to use
[disassembleMethod/trampoline...] and write down the hex to see the
disassembly.
...
...
...
...
...
> 16r102BDD0 => in the heap. This is the bytecode version of the
method. You can print it using [print oop...]
...
...
...
This time...
[print ext head frame] ==>
  16r101214 M BlockClosure>forkAt: 16r2FC420: a(n) BlockClosure
  16r101210: method:     16rBBF0  16rC4E948 16rC4E948: a(n)
CompiledMethod
...
...
...
self rawHeaderOf: newMethod ==> 16rBBF0
So the "raw header" is the cogged method.
Looking at the output below, the space ship operator <-> seems to link
between cogged method headers like a call stack, except   #forkAt:
calls  #newProcess  which calls  #asContext
[print cog method for...] 16rBBF0 ==>
  16rBBF0 <-> 16rBC80: method: 16rC4E948 selector: 16r6CC798 forkAt:
[print cog method for...] 16rBC80 ==>
   16rBC80 <-> 16rBEA8: method: 16rC51970 prim 19 selector: 16r6D1620
newProcess
...
...
...
[print cog method for...] 16rBEA8 ==>
   16rBEA8 <->     16rBF28: method:   16rC518C0 selector:   16r76A600
asContext
...
...
...
However the links don't seem to go back up the call stack but forward,
to statements to be executed in the future.   So I am confused?
Yeah it's the jitted version of the method header address, then <->,
then the jitted method entry point address, the bytecode version address,
selector address.
...
...
The cogMethod header is used to store the bytecoded compiled method
header (because it was replaced with a pointer to the cogMethod) and
various flags.
...
...
...

Considering further [print cog method for...] 16rBBF0 ==>
  16rBBF0 <-> 16rBC80: method: 16rC4E948 selector: 16r6CC798 forkAt:
[print oop...] 16r6CC798 ==>
   a(n) ByteSymbol nbytes 7  forkAt:
Clement advised earlier that the bytecode version of the method is this...
[print oop...] 16rC4E948 ==>
  16rC4E948: a(n) CompiledMethod nbytes 37
     16rBBF0  is in generated methods
   16r6D1620 #newProcess   16r6CC650 #priority:   16r6CC690 #resume
   16r6CC798 #forkAt:   16rAE5490 a ClassBinding #BlockClosure ->
16r0088D618
...
...
...
16rC4E968:  70/112 D0/208 88/136 10/16 E1/225 87/135 D2/210 7C/124
  16rC4E970:  28/40 AF/175 BA/186 F3/243 20/32
Now I've been a bit slow on the uptake and only just realised, but to
confirm...
...
...
...
the line 16r6CC798 is the one specifying the method as
BlockClosure>>forkAt:
...
...
16r6CC798 is the address of the selector #forkAt:
Sorry I wasn't clear.  I wasn't referring to the address itself of the
selector - that was just a line reference.  My insight I wanted to
confirm was that the last oop before the bytecode was...
     a ClassBinding #BlockClosure -> 16r0088D618
and the next last before that was...
    #forkAt:   16rAE5490
indicating the output of [print oop...] was method BlockClosure>>forkAt: ,
while above that line are the methods called by #forkAt: and below it
is the bytecode.
Ahhh, actually I just saw this relevant comment in CompiledMethod...
"The last literal in a CompiledMethod must be its
methodClassAssociation, a binding whose value is the class the method
is installed in.  The methodClassAssociation is used to implement
super sends.  If a method contains no super send then its
methodClassAssociation may be nil (as would be the case for example of
methods providing a pool of inst var accessors). By convention the
penultimate literal of a method is either its selector or an instance
of AdditionalMethodState. "
So it seems it won't always show the Class>>method, but often will.
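To make the convention in that comment concrete, here is a small Python sketch that derives "Class>>selector" from the last two literals when it can. It is an illustration only: the literal frame is modelled as a plain list, a binding as a (name, value) tuple, and anything that is not a string in the penultimate slot (for example an AdditionalMethodState) is simply treated as "no selector available". None of these representations come from the VM.

```python
def describe_method(literals):
    """Return 'Class>>selector' from the last two literals, or None.

    literals: the literal frame without the header (a hypothetical model).
    """
    if len(literals) < 2:
        return None
    selector, binding = literals[-2], literals[-1]
    if binding is None:
        # methodClassAssociation may be nil when there is no super send.
        return None
    if not isinstance(selector, str):
        # e.g. an AdditionalMethodState; this sketch does not look inside it.
        return None
    class_name, _value = binding
    return f"{class_name}>>{selector}"


# The literal frame printed for 16rC4E948 above, in this toy representation.
fork_at_literals = [
    "newProcess", "priority:", "resume",
    "forkAt:", ("BlockClosure", 0x0088D618),
]
print(describe_method(fork_at_literals))
```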
...
...
...
For the last two lines, I notice the numbers before the slash (70, 88,
10...) are the method bytecode, but what are the numbers after the
slash?
The bytecode in decimal instead of hexa I think.
I checked. You are right.  Obvious in hindsight.
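For reference, the hex/decimal pairing is easy to reproduce. A minimal Python sketch (the function name is made up for this example) that formats the first row of the dump above:

```python
def dump_bytes(data):
    """Format each byte as HEX/decimal, as in the [print oop...] output."""
    return " ".join(f"{b:02X}/{b}" for b in data)


first_row = bytes([0x70, 0xD0, 0x88, 0x10, 0xE1, 0x87, 0xD2, 0x7C])
print(dump_bytes(first_row))
```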
Note that you can use the image-level bytecode printing machinery on a
method in the simulator by using StackInterpreter>>symbolicMethod:.  The
text is output in the simulator window's transcript or the system
transcript.  See "toggle transcript" towards the top of the bottom right
hand simulator window's menu.
...
...
...
...

In #activateCoggedNewMethod the second assignment to methodHeader
  ==> 16r208000B
which matches the mthhdr field of the raw header
[print cog method header for...] 16rBBF0 ==>
    BBF0
    objhdr: 8000000A000035
    nArgs: 1 type: 2
    blksiz: 90
    method: C4E948
    mthhdr: 208000B
    selctr: 6CC798=#forkAt:
    blkentry: 0
    stackCheckOffset: 5E/BC4E
    cmRefersToYoung: no cmIsFullBlock: no
What is "type: 2" ?
Haha.
Well when you iterate over the machine code zone you need to know what
the current element you iterate on is. In the machine code zone there can
be:
...
...

cog method
closed PICS
open PICS
free space

And now we're adding cog full block methods, but they share the index
with cog method and have a separate flag :-)
...
...
The type tells you what it is. Look at the Literal variables CMFree,
CMClosedPIC, CMOpenPIC, etc.
...
...
2 is CMMethod, which is a constant. You can improve the printing there
and commit the changes if you feel like it.
...
What did I write here? I don't understand it myself. I mean CMMethod = 2,
so type = 2 means the struct you're looking at in the machine code zone is
a method and not free space or a PIC.
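A rough Python sketch of iterating over a machine code zone whose elements carry a type tag. Only CMMethod = 2 comes from this thread; the other enum values are placeholders chosen for the sketch, and walk_zone/describe are hypothetical helpers. One observation, offered as an inference rather than a statement from the thread: the printed blksiz of 16r90 added to 16rBBF0 gives 16rBC80, the address shown after <->, so the sketch steps by block size. The second and third sizes are just the differences between the printed address pairs, assumed here to be block sizes.

```python
from enum import IntEnum


class CMType(IntEnum):
    FREE = 1        # placeholder value, assumed for this sketch
    METHOD = 2      # CMMethod = 2, as stated in the thread
    CLOSED_PIC = 4  # placeholder value
    OPEN_PIC = 5    # placeholder value


def walk_zone(start, entries):
    """Yield (address, type, end) for each (type, block_size) entry.

    The type tag says what the element is; the block size says where
    the next element starts.
    """
    address = start
    for cm_type, block_size in entries:
        end = address + block_size
        yield address, CMType(cm_type), end
        address = end


def describe(address, cm_type, end):
    return f"{address:X} <-> {end:X}: {cm_type.name}"


zone = [(2, 0x90), (2, 0x228), (2, 0x80)]
for row in walk_zone(0xBBF0, zone):
    print(describe(*row))
```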
...
...
Ok I have to go I will look at the rest of your mail later.
Let's do this...
...
...

Stepping through to  Cogit>>ceEnterCogCodePopReceiverReg
I notice its protocol is "simulation only"
and it calls  "simulateEnilopmart:numArgs:
ceEnterCogCodePopReceiverReg"
...
...
...
but I don't see any other implementors of
#ceEnterCogCodePopReceiverReg.
...
...
...
Also there is a pragma <doNotGenerate>.
Obviously the real non-simulated VM works differently, but I can't
determine how.
btw, I have noticed that  ceEnterCogCodePopReceiverReg
   ==> 16r10B8
and [print cog method for...] 16r10B8
   ==> trampoline ceEnterCogCodePopReceiverReg
Is ceEnterCogCodePopReceiverReg provided by the platform C code?
Well it's in cogitIA32.c. I don't remember where it comes from.
Cool. I had a peek.
...
Basically in Cog you have specific machine code routines, called
trampolines, that switch from machine code to C code. When trampoline is
written backward (Enilopmart) it means that the routine is meant to switch
from C code to machine code.
...
The real VM calls, in ceEnterCogCodePopReceiverReg, a machine code routine
that does the right thing (registers remapped, maybe fp and sp saved, etc.)
to switch from the C runtime (code produced by the C compiler) to the
machine code runtime executing code generated by the JIT.
I see it's a function pointer...
   void (*ceEnterCogCodePopReceiverReg)(void)
set by...
   ceEnterCogCodePopReceiverReg =
       genEnilopmartForandandforCallcalled(ReceiverResultReg, NoReg, NoReg,
           0, "ceEnterCogCodePopReceiverReg");
which is beyond my current need-to-know level.  Still useful to fill
in the background architecture.  This comment comparing
trampoline/enilopmart to system-call-like transition was
enlightening...
/*      An enilopmart (the reverse of a trampoline) is a piece of code that
        makes the system-call-like transition from the C runtime into
        generated machine code. The desired arguments and entry-point are
        pushed on a stackPage's stack. The enilopmart pops off the values to
        be loaded into registers and then executes a return instruction to
        pop off the entry-point and jump to it.

        BEFORE                  AFTER                   (stacks grow down)
        whatever                stackPointer -> whatever
        target address    =>    reg1 = reg1val, etc
        reg1val                 pc = target address
        reg2val
        stackPointer -> reg3val */
    /* Cogit>>#genEnilopmartFor:and:and:forCall:called: */

Right.  Trampolines are the machine code routines that call into the
Smalltalk (simulator) / C (real VM) run-time support routines.  They make a
stack switch from the Smalltalk/Cog-machine-code stack to the actual C
stack, and pass parameters.  Enilopmarts do the reverse.  They switch from
the actual C stack back into the Smalltalk/Cog-machine-code stack, possibly
popping values pushed onto that stack into specific registers, and then
executing a return instruction to jump to some machine code address in the
machine code zone to start or resume machine-code execution.
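The BEFORE/AFTER picture in the comment can be mimicked with a list standing in for the stack page. This is a toy Python model under stated assumptions, not how the generated code works: the stack is a list whose last element is the top, registers are a dict, and enter_machine_code is a made-up name.

```python
def enter_machine_code(stack, register_names):
    """Model an enilopmart on a stack whose top is the last list element.

    register_names lists the registers in the order their values were
    pushed (reg1 first), so they are popped in reverse. The final pop
    plays the role of the return instruction taking the target address.
    """
    registers = {}
    for name in reversed(register_names):
        registers[name] = stack.pop()
    pc = stack.pop()  # 'return' pops the entry point and jumps to it
    return pc, registers


# BEFORE: whatever, target address, reg1val, reg2val, reg3val (top)
stack = ["whatever", 0xBBF0, 11, 22, 33]
pc, registers = enter_machine_code(stack, ["reg1", "reg2", "reg3"])
print(hex(pc), registers, stack)
```

For the ceEnterCogCodePopReceiverReg case, where only ReceiverResultReg is passed and the other two are NoReg, the same model would be called with a single register name.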
...
...
In simulation, the C code is simulated by executing Slang as Smalltalk
code and the machine code is simulated using the processor simulator (Bochs
for IA32). So it has to be done differently as there is no C stack with
register state and stuff. Both trampolines and enilopmarts are simulated
with specific code.
...
...
...

Stepping through to simulateCogCodeAt:
it called processor singleStepIn:minimumAddress:readOnlyBelow:
which called
BochsIA32Alien>>primitiveSingleStepInMemory:minimumAddress:readOnlyBelow:
...
...
...
 <primitive: 'primitiveSingleStepInMemoryMinimumAddressReadWrite'
   module: 'BochsIA32Plugin'
   error: ec>
 ^ec == #'inappropriate operation'
     ifTrue: [self handleExecutionPrimitiveFailureIn: memoryArray
            minimumAddress: minimumAddress]
     ifFalse: [self reportPrimitiveFailure]

and the debugger cursor was inside the ifTrue: statement.  I found I
didn't have bochs installed, but after installing bochs-2.6-2, I got
the same result. So could I get some background around this?
Also I'm curious how the simulator seemed to be running a CogVM before
bochs was installed. Perhaps since I was not debugging through it, the
machine code ran for real rather than being simulated.
No the machine code is always simulated. Bochs was working for sure if
you successfully simulated the image on top of the cog simulator until the
display was shown.
...
If you have a VM from one of Eliot's builds (from the Cog blog) the
processor simulators are present as plugins by default. On Mac you can do
[show package contents...] and then look at the file inside to check the
Bochs Plugin is there. It's not the case on the Pharo VMs so don't use them
for CogVM simulation. You don't need to install anything.
Ahhh... I see them now.
./lib/squeak/5.0-3692/BochsX64Plugin
./lib/squeak/5.0-3692/BochsIA32Plugin
That clears up my misconception - a lack of understanding of the purpose of
the primitive failure, and a red herring when I saw the Bochs system
package wasn't installed.
...
In normal simulation the simulator often goes into the branch you've just
shown. It means it reached a simulation trap. As with enilopmarts, which
can't be properly simulated, trampolines can't be simulated. So to simulate a
trampoline the processor simulator fails a call and the trampoline is done
in the simulation code. Look at #handleCallOrJumpSimulationTrap: for
example.
Not quite.  The trampolines are simulated.  The calls the trampolines make
can't be simulated.  These calls are to illegal addresses and cause traps.
The trap handler maps the addresses into the appropriate Smalltalk
blocks/methods and invokes them.  The same goes for accessing variables in
the simulator such as framePointer, stackPointer, instructionPointer etc.
These are Smalltalk objects that are instance variables of the
CoInterpreter.   They are mapped to illegal addresses in machine code and
attempts to access them cause traps and the trap handler maps these to
fetch/store the relevant inst var.  In the real VM the actual addresses of
the variables are used directly.
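The trap mechanism described here can be sketched as a lookup from illegal addresses to callables and to instance variables. The Python below is an assumption-laden model, not the simulator's code: CoInterpreterModel, TrapHandler, runtime_routine, the "call"/"read"/"write" kinds and the base address are all invented for illustration.

```python
class CoInterpreterModel:
    """Hypothetical stand-in for the object holding the VM's variables."""
    def __init__(self):
        self.frame_pointer = 0
        self.stack_pointer = 0
        self.log = []

    def runtime_routine(self):
        # Stands for a run-time support routine a trampoline would call.
        self.log.append("runtime_routine")
        return "done"


class TrapHandler:
    FIRST_ILLEGAL = 0xFFFF0000  # arbitrary base, for this sketch only

    def __init__(self, interpreter):
        self.interpreter = interpreter
        self.routines = {}    # illegal address -> callable
        self.variables = {}   # illegal address -> attribute name
        self._next = self.FIRST_ILLEGAL

    def _fresh_address(self):
        address = self._next
        self._next += 4
        return address

    def map_routine(self, function):
        address = self._fresh_address()
        self.routines[address] = function
        return address

    def map_variable(self, name):
        address = self._fresh_address()
        self.variables[address] = name
        return address

    def handle(self, kind, address, value=None):
        """Called when the processor simulator refuses an access."""
        if kind == "call":
            return self.routines[address]()
        if kind == "read":
            return getattr(self.interpreter, self.variables[address])
        if kind == "write":
            setattr(self.interpreter, self.variables[address], value)
            return value
        raise ValueError(f"unknown trap kind: {kind}")


interp = CoInterpreterModel()
traps = TrapHandler(interp)
routine_address = traps.map_routine(interp.runtime_routine)
fp_address = traps.map_variable("frame_pointer")

traps.handle("write", fp_address, 0x101214)
print(hex(traps.handle("read", fp_address)), traps.handle("call", routine_address))
```

Generated machine code would only ever see routine_address and fp_address; the handler turns each failed access back into a Python call or attribute access, which is the role the thread ascribes to the simulator's trap handler.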
...
Ah, so it's an 'inappropriate operation' from Bochs' perspective, but
from the Simulator's perspective the primitiveFail is a useful
condition like the #ensure: "Primitive 198 always fails.  The VM uses
prim 198 in a context's method as the mark for an ensure:/ifCurtailed:
activation."  ?
Right.  Andreas realised specific primitives that did nothing could be used
to mark methods for the VM's benefit, without needing a bit in the method
header.  Very clever, very economical.  A nice idea.
...
cheers -ben
btw, I bumped into a bit of history...
http://www.mirandabanda.org/cogblog/2008/12/12/simulate-out-of-the-bochs/
:-)
-- 
_,,,^..^,,,_
best, Eliot
