Hi Tobias,

On Mon, Jun 21, 2021 at 4:20 AM Tobias Pape <Das.Linux@gmx.de> wrote:
 
Hi


> On 21. Jun 2021, at 12:15, Bruce O'Neel <bruce.oneel@pckswarms.ch> wrote:
>
> Hi,
>
> I have now gone back and checked this carefully.  The git repository at the commit before this one produces the right answer for 2 raisedTo: 64 on Aarch64.  It produces the wrong answer, 0, when this commit is included.
>
> So it's this commit.  I'll look at the code to see if I can see why the code is bad.
>
> BTW, how would one find the two referenced packages?

https://source.squeak.org/VMMaker/ has all the packages. (or http://source.squeak.org/VMMaker.html)

        The diff for the first one (https://source.squeak.org/VMMaker/VMMaker.oscog-eem.2799.diff) ist truncated,
        while the diff-changeset is ok: http://source.squeak.org/VMMaker/changes/VMMaker.oscog-eem.2799(2798).cs
        (you see all changed methods, but not the actual diff)

But this only adds the JumpMulOverflow / JumpNoMulOverflow names/class-vars.

The implementation diff is found at
        http://www.squeaksource.com/ClosedVMMaker/ClosedVMMaker-eem.98.diff

I would suggest that our bug does not concern Jumps, but rather only Mul, so we have to start looking
at the differences for concretizeMulOverflowRRR ...

>From what I see, tho, is that the MUL and the SMULH instruction order has been reversed.

Indeed.  I created a regression.  Apologies for all this effort over my mistake.


(Also, the arm blog uses an different interesting method of detecting overflow,
by missusing the argument shitfter of CMP: https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/detecting-overflow-from-mul
Although this is written for older arm, it seems to be applicable to the 64-variant too)

So here's how I decided to do it on AArch64.  Maybe you can find something better.

AArch64 implement 64 x 64 => 128 bit multiply by providing a 64x64 => low 64 bits and a 64x64 => high 64 bits multiply instruction.  So if a 64x64=>128 multiply has not overflowed the high 64 bits will be an extension of the sign bit of the 64x64=>low 64 instruction.  i.e. high 64 will either be all 0s or all 1s (all 1s = -1). So if one subtracts the sign bit from the high 64 result one gets 0 if there is no overflow.  Here's the instruction sequence:

smulh x9/RISCTemp, x4/Class, x8/Arg1
mul x4/Class, x4/Class, x8/Arg1
lsr x1, x4/Class, #63
adds x9/RISCTemp, x9/RISCTemp, x1
b.ne 0x44b8  // b.any = *@88 (to overflow)

In the JIT's assembler back end for ARMv8 this is done by two methods:

concretizeMulOverflowRRR
"ARMv8 has no multiply overflow detection.  Instead it is synthesized from the two halves of
a 64x64=>128 bit multiply. The upper 64-bits are tested.  The sequence is
high64 := SMULH a,b
a/low64 := MUL a,b
signBit := low64/a >> 63
high64 := high64 + signBit
If high64 is zero after this sequence then the multiply has not overflowed,
since high64 is an extension of signBit if no overflow (either 0 or -1) and
both -1 + 1 = 0 and 0 + 0 = 0. Note that because we restrict ourselves to
three concrete ARMv8 instructions per abstract instruction the last operation
of the sequence is generated in concretizeMulOverflowJump.

C6.2.196 MUL C6-1111
C6.2.242 SMULH C6-1184
C6.2.180 LSR (immediate) C6-1081 110100110 (1)"

<inline: true>
| reg1 reg2 reg3 |
reg1 := operands at: 0.
reg2 := operands at: 1.
reg3 := operands at: 2.
"RISCTempReg := high(reg1 * reg2); must precede destructive MUL"
machineCode
at: 0
put: 2r1001101101 << 22
+ (reg1 << 16)
+ (XZR << 10)
+ (reg2 << 5)
+ RISCTempReg.
"reg3 := reg1 * reg2"
machineCode
at: 1
put: 2r10011011 << 24
+ (reg1 << 16)
+ (XZR << 10)
+ (reg2 << 5)
+ reg3.
"CArg1Reg := sign(reg3)"
machineCode
at: 2
put: 2r1101001101 << 22
+ (63 << 16) "constant to shift by"
+ (63 << 10)
+ (reg3 << 5)
+ CArg1Reg. "cuz CArg0Reg == TempReg"
"RISCTempReg := RISCTempReg + CArg1Reg/sign
is in concretizeMulOverflowJump"
"cogit processor disassembleInstructionAt: 0 In: machineCode object"
"cogit processor disassembleInstructionAt: 4 In: machineCode object"
"cogit processor disassembleInstructionAt: 8 In: machineCode object"
^12

concretizeMulOverflowJump
"Sizing/generating jumps.
Jump targets can be to absolute addresses or other abstract instructions.
Generating initial trampolines instructions may have no maxSize and be to absolute addresses.
Otherwise instructions must have a machineCodeSize which must be kept to."
<inline: false>
| offset |
offset := self computeJumpTargetOffset - 4. "-4 because the jump is from the second word..."
  self assert: (offset ~= 0 and: [self isInImmediateBranchRange: offset]).
"See concretizeMulOverflowRRR
RISCTempReg := RISCTempReg + CArg1Reg/sign.
JumpZero/NonZero"
machineCode
at: 0
put: 2r10001011 << 24
+ (ArithmeticAddS << 29)
+ (CArg1Reg << 16)
+ (RISCTempReg << 5)
+ RISCTempReg;
at: 1
put: (self
cond: (opcode = JumpMulOverflow ifTrue: [NE] ifFalse: [EQ])
offset: offset). "B offset"
^8

Best regards
        -Tobias



> cheers
>
> bruce
>
> On 2021-06-20T22:44:36.000+02:00, Bruce O'Neel <bruce.oneel@pckswarms.ch> wrote:
> Playing with Git bisect gets us to the commit below as the first bad one.
>
> I started the git bisect with the two commits below.  The first one is bad, the second is the good one.
>
> a783502b249c4a4fedc88b6e07837d405feab144 - zero9 - builds, zero error.
>
> 9f73148b8da4bc00278b83faa8da6b1c418fa54f - zero10 - builds works
>
> Now this is not 100% guaranteed because one of the commits that git bisect chose had a build error so I had to mark that one as bad.  But the commit found does sound possible.
>
> f632ee2888014ee88330ee994e13c9c609b57b5f is the first bad commit
> commit f632ee2888014ee88330ee994e13c9c609b57b5f
> Author: Eliot Miranda <eliot.miranda@gmail.com>
> Date:   Wed Sep 2 10:45:27 2020 -0700
>
>     CogVM source as per VMMaker.oscog-eem.2799/ClosedVMMaker-eem.98
>     
>     Ha!  I am *STUPID*.
>     Integer overflow is not only determined by the upper 64-bits of a 64x64=>128 bit
>     multiply being either all zero or all ones (0 or -1), but by the upper 64-bits
>     being an extension of the most significant bit of the lower 64 bits!!
>
> :040000 040000 ffb8fbcd8ab5e2ca1936429b2daaade62909c178 a5ac89d8b1d4980c5329835cdd3d4a2387b1fac5 M      nsspur64src
> :040000 040000 e606dac14ee1db4ac59b0949bb03cd8e657d7aa7 be666051cd1d52d0055e319432b66a0fba61063e M      spur64src
> :040000 040000 1dea4faf99821c60e2c2461076bfe8b99d4dea9b afbef904e8cbc7e4b80e1606420dd6293e12f3e9 M      spurlowcode64src
> :040000 040000 f66c681004806d5880af7c0a3115038f4f2f3361 0e8d76bedcf22b6ad7497ea2e3dc44f3c36c2f3a M      spursista64src
>
> On 2021-06-20T21:48:49.000+02:00, Craig Latta <craig@blackpagedigital.com> wrote:
> Hi Ken--
>
> I brought up a development environment on a Raspberry Pi 4.
>
> > Unfortunately, my attempt to build a VM simulator on aarch64 fails due
> > to a divide by zero bug.
> >
> > Perhaps because
> > 16r8000000000000000 printStringHex. ==> '0'
>
> update-eem-463.mcm has a postload action which turns on Sista,
> which recompiles the system, which tries to recompile
> FloatArrayTest>>testFloatArrayPluginPrimitiveAtPut, which exercises this
> very bug during scanning, when trying to create the float 1e-127.
>
> For now I've rewritten testFloatArrayPluginPrimitiveAtPut and
> testFloatArrayPluginPrimitiveAt on my system to do nothing. When
> building GdbARMv8Plugin, there's some other kind of
> header-include-ordering mayhem, similar to the problem with Linux
> features.h when building the VM. (At least two levels of it, between
> bfd.h and the aarch64 simulator's config.h, and config.h and other
> system headers.) I expect no one has successfully built GdbARMv8Plugin
> on anything but a M1 macOS machine so far.
>
>
> -C





--
_,,,^..^,,,_
best, Eliot