Re: [Vm-dev] Aarch64 number bug

ken.dickey＠whidbey.com

20 Jun 2021

12:16 a.m.


"If the community valued its primary execution technology then it would be the case that a few members were competent enough to startup the simulation environment and debug this. "

Unfortunately, my attempt to build a VM simulator on aarch64 fails due to a divide by zero bug.

Perhaps because 16r8000000000000000 printStringHex. ==> '0'

You may be over-estimating my ability to debug this sort of thing.

-------------------

In the wider context, my suggestion is to declare, say, the first week in July as "VM Appreciation Week".

The observation being that OpenSmalltalk is common to Cuis, Squeak, and Pharo. OpenSmalltalk is open source. OpenSmalltalk is a _Commons_. A "commons" in our case is a shared resource which is managed by its users.

To be better managers (or managers at all, rather than just freeloaders), we should take some time to experience it. [a] Build and run a/the simulator, [b] Build a VM.

Participation means knowing the resource in order to be able to sustain it.

-------------------

OK, so raising awareness to become visible in the right way.

There are a few speed bumps to be overcome first.

One desires that Joe or Jane coder be able to do [a] and [b] above.

Building VMMaker on Linux/x86-64 I get a large number of merge dialogs and then:

vvv===vvv
This virtual machine is too old to support correct versions of the FloatArray>>at:[put:] primitives 238 and 239. FloatArray subclasses will not behave correctly and FloatArray[64]Test tests will fail. Please upgrade your VM. You may continue and upgrade later or abort and upgrade now.
^^^===^^^

But if I try to build a VM, I run into a number of problems (notes attached; the "included too late" bug shows up in the arm build as well).

It is difficult to see today how Joe or Jane can build a VM if like me their knowledge of build mechanics is fairly shallow.

Like you all, I have a very busy life. Among other things, given a destabilized climate, I am now tending my garden to raise food.

On the other hand, I am happy to spend time supporting the OpenSmalltalk Commons as, yes, this _is_ important to me.

I don't feel that I have the skillset to fix the current speed bumps, but I certainly would like to declare a "VM Appreciation Week" sometime soon and welcome a discussion on how we engender a better stewardship of our commons.

-KenD

Attachments:

ZeroDivide.png (image/png — 272.0 KB)
build-bug.txt (text/plain — 22.3 KB)


Craig Latta

20 Jun

9:48 p.m.

New subject: Aarch64 number bug

Hi Ken--

I brought up a development environment on a Raspberry Pi 4.

> Unfortunately, my attempt to build a VM simulator on aarch64 fails due to a divide by zero bug.
>
> Perhaps because 16r8000000000000000 printStringHex. ==> '0'

update-eem-463.mcm has a postload action which turns on Sista, which recompiles the system, which tries to recompile FloatArrayTest>>testFloatArrayPluginPrimitiveAtPut, which exercises this very bug during scanning, when trying to create the float 1e-127.

For now I've rewritten testFloatArrayPluginPrimitiveAtPut and testFloatArrayPluginPrimitiveAt on my system to do nothing.

When building GdbARMv8Plugin, there's some other kind of header-include-ordering mayhem, similar to the problem with Linux features.h when building the VM. (At least two levels of it, between bfd.h and the aarch64 simulator's config.h, and config.h and other system headers.) I expect no one has successfully built GdbARMv8Plugin on anything but an M1 macOS machine so far.

-C

--
Craig Latta :: research computer scientist
Black Page Digital :: Berkeley, California
663137D7940BF5C0AFC :: 1349FB2ADA32C4D5314CE

Bruce O'Neel

10:44 p.m.


Playing with Git bisect gets us to the commit below as the first bad one.

I started the git bisect with the two commits below. The first one is bad, the second is the good one.

a783502b249c4a4fedc88b6e07837d405feab144 - zero9 - builds, zero error.

9f73148b8da4bc00278b83faa8da6b1c418fa54f - zero10 - builds, works.

Now this is not 100% guaranteed because one of the commits that git bisect chose had a build error so I had to mark that one as bad. But the commit found does sound possible.

f632ee2888014ee88330ee994e13c9c609b57b5f is the first bad commit
commit f632ee2888014ee88330ee994e13c9c609b57b5f
Author: Eliot Miranda <eliot.miranda@gmail.com>
Date:   Wed Sep 2 10:45:27 2020 -0700

    CogVM source as per VMMaker.oscog-eem.2799/ClosedVMMaker-eem.98

    Ha! I am *STUPID*.
    Integer overflow is not only determined by the upper 64-bits of a 64x64=>128 bit
    multiply being either all zero or all ones (0 or -1), but by the upper 64-bits
    being an extension of the most significant bit of the lower 64 bits!!

:040000 040000 ffb8fbcd8ab5e2ca1936429b2daaade62909c178 a5ac89d8b1d4980c5329835cdd3d4a2387b1fac5 M nsspur64src
:040000 040000 e606dac14ee1db4ac59b0949bb03cd8e657d7aa7 be666051cd1d52d0055e319432b66a0fba61063e M spur64src
:040000 040000 1dea4faf99821c60e2c2461076bfe8b99d4dea9b afbef904e8cbc7e4b80e1606420dd6293e12f3e9 M spurlowcode64src
:040000 040000 f66c681004806d5880af7c0a3115038f4f2f3361 0e8d76bedcf22b6ad7497ea2e3dc44f3c36c2f3a M spursista64src
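The overflow rule stated in that commit message can be checked with a small Python model. This is only a sketch of the arithmetic, not VM code: the helper names (`split128`, `naive_overflow`, `correct_overflow`) are made up for illustration, and Python's unbounded integers stand in for the 128-bit product.

```python
MASK64 = (1 << 64) - 1

def to_signed64(x):
    """Interpret the low 64 bits of x as a two's-complement signed value."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x

def split128(a, b):
    """Return (high64, low64) of the signed 64x64=>128 product, both signed."""
    p = a * b
    return to_signed64(p >> 64), to_signed64(p)

def naive_overflow(a, b):
    # Insufficient rule: only asks whether the high half is all 0s or all 1s.
    high, _ = split128(a, b)
    return high not in (0, -1)

def correct_overflow(a, b):
    # Rule from the commit message: the high half must be the sign
    # extension of the most significant bit of the low half.
    high, low = split128(a, b)
    return high != (-1 if low < 0 else 0)

def truly_overflows(a, b):
    """Ground truth: does a*b fall outside the signed 64-bit range?"""
    return not (-(1 << 63) <= a * b < (1 << 63))

# 2^62 * 2 = 2^63 does not fit in a signed 64-bit word. The high half is 0,
# so the naive rule misses it; the low half has its top bit set, so the
# sign-extension rule catches it.
print(naive_overflow(1 << 62, 2))    # False
print(correct_overflow(1 << 62, 2))  # True
```

The counterexample is the interesting part: a high half of all zeros is not enough when the low half's top bit is 1.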


Bruce O'Neel

21 Jun

12:15 p.m.


Hi,

I have now gone back and checked this carefully. The git repository at the commit before this one produces the right answer for 2 raisedTo: 64 on Aarch64. It produces the wrong answer, 0, when this commit is included.

So it's this commit. I'll look at the code to see if I can see why the code is bad.

BTW, how would one find the two referenced packages?

cheers

bruce


Tobias Pape

1:20 p.m.


On 21. Jun 2021, at 12:15, Bruce O'Neel bruce.oneel@pckswarms.ch wrote:

> Hi,
>
> I have now gone back and checked this carefully. The git repository at the commit before this one produces the right answer for 2 raisedTo: 64 on Aarch64. It produces the wrong answer, 0, when this commit is included.
>
> So it's this commit. I'll look at the code to see if I can see why the code is bad.
>
> BTW, how would one find the two referenced packages?

https://source.squeak.org/VMMaker/ has all the packages. (or http://source.squeak.org/VMMaker.html)

The diff for the first one (https://source.squeak.org/VMMaker/VMMaker.oscog-eem.2799.diff) is truncated, while the diff-changeset is ok: http://source.squeak.org/VMMaker/changes/VMMaker.oscog-eem.2799(2798).cs (you see all changed methods, but not the actual diff)

But this only adds the JumpMulOverflow / JumpNoMulOverflow names/class-vars.

The implementation diff is found at http://www.squeaksource.com/ClosedVMMaker/ClosedVMMaker-eem.98.diff

I would suggest that our bug does not concern Jumps, but rather only Mul, so we have to start looking at the differences for concretizeMulOverflowRRR ...

From what I see, tho, the MUL and SMULH instruction order has been reversed.

(Also, the arm blog uses a different, interesting method of detecting overflow, by misusing the argument shifter of CMP: https://community.arm.com/developer/ip-products/processors/b/processors-ip-b... Although this is written for older arm, it seems to be applicable to the 64-variant too)

Best regards -Tobias


Eliot Miranda

2 Jul

4:07 p.m.


Hi Tobias,

On Mon, Jun 21, 2021 at 4:20 AM Tobias Pape Das.Linux@gmx.de wrote:


> From what I see, tho, the MUL and SMULH instruction order has been reversed.

Indeed. I created a regression. Apologies for all this effort over my mistake.

> (Also, the arm blog uses a different, interesting method of detecting overflow, by misusing the argument shifter of CMP: https://community.arm.com/developer/ip-products/processors/b/processors-ip-b... Although this is written for older arm, it seems to be applicable to the 64-variant too)

So here's how I decided to do it on AArch64. Maybe you can find something better.

AArch64 implements 64 x 64 => 128 bit multiply by providing a 64x64 => low 64 bits and a 64x64 => high 64 bits multiply instruction. So if a 64x64=>128 multiply has not overflowed the high 64 bits will be an extension of the sign bit of the 64x64=>low 64 instruction, i.e. high 64 will either be all 0s or all 1s (all 1s = -1). So if one adds the sign bit (0 or 1) to the high 64 result one gets 0 if there is no overflow. Here's the instruction sequence:

    smulh x9/RISCTemp, x4/Class, x8/Arg1
    mul   x4/Class, x4/Class, x8/Arg1
    lsr   x1, x4/Class, #63
    adds  x9/RISCTemp, x9/RISCTemp, x1
    b.ne  0x44b8 // b.any = *@88 (to overflow)
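To see why the order of SMULH and MUL matters, here is a rough Python model of that sequence. It is a sketch under assumptions: registers are plain 64-bit bit patterns, SmallInteger tagging is ignored, and `mul_overflows` and its `smulh_first` flag are names invented for this illustration. Whether this is the exact path by which the VM returned 0 for `2 raisedTo: 64` is not established here; the model only shows that the regressed order can miss an overflow.

```python
MASK64 = (1 << 64) - 1

def signed(x):
    """Interpret a 64-bit register pattern as a two's-complement value."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x

def smulh(a, b):
    """High 64 bits of the signed 128-bit product of two registers."""
    return ((signed(a) * signed(b)) >> 64) & MASK64

def mul(a, b):
    """Low 64 bits of the product."""
    return (a * b) & MASK64

def mul_overflows(x4, x8, smulh_first=True):
    """Return (overflow_detected, low64_result) for the modeled sequence."""
    if smulh_first:
        x9 = smulh(x4, x8)   # smulh x9, x4, x8
        x4 = mul(x4, x8)     # mul   x4, x4, x8  (overwrites the operand)
    else:
        x4 = mul(x4, x8)     # regressed order: x4 is overwritten first...
        x9 = smulh(x4, x8)   # ...so this is high(low64 * x8), not high(a * b)
    x1 = x4 >> 63            # lsr  x1, x4, #63
    x9 = (x9 + x1) & MASK64  # adds x9, x9, x1
    return x9 != 0, x4       # b.ne is taken when the result is non-zero

print(mul_overflows(1 << 32, 1 << 32))                     # (True, 0)
print(mul_overflows(1 << 32, 1 << 32, smulh_first=False))  # (False, 0)
```

In the second call the low word 0 is handed back with no overflow reported, because SMULH ran on the already-overwritten register.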

In the JIT's assembler back end for ARMv8 this is done by two methods:

concretizeMulOverflowRRR
    "ARMv8 has no multiply overflow detection. Instead it is synthesized from the two halves of
     a 64x64=>128 bit multiply. The upper 64-bits are tested. The sequence is
        high64 := SMULH a,b
        a/low64 := MUL a,b
        signBit := low64/a >> 63
        high64 := high64 + signBit
     If high64 is zero after this sequence then the multiply has not overflowed, since
     high64 is an extension of signBit if no overflow (either 0 or -1) and both -1 + 1 = 0
     and 0 + 0 = 0. Note that because we restrict ourselves to three concrete ARMv8
     instructions per abstract instruction the last operation of the sequence is generated
     in concretizeMulOverflowJump.

     C6.2.196   MUL               C6-1111
     C6.2.242   SMULH             C6-1184
     C6.2.180   LSR (immediate)   C6-1081   110100110 (1)"

    <inline: true>
    | reg1 reg2 reg3 |
    reg1 := operands at: 0.
    reg2 := operands at: 1.
    reg3 := operands at: 2.
    "RISCTempReg := high(reg1 * reg2); must precede destructive MUL"
    machineCode
        at: 0
        put: 2r1001101101 << 22
            + (reg1 << 16)
            + (XZR << 10)
            + (reg2 << 5)
            + RISCTempReg.
    "reg3 := reg1 * reg2"
    machineCode
        at: 1
        put: 2r10011011 << 24
            + (reg1 << 16)
            + (XZR << 10)
            + (reg2 << 5)
            + reg3.
    "CArg1Reg := sign(reg3)"
    machineCode
        at: 2
        put: 2r1101001101 << 22
            + (63 << 16) "constant to shift by"
            + (63 << 10)
            + (reg3 << 5)
            + CArg1Reg. "cuz CArg0Reg == TempReg"
    "RISCTempReg := RISCTempReg + CArg1Reg/sign is in concretizeMulOverflowJump"
    "cogit processor disassembleInstructionAt: 0 In: machineCode object"
    "cogit processor disassembleInstructionAt: 4 In: machineCode object"
    "cogit processor disassembleInstructionAt: 8 In: machineCode object"
    ^12

concretizeMulOverflowJump
    "Sizing/generating jumps.
     Jump targets can be to absolute addresses or other abstract instructions.
     Generating initial trampolines instructions may have no maxSize and be to absolute addresses.
     Otherwise instructions must have a machineCodeSize which must be kept to."
    <inline: false>
    | offset |
    offset := self computeJumpTargetOffset - 4. "-4 because the jump is from the second word..."
    self assert: (offset ~= 0 and: [self isInImmediateBranchRange: offset]).
    "See concretizeMulOverflowRRR
     RISCTempReg := RISCTempReg + CArg1Reg/sign. JumpZero/NonZero"
    machineCode
        at: 0
        put: 2r10001011 << 24
            + (ArithmeticAddS << 29)
            + (CArg1Reg << 16)
            + (RISCTempReg << 5)
            + RISCTempReg;
        at: 1
        put: (self cond: (opcode = JumpMulOverflow ifTrue: [NE] ifFalse: [EQ]) offset: offset). "B offset"
    ^8
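The bit arithmetic in those two methods can be reproduced outside the image as a cross-check. The Python sketch below rebuilds the four instruction words for the register assignment in the disassembly above (x9 as RISCTempReg, x4, x8, x1 as CArg1Reg). Two things are assumptions rather than facts from the source: `ARITHMETIC_ADD_S = 1` (the value of ArithmeticAddS is not shown in the thread), and the function names, which are made up here. Note that Smalltalk evaluates binary operators strictly left to right, so `a << 22 + b` means `(a << 22) + b`; Python needs the parentheses.

```python
XZR = 31  # register number 31 encodes the zero register in these instructions

def smulh_word(rd, rn, rm):
    # Mirrors: 2r1001101101 << 22 + (reg1 << 16) + (XZR << 10) + (reg2 << 5) + RISCTempReg
    return (0b1001101101 << 22) + (rm << 16) + (XZR << 10) + (rn << 5) + rd

def mul_word(rd, rn, rm):
    # Mirrors: 2r10011011 << 24 + (reg1 << 16) + (XZR << 10) + (reg2 << 5) + reg3
    return (0b10011011 << 24) + (rm << 16) + (XZR << 10) + (rn << 5) + rd

def lsr63_word(rd, rn):
    # Mirrors: 2r1101001101 << 22 + (63 << 16) + (63 << 10) + (reg3 << 5) + CArg1Reg
    return (0b1101001101 << 22) + (63 << 16) + (63 << 10) + (rn << 5) + rd

ARITHMETIC_ADD_S = 1  # assumption: selects the flag-setting (S) form via bit 29

def adds_word(rd, rn, rm):
    # Mirrors: 2r10001011 << 24 + (ArithmeticAddS << 29) + (CArg1Reg << 16) + ...
    return (0b10001011 << 24) + (ARITHMETIC_ADD_S << 29) + (rm << 16) + (rn << 5) + rd

words = [
    smulh_word(9, 4, 8),  # smulh x9, x4, x8
    mul_word(4, 4, 8),    # mul   x4, x4, x8
    lsr63_word(1, 4),     # lsr   x1, x4, #63
    adds_word(9, 9, 1),   # adds  x9, x9, x1
]
for w in words:
    print(f"{w:08X}")
```

Feeding the printed words to any AArch64 disassembler is a quick way to confirm they decode to the intended sequence.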


-- _,,,^..^,,,_ best, Eliot

Tobias Pape

5 Jul

8:32 p.m.


Hi Eliot

On 2. Jul 2021, at 16:07, Eliot Miranda eliot.miranda@gmail.com wrote:

[…]

>> From what I see, tho, the MUL and SMULH instruction order has been reversed.

> Indeed. I created a regression. Apologies for all this effort over my mistake.

This happens :)

>> (Also, the arm blog uses a different, interesting method of detecting overflow, by misusing the argument shifter of CMP: https://community.arm.com/developer/ip-products/processors/b/processors-ip-b... Although this is written for older arm, it seems to be applicable to the 64-variant too)

> So here's how I decided to do it on AArch64. Maybe you can find something better.

> AArch64 implements 64 x 64 => 128 bit multiply by providing a 64x64 => low 64 bits and a 64x64 => high 64 bits multiply instruction. So if a 64x64=>128 multiply has not overflowed the high 64 bits will be an extension of the sign bit of the 64x64=>low 64 instruction.

Yes.

> i.e. high 64 will either be all 0s or all 1s (all 1s = -1).

Yes

> So if one adds the sign bit (0 or 1) to the high 64 result one gets 0 if there is no overflow. Here's the instruction sequence:
>
>     smulh x9/RISCTemp, x4/Class, x8/Arg1
>     mul   x4/Class, x4/Class, x8/Arg1
>     lsr   x1, x4/Class, #63

Ok, cases are:

    OVF  x9      x4[63]  x1
1.  N    ff..    1       1
2.  Y    ff..    0       0
3.  Y    00..    1       1
4.  N    00..    0       0
5.  Y    1-7f..  1       1
6.  Y    1-7f..  0       0

x1 is either 0 or 1, because LSR shifts in 0.

>     adds  x9/RISCTemp, x9/RISCTemp, x1

     OVF  x9      x4[63]  x1  x9'     C  Z  N
1.   N    ff..    1       1   0..0    1  1  0
2.   Y    ff..    0       0   ff..    0  0  1
3.   Y    00..    1       1   0..1    0  0  0
4.   N    00..    0       0   0..0    0  1  0
5a.  Y    1-7f.e  1       1   2-7f.f  0  0  0
5b.  Y    7f..f   1       1   80..0   0  0  1
6.   Y    1-7f.f  0       0   1-7f.f  0  0  0

>     b.ne  0x44b8 // b.any = *@88 (to overflow)

B.NE branches on Z = 0

== So, yes, this does what you want. ==

Let me also try to adapt what the blog post suggested:

    smulh x9/RISCTemp, x4/Class, x8/Arg1
    mul   x4/Class, x4/Class, x8/Arg1
    cmp   x9/RISCTemp, x4/Class, ASR #63
    b.ne  // on overflow

With the ASR modifier on x4 effectively propagating the sign bit all the way down to the LSB just for this operation.

    OVF  x9      x4[63]  x4 ASR  intern   Z
1.  N    ff..    1       ff..f   00..     1
2.  Y    ff..    0       00..0   ff..     0
3.  Y    00..    1       ff..f   00..01   0
4.  N    00..    0       00..0   00..     1
5.  Y    1-7f..  1       ff..f   2-80..0  0
6.  Y    1-7f..  0       00..0   1-7ff..  0

You would generate that as:

CMP(ASR) = 2r1110101110 << 22
    + (reg1 << 16) "x4/Class"
    + (63 << 10) "ASR shift amount"
    + (reg2 << 5) "x9/RISCTemp"
    + XZR "This differentiates CMP from SUBS"

This has the following properties:

- it's just 3 concrete ARM instructions, so no need to split it.
- After the sequence:
  - x4/Class has (x4/Class * x8/Arg1)
  - x9/RISCTemp is either all 00..00 or all ff..ff
  - Z == 1 on no overflow, Z == 0 on overflow.
- It "speaks" "compare all x9 bits to bit 63 of x4"
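As a cross-check on the CMP encoding, the word can be assembled field by field in Python. This is a sketch based on the A64 SUBS (shifted register) layout as I understand it (sf, op, S, 01011, shift, 0, Rm, imm6, Rn, Rd); the function name is hypothetical and the result is worth confirming with a disassembler before relying on it.

```python
XZR = 31    # Rd = 31 discards the result, which is what turns SUBS into CMP
ASR = 0b10  # shift-type field value for arithmetic shift right

def cmp_asr_word(rn, rm, amount):
    """Encode CMP Xn, Xm, ASR #amount, i.e. SUBS XZR, Xn, Xm, ASR #amount."""
    sf, op, s = 1, 1, 1  # 64-bit, subtract, set flags
    return ((sf << 31) | (op << 30) | (s << 29) | (0b01011 << 24) |
            (ASR << 22) | (rm << 16) | (amount << 10) | (rn << 5) | XZR)

word = cmp_asr_word(rn=9, rm=4, amount=63)  # cmp x9, x4, ASR #63
print(f"{word:08X}")     # EB84FD3F
print(bin(word >> 22))   # 0b1110101110, the ten bits above the Rm field
```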

=====

Both things get the job done.

I think the original lsr+adds can be improved, tho. First, you can push the lsr op into the shifter-arg of adds. Second, you can use xzr as dest reg, which is equivalent to cmn (compare neg). [1]

But whether to compare against the (right-)_propagated_ sign bit or compare-negative against the right-_shifted_ sign bit makes no difference.
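That equivalence can be checked exhaustively on a scaled-down word size. The Python sketch below uses an 8-bit analogue (8x8=>16 multiply) so every operand pair can be enumerated; the assumption is that nothing in the argument depends on the width, and the function names are invented for the illustration.

```python
BITS = 8
MASK = (1 << BITS) - 1
LO, HI = -(1 << (BITS - 1)), (1 << (BITS - 1)) - 1  # signed range of one word

def halves(a, b):
    """High and low word (as unsigned patterns) of the signed double-width product."""
    p = a * b
    return (p >> BITS) & MASK, p & MASK

def flags_overflow_adds_lsr(high, low):
    # lsr + adds (or cmn with an LSR operand): non-zero result means overflow.
    return ((high + (low >> (BITS - 1))) & MASK) != 0

def flags_overflow_cmp_asr(high, low):
    # cmp high, low, ASR #(BITS-1): compare against the propagated sign bit.
    sign_ext = MASK if low >> (BITS - 1) else 0
    return ((high - sign_ext) & MASK) != 0

def check_all():
    """True if both tests agree with real overflow for every operand pair."""
    for a in range(LO, HI + 1):
        for b in range(LO, HI + 1):
            high, low = halves(a, b)
            expected = not (LO <= a * b <= HI)
            if flags_overflow_adds_lsr(high, low) != expected:
                return False
            if flags_overflow_cmp_asr(high, low) != expected:
                return False
    return True

print(check_all())  # True
```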

That was fun!

Best regards -Tobias

[1] Beware, just recently, Raymond Chen pointed out the "lie in the CMN": https://devblogs.microsoft.com/oldnewthing/20210607-00/?p=105288


Tobias Pape

8:34 p.m.


Ouch,

On 5. Jul 2021, at 20:32, Tobias Pape Das.Linux@gmx.de wrote:

> Hi Eliot
> […]
> This has the following properties:
>
> - it's just 3 concrete ARM instructions, so no need to split it.
> - After the sequence:
>   - x4/Class has (x4/Class * x8/Arg1)
>   - x9/RISCTemp is either all 00..00 or all ff..ff

Oh, I'm wrong. x9/RISCTemp is _still_ the "high" portion of the multiplication (which _can_ be all 00 or all ff), but it is untouched by CMP.

sorry.

>   - Z == 1 on no overflow, Z == 0 on overflow.
> - It "speaks" "compare all x9 bits to bit 63 of x4"

-Tobias

tim Rowledge

8:44 p.m.


On 2021-07-05, at 11:32 AM, Tobias Pape Das.Linux@gmx.de wrote:

> That was fun!

It's amazing how absorbing and fascinating these sort of problems can be. Take care or you'll end up like Eliot & me!

tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim I came, I saw, I deleted all your files.

Tobias Pape

8:46 p.m.


On 5. Jul 2021, at 20:44, tim Rowledge tim@rowledge.org wrote:

> On 2021-07-05, at 11:32 AM, Tobias Pape Das.Linux@gmx.de wrote:
>
>> That was fun!
>
> It's amazing how absorbing and fascinating these sort of problems can be. Take care or you'll end up like Eliot & me!

It's so funny that "the old new thing" just had a series on the ARM ...

-t

tim Rowledge

4 Aug

7:46 p.m.


Just to complete this issue, the ARMv8 cog vm I built on Sunday does this correctly.

tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Strange OpCodes: RLB: Ruin Logic Board

Bruce O'Neel

9:05 p.m.


Hi,

Excellent!

So can someone see what I am doing wrong, because mine segfaults pretty quickly?

    git clone https://github.com/OpenSmalltalk/opensmalltalk-vm.git
    ls
    cd opensmalltalk-vm/
    ./scripts/updateSCCSVersions
    cd build.linux64ARMv8/squeak.cog.spur/build
    ./mvm
    ./squeak ~/archive/stcstw/Squeak6.0alpha-20582-64bit

Or are we getting caught out with some combo of libraries, compilers, and maybe even chips?

Thanks.

bruce


tim Rowledge

9:33 p.m.


On 2021-08-04, at 12:05 PM, Bruce O'Neel bruce.oneel@pckswarms.ch wrote:

> Hi,
>
> Excellent!
>
> So can someone see what I am doing wrong, because mine segfaults pretty quickly?

This is a bit weird. For ages the flushXcache code has been happily working on my Pi 4 based on CogARMv8Compiler>>#generateICacheFlush using the dc instruction. I have a working VM that uses it.

On Sunday (or Monday? Days mean so little...) Eliot & I were trying something out and the VM built then (with the dc instruction flushing) simply blew up, segfaulting at that dc instruction. We changed to use the same flushing as CogARMv8Compiler>>#initialFlushICacheFrom:to: instead and ... no segfault.

I could imagine this becoming an issue if I had updated the OS kernel etc, but I haven't. Just how is it possible ... I dunno. Maybe some GDB examinations will reveal the dc instruction being created differently because of some seemingly unrelated change moving an instruction 4 bytes and triggering an obscure alignment issue that causes a bit to get dropped in the code word. Or perhaps the universe is just annoyed with me again.

This is the change, try it and see if it helps you -

tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim People who deal with bits should expect to get bitten.


vm-dev@lists.squeakfoundation.org
