Vm-dev April 2017

vm-dev@lists.squeakfoundation.org

44 participants
105 discussions

StackInterp/Cog>>transferTo: - assertValidExecutionPointe:r:s differences
by Ben Coman 06 May '18

Just curious about the difference in instructionPointer between these two...

    StackInterpreter>>transferTo:
        self assertValidExecutionPointe: instructionPointer + 1 r: framePointer s: stackPointer.

    CoInterpreter>>transferTo:from:
        self assertValidExecutionPointe: instructionPointer r: framePointer s: stackPointer.

cheers -ben

64bits Pharo VM for windows
by Nicolas Cellier 19 May '17

Hi, I've built a 64-bit pharo.stack.spur VM for Windows on my machine, and I'm uploading the changes to opensmalltalk-vm in the branch build_pharo_win32_with_cygwin. If the AppVeyor job succeeds, I will open a pull request. The VM does not have the SqueakSSL plugin yet. The 64-bit squeak/pharo.cog.spur JIT for Windows is still to come, but I haven't worked on it for a few months... One thing at a time. Let's keep our fingers crossed. Nicolas

Re: [Vm-dev] Freeze after Morph Activity
by Dan Norton 15 May '17

Hi Levente,

I directed stdout to a file and, prior to the freeze, ran

    pkill -USR1 -n -x squeak

a few times, and noticed "stack overflow" mentioned twice. After the freeze, nothing else was written to stdout in response to pkill. Maybe something went to stderr? I'll try again and append stderr to the end of stdout. Meanwhile, the file with the stack overflows is attached.

Thanks for the help.
- Dan

On Sat, 28 Jan 2017, *Levente Uzonyi* wrote:
> Hi Dan,
> You can send the USR1 signal to the VM process to make it write some debug
> information to the console. With this information you can easily tell
> what's happening. I suspect it's stuck in a long GC.
> Levente
> On Fri, 27 Jan 2017, Dan Norton wrote:
>> On Debian8, CogSpur64 5.0-201612221637, Cuis 5.0 3043...
>>
>> after lots of user interaction, views opening and closing, and
>> animation, the image becomes unresponsive. Cmd+. does nothing and the
>> clock in the Cuis taskbar no longer updates. The length of time to
>> produce this varies from 5 to 20 minutes.
>>
>> There seems to be no dump and no log. 'Smalltalk garbageCollectMost'
>> reports 25550736 to 30517872 over 34 samples.
>>
>> Sorry to be so vague. This has occurred with several images - sometimes
>> scrolling through a senders list, or stepping through a debugger, but
>> repeatably with one of my images. I can supply this image, warmed up so
>> that it might not take so long to reproduce the problem if you would
>> like. Or give me a hint as to how to narrow down the problem
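Levente's suggestion is to send SIGUSR1 to the VM process, which Dan did with pkill. For a tool that already knows the VM's process id, the same request is a single kill(2) call. This is a minimal sketch; the helper name request_vm_debug_dump is hypothetical, and what gets printed is decided entirely by the VM's own signal handler.

```c
#define _POSIX_C_SOURCE 200809L  /* expose kill() under strict ISO C modes */
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: deliver SIGUSR1 to a running VM, the same thing
   `kill -USR1 <pid>` or `pkill -USR1 -n -x squeak` does.
   Returns kill()'s result (0 on success, -1 with errno set on failure),
   or -1 without calling kill() for a non-positive pid. */
int request_vm_debug_dump(pid_t vm_pid)
{
    /* Refuse 0 and negative pids: kill() would treat them as
       "my process group" or "every process I may signal". */
    if (vm_pid <= 0)
        return -1;
    return kill(vm_pid, SIGUSR1);
}
```

Guarding against non-positive pids matters because a stray 0 or -1 would otherwise signal a whole process group or every process the caller may signal.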

[vm-dev] latest VM crashed on raspberry
by Denis Kudriashov 10 May '17

Hi. I tested the latest VM on a Raspberry Pi:

    squeak -headless Squeak6.0alpha-16548-32bit.image

It crashed with the attached dump and this message:

    stack page bytes 4096 available headroom 2788 minimum unused headroom 3520
    (Segmentation fault)

Details on the OS:

    Linux raspberrypi 3.12.28+ #709 PREEMPT Mon Sep 8 15:28:00 BST 2014 armv6l GNU/Linux

I also have an old VM which works fine:

    CoInterpreter * VMMaker.oscog-eem.2107 uuid: 19c0fa53-acc2-40f9-9a07-17510e614ae5 Jan 23 2017
    StackToRegisterMappingCogit * VMMaker.oscog-eem.2107 uuid: 19c0fa53-acc2-40f9-9a07-17510e614ae5 Jan 23 2017
    VM: 201701231021 https://github.com/pharo-project/pharo-vm.git $ Date: Mon Jan 23 11:21:48 2017 +0100 $
    Plugins: 201701231021 https://github.com/pharo-project/pharo-vm.git $

It would be nice to get this fixed for the upcoming Pharo release.

Best regards, Denis

ImageFormat updates
by K K Subbu 07 May '17

Hi, is there an Inbox for uploading patches to VMMaker for review? I made a small fix to ImageFormat so that it also generates magic-pattern checks at offset 512. I think some old Etoys images use this offset, with a launch script stuffed into the first 512 bytes. Is this deprecated now?

I also feel 'unix' should be dropped from the method's name. The file(1) utility originated in Unix, but it is about file contents, not the kernel, and it should be available for Mac/Windows too.

I have also attached the resulting magic file and a workspace script that generates a bunch of test header files for testing the magic file, e.g.

    $ file -m magic *.image

Regards .. Subbu
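What the magic file checks can be sketched in C: read the first 32-bit word at offset 0 and at offset 512, in both byte orders, and compare it with known image format numbers. This is a rough illustration, not the generated magic file. The list of format numbers is partial and is an assumption, only the low 32 bits are examined (so a big-endian 64-bit header would be missed), and find_image_header is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Partial list of image format numbers (assumption, not exhaustive). */
static const uint32_t known_formats[] = {
    6502, 6504, 6505, 6521,   /* 32-bit formats */
    68000, 68002, 68021       /* 64-bit formats (low 32 bits of the header word) */
};

static uint32_t read_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint32_t read_u32_be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static int is_known_format(uint32_t v)
{
    for (size_t i = 0; i < sizeof known_formats / sizeof known_formats[0]; i++)
        if (known_formats[i] == v)
            return 1;
    return 0;
}

/* Return the offset (0 or 512) of the first recognised header word, or -1.
   Offset 512 covers images with a launch script in the first 512 bytes. */
long find_image_header(const unsigned char *buf, size_t len)
{
    static const size_t offsets[] = { 0, 512 };
    for (size_t i = 0; i < 2; i++) {
        size_t off = offsets[i];
        if (len < off + 4)
            continue;   /* file too short to hold a word at this offset */
        if (is_known_format(read_u32_le(buf + off))
            || is_known_format(read_u32_be(buf + off)))
            return (long)off;
    }
    return -1;
}
```

Checking both byte orders mirrors the fact that the header word is written in the byte order of the machine that saved the image.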

Amazing ARM simulator experience
by Eliot Miranda 03 May '17

Hi All,

just had to tell people about this morning's experience using the ARM simulator. I've been building Smalltalk VMs since 1983, so 33 years. My first, on the Three Rivers PERQ, was dog slow. My undergraduate project was done on the National Semiconductor 32016-based Whitechapel Computer Works workstation and its 32032-based successor.

This morning I was revamping register management in Cog's ARMv5/v6 back end, making more registers available by using two of the C argument registers for two of the registers the Cogit assigns to various fixed tasks, and using store and load multiple instructions to save and restore the registers concisely around calls into the run-time.

Remember the architecture here. The simulator generates ARM machine code into the ByteArray that represents the address space, which holds all of the generated machine code, a small C stack, and the Smalltalk heap. A plugin, derived from Gdb's ARM simulator written in C, interprets that machine code, running for a couple of milliseconds at a time in a loop, applying breakpoint calculations and asserts on every call into the run-time. Calls into the run-time and accesses of variables in the simulator are done by using illegal addresses in the generated machine code. Each illegal access causes the Gdb-derived machine code interpreter to return with an error; this error is turned into an exception, whose handler maps the specific illegal address to a variable or message selector, accesses the variable or sends the message, provides the result, and allows execution to continue.

In changing the register management I had a test case that worked: an image that prompted for an expression and evaluated it, which worked both in the simulator and with the generated VM. But the real VM crashed when used on a proper image. So, to debug it, I started launching the real interactive image in the simulator.

Well, the amazing experience is that that image, whose machine code is being _interpreted by a C program_, feels /faster/ than my 32016-based implementation back in 1984-ish. Quite amazing. I can open windows, type text, access source code (I was playing with Message Names), etc. It's sluggish, but usable. Amazing how fast modern machines are. All on my 2012-vintage 2.2 GHz Core i7 MacBook Pro. I'm blown away :-)

_,,,^..^,,,_ best, Eliot
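The illegal-address trick can be pictured with a small C sketch. In the real simulator the C interpreter returns an error, Smalltalk turns it into an exception, and the handler maps the address; this sketch collapses that round trip into a table lookup inside a simulated load. All sizes, addresses and names here (sim_memory, sim_load32, the trap table) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simulated address space (the real one is a Smalltalk
   ByteArray holding machine code, a small C stack and the heap). */
#define SIM_MEMORY_SIZE 4096u
static uint8_t sim_memory[SIM_MEMORY_SIZE];

/* Hypothetical simulator variable that generated code wants to read. */
static uint32_t sim_stack_limit = 0x1234u;
static uint32_t read_stack_limit(void) { return sim_stack_limit; }

typedef struct {
    uint32_t trap_address;      /* illegal address planted in generated code */
    uint32_t (*resolve)(void);  /* stands in for "read the variable / send the message" */
} TrapEntry;

static const TrapEntry trap_table[] = {
    { 0xFFFF0010u, read_stack_limit },   /* hypothetical trap address */
};

/* Simulate a 32-bit little-endian load. Legal addresses read the address
   space; an illegal address is looked up in the trap table and answered by
   its resolver, so execution can continue. Returns 0 on success, -1 for an
   illegal address with no trap entry (a genuine fault). */
int sim_load32(uint32_t address, uint32_t *result)
{
    if (address <= SIM_MEMORY_SIZE - 4u) {
        const uint8_t *p = sim_memory + address;
        *result = (uint32_t)p[0] | ((uint32_t)p[1] << 8)
                | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
        return 0;
    }
    for (size_t i = 0; i < sizeof trap_table / sizeof trap_table[0]; i++) {
        if (trap_table[i].trap_address == address) {
            *result = trap_table[i].resolve();
            return 0;
        }
    }
    return -1;
}
```

The attraction of the design is that the generated machine code needs no special instructions: an ordinary load from a reserved address is enough to reach into the simulator.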

errors while building a cog development image
by Sophie Kaleba 03 May '17

Hi, I have tried to build a Cog development image using the following instructions:

    $ git clone http://www.github.com/OpenSmalltalk/opensmalltalk-vm
    $ cd opensmalltalk-vm/image
    $ ./buildspurtrunkvmmakerimage.sh

While running the script, I get two errors (see attached file):
- "Context cannot be changed"
- and then "stackp store failure"
which prevent me from actually building the image (the last error goes into an infinite loop). Has anyone ever experienced this problem? I am using Ubuntu 15.10, 64-bit.

Thanks, Sophie

[OpenSmalltalk/opensmalltalk-vm] 9fd4e3: Use LF instead of CR as image/*.st line ending
by GitHub 03 May '17

Branch: refs/heads/Cog
Home: https://github.com/OpenSmalltalk/opensmalltalk-vm

Commit: 9fd4e371ae0895078fce13ea35c491daf0e448e0
https://github.com/OpenSmalltalk/opensmalltalk-vm/commit/9fd4e371ae0895078f…
Author: Nicolas Cellier <nicolas.cellier.aka.nice(a)gmail.com>
Date: 2017-04-27 (Thu, 27 Apr 2017)

Changed paths:
  M .gitattributes
  M image/BuildSpurReader64Image.st
  M image/BuildSpurTrunk64Image.st
  M image/BuildSqueakSpurTrunkVMMakerImage.st
  M image/CompiledMethod-usesAlternateBytecodeSet.st
  M image/FT2Constants.st
  M image/LoadReader.st
  M image/LoadSistaSupport.st
  M image/Object-performwithwithwithwithwith.st
  M image/RunATestClass.st
  M image/StartReader.st
  M image/UpdateSqueakTrunkImage.st

Log Message:
-----------
Use LF instead of CR as image/*.st line ending

This is to be able to review/blame/etc. from the GitHub web interface (a one-liner with 500+ columns is not tool-friendly).

Commit: c4881946c2bb8b7da7b191489dbf7ae180b05f51
https://github.com/OpenSmalltalk/opensmalltalk-vm/commit/c4881946c2bb8b7da7…
Author: Nicolas Cellier <nicolas.cellier.aka.nice(a)gmail.com>
Date: 2017-04-27 (Thu, 27 Apr 2017)

Changed paths:
  M platforms/Cross/plugins/CroquetPlugin/CroquetPlugin.h
  M platforms/Cross/plugins/CroquetPlugin/TriBoxStub.c

Log Message:
-----------
Merge branch 'Cog' of https://github.com/OpenSmalltalk/opensmalltalk-vm into Cog

Compare: https://github.com/OpenSmalltalk/opensmalltalk-vm/compare/bfe983b7a720...c4…
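The first commit switches the image/*.st build scripts from CR to LF line endings so that GitHub can show them line by line. For illustration only, here is a minimal C sketch of that conversion; the commit itself may have used other tooling (it also updates .gitattributes), and normalize_to_lf is a hypothetical name. Lone CRs and CRLF pairs both become a single LF.

```c
#include <stddef.h>

/* Convert CR and CRLF line endings to LF, in place.
   Returns the new length of the buffer contents. */
size_t normalize_to_lf(char *buf, size_t len)
{
    size_t out = 0;
    for (size_t in = 0; in < len; in++) {
        if (buf[in] == '\r') {
            buf[out++] = '\n';
            if (in + 1 < len && buf[in + 1] == '\n')
                in++;            /* swallow the LF of a CRLF pair */
        } else {
            buf[out++] = buf[in];
        }
    }
    return out;
}
```

Because the output is never longer than the input, the conversion can safely overwrite the buffer as it goes.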

Byte & String collection hash performance; a modest proposal for change.
by Eliot Miranda 02 May '17

Hi All,

the hash algorithm used for ByteString in Squeak and Pharo is good for "small" strings and overkill for large strings. In many applications it is important to get well-distributed string hashes, especially over the range of strings that constitute things like method names, URLs, etc. Consequently, the current algorithm includes every character in a string. This works very well for "small" strings but results in very slow hashes (and hence long latencies, because the hash is an uninterruptible primitive) for large strings, where large may be several megabytes.

Let's look at the basic hash algorithm. The following method is translated by the compiler machinery in VMMaker from Smalltalk to C. It creates a primitive function called primitiveStringHash, so when the method below is invoked from normal Smalltalk code, its C translation runs; neat.

    ByteArray>>hashBytes: aByteArray startingWith: speciesHash
        "Answer the hash of a byte-indexed collection, using speciesHash as the initial value.
         See SmallInteger>>hashMultiply.
         The primitive should be renamed at a suitable point in the future"
        <primitive: 'primitiveStringHash' module: 'MiscPrimitivePlugin'>
        | byteArraySize hash |
        <var: 'aByteArray' type: #'unsigned char *'>
        <var: 'speciesHash' type: #int>
        byteArraySize := aByteArray size.
        hash := speciesHash bitAnd: 16rFFFFFFF.
        1 to: byteArraySize do:
            [:pos |
            hash := hash + (aByteArray basicAt: pos).
            "Inlined hashMultiply, written this way for translation to C."
            hash := hash * 1664525 bitAnd: 16r0FFFFFFF].
        ^hash

This function is invoked by a rather convoluted chain:

    String>>hash
        "#hash is implemented, because #= is implemented"
        "ar 4/10/2005: I had to change this to use ByteString hash as initial hash in order to avoid having to rehash everything and yet compute the same hash for ByteString and WideString.
         md 16/10/2006: use identityHash as initialHash, as behavior hash will use String hash (name) to have a better hash soon.
         eem 4/17/2017 it's not possible to use String hash (name) for the initial hash because that would be recursive."
        ^self class stringHash: self initialHash: ByteString identityHash

    ByteString class>>stringHash: aString initialHash: speciesHash
        "Answer the hash of a byte-indexed string, using speciesHash as the initial value.
         See SmallInteger>>hashMultiply."
        <primitive: 'primitiveStringHash' module: 'MiscPrimitivePlugin'>
        | hash |
        hash := speciesHash bitAnd: 16rFFFFFFF.
        1 to: aString size do:
            [:pos |
            hash := (hash + (aString basicAt: pos)) hashMultiply].
        ^hash

and the generic string implementation is

    String class>>stringHash: aString initialHash: speciesHash
        "Answer the hash of a byte-indexed string, using speciesHash as the initial value.
         See SmallInteger>>hashMultiply."
        | hash |
        hash := speciesHash bitAnd: 16rFFFFFFF.
        1 to: aString size do:
            [:pos |
            hash := (hash + (aString basicAt: pos)) hashMultiply].
        ^hash

(it simply omits the primitive declaration).

Until yesterday the inner loop was written differently, by Andres Valloud, to avoid overflow:

    hash := hash + (aByteArray basicAt: pos).
    "Begin hashMultiply"
    low := hash bitAnd: 16383.
    hash := (16r260D * low + ((16r260D * (hash bitShift: -14) + (16r0065 * low) bitAnd: 16383) * 16384)) bitAnd: 16r0FFFFFFF.

The problem here is that the Smalltalk-to-C translation machinery is naive and entirely incapable of transforming

    low := hash bitAnd: 16383.
    hash := (16r260D * low + ((16r260D * (hash bitShift: -14) + (16r0065 * low) bitAnd: 16383) * 16384)) bitAnd: 16r0FFFFFFF.

into

    hash := hash * 1664525 bitAnd: 16r0FFFFFFF

The reformulation makes the primitive a little quicker, with the gain growing for larger strings, but it still suffers the high invocation overhead described in the Cog Primitive Performance thread.

In looking at this I've added a primitive for hashMultiply; primitive #159 implements precisely

    self * 1664525 bitAnd: 16r0FFFFFFF

for SmallInteger and LargePositiveInteger receivers, as fast as possible in the Cog JIT.
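The byte loop and the two forms of hashMultiply above can be written directly in C. This is a minimal sketch, not the VMMaker-generated code: the function names are hypothetical, and it assumes the 28-bit mask and the constant 1664525 exactly as given in the message. The split version shows why the overflow-avoiding decomposition computes the same value: 1664525 = 16r0065 * 16384 + 16r260D, so splitting the hash into its low 14 bits and the rest gives the same product modulo 2^28.

```c
#include <stddef.h>
#include <stdint.h>

#define HASH_MASK 0x0FFFFFFFu  /* 16r0FFFFFFF: keep the low 28 bits */

/* hashMultiply as described for primitive #159: self * 1664525 bitAnd: 16r0FFFFFFF.
   Unsigned 32-bit wrap-around cannot change the result, because only the
   low 28 bits are kept and 2^28 divides 2^32. */
uint32_t hash_multiply(uint32_t h)
{
    return (h * 1664525u) & HASH_MASK;
}

/* The overflow-avoiding form from the older inner loop. Every intermediate
   product stays small, yet the result is the same modulo 2^28. */
uint32_t hash_multiply_split(uint32_t h)
{
    uint32_t low = h & 16383u;   /* hash bitAnd: 16383 */
    uint32_t high = h >> 14;     /* hash bitShift: -14 */
    return (0x260Du * low
            + (((0x260Du * high + 0x0065u * low) & 16383u) * 16384u)) & HASH_MASK;
}

/* The stringHash:initialHash: loop: every byte is folded into the hash. */
uint32_t string_hash(const unsigned char *bytes, size_t size, uint32_t species_hash)
{
    uint32_t hash = species_hash & HASH_MASK;   /* speciesHash bitAnd: 16rFFFFFFF */
    for (size_t pos = 0; pos < size; pos++)
        hash = hash_multiply(hash + bytes[pos]);
    return hash;
}
```

In C the direct multiply is the natural form, which is why the decomposed Smalltalk version, which the translator cannot simplify, is slower in the primitive.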
With this machinery in place it's instructive to compare the cost of the primitive against the non-primitive Smalltalk code. First let me introduce a set of replacement hash functions, newHashN. These hash all characters in strings up to a certain size, and at most that many characters for larger strings. Here are newHash64 and newHash2048, which use pure Smalltalk, including an inlined hashMultiply written to avoid SmallInteger overflow. Also measured are the obvious variants newHash128, newHash256, newHash512 & newHash1024.

    String>>newHash64
        "#hash is implemented, because #= is implemented"
        "choice of primes: (HashedCollection goodPrimes select: [:n| n bitCount = (n highBit // 2) and: [n <= 16rFFFFFFF]]) collect: [:ea| {ea. ea hex}]"
        | size hash |
        size := self size.
        size = 0 ifTrue: [^214748357 "16rCCCCCC5"].
        hash := size < 262144
                    ifTrue: [size * 2617 "16rA39"]
                    ifFalse: [size + (size >> 16)].
        1 to: size by: (size // 32 max: 1) do: "At most 63 characters"
            [:i| | low |
            hash := hash + (self basicAt: i).
            "hash multiply"
            low := hash bitAnd: 16383.
            hash := (16r260D * low + ((16r260D * (hash bitShift: -14) + (16r0065 * low) bitAnd: 16383) * 16384)) bitAnd: 16r0FFFFFFF].
        ^hash

    String>>newHash2048
        "#hash is implemented, because #= is implemented"
        "choice of primes: (HashedCollection goodPrimes select: [:n| n bitCount = (n highBit // 2) and: [n <= 16rFFFFFFF]]) collect: [:ea| {ea. ea hex}]"
        | size hash |
        size := self size.
        size = 0 ifTrue: [^214748357 "16rCCCCCC5"].
        hash := size < 262144
                    ifTrue: [size * 2617 "16rA39"]
                    ifFalse: [size + (size >> 16)].
        1 to: size by: (size // 1024 max: 1) do: "At most 2047 characters"
            [:i| | low |
            hash := hash + (self basicAt: i).
            "hash multiply"
            low := hash bitAnd: 16383.
            hash := (16r260D * low + ((16r260D * (hash bitShift: -14) + (16r0065 * low) bitAnd: 16383) * 16384)) bitAnd: 16r0FFFFFFF].
        ^hash

So the idea here is to step through the string by 1 for string sizes up to N - 1, and by more than 1 for strings of size >= N, limiting the maximum number of characters sampled to between N // 2 and N - 1. Another idea is to implement the methods on String, so they are invoked directly. Another is to discard the speciesHash and use a better value for the null-string hash: a prime whose bitCount is about half its highBit (i.e. about half of its bits are set).

We can rewrite these more cleanly to use the hashMultiply primitive, so here are newHashP64 through newHashP2048:

    String>>newHashP64
        "#hash is implemented, because #= is implemented"
        | size hash |
        size := self size.
        size = 0 ifTrue: [^214748357 "16rCCCCCC5"].
        hash := size < 262144
                    ifTrue: [size * 2617 "16rA39"]
                    ifFalse: [size + (size >> 16)].
        1 to: size by: (size // 32 max: 1) do: "At most 63 characters"
            [:i| hash := (hash + (self basicAt: i)) hashMultiply].
        ^hash

    String>>newHashP2048
        "#hash is implemented, because #= is implemented"
        | size hash |
        size := self size.
        size = 0 ifTrue: [^214748357 "16rCCCCCC5"].
        hash := size < 262144
                    ifTrue: [size * 2617 "16rA39"]
                    ifFalse: [size + (size >> 16)].
        1 to: size by: (size // 1024 max: 1) do: "At most 2047 characters"
            [:i| hash := (hash + (self basicAt: i)) hashMultiply].
        ^hash

So e.g. newHash2048 and newHashP2048 sample at most 2047 and at least 1024 characters for strings whose size exceeds 1024 elements, and all of the elements for all strings with size <= 1024 elements. Let's compare both the hash spread (the number of distinct hashes produced) and the time taken to evaluate the three variants of hash function.
We have the interpreter primitive (hash implemented in terms of stringHash:initialHash:), newHash64 through newHash2048 (inlined hashMultiply in pure Smalltalk, written to avoid overflow into LargeInteger arithmetic) and newHashP64 through newHashP2048, written in pure Smalltalk but using the hashMultiply primitive (which avoids the need to decompose the multiplication to avoid overflow).

Here's the test harness. A few things: first, it computes the blocks used rather than inlining them in the method, to eliminate the cost of block dispatch from the measurements. The block dispatch isn't complex but introduces a little noise. Second, garbageCollectMost is used to run the scavenger before each measurement so that the GC is in the same initial state; again this reduces noise.

    | strs "strings" ns "number of strings" nus "number of unique strings" ass "average string size" blocks "the blocks that invoke each hash" |
    Smalltalk garbageCollect.
    strs := ByteString allSubInstances select: [:s| s size <= 32].
    ns := strs size.
    nus := strs asSet size.
    ass := ((strs inject: 0 into: [:sum :s| sum + s size]) / strs size) rounded.
    blocks := #('hash' 'newHash64' 'newHash128' 'newHash256' 'newHash512' 'newHash1024' 'newHash2048'
                'newHashP64' 'newHashP128' 'newHashP256' 'newHashP512' 'newHashP1024' 'newHashP2048')
                    collect: [:f| Compiler evaluate: '[:ea| ea ', f, ']'].
    blocks do: [:ea| ea value: '']; do: [:ea| ea value: ''].
    blocks collect:
        [:hashBlock| | nh |
        Smalltalk garbageCollectMost.
        { ns.
          nus.
          nh := (strs collect: hashBlock) asSet size.
          nus - nh.
          1.0 - (nh asFloat / nus asFloat).
          ass.
          [1 to: 100 do: [:i| strs do: hashBlock]] timeToRun
            - [1 to: 100 do: [:i| strs do: [:ea| ea class]]] timeToRun.
          (hashBlock sourceString allButFirst: 10) allButLast }]

    N Strings  N Unique  N Hashes  N Collisions  fraction of collisions  Avg String Size  Time (ms)  hash function
    121162     54439     54435     4             7.347e-5                11               1926       hash
    121162     54439     54435     3             5.510e-5                11               8913       newHash64
    121162     54439     54435     3             5.510e-5                11               8879       newHash128
    121162     54439     54435     3             5.510e-5                11               8870       newHash256
    121162     54439     54435     3             5.510e-5                11               8835       newHash512
    121162     54439     54435     3             5.510e-5                11               8879       newHash1024
    121162     54439     54435     3             5.510e-5                11               8876       newHash2048
    121162     54439     54435     3             5.510e-5                11               5658       newHashP64
    121162     54439     54435     3             5.510e-5                11               5506       newHashP128
    121162     54439     54435     3             5.510e-5                11               5677       newHashP256
    121162     54439     54435     3             5.510e-5                11               5595       newHashP512
    121162     54439     54435     3             5.510e-5                11               5645       newHashP1024
    121162     54439     54435     3             5.510e-5                11               5571       newHashP2048

So for small strings the interpreter primitive wins on speed, considerably, but has one more collision (I suspect because the seed, ByteString identityHash, is poor).

Now for byte strings with sizes in the range 33 to 1024. I'll dispense with the newHash forms; they're essentially half the speed of the newHashP forms but otherwise identical.

    N Strings  N Unique  N Hashes  N Collisions  fraction of collisions  Avg String Size  Time (ms)  hash function
    34044      25853     25852     1             3.8680e-5               148              1045       hash
    34044      25853     25790     63            0.00243                 148              2918       newHashP64
    34044      25853     25847     6             0.00023                 148              4929       newHashP128
    34044      25853     25852     1             3.8680e-5               148              6757       newHashP256
    34044      25853     25851     2             7.7360e-5               148              8959       newHashP512
    34044      25853     25852     1             3.8680e-5               148              10055      newHashP1024
    34044      25853     25852     1             3.8680e-5               148              10382      newHashP2048

So here, hashing between 256 and 511 characters gives as good a distribution of hashes as considering all of the string. So I think this shows that the cut-off for effectiveness of string hashing is around 256 characters. At least on the strings in my image, not much is to be gained by hashing more.
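In the tables, N Collisions is nus - nh, where nh is the number of distinct hash values, and the fraction of collisions is 1 - nh/nus. A small C sketch of that bookkeeping follows, assuming the hashes have already been computed for the unique strings (duplicate strings hash identically, so this gives the same nh as the harness's `(strs collect: hashBlock) asSet size`). The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

static int compare_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Number of distinct values among the hashes of n unique strings
   (nh in the harness). Sorts the array in place. */
size_t count_distinct_hashes(uint32_t *hashes, size_t n)
{
    if (n == 0)
        return 0;
    qsort(hashes, n, sizeof hashes[0], compare_u32);
    size_t distinct = 1;
    for (size_t i = 1; i < n; i++)
        if (hashes[i] != hashes[i - 1])
            distinct++;
    return distinct;
}

/* N Collisions = nus - nh. */
size_t count_collisions(uint32_t *hashes, size_t n_unique)
{
    return n_unique - count_distinct_hashes(hashes, n_unique);
}

/* fraction of collisions = 1 - (nh / nus). */
double collision_fraction(size_t n_unique, size_t n_distinct)
{
    return n_unique ? 1.0 - (double)n_distinct / (double)n_unique : 0.0;
}
```

Sorting and counting runs is the C counterpart of building a Set of hashes; either way the cost is dominated by the hashing itself for large strings.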
So let's look at strings > 1024 in size.

    N Strings  N Unique  N Hashes  N Collisions  fraction of collisions  Avg String Size  Time (ms)  hash function
    732        606       606       0             0.0                     50741            5834       hash
    732        606       605       1             0.001650                50741            60         newHashP64
    732        606       605       1             0.001650                50741            106        newHashP128
    732        606       605       1             0.001650                50741            199        newHashP256
    732        606       605       1             0.001650                50741            416        newHashP512
    732        606       606       0             0.0                     50741            822        newHashP1024
    732        606       606       0             0.0                     50741            1875       newHashP2048

By this point the cost of hashing all characters overwhelms the primitive implementation, and the pure Smalltalk code becomes much faster. And the hash spread, the number of distinct hashes, is as good.

So that's the data. My conclusions are that
- the primitive is clearly still a win, especially for small strings. It could be written as a primitive that is run on the Smalltalk stack, and that would boost performance for small strings considerably. But the primitive still wins against Cog code up through at least 150-byte strings. We could run a different doit to detect the crossover in string length, but not today :-).
- replacing the primitive with one that behaves like newHash1024 or newHash2048 seems the best to me. Such a primitive would hash between N and 2*N-1 characters for strings of length > N, where N would likely be 512, 1024 or 2048. The primitive should also be written to hash 16-bit, 32-bit and 64-bit non-pointer arrays.

_,,,^..^,,,_ best, Eliot
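The proposed replacement primitive would behave like newHashP1024 or newHashP2048. As a minimal sketch, assuming a byte string and following the newHashP code in the message, the sampling scheme could look like this in C. The function names and the n parameter (the method suffix: 64, 2048, ...) are hypothetical, and the 16-, 32- and 64-bit array variants are not covered.

```c
#include <stddef.h>
#include <stdint.h>

/* hashMultiply: self * 1664525 bitAnd: 16r0FFFFFFF (see primitive #159). */
static uint32_t hash_multiply28(uint32_t h)
{
    return (h * 1664525u) & 0x0FFFFFFFu;
}

/* Stride used by newHashP<n>: (size // (n / 2)) max: 1.  n must be >= 2. */
size_t sample_stride(size_t size, size_t n)
{
    size_t stride = size / (n / 2);
    return stride > 0 ? stride : 1;
}

/* How many characters `1 to: size by: stride` visits. */
size_t sample_count(size_t size, size_t n)
{
    return size == 0 ? 0 : (size - 1) / sample_stride(size, n) + 1;
}

/* Hypothetical C version of newHashP<n> for a byte string. */
uint32_t new_hash_sampled(const unsigned char *s, size_t size, size_t n)
{
    if (size == 0)
        return 214748357u;                  /* 16rCCCCCC5, the null-string hash */
    /* Seed from the size. Truncating to 32 bits is harmless: hashMultiply
       keeps only the low 28 bits of (hash + byte). */
    uint32_t hash = size < 262144u
        ? (uint32_t)size * 2617u            /* 16rA39 */
        : (uint32_t)(size + (size >> 16));
    size_t stride = sample_stride(size, n);
    for (size_t i = 0; i < size; i += stride)   /* Smalltalk index is i + 1 */
        hash = hash_multiply28(hash + s[i]);
    return hash;
}
```

For example, with n = 2048 a 100000-byte string uses a stride of 97 and visits 1031 characters, inside the 1024 to 2047 range described above, while any string shorter than 2048 bytes is hashed in full.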

testing the B3DAcceleratorPlugin on Mac OS X
by Eliot Miranda 01 May '17

Hi All,

I hope I've built the B3DAcceleratorPlugin for the 32-bit Mac Cocoa VMs. B3DAcceleratorPlugin needs the Carbon and QuickTime frameworks, so it is linked against them and will only be available on 32-bit (until we can rewrite it to avoid the 32-bit-only APIs). But I need to test this. What's a minimal Balloon demo I can load into a trunk image to test the plugin? (URLs appreciated)

______,,,^..^,,,______ AdvThanksance, Eliot
