ScriptBook is a disposable app installer. You simply load, click and explore.
It now has 11 applications to install: Refactoring Tools; NuScratch; OSProcess; Tonel; Seaside3; Command Shell; Scorch; Squot; Fuel; FFI; JSON.
(Installer ss3)
project: 'ScriptBook';
install: 'ScriptBook'.
ScriptBook new
Chris
Megadeth - Hanger 18
https://www.youtube.com/watch?v=B-oU2xlViRQ
>Maybe the how to have FunSqueak.
>I check how to load some and test works.
>The base is 5.2 stable release ?
Yea, 5.2.
If I understand you correctly, I’d be happy to have a script for FunSqueak. That’d be cool.
Chris
>Tonel seems Pharo, probably the others too.
>How you test if works ?
>Pharo is very different of Squeak this days.
Yup, more good questions. And I’ll point the responsibility elsewhere. I figure if people publish a script, then it probably works.
ScriptBook isn’t going to test anything. I think Tonel works for Squeak. I think I probably goofed with the Bera stuff. Probably Pharo only.
Most of the specs for ScriptBook are soft. I have ideas about importing and exporting Json to disk, etc.
I do know that since scripts fit so well in code and there are so many code repos that a dedicated server for Installer scripts doesn’t make much sense.
I know that I’m building something for myself, because I’ll probably never have any users.
I also know that as a 3rd party developer I never want to have any dealings with the Inbox/Trunk process. I think it made Squeak itself the only application.
And while it’s emphasis on conformity and peer review is good for the core language, its mentality is the opposite of the reason a lot of people (like me) hack in the first place. So if people think that every piece of code is trying to qualify for assimilation, then perhaps that’s an overreach?
Chris
Anthrax - Got The Time
https://www.youtube.com/watch?v=be7iNHw8QoQ
>Could explain in few lines Tonel, Scorch, Squot ?
>I do not follow Squeak closely this days and swiki do not help.
That’s a fair question. And I was afraid somebody was going to ask me that. Unfortunately I have no idea what Tonel, Scorch and Squot are. Not a clue.
I saw people talking about them and added the scripts.
I do know that it’s not the job of ScriptBook to know. It should only do one thing. Like saving your preferences to disk to import later.
Chris
Red Hot Chili Peppers On Nightmusic
https://www.youtube.com/watch?v=ALCgn9wdd7g
Fabio Niephaus uploaded a new version of Collections to project The Trunk:
http://source.squeak.org/trunk/Collections-fn.828.mcz
==================== Summary ====================
Name: Collections-fn.828
Author: fn
Time: 29 April 2019, 12:52:37.794699 pm
UUID: a663df49-93bf-413e-a942-29f1d0fa4605
Ancestors: Collections-nice.827
Character shouldNotImplement postCopy.
=============== Diff against Collections-nice.827 ===============
Item was added:
+ ----- Method: Character>>postCopy (in category 'copying') -----
+ postCopy
+ "I will never be copied"
+ ^self shouldNotImplement!
> ------------------------------
>
> Message: 4
> Date: Fri, 26 Apr 2019 13:03:42 -0700
> From: tim Rowledge <tim(a)rowledge.org>
> To: The general-purpose Squeak developers list
> <squeak-dev(a)lists.squeakfoundation.org>
> Subject: Re: [squeak-dev] Lessons learned from (Re: Faster fibonacci)
> Message-ID: <9AF99F20-9405-430D-9F96-4DFE2D2089CB(a)rowledge.org>
>
>
>
>
>> On 2019-04-25, at 7:40 PM, tim Rowledge <tim(a)rowledge.org> wrote:
>>
>>
>> Looking at FileStream class initialise I see
>> Smalltalk
>> addToStartUpList: self after: SecurityManager; "the intent being before: AutoStart"
>> and yet inspecting the startuplist shows that FileStream comes well *after* AutoStart (which in a somewhat convoluted way actually makes ProjectLauncher actually do the startup document reading) and so we got things messed up sometime.
> It seems all we need is to run
> FileStream initialise
> to fix this, at least in a current image.
>
> I'd guess that making sure the order of things in the startup list is correct is something the ReleaseManager ought to be able to do. I would have guessed that it might use SmalltalkImage class>>#initializeStartUpList but can't see a connection in the code right now.
>
> Mind you, that method has a fairly big bug in that it ignores the ordering desires of all the classes involved, which may explain the original problem. We can't simplistically run through the list of classes currently in the startup list and send #initialize to them since any number of them might do much more stuff in those methods, including stuff that could be destructive. We would need to separate out the startup list related code and only run that.
>
> So conceptually,
> a) in all senders of #addToStartUpList* and related methods, split out the startup list part to an agreed method such as #addToStartUpSequence that handles the ordering requirements, if any. A default nul version inObject would be nice.
> b) to rebuild the startup list, take copy of the current list and send each item #addToStartUpSequence
> c) some classes may need to be checked out for where they need to be in the ordering - I see thsat ProcessorScheduler is mentioned particularly in SmalltalkImage class>>#initializeStartUpList and yet its #initialize makes no reference to #addToStartUpList*
> d) use the rebuilt method from ReleaseBuilder
>
> Ideas?
Object class>>systemStartupDependsOnMe
"
If it does, then override this method
to notify yourself of at least one other class
whose startup initialization you are
certain you depend on.
Something like
self #startupDependsOn:
aClassOrSomeClasses
; startupDependsOn:
aClassOrSomeClasses2
"
^self startupDependsOn: #()
Some class>>systemStartupDependsOnMe
self startupDependsOn: aClassOrSomeClasses
; startupDependsOn: aClassOrSomeClasses2
; ...
At startup, we send #systemStartupDependsOnMe
to all the classes which define it.
We then do a topological sort of the identified dependencies,
and send #initialize or whatever to each class in the resulting list.
If the ordering proves insufficient, we add dependency
corrections as and where they are discovered, and keep adding new ones
until we get a partial ordering that works.
Thereafter, maintenance should be pretty straightforward.
-Jim
Jim Sawyer: jas(a)cruzio.com
> tim
> --
> tim Rowledge; tim(a)rowledge.org; http://www.rowledge.org/tim
> Useful random insult:- Lights are on but nobody's home.
>
>
>
>
> ------------------------------
>
> Message: 5
> Date: Fri, 26 Apr 2019 22:03:32 +0200
> From: Nicolas Cellier <nicolas.cellier.aka.nice(a)gmail.com>
> To: The general-purpose Squeak developers list
> <squeak-dev(a)lists.squeakfoundation.org>
> Subject: Re: [squeak-dev] Using high res clock in benchmarks
> Message-ID:
> <CAKnRiT6L8RC=6Rb9riUqVX-4cw0Yim+jvztK_my6+12ot0jX=Q(a)mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Thanks
>
> Le ven. 26 avr. 2019 à 01:15, Levente Uzonyi <leves(a)caesar.elte.hu> a
> écrit :
>
>> 70ns difference can easily come from cache misses. I think you'd get
>> more consistent results if you compiled a method and ran that instead of
>> compiling a new block for each execution.
>>
>> The first two numbers are larger because the jit kicks in. IIRC no jit for
>> the first run, jitting during the second run. All futher executions use
>> the jitted code.
>>
> Yes, it makes sense
>
>> Also, the "hickups" probably happen because your OS is running other
>> processes as well.
>>
>> IIRC the highResClock primitive uses TSC on Intel CPUs, so it may not be
>> as reliable as you'd expect it to be[1], especially on older CPUs (clock
>> rate depends on CPU frequency, cores' clocks are not synced).
>>
>> Yes, constant rate was the first question that came to my mind.
> But it's not really a problem: having the number of ticks is a good
> information per se, whatever the power saving, it tells how many cycles we
> use
> More problematic is un-synced drifting TSC on multi-core, because diffing
> makes no sense then if ever affinity changes...
> I suggest making it a Preference (inbox).
>
> Normally, the best feature of TSC is that they have very cheap access.
> Unfortunately, t's not that cheap thru the primitive.
> I re-ran today:
>
> [Time utcMicrosecondClock] bench.
> '20,500,000 per second. 48.7 nanoseconds per run.'
> [Time highResClock] bench.
> '22,800,000 per second. 43.8 nanoseconds per run.'
>
> and
> (1 to: 200) collect: [:i | Time highResClock - Time highResClock * 1000000
> // Time highResClockTicksPerMillisecond]
> #(-5479 -203 -45 -41 -36 -39 -38 -38 -41 -41 -39 -42 -41 -41 -41 -41 -39
> -41 -39 -41 -39 -39 -38 -39 -41 -38 -41 -41 -41 -41 -41 -41 -39 -41 -41 -39
> -39 -41 -41 -39 -43 -43 -39 -36 -36 -39 -38 -38 -38 -36 -35 -36 -35 -35 -35
> -35 -35 -36 -36 -35 -35 -35 -36 -35 -36 -35 -35 -36 -35 -35 -35 -35 -36 -35
> -36 -35 -36 -35 -36 -35 -35 -36 -36 -35 -36 -35 -36 -35 -36 -36 -35 -35 -36
> -35 -36 -36 -35 -35 -35 -36 -35 -35 -35 -35 -38 -36 -35 -36 -35 -35 -35 -35
> -35 -36 -36 -35 -35 -35 -36 -35 -36 -36 -35 -36 -35 -35 -35 -35 -36 -35 -36
> -35 -36 -35 -36 -35 -35 -36 -36 -35 -36 -35 -36 -35 -36 -36 -35 -35 -36 -36
> -36 -36 -35 -35 -35 -35 -36 -35 -36 -35 -36 -36 -36 -35 -35 -36 -35 -35 -35
> -36 -36 -35 -35 -35 -36 -35 -36 -36 -35 -36 -35 -35 -35 -35 -36 -35 -36 -35
> -36 -35 -36 -35 -35 -36 -36 -35 -36 -35 -36 -35)
>
> which is somehow consistent
>
> The utcMicrosecondsClock is very poor on windows VM (1 ms resolution).
> So maybe the high res clock can still be an interesting add-on...
>
> Levente
>
> [1] https://en.wikipedia.org/wiki/Time_Stamp_Counter
>
> On Fri, 26 Apr 2019, Nicolas Cellier wrote:
>
>> Huh, in fact it's a bit different...
>>
>> Le jeu. 25 avr. 2019 à 23:28, Nicolas Cellier <
> nicolas.cellier.aka.nice(a)gmail.com> a écrit :
>>
>> Le jeu. 25 avr. 2019 à 23:22, Nicolas Cellier <
> nicolas.cellier.aka.nice(a)gmail.com> a écrit :
>> Hi,
>> I recently published a Chronology-Core version for using high resolution
> clock.
>> On my 2.7GHz core i5 MBP (2015) I get this:
>>
>> Time highResClockTicksPerMillisecond
>>
>> 2699824
>> => OK, consistent with 2.7GHz
>>
>> Time highResClock - Time highResClock * 1000000 // Time
> highResClockTicksPerMillisecond.
>> -578 -563
>> => Huh, invoking a primitive takes that long (500 to 600 nanoseconds)
>>
>>
>>
>> (1 to: 200) collect: [:i | Time highResClock - Time highResClock *
> 1000000 // Time highResClockTicksPerMillisecond].
>> #(-528 -96 -71 -61 -63 -61 -61 -61 -63 -61 -58 -61 -61 -61 -61 -61 -58
> -58 -61 -57 -61 -58 -63 -61 -61 -61 -61 -61 -61 -63 -61 -61 -61 -61 -61 -61
> -61 -58 -61 -61 -61 -61 -61 -63 -61 -61 -61 -61 -61 -61 -61 -61 -61 -64 -58
>> -63 -61 -58 -61 -58 -61 -61 -61 -57 -61 -61 -58 -61 -61 -61 -58 -61 -61
> -61 -61 -61 -61 -58 -61 -61 -61 -61 -61 -65 -61 -61 -61 -61 -61 -61 -61 -61
> -61 -61 -61 -61 -61 -63 -61 -58 -61 -58 -61 -58 -63 -61 -61 -61 -61 -61 -61
>> -63 -61 -58 -61 -57 -61 -58 -61 -58 -61 -61 -61 -61 -61 -61 -61 -61 -61
> -61 -61 -61 -61 -57 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -58
> -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -63 -61 -63 -61 -58 -61 -57 -61 -61
>> -61 -61 -58 -61 -58 -63 -57 -61 -58 -61 -61 -58 -63 -58 -63 -58 -58 -61
> -61 -61 -61 -63 -61 -58 -61 -58 -61 -61 -61 -61 -57 -61 -58)
>> In fact it's 500ns at first eval (JIT or something), then it's only 60ns.
>>
>>
>> But I can correct it. Let's try it:
>>
>> [10 factorial] bench.
>>
>> '14,000,000 per second. 71.2 microseconds per run.'
>> => this is the reference result
>>
>> Huh, I messed up with printing the timing (because I modified benchFor:
> using µsec clock)
>> [10 factorial] bench.
>> '14,500,000 per second. 69 nanoseconds per run.'
>>
>> [] bench.
>> '191,000,000 per second. 5.25 nanoseconds per run.'
>>
>> So this is including 5ns for evaluating block + incrementing counter +
> looping...
>> tmp := [].
>> (1 to: 50) collect: [:i || ticks |
>> ticks := Time highResClock.
>> tmp value.
>> Time highResClock - ticks + (Time highResClock - Time highResClock) *
> 1000000 // Time highResClockTicksPerMillisecond].
>> #(1020 54 14 11 4 4 4 4 2 5 1 2 4 4 2 4 5 2 4 2 2 5 2 2 2 4 4 4 5 4 4 4
> 4 2 5 4 4 4 2 0 5 2 5 4 5 5 4 4 4 4)
>> => it's about 5ns but here we are in the noise of correction (it varies
> between 57 and 63ns as we saw above)
>> That's the limit os single evaluation with µ-bench
>>
>> | tmp |
>> tmp := [10 factorial].
>> (1 to: 200) collect: [:i || ticks |
>> ticks := Time highResClock.
>> tmp value.
>> Time highResClock - ticks + (Time highResClock - Time highResClock) *
> 1000000 // Time highResClockTicksPerMillisecond].
>> | tmp |
>> tmp := [10 factorial].
>> (1 to: 100) collect: [:i || ticks |
>> ticks := Time highResClock.
>> tmp value.
>> Time highResClock - ticks + (Time highResClock - Time highResClock) *
> 1000000 // Time highResClockTicksPerMillisecond].
>> #(3227 267 212 161 130 125 130 127 127 126 127 126 125 133 124 130 125
> 128 125 130 127 125 126 127 128 127 130 127 130 127 130 124 125 124 125 126
> 124 126 124 131 127 132 125 130 124 130 124 130 126 124 128 126 130 126 130
>> 127 131 126 130 125 124 125 124 127 125 127 125 130 127 133 124 130 125
> 128 127 125 122 130 124 127 126 124 128 124 130 123 130 124 130 126 130 124
> 127 128 127 128 124 130 125 130)
>> #(1663 208 97 94 96 118 97 94 94 94 96 94 93 93 94 93 94 93 93 94 94 93
> 93 96 93 94 93 94 91 94 93 93 94 93 94 94 93 93 96 94 93 93 94 91 93 93 94
> 93 93 94 94 93 93 94 93 94 91 93 91 77 94 93 94 94 93 93 96 94 93 93 94 91
> 93
>> 93 94 93 93 94 94 93 93 94 93 94 91 93 91 77 77 93 93 77 80 94 80 94 93
> 93 94 91)
>> two regimes again... and not the 70ns of bench this time...
>>
>>
>> (1 to: 10) collect: [:i |
>> | ticks |
>> ticks := Time highResClock.
>> 10 factorial.
>> Time highResClock - ticks +
>>
>> sorry for the extra +, editing code in mail and not doing it is a recipe
> for not validating!
>> + (Time highResClock - Time highResClock) "correction"
>> * 1000000 // Time highResClockTicksPerMillisecond "get nanoseconds"].
>>
>> #(1309 247 88 84 74 69 71 71 71 69)
>> => OK, first runs are a bit long, but we get 70ns per run as the reference
>>
>> #(1977 191 143 148 142 120 122 122 117 120)
>> => Oups??? Second run gives different results???
>>
>> #(2239 180 143 143 142 116 117 117 117 114)
>> => and third about the same than second...
>> Any idea how to explain that?
>>
>>
>> If I run 200 times, I sometimes get the two regimes separated by hickups
>>
>> #(1899 107 142 150 123 124 115 122 117 122 118 120 120 120 120 120 121
> 120 118 120 120 120 120 120 121 120 118 120 120 120 120 120 121 120 118 120
> 120 120 120 120 121 120 118 120 120 123 117 122 117 122 116 122 117 123
>> 117 122 117 122 7950 128 117 122 120 118 9963 87 69 71 69 71 69 69 71 71
> 68 69 72 71 68 69 72 71 68 69 72 71 68 69 69 71 69 71 69 71 69 71 69 71 69
> 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69
>> 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69
> 71 69 71 69 71 72 71 68 69 72 71 68 69 69 71 69 71 69 71 68 69 69 71 69 71
> 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71
>> 69 71 69 71)
>>
>>
>>
>
Nicolas Cellier uploaded a new version of Kernel to project The Inbox:
http://source.squeak.org/inbox/Kernel-nice.1218.mcz
==================== Summary ====================
Name: Kernel-nice.1218
Author: nice
Time: 27 April 2019, 10:41:24.794539 am
UUID: 21f74fbe-a0cd-4b6f-86e9-be13465d57fe
Ancestors: Kernel-nice.1217
Implement the recursive fast division of Burnikel-Ziegler for large integers and connect it to digitDiv:neg: when operands are large enough.
This is not the fastest known division which is a composition of Barrett and Newton-Raphson inversion - but is easy to implement and should have similar performances for at least a few thousand bytes long integers - see for example http://bioinfo.ict.ac.cn/~dbu/AlgorithmCourses/Lectures/Lec5-Fast-Division-…
Use digitDiv:neg: in large integer printString so as to obtain the quotient (head) and remainder (tail) in a single operation. Together with divide and conquer division, this results in a factor of about 3x for 50000 factorial printString.
Implement the 4-way Toom-Cook squaring variant of Chung-Hasan. This over-performs the symetrical squaredToom3 even for medium size (800 bytes).
=============== Diff against Kernel-nice.1217 ===============
Item was added:
+ ----- Method: Integer>>digitDiv21: (in category 'private') -----
+ digitDiv21: anInteger
+
+ ^(self digitDiv: anInteger neg: false) collect: #normalize!
Item was added:
+ ----- Method: Integer>>digitDiv32: (in category 'private') -----
+ digitDiv32: anInteger
+
+ ^(self digitDiv: anInteger neg: false) collect: #normalize!
Item was changed:
----- Method: Integer>>digitDiv:neg: (in category 'private') -----
+ digitDiv: anInteger neg: aBoolean
+ ^self primDigitDiv: anInteger neg: aBoolean!
- digitDiv: arg neg: ng
- "Answer with an array of (quotient, remainder)."
- | quo rem ql d div dh dnh dl qhi qlo j l hi lo r3 a t divDigitLength remDigitLength |
- <primitive: 'primDigitDivNegative' module:'LargeIntegers'>
- arg = 0 ifTrue: [^ (ZeroDivide dividend: self) signal].
- "TFEI added this line"
- l := self digitLength - arg digitLength + 1.
- l <= 0 ifTrue: [^ Array with: 0 with: self].
- "shortcut against #highBit"
- d := 8 - arg lastDigit highBitOfByte.
- div := arg digitLshift: d.
- divDigitLength := div digitLength + 1.
- div := div growto: divDigitLength.
- "shifts so high order word is >=128"
- rem := self digitLshift: d.
- rem digitLength = self digitLength ifTrue: [rem := rem growto: self digitLength + 1].
- remDigitLength := rem digitLength.
- "makes a copy and shifts"
- quo := Integer new: l neg: ng.
- dl := divDigitLength - 1.
- "Last actual byte of data"
- ql := l.
- dh := div digitAt: dl.
- dnh := dl = 1
- ifTrue: [0]
- ifFalse: [div digitAt: dl - 1].
- 1 to: ql do:
- [:k |
- "maintain quo*arg+rem=self"
- "Estimate rem/div by dividing the leading to bytes of rem by dh."
- "The estimate is q = qhi*16+qlo, where qhi and qlo are nibbles."
- j := remDigitLength + 1 - k.
- "r1 := rem digitAt: j."
- (rem digitAt: j)
- = dh
- ifTrue: [qhi := qlo := 15
- "i.e. q=255"]
- ifFalse:
- ["Compute q = (r1,r2)//dh, t = (r1,r2)\\dh.
- Note that r1,r2 are bytes, not nibbles.
- Be careful not to generate intermediate results exceeding 13
- bits."
- "r2 := (rem digitAt: j - 1)."
- t := ((rem digitAt: j)
- bitShift: 4)
- + ((rem digitAt: j - 1)
- bitShift: -4).
- qhi := t // dh.
- t := (t \\ dh bitShift: 4)
- + ((rem digitAt: j - 1)
- bitAnd: 15).
- qlo := t // dh.
- t := t \\ dh.
- "Next compute (hi,lo) := q*dnh"
- hi := qhi * dnh.
- lo := qlo * dnh + ((hi bitAnd: 15)
- bitShift: 4).
- hi := (hi bitShift: -4)
- + (lo bitShift: -8).
- lo := lo bitAnd: 255.
- "Correct overestimate of q.
- Max of 2 iterations through loop -- see Knuth vol. 2"
- r3 := j < 3
- ifTrue: [0]
- ifFalse: [rem digitAt: j - 2].
- [(t < hi
- or: [t = hi and: [r3 < lo]])
- and:
- ["i.e. (t,r3) < (hi,lo)"
- qlo := qlo - 1.
- lo := lo - dnh.
- lo < 0
- ifTrue:
- [hi := hi - 1.
- lo := lo + 256].
- hi >= dh]]
- whileTrue: [hi := hi - dh].
- qlo < 0
- ifTrue:
- [qhi := qhi - 1.
- qlo := qlo + 16]].
- "Subtract q*div from rem"
- l := j - dl.
- a := 0.
- 1 to: divDigitLength do:
- [:i |
- hi := (div digitAt: i)
- * qhi.
- lo := a + (rem digitAt: l) - ((hi bitAnd: 15)
- bitShift: 4) - ((div digitAt: i)
- * qlo).
- rem digitAt: l put: lo - (lo // 256 * 256).
- "sign-tolerant form of (lo bitAnd: 255)"
- a := lo // 256 - (hi bitShift: -4).
- l := l + 1].
- a < 0
- ifTrue:
- ["Add div back into rem, decrease q by 1"
- qlo := qlo - 1.
- l := j - dl.
- a := 0.
- 1 to: divDigitLength do:
- [:i |
- a := (a bitShift: -8)
- + (rem digitAt: l) + (div digitAt: i).
- rem digitAt: l put: (a bitAnd: 255).
- l := l + 1]].
- quo digitAt: ql + 1 - k put: (qhi bitShift: 4)
- + qlo].
- rem := rem
- digitRshift: d
- bytes: 0
- lookfirst: dl.
- ^ Array with: quo with: rem!
Item was added:
+ ----- Method: Integer>>primDigitDiv:neg: (in category 'private') -----
+ primDigitDiv: arg neg: ng
+ "Answer with an array of (quotient, remainder)."
+ | quo rem ql d div dh dnh dl qhi qlo j l hi lo r3 a t divDigitLength remDigitLength |
+ <primitive: 'primDigitDivNegative' module:'LargeIntegers'>
+ arg = 0 ifTrue: [^ (ZeroDivide dividend: self) signal].
+ "TFEI added this line"
+ l := self digitLength - arg digitLength + 1.
+ l <= 0 ifTrue: [^ Array with: 0 with: self].
+ "shortcut against #highBit"
+ d := 8 - arg lastDigit highBitOfByte.
+ div := arg digitLshift: d.
+ divDigitLength := div digitLength + 1.
+ div := div growto: divDigitLength.
+ "shifts so high order word is >=128"
+ rem := self digitLshift: d.
+ rem digitLength = self digitLength ifTrue: [rem := rem growto: self digitLength + 1].
+ remDigitLength := rem digitLength.
+ "makes a copy and shifts"
+ quo := Integer new: l neg: ng.
+ dl := divDigitLength - 1.
+ "Last actual byte of data"
+ ql := l.
+ dh := div digitAt: dl.
+ dnh := dl = 1
+ ifTrue: [0]
+ ifFalse: [div digitAt: dl - 1].
+ 1 to: ql do:
+ [:k |
+ "maintain quo*arg+rem=self"
+ "Estimate rem/div by dividing the leading to bytes of rem by dh."
+ "The estimate is q = qhi*16+qlo, where qhi and qlo are nibbles."
+ j := remDigitLength + 1 - k.
+ "r1 := rem digitAt: j."
+ (rem digitAt: j)
+ = dh
+ ifTrue: [qhi := qlo := 15
+ "i.e. q=255"]
+ ifFalse:
+ ["Compute q = (r1,r2)//dh, t = (r1,r2)\\dh.
+ Note that r1,r2 are bytes, not nibbles.
+ Be careful not to generate intermediate results exceeding 13
+ bits."
+ "r2 := (rem digitAt: j - 1)."
+ t := ((rem digitAt: j)
+ bitShift: 4)
+ + ((rem digitAt: j - 1)
+ bitShift: -4).
+ qhi := t // dh.
+ t := (t \\ dh bitShift: 4)
+ + ((rem digitAt: j - 1)
+ bitAnd: 15).
+ qlo := t // dh.
+ t := t \\ dh.
+ "Next compute (hi,lo) := q*dnh"
+ hi := qhi * dnh.
+ lo := qlo * dnh + ((hi bitAnd: 15)
+ bitShift: 4).
+ hi := (hi bitShift: -4)
+ + (lo bitShift: -8).
+ lo := lo bitAnd: 255.
+ "Correct overestimate of q.
+ Max of 2 iterations through loop -- see Knuth vol. 2"
+ r3 := j < 3
+ ifTrue: [0]
+ ifFalse: [rem digitAt: j - 2].
+ [(t < hi
+ or: [t = hi and: [r3 < lo]])
+ and:
+ ["i.e. (t,r3) < (hi,lo)"
+ qlo := qlo - 1.
+ lo := lo - dnh.
+ lo < 0
+ ifTrue:
+ [hi := hi - 1.
+ lo := lo + 256].
+ hi >= dh]]
+ whileTrue: [hi := hi - dh].
+ qlo < 0
+ ifTrue:
+ [qhi := qhi - 1.
+ qlo := qlo + 16]].
+ "Subtract q*div from rem"
+ l := j - dl.
+ a := 0.
+ 1 to: divDigitLength do:
+ [:i |
+ hi := (div digitAt: i)
+ * qhi.
+ lo := a + (rem digitAt: l) - ((hi bitAnd: 15)
+ bitShift: 4) - ((div digitAt: i)
+ * qlo).
+ rem digitAt: l put: lo - (lo // 256 * 256).
+ "sign-tolerant form of (lo bitAnd: 255)"
+ a := lo // 256 - (hi bitShift: -4).
+ l := l + 1].
+ a < 0
+ ifTrue:
+ ["Add div back into rem, decrease q by 1"
+ qlo := qlo - 1.
+ l := j - dl.
+ a := 0.
+ 1 to: divDigitLength do:
+ [:i |
+ a := (a bitShift: -8)
+ + (rem digitAt: l) + (div digitAt: i).
+ rem digitAt: l put: (a bitAnd: 255).
+ l := l + 1]].
+ quo digitAt: ql + 1 - k put: (qhi bitShift: 4)
+ + qlo].
+ rem := rem
+ digitRshift: d
+ bytes: 0
+ lookfirst: dl.
+ ^ Array with: quo with: rem!
Item was added:
+ ----- Method: LargePositiveInteger>>digitDiv21: (in category 'private') -----
+ digitDiv21: anInteger
+ "This is part of the recursive division algorithm from Burnikel - Ziegler
+ Divide a two limbs receiver by 1 limb dividend
+ Each limb is decomposed in two halves of p bytes (8*p bits)
+ so as to continue the recursion"
+
+ | p qr1 qr2 |
+ p := anInteger digitLength + 1 bitShift: -1.
+ p <= 256 ifTrue: [^(self primDigitDiv: anInteger neg: false) collect: #normalize].
+ qr1 := (self butLowestNDigits: p) digitDiv32: anInteger.
+ qr2 := (self lowestNDigits: p) + (qr1 last bitShift: 8*p) digitDiv32: anInteger.
+ qr2 at: 1 put: (qr2 at: 1) + ((qr1 at: 1) bitShift: 8*p).
+ ^qr2!
Item was added:
+ ----- Method: LargePositiveInteger>>digitDiv32: (in category 'private') -----
+ digitDiv32: anInteger
+ "This is part of the recursive division algorithm from Burnikel - Ziegler
+ Divide 3 limb (a2,a1,a0) by 2 limb (b1,b0).
+ Each limb is made of p bytes (8*p bits).
+ This step transforms the division problem into multiplication
+ It must use the fastMultiply: to be worth the overhead costs."
+
+ | a2 b1 d p q qr r |
+ p := anInteger digitLength + 1 bitShift: -1.
+ (a2 := self butLowestNDigits: 2*p)
+ < (b1 := anInteger butLowestNDigits: p)
+ ifTrue:
+ [qr := (self butLowestNDigits: p) digitDiv21: b1.
+ q := qr first.
+ r := qr last]
+ ifFalse:
+ [q := (1 bitShift: 8*p) - 1.
+ r := (self butLowestNDigits: p) - (b1 bitShift: 8*p) + b1].
+ d := q fastMultiply: (anInteger lowestNDigits: p).
+ r := (self lowestNDigits: p) + (r bitShift: 8*p) - d.
+ [r < 0]
+ whileTrue:
+ [q := q - 1.
+ r := r + anInteger].
+ ^Array with: q with: r
+ !
Item was added:
+ ----- Method: LargePositiveInteger>>digitDiv:neg: (in category 'private') -----
+ digitDiv: anInteger neg: aBoolean
+ "If length is worth, engage a recursive divide and conquer strategy"
+ | qr |
+ (anInteger digitLength <= 256
+ or: [self digitLength <= anInteger digitLength])
+ ifTrue: [^ self primDigitDiv: anInteger neg: aBoolean].
+ qr := self abs recursiveDigitDiv: anInteger abs.
+ ^ aBoolean
+ ifTrue: [qr collect: #negated]
+ ifFalse: [qr]!
Item was changed:
----- Method: LargePositiveInteger>>printOn:base: (in category 'printing') -----
printOn: aStream base: b
"Append a representation of this number in base b on aStream.
In order to reduce cost of LargePositiveInteger ops, split the number in approximately two equal parts in number of digits."
+ | halfDigits halfPower head tail nDigitsUnderestimate qr |
- | halfDigits halfPower head tail nDigitsUnderestimate |
"Don't engage any arithmetic if not normalized"
(self digitLength = 0 or: [(self digitAt: self digitLength) = 0]) ifTrue: [^self normalize printOn: aStream base: b].
nDigitsUnderestimate := b = 10
ifTrue: [((self highBit - 1) * 1233 >> 12) + 1. "This is because (2 log)/(10 log)*4096 is slightly greater than 1233"]
ifFalse: [self highBit quo: b highBit].
"splitting digits with a whole power of two is more efficient"
halfDigits := 1 bitShift: nDigitsUnderestimate highBit - 2.
halfDigits <= 1
ifTrue: ["Hmmm, this could happen only in case of a huge base b... Let lower level fail"
^self printOn: aStream base: b nDigits: (self numberOfDigitsInBase: b)].
"Separate in two halves, head and tail"
halfPower := b raisedToInteger: halfDigits.
+ qr := self digitDiv: halfPower neg: self negative.
+ head := qr first normalize.
+ tail := qr last normalize.
- head := self quo: halfPower.
- tail := self - (head * halfPower).
"print head"
head printOn: aStream base: b.
"print tail without the overhead to count the digits"
tail printOn: aStream base: b nDigits: halfDigits!
Item was changed:
----- Method: LargePositiveInteger>>printOn:base:nDigits: (in category 'printing') -----
printOn: aStream base: b nDigits: n
"Append a representation of this number in base b on aStream using n digits.
In order to reduce cost of LargePositiveInteger ops, split the number of digts approximatily in two
Should be invoked with: 0 <= self < (b raisedToInteger: n)"
+ | halfPower half head tail qr |
- | halfPower half head tail |
n <= 1 ifTrue: [
n <= 0 ifTrue: [self error: 'Number of digits n should be > 0'].
"Note: this is to stop an infinite loop if one ever attempts to print with a huge base
This can happen because choice was to not hardcode any limit for base b
We let Character>>#digitValue: fail"
^aStream nextPut: (Character digitValue: self) ].
halfPower := n bitShift: -1.
half := b raisedToInteger: halfPower.
+ qr := self digitDiv: half neg: self negative.
+ head := qr first normalize.
+ tail := qr last normalize.
- head := self quo: half.
- tail := self - (head * half).
head printOn: aStream base: b nDigits: n - halfPower.
tail printOn: aStream base: b nDigits: halfPower!
Item was added:
+ ----- Method: LargePositiveInteger>>recursiveDigitDiv: (in category 'private') -----
+ recursiveDigitDiv: anInteger
+ "This is the recursive division algorithm from Burnikel - Ziegler
+ See Fast Recursive Division - Christoph Burnikel, Joachim Ziegler
+ Research Report MPI-I-98-1-022, MPI Saarbrucken, Oct 1998
+ https://pure.mpg.de/rest/items/item_1819444_4/component/file_2599480/content"
+
+ | s m t a b z qr q i |
+ "round digits up to next power of 2"
+ s := anInteger digitLength.
+ m := 1 bitShift: (s - 1) highBit.
+ "shift so that leading bit of leading byte be 1, and digitLength power of two"
+ s := m * 8 - anInteger highBit.
+ a := self bitShift: s.
+ b := anInteger bitShift: s.
+
+ "Decompose a into t limbs - each limb have m bytes
+ choose t such that leading bit of leading limb of a be 0"
+ t := (a highBit + 1 / (m * 8)) ceiling.
+ z := a butLowestNDigits: t - 2 * m.
+ i := t - 2.
+ q := 0.
+ "and do a division of two limb by 1 limb b for each pair of limb of a"
+ [qr := z digitDiv21: b.
+ q := (q bitShift: 8*m) + qr first. "Note: this naive recomposition of q cost O(t^2) - it is possible in O(t log(t))"
+ (i := i - 1) >= 0] whileTrue:
+ [z := (qr last bitShift: 8*m) + (a copyDigitsFrom: i * m + 1 to: i + 1 * m)].
+ ^Array with: q with: (qr last bitShift: s negated)!
Item was changed:
----- Method: LargePositiveInteger>>sqrtRem (in category 'mathematical functions') -----
sqrtRem
"Like super, but use a divide and conquer method to perform this operation.
See Paul Zimmermann. Karatsuba Square Root. [Research Report] RR-3805, INRIA. 1999, pp.8. <inria-00072854>
https://hal.inria.fr/inria-00072854/PDF/RR-3805.pdf"
+ | n qr q s r sr high mid low |
- | n qr s r sr high mid low |
n := self digitLength bitShift: -2.
n >= 16 ifFalse: [^super sqrtRem].
high := self butLowestNDigits: n * 2.
mid := self copyDigitsFrom: n + 1 to: n * 2.
low := self lowestNDigits: n.
sr := high sqrtRem.
qr := (sr last bitShift: 8 * n) + mid digitDiv: (sr first bitShift: 1) neg: false.
+ q := qr first normalize.
+ s := (sr first bitShift: 8 * n) + q.
+ r := (qr last normalize bitShift: 8 * n) + low - q squared.
- s := (sr first bitShift: 8 * n) + qr first.
- r := (qr last bitShift: 8 * n) + low - qr first squared.
r negative
ifTrue:
[r := (s bitShift: 1) + r - 1.
s := s - 1].
sr at: 1 put: s; at: 2 put: r.
^sr
!
Item was changed:
----- Method: LargePositiveInteger>>squared (in category 'mathematical functions') -----
squared
"Eventually use a divide and conquer algorithm to perform the multiplication"
(self digitLength >= 400) ifFalse: [^self * self].
+ (self digitLength >= 800) ifFalse: [^self squaredKaratsuba].
+ ^self squaredToom4!
- (self digitLength >= 1600) ifFalse: [^self squaredKaratsuba].
- ^self squaredToom3!
Item was added:
+ ----- Method: LargePositiveInteger>>squaredToom4 (in category 'mathematical functions') -----
+ squaredToom4
+ "Use a 4-way Toom-Cook divide and conquer algorithm to perform the multiplication.
+ See Asymmetric Squaring Formulae Jaewook Chung and M. Anwar Hasan
+ https://www.lirmm.fr/arith18/papers/Chung-Squaring.pdf"
+
+ | p a0 a1 a2 a3 a02 a13 s0 s1 s2 s3 s4 s5 s6 t2 t3 |
+ "divide in 4 parts"
+ p := (self digitLength + 3 bitShift: -2) bitClear: 2r11.
+ a3 := self butLowestNDigits: p * 3.
+ a2 := self copyDigitsFrom: p * 2 + 1 to: p * 3.
+ a1 := self copyDigitsFrom: p + 1 to: p * 2.
+ a0 := self lowestNDigits: p.
+
+ "Toom-4 trick: 7 multiplications instead of 16"
+ a02 := a0 - a2.
+ a13 := a1 - a3.
+ s0 := a0 squared.
+ s1 := (a0 fastMultiply: a1) bitShift: 1.
+ s2 := (a02 + a13) fastMultiply: (a02 - a13).
+ s3 := ((a0 + a1) + (a2 + a3)) squared.
+ s4 := (a02 fastMultiply: a13) bitShift: 1.
+ s5 := (a3 fastMultiply: a2) bitShift: 1.
+ s6 := a3 squared.
+
+ "Interpolation"
+ t2 := s1 + s5.
+ t3 := (s2 + s3 + s4 bitShift: -1) - t2.
+ s3 := t2 - s4.
+ s4 := t3 - s0.
+ s2 := t3 - s2 - s6.
+
+ "Sum the parts of decomposition"
+ ^s0 + (s1 bitShift: 8*p) + (s2 + (s3 bitShift: 8*p) bitShift: 16*p)
+ +(s4 + (s5 bitShift: 8*p) + (s6 bitShift: 16*p) bitShift: 32*p)
+
+ "
+ | a |
+ a := 770 factorial-1.
+ a digitLength.
+ [a * a - a squaredToom4 = 0] assert.
+ [Smalltalk garbageCollect.
+ [1000 timesRepeat: [a squaredToom4]] timeToRun] value /
+ [Smalltalk garbageCollect.
+ [1000 timesRepeat: [a squaredKaratsuba]] timeToRun] value asFloat
+ "!
Nicolas Cellier uploaded a new version of Kernel to project The Trunk:
http://source.squeak.org/trunk/Kernel-nice.1222.mcz
==================== Summary ====================
Name: Kernel-nice.1222
Author: nice
Time: 28 April 2019, 3:55:49.124486 pm
UUID: e697844c-d6bd-4781-a572-553f79c98644
Ancestors: Kernel-nice.1221
Make fastMultiply: the default for multiplying integers
The scheme is:
- first try 64bit imul (primitive 29) if receiver is Large, or (primitive 9) if receiver is Small.
- then check for operand type in super *, use fastMultiply: if Integer
- then check receiver length and invoke O(N^2) schoolbook multiply if too small ( invoke LargeIntegers primitive thru digitMultiply:neg: )
- then check operand byte length
- then dispatch to Karatsuba or Toom3 (or future enhancements)
For medium integers (>64 bits) that's only 1 indirection more than previous implementation, so a very small (negligible) penalty.
For larger integers (a few thousand bits) that's a win.
Note that the length threshold heuristics may vary from one platform to another, one VM to another, and wether images are 32 or 64 bits...
We could make them class var, and offer some initialize method so as to automatically tune them.
=============== Diff against Kernel-nice.1221 ===============
Item was changed:
----- Method: Integer>>* (in category 'arithmetic') -----
* aNumber
"Refer to the comment in Number * "
aNumber isInteger ifTrue:
+ [^ self fastMultiply: aNumber].
- [^ self digitMultiply: aNumber
- neg: self negative ~~ aNumber negative].
^ aNumber adaptToInteger: self andSend: #*!
Item was changed:
+ ----- Method: Integer>>fastMultiply: (in category 'private') -----
- ----- Method: Integer>>fastMultiply: (in category 'arithmetic') -----
fastMultiply: anInteger
+ ^self digitMultiply: anInteger
+ neg: self negative ~~ anInteger negative!
- ^self * anInteger!
Item was changed:
+ ----- Method: Integer>>karatsubaTimes: (in category 'private') -----
- ----- Method: Integer>>karatsubaTimes: (in category 'arithmetic') -----
karatsubaTimes: anInteger
"Eventually use Karatsuba algorithm to perform the multiplication.
This is efficient only for large integers (see subclass).
By default, use regular multiplication."
+ ^ self digitMultiply: anInteger
+ neg: self negative ~~ anInteger negative!
- ^self * anInteger!
Item was changed:
+ ----- Method: Integer>>toom3Times: (in category 'private') -----
- ----- Method: Integer>>toom3Times: (in category 'arithmetic') -----
toom3Times: anInteger
+ ^ self digitMultiply: anInteger
+ neg: self negative ~~ anInteger negative!
- ^self * anInteger!
Item was changed:
----- Method: LargePositiveInteger>>digitDiv32: (in category 'private') -----
digitDiv32: anInteger
"This is part of the recursive division algorithm from Burnikel - Ziegler
Divide 3 limb (a2,a1,a0) by 2 limb (b1,b0).
Each limb is made of p bytes (8*p bits).
This step transforms the division problem into multiplication
It must use the fastMultiply: to be worth the overhead costs."
| a2 b1 d p q qr r |
p := anInteger digitLength + 1 bitShift: -1.
(a2 := self butLowestNDigits: 2*p)
< (b1 := anInteger butLowestNDigits: p)
ifTrue:
[qr := (self butLowestNDigits: p) digitDiv21: b1.
q := qr first.
r := qr last]
ifFalse:
[q := (1 bitShift: 8*p) - 1.
r := (self butLowestNDigits: p) - (b1 bitShift: 8*p) + b1].
+ d := q * (anInteger lowestNDigits: p).
- d := q fastMultiply: (anInteger lowestNDigits: p).
r := (self lowestNDigits: p) + (r bitShift: 8*p) - d.
[r < 0]
whileTrue:
[q := q - 1.
r := r + anInteger].
^Array with: q with: r
!
Item was changed:
+ ----- Method: LargePositiveInteger>>fastMultiply: (in category 'private') -----
- ----- Method: LargePositiveInteger>>fastMultiply: (in category 'arithmetic') -----
fastMultiply: anInteger
+ "Return the result of multiplying the receiver by the Integer argument.
+ This method dispatch to the fastest algorithm based on operands length."
- "Eventually use Karatsuba or 3-way Toom-Cook algorithm to perform the multiplication"
+ | length |
+ "Revert to schoolbook multiplication if short"
+ (length := self digitLength) < 600
+ ifTrue: [^ self digitMultiply: anInteger
+ neg: self negative ~~ anInteger negative].
+
+ "Arrange to have the receiver be the shorter and retry"
+ length > anInteger digitLength
- | xLen |
- "arrange to have the receiver be the shorter"
- (xLen := self digitLength) > anInteger digitLength
ifTrue: [^ anInteger fastMultiply: self ].
+ "Choose the fastest algorithm based on another heuristic"
+ length < 6000 ifTrue: [^self karatsubaTimes: anInteger].
- "If too short to be worth, fallback to naive algorithm"
- (xLen >= 600) ifFalse: [^self * anInteger].
- (xLen >= 6000) ifFalse: [^self karatsubaTimes: anInteger].
^self toom3Times: anInteger!
Item was changed:
+ ----- Method: LargePositiveInteger>>karatsubaTimes: (in category 'private') -----
- ----- Method: LargePositiveInteger>>karatsubaTimes: (in category 'arithmetic') -----
karatsubaTimes: anInteger
"Eventually use Karatsuba algorithm to perform the multiplication
The Karatsuba algorithm divide number in two smaller parts.
Then it uses tricks to perform only 3 multiplications.
See https://en.wikipedia.org/wiki/Karatsuba_algorithm"
| half xHigh xLow yHigh yLow low high mid xLen yLen |
+ "Revert to schoolbook multiplication if short"
+ (xLen := self digitLength) < 600
+ ifTrue: [^ self digitMultiply: anInteger
+ neg: self negative ~~ anInteger negative].
+
+ "Arrange to have the receiver be the shorter and retry"
+ xLen > (yLen := anInteger digitLength)
- "arrange to have the receiver be the shorter"
- (xLen := self digitLength) > (yLen := anInteger digitLength)
ifTrue: [^ anInteger karatsubaTimes: self ].
-
- "If too short to be worth, fallback to naive algorithm"
- (xLen >= 600) ifFalse: [^self * anInteger].
"Seek for integers of about the same length, else split the long one"
yLen >= (7 * xLen bitShift: -2)
ifTrue:
[^(0 to: yLen - 1 by: xLen + 1) digitShiftSum: [:yShift |
self karatsubaTimes:
(anInteger copyDigitsFrom: yShift + 1 to: yShift + 1 + xLen)]].
"At this point, lengths are similar, divide each integer in two halves"
half := (yLen + 1 bitShift: -1) bitClear: 2r11.
xLow := self lowestNDigits: half.
xHigh := self butLowestNDigits: half.
yLow := anInteger lowestNDigits: half.
yHigh := anInteger butLowestNDigits: half.
"Karatsuba trick: perform with 3 multiplications instead of 4"
low := xLow karatsubaTimes: yLow.
high := xHigh karatsubaTimes: yHigh.
mid := high + low + (xHigh - xLow karatsubaTimes: yLow - yHigh).
"Sum the parts of decomposition"
^low + (mid bitShiftMagnitude: 8*half) + (high bitShiftMagnitude: 16*half)
"
| a b |
a := 650 factorial-1.
b := 700 factorial-1.
{a digitLength. b digitLength}.
self assert: (a karatsubaTimes: b) - (a * b) = 0.
[Smalltalk garbageCollect.
[1000 timesRepeat: [a karatsubaTimes: b]] timeToRun] value /
[Smalltalk garbageCollect.
[1000 timesRepeat: [a * b]] timeToRun] value asFloat
"!
Item was changed:
----- Method: LargePositiveInteger>>squaredToom4 (in category 'mathematical functions') -----
squaredToom4
"Use a 4-way Toom-Cook divide and conquer algorithm to perform the multiplication.
See Asymmetric Squaring Formulae Jaewook Chung and M. Anwar Hasan
https://www.lirmm.fr/arith18/papers/Chung-Squaring.pdf"
| p a0 a1 a2 a3 a02 a13 s0 s1 s2 s3 s4 s5 s6 t2 t3 |
"divide in 4 parts"
p := (self digitLength + 3 bitShift: -2) bitClear: 2r11.
a3 := self butLowestNDigits: p * 3.
a2 := self copyDigitsFrom: p * 2 + 1 to: p * 3.
a1 := self copyDigitsFrom: p + 1 to: p * 2.
a0 := self lowestNDigits: p.
"Toom-4 trick: 7 multiplications instead of 16"
a02 := a0 - a2.
a13 := a1 - a3.
s0 := a0 squared.
+ s1 := (a0 * a1) bitShift: 1.
+ s2 := (a02 + a13) * (a02 - a13).
- s1 := (a0 fastMultiply: a1) bitShift: 1.
- s2 := (a02 + a13) fastMultiply: (a02 - a13).
s3 := ((a0 + a1) + (a2 + a3)) squared.
+ s4 := (a02 * a13) bitShift: 1.
+ s5 := (a3 * a2) bitShift: 1.
- s4 := (a02 fastMultiply: a13) bitShift: 1.
- s5 := (a3 fastMultiply: a2) bitShift: 1.
s6 := a3 squared.
"Interpolation"
t2 := s1 + s5.
t3 := (s2 + s3 + s4 bitShift: -1) - t2.
s3 := t2 - s4.
s4 := t3 - s0.
s2 := t3 - s2 - s6.
"Sum the parts of decomposition"
^s0 + (s1 bitShift: 8*p) + (s2 + (s3 bitShift: 8*p) bitShift: 16*p)
+(s4 + (s5 bitShift: 8*p) + (s6 bitShift: 16*p) bitShift: 32*p)
"
| a |
a := 770 factorial-1.
a digitLength.
[a * a - a squaredToom4 = 0] assert.
[Smalltalk garbageCollect.
[1000 timesRepeat: [a squaredToom4]] timeToRun] value /
[Smalltalk garbageCollect.
[1000 timesRepeat: [a squaredKaratsuba]] timeToRun] value asFloat
"!
Item was changed:
+ ----- Method: LargePositiveInteger>>toom3Times: (in category 'private') -----
- ----- Method: LargePositiveInteger>>toom3Times: (in category 'arithmetic') -----
toom3Times: anInteger
"Eventually use a variant of Toom-Cooke for performing multiplication.
Toom-Cooke is a generalization of Karatsuba divide and conquer algorithm.
It divides operands in n parts instead of 2.
See https://en.wikipedia.org/wiki/Toom%E2%80%93Cook_multiplication
Here we divide each operand in 3 parts, thus the name Toom-3.
And use a Bodrato-Zanoni variant for the choice of interpolation points and matrix inversion
See What about Toom-Cook matrices optimality? - Marco Bodrato, Alberto Zanoni - Oct. 2006
http://www.bodrato.it/papers/WhatAboutToomCookMatricesOptimality.pdf"
| third x2 x1 x0 y2 y1 y0 xLen yLen y20 z4 z3 z2 z1 z0 x20 |
+ "Revert to schoolbook multiplication if short"
+ (xLen := self digitLength) < 6000
+ ifTrue: [^ self karatsubaTimes: anInteger ].
- "arrange to have the receiver be the shorter"
- (xLen := self digitLength) > (yLen := anInteger digitLength)
- ifTrue: [^anInteger toom3Times: self ].
-
- "If too short to be worth, fallback to Karatsuba algorithm"
- (xLen > 6000) ifFalse: [^self karatsubaTimes: anInteger].
+ "Arrange to have the receiver be the shorter and retry"
+ xLen > (yLen := anInteger digitLength)
+ ifTrue: [^ anInteger toom3Times: self ].
+
"Seek for well balanced integers"
yLen > (5 * xLen bitShift: -2)
ifTrue: [^(0 to: yLen - 1 by: xLen + 1) digitShiftSum: [:yShift |
self toom3Times: (anInteger copyDigitsFrom: yShift + 1 to: yShift +1 + xLen)]].
"At this point, lengths are well balanced, divide in 3 parts"
third := yLen + 2 // 3 bitClear: 2r11.
x2 := self butLowestNDigits: third * 2.
x1 := self copyDigitsFrom: third + 1 to: third * 2.
x0 := self lowestNDigits: third.
y2 := anInteger butLowestNDigits: third * 2.
y1 := anInteger copyDigitsFrom: third + 1 to: third * 2.
y0 := anInteger lowestNDigits: third.
"Toom-3 trick: 5 multiplications instead of 9"
z0 := x0 toom3Times: y0.
z4 := x2 toom3Times: y2.
x20 := x2 + x0.
y20 := y2 + y0.
z1 := x20 + x1 toom3Times: y20 + y1.
x20 := x20 - x1.
y20 := y20 - y1.
z2 := x20 toom3Times: y20.
z3 := (x20 + x2 bitShift: 1) - x0 toom3Times: (y20 + y2 bitShift: 1) - y0.
"Sum the parts of decomposition"
z3 := z3 - z1 quo: 3.
z1 := z1 - z2 bitShift: -1.
z2 := z2 - z0.
z3 := (z2 - z3 bitShift: -1) + (z4 bitShift: 1).
z2 := z2 + z1 - z4.
z1 := z1 - z3.
^z0 + (z1 bitShift: 8*third) + (z2 bitShift: 16*third) + (z3 + (z4 bitShift: 8*third) bitShift: 24*third)
"
| a b |
a :=5000 factorial - 1.
b := 6000 factorial - 1.
a digitLength min: b digitLength.
a digitLength max: b digitLength.
(a toom3Times: b) = (a * b) ifFalse: [self error].
[Smalltalk garbageCollect.
[300 timesRepeat: [a toom3Times: b]] timeToRun] value /
[Smalltalk garbageCollect.
[300 timesRepeat: [a karatsubaTimes: b]] timeToRun] value asFloat
"!
I've been trying to update the NuScratch packages a bit and having 'fun'.
The first problem was simply trying to install the current version; the gpio package has two dependencies and one of them complained that it could not be found. That would be JSON->tonyg.37. Which was a moment's confusion because that is not listed on SM at all, where the versions are 39 a&b. Then I looked at the mcz file in a zip window and noticed that the json tonyg.37 package was included within the mcz, which added some more confusion because it's *right there* so how can it not be found?
After much puzzling I was able to change the dependency to the newer JSON->tony39b and get it uploaded to squeaksource. Anyone wanting a nice monticello project could do worse than improving the tools around those dependencies; we don't see relevant info anywhere except (so far as I could find) in the info window displayed just after saving a package. Nor does it seem we can removed just one dependency out of several.
I then needed to update the SM entry in order to load the updated gpio package. Unfortunately this failed to upload properly - eventually I got a timeout error, which locked up the image for a couple of minutes before finally returning control to me. It looks like the http://map.squeak.org/accountbyid/4340a66e-2296-48b7-9aa8-5305d303752f/file… file didn't get there - attemtping to fetch it results in a 'not here mate' error. Rather confusingly the SM tool does actually show the new version in the list, which isn't reflected in the reality of the web server - opening another image to check in a separate SM catalog shows the lack of the new version. I guess it would be nice to only update the SM tool if the upload is successful.
Two attempts have failed to achieve anything and I don't want to overload the server image if it is having problems. What next?
tim
--
tim Rowledge; tim(a)rowledge.org; http://www.rowledge.org/tim
Try not to let implementation details sneak into design documents.
Nicolas Cellier uploaded a new version of Kernel to project The Trunk:
http://source.squeak.org/trunk/Kernel-nice.1221.mcz
==================== Summary ====================
Name: Kernel-nice.1221
Author: nice
Time: 28 April 2019, 2:31:07.899491 am
UUID: 4aeee53e-f320-4ece-bf93-a5458f65bef3
Ancestors: Kernel-nice.1220
Fix broken pre-condition a3>=b/4 in sqrtRem.
While at it, care to explain.
=============== Diff against Kernel-nice.1220 ===============
Item was changed:
----- Method: LargePositiveInteger>>sqrtRem (in category 'mathematical functions') -----
sqrtRem
"Like super, but use a divide and conquer method to perform this operation.
See Paul Zimmermann. Karatsuba Square Root. [Research Report] RR-3805, INRIA. 1999, pp.8. <inria-00072854>
https://hal.inria.fr/inria-00072854/PDF/RR-3805.pdf"
+ | n qr q s r sr a3a2 a1 a0 |
+ "Split self in 4 digits a3,a2,a1,a0 in base b,
+ such that most significant digit a3 >= b/4
+ It is not a problem to have a3 >= b,
+ so we can round b down to a whole number of bytes n"
+ n := self highBit bitShift: -5. "bitShift: -2 divide in 4 parts, bitShift: -3 round down in bytes"
- | n qr q s r sr high mid low |
- n := self digitLength bitShift: -2.
n >= 16 ifFalse: [^super sqrtRem].
+ a3a2 := self butLowestNDigits: n * 2.
+ a1 := self copyDigitsFrom: n + 1 to: n * 2.
+ a0 := self lowestNDigits: n.
- high := self butLowestNDigits: n * 2.
- mid := self copyDigitsFrom: n + 1 to: n * 2.
- low := self lowestNDigits: n.
+ sr := a3a2 sqrtRem.
+ qr := (sr last bitShift: 8 * n) + a1 digitDiv: (sr first bitShift: 1) neg: false.
- sr := high sqrtRem.
- qr := (sr last bitShift: 8 * n) + mid digitDiv: (sr first bitShift: 1) neg: false.
q := qr first normalize.
s := (sr first bitShift: 8 * n) + q.
+ r := (qr last normalize bitShift: 8 * n) + a0 - q squared.
- r := (qr last normalize bitShift: 8 * n) + low - q squared.
r negative
ifTrue:
[r := (s bitShift: 1) + r - 1.
s := s - 1].
sr at: 1 put: s; at: 2 put: r.
^sr
!