I've been involved in a long and sometimes weird discussion on the RaspberryPi forums about Fibonacci implementations in different languages and using assorted algorithms. I'd never really thought about alternatives to recursive approaches before, but it turns out there is a massively faster way. Clearly, this is not to replace the benchFib code since that has a specific use, but if anyone with an interest in numerics wants a faster way to generate fib(8577) for example, this is how -
Integer >> fastDoublingFib
	"Derived from https://www.nayuki.io/page/fast-fibonacci-algorithms"
	"(1 to: 20) collect: [:i | i fastDoublingFib]"
	"Testing a quite large one:
	8577 fastDoublingFib = 13724780954457889052017147206983806244049002655849289934662148526555403650297300643609932653364630032094175733360509578504423049114004867523161091564824674600978308740378989479162611031273424686573759784740689516901416473328655422390895971263265867635819510949686102571388751758998017349379498918930220900180038324615253426530740473855269056304380498630108126734047701362218655030270360608989081398866489746698916626888016696892691933237089180504631788915650757757515944644732966345269202761471025651071790297611079540881155092137592980230998234868586211881093892491570520577408961869977892273540956424095750855208529072246641778982103984467921868950012668004047986803017482248992968482737462668300684879633714025755790485860328854796518843956263863014632532331145699658530054942590047273403691531821918862996422405159427262092477196755988981309029424760342802374213122162727840557722145891090413688461745240415668189577836068480363407847582529735341950500636735281963089675493707159434777756081146452522323681782226760627277553296721358921412115264845467834979154061137421532609247762981818564578019888974692581079593575783553856910367568474613323528337733872069223030834774749130478360574004172522316484339530942110067893000847800932306298725285623628731149337468217751734165148732164148285915275115006479682658150442259002271790547596033006363411193653337536041106069912826015502035140618407668385378737477702597473151509972754111640855347958033314453349633268543893894677097708945041254623018915871109789412793709229204261914803477697183287924195770678873001065036313926288444791424871512110658175954743584548831946767673488152740675550518235698898217693311515366329280005757014637854214769152690638778904780724293185353992279724740604674926819294787586671833537117545443846365508358918882"
	| a b c |
	a := 0.
	b := 1.
	self highBit to: 1 by: -1 do: [:i | | d e |
		d := (b bitShift: 1) - a * a.
		e := a squared + b squared.
		a := d.
		b := e.
		(self bitAt: i) = 1 ifTrue: [c := a + b. a := b. b := c]].
	^a

I bet it could be sped up more, but a simple spyOn run just allocated all the primitive time to delay waiting. Didn't we improve that some time ago?
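For anyone who wants to cross-check the algorithm outside Squeak, here is a rough Python transcription of the same fast-doubling scheme (the function name is mine; Python's built-in bignums stand in for LargeInteger):

```python
def fast_doubling_fib(n: int) -> int:
    """Fibonacci via the fast-doubling identities:
        F(2k)   = F(k) * (2*F(k+1) - F(k))
        F(2k+1) = F(k)^2 + F(k+1)^2
    Walks the bits of n from the most significant down,
    exactly like the highBit-to-1 loop in the Smalltalk version."""
    a, b = 0, 1                      # invariant: a = F(k), b = F(k+1), k = bits consumed so far
    for i in range(n.bit_length() - 1, -1, -1):
        d = a * (2 * b - a)          # F(2k)
        e = a * a + b * b            # F(2k+1)
        a, b = d, e
        if (n >> i) & 1:             # bit set: advance k by one more step
            a, b = b, a + b
    return a

# sanity check against the first few Fibonacci numbers
assert [fast_doubling_fib(i) for i in range(1, 11)] == [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```

Note that `(b bitShift: 1) - a * a` in the Smalltalk parses left to right as `((b*2) - a) * a`, which is the same F(2k) term as above.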
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Strange OpCodes: BYEBYE: Store in Write-Only Storage
On Tue, 16 Apr 2019, tim Rowledge wrote:
I've been involved in a long and sometimes weird discussion on the RaspberryPi forums about Fibonacci implementations in different languages...
Integer >> fastDoublingFib
[snip]
That's some pretty nifty code. Other than the typical tricks, you can't make it any quicker for larger receivers. What are these typical tricks?
- inlining methods
- decreasing the number of temporaries needed
- reordering temporaries so that the more frequently used ones use registers
- declaring all temporaries at the method level so that the compiler can skip a few bytecodes setting their values to nil in each loop
But as the receiver gets larger, the LargeInteger primitives will dominate the run time more and more, so the tricks will have negligible effect on performance.
And yes, we do have a better profiler: AndreasSystemProfiler. It hasn't been integrated into Squeak yet for some reason, but it's available on SqueakMap.
Levente
On 2019-04-17, at 4:26 AM, Levente Uzonyi leves@caesar.elte.hu wrote:
On Tue, 16 Apr 2019, tim Rowledge wrote:
I've been involved in a long and sometimes weird discussion on the RaspberryPi forums about Fibonacci implementations in different languages...
Integer >> fastDoublingFib
[snip]
That's some pretty nifty code. Other than the typical tricks, you can't make it any quicker for larger receivers. What are these typical tricks?
- inlining methods
- decreasing the number of temporaries needed
- reordering temporaries so that the more frequently used ones use registers
- declare all temporaries at the method level so that the compiler can skip a few bytecodes setting their values to nil in each loop
But as the receiver gets larger, the LargeInteger primitives will dominate the run time more and more, so the tricks will have negligible effect on performance.
A spyOn of the amusing fib(4784969) (find the first fib which has more than a million digits, apparently) showed 99.8% in primitives, within the limits of the spyOn code.
There is also "Karatsuba fast multiplication" on that same site, which is apparently faster for biiiiiig numbers. There is even example code in Python and Java. Who feels like trying it out?
I'll note that a C implementation of the fast-doubling-fib-with-karatsuba code is on github at https://github.com/ZiCog/fibo_4784969/tree/master/c and declares a runtime of 63 seconds on a Pi 3. I got 72 on Squeak on my Pi 3 (without karatsuba multiply). So pretty good so far.
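For a flavour of what that example code does, here is a hedged Karatsuba sketch in Python (the 512-bit cutoff is an arbitrary illustration, not a tuned threshold): split each operand in two and replace four half-size multiplies with three.

```python
def karatsuba(x: int, y: int, cutoff_bits: int = 512) -> int:
    """Karatsuba multiplication: with x = x1*2^m + x0 and y = y1*2^m + y0,
        x*y = x1*y1*2^(2m) + ((x1+x0)*(y1+y0) - x1*y1 - x0*y0)*2^m + x0*y0,
    i.e. three half-size products instead of four."""
    if x < 0 or y < 0:
        raise ValueError("non-negative operands only in this sketch")
    if x.bit_length() <= cutoff_bits or y.bit_length() <= cutoff_bits:
        return x * y                       # below the threshold, plain multiply wins
    m = max(x.bit_length(), y.bit_length()) // 2
    x1, x0 = x >> m, x & ((1 << m) - 1)    # split around bit m
    y1, y0 = y >> m, y & ((1 << m) - 1)
    z2 = karatsuba(x1, y1)
    z0 = karatsuba(x0, y0)
    z1 = karatsuba(x1 + x0, y1 + y0) - z2 - z0
    return (z2 << (2 * m)) + (z1 << m) + z0
```

CPython's own bignum multiply already switches to Karatsuba above a small size, so this is for illustration rather than speed; in Squeak the win comes from beating the schoolbook LargeInteger primitive.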
And yes, we do have a better profiler: AndreasSystemProfiler. It hasn't been integrated into Squeak yet for some reason, but it's available on SqueakMap.
Perhaps I'll try it out just for fun.
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Useful random insult:- Wise as the world is flat.
Hi Tim, I have published some Karatsuba implementations in Squeak over the years; it's quite easy. For example http://smallissimo.blogspot.com/2011/07/optimizing-product-of-prime-powers.h...
Andres Valloud did provide some in VW too, then probably in Cuis too...
On Wed, 17 Apr 2019 at 20:56, tim Rowledge tim@rowledge.org wrote:
[snip]
Let me see... For example here in 2011:
MCHttpRepository location: 'http://www.squeaksource.com/FactorialContest' user: '' password: ''.
I use Karatsuba a lot for ArbitraryPrecisionFloat... But there is something even more efficient for monstrous integers, Schoenhage-Strassen, which uses a kind of Fourier transform, but it is harder to program... And I think that I recently heard that an algorithm approaching the O(log N) barrier has been published...
On Wed, 17 Apr 2019 at 21:28, Nicolas Cellier <nicolas.cellier.aka.nice@gmail.com> wrote:
[snip]
On Wed, 17 Apr 2019 at 21:38, Nicolas Cellier <nicolas.cellier.aka.nice@gmail.com> wrote:
And I think that I recently heard that an algorithm approaching the O(log N) barrier has been published...
O(N log N) of course...
[snip]
On 2019-04-17, at 12:38 PM, Nicolas Cellier nicolas.cellier.aka.nice@gmail.com wrote:
Let me see... For example here in 2011:
MCHttpRepository location: 'http://www.squeaksource.com/FactorialContest' user: '' password: ''.
Nice, nice!
That does fib(4784969) in 14 seconds on my Pi3, or 5 times faster. Wow. And, er, 4 times faster than the C code version on that github repo :-)
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim "How many Slavers does it take to change a lightbulb?” "Dunno. How susceptible are lightbulbs to telepathy?"
And just because it's fun, here's a small cs file that can be run from a command line (as in `squeak my.image bigFib.cs`) to write the time taken to stdout -
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Science adjusts its views based on what is observed. Faith denies observation so belief can be preserved
I'm being told that the .cs file I posted doesn't work on a Windows VM because of an apparent inability to write to stdout on Windows - is this correct? *Should* it work, and did it just get broken?
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Strange OpCodes: KFP: Kindle Fire in Printer
Hi Tim, I've posted some enhancements with a 3-way Toom-Cook multiplication algorithm (it is a generalization of Karatsuba, but we divide into n parts instead of 2; here, 3 parts for Toom-3).
Use fastMultiply: instead of karatsubaTimes:. Maybe I'll post more enhancements if time permits...
On Fri, 19 Apr 2019 at 22:41, tim Rowledge tim@rowledge.org wrote:
I'm being told that the .cs file I posted doesn't work on a Windows VM because of an apparent inability to write to stdout on Windows - is this correct? *Should* it work, and did it just get broken?
On 2019-04-24, at 1:29 PM, Nicolas Cellier nicolas.cellier.aka.nice@gmail.com wrote:
Hi Tim, I've posted some enhancements with a 3-way Toom-Cook multiplication algorithm (it is a generalization of karatsuba, but we divide in n parts instead of 2, here 3 parts for Toom-3)
Wow. That's nearly twice as fast for the fib(4784969) - 8.4 sec. Amazing. So about 10x over the original. Now *printing* the number is rather slower and if you have any magic code to improve that it would be interesting.
I would *strongly* support putting the algorithm improvements into trunk. Very little code for colossal speedup and a really interesting exemplar of advanced algorithms.
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Strange OpCodes: ESBD: Erase System and Burn Documentation
On Fri, Apr 26, 2019 at 4:02 AM tim Rowledge tim@rowledge.org wrote:
[snip]
Wow. That's nearly twice as fast for the fib(4784969) - 8.4 sec. Amazing. So about 10x over the original. Now *printing* the number is rather slower and if you have any magic code to improve that it would be interesting.
I would *strongly* support putting the algorithm improvements into trunk. Very little code for colossal speedup and a really interesting exemplar of advanced algorithms.
+1
On Fri, 26 Apr 2019 at 13:55, Fabio Niephaus lists@fniephaus.com wrote:
[snip]
Changes are in the inbox now; they should have associated tests.
For the printing, the cost is dominated by division, and the division cost is dominated by multiplication. We could opt for a divide-and-conquer algorithm there too, or mixed Barrett/Newton-Raphson, which is the most efficient known algorithm. See for example http://bioinfo.ict.ac.cn/~dbu/AlgorithmCourses/Lectures/Lec5-Fast-Division-H...
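The divide-and-conquer printing idea is easy to sketch too (a Python illustration, not the code from the inbox; the helper name and the 10^18 base case are made up): instead of peeling off one digit at a time, split the number around a power of 10 near the middle, so each recursion level halves the digit count and fast multiplication carries the divmod.

```python
def fast_print_decimal(n: int) -> str:
    """Divide-and-conquer decimal conversion: split n by 10^k with k about
    half the digit count, so the operand size halves at each level.
    Python's builtin divmod plays the role of the fast division here."""
    if n < 0:
        return "-" + fast_print_decimal(-n)
    if n < 10 ** 18:                       # small enough: let the builtin finish
        return str(n)
    # rough decimal digit count: bits * log10(2), halved for the split point
    k = max(1, int(n.bit_length() * 0.30103) // 2)
    hi, lo = divmod(n, 10 ** k)
    # the low half must be zero-padded back to exactly k digits
    return fast_print_decimal(hi) + fast_print_decimal(lo).rjust(k, "0")
```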
Another interesting POV is that most of these divide-and-conquer algorithms lead to easy parallelisation... We could spawn image clones above a certain level and communicate the results via sockets. The printing, though, would require sharing the target String memory, or we could end up with an O(n*log(n)) reconstruction cost to gather the pieces.
tim
tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Strange OpCodes: ESBD: Erase System and Burn Documentation
On Fri, 26 Apr 2019 at 18:10, Nicolas Cellier <nicolas.cellier.aka.nice@gmail.com> wrote:
[snip]
Just ask, it's in the inbox now :)
On 2019-04-27, at 1:43 AM, Nicolas Cellier nicolas.cellier.aka.nice@gmail.com wrote:
Just ask, it's in the inbox now :)
Good grief. That's some interesting code.
BUT - it isn't in the inbox so far as I can see; the newest Kernel mcz is http://source.squeak.org/inbox/Kernel-nice.1216.mcz ... oh, wait, I see you've put an even newer version in trunk. Phew, I thought you might have been suffering from uploading problems like my NuScratch stuff.
I merged (because I had my fastfib method already) your Kernel-nice.1220 and tried out [4784969 fastDoublingFib asString] timeToRun. It crashed because there is no Interval>>digitShiftSum: method, which needs to go into Collections, I guess.
Previously: [4784969 fastDoublingFib asString] timeToRun -> 136833 ms (on my Pi3+).
Now: [4784969 fastDoublingFib asString] timeToRun -> 21060 ms.
Call it 6.5X faster.
Assuming we are all confident about the correctness I'd urge making that the default multiply for large ints. Unless somebody is going to improve the plugin to assist in making this even faster?
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Useful random insult:- Talks to plants on their own level.
On Sat, 27 Apr 2019 at 20:26, tim Rowledge tim@rowledge.org wrote:
[snip]
I merged (because I had my fastfib method already) your Kernel-nice.1220 and tried out [4784969 fastDoublingFib asString ] timeToRun . It crashed because of no Interval>>digitShiftSum: method, which needs to go into Collections I guess.
Oops! It's in Collection, I forgot to commit it!
Assuming we are all confident about the correctness I'd urge making that the default multiply for large ints. Unless somebody is going to improve the plugin to assist in making this even faster?
The problem is that fastMultiply: just adds overhead for every moderately sized LargeInteger, which is the most common case, so I hesitate to make it the default.
Implementing divide and conquer in the primitives is also not a pleasant path: divide and conquer often trades speed for space, so it means that we must massively allocate new objects in primitives, which is a difficult task. Or reuse some preallocated memory, which gives a horrible taste to these beautiful and simple algorithms. It also hides those algorithms from our view, makes debugging not a pleasure, and makes improvement a less enjoyable hobby.
What we could do instead is to make some primitives fail when digitLength is above a certain threshold. That threshold would be a parameter controlled from the image, and a default value of 0 at VM startup could mean no primitive failure, making the VM friendly to those images that are not Karatsuba-aware...
On Sat, 27 Apr 2019 at 22:49, Nicolas Cellier <nicolas.cellier.aka.nice@gmail.com> wrote:
Le sam. 27 avr. 2019 à 20:26, tim Rowledge tim@rowledge.org a écrit :
On 2019-04-27, at 1:43 AM, Nicolas Cellier <
nicolas.cellier.aka.nice@gmail.com> wrote:
Just ask, it's in the inbox now :)
Good grief. That's some interesting code.
BUT - it isn't in the inbox so far as I can see; the newest Kernel mcz is http://source.squeak.org/inbox/Kernel-nice.1216.mcz ... oh, wait, I see you've put an even newer version in trunk. Phew, I thought you might have been suffering from uploading problems like my NuScratch stuff.
I merged (because I had my fastfib method already) your Kernel-nice.1220 and tried out [4784969 fastDoublingFib asString ] timeToRun . It crashed because of no Interval>>digitShiftSum: method, which needs to go into Collections I guess.
Oups! It's in Collection, I forgot to commit it!
Previously
[4784969 fastDoublingFib asString ] timeToRun -> 136833mS (on my Pi3+) Now [4784969 fastDoublingFib asString ] timeToRun -> 21060mS Call 6.5X faster.
Assuming we are all confident about the correctness I'd urge making that the default multiply for large ints. Unless somebody is going to improve the plugin to assist in making this even faster?
tim
Hmm, after thoughts and experiments, I retract what I said. The overhead cost in Squeak is very marginal, at least on the 64-bit VM, so we can make #fastMultiply: the default. The last time I tried was in VW, where the arrangement of primitives is a bit different.
Seuls les imbéciles ne changent pas d'avis ;) (Only fools never change their minds.)
Java's BigInteger applies different multiplication algorithms depending on the values to multiply: https://github.com/openjdk/jdk/blob/4ebb1e2702e7d22f86eacef3b4a5204df70416ab...
Maybe we could do something similar?
Fabio
By sheer fluke I stumbled on an article that may be of interest here; apparently the 'ultimate multiplication' technique has now been found, building on the work of Karatsuba, Schönhage and Strassen, and lately Harvey & Van Der Hoeven.
Strange stuff but apparently achieving n log(n) steps for n digit multiplies. https://www.quantamagazine.org/mathematicians-discover-the-perfect-way-to-mu... and the original paper at https://hal.archives-ouvertes.fr/hal-02070778/document
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim "bOtHeR" said Pooh, mistaking the LSD tablet for aspirin
Hi Tim, That's what I wrote 16 mails above in the thread :)
On 2019-05-02, at 10:21 PM, Nicolas Cellier nicolas.cellier.aka.nice@gmail.com wrote:
Hi Tim, That's what I wrote 16 mails above in the thread :)
I knew that, of course I did, absolutely, how could you possibly doubt me?
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Useful random insult:- So dumb, he faxes face up.
A small but valuable improvement in the fibonacci algorithm, blatantly stolen from a python version by gkreidel on the Pi forums, seems to be ~25% faster for 4784969 fibonacci
!Integer methodsFor: 'mathematical functions' stamp: 'tpr 5/6/2019 12:22'!
fibonacci
	"derived from https://www.nayuki.io/page/fast-fibonacci-algorithms"
	"4784969 fibonacci"
	| a b c |
	self <= 0 ifTrue: [^0]. "in case somebody tries an inappropriate number"
	a := 0.
	b := 1.
	self highBit to: 2 by: -1 do: [:i | | d e |
		d := ((b bitShift: 1) - a) * a.
		e := a squared + b squared.
		a := d.
		b := e.
		(self bitAt: i) = 1 ifTrue: [
			c := a + b.
			a := b.
			b := c]].
	^self odd
		ifTrue: [a squared + b squared]
		ifFalse: [((b bitShift: 1) - a) * a]! !
About 75% of time goes on the #squared and 8% on WeakArray finalization.
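For readers who want to try this outside Squeak, the same fast-doubling recurrences (F(2k) = F(k)·(2·F(k+1) − F(k)) and F(2k+1) = F(k)² + F(k+1)²) can be sketched in Python; this is a recursive rendering of the nayuki.io algorithm, not a transcription of the method above:

```python
def fib(n):
    """Fast-doubling Fibonacci: returns F(n) for n >= 0."""
    def pair(k):
        # Returns the pair (F(k), F(k+1)).
        if k == 0:
            return (0, 1)
        a, b = pair(k >> 1)          # F(m), F(m+1) with m = k // 2
        c = a * ((b << 1) - a)       # F(2m)   = F(m) * (2*F(m+1) - F(m))
        d = a * a + b * b            # F(2m+1) = F(m)^2 + F(m+1)^2
        if k & 1:
            return (d, c + d)        # k odd:  (F(2m+1), F(2m+2))
        return (c, d)                # k even: (F(2m), F(2m+1))
    return pair(n)[0]
```

For example, fib(20) answers 6765. The cost is dominated by O(log n) big-number multiplies and squares, which matches the observation that most of the Squeak time goes on #squared.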
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Useful random insult:- A room temperature IQ.
Remember the fun we had calculating 4784969 fibonacci? There's an interesting article in the Comm.ACM this month about the maybe-final improvement in huge number multiplication; https://cacm.acm.org/magazines/2020/1/241707-multiplication-hits-the-speed-l...
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Strange OpCodes: RIW: Re-Invent Wheel
On Sun, Jan 26, 2020 at 01:59:21PM -0800, tim Rowledge wrote:
Very interesting stuff. Probably not ready for a practical implementation in Squeak though. The performance benefits come into play only for very large numbers. The ACM article says:
The n log(n) bound means Harvey and van der Hoeven's algorithm is faster than Schönhage and Strassen's algorithm, or Fürer's algorithm, or any other known multiplication algorithm, provided n is sufficiently large. For now, "sufficiently large" means almost unfathomably large: Harvey and van der Hoeven's algorithm doesn't even kick in until the number of bits in the two numbers being multiplied is greater than 2 raised to the 172912 power. (By comparison, the number of particles in the observable Universe is commonly put at about 2^270.)
Checking the size of that boundary in Squeak:
(2 raisedTo: 172912) digitLength. "==> 21615"
(2 raisedTo: 172912) asFloat. "==> Infinity"
I don't know nothin' about nothin' but even I know that 'Infinity' is a very big number ;-)
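For comparison, the same boundary check can be sketched in Python, where bit_length() plays a role like Squeak's highBit and digitLength counts 8-bit bytes (Python raises OverflowError where Squeak answers Infinity):

```python
# The boundary value from the check above: 2 raised to the 172912 power.
n = 2 ** 172912

print(n.bit_length())              # 172913 bits
print((n.bit_length() + 7) // 8)   # 21615 bytes, matching Squeak's digitLength

try:
    float(n)                       # Squeak's asFloat answers Infinity here;
except OverflowError:              # CPython refuses instead.
    print("too large for a float")
```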
But maybe that is missing the point. After all, we actually do a very good job of implementing large integer arithmetic in Squeak. Well OK, it is not exactly "we", Nicolas is the one who did most of it.
Indeed, Squeak has no problem at all dealing with a ridiculous expression such as this:
(((2 raisedTo: 172912) cubed cubed cubed + 13 / (2 raisedTo: 172912) * Float pi asScaledDecimal) asInteger) cubed cubed digitLength.
" ==> 5057678 "
That's an integer with about 5 MB of internal representation. So maybe there is one more LargePositiveInteger>>digitMulXXXX method waiting to be created?
Dave
That article was an interesting read. One thing I've always been curious about: what applications are people using the arbitrary precision stuff in Squeak for? (I generally make an effort to avoid straying into large integers due to the performance implications, but the things I'm working on don't require the precision they offer.)
On Mon, Jan 27, 2020 at 8:28 PM David T. Lewis lewis@mail.msen.com wrote:
Hi Phil,
On Mon, Jan 27, 2020 at 6:07 PM Phil B pbpublist@gmail.com wrote:
That article was an interesting read. One thing I've always been curious about: what applications are people using the arbitrary precision stuff in Squeak for? (I generally make an effort to avoid straying into large integers due to the performance implications, but the things I'm working on don't require the precision they offer.)
One thing I used it for was in implementing 64-bit Spur VM above the 32-bit Spur implementation.
Hi Phil,
It is essential. Without it, it is very difficult to have correctly rounded conversion float -> ascii decimal -> float. It may also help to implement correctly rounded float functions; GNU libc uses gmp. Sometimes you don't know the accuracy of a float computation; there are several easy strategies, like changing the rounding direction, or doubling the precision, and checking the interval you obtain, and it's far easier to implement multi-precision with integers than with floats (though possible). Cryptography relies heavily on large integer arithmetic. Symbolic computer algebra also relies on it (the p-adic factorization of polynomials may rapidly lead to huge integers).
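Nicolas's first point can be illustrated with a small Python sketch: the shortest round-tripping decimal representation of a float is computed with exact integer arithmetic under the hood, and the exact binary value of a float is itself a ratio of (potentially large) integers:

```python
from fractions import Fraction

x = 0.1

# Correctly rounded float -> decimal -> float: the shortest repr
# round-trips exactly, which requires exact big-integer arithmetic
# in the conversion routines.
assert float(repr(x)) == x

# The exact binary value stored for 0.1 is a dyadic rational:
exact = Fraction(x)
print(exact)  # 3602879701896397/36028797018963968, i.e. m / 2^55
```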
Several actual things (as opposed to some really stupid whinging from a couple of people on the Pi forums) have come up in pursuit of faster fibonacci finding.
First up, still a question -
On 2019-04-19, at 1:41 PM, tim Rowledge tim@rowledge.org wrote:
I'm being told that the .cs file I posted doesn't work on a Windows VM because of an apparent inability to write to stdout on Windows - is this correct? *Should* it work and it just got broken?
I looked at the VM code and it appears to be there to make stdout etc. for Windows work. I still can't use it on Win10. Then again, just what *does* work properly on Win10? I see lots of complaints about pretty much everything.
Running stuff from a commandline.
Conceptually it ought to be practical to do - `squeak my.image myCode.cs` ... and it sort of is. The good news is that the readDocumentAtStartup related code mostly gets things right and even handles the file(s) being zipped or on an http server somewhere. Cool! However, it seems that the stdout related initialisation in FileStream gets run *after* the readDocumentAtStartup code - which means any code wanting to write to stdout is out of luck unless the file handles just happen to be already correct from a previous start/save cycle. Clearly this is unlikely when starting a clean image, especially from a freshly downloaded all-in-one package.
Looking at FileStream class initialise I see Smalltalk addToStartUpList: self after: SecurityManager; "the intent being before: AutoStart" and yet inspecting the startuplist shows that FileStream comes well *after* AutoStart (which in a somewhat convoluted way actually makes ProjectLauncher actually do the startup document reading) and so we got things messed up sometime.
Also, it would be nice for there to be an accessible way to have the system read the file and execute it and never open the main window. I have whined about this before to usually uninterested ears, but the main window should not open until some code actually pushes pixels to the glass. That's largely a simple (ish) VM change, to the best of my knowledge. It's always been that way on RISC OS, so it clearly can work. There is, of course, also the possibility that some class in the StartUpList would trigger this before we get to ProjectLauncher>>#startUpAfterLogin. VisualWorks can do this, so we should be able to.
Default file directories, particularly with all-in-one
This is always a tricky one to make consistent; we always know what we *want* it to be but working it out can be fun. The general algorithm we use is 'the directory the image was in', but most OSs have their own idea of what a current working directory is, and users often want to work with that instead. I got trapped by this in the fastfib stuff because the all-in-one package has the image in what is most definitely not the CWD - unless you do actually cd to the image's directory. Trying to explain this to people that are doing their best to make Smalltalk look bad is ... tricky. As an example, consider using the all-in-one after cd-ing to the Squeak-x.x-All-in-One directory as suggested. Run the vm via the squeak.sh on the image in Contents/Resources - works ok. Now try including the myCode.cs from above; if it is in the cwd it will not be found, because the default dir is actually seen as Squeak-x.x-All-in-One/Contents/Resources, and you have a puzzled user. Even more fun happens if myCode.cs is 'correctly' referred to with an absolute path but tries to write an output file (because we couldn't write to stdout under Win10, ibid.)
Screen-centred dialogues and the self-centred progressbar
Turns out that an annoyingly common case is for a dialogue to try to pop up - to ask what to do about some error? - whilst the progressbar is running. At the same centre of the screen. Where the progressbar insists on being on top and hiding the dialogue. Preventing anyone from seeing that there is a dialogue to respond to. Irritating all and sundry.
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Useful random insult:- Thinks E=MC^2 is a rap star.
On 2019-04-25, at 7:40 PM, tim Rowledge tim@rowledge.org wrote:
Looking at FileStream class initialise I see Smalltalk addToStartUpList: self after: SecurityManager; "the intent being before: AutoStart" and yet inspecting the startuplist shows that FileStream comes well *after* AutoStart (which in a somewhat convoluted way actually makes ProjectLauncher actually do the startup document reading) and so we got things messed up sometime.
It seems all we need is to run FileStream initialise to fix this, at least in a current image.
I'd guess that making sure the order of things in the startup list is correct is something the ReleaseManager ought to be able to do. I would have guessed that it might use SmalltalkImage class>>#initializeStartUpList but can't see a connection in the code right now.
Mind you, that method has a fairly big bug in that it ignores the ordering desires of all the classes involved, which may explain the original problem. We can't simplistically run through the list of classes currently in the startup list and send #initialize to them since any number of them might do much more stuff in those methods, including stuff that could be destructive. We would need to separate out the startup list related code and only run that.
So conceptually:
a) in all senders of #addToStartUpList* and related methods, split out the startup-list part into an agreed method such as #addToStartUpSequence that handles the ordering requirements, if any. A default null version in Object would be nice.
b) to rebuild the startup list, take a copy of the current list and send each item #addToStartUpSequence.
c) some classes may need to be checked out for where they need to be in the ordering - I see that ProcessorScheduler is mentioned particularly in SmalltalkImage class>>#initializeStartUpList and yet its #initialize makes no reference to #addToStartUpList*.
d) use the rebuilt method from ReleaseBuilder.
Ideas?
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Useful random insult:- Lights are on but nobody's home.
Hi Tim,
command-line scripts are executed *after* all other start-up calls were made using a deferred UI message. There is no need to fiddle around with the order in the start-up list. See:
SmalltalkImage >> #processStartUpList:
AutoStart class >> #startUp: (last line)
ProjectLauncher >> #startUp
ProjectLauncher >> #startUpAfterLogin
Best, Marcel
Hi marcel -
On 2019-04-28, at 11:40 PM, Marcel Taeumel marcel.taeumel@hpi.de wrote:
command-line scripts are executed *after* all other start-up calls were made using a deferred UI message. There is no need to fiddle around with the order in the start-up list. See:
SmalltalkImage >> #processStartUpList: AutoStart class >> #startUp: (last line) ProjectLauncher >> #startUp ProjectLauncher >> #startUpAfterLogin
I see that; but then we have to work out why the stdin/out/err filehandles need (re)initialising. FileStream class>>startUp: (true) is supposed to do it as part of the normal startup sequence but I had to add it at the beginning of my doit-file. Not to mention why does it seem to not work at all on Windows? And also, there is still the point that things are not in the requested/required order in the StartUp list.
As an extra excitement, how shall we allow for having startup files that want to avoid the UI starting up - say, to support the idea of using Squeak for quick jobs as part of scripting? I'm not at all sure what a good answer would be here; at first thought it seems we'd either need two ways to specify the files on the commandline (say, -earlyfilein & -uifilein, which are both horrible and should not be used!) or some tags in the files to say where sections get evaluated - perhaps [preUI] & [postUI].
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Useful random insult:- Fell out of the family tree.
On 29 Apr 2019, at 23:14, tim Rowledge wrote:
This was the functionality of my InstallSeries code.
The idea was to aim to load a project as a series of file ins, from a hierarchy of directories whose structure would indicate the priority, such as #preUI, "I am the UI" and #postUI.
The file-out scheme would create InstallSeries compatible changesets according to specific slice definitions, such as "methods in the image categorised as (startup-shutdown)". Thus a slice definition could be applied to more than one image or fork, for import or export.
The aim was to patch each fork (Pharo, Squeak, Cuis et al.) at an early stage in the bootstrap so that subsequent slices of functionality could be held in common. I think I got as far as loading Seaside in Cuis.
Keith
Hi Tim,
a "GUI" that would represent a "headless" Squeak (instead of Morphic) could (or should?) also support a main loop with deferred messages. See SqueakShell. :-)
Best, Marcel
Years ago I was trying to rationalise all the various functionality of Smalltalk/SmalltalkImage across forks. I also wanted to support a scheme that would handle being bootstrapped from sources.
i.e. the startup process may load or unload a number of classes from sources at an early stage, and when the time comes to think about #startUp they may not have been #initialised (the #startUp might be their initialisation).
So I vaguely recollect that when I had a go at re-doing the startuplist I removed this responsibility to a class of its own, called StartupManager, and removed the class var that holds the list in favour of a voting system.
I found some code hiding away (somewhat obfuscated)
https://bazaar.launchpad.net/~keithy/cuis/System-Support_StartupManager/revi...
Looking through the comments I find
StartupManager
New simple startUp/shutDown protocol.
Classes implement #startUpPriority and #shutDownPriority to register. Note: this code could be hosted on a different class, such as SmalltalkImage.
The startUp and shutDown lists are compiled by looking at all implementors of #startUpPriority and #shutDownPriority. Priorities are: #first #earliest #earlier #early #normal #late #later #latest #last
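Under this scheme a class opts in simply by implementing the two priority methods on its class side. A minimal sketch (MyService is a hypothetical class, not part of the code found):

	MyService class >> startUpPriority
		"Register this class early in the startup sequence"
		^ #early

	MyService class >> shutDownPriority
		"Shut down late, after anything that depends on us has stopped"
		^ #late

No explicit registration call is needed; the manager discovers implementors when it builds the lists.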
As you can imagine this was very much a first attempt at resolving these problems.
Keith
On 30 Apr 2019, at 0:15, Keith wrote:
One of the other goals was to support removable/loadable UI in the bootstrap process
I found this code in there, which indicates that the vote was essentially on a numeric 0-999 scale.
StartupManager class methodsFor: 'system-startup-shutdown' stamp: 'kph 2/4/2010 16:44'!
priorities
	^ IdentityDictionary new
		at: #first put: 0;       "reserved for Delay"
		at: #earliest put: 10;
		at: #earlier put: 100;
		at: #early put: 200;
		at: #normal put: 500;
		at: #late put: 800;
		at: #later put: 900;
		at: #latest put: 990;
		at: #last put: 999;      "reserved for Delay"
		yourself
and some code that sorts on the given priority value, then on class name:
startUpList
	| priorities list |
	priorities := self class priorities.
	list := SortedCollection sortBlock: [ :a :b |
		a value = b value
			ifFalse: [ a value <= b value ]
			ifTrue: [ a key name <= b key name ] ].
	self systemNavigation
		allClassesRespondingTo: #startUpPriority
		do: [ :cl | list add: cl -> (priorities at: cl startUpPriority ifAbsent: [ cl startUpPriority ]) ].
	^ list
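With the list built as class -> priority associations, running startup would presumably just walk it in order, something like this (a hypothetical sketch; #startUp: with a resuming flag is the conventional Squeak signature):

	"Run each registered class's startUp in priority order"
	StartupManager new startUpList do: [ :assoc |
		assoc key startUp: true ]

Note that the ifAbsent: branch in startUpList also lets a class answer a raw numeric priority directly instead of one of the symbolic tags.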
squeak-dev@lists.squeakfoundation.org