Some things that come to my mind for optimizing your Smalltalk code:
- use MessageTally (tally it, #timeProfile, profiler in docking bar) to find bottlenecks w.r.t. speed
- use #timeToRun or #bench[For:] to benchmark expressions
- inline frequently executed methods (inspect the bytecode, avoid sending messages that are not inlined)
- use quick return methods
- avoid slow instructions (e.g., thisContext or repeated block closure creations)
- be aware of slow primitives (e.g., #becomeForward: is pretty slow, while #elementsForwardIdentityTo: is faster for bulk mutations)
- use SpaceTally to find bottlenecks w.r.t. memory consumption
- for collections of homogeneous objects, use RawBitsArrays when available (as they optimize memory consumption and operations, e.g., thanks to the FloatArrayPlugin)
- know the object layout in the VM (e.g., in the OSVM you get one instance variable for free)
- avoid premature optimization
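For instance, the profiling and benchmarking tips can be combined like this (MessageTally, #bench, and #timeToRun are standard Squeak tools; the workloads here are only illustrative):

```smalltalk
"Benchmark an expression: #bench runs it repeatedly for a fixed
interval and reports runs per second."
[(1 to: 10000) inject: 0 into: [:sum :each | sum + each]] bench.

"Profile a longer-running expression to find the hot methods."
MessageTally spyOn: [
	1000 timesRepeat: [(1 to: 1000) asOrderedCollection sort]].

"Wall-clock timing of a block, in milliseconds."
[(1 to: 1000000) asArray] timeToRun.
```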
Surely this list is not comprehensive. But I was wondering, is there already a community document where we can collect and refine such tricks and insights? If not, what would be the right form for it? A wiki page, a chapter or book in the help browser, something else? And what would you add to this list? :-)
Best, Christoph
--- Sent from Squeak Inbox Talk
Minimize creation of temporary objects.
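A classic instance of this advice is string building: repeated #, allocates a fresh intermediate String on every iteration, while a WriteStream reuses one growing buffer (a sketch; the loop bound is arbitrary):

```smalltalk
"Quadratic: each #, copies everything accumulated so far into a new String."
| s |
s := String new.
1 to: 1000 do: [:i | s := s , i printString].

"Linear: one stream, far fewer temporary objects."
String streamContents: [:stream |
	1 to: 1000 do: [:i | stream nextPutAll: i printString]].
```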
On Wed, Jan 10, 2024 at 2:01 PM christoph.thiede@student.hpi.uni-potsdam.de wrote: [...]
Yes! Worth noting, this also includes blocks and thisContext.
And some others:
- cache results of common operations (e.g., using WeakIdentityKeyDictionary or MRUCache)
- pooling: reuse objects instead of throwing them away to reduce GC load
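A minimal sketch of the caching idea (WeakIdentityKeyDictionary is a real Squeak class; the `cache` instance variable and #computeDigestFor: are hypothetical names for illustration):

```smalltalk
"Cache an expensive per-object computation. Using a weak-keyed
dictionary lets cached keys still be garbage collected."
digestFor: anObject
	cache ifNil: [cache := WeakIdentityKeyDictionary new].
	^ cache at: anObject ifAbsentPut: [self computeDigestFor: anObject]
```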
For multiprocessing:
- run multiple network/OSProcess requests concurrently
- to avoid frame rate drops, send Processor yield with a high frequency in long-running background operations (it only costs a couple of nanoseconds, which is as much as 2 regular message sends)

From: Chris Muller <asqueaker@gmail.com>
Sent: Thursday, January 11, 2024, 00:10
To: The general-purpose Squeak developers list <squeak-dev@lists.squeakfoundation.org>
Subject: [squeak-dev] Re: Do we have a catalogue of best practices for writing efficient Squeak code?
Minimize creation of temporary objects.
On Wed, Jan 10, 2024 at 2:01 PM <christoph.thiede@student.hpi.uni-potsdam.de> wrote: [...]
- Never intern Symbols boundlessly, nor even liberally: a compact SymbolTable is a fast SymbolTable. The faster comparison via #= is offset by the slow #asSymbol lookup (+ potential table update). Regular Strings are usually sufficient for anything other than method selectors (which are already intern'd).
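In practice this means: don't send #asSymbol to arbitrary external input; compare external tokens as Strings instead (a sketch; `userInput` and `knownTags` are hypothetical):

```smalltalk
"Risky: every distinct externally supplied String gets intern'd into
the global SymbolTable forever."
tag := userInput asSymbol.

"Better: keep it a String; #= on short Strings is cheap and nothing
is added to the SymbolTable."
tag := userInput.
(knownTags includes: tag) ifTrue: [self handleTag: tag].
```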
On Thu, Jan 11, 2024 at 8:47 AM Thiede, Christoph <Christoph.Thiede@student.hpi.uni-potsdam.de> wrote: [...]
Thanks for all your ideas! I couldn't agree more with Tim's notes on premature optimization. I have updated https://wiki.squeak.org/squeak/1799, plus a couple of related pages, to incorporate all our tips from here. Feedback is appreciated. I hope this helps someone in the future.
Another finding I made today is that declaring temporary variables in outer loops is actually *slower* than moving them into the loop bodies. I am not sure why; I would rather have expected the opposite:
array := (1 to: 1000) asArray.
digits := (1 to: 10) asArray.

[array do: [:ea | digits do: [:digit | | sum | sum := digit + ea]]] bench.
"--> '5,660 per second. 177 microseconds per run. 5.33893 % GC time.'"

[array do: [:ea | | sum | digits do: [:digit | sum := digit + ea]]] bench.
"--> '5,300 per second. 189 microseconds per run. 6.7 % GC time.'"
Best, Christoph
--- Sent from Squeak Inbox Talk
On 2024-01-12T20:45:14-06:00, asqueaker@gmail.com wrote: [...]
On Sat, Jan 13, 2024 at 11:45 PM, christoph.thiede@student.hpi.uni-potsdam.de wrote:
Some other finding I made today is that holding temporary variables in outer loops is actually *slower* than moving them into loop bodies. Not sure why, I would have rather expected the opposite:
Does it compare in the same way in an interpreter or stack VM, or is this the JIT compiler's work?
On 2024-01-14T10:07:06+01:00, jakres+squeak@gmail.com wrote:
Does it compare in the same way in an interpreter or stack VM, or is this the JIT compiler's work?
For the Stack VM, the differences are marginal (503 vs. 507 µs), and if I add a second temporary variable to the outer block in the Stack VM, the second version is even faster. Not sure how to get a build of the interpreter VM. Is it squeak.sista.spur?
--- Sent from Squeak Inbox Talk
On 2024-01-14 16:51, christoph.thiede@student.hpi.uni-potsdam.de wrote:
For the Stack VM differences are marginal (503 vs 507 µs), and if I add a second temporary variable to the outer block in the Stack VM, the second version is even faster. Not sure how to get a build of the interpreter VM. Is it squeak.sista.spur?
For purposes of this discussion, you are doing the right thing by comparing a Cog/Spur VM to a Stack/Spur VM. One has a JIT and the other does not, but otherwise they are equivalent.
That said, here is what I get with a classic interpreter VM running a trunk level image in V3 image format:
array := (1 to: 1000) asArray.
digits := (1 to: 10) asArray.

[array do: [:ea | digits do: [:digit | | sum | sum := digit + ea]]] bench.
"--> '1,800 per second. 557 microseconds per run. 11.00368 % GC time.'"

[array do: [:ea | | sum | digits do: [:digit | sum := digit + ea]]] bench.
"--> '1,990 per second. 504 microseconds per run. 12.69604 % GC time.'"
Note that you can also run tests like this on SqueakJS which might also provide some interesting insights. Here is what I get on Chrome running https://squeak.js.org/run/#zip=https://files.squeak.org/6.0/Squeak6.0-22104-...
array := (1 to: 1000) asArray.
digits := (1 to: 10) asArray.

[array do: [:ea | digits do: [:digit | | sum | sum := digit + ea]]] bench.
"--> '31.2 per second. 32.1 milliseconds per run. 0 % GC time.'"

[array do: [:ea | | sum | digits do: [:digit | sum := digit + ea]]] bench.
"--> '31.6 per second. 31.6 milliseconds per run. 0 % GC time.'"
Regarding how to build an interpreter VM, the instructions are here:
https://wiki.squeak.org/squeak/6354
This VM works with the old V3 image format only, so you cannot run a Squeak trunk image on it. If you want a "trunk equivalent" V3 image, I can share one with you privately.
Dave
On 2024-01-10, at 11:26 AM, christoph.thiede@student.hpi.uni-potsdam.de wrote:
Some things that come to my mind for optimizing your Smalltalk code:
General advice, so old I think it must come from Cronos, or at least Alan Perlis
Rules for optimising:
1) Don't
2) (for experts only) Don't *yet*
Oh, wait, Princeton claims that came from a Michael Jackson. Probably not the one you're thinking of.
An actual Perlism is "Optimization hinders evolution". Another important warning is "Debugging a program is twice as hard as writing it in the first place. So, by definition, if you write the program as cleverly as you can, you will not be able to debug it." (Probably Kernighan)
Another important koan is "Don't diddle the code, find a better algorithm".
In the Smalltalk world one could add some thoughts on taking proper advantage of the system:
- don't test for 'types'; the VM is quite good at doing that part.
- don't keep recreating objects but also don't (as Chris says) create and hold on to too many temporaries
- never use #isKindOf:, especially in loops, with possible exceptions in meta-programming situations.[1]
- document what the code was *supposed* to do. Code tells you what it does. The two are far too rarely the same. Remember to optimise the time spent by later readers. They might be your boss...
tim

[1] getting rid of a staggering collection of misuses of #isKindOf: in the original Scratch code allowed me to improve script running performance by about an order of magnitude

--
tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim
Strange OpCodes: MC: Melt down Core
I would say that reading Tim's email above at least two full times end to end qualifies as a best practice.
Dave
On 2024-01-10 23:44, Tim Rowledge wrote: [...]
On Wed, 10 Jan 2024 at 23:44, Tim Rowledge <tim@rowledge.org> wrote: [...]
- document what the code was *supposed* to do. Code tells you what it
does. The two are far too rarely the same. Remember to optimise the time spent by later readers. They might be your boss...
Worse, they might be you. I tend to document *why* rather than *what*.
One key question is "what are you optimising for: minimum runtime cost, minimum development cost, minimum maintenance cost (over what expected lifetime?), or minimum total cost (over how many running instances?)?" Until you answer that question, there's no point optimising for performance. If you're making one of something and costing your time properly, hardware is almost certainly cheaper than brainpower. If you're making a billion of something, spend the brainpower (unless you're Microsoft, of course, in which case minimising development cost is far more important than minimising total cost, as they don't pay for customers' hardware upgrades).
Take care,
Rachel
And what would you add to this list? :-)
Think DSL: do not use common methods that seem elementary when there is a faster alternative in the context of your domain.
For example, I found it useful to define

Point >> mult: aNumber
	^ (x * aNumber) @ (y * aNumber)

When you know that aNumber is indeed a Number, #mult: is much faster than #*.
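To check the payoff in your own image, a quick comparison (no numbers quoted here, since they vary by VM):

```smalltalk
p := 3 @ 4.
[p * 2] bench.      "generic #*, which must also handle Points and coercion"
[p mult: 2] bench.  "specialized scalar multiply from above"
```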
Stef