Some things that come to my mind for optimizing your Smalltalk code:
- use MessageTally (tally it, #timeProfile, profiler in docking bar) to find bottlenecks w.r.t. speed
- use #timeToRun or #bench[For:] to benchmark expressions
- inline frequently executed methods (inspect the bytecode, avoid sending messages that are not inlined)
- use quick return methods
- avoid slow instructions (e.g., thisContext or repeated block closure creations)
- be aware of slow primitives (e.g., #becomeForward: is pretty slow, while #elementsForwardIdentityTo: is faster for bulk mutations)
- use SpaceTally to find bottlenecks w.r.t. memory consumption
- for collections of homogeneous objects, use RawBitsArrays when available (as they optimize memory consumption and operations, e.g., thanks to the FloatArrayPlugin)
- know the object layout in the VM (e.g., in the OSVM you get one instance variable for free)
- avoid premature optimization
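For instance, the profiling and benchmarking tips can be combined like this (MessageTally, #bench, and #timeToRun are standard Squeak tools; the workloads here are only illustrative):

```smalltalk
"Benchmark an expression: #bench runs it repeatedly for a fixed
interval and reports runs per second."
[(1 to: 10000) inject: 0 into: [:sum :each | sum + each]] bench.

"Profile a longer-running expression to find the hot methods."
MessageTally spyOn: [
	1000 timesRepeat: [(1 to: 1000) asOrderedCollection sort]].

"Wall-clock timing of a block, in milliseconds."
[(1 to: 1000000) asArray] timeToRun.
```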
Surely this list is not comprehensive. But I was wondering, is there already a community document where we can collect and refine such tricks and insights? If not, what would be the right form for it? A wiki page, a chapter or book in the help browser, something else? And what would you add to this list? :-)
Best, Christoph
--- Sent from Squeak Inbox Talk
Minimize creation of temporary objects.
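A classic instance of this advice is string building: repeated #, allocates a fresh intermediate String on every iteration, while a WriteStream reuses one growing buffer (a sketch; the loop bound is arbitrary):

```smalltalk
"Quadratic: each #, copies everything accumulated so far into a new String."
| s |
s := String new.
1 to: 1000 do: [:i | s := s , i printString].

"Linear: one stream, far fewer temporary objects."
String streamContents: [:stream |
	1 to: 1000 do: [:i | stream nextPutAll: i printString]].
```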
On Wed, Jan 10, 2024 at 2:01 PM christoph.thiede@student.hpi.uni-potsdam.de wrote: [...]
Yes! Worth noting, this also includes blocks and thisContext.
And some others:
- cache results of common operations (e.g., using WeakIdentityKeyDictionary or MRUCache)
- pooling: reuse objects instead of throwing them away to reduce GC load
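A minimal sketch of the caching idea (WeakIdentityKeyDictionary is a real Squeak class; the `cache` instance variable and #computeDigestFor: are hypothetical names for illustration):

```smalltalk
"Cache an expensive per-object computation. Using a weak-keyed
dictionary lets cached keys still be garbage collected."
digestFor: anObject
	cache ifNil: [cache := WeakIdentityKeyDictionary new].
	^ cache at: anObject ifAbsentPut: [self computeDigestFor: anObject]
```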
For multiprocessing:
- run multiple network/OSProcess requests concurrently
- to avoid frame rate drops, send Processor yield with a high frequency in long-running background operations (it only costs a couple of nanoseconds, which is as much as 2 regular message sends)

From: Chris Muller <asqueaker@gmail.com>
Sent: Thursday, January 11, 2024, 00:10
To: The general-purpose Squeak developers list <squeak-dev@lists.squeakfoundation.org>
Subject: [squeak-dev] Re: Do we have a catalogue of best practices for writing efficient Squeak code?
Minimize creation of temporary objects.
On Wed, Jan 10, 2024 at 2:01 PM <christoph.thiede@student.hpi.uni-potsdam.de> wrote: [...]
- Never intern Symbols boundlessly, nor even liberally: a compact SymbolTable is a fast SymbolTable. The faster comparison via #= is offset by the slow #asSymbol lookup (+ potential table update). Regular Strings are usually sufficient for anything other than method selectors (which are already intern'd).
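In practice this means: don't send #asSymbol to arbitrary external input; compare external tokens as Strings instead (a sketch; `userInput` and `knownTags` are hypothetical):

```smalltalk
"Risky: every distinct externally supplied String gets intern'd into
the global SymbolTable forever."
tag := userInput asSymbol.

"Better: keep it a String; #= on short Strings is cheap and nothing
is added to the SymbolTable."
tag := userInput.
(knownTags includes: tag) ifTrue: [self handleTag: tag].
```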
On Thu, Jan 11, 2024 at 8:47 AM Thiede, Christoph <Christoph.Thiede@student.hpi.uni-potsdam.de> wrote: [...]
Thanks for all your ideas! I couldn't agree more with Tim's notes on premature optimization. I have updated https://wiki.squeak.org/squeak/1799, plus a couple of related pages, to incorporate all our tips from here. Feedback is appreciated. I hope this helps someone in the future.
Another finding I made today is that declaring temporary variables in outer loops is actually *slower* than moving them into the loop bodies. I am not sure why; I would rather have expected the opposite:
array := (1 to: 1000) asArray.
digits := (1 to: 10) asArray.

[array do: [:ea | digits do: [:digit | | sum | sum := digit + ea]]] bench.
"--> '5,660 per second. 177 microseconds per run. 5.33893 % GC time.'"

[array do: [:ea | | sum | digits do: [:digit | sum := digit + ea]]] bench.
"--> '5,300 per second. 189 microseconds per run. 6.7 % GC time.'"
Best, Christoph
--- Sent from Squeak Inbox Talk
On 2024-01-12T20:45:14-06:00, asqueaker@gmail.com wrote: [...]
On Sat, Jan 13, 2024 at 11:45 PM, christoph.thiede@student.hpi.uni-potsdam.de wrote:
Some other finding I made today is that holding temporary variables in outer loops is actually *slower* than moving them into loop bodies. Not sure why, I would have rather expected the opposite:
Does it compare in the same way in an interpreter or stack VM, or is this the JIT compiler's work?
On 2024-01-14T10:07:06+01:00, jakres+squeak@gmail.com wrote:
Does it compare in the same way in an interpreter or stack VM, or is this the JIT compiler's work?
For the Stack VM, the differences are marginal (503 vs. 507 µs), and if I add a second temporary variable to the outer block in the Stack VM, the second version is even faster. Not sure how to get a build of the interpreter VM. Is it squeak.sista.spur?
--- Sent from Squeak Inbox Talk
On 2024-01-14 16:51, christoph.thiede@student.hpi.uni-potsdam.de wrote:
For the Stack VM differences are marginal (503 vs 507 µs), and if I add a second temporary variable to the outer block in the Stack VM, the second version is even faster. Not sure how to get a build of the interpreter VM. Is it squeak.sista.spur?
For purposes of this discussion, you are doing the right thing by comparing a Cog/Spur VM to a Stack/Spur VM. One has a JIT and the other does not, but otherwise they are equivalent.
That said, here is what I get with a classic interpreter VM running a trunk level image in V3 image format:
array := (1 to: 1000) asArray.
digits := (1 to: 10) asArray.

[array do: [:ea | digits do: [:digit | | sum | sum := digit + ea]]] bench.
"--> '1,800 per second. 557 microseconds per run. 11.00368 % GC time.'"

[array do: [:ea | | sum | digits do: [:digit | sum := digit + ea]]] bench.
"--> '1,990 per second. 504 microseconds per run. 12.69604 % GC time.'"
Note that you can also run tests like this on SqueakJS which might also provide some interesting insights. Here is what I get on Chrome running https://squeak.js.org/run/#zip=https://files.squeak.org/6.0/Squeak6.0-22104-...
array := (1 to: 1000) asArray.
digits := (1 to: 10) asArray.

[array do: [:ea | digits do: [:digit | | sum | sum := digit + ea]]] bench.
"--> '31.2 per second. 32.1 milliseconds per run. 0 % GC time.'"

[array do: [:ea | | sum | digits do: [:digit | sum := digit + ea]]] bench.
"--> '31.6 per second. 31.6 milliseconds per run. 0 % GC time.'"
Regarding how to build an interpreter VM, the instructions are here:
https://wiki.squeak.org/squeak/6354
This VM works with the old V3 image format only, so you cannot run a Squeak trunk image on it. If you want a "trunk equivalent" V3 image, I can share one with you privately.
Dave
On 2024-01-10, at 11:26 AM, christoph.thiede@student.hpi.uni-potsdam.de wrote:
Some things that come to my mind for optimizing your Smalltalk code:
General advice, so old I think it must come from Cronos, or at least Alan Perlis
Rules for optimising:
1) Don't
2) (for experts only) Don't *yet*
Oh, wait, Princeton claims that came from a Michael Jackson. Probably not the one you're thinking of.
An actual Perlism is "Optimization hinders evolution". Another important warning is "Debugging a program is twice as hard as writing it in the first place. So, by definition, if you write the program as cleverly as you can, you will not be able to debug it." (Probably Kernighan)
Another important koan is "Don't diddle the code, find a better algorithm".
In the Smalltalk world one could add some thoughts on taking proper advantage of the system:
- don't test for 'types'; the VM is quite good at doing that part.
- don't keep recreating objects but also don't (as Chris says) create and hold on to too many temporaries
- never use #isKindOf:, especially in loops, with possible exceptions in meta-programming situations.[1]
- document what the code was *supposed* to do. Code tells you what it does. The two are far too rarely the same. Remember to optimise the time spent by later readers. They might be your boss...
tim

[1] getting rid of a staggering collection of misuses of #isKindOf: in the original Scratch code allowed me to improve script running performance by about an order of magnitude

--
tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim
Strange OpCodes: MC: Melt down Core
I would say that reading Tim's email above at least two full times end to end qualifies as a best practice.
Dave
On 2024-01-10 23:44, Tim Rowledge wrote: [...]
On Wed, 10 Jan 2024 at 23:44, Tim Rowledge <tim@rowledge.org> wrote: [...]
- document what the code was *supposed* to do. Code tells you what it
does. The two are far too rarely the same. Remember to optimise the time spent by later readers. They might be your boss...
Worse, they might be you. I tend to document *why* rather than *what*.
One key question is "what are you optimising for: minimum runtime cost, minimum development cost, minimum maintenance cost (over what expected lifetime?), or minimum total cost (over how many running instances?)?" Until you answer that question, there's no point optimising for performance. If you're making one of something and costing your time properly, hardware is almost certainly cheaper than brainpower. If you're making a billion of something, spend the brainpower (unless you're Microsoft, of course, in which case minimising development cost is far more important than minimising total cost, as they don't pay for customers' hardware upgrades).
Take care,
Rachel
And what would you add to this list? :-)
Think DSL: do not use common methods that seem elementary when there is a faster alternative in the context of your domain.
For example, I found it useful to define

Point >> mult: aNumber
	^ (x * aNumber) @ (y * aNumber)

When you know that aNumber is indeed a Number, #mult: is much faster than #*.
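To check the payoff in your own image, a quick comparison (no numbers quoted here, since they vary by VM):

```smalltalk
p := 3 @ 4.
[p * 2] bench.      "generic #*, which must also handle Points and coercion"
[p mult: 2] bench.  "specialized scalar multiply from above"
```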
Stef