Hi.
I compared the performance of object instantiation and object cloning, and was surprised that instantiation is almost twice as fast as cloning (primitive 70 vs. 148).
Could you explain why that is, and whether it could be improved?
I had thought that constructing a new object would be the more complex operation, because it has to fill in all of the object's fields (header structure and so on), whereas copying would be a simple memcpy-like function that just copies bytes without any logic.
Here is my code:
object := Object new.
3 timesRepeat: [ Smalltalk garbageCollect ].
result1 := [ Object basicNew ] benchFor: 10 seconds.
3 timesRepeat: [ Smalltalk garbageCollect ].
result2 := [ object shallowCopy ] benchFor: 10 seconds.
{result1. result2}.
"an Array(a BenchmarkResult(518,021,045 iterations in 10 seconds 2 milliseconds. 51,791,746 per second) a BenchmarkResult(302,807,253 iterations in 10 seconds 4 milliseconds. 30,268,618 per second))"
(I ran it on the latest Pharo with the Spur VM on a Mac.)
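For intuition about the memcpy mental model above, here is a toy shallow copy in C (an invented header layout, nothing like Spur's real one): the bulk of the work really is a byte copy, but the clone still needs a fresh allocation and a header fix-up so it gets its own identity.

```c
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

/* Illustrative only: a toy object with one header word and payload slots.
   A real Spur header encodes class index, format, identity hash, etc. */
typedef struct {
    uint64_t header;   /* pretend: low 24 bits hold the identity hash */
    uint64_t slots[];  /* instance variables */
} ToyObject;

ToyObject *toy_shallow_copy(const ToyObject *src, size_t nslots) {
    ToyObject *dst = malloc(sizeof(ToyObject) + nslots * sizeof(uint64_t));
    if (!dst) return NULL;
    /* The slots can be copied verbatim, memcpy-style... */
    memcpy(dst->slots, src->slots, nslots * sizeof(uint64_t));
    /* ...but the header cannot: the clone must get its own identity hash
       (here we just clear the invented hash bits as a stand-in). */
    dst->header = src->header & ~0xFFFFFFull;
    return dst;
}
```

The names and the 24-bit hash mask are invented for the sketch; the point is only that a clone is "memcpy plus a little header logic", not a raw byte copy of the whole object.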
Best regards, Denis
Hi Denis,
The difference is because basicNew and basicNew: are implemented as machine code primitives, whereas shallowCopy is an interpreter primitive, and machine code primitives are much faster to invoke. I could add a machine code shallowCopy primitive that would handle the common cases and exclude the complex ones (CompiledMethod and Context, because they contain hidden JIT state that must not be copied). How important is shallowCopy performance to you?
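The invocation-cost difference can be modeled with a toy C sketch (invented names, not VM code): an interpreter primitive is reached through an indirect call, with arguments marshaled through a simulated stack, while a machine code primitive is compiled inline at the send site with no dispatch at all.

```c
#include <stdint.h>

/* Toy model of an interpreter with a simulated operand stack. */
typedef struct { int64_t stack[8]; int sp; } Interp;

/* "Interpreter primitive": a generic calling convention, reached
   through a function pointer. */
typedef int (*Primitive)(Interp *);

static int prim_add(Interp *in) {
    int64_t b = in->stack[--in->sp];
    int64_t a = in->stack[--in->sp];
    in->stack[in->sp++] = a + b;
    return 1; /* primitive succeeded */
}

static int64_t call_via_interpreter(Primitive p, int64_t a, int64_t b) {
    Interp in = { .sp = 0 };
    in.stack[in.sp++] = a;       /* marshal the arguments... */
    in.stack[in.sp++] = b;
    p(&in);                      /* ...take an indirect call... */
    return in.stack[in.sp - 1];  /* ...and unmarshal the result */
}

/* "Machine code primitive": the operation is compiled inline,
   so none of the bookkeeping above exists at run time. */
static inline int64_t call_via_machine_code(int64_t a, int64_t b) {
    return a + b;
}
```

Both paths compute the same result; the sketch only illustrates where the extra invocation overhead of the interpreter path comes from.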
_,,,^..^,,,_ (phone)
BTW, to better compare the two, unroll the inner loop ten times, put Object in a temp to eliminate the indirection, and use explicit temps, i.e.
| o object |
o := Object new.
object := Object.
3 timesRepeat: [ Smalltalk garbageCollect ].
result1 := [ object basicNew. object basicNew. object basicNew. object basicNew. object basicNew.
	object basicNew. object basicNew. object basicNew. object basicNew. object basicNew ] benchFor: 10 seconds.
3 timesRepeat: [ Smalltalk garbageCollect ].
result2 := [ o shallowCopy. o shallowCopy. o shallowCopy. o shallowCopy. o shallowCopy.
	o shallowCopy. o shallowCopy. o shallowCopy. o shallowCopy. o shallowCopy ] benchFor: 10 seconds.
{result1. result2}.
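The rationale for the ten-fold unrolling can be sketched with a toy overhead model (numbers and names invented): each benchmark iteration pays a fixed bookkeeping cost for the block activation and loop control, so packing ten operations into one iteration shrinks the share of that cost in what gets measured.

```c
/* Toy model: one benchmark iteration costs loop_overhead plus
   unroll copies of the operation under test. The returned value is
   the fraction of the measured time that is pure overhead. */
static double overhead_fraction(double loop_overhead, double work, int unroll) {
    return loop_overhead / (loop_overhead + unroll * work);
}
```

With invented costs of 5 units of overhead and 1 unit of work, a rolled loop measures 5/6 overhead, while unrolling ten times drops that to 5/15, so the measured ratio between two operations gets much closer to their true ratio.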
_,,,^..^,,,_ (phone)
Hi Eliot,
2016-07-30 7:48 GMT+02:00 Eliot Miranda eliot.miranda@gmail.com:
The difference is because basicNew and basicNew: are implemented as machine code primitives, whereas shallowCopy is an interpreter primitive, and machine code primitives are much faster to invoke. I could add a machine code shallowCopy primitive that would handle the common cases and exclude the complex ones (CompiledMethod and Context, because they contain hidden JIT state that must not be copied). How important is shallowCopy performance to you?
I would not say it is critical, but it could improve some prototype-based frameworks where objects are created by cloning.
What performance do you expect from the optimized version? Will it be faster than #basicNew?
It can't be quicker than basicNew. It will be quicker than basicNew plus manually copying the fields. And yes, it is quite important for this to be quick, along with the #copyFrom: primitive (168).
Levente
Hi Denis, Hi Levente,
On Jul 31, 2016, at 3:41 AM, Levente Uzonyi leves@caesar.elte.hu wrote:
It can't be quicker than basicNew.
Depends. Decoding size and encoding header from the class receiver in basicNew might be slower than decoding size and encoding header from the instance receiver in shallowCopy. But the difference should be small.
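Eliot's point can be sketched in C with invented encodings (nothing like Spur's real header layout): basicNew must decode the slot count from the class's format word and then encode a brand-new header, while shallowCopy can read the count straight from the existing instance's header.

```c
#include <stdint.h>

/* Invented encodings, purely for illustration (not Spur's layout). */

/* basicNew path: the class's format word packs the slot count into
   bits 8..23, so instantiation has to decode it... */
static uint32_t slots_from_class_format(uint32_t format) {
    return (format >> 8) & 0xFFFFu;
}

/* ...and then encode a fresh header for the new instance. */
static uint64_t make_header(uint32_t class_index, uint32_t nslots) {
    return ((uint64_t)class_index << 32) | nslots;
}

/* shallowCopy path: the slot count already sits in the instance
   header, so the clone can fetch it with a single mask. */
static uint32_t slots_from_instance_header(uint64_t header) {
    return (uint32_t)(header & 0xFFFFu);
}
```

Either way the work is a few shifts and masks, which matches the remark that the difference between the two directions should be small.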
It will be quicker than basicNew + manual copying of the fields.
Agreed.
And yes, this is quite important to be quick along with the #copyFrom: primitive (168).
It's on the to do list then :-)
Levente
_,,,^..^,,,_ (phone)
So, has anyone thought about pre-allocating a few commonly used objects during idle time, then grabbing one of those objects and filling in the actual details when one is needed? The Squeak VM did that for method contexts (and recycled them).
Also, a decade back I looked at the instantiation logic and rearranged the code to fast-path the creation of commonly used objects. This improved things a bit, but didn't really outweigh the resulting mess of if statements.
Sent from my iPhone
Hi John,
On Jul 31, 2016, at 7:32 AM, John McIntosh johnmci@smalltalkconsulting.com wrote:
So, has anyone thought about pre-allocating a few commonly used objects during idle time, then grabbing one of those objects and filling in the actual details when one is needed? The Squeak VM did that for method contexts (and recycled them).
This only works to the extent that a particular kind of object is allocated all the time, and to the extent that synthesizing an object is expensive relative to filling in its slots. This worked for contexts on top of a complex object representation. In my experience it doesn't work for floats either. But Cog does not allocate contexts often, because of context-to-stack mapping, and the Spur object representation is simple, regular, and very quick to synthesize (allocate).
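A minimal free-list pool along the lines John describes might look like this in C (a sketch with invented names, far simpler than the Squeak VM's internal context recycling); as Eliot notes, it only pays off when building an object is expensive relative to reusing one.

```c
#include <stdlib.h>
#include <stddef.h>

/* Free-list pool sketch: released objects are threaded onto a list
   and handed back out before the real allocator is consulted.
   obj_size must be at least sizeof(Node) so the link fits. */
typedef struct Node { struct Node *next; } Node;

typedef struct {
    Node *free_list;
    size_t obj_size;
} Pool;

static void *pool_alloc(Pool *p) {
    if (p->free_list) {              /* reuse a recycled object... */
        Node *n = p->free_list;
        p->free_list = n->next;
        return n;
    }
    return malloc(p->obj_size);      /* ...or fall back to the allocator */
}

static void pool_release(Pool *p, void *obj) {
    Node *n = obj;
    n->next = p->free_list;          /* push back for later reuse */
    p->free_list = n;
}
```

The catch, per the discussion above, is that the caller still has to re-initialize every slot of a reused object, so when allocation itself is already a simple bump-and-fill (as in Spur), the pool saves almost nothing.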
On 01-08-2016, at 12:58 AM, Eliot Miranda eliot.miranda@gmail.com wrote:
Yeah but with a 64-bit address space we could pre-allocate *millions* of objects of every class! And speculatively initialise them!
Oh, wait; what do you mean we don’t have 17,592,186,044,416 MB of RAM in our Raspberry Pis yet?
tim
--
tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim
Useful random insult:- A one-bit brain with a parity error.
Hi. I measured the latest VM with shallowCopy support. Copying is now much better:
object := 10@20.
3 timesRepeat: [ Smalltalk garbageCollect ].
result1 := [ Point x: 10 y: 20 ] benchFor: 10 seconds.
3 timesRepeat: [ Smalltalk garbageCollect ].
result2 := [ object shallowCopy ] benchFor: 10 seconds.
{result1. result2}.
"a BenchmarkResult(310,321,301 iterations in 10 seconds 2 milliseconds. 31,025,925 per second) a BenchmarkResult(426,311,468 iterations in 10 seconds 3 milliseconds. 42,618,361 per second)"
But compared with "Point basicNew" it is almost the same:
"a BenchmarkResult(402,708,088 iterations in 10 seconds 2 milliseconds. 40,262,756 per second) a BenchmarkResult(405,145,766 iterations in 10 seconds 3 milliseconds. 40,502,426 per second)"
It also improves my veryDeepCopy test, which runs better on the pre-Spur VM (the pre-Spur VM is still faster):
m := Morph new.
r2 := [ m veryDeepCopy ] benchFor: 10 seconds.
"a BenchmarkResult(34,007 iterations in 10 seconds 3 milliseconds. 3,400 per second)" - no shallow copy optimization
"a BenchmarkResult(43,333 iterations in 10 seconds 1 millisecond. 4,333 per second)" - latest VM with shallow copy
"a BenchmarkResult(52,985 iterations in 10 seconds 1 millisecond. 5,298 per second)" - pre-Spur VM