I remember there was a discussion about that somewhere but I can't find it. I cc vm-dev they may have a clue.
When copying a pointer object in 64 bits instead of 32 bits, you need to copy twice as much data, so it is going to be slower in any case. 2.63 times slower seems to be too much, however.
The latest Pharo 64 bits image can be found here: https://ci.inria.fr/pharo/job/Pharo-6.0-Update-Step-3.1-64bits/
The latest Pharo 64 bits VM can be found here: https://bintray.com/opensmalltalk/vm/cog
Best,
On Sun, Feb 5, 2017 at 1:27 PM, Ciprian Teodorov <ciprian.teodorov@gmail.com> wrote:
Hi all,
I'm very happy to see that the 64 bit Pharo vm is progressing. I've even managed to get a ~6.85 GB heap allocated (see http://bit.ly/2lbp8n6). This is great!
There seems, however, to be a small problem with the #shallowCopy message, which is 2.63 times slower on the 64-bit VM (image/VM details below).
The bench that I used is a simple random graph analysis tool intended to do a lot of random memory accesses on big heaps; it is accessible at http://www.smalltalkhub.com/#!/~CipT/PlugMC In this case I expect the execution time to be dominated by the Set implementation (which is the case with Pharo 5 -- see http://bit.ly/2lbzJhd), and not by the array copy (see http://bit.ly/2kvbqvy).
Is this a 64-bit limitation, or only a feature "not yet available"? Where can I access the latest versions of the 64-bit Pharo image and VM?
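If someone wants to reproduce without loading the whole tool, a micro-benchmark along these lines (the array size here is arbitrary) should isolate the copy cost:

    | a |
    a := Array new: 1000000.
    Transcript crShow: [ a shallowCopy ] bench.

Running the same snippet on both the 32-bit and 64-bit images should show whether the slowdown is specific to #shallowCopy or general to large-heap access.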
Image
/Users/ciprian/Downloads/Pharo64/60371-64/Pharo64-60371.image Pharo6.0 Latest update: #60371 Unnamed
Virtual Machine
/Users/ciprian/Downloads/Pharo64/Pharo 4.app/Contents/MacOS/Pharo CoInterpreter * VMMaker.oscog-eem.2111 uuid: 7c02b557-bdcc-4a91-92b1-7fc15f1e8605 Jan 27 2017 StackToRegisterMappingCogit * VMMaker.oscog-eem.2111 uuid: 7c02b557-bdcc-4a91-92b1-7fc15f1e8605 Jan 27 2017 VM: 201701271449 https://github.com/pharo-project/pharo-vm.git $ Date: Fri Jan 27 15:49:20 2017 +0100 $ Plugins: 201701271449 https://github.com/pharo-project/pharo-vm.git $
Mac OS X built on Jan 27 2017 15:28:14 UTC Compiler: 4.2.1 Compatible Apple LLVM 7.3.0 (clang-703.0.31) VMMaker versionString VM: 201701271449 https://github.com/pharo-project/pharo-vm.git $ Date: Fri Jan 27 15:49:20 2017 +0100 $ Plugins: 201701271449 https://github.com/pharo-project/pharo-vm.git $ CoInterpreter * VMMaker.oscog-eem.2111 uuid: 7c02b557-bdcc-4a91-92b1-7fc15f1e8605 Jan 27 2017 StackToRegisterMappingCogit * VMMaker.oscog-eem.2111 uuid: 7c02b557-bdcc-4a91-92b1-7fc15f1e8605 Jan 27 2017
Cheers,
Dr. Ciprian TEODOROV Enseignant-chercheur ENSTA Bretagne
tél : 06 08 54 73 48 mail : ciprian.teodorov@gmail.com www.teodorov.ro
On 05-02-2017, at 5:08 AM, Clément Bera bera.clement@gmail.com wrote:
I remember there was a discussion about that somewhere but I can't find it. I cc vm-dev they may have a clue.
When copying a pointer object in 64 bits instead of 32 bits, you need to copy twice as much data, so it is going to be slower in any case.
Err, not really. Probably. Assuming you have a 64 bit cpu etc, of course. And dependent on details of the memory architecture outside the cpu too - after all, many systems do not need the memory chip organisation to match the cpu word size, having multiple lanes, burst read cache loading, even heterogeneous regions (I suspect mostly in embedded systems for that, but y’never know).
Yes, you’re moving twice as much stuff but it will still be a single read & write per word. After that you’re at the mercy of cache lines, write buffers, chip specs and not to mention the Hamsters.
tim
--
tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim
We can rescue a hostage or bankrupt a system. Now, what would you like us to do?
Thanks guys, I'll try with the latest version and I'll come back with updates.
It is strange: to me it seems like the <primitive: 148> fails back to the Smalltalk implementation (http://bit.ly/2kjYdHv). However, when trying to copy a small array like #(1 2 3 4) copy I cannot step into the #shallowCopy, nor when I try to copy a big array like (1 to: 100000) asArray copy.
However, when I do cmd+. while running my bench, the debugger stops in the shallowCopy.
Is this a debugger thing? Or does the primitive really fail? -- which could explain the > 2.6x slowdown.
best regards, cip
To check, can you add a transcript output next line after the primitive pragma? cheers -ben
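For instance, something like this -- just a sketch, with a made-up selector so the real #shallowCopy stays untouched; the line after the pragma only runs when the primitive fails:

    shallowCopyLogged
        <primitive: 148>
        Transcript crShow: 'primitive 148 failed for ', self class name.
        ^ self shallowCopy

Calling shallowCopyLogged from the bench instead of copy would then show each failure on the Transcript.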
Thanks Ben,
the <primitive: 148> seems to fail something like 4-5% of the time with my bench (OS X 10.11.6, the latest Pharo/Cog):

# of copy calls    Failing primitive 148    Failing rate
1710               77                       4.50%
3049               133                      4.36%
51562              2947                     5.72%

and it does not seem to fail at all with something like:
1 to: 1000 do: [:i | (1 to: 100000) asArray copy. ]
cheers
Try the following experiment. Copy Object>>shallowCopy to Object>>monitorShallowCopy and after the pragma add...

    Smalltalk at: #Monitor put: #Failed.

Then in Playground...

    lastfail := 0.
    1 to: 100000 do: [ :n | | src copy |
        src := Array new: n.
        Smalltalk at: #Monitor put: #Succeeded.
        copy := src monitorShallowCopy.
        (Smalltalk at: #Monitor) == #Failed ifTrue: [
            Transcript crShow: n; tab; show: n - lastfail.
            lastfail := n ] ].
Produces the following interesting result....
RUN1...
65559	65559
67670	2111
67685	15
67700	15
67715	15
67730	15
...
69860	15
69875	15
69890	15
69905	15
72334	2429
72348	14
72362	14
72376	14
72390	14
...
74854	14
74868	14
74882	14
74896	14
77681	2785
77694	13
77707	13
77720	13
77733	13
...
80619	13
80632	13
80645	13
80658	13
83894	3236
83906	12
83918	12
83930	12
83942	12
...
87338	12
87350	12
87362	12
87374	12
91189	3815
91200	11
91211	11
91222	11
91233	11
...
95292	11
95303	11
95314	11
95325	11
99867	4542
99877	10
99887	10
99897	10
99907	10
99917	10
99927	10
99937	10
99947	10
99957	10
99967	10
99977	10
99987	10
99997	10
RUN2...
67660	67660
67675	15
67690	15
67705	15
67720	15
...
69865	15
69880	15
69895	15
69910	15
72324	2414
72338	14
72352	14
72366	14
72380	14
...
74858	14
74872	14
74886	14
74900	14
77685	2785
77698	13
77711	13
77724	13
77737	13
...
80623	13
80636	13
80649	13
80662	13
83898	3236
83910	12
83922	12
83934	12
83946	12
...
87342	12
87354	12
87366	12
87378	12
91193	3815
91204	11
91215	11
91226	11
91237	11
...
95285	11
95296	11
95307	11
95318	11
99871	4553
99881	10
99891	10
99901	10
99911	10
99921	10
99931	10
99941	10
99951	10
99961	10
99971	10
99981	10
99991	10
This is with
- 60375-64.zip
- cog_win64x64_squeak.stack.spur_201702021058.zip
- Windows 7 Professional SP1
cheers -ben
Interesting -- I see similar behavior with your experiment, with distance variations globally between 16 and 25.
cheers, cip
I would try logging the number of incremental and full GCs along with the failures. Just a hunch (the primitive might fail due to OOM).
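A sketch of what that logging could look like, assuming the usual vmParameterAt: numbering (7 = full GC count, 9 = incremental/scavenge GC count -- worth double-checking against your VM):

    fullGCs := Smalltalk vm parameterAt: 7.
    scavenges := Smalltalk vm parameterAt: 9.
    Transcript crShow: 'full GCs: ', fullGCs printString, ' scavenges: ', scavenges printString.

Sampling these before and after a failing copy would show whether the failures line up with GC activity.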
- Bert -
What's the error code when the primitive fails?
Levente
Hi,
I tried to analyse the problem and I think I found the cause and a potential solution.
I have just tried Ben's script on the latest 32-bit VM (Squeak.cog.spur); the results are at the bottom of the mail in [1]. I modified the script to print the error codes. The ratios are a bit different from 64 bits, but the same pattern is present. The primitive fails once every 40-60 allocations in 32 bits instead of every 10-15 allocations in 64 bits, with allocations working better for a short while after every ~15 failures. The primitive always fails with 'insufficient object memory'.
The allocation strategy is different for objects whose size cannot be encoded in 16 bits (in our case, arrays larger than 65535 fields). Large objects are directly allocated in old space. The failures in shallowCopy happen in this case. I believe the case where many large objects are allocated in a row is not really optimised, because it is supposed to be uncommon. If it is common in someone's use case, I am pretty sure we can do something about it.
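If that is the cause, the boundary should be easy to probe (a sketch; the sizes come from the 16-bit limit above, 65535 slots being the largest size that still takes the normal allocation path):

    [ Array new: 65535 ] bench.
    [ Array new: 65536 ] bench.

The second expression should be noticeably slower if every such allocation goes directly to old space.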
Because the memory is counted in bytes and array fields are twice as big in 64 bits, I would expect the failures to be twice as frequent in 64 bits as in 32 bits. They seem to be 4 times more frequent, but different people did the 64-bit measurements on different machines, so other side effects may need to be considered.
One solution I see is the following (Pharo version; in Squeak use vmParameterAt:put: directly):

    coef := 2.
    Smalltalk vm parameterAt: 25 put: (Smalltalk vm parameterAt: 25) * coef.
    Smalltalk vm parameterAt: 24 put: (Smalltalk vm parameterAt: 24) * coef.
Basically, I change the old space heuristics to allocate bigger segments and not to shrink too aggressively.
With a coef of 2, I see the primitive failing once every 58-87 allocations instead of once every 40-60. With a coef of 10, I see the primitive failing once every 350-700 allocations. The results for coef 10 are in [2] at the bottom of the mail.
Obviously with these settings the image uses a bit more RAM, but I guess in Ciprian's use case, where images are 6.8 GB large, it does not really matter to waste a dozen extra MB.
Coef 2 may lead to a waste of ~15 MB.
Coef 10 may lead to a waste of ~150 MB.
I don't think there is a generic magic solution for 64 bits. We could consider having segments twice as big by default in 64 bits? I don't know if it makes sense.
I have on my TODO list to build a GC object for Pharo (normally Squeak-compatible) to provide convenient APIs and documentation on how to adapt the GC policy in Spur for both growing and large heaps. Hopefully I will do that around June.
[1]
65631	65631	#'insufficient object memory'
65689	58	#'insufficient object memory'
65747	58	#'insufficient object memory'
...
65979	58	#'insufficient object memory'
66616	637	#'insufficient object memory'
66673	57	#'insufficient object memory'
66730	57	#'insufficient object memory'
...
67243	57	#'insufficient object memory'
67698	455	#'insufficient object memory'
67754	56	#'insufficient object memory'
67810	56	#'insufficient object memory'
...
68538	56	#'insufficient object memory'
68817	279	#'insufficient object memory'
68872	55	#'insufficient object memory'
...
99860	38	#'insufficient object memory'
[2]
66720	66720	#'insufficient object memory'
68303	1583	#'insufficient object memory'
69850	1547	#'insufficient object memory'
70231	381	#'insufficient object memory'
70610	379	#'insufficient object memory'
71363	753	#'insufficient object memory'
72107	744	#'insufficient object memory'
72844	737	#'insufficient object memory'
73574	730	#'insufficient object memory'
74296	722	#'insufficient object memory'
74654	358	#'insufficient object memory'
75011	357	#'insufficient object memory'
75719	708	#'insufficient object memory'
76071	352	#'insufficient object memory'
...
98404	816	#'insufficient object memory'
98945	541	#'insufficient object memory'
99214	269	#'insufficient object memory'
The latest Pharo 64 bits image can be found here: https://ci.inria.fr/pharo/job/Pharo-6.0-Update-Step-3.1-64bits/
That download was failing for me. That is, at about halfway it completed without error, but the zip was corrupt. This one worked: http://files.pharo.org/image/60/
cheers -ben