Hi All,

Threads are more useful when one needs high performance and low latency in an application that runs on a single computer. High-performance video games and (soft) real-time graphics usually fall in this domain.

I know you're working with high-performance video games. If you were to introduce multi-threading in Squeak/Pharo, how would you do it? In particular, do you have a design in mind that does not require rewriting all the core libraries?

For Pharo I am going the GPGPU route, using either OpenCL or a low-level graphics API (Vulkan, D3D 12 or Metal). This way, I do not have to change the VM or Pharo to use the many threads available on the GPU. I am modifying Pharo and the VM for other purposes, such as being able to submit a lot of data to the GPU so that it can be kept busy.

For actual CPU-side multithreading, I am abandoning Pharo and the VM and writing an ahead-of-time (AoT) compiler (something similar to Bee Smalltalk). It uses the OpalCompiler as a frontend and an SSA-based intermediate representation that is very similar to the one offered by LLVM, but written in Pharo. I had to build this SSA IR to be able to generate Vulkan shaders from Pharo, so for the AoT compiler I am just reusing it and adding a machine code backend. With my framework I can generate an ELF32 or ELF64 object file that can be linked directly with any C library, such as a minimalistic runtime ( https://github.com/ronsaldo/slvm-native ) that provides Smalltalk facilities such as message sends, object allocation, GC, a segmented stack, etc.

I already have some things working, such as message sends, the segmented stack, and block closure creation and activation. For the object model, I am using the Spur object model with some slight modifications. Object interiors are aligned to 16 bytes so that SSE instructions can be used on them. There is a small preheader for implementing the LISP2 GC algorithm (I chose it for its simplicity), become, and heap management. The preheader is not used by generated code, except when serializing objects into the object file. I changed the CompiledMethod object type to allow generic objects that mix oops and native data. For GC under multithreading, I will just stop the whole world at safe points and run the GC in a single thread. By disabling automatic GC, the user could schedule collections at moments that are not perceptible, such as just after submitting a frame rendering command.
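
To make the layout arithmetic concrete, here is a minimal sketch in Python (not the runtime's actual code) of placing an object so that its interior starts on a 16-byte boundary. The preheader and header sizes, and the names align_up and place_object, are assumptions chosen for illustration only.

```python
ALIGNMENT = 16        # object interiors must start on a 16-byte boundary
PREHEADER_SIZE = 16   # assumed size of the GC/become preheader
HEADER_SIZE = 8       # assumed size of a Spur-style object header


def align_up(address, alignment=ALIGNMENT):
    """Round address up to the next multiple of alignment (a power of two)."""
    return (address + alignment - 1) & ~(alignment - 1)


def place_object(free_pointer, body_size):
    """Return (preheader_start, interior_start, next_free) for a new object.

    The interior is the first 16-byte-aligned address that leaves room for
    the preheader and header; both sit immediately before the interior.
    """
    interior = align_up(free_pointer + PREHEADER_SIZE + HEADER_SIZE)
    preheader = interior - HEADER_SIZE - PREHEADER_SIZE
    next_free = interior + body_size
    return preheader, interior, next_free
```

With these assumed sizes, the allocator may skip a few padding bytes before the preheader so that the interior, not the header, ends up aligned.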

AoT compilation of Smalltalk makes modifications to method dictionaries a very rare operation: you cannot AoT-compile methods at run time, so a shipping application does not need the compiler. This confines the burden of thread safety to a small number of places that can be protected explicitly with mutexes.
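
As a rough illustration of that point (a sketch in Python, not code from the runtime), the rare method-dictionary update can take a mutex and publish a fresh copy of the table, while lookups, the hot path, take no lock. The class and method names are hypothetical. The unlocked read relies on rebinding an attribute being atomic, which holds in CPython; native code would need an atomic pointer store instead.

```python
import threading


class MethodDictionary:
    """Copy-on-write selector -> method table; writers are serialized by a mutex."""

    def __init__(self):
        self._methods = {}                  # treated as immutable once published
        self._write_lock = threading.Lock()

    def lookup(self, selector):
        # Hot path: read the currently published table without locking.
        return self._methods.get(selector)

    def install(self, selector, method):
        # Rare path: copy, modify, then publish the new table under the mutex.
        with self._write_lock:
            updated = dict(self._methods)
            updated[selector] = method
            self._methods = updated
```

Because updates are so rare in an AoT-compiled image, the cost of copying the table on every install is a reasonable trade for lock-free sends.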

My plan with this infrastructure is to keep Pharo and the standard VM as a game prototyping and development environment, and to do the actual deployment with this very experimental ahead-of-time compiler and the minimalistic Smalltalk runtime.

Best regards,

Ronie

2017-01-31 12:57 GMT-03:00 Levente Uzonyi <leves@caesar.elte.hu>:

On Tue, 31 Jan 2017, Stefan Marr wrote:

Hi Levente:

On 31 Jan 2017, at 15:22, Levente Uzonyi <leves@caesar.elte.hu> wrote:

Also, the question is whether it really needs to be objects. Alternatives include things like tuple spaces (think Linda) and low-level shared memory buffers (Python and others, and apparently ECMAScript 2017).

You'd actually share a segment with objects stored in it. Low-level buffers are very restricting. They force you to serialize objects if you want to keep using them. And that has some unwanted overhead.

What’s a segment?

It's a read-only chunk of memory holding objects.

Who controls the lifetime of it?

It's permanent.

Are you doing local GC plus global reference counting?

GC never touches that memory, because it can't change.

Somehow you’d still manage those objects, no?

No.

If you go with objects, the problem is that you need to support GC. And, I suppose Eliot will agree that GC for multithreaded systems isn’t exactly zero cost.

You don't need multi-threaded GC here, just many independent single-threaded GCs, which we have already.
Btw, this is the same thing Erlang does.

I am probably missing something, but I’d think you need some global GC mechanism. If you have shared objects, you need to coordinate the local GCs.

All shared objects are permanent and read-only.

In Erlang, most messages are copied; only large data chunks are shared by reference. So, that restricts the need for globally coordinated GC quite a bit, but you still need it as far as I can tell.

Here objects shared by reference would be permanent, therefore no GC would be required.
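
A toy model of this scheme, sketched in Python with hypothetical names (PermanentSegment, LocalHeap): each heap runs its own single-threaded mark-and-sweep, and any reference into the shared permanent segment is treated as a leaf that is neither traced nor freed. It only illustrates why no cross-heap coordination is needed; it does not describe any existing VM.

```python
class PermanentSegment:
    """Read-only objects shared by reference; never collected, never mutated."""

    def __init__(self, values):
        self._values = tuple(values)        # frozen at creation

    def __getitem__(self, index):
        return self._values[index]


class LocalHeap:
    """One heap per worker, with its own independent mark-and-sweep."""

    def __init__(self, segment):
        self.segment = segment
        self.objects = {}                   # local id -> list of references
        self.roots = set()                  # local ids that are always live
        self._next_id = 0

    def allocate(self, refs=()):
        # A reference is ("local", id) or ("shared", index into the segment).
        oid = self._next_id
        self._next_id += 1
        self.objects[oid] = list(refs)
        return oid

    def collect(self):
        """Mark from the roots, sweep local objects; return the number freed."""
        marked = set()
        stack = list(self.roots)
        while stack:
            oid = stack.pop()
            if oid in marked:
                continue
            marked.add(oid)
            for kind, target in self.objects[oid]:
                if kind == "local":         # shared refs are leaves: skip them
                    stack.append(target)
        dead = [oid for oid in self.objects if oid not in marked]
        for oid in dead:
            del self.objects[oid]
        return len(dead)
```

Two heaps built over the same segment can call collect() at any time and in any order, because neither collector ever reads or writes the other heap or the segment's lifetime.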

Levente

Best regards
Stefan

--
Stefan Marr
Johannes Kepler Universität Linz
http://stefan-marr.de/research/