The first real application I ever wrote in Smalltalk was an assembler for the Transputer. The Transputer has a nice bytecode-like machine language, and the variable-size problem was even worse than in Smalltalk, since each instruction byte holds only 4 bits of operand and you have to add a prefix instruction for every further 4 bits.
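To make the prefixing concrete, here is a minimal Python sketch of how one instruction with an arbitrary operand expands into bytes. It assumes the usual Transputer layout (a 4-bit function code in the high nibble, 4 bits of operand in the low nibble) and the function codes as recalled from the instruction set documentation; the name `encode` is purely illustrative.

```python
# Function codes recalled from the Transputer instruction set; treat them as assumptions.
PFIX = 0x2  # prefix: supplies 4 more bits of a non-negative operand
NFIX = 0x6  # negative prefix: supplies complemented bits for a negative operand
LDC = 0x4   # load constant
J = 0x0     # relative jump


def encode(function, operand):
    """Return the list of bytes for one instruction, including any prefixes."""
    low = function << 4 | (operand & 0xF)
    if 0 <= operand < 16:
        return [low]                                   # fits in one byte
    if operand >= 16:
        return encode(PFIX, operand >> 4) + [low]      # upper bits go into pfix
    return encode(NFIX, (~operand) >> 4) + [low]       # negative: always needs nfix


print([hex(b) for b in encode(LDC, 0x123)])  # ['0x21', '0x22', '0x43']
print([hex(b) for b in encode(LDC, -1)])     # ['0x60', '0x4f']
```

The instruction length depends on the operand value, which is what makes jump offsets awkward: the offset decides the size, and the size changes the offsets.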
It so happened that I had gone back to the university and the lab was really happy to have me since I was the one who had gotten them into Unix many years before. Now I wanted them to forget Unix and go with Smalltalk instead, so I needed to impress them.
An assembler like this is trivial to write (even in C) so I made it an "instant assembler". You typed your code on the right half of what looks like an output listing. In the left half you saw the hex machine code and it was updated as you typed in the code! Take that, you nasty C. This meant that when you added an instruction, any number of jumps might suddenly become too short. So they would be extended with prefix instructions and that might make other jumps too short. Deleting a line might allow some jumps to become shorter.
We are talking about recalculating all offsets in a fraction of a second on a 4.77 MHz 8086 machine (Smalltalk V). And the Transputer code was far larger than any Smalltalk method should ever be. Sad to say, that lab is still a Unix shop.
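The recalculation described above can be sketched as a fixed-point loop. This is not the original Smalltalk code, only a rough Python illustration under some assumptions: every jump starts at one byte, offsets are taken relative to the end of the jump instruction, and the program is a hypothetical list of `("label", name)`, `("code", nbytes)` and `("jump", label)` items. Rerunning it from scratch after each edit also lets jumps shrink again when a line is deleted.

```python
def encoded_size(operand):
    """Bytes needed for one instruction: 1 plus one prefix per extra nibble."""
    if operand < 0:
        # A negative operand always needs at least one nfix.
        return 1 + encoded_size((~operand) >> 4)
    size = 1
    while operand >= 16:
        operand >>= 4
        size += 1
    return size


def relax(program):
    """Return (jump_sizes, label_addresses) once no jump is too short."""
    sizes = {i: 1 for i, item in enumerate(program) if item[0] == "jump"}
    while True:
        # Pass 1: lay out addresses using the current size guesses.
        addr, labels, starts = 0, {}, {}
        for i, (kind, arg) in enumerate(program):
            if kind == "label":
                labels[arg] = addr
            elif kind == "code":
                addr += arg
            else:
                starts[i] = addr
                addr += sizes[i]
        # Pass 2: grow every jump whose offset no longer fits.
        changed = False
        for i, start in starts.items():
            offset = labels[program[i][1]] - (start + sizes[i])
            needed = encoded_size(offset)
            if needed > sizes[i]:
                sizes[i] = needed
                changed = True
        if not changed:
            return sizes, labels


program = [("jump", "a"), ("jump", "b"), ("code", 14),
           ("label", "a"), ("code", 16), ("label", "b")]
print(relax(program))  # ({0: 2, 1: 2}, {'a': 18, 'b': 34})
```

In this example the second jump has to grow first, and that extra byte pushes the first jump's target out of one-byte range. Sizes only ever grow inside the loop, so it terminates; a jump that ends up longer than strictly needed could be padded with a `pfix 0`.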
On Thursday 01 August 2002 22:49, Ian Piumarta wrote:
On Thu, 1 Aug 2002, Swan, Dean wrote:
and potentially simpler instruction decoding.
Absolutely.
FWIW, all of the Jitters did a pre-pass over the bytecode to convert everything into a "normal" form (a total of 20 or so "abstract insns", all of which were the same size and had the "opcode" in the same place) and to eliminate the convoluted "conditional jumps over unconditional jumps".
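As an illustration of the second half of that pre-pass, the sketch below folds a conditional jump over an unconditional jump into a single inverted conditional jump. It is not the Jitters' actual code: the abstract instructions are modelled as hypothetical `(opcode, operand)` tuples, with labels standing in for addresses, and all opcode names are made up for the example.

```python
# Hypothetical abstract opcodes; each conditional jump maps to its inverse.
INVERT = {"jump_if_false": "jump_if_true", "jump_if_true": "jump_if_false"}


def fold_jump_over_jump(insns):
    """Rewrite 'cond-jump L1; jump L2; L1:' as 'inverted-cond-jump L2; L1:'."""
    out = []
    i = 0
    while i < len(insns):
        window = insns[i:i + 3]
        if (len(window) == 3
                and window[0][0] in INVERT
                and window[1][0] == "jump"
                and window[2] == ("label", window[0][1])):
            out.append((INVERT[window[0][0]], window[1][1]))
            out.append(window[2])  # keep the label; other jumps may target it
            i += 3
        else:
            out.append(insns[i])
            i += 1
    return out


before = [("push", 1), ("jump_if_false", "L1"), ("jump", "L2"),
          ("label", "L1"), ("push", 2), ("label", "L2")]
print(fold_jump_over_jump(before))
# [('push', 1), ('jump_if_true', 'L2'), ('label', 'L1'), ('push', 2), ('label', 'L2')]
```

Because every tuple has the opcode in the same position, the pattern match is a plain comparison with no decoding step.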
It is interesting how close your SAM instructions are to the current bytecodes in Self (which are different from the ones described in the various papers).
Like you say, this simplified compilation immensely. Whether or not (or on which architectures and under which conditions) it would increase interpreted performance would make for a fascinating experiment.
Some people might want to look at the instruction set I am using on a 16 bit machine (unfortunately too small for Squeak; it would handle something like Little Smalltalk just fine, on the other hand):
http://www.merlintec.com:8080/hardware/Oliver/
The send instructions are hard to understand unless you know that I use full Selector Table Indexing for message dispatch. This was never supposed to be used in practice since the table is only 5% filled, but I have 8MB of RAM (couldn't find a smaller chip to buy) on a 16 bit machine so I could afford it, and it did make the send instructions take only one clock. Returns take two - I could do better but ran out of space on the FPGA.
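For readers who have not met Selector Table Indexing, here is a rough software sketch of the idea, not the FPGA design itself: one big table with a cell for every (class, selector) pair, so a send is a single indexed load followed by a call. The class name `DispatchTable` and its methods are hypothetical, and inherited methods are assumed to be copied into each subclass row when it is set up.

```python
class DispatchTable:
    """Dense class-by-selector table; most cells stay empty."""

    def __init__(self, num_classes, num_selectors):
        self.num_selectors = num_selectors
        self.cells = [None] * (num_classes * num_selectors)

    def install(self, class_id, selector_id, method):
        self.cells[class_id * self.num_selectors + selector_id] = method

    def inherit(self, parent_id, child_id):
        """Copy the parent's row so inherited sends need no search."""
        n = self.num_selectors
        self.cells[child_id * n:(child_id + 1) * n] = \
            self.cells[parent_id * n:(parent_id + 1) * n]

    def send(self, class_id, selector_id, receiver, *args):
        method = self.cells[class_id * self.num_selectors + selector_id]
        if method is None:
            raise LookupError("doesNotUnderstand")
        return method(receiver, *args)

    def fill_ratio(self):
        return sum(c is not None for c in self.cells) / len(self.cells)


table = DispatchTable(num_classes=2, num_selectors=3)
table.install(0, 1, lambda receiver: receiver + 1)
table.inherit(0, 1)
print(table.send(1, 1, 41))  # 42
```

The cost is memory: the table size is the product of the class and selector counts, and most cells are never filled, which is the trade-off described above.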
Markus has been talking about making parse trees be the "portable form" of compiled methods and then using a runtime compiler to convert them into bytecodes (for interpretation) or native code. Experimenting with different formats of bytecode (or "wordcode") would be really easy (and lots of fun) in such a context.
How about my 1984 design for a Smalltalk VM?
http://www.lsi.usp.br/~jecel/st84.txt
-- Jecel