This is the initial report for the VM Construction Project group.
Since this is the first such report, I'll start with my view of the goals of the project.
Goal: The purpose of this project is to create a vm building process and associated code that makes vm building as simple, reliable and maintainable as we can manage.
Participants: tim, John McIntosh, Rob Withers, David Lewis, and of course Ian Piumarta and Andreas Raab. Other participants to represent the other vm ports are sought. Stefan Rudlof, Goran Hultgren and others have provided advice and help.
Background: From the initial release of Squeak, the image has included all the code needed to create a vm for a Mac. For other platforms a variety of places and approaches have been used over time, mostly the provision of an archive of some sort with a file tree suited to the platform. Sometimes this has included 'common' files (files such as sq.h etc) that are derived from but not identical to those in the release image. This situation could continue into the future without being a major embarrassment to the Squeak community, but some time ago I felt that it could be improved with some fairly simple changes.
One of the original driving forces for the work that has been released as the VMMaker and VMMakerTool was the need at exobox to be able to produce frequent vm builds automatically without extensive user intervention. Another original aim was to find a way to produce a file tree that reasonably suited all known platforms and that was compatible with automated vm production. The time-honoured vm production routine was not very suited to those aims. For example, generating the core of the vm required two unrelated incantations and produced a dump of files in the current working directory; these files needed moving, and sometimes editing, before being compiled. Producing a vm for a Mac required a different incantation and produced a different group of files. Changing which set of plugins was produced, and whether they were internal or external, required editing a method.
None of this prevented a rudimentary script from being used successfully, but it appeared untidy and suboptimal to some users.
A secondary concern was the scattered provenance of the code for non-Mac machines and the varied update levels of that code. Using some common repository was suggested and seemed a sensible experiment. The initial attempts to use SourceForge mainly served to demonstrate that a better file organisation would be very helpful.
Current status: The VMMaker has been the major focus of the work for this project thus far. It is currently able to produce vms for any of the four most active platforms from a codebase stored on SourceForge. A UI exists that makes it easy to configure the plugins and to save and use said configurations. The basic tool is scriptable and has been documented on the swiki and in SqueakNews, as well as having been demonstrated at OOPSLA. Work is still in progress to bring all platforms to the same level of compliance (for want of a better word).
Future Plans: The VMMaker codebase is reasonably complete but there are enhancements that might be worth pursuing - makefile (or at least makefile fragment) production, integration with sqCVS, vm internal configuration (cache size etc) amongst them. The codebase currently on SF is in good order, barring the mostly makefile-related updating needed by non-Unix platforms, but tools to make it easier to fetch the code in release-matched sets, to tag release sets, and to deal with branches and bug fixes are all needed.
The most important goal is to reach agreement on adopting VMMaker or on an acceptable alternative. No other interesting alternative is currently proposed. Once a reasonable level of agreement is reached the artifact can be integrated into the main release, presumably by creating a module for it.
tim. chair, VM construction project. "We live for the VM, we debug for the VM. But we don't debug stupidly"
This is the initial report for the VI4 Project group.
Goal: This project is intended to gather together all the vm and vi related changes that have been proposed and which require a different image format.
Participants: So far I know of proposals from Anthony Hannan, Hans-Martin Mosner, Dan and myself. People affected very directly will be all port maintainers.
Background: There are several proposals for changes covered within this project.
i) new compiled method format The current CompiledMethod class format is an ugly chimeric hybrid that mixes oops and bytes in a manner that complicates the gc and other parts of the vm, plus the lower levels of the image. A new format was proposed some time ago and was generally agreed to offer enough benefits to be desirable. The cost of changing the image format has heretofore prevented actual adoption of the change.
ii) BlockClosures For a long time Squeak has been lacking proper BlockClosures and at last there is a proposed implementation to work with. Bugs are already being swatted and it looks very promising.
iii) Two-bit oop tags Hans-Martin has suggested moving to two tag bits on oops in order to allow more immediate classes. Currently the only really serious candidate class is Character, but it does have advantages relating to unicode and internationalisation. Other possibilities include short floats, immediate small points and err.... well.
iv) cleanup some header fields
v) clean out the primitive table The current primitive table can theoretically be 2048 entries long which is vastly more than is needed with named primitives. There is also a translation table to go from the 'old' numbers to the new names; this can be removed. The primitive table can probably be reduced to 256 entries or fewer, allowing a few bits to be saved in method headers. There are also some primitives which are now redundant and should be put out to grass.
vi) plugin renaming As has been observed, there are some plugins with names that are rather unhelpful and the field would benefit from some tidying. Although this is not strictly an image format issue, it does cause a break in image compatibility and therefore would be nice to get done at the same time.
Status: BlockClosures have been implemented and the changesets already incorporate most of the altered CompiledMethod structure and method header cleanup.
Plans: Any further proposals for image format affecting changes are needed as soon as possible; this is not something we want to do more than once. BlockClosure bugs are being addressed rapidly. Two-bit tags need to be considered soon. Header field and primitive table alterations need implementation.
tim Chairman, VI4 project "Image, my dear, is simply _everything_"
At 3:15 PM -0800 2/1/02, Tim Rowledge wrote:
This is the initial report for the VI4 Project group.
Goal: This project is intended to gather together all the vm and vi related changes that have been proposed and which require a different image format.
[...]
Plans: Any further proposals for image format affecting changes are needed as soon as possible; this is not something we want to do more than once.
I do have an image format enhancement I've been considering... and it seems now is the time.
Here's a very brief proposal. I can fill in details later, but I want to let you know right away the basic idea.
I'd like to have a bit in the object header for the purpose of marking an object immutable.
Impact: I haven't seen the other proposals that affect the header, so I can't yet say what if anything I'm proposing to move out of the header to make room.
Semantics: New objects are created with the immutability bit cleared. There are primitives for setting and clearing the bit. If the bit is set and an attempt is made to alter any of the instance variables of the object, the VM takes some special action instead (most likely, an exception is raised). There are higher-level facilities that invoke the setting and clearing primitives, and for dealing with exceptions.
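Martin's semantics above can be sketched in C-like VM pseudocode. This is only an illustrative sketch of the proposal, not Squeak's actual object header or primitive interface; the structure, bit position, and function names are all invented here:

```c
#include <assert.h>

/* Hypothetical object header with an immutability bit, sketching the
   proposal above; the layout and names are illustrative, not Squeak's. */
#define IMMUTABLE_BIT 0x1u

typedef struct {
    unsigned int headerFlags;
    int slots[4];
} Obj;

/* Primitive-style setter and clearer for the bit. */
void beImmutable(Obj *o) { o->headerFlags |= IMMUTABLE_BIT; }
void beMutable(Obj *o)   { o->headerFlags &= ~IMMUTABLE_BIT; }

/* Store check: when the receiver is immutable the slot is left
   untouched and 0 is returned; a real VM would raise an exception
   (or take some other special action) at this point instead. */
int storeSlot(Obj *o, int index, int value) {
    if (o->headerFlags & IMMUTABLE_BIT)
        return 0;            /* signal "modification attempted" */
    o->slots[index] = value;
    return 1;
}
```

The key point of the proposal is that the check sits on the store path in the VM, so image-level code cannot bypass it.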
Usefulness: There are two basic classes of uses for this capability.
1) Preventing the modification of an object
2) Detecting the modification of an object
Some objects just shouldn't be modified -- Symbols, method literals, etc.
Others you can modify, but you want to know when it happens. This is really useful in implementing object synchronization when the same logical object has a physical representation in more than one space. It's good for distributed object systems, object persistence systems (my particular interest), object-relational persistence frameworks, and the like.
Alternatives: The alternatives for detecting object modifications are really awkward; this is a much easier way to do it.
Well, that's the barebones version. Let me know what you think... (not that this group is usually afraid to speak up :-)
-Martin
On Fri, 1 Feb 2002, Martin McClure wrote:
I'd like to have a bit in the object header for the purpose of marking an object immutable.
Usefulness: There are two basic classes of uses for this capability.
- Preventing the modification of an object
- Detecting the modification of an object
Hear, hear. I was just about to propose the same thing myself.
On Fri, 1 Feb 2002, Martin McClure wrote:
At 3:15 PM -0800 2/1/02, Tim Rowledge wrote:
This is the initial report for the VI4 Project group.
Goal: This project is intended to gather together all the vm and vi related changes that have been proposed and which require a different image format.
[...]
Plans: Any further proposals for image format affecting changes are needed as soon as possible; this is not something we want to do more than once.
I think we may need to do it more than once. How about two iterations? VM4, where we improve the VM, and VM5, where we redo ObjectMemory?
I've been thinking of ideas to make GC a lot less weak, including almost free stack objects and temporaries. The problem is that it requires careful profiling first. But we won't have accurate profiling on the new VM until the new VM is out.
And, IMHO, we don't want to hold back a new VM just to maybe get a redo of the objectmemory in.
One really nice idea I've seen is to use a separate scheme for temporary objects. Something similar to:
Each object has a 'single reference' bit, and an extra header word pointing to the reference. (The single-reference bit can be implicit: we can reserve a percentage of the memory range and assume that all objects in that range *are* single-reference.)
If an object is stored in more than one place, the object is cloned and a *copy* is put into youngspace. The original is garbage. We can cheaply update the references to it because we have a pointer in the object that points to it.
Furthermore, we can scan that infant space quickly. Since each object in it contains a pointer to the single word in any other object that is supposed to point to it, we can see if that pointer still points to it, and remove it if not.
No mark, no sweep, no 'following pointers'
Whether it's worth it, or how much it will save... Profiling will have to tell.
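The liveness check Scott describes can be sketched as follows. This is a minimal illustration of the back-pointer idea only; the structure and function names are invented and this is not Squeak's actual ObjectMemory:

```c
#include <stddef.h>

/* Sketch of the single-reference scheme described above: each "infant"
   object keeps a back-pointer to the one slot that is supposed to
   reference it.  All names are illustrative. */
typedef struct Infant {
    struct Infant **backPtr;  /* the single slot referencing this object */
    int payload;
} Infant;

/* An infant is live iff the remembered slot still points at it. */
int isLive(const Infant *obj) {
    return obj->backPtr != NULL && *obj->backPtr == obj;
}

/* Linear scan over infant space: count survivors.  No mark phase, no
   sweep phase, no pointer-chasing beyond the single back-pointer. */
int scanInfants(Infant **space, int n) {
    int live = 0;
    for (int i = 0; i < n; i++)
        if (isLive(space[i])) live++;
    return live;
}
```

As Tim notes later in the thread, the costs to weigh against this are a range check on every store plus the extra header word per infant object.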
--
There are other ideas I've kicked around.
Scott
Scott A Crosby wrote:
Plans: Any further proposals for image format affecting changes are needed as soon as possible; this is not something we want to do more than once.
I think we may need to do it more than once. How about two iterations? VM4, where we improve the VM, and VM5, where we redo ObjectMemory?
Err, note the _image_ change bit here. Improving the VM, even changing the GC system quite extensively, does not necessarily require changing the object model. For VW we added multi-segment oldSpace and permSpace without any change to the object model or any image incompatibility. For the Active Book I added rommable image capability and multi-space capability without any object model change.
One really nice idea I've seen is to use a separate scheme for temporary objects. Something similar to:
Each object has a 'single reference' bit, and an extra header word pointing to the reference. (The single-reference bit can be implicit: we can reserve a percentage of the memory range and assume that all objects in that range *are* single-reference.)
If an object is stored in more than one place, the object is cloned and a *copy* is put into youngspace. The original is garbage. We can cheaply update the references to it because we have a pointer in the object that points to it.
Furthermore, we can scan that infant space quickly. Since each object in it contains a pointer to the single word in any other object that is supposed to point to it, we can see if that pointer still points to it, and remove it if not.
This is merely a restricted version of generation scavenging with extra costs. Temp variables are typically taken care of quite nicely by the disappearance of the context. It would cost a further range check on every store, plus the extra word on every 'infant' object.
tim
On Sat, 2 Feb 2002, Tim Rowledge wrote:
Scott A Crosby wrote:
Plans: Any further proposals for image format affecting changes are needed as soon as possible; this is not something we want to do more than once.
I think we may need to do it more than once. How about two iterations? VM4, where we improve the VM, and VM5, where we redo ObjectMemory?
Err, note the _image_ change bit here. Improving the VM, even changing the GC system quite extensively, does not necessarily require changing the object model. For VW we added multi-segment oldSpace and permSpace without any change to the object model or any image incompatibility. For the Active Book I added rommable image capability and multi-space capability without any object model change.
I was thinking of GC strategies that might require an extra header bit, for example, 'this object has a single reference', and a back-pointer to that reference. If that's frequently true in the image (have to test), then relocating and checking the liveness of objects is a lot cheaper: a linear scan through memory. (In the stock image, the impact of those back pointers is bounded at <6%.)
One really nice idea I've seen is to use a separate scheme for temporary objects. Something similar to:
Each object has a 'single reference' bit, and an extra header word pointing to the reference. (The single-reference bit can be implicit: we can reserve a percentage of the memory range and assume that all objects in that range *are* single-reference.)
If an object is stored in more than one place, the object is cloned and a *copy* is put into youngspace. The original is garbage. We can cheaply update the references to it because we have a pointer in the object that points to it.
Furthermore, we can scan that infant space quickly. Since each object in it contains a pointer to the single word in any other object that is supposed to point to it, we can see if that pointer still points to it, and remove it if not.
This is merely a restricted version of generation scavenging with extra costs. Temp variables are typically taken care of quite nicely by the disappearance of the context. It would cost a further range check on every store, plus the extra word on every 'infant' object.
I was thinking of ideas from the literature that are faster than a GC for stack and other quick temporaries. For example, in this proposal one would not have to scan youngspace at all, just scan the current stack.
A variant of this would be to make the infant space copy-collected, and make a separate youngspace that's copy-collected. Oldspace is mark&sweep.
Having a separate space for all objects that only have one reference is also sorta cute. It can be copy-collected cheaply in one pass.
I'm just brainstorming random ideas, because given my tentative benchmarks,[*] and guessing the impact of BC on them, the GC is going to become more of a bottleneck.
Scott
[*] They show that other than primitives and interpreter overhead, MethodContexts and GC are the largest identifiable issues in Squeak. MethodContexts are supposed to be a lot cheaper with BC, so that leaves the GC.
Martin McClure wrote:
I do have an image format enhancement I've been considering... and it seems now is the time.
Here's a very brief proposal. I can fill in details later, but I want to let you know right away the basic idea.
I'd like to have a bit in the object header for the purpose of marking an object immutable.
Yay! I've always wanted this as well, for exactly the same reasons. Persistency frameworks are usually brittle in the area of change detection (I've seen GemStone, Versant and TOPLink, and they all would have benefitted from this).
Cheers, Hans-Martin
-----Original Message-----
From: squeak-dev-admin@lists.squeakfoundation.org [mailto:squeak-dev-admin@lists.squeakfoundation.org] On Behalf Of Martin McClure
Sent: Friday, February 01, 2002 5:05 PM
To: squeak-dev@lists.squeakfoundation.org; Tim Rowledge
Subject: Image format proposals... Re: [SqF]Report of VI4 Project for Feb '02
At 3:15 PM -0800 2/1/02, Tim Rowledge wrote:
This is the initial report for the VI4 Project group.
Goal: This project is intended to gather together all the vm and vi related changes that have been proposed and which require a different image format.
[...]
Plans: Any further proposals for image format affecting changes are needed as soon as possible; this is not something we want to do more than once.
I do have an image format enhancement I've been considering... and it seems now is the time.
Here's a very brief proposal. I can fill in details later, but I want to let you know right away the basic idea.
I'd like to have a bit in the object header for the purpose of marking an object immutable.
Impact: I haven't seen the other proposals that affect the header, so I can't yet say what if anything I'm proposing to move out of the header to make room.
Semantics: New objects are created with the immutability bit cleared. There are primitives for setting and clearing the bit. If the bit is set and an attempt is made to alter any of the instance variables of the object, the VM takes some special action instead (most likely, an exception is raised). There are higher-level facilities that invoke the setting and clearing primitives, and for dealing with exceptions.
Usefulness: There are two basic classes of uses for this capability.
- Preventing the modification of an object
- Detecting the modification of an object
I think this is a great idea. The AOS Platform VM's have all provided this in their object model and it has been invaluable over the years.
You might want to consider the following (getter/setter) forms:
a) basicIsImmutable
b) basicIsResizeable
c) basicIsLiteral
Internally in the AOS VM these are managed by three separate bit-flags.
a) m_isImmutable
b) m_isResizeable
c) m_hasVMFinalizer && m_isImmutable
There are MOP methods that ignore these flags. Standard methods honor these properties and generate an exception when a given constraint is violated.
When an object is constructed, the flags are set en masse from the class's instance-prototype flags field (a subrange of the behavior's md_flags). This makes the constructor (instantiation) process very efficient.
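The en-masse initialization Dave describes can be sketched like this. The flag names echo his AOS examples, but the bit layout, structures, and functions here are invented for illustration and are not the AOS implementation:

```c
/* Sketch of prototype-based flag initialization: the class carries a
   prototype flags word, and instantiation copies it wholesale rather
   than setting individual bits.  Layout is illustrative only. */
enum {
    M_IS_IMMUTABLE     = 1u << 0,
    M_IS_RESIZEABLE    = 1u << 1,
    M_HAS_VM_FINALIZER = 1u << 2
};

typedef struct { unsigned int instancePrototypeFlags; } ClassDesc;
typedef struct { unsigned int flags; } Instance;

Instance newInstance(const ClassDesc *cls) {
    Instance inst;
    inst.flags = cls->instancePrototypeFlags;  /* one copy sets all flags */
    return inst;
}

/* Testing several properties at once is a single mask compare, which is
   the payoff of laying the bit-flag fields out together. */
int hasAll(const Instance *i, unsigned int mask) {
    return (i->flags & mask) == mask;
}
```

The single-word copy is what makes instantiation cheap, and the mask compare is what "test them en masse" buys.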
Other flags have proven to be very useful in the AOS platform VM's, but they would be a separate discussion.
I do not know how much of this is applicable to Squeak; the AOS Platform supports dynamically morphable headers for objects. The AOS VM architecture is written in C++ and provides a pluggable framework for modularly defining new object header formats. At runtime, this means that a given object may change its header format at any time to support an extensible set of features efficiently. That is, at any given time within a running system there are objects with different header forms.
Some objects just shouldn't be modified -- Symbols, method literals, etc.
Others you can modify, but you want to know when it happens. This is really useful in implementing object synchronization when the same logical object has a physical representation in more than one space.
It's good for distributed object systems, object persistence systems (my particular interest), object-relational persistence frameworks, and the like.
Based on my experience with providing this in AOS Platform VM's, you will want some additional basic property flags for distributed behavior facilities:
m_delegateSlotReads
m_delegateSlotWrites
In addition, you will want to lay out the bit-flag fields carefully so that you can efficiently test them en masse. Similarly, you want them laid out to enable efficient endian-neutralized operations for getter/setter methods to retrieve various groups of header flags. This is very important for packaging and module building.
You will probably also want to introduce a new kind of behavior called <ManagedBehavior> so that you can efficiently detect and deal with arbitrary message delegation with no performance penalty, by leveraging instance-specific behavior facilities. In effect, this uses the object's behavior as a flag in and of itself.
-- Dave S. [www.smallscript.org]
Alternatives: The alternatives for detecting object modifications are really awkward; this is a much easier way to do it.
Well, that's the barebones version. Let me know what you think... (not that this group is usually afraid to speak up :-)
-Martin
Tim,
iii) Two-bit oop tags Hans-Martin has suggested moving to two tag bits on oops in order to allow more immediate classes. Currently the only really serious candidate class is Character, but it does have advantages relating to unicode and internationalisation. Other possibilities include short floats, immediate small points and err.... well.
I'm against this modification. Reasons:
* Decreased SmallInteger range. In many media-type applications (sound, graphics) SmallIntegers are in the 32-bit range. In such algorithms we often carefully optimize for integer computations to stay in SmallInteger range in order to be efficient. Falling back to LargeInts decreases efficiency by roughly two orders of magnitude. Decreasing the SmallInteger range means that these computations get even less efficient and harder to make efficient.
* Other classes are not "worthy" of being immediate. Unicode characters can be created by the equivalent means of the current Character implementation, e.g., by providing a character table (which could even be segmented in order to save space and augmented with additional primitives for fast creation). Immediate floats are useless for all numerical operations anyway (since the accuracy is too low), and for large-scale floating point manipulation the set of operations in float arrays can be used, which (at that scale) will be even more efficient than any immediate float representation. The space reduction due to the use of short points appears to be too small to be of practical importance.
* Increased complexity in VM. In order to become efficient, any new immediate class would require wide-ranging modification in the areas of bytecodes, primitives, method lookups and garbage collection. An additional immediate will require new and possibly more complicated tests to be performed in various time-critical areas and may even decrease overall performance.
* Change of primitive interface. A modification of the tag bits will require reviewing and possibly rewriting all plugins, many of which rely on the fact that various functions (like pushInteger: storeInteger:ofObject:withValue:) use 31-bit integers. All of them may break in unexpected ways when an additional tag bit is added. In addition, all plugins using the new immediate need to be rewritten in order to access it correctly.
In short, I vote against this proposal because I have yet to see a compelling reason for the advantage of whatever the new immediate class would be compared to the combination of added complexity and possible negative side effects, in particular those related to VM and plugin interface modifications.
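The range argument in this exchange can be made concrete with a small sketch of tagged SmallInteger encoding. The encodings below are illustrative only, not Squeak's actual oop format; they just show how each extra tag bit halves the representable range:

```c
#include <stdint.h>

/* With n tag bits, a 32-bit tagged SmallInteger keeps (32 - n) value
   bits; being signed, its maximum is 2^(31 - n) - 1.  One tag bit gives
   the familiar 31-bit SmallIntegers; two tag bits give 30-bit ones. */
static int32_t smallIntMax(int tagBits) {
    return (int32_t)((1u << (31 - tagBits)) - 1);
}

/* Illustrative one-tag-bit encoding: value in the upper 31 bits, low
   bit set to mark the oop as a SmallInteger. */
static uint32_t encode1(int32_t v)    { return ((uint32_t)v << 1) | 1u; }
static int32_t  decode1(uint32_t oop) { return (int32_t)oop >> 1; }
```

A 32-bit RGBA value (up to 2^32 - 1) already overflows both ranges, which is why media code falls back to LargeIntegers either way; the dispute is over how much more often that happens at 30 bits.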
Cheers, - Andreas
Andreas Raab wrote:
Tim,
iii) Two-bit oop tags Hans-Martin has suggested moving to two tag bits on oops in order to allow more immediate classes. Currently the only really serious candidate class is Character, but it does have advantages relating to unicode and internationalisation. Other possibilities include short floats, immediate small points and err.... well.
I'm against this modification. Reasons:
- Decreased SmallInteger range. In many media-type applications (sound,
graphics) SmallIntegers are in the 32-bit range. In such algorithms we often carefully optimize for integer computations to stay in SmallInteger range in order to be efficient. Falling back to LargeInts decreases efficiency by roughly two orders of magnitude. Decreasing the SmallInteger range means that these computations get even less efficient and harder to make efficient.
It would be a decrease from 31-bit SmallInts to 30-bit SmallInts. I'd like to know whether the difference makes a difference in practice. If you've already coded carefully to avoid the 32-bit range, you've quite likely avoided the 31-bit range as well. But I did not analyze it much.
- Other classes are not "worthy" of being immediate. Unicode characters
can be created by the equivalent means of the current Character implementation, e.g., by providing a character table (which could even be segmented in order to save space and augmented with addl. primitives for fast creation).
You're probably right here, I did not measure effects.
Immediate floats are useless for all numerical operations anyway (since the accuracy is too low) and for large-scale floating point manipulation the set of operations in float arrays can be used which (at that scale) will be even more efficient than any immediate float representation.
Immediate floats are nonsense, you're right. I just mentioned them when Tim pressed me for examples :-)
The space reduction due to the use of short points appears to be too small to be of practical importance.
Space-wise, you might be right. But I do think that 2D graphics operations as they happen a lot in morphic might benefit performance-wise.
An application of Immediate classes for performance might be to make nil, true and false immediate. This doesn't gain any space, but all operations now checking against the value of nilOop etc. could check against small constants.
- Increased complexity in VM. In order to become efficient, any new
immediate class would require wide-ranging modification in the areas of bytecodes, primitives, method lookups and garbage collection. An additional immediate will require new and possibly more complicated tests to be performed in various time-critical areas and may even decrease overall performance.
Nope. The changes to the VM are pretty moderate. I don't know why you think that GC and method lookups would even be affected. The arithmetic bytecodes would include support for immediate points, and String accessing methods would create immediate characters, but that's about it. The critical checks for immediateness and SmallIntegerness are not affected at all.
- Change of primitive interface. A modification of the tag bits will
require reviewing and possibly rewriting all plugins, many of which rely on the fact that various functions (like pushInteger: storeInteger:ofObject:withValue:) use 31-bit integers. All of them may break in unexpected ways when an additional tag bit is added. In addition, all plugins using the new immediate need to be rewritten in order to access it correctly.
Can't comment on that. You're right that plugins will be incompatible and would have to be recompiled at least. I don't know whether there really is a need to rewrite.
Cheers, Hans-Martin
Hans-Martin,
- Decreased SmallInteger range.
It would be a decrease from 31-bit SmallInts to 30-bit SmallInts. I'd like to know whether the difference makes a difference in practice.
I've been running into various cases where it really does matter (such as doing operations on 32bit RGBA values) but considering Tim's proposal about only one additional immediate this problem appears to be solvable.
The space reduction due to the use of short points appears to be too small to be of practical importance.
Space-wise, you might be right. But I do think that 2D graphics operations as they happen a lot in morphic might benefit performance-wise.
That I would like to see first. According to my measurements we're spending most of the time in a moderately complex morphic world with drawing primitives (in particular text).
BTW, here's one _for_ immediate short points: The place where I could see them most useful is for vector graphics. In other words, for stuff like flash graphics etc. they would be highly useful considering the amounts of points we're dealing with.
An application of Immediate classes for performance might be to make nil, true and false immediate. This doesn't gain any space, but all operations now checking against the value of nilOop etc. could check against small constants.
Again, I'd like to see some evidence first.
- Increased complexity in VM.
Nope. The changes to the VM are pretty moderate. I don't know why you think that GC and method lookups would even be affected. The arithmetic bytecodes would include support for immediate points, and String accessing methods would create immediate characters, but that's about it. The critical checks for immediateness and SmallIntegerness are not affected at all.
The complexity increases at least to the point that we'd need to have methods like #isImmediate (returning true if oop is an immediate), #isInteger, #isCharacter, and #isShortPoint. Which means that, for instance, #fetchClassOf: will be affected (and therefore indirectly message lookup) and most likely various other places as well. I'm not saying that it's hard to do but to me, all these additional checks do sound like an increase in complexity.
- Change of primitive interface.
Can't comment on that. You're right that plugins will be incompatible and would have to be recompiled at least. I don't know whether there really is a need to rewrite.
This one I'm kinda worried about because it's so simple to overlook that some primitive might return a 31-bit integer under some weird circumstances.
Cheers, - Andreas
Andreas Raab wrote:
iii) Two-bit oop tags
I'm against this modification. Reasons:
I don't have strong opinions one way or the other myself. On the one hand, immediate chars work well for VW; on the other, it does indeed mean quite a bit of work.
- Decreased SmallInteger range.
Not necessarily. One could use:
*1 -> SmallInt
10 -> other immed (other bits provide details, unless only immed chars is used)
00 -> oop
No loss of precision caused.
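The asymmetric scheme sketched above can be written out as a tag classifier. This is a minimal sketch of the idea only; the bit patterns follow the table above but nothing here is Squeak's actual encoding:

```c
#include <stdint.h>

/* Asymmetric tagging as proposed above: any oop with low bit 1 is a
   SmallInteger (keeping the full 31-bit value range), the pattern 10
   marks the other immediates, and 00 marks an ordinary object pointer
   (which is word-aligned, so its low bits are naturally 00). */
typedef enum { TAG_SMALLINT, TAG_OTHER_IMMEDIATE, TAG_OOP } TagKind;

TagKind classify(uint32_t oop) {
    if (oop & 1u) return TAG_SMALLINT;        /* x1 */
    if (oop & 2u) return TAG_OTHER_IMMEDIATE; /* 10 */
    return TAG_OOP;                           /* 00 */
}
```

The cost of the asymmetry is that the "other immediate" class gets only 30 payload bits, while SmallInteger keeps 31 - which is exactly the trade being proposed.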
In short, I vote against this proposal because I have yet to see a compelling reason for the advantage of whatever the new immediate class would be compared to the combination of added complexity and possible negative side effects, in particular those related to VM and plugin interface modifications.
I agree that there is not yet a compelling reason. That's why it is on the list to consider on what might be the last practical opportunity to give it due attention.
Other opinions? HMM, do you want to present the case _for_ two-bit tags?
tim
On Sat, 2 Feb 2002, Tim Rowledge wrote:
Andreas Raab wrote:
iii) Two-bit oop tags
I'm against this modification. Reasons:
I don't have strong opinions one way or the other myself. On the one hand, immediate chars work well for VW; on the other, it does indeed mean quite a bit of work.
I'm in favor of further, serious study. I take it that once you let them in, it's (relatively) easy to experiment with them? I.e., if it turns out immediate points sorta suck, we're not horribly bound to the first choices we made? If so, I lean more toward including them.
[snip]
I agree that there is not yet a compelling reason. That's why it is on the list to consider on what might be the last practical opportunity to give it due attention.
Other opinions? HMM, do you want to present the case _for_ two-bit tags?
I point to an earlier discussion:
http://www.create.ucsb.edu/squeak/9703.html#Letter11
and quote:
-------------
Another idea is to make nil, true, false immediate. This could possibly speed up some operations in the VM since they would deal with constants instead of variables. Probably the speedup isn't worth it, though.
Hmmm, aren't nil, false, true known to the VM anyhow as hard coded pointer values? What would be the benefit of immediate classes here?
They are not hard coded, but depend on the position of the image in memory. This means that whenever the VM has to use one of these values, it has to load it from a global variable in which they were stored at image startup.

One place where it hurts is the conditional jump: the object on top of stack has to be compared with true or false first (depending on the kind of conditional jump), and if it's not equal, it has to be compared with the other one, too, to detect the case where a non-boolean is the receiver of ifTrue:ifFalse:. This makes 2 memory accesses. Having well-known immediate values both avoids the memory access and opens up the possibility for the compiler to use 'quick' instructions that deal with 8-bit or 16-bit immediate operands.
----------
I also point to the page about HHM's 2bit tag stuff:
http://www.heeg.de/~hmm/squeak/2tagbits/
I point to some VisualWorks stuff:
http://wiki.cs.uiuc.edu/VisualWorks/Immediate+Selectors
Hmm. How compelling *is* the multibyte Character case? The space savings isn't great, yes? It seems it would come down to issues about building character refs in #commonAt: and maybe Character comparisons?
Cheers, Bijan Parsia.
Self uses two bits for its tags: oops, small integers, floats and headers.
30 bit floating point numbers are probably not a good idea. They make people have to deal with rounding errors in nearly all applications while larger representations will let them get by in blissful ignorance most of the time.
The header tags speed up memory scanning operations. Instead of
for each object
    for each word in object
        is it what we are looking for?
you have
for each word in memory
    is it what we are looking for?
    back up to previous header
Personally, I would keep the current 1 bit scheme. But if you think faster scanning is worth it, then Tim's 1, 00 and 10 encoding would be nice. They could be small int, oop and header respectively.
-- Jecel
The AOS Platform (SmallScript, etc) uses two tag bits as follows:
- OOP
- SmallInteger
- Character [other types like RGBColor could be here]
- WeakId/VarRef/ContextPtr/ArgList
-- Dave S. [SmallScript LLC]
SmallScript for the AOS & .NET Platforms David.Simmons@SmallScript.com | http://www.smallscript.org
Tim Rowledge wrote:
Andreas Raab wrote:
- Decreased SmallInteger range.
Not necessarily. One could use:
*1 -> SmallInt
10 -> other immed, other bits provide details (unless only immed chars is used)
00 -> oop
That does not work, sorry, since the 10 combo is used as a sentinel value by the GC to detect where the object header starts.
No loss of precision caused.
In short, I vote against this proposal because I have yet to see a compelling reason for the advantage of whatever the new immediate class would be compared to the combination of added complexity and possible negative side effects, in particular those related to VM and plugin interface modifications.
I agree that there is not yet a compelling reason. That's why it is on the list to consider on what might be the last practical opportunity to give it due attention.
Other opinions? HMM, do you want to present the case _for_ two-bit tags?
Of course :-) In another mail I wrote some things about it. However, I've been a happy Squeaker with 1 tag bit for a long time, and other VM changes such as block closures etc. have much higher priority for me, too.
Cheers, Hans-Martin
At 3:15 PM -0800 2/1/02, Tim Rowledge wrote:
This is the initial report for the VI4 Project group.
Goal: This project is intended to gather together all the vm and vi related changes that have been proposed and which require a different image format.
One more possible change: It was brought up a few months back that the number of object header bits devoted to identity hash (10 or 11 bits, IIRC) wasn't enough to allow efficient hashed identity collections that were very large. It would be nice to have more bits for this purpose. On the other hand, I don't really want to increase the footprint for small-memory systems, and I don't want to give up image portability either, so I'm not sure I really want to propose making the header larger.
But we should see if there's anything clever we can do in this area.
Also, as was discussed back then, it would improve things to make the identity hash reported by the VM have a larger range than 0-1023, even if it can only have 1024 different values. I think I proposed multiplying the current value by 65537, a prime number of roughly the right magnitude. This wouldn't require an image format change as such, but it would require rehashing all existing hashed identity collections, including method dictionaries. (I haven't rechecked my facts, this is all from memory, so please correct me if I've got any of the details wrong.)
There may have been more discussion of this within the last three months that I have missed; I've been away from the list until recently.
-Martin
On Fri, 1 Feb 2002, Martin McClure wrote:
At 3:15 PM -0800 2/1/02, Tim Rowledge wrote:
This is the initial report for the VI4 Project group.
Goal: This project is intended to gather together all the vm and vi related changes that have been proposed and which require a different image format.
One more possible change: It was brought up a few months back that the number of object header bits devoted to identity hash (10 or 11 bits, IIRC) wasn't enough to allow efficient hashed identity collections that were very large. It would be nice to have more bits
No, those bits are misused. In essence, large collections ignore them entirely. If they are used, the Identity** collections can scale to tens of thousands or more elements without severe degradation. (Basically, using Identity* will be 4000x faster than a linear scan.)
Performance degradation will continue to occur, but I push the limit from 10,000 to 500,000-2,000,000.
Also, as was discussed back then, it would improve things to make the identity hash reported by the VM have a larger range than 0-1023, even if it can only have 1024 different values. I think I proposed multiplying the current value by 65537, a prime number of roughly the right magnitude. This wouldn't require an image format change as such, but it would require rehashing all existing hashed identity
Correct, my patch does that. There's a small potential robustness problem in a particular edge case, but it works. New primitives should be made though to do the multiplication in software.
collections, including method dictionaries. (I haven't rechecked my facts, this is all from memory, so please correct me if I've got any of the details wrong.)
No, I preserve the old (flawed) behavior for method dictionaries, rather than rebuilding a new image. True, this means that MethodDictionaries should not be scaled beyond, say, 2000 or so methods. I don't consider this a real problem.
Scott
At 9:42 PM -0500 2/1/02, Scott A Crosby wrote:
One more possible change: It was brought up a few months back that the number of object header bits devoted to identity hash (10 or 11 bits, IIRC) wasn't enough to allow efficient hashed identity collections that were very large. It would be nice to have more bits
No, those bits are misused. In essence, large collections ignore them entirely. If they are used, the Identity** collections can scale to tens of thousands or more elements without severe degradation. (Basically, using Identity* will be 4000x faster than a linear scan.)
Performance degradation will continue to occur, but I push the limit from 10,000 to 500,000-2,000,000.
Maybe that's big enough for Squeak, I don't know. I guess I'm used to GemStone, where we didn't consider collections scalable unless they performed well into the hundreds of millions of objects. :-)
And more hash bits in the header would still improve performance. A collection of 2M objects in a system where there are only 1K unique hash values means that, on average, there are 2K objects with the same hash value as the one you're trying to access. This means a lot of linear probing to find the one you want. I do realize that your changes make this a few orders of magnitude better than it was, but it's still not great.
The cost of putting more hash bits in the header must also be considered, of course, which is why I'm not unconditionally advocating we make the header bigger to do this.
Also, as was discussed back then, it would improve things to make the identity hash reported by the VM have a larger range than 0-1023, even if it can only have 1024 different values. I think I proposed multiplying the current value by 65537, a prime number of roughly the right magnitude. This wouldn't require an image format change as such, but it would require rehashing all existing hashed identity
Correct, my patch does that. There's a small potential robustness problem in a particular edge case, but it works. New primitives should be made though to do the multiplication in software.
collections, including method dictionaries. (I haven't rechecked my facts, this is all from memory, so please correct me if I've got any of the details wrong.)
No, I preserve the old (flawed) behavior for method dictionaries, rather than rebuilding a new image. True, this means that MethodDictionaries should not be scaled beyond, say, 2000 or so methods. I don't consider this a real problem.
No, that's not a real problem, but it seemed to me that if the image format is changing that might give us an opportunity to use the cleaner solution of doing it the same way everywhere. This I *am* currently proposing we do.
-Martin
Scott A Crosby wrote:
collections, including method dictionaries. (I haven't rechecked my facts, this is all from memory, so please correct me if I've got any of the details wrong.)
No, I preserve the old (flawed) behavior for method dictionaries, rather than rebuilding a new image. True, this means that MethodDictionaries should not be scaled beyond, say, 2000 or so methods. I don't consider this a real problem.
Scott
For simplicity reasons they ought to be changed too, to keep everything the same. Do you have any benchmarks for such a change?
Also, changing the image format or rehashing all sets or any such should not be considered a problem with a new image format being instated anyway, so we ought to disregard such issues and just go for the best solution.
Henrik
PS. My thought was also that the two bits change should have a solid motivation. Even more so given the clear (yet not severe) disadvantages. "It might perhaps sometime become useful" is not enough IMHO.
On Sat, 2 Feb 2002, Henrik Gedenryd wrote:
Scott A Crosby wrote:
collections, including method dictionaries. (I haven't rechecked my facts, this is all from memory, so please correct me if I've got any of the details wrong.)
No, I preserve the old (flawed) behavior for method dictionaries, rather than rebuilding a new image. True, this means that MethodDictionaries should not be scaled beyond, say, 2000 or so methods. I don't consider this a real problem.
Scott
For simplicity reasons they ought to be changed too, to keep everything the same. Do you have any benchmarks for such a change?
No, doing that change requires both a VM change and a Smalltalk-level change, rebuilding a new incompatible image in the process. The gain is marginal: no MethodDictionary is even 1/3 of the size that would start to expose this problem.
Also, changing the image format or rehashing all sets or any such should not be considered a problem with a new image format being instated anyway, so we ought to disregard such issues and just go for the best solution.
My earlier change is Smalltalk-only, and leads to no image incompatibility.
BTW, using 65537 as the multiplier is a *BAD* idea.
What are:
1*65537 \\ 4096
2*65537 \\ 4096
3*65537 \\ 4096
4*65537 \\ 4096
5*65537 \\ 4096
....
And why is it bad? :)
PS. My thought was also that the two bits change should have a solid motivation. Even more so given the clear (yet not severe) disadvantages. "It might perhaps sometime become useful" is not enough IMHO.
I want to hear the rationale for it first. Does it save space? How much? Is the performance degradation minor?
I'm beginning to think the future should be a completely new ObjectMemory..
Scott
Scott A Crosby wrote:
No, doing that change requires both a VM change and a smalltalk-level change, Rebuilding a new incompatible image in the process. The gain is marginal. No MethodDictionary is 1/3 the size that would start to expose this problem.
Also, changing the image format or rehashing all sets or any such should not be considered a problem with a new image format being instated anyway, so we ought to disregard such issues and just go for the best solution.
My earlier change is Smalltalk-only, and leads to no image incompatibility.
Completely irrelevant since the entire focus of this project is to produce a system that has an image format change!
At 4:36 PM -0500 2/2/02, Scott A Crosby wrote: [...]
BTW, using 65537 as the multiplier is a *BAD* idea.
What are:
1*65537 \\ 4096
2*65537 \\ 4096
3*65537 \\ 4096
4*65537 \\ 4096
5*65537 \\ 4096
....
And why is it bad? :)
Well, this example has an ideal distribution for the hash bits range of 0..4095, every slot is filled exactly once.
However, if there really had only been 10 or 11 hash bits as I misremembered the other day (there are actually 12 hash bits) then this example would indeed point out a serious flaw in 65537 as a multiplier.
For a collection whose array was of size, say, 16384, all objects would hash to the first 4K slots. Yes, very bad. Thanks for pointing this out.
The reason is that using 65537 as a factor always yields hash values that have bits 12-15 zero.
Scott, last time I looked you were using a factor of 4097. Are you still using this factor?
If we stay with 12 bits in the header I don't have a better suggestion than a factor of 4097 right now, but it might be worth looking to see if we can find a better algorithm for spreading identity hash values.
Things that make 4097 less than ideal (though neither of these may be a big problem):
- Produces hash values in the range (0 to: (2 raisedTo: 24) - 1); 24 bits. The ideal range is the entire SmallInteger range, 31 bits (though in either case we can only have 4096 values spread sparsely throughout the range, unless we add more header bits.)
- All hash values are divisible by 17 and by 241, which makes for bad performance for any collection whose array size happens to be a multiple of either value. I don't have Scott's patch loaded up to run the tests, but I expect that benchmarks done on an 'IdentitySet new: 1445' and an 'IdentitySet new: 1446' would give quite different results.
-Martin
On Sat, 2 Feb 2002, Martin McClure wrote:
At 4:36 PM -0500 2/2/02, Scott A Crosby wrote: [...]
The reason is that using 65537 as a factor always yields hash values that have bits 12-15 zero.
... when computed modulo 2^k, for k > 12
Scott, last time I looked you were using a factor of 4097. Are you still using this factor?
No. 200thousand-odd something.
Just pick a prime number p such that p times the hash value is large, doesn't overflow, and p-1 doesn't have many small or obviously bad factors.
Things that make 4097 less than ideal (though neither of these may be a big problem):
- Produces hash values in the range (0 to: (2 raisedTo: 24) - 1); 24 bits. The ideal range is the entire SmallInteger range, 31 bits (though in either case we can only have 4096 values spread sparsely throughout the range, unless we add more header bits.)
You want them evenly distributed over the size of the collection. If the largest collection you think you'll see is 1M, then a prime in the 300s would work OK. (Except for the problem where the collection array size and the prime have a common factor.) Ergo, a nice big beefy prime number that's unlikely to divide into the collection size by accident is a better choice.
Scott.
Is there any value in making the putative hash-multiplier be dependent upon the size of the collection into which the object is being put? The downside that immediately pops up is that this value would then change as you grow the collection, which would presumably require moving all the previously placed objects; except of course when you grow a collection you have to copy all the entries to the new array anyway, at least in most cases.
This could be implemented by adding an instvar to all collections, though I fear that would be rather wasteful since so many collections are small enough that the multiplier would be 1. If a single list of values would be good (ie all varieties of collection can use the same values) then a trivial prim could take the array size, the object hash and do the entire lookup/algorithm - multiply - modulo operation.
tim
On Mon, 4 Feb 2002, Tim Rowledge wrote:
Is there any value in making the putative hash-multiplier be dependant upon the size of the collection into which the object is being put?
There is a *small* reason for this. Let the multiplier be $p$. If the size of the array is a multiple of $p$, that would be very very bad. A more subtle problem occurs if the size of the array has many factors in common with $p-1$, but that is no worse than the performance we get as-is now.
So, let $p$ be a big beefy 6 digit prime that nobody will hit by mistake. (Hell, we'll make sure the hash growth code explicitly disallows any multiple of this number.)
If we dedicate 16-20 bits to hashBits, then a different solution than this will be wanted (some non-linear function).
I'd say to keep the multiplier fixed for the reasons below. My most recent version has some random 6 digit prime, but if you don't like it, I can give another prime.
This could be implemented by adding an instvar to all collections, though I fear that would be rather wasteful since so many collections are small enough that the multiplier would be 1. If a single list of values would be good (ie all varieties of collection can use the same values) then a trivial prim could take the array size, the object hash and do the entire lookup/algorithm - multiply - modulo operation.
Scott
At 4:18 PM -0800 2/4/02, Tim Rowledge wrote:
Is there any value in making the putative hash-multiplier be dependent upon the size of the collection into which the object is being put? The downside that immediately pops up is that this value would then change as you grow the collection, which would presumably require moving all the previously placed objects; except of course when you grow a collection you have to copy all the entries to the new array anyway, at least in most cases.
This could be implemented by adding an instvar to all collections, though I fear that would be rather wasteful since so many collections are small enough that the multiplier would be 1. If a single list of values would be good (ie all varieties of collection can use the same values) then a trivial prim could take the array size, the object hash and do the entire lookup/algorithm - multiply - modulo operation.
Hmm. Possible, but I don't find it attractive. The main reasons, at least initially, are aesthetic.
Looking at the overall hashed collection design, it feels a lot more like what's broken is identityHash, and that the collections are fine. Thus, it feels more 'right' to fix identityHash.
It also looks like if we choose a multiplier (or other simple algorithm) for producing a hash value from the header hash bits, and we choose it *very* carefully, it will work for almost all collection array sizes. The 'almost' could lead to a slightly uglier design, having the collections avoid certain sizes (as Scott mentioned in his reply.)
Another, more technical and less aesthetic, argument against fixing it in the collections is that many collections are based on #hash, not on #identityHash. Since the default implementation of #hash is #identityHash, the collection can't tell which of the objects it contains would need to have their hash values multiplied. And you don't want to multiply other hash values: if the implementers of that object's #hash have been good, the hash value spans the entire SmallInteger range, and you'd end up pushing the computation into large integer math, a slowdown that is unnecessary in this case.
-Martin
Martin McClure wrote:
One more possible change: It was brought up a few months back that the number of object header bits devoted to identity hash (10 or 11 bits, IIRC) wasn't enough to allow efficient hashed identity collections that were very large. It would be nice to have more bits for this purpose. On the other hand, I don't really want to increase the footprint for small-memory systems, and I don't want to give up image portability either, so I'm not sure I really want to propose making the header larger.
Improving the hash would be well worthwhile, especially if it can be done for little cost. I don't think that a change that requires adding more header size would be sensible.
Changing the hash prim to multiply the value from the header is certainly plausible, as would be making it look up a 'better' value via a table (4 kbytes would cover all 1k current hash indices; would a table of 1k prime numbers be better? Or 1k spread-out not-quite-simple multiples?)
As you pointed out, strictly speaking this does not require an image format change, but I guess it conveniently fits in the general area of work.
tim
On Fri, 1 Feb 2002, Tim Rowledge wrote:
iv) cleanup some header fields
What sorts of header field cleanups are under consideration?
My identityHash patches do a little refactoring of the Smalltalk side of things as far as hash-bits access goes.
Plans: Any further proposals for image-format-affecting changes are needed as soon as possible; this is not something we want to do more than once.
Looking at an STSpace output...
About 310k objects in my image, 240k are stored compactly, saving about a megabyte (6%) of the image size, but costing 5 bits in each header, and an extra branch&table lookup on every findClassOf reference.
So, 6-7% space savings for 5 bits in the object header. Are they good/bad/ugly?
Just a brainstorm, cause everyone seems to want header bits.
Scott
Scott A Crosby wrote:
On Fri, 1 Feb 2002, Tim Rowledge wrote:
...
Plans: Any further proposals for image format affecting changes are needed as soon as possible; this is not something we want to do more than once.
Looking at an STSpace output...
About 310k objects in my image, 240k are stored compactly, saving about a megabyte (6%) of the image size, but costing 5 bits in each header, and an extra branch&table lookup on every findClassOf reference.
So, 6-7% space savings for 5 bits in the object header. Are they good/bad/ugly.
Just a brainstorm, cause everyone seems to want header bits.
Scott
We could probably do with just 3 bits for the compact class index, if we take the object's format bits into account. If you look at the #instSpec value of the 15 compact classes, you'll find that there is no instSpec value which has more than 4 compact classes. Here are the pairs class->instSpec:
(only fixed fields) Association->1 Point->1 Rectangle->1
(only indexable pointer fields) Array->2 TranslatedMethod->2
(fixed and indexable pointer fields) BlockContext->3 MethodContext->3 PseudoContext->3 MethodDictionary->3
(indexable word fields) Bitmap->6 Float->6
(indexable byte fields) String->8 Symbol->8 LargePositiveInteger->8
(CompiledMethods, will move to instSpec 3 with the new CM format) CompiledMethod->12
We'd use (ccIndex << 4 bitOr: instSpec) as the index into a 128-element array of compact classes.
This would result in 2 free header bits at the cost of some complication. Probably dropping compact classes altogether is easier indeed. But perhaps the 6-7% figure gets higher for severely stripped-down images for PDA usage, which might be an argument against dropping the compact classes?
Cheers, Hans-Martin
Scott A Crosby crosby@qwes.math.cmu.edu is claimed by the authorities to have written:
On Fri, 1 Feb 2002, Tim Rowledge wrote:
iv) cleanup some header fields
What sorts of header field cleanups are under consideration?
Sorry - I was a bit unclear here; I was thinking of _method_ header bits, where the prim number was split into two fields. My favourite cleanup here is to cut back to <256 numbered prims (plus fake special prims 256-519 for the pushing of instvars) and rely on named prims for everything else. The two method header bits thus freed would do nicely as the exception-marking bits mentioned in Interpreter>>isUnwindMarked:
Looking at an STSpace output...
About 310k objects in my image, 240k are stored compactly, saving about a megabyte (6%) of the image size, but costing 5 bits in each header, and an extra branch&table lookup on every findClassOf reference.
So, 6-7% space savings for 5 bits in the object header. Are they good/bad/ugly.
We've never really evaluated the worth of the compact classes. On one hand they do save some space, which is not to be sneezed at - despite the fact that you can indeed get 128Mb of ram for stupidly low prices, this is only applicable to some machines. Some of us want to be able to use 2Mb images on 4Mb machines to do useful things. On the other hand, they do cost bits in the header that might be useful for other things, and they do cost some time in decoding.
Anyone wanting to evaluate the difference in performance and space can do so by looking at the 'private' protocol of Behavior - you can make all the currently compact classes non-compact without any vm changes. Check the image size profile and performance before and after.
tim
On Mon, 4 Feb 2002, Tim Rowledge wrote:
Scott A Crosby crosby@qwes.math.cmu.edu is claimed by the authorities to have written:
Sorry - I was a bit unclear here; I was thinking _method_ header bits, where the prim number was split into two fields. My favourite cleanup here is to cut back to <256 numbered prims (plus fake special prims 256-519 for the pushing of instvars) and rely on named prims for everything else. The two method header bits thus freed would do nicely as the exception marking bits mentioned in Interpreter>isUnwindMarked:
Ok... I've been looking at interpreter technology, seeing if there is anything that can be borrowed from FORTH implementations and used in Squeak. That, if ever done, might add quite a few bytecodes.
Looking at an STSpace output...
About 310k objects in my image, 240k are stored compactly, saving about a megabyte (6%) of the image size, but costing 5 bits in each header, and an extra branch&table lookup on every findClassOf reference.
So, 6-7% space savings for 5 bits in the object header. Are they good/bad/ugly.
We've never really evaluated the worth of the compact classes.
On one hand they do save some space, which is not to be sneezed at - despite the fact that you can indeed get 128Mb of ram for stupidly low prices, this is only applicable to some machines. Some of us want to be able to use 2Mb images on 4Mb machines to do useful things.
On a basically stock #4600 image, there are 336k objects, of which 270k are compact. This means that they save 1mb, or about 7% of the 15mb image.
On the other hand, they do cost bits in the header that might be useful for other things, and they do cost some time in decoding.
Based on profiling the entire VM:
1.9% is spent reading the 5 compact class bits [*] (in commonSend)
.8% is spent loading the class if it is compact (in commonSend)
.2% on the conditional (in commonSend)
[*] We're probably paying cache misses here.
This is ignoring any GC effects. Thus, the performance cost appears to be 2-4%.
They also cost 5 bits of header. With the current image, the expected cost in bits/object dedicated to identifying the class is 9.86. The minimum with a simple index is 11 (1.8k classes in the default image).
A few days ago I proposed a new header that has 18 bits of size, 16 bits of class index, 16 bits of hash, 6 bits for image maintenance, and 8 unused bits.
Anyone wanting to evaluate the difference in performance and space can do so by looking at the 'private' protocol of Behavior - you can make all the currently compact classes non-compact without any vm changes. Check the image size profile and performance before and after.
I tried that, as the first part in building a VM with no compact classes at all; it doesn't work.
CompiledMethod, BlockContext, and MethodContext override basicNew to signal an error; there may be a way to uncompact them, but the normal technique does not work. Removing the failure messages and trying again leads to VM failure.
Scott
Second report - Fri, 5th April 2002
The initial report sparked some interesting discussions on multiple tag-bits (to allow multiple immediate object classes) and extra hash-bits in the object headers. The tag-bit changes were not generally considered particularly interesting but the hash-bits seem to have some value. Also discussed was an immutable-bit that would make objects read-only or at least read-trap. No conclusions have been reached yet.
The Hannan BlockClosure work seems to have settled down after the inevitable initial flurry of bug reports and now needs a careful review to evaluate for inclusion. It looks very promising so far.
No other progress to report this time. More soon with luck.
tim ChairSophont, VI4 project. Loop: See Loop.