Hello Squeak VM Guys,
My name is Louis LaBrunda. I use Instantiations VA Smalltalk but dabble with Squeak from time to time.
I have an outside-the-box way of implementing an object database for Smalltalk, and I would like to see whether anyone here is interested in implementing it. I understand the theory behind Smalltalk VMs (at least I think I do) but would face a steep learning curve to actually modify one. This idea doesn't require inventing or improving any technology, but it does require changes to the VM.
For the purpose of describing this idea, I will deal with only one database and not go into binding to the database and other details like transaction processing and such. These things are of course important but I think they can be handled in very much standard ways that should not be changed by this means of implementing the object database.
The idea is that the VM would treat the database file much like a CPU chip treats RAM, and would treat its own (the VM's) memory like a CPU chip treats its internal (on-chip) cache. The data in memory would be linked to the data in the database by a means similar to the one that links a CPU chip's cache to RAM.
As I said, I'm not very knowledgeable about the internal workings of Smalltalk VMs, so much of what I am about to say is guesswork, but I think it is accurate. Objects represented in the memory of a Smalltalk VM probably take up about 12 bytes or so on 32-bit systems, more on 64-bit systems. Many of these bytes hold bits that define the class. Some of the bytes might be the value of the object if it is, say, a small integer, a byte or a character. If the data (value) of the object is larger than will fit in a few bytes, there is a pointer to the data. If the object has instance variables, which are of course other objects, there are pointers to them.
A bit would be needed to indicate a persisted object, and probably another bit to indicate that the object is dirty (changed, and therefore no longer matching the copy in the database file). Objects with the persisted bit off would otherwise look and be treated the same as they are now. Objects with the persisted bit on would have all their pointers replaced with offsets from the beginning of the database file (a single file containing all the persisted objects). All objects pointed to by a persisted object must also be persisted objects.
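As a rough illustration of the two bits described above, here is a minimal Python sketch. It is not how any real VM lays out its object headers; the names `PERSISTED`, `DIRTY` and `Obj`, the bit positions, and the rule that a store into a persisted object sets the dirty bit are all assumptions made for the example.

```python
# Hypothetical flag bits; a real VM would choose its own header layout.
PERSISTED = 0b01  # the object lives in the database file
DIRTY = 0b10      # the in-memory copy no longer matches the file copy


class Obj:
    """Stand-in for an object header plus its instance-variable slots."""

    def __init__(self, slots, flags=0):
        self.flags = flags
        # For a persisted object each slot holds a file offset,
        # otherwise it holds an ordinary in-memory reference.
        self.slots = slots

    def is_persisted(self):
        return bool(self.flags & PERSISTED)

    def is_dirty(self):
        return bool(self.flags & DIRTY)

    def store(self, index, value):
        """Write a slot; only persisted objects ever become dirty."""
        self.slots[index] = value
        if self.is_persisted():
            self.flags |= DIRTY
```

Objects created without the `PERSISTED` flag behave exactly as before, which matches the intent that non-persisted objects are left untouched.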
When the VM comes across a persisted object, it would use the pointers (that are now offsets within the database file) as keys into a lookup table (hash table) to find the real pointer to the data in memory. If the item is found in the lookup table, the value is used just as the original pointer would have been, and everything proceeds as usual. If the item is not found in the lookup table, the offset into the database file is used to read the object from the database. The lookup table would then be updated to include the new item.
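The lookup step above could be sketched roughly as follows. This is only an illustration in Python, not VM code: the class name `ObjectCache`, the `reads` counter, and the on-file record format (a 32-bit slot count followed by that many 64-bit slots, little-endian) are assumptions invented for the example, and an in-memory `io.BytesIO` stands in for the database file.

```python
import io
import struct


class ObjectCache:
    """Maps database-file offsets to objects already loaded in memory."""

    def __init__(self, db_file):
        self.db = db_file
        self.table = {}  # offset -> in-memory object (the lookup table)
        self.reads = 0   # number of file reads, kept only for illustration

    def resolve(self, offset):
        """Return the object stored at offset, reading it in on a miss."""
        obj = self.table.get(offset)
        if obj is None:  # miss: fetch from the file and remember it
            obj = self._read(offset)
            self.table[offset] = obj
        return obj

    def _read(self, offset):
        # Assumed record format: uint32 slot count, then uint64 slots.
        self.db.seek(offset)
        (count,) = struct.unpack("<I", self.db.read(4))
        slots = list(struct.unpack("<%dQ" % count, self.db.read(8 * count)))
        self.reads += 1
        return slots


# One record at offset 0 with two slots holding the values 7 and 9.
db = io.BytesIO(struct.pack("<I2Q", 2, 7, 9))
cache = ObjectCache(db)
first = cache.resolve(0)   # miss: read from the file
second = cache.resolve(0)  # hit: same in-memory object, no file access
```

The second call returns the very same object as the first, which is what keeps object identity intact once something has been brought into memory.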
As far as I can tell, the copies of the object in memory and in the database file can be identical (no object dumper/loader serialization). There may need to be a small wrapper in the database file, but I don't think much of one. This should make for very quick loading and saving of objects.
Probably some objects, like blocks of code, can't or shouldn't be saved to the database (I'm not sure if this is true for Squeak). But I don't think that is any different from systems that use object dumper/loader serialization.
I think a low-priority forked process could run through the lookup table looking for objects with the dirty bit set and save them to the database file. A #persist method (or some other good name) could be added to Object to force the saving of an object to the database. This would probably be implemented with a primitive, but maybe not.
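One pass of such a background saver might look something like the Python sketch below. It is an illustration under several assumptions that are not part of the proposal itself: the names `Entry` and `flush_dirty`, the `DIRTY` bit value, the record format (32-bit slot count followed by 64-bit slots), and the simplification that an object never changes size, so it can be rewritten in place at its original offset.

```python
import io
import struct

DIRTY = 0b10  # hypothetical flag bit marking a changed object


class Entry:
    """Stand-in for an in-memory persisted object."""

    def __init__(self, slots, flags=0):
        self.slots = slots
        self.flags = flags


def flush_dirty(table, db):
    """Write each dirty entry back at its offset and clear its dirty bit.

    Returns the number of objects written.
    """
    written = 0
    for offset, entry in table.items():
        if entry.flags & DIRTY:
            count = len(entry.slots)
            db.seek(offset)
            # Assumed record format: uint32 slot count, then uint64 slots.
            db.write(struct.pack("<I%dQ" % count, count, *entry.slots))
            entry.flags &= ~DIRTY
            written += 1
    return written
```

A forced save of a single object, as with the #persist method, could use the same write path for one entry instead of walking the whole table.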
There may be some changes needed for garbage collection to keep the lookup table up to date but I don't think that will be a big deal. Hopefully garbage collection for the database file could be handled mostly by Smalltalk code with the help of a few primitives.
Well, that's it for now. I hope this has been an interesting read and not a waste of your time. If you think the idea has merit, let me know and we can discuss it further.
Thank you very much for your time.
Lou
-----------------------------------------------------------
Louis LaBrunda
Keystone Software Corp.
SkypeMe callto://PhotonDemon
mailto:Lou@Keystone-Software.com
http://www.Keystone-Software.com