The vm I'm working with has a stack, using 16 bits to encode an integer or pointer to an object.
I'm adding a cache. Addresses on the stack will now be half the size in bits and will point to a location in the cache. The cache location will then contain the integer or object pointer. The VM I'm working with does not currently implement this, and I would appreciate any pointers to relevant articles.
Thanks,
tom
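A rough model of the scheme described above, written in Python purely for illustration. It reads "half the size" as 8-bit stack slots indexing a 256-entry cache of 16-bit values; the class name, the cache size and the lack of an eviction policy are all assumptions, not anything taken from the VM in question.

```python
# Hypothetical sketch: stack slots hold 8-bit cache indices, and the cache
# holds the actual 16-bit integers or object pointers.
CACHE_SIZE = 256  # what an 8-bit index can address (assumed)


class IndirectStack:
    def __init__(self):
        self.cache = []        # cache entries: 16-bit values
        self.index_of = {}     # value -> cache index, so entries are reused
        self.stack = []        # 8-bit indices into self.cache
        self.cache_reads = 0   # counts the extra indirections on pop

    def push(self, value16):
        if not 0 <= value16 <= 0xFFFF:
            raise ValueError("value does not fit in 16 bits")
        idx = self.index_of.get(value16)
        if idx is None:
            if len(self.cache) >= CACHE_SIZE:
                # A real design would need an eviction policy here.
                raise OverflowError("cache full")
            idx = len(self.cache)
            self.cache.append(value16)
            self.index_of[value16] = idx
        self.stack.append(idx)

    def pop(self):
        idx = self.stack.pop()
        self.cache_reads += 1  # every pop goes through the cache
        return self.cache[idx]
```

Each pop costs one additional lookup compared with a stack that stores the 16-bit values directly, which is the trade-off raised in the replies.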
Hi Tom,
The vm I'm working with has a stack, using 16 bits to encode an integer or pointer to an object.
I'm really puzzled why you are working with such a VM; restricting to 16-bit OOPs is really very ancient technology. Is this some old version of LittleSmalltalk or something? Why not use a more modern system, such as Squeak?
I'm adding a cache. Addresses on the stack will now be half the size in bits and will point to a location in the cache. The cache location will then contain the integer or object pointer. The VM I'm working with does not currently implement this, and I would appreciate any pointers to relevant articles.
I can't find any suitable references for you on this. On the face of it, it is very likely to cause a noticeable slowdown on almost any CPU I can imagine. You are going to add yet another indirection for every object reference, something that is one of the most crucial activities in a VM. I'm curious as to the reason you think this will help?
I didn't quite get a chance to look at all 10,000 pages of the code for the interpreter...
The code I looked at suggested that the items being pushed onto the stack are not addresses that point to a location in a cache. Is that correct?
In every VM I've ever written or studied, the stack contains oops, the same as any other object. To do otherwise would be to add an extra layer of complication that could only confuse!
tim
tjazul@uni.edu wrote...
The vm I'm working with has a stack, using 16 bits to encode an integer or pointer to an object.
Tim Rowledge tim@sumeru.stanford.edu replied...
I'm really puzzled why you are working with such a VM; restricting to 16-bit OOPs is really very ancient technology. Is this some old version of LittleSmalltalk or something? Why not use a more modern system, such as Squeak?
Hey, Tim, go easy on the guy. It saves space. The first *decade* of Squeak's lineage was 16-bit pointers.
But why not get adventurous and use 8 bits to encode a pointer or an integer. You'll need to use extended precision more often, but you'll have learned something by the time you're done.
Maybe you could blow it into an FPGA.
Just kidding.
Not
- D
In some of my testing for long file names, and because of the rewrite of the Macintosh file path logic, I've discovered that the async file logic has a wee bug. In all other file interfaces you supply a full path name to the open primitive, either directly or because the open logic ensures you have a full path name.
The async file logic, however, will accept a partial path, and because its C code was different it would then create the file in the working directory. This is wrong. This change set fixes the 'test' method, which breaks when we revert the async file open logic to the same logic used by the non-async file code. Reducing the number of places where paths are mangled is a key consideration in this work.
Existing applications that use the async file logic should still work because of course the primitive *is* documented as needing a full path name.
VM change to support large files if hosting OS supports large files
Mike Rutenberg mdrs@akasta.com sent me a link to
A squeak MD5 is at http://akasta.com/downloads/Md5.cs
Written by Duane Maxwell
I'm not sure I saw it in the base image, but shouldn't it be there? A type 3 UUID for example needs that code and perhaps using the hash for random numbers might be better for type 4 UUIDs.
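To make the MD5 dependency concrete, here is a small Python sketch (not Squeak code) of how a type 3 UUID is derived: the MD5 digest of a namespace UUID's bytes followed by the name, with the version and variant bits then overwritten. The function name type3_uuid is made up for illustration; hashlib and uuid are Python standard library modules.

```python
import hashlib
import uuid


def type3_uuid(namespace: uuid.UUID, name: str) -> uuid.UUID:
    """Name-based (type 3) UUID: MD5 over namespace bytes plus the name."""
    digest = hashlib.md5(namespace.bytes + name.encode("utf-8")).digest()
    # MD5 yields exactly 16 bytes; version=3 stamps the version/variant bits.
    return uuid.UUID(bytes=digest, version=3)


# The result matches the standard library's own uuid3 for the same inputs.
example = type3_uuid(uuid.NAMESPACE_DNS, "squeak.org")
```

Without an MD5 implementation available there is no way to produce this UUID type, which is the point being made above.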
I'm not sure I saw it in the base image, but shouldn't it be there? A type 3 UUID for example needs that code and perhaps using the hash for random numbers might be better for type 4 UUIDs.
Squeak has DSA in the image. I dunno why DSA was chosen over MD5 -- if the uses of DSA can transfer over, still, then it would be nice to switch.
Otherwise, MD5 is certainly widely used, so having it handy would be nice. Modules are gonna be here any time, now, anyway, right? So the bloat won't hurt as much.
-Lex
Squeak has DSA in the image. I dunno why DSA was chosen over MD5 -- if the uses of DSA can transfer over, still, then it would be nice to switch.
There was no choice made - DSA/SHA was in the image long before I wrote the MD5 stuff. The former were written by SqC people apparently to support signing of code for security reasons. I did the latter because it was a straightforward extension of the existing SHA code (which is why it is so similar), and because exobox potentially needed it for various miscellaneous Internet protocols. Plus it's a handy thing to have around.
One reason I believe it has not made it into the image is that a request was made by SqC a few months back to try and refactor SHA and MD5 functionality to take advantage of their similarities, but neither I nor anyone else found the time to do so. I agree that it should probably be part of the image.
I also think that we should choose one of the extant XML parsers as well and have a known base from which to develop XML-based stuff (SOAP/XML-RPC, RSS, whatever). I will not be suicidal if the choice is not the one I wrote, though I would be willing to expend the necessary effort to address any weaknesses and/or extend it to fit the needs of the Squeak community.
On Mon, 26 Nov 2001, Duane Maxwell wrote: [snip]
I also think that we should choose one of the extant XML parsers as well and have a known base from which to develop XML-based stuff (SOAP/XML-RPC, RSS, whatever). I will not be suicidal if the choice is not the one I wrote, though I would be willing to expend the necessary effort to address any weaknesses and/or extend it to fit the needs of the Squeak community.
You're probably right.
Actually, the *parser* isn't nearly as important as what it generates, since most of the time you'll be working with the parse tree (or event stream, or...).
Let me put in a word for the VisualWorks one...it's probably the most complete. It would let us share work with the VisualWorks and the Camp Smalltalk communities (and with Cincom itself). Cincom is definitely all the way behind it (they're even putting together some (hork) XML Schema stuff). I'm pretty sure they're open sourcing all bits, so we have an XSLT processor, XPath support, etc.
OTOH, we could be generous with our parse tree representations and parsimonious with our parser. IIRC, the VisualWorks code has enough support to make generating variant Parse Trees pretty easy.
For a lot of apps, directly generating the application objects is the way to go (instead of parsing to some XML heavy representation and then working off that). Perhaps we should work in some modest thin level SAX like layer first, and then build/port other stuff on top?
On the downside of the VW one...the name choices for the various parse nodes are...ugly. Hideous even. Actually all the names get pretty ugly because of the namespace munging. But the Cincom guys seem pretty open to feedback, so we could evolve it in partnership.
The other possibility, of course, is to work with the Cincom guys to make their XML extras (XSLT, etc.) more independent of the parser...I don't know.
Masashi Umezawa has some fixes and smoothings to the current port which I'll be, well, moving to when I get a chance :)
Anyhoo. blather blather.
Cheers, Bijan Parsia.
Bijan Parsia wrote:
On Mon, 26 Nov 2001, Duane Maxwell wrote: [snip]
I also think that we should choose one of the extant XML parsers as well and
You're probably right.
Right. ;-)
Let me put in a word for the VisualWorks one...it's probably the most complete. It would let us share work with the VisualWorks and the Camp
The reason I started the YAX project was that the VW port at that time didn't work and, at least in my perception, was ugly (not the fault of the porters!!) and huge.
At some point Duane's and my parser were actually pretty similar, but I must confess I would need to take a closer look at the current versions to see how much effort it would be to merge the implementations.
Maybe we should agree on a Squeak native (lean mean DTD skipping, SAX) parser and point to the VW port for applications that need the more complete implementation the VW port offers.
Michael
Maybe we should agree on a Squeak native (lean mean DTD skipping, SAX) parser and point to the VW port for applications that need the more complete implementation the VW port offers.
From an outsider's perspective, this seems like a really strange strategy. Code size isn't a terribly big deal -- the thing the Squeak community is most constrained by is programmer time. How often does someone post about a really neat idea they have.... but that they don't have time to program? I'd much rather see SuperSwiki get connected to Jabber, for example, than see a custom XML library. :)
Besides, it would seem reasonable to at least *try* to divide the VW parser into subsets, so that you can load whichever parts you need.
-Lex
I agree with Lex...I like Duane's parser, however, I think that the VW parser will have a much higher likelihood to be used, extended, supported, etc. XML itself is pretty mundane and it would seem a waste to expend a lot of effort on it when there is so much more to be done.
- Stephen
-----Original Message----- From: squeak-dev-admin@lists.squeakfoundation.org [mailto:squeak-dev-admin@lists.squeakfoundation.org] On Behalf Of Lex Spoon Sent: Tuesday, November 27, 2001 10:33 AM To: squeak-dev@lists.squeakfoundation.org Subject: Re: XML Parser choice (was Re: [ENH] ??? MD5 in Squeak.)
Maybe we should agree on a Squeak native (lean mean DTD skipping, SAX) parser and point to the VW port for applications that need the more complete implementation the VW port offers.
From an outsider's perspective, this seems like a really strange strategy. Code size isn't a terribly big deal -- the thing the Squeak community is most constrained by is programmer time. How often does someone post about a really neat idea they have.... but that they don't have time to program? I'd much rather see SuperSwiki get connected to Jabber, for example, than see a custom XML library. :)
Besides, it would seem reasonable to at least *try* to divide the VW parser into subsets, so that you can load whichever parts you need.
-Lex
How is the VW parser licensed?
On Tuesday, November 27, 2001, at 10:52 AM, Stephen Pair wrote:
I agree with Lex...I like Duane's parser, however, I think that the VW parser will have a much higher likelihood to be used, extended, supported, etc. XML itself is pretty mundane and it would seem a waste to expend a lot of effort on it when there is so much more to be done.
It is licensed with the "ParcPlace Public License" (or "Cincom Public License"), which I believe was derived pretty trivially from the Mozilla license (1.0, I believe). You can find it, among other places, at http://www.cincomsmalltalk.com:8080/CincomSmalltalkWiki/CPL
At 02:03 PM 11/27/2001 -0500, Andrew C. Greenberg wrote:
How is the VW parser licensed?
On Tuesday, November 27, 2001, at 10:52 AM, Stephen Pair wrote:
I agree with Lex...I like Duane's parser, however, I think that the VW parser will have a much higher likelihood to be used, extended, supported, etc. XML itself is pretty mundane and it would seem a waste to expend a lot of effort on it when there is so much more to be done.
-- Alan Knight [|], Cincom Smalltalk Development knight@acm.org aknight@cincom.com http://www.cincom.com/scripts/smalltalk.exe/downloads/index.asp
Andrew C. Greenberg asks:
How is the VW parser licensed?
From what I can figure out from following the various links on minnow to the Camp Smalltalk page, it's apparently released under something called the "ParcPlace Public License", which claims to be derived from the Mozilla Public License V 1.1.
Text can be found at http://www.parcplace.com/support/opensource/PPL-1.0.html
-- Duane
From a quick review, the PPL's definition of Covered Code raises the virus-swallows-the-entire-image problem, I think. Any chance we can get someone to agree to dual-license? Otherwise, it seems we should adopt instead the unambiguously Squeak-L code.
On Tuesday, November 27, 2001, at 03:22 PM, Duane Maxwell wrote:
Andrew C. Greenberg asks:
How is the VW parser licensed?
From what I can figure out from following the various links on minnow to the Camp Smalltalk page, it's apparently released under something called the "ParcPlace Public License", which claims to be derived from the Mozilla Public License V 1.1.
Text can be found at http://www.parcplace.com/support/opensource/PPL-1.0.html
-- Duane
On Tue, 27 Nov 2001, Andrew C. Greenberg wrote:
From a quick review, the PPL's definition of Covered Code raises the virus-swallows-the-entire-image problem, I think. Any chance we can get someone to agree to dual-license? Otherwise, it seems we should adopt instead the unambiguously Squeak-L code.
My guess is "Yes". Or at least "probably".
Cincom's lawyers seem to get all confused about licences. James Robertson (the VisualWorks product manager), however, is quite enthused by all this sort of sharing. I feel confident that something could be worked out.
Cheers, Bijan Parsia.
It's certainly not intended to, and I didn't think that was the intent of the Mozilla license, from which this was derived. What is the phrasing you find problematic? I would expect that management would be willing to get any such wording changed to clarify the intention.
At 05:19 PM 11/27/2001 -0500, Andrew C. Greenberg wrote:
From a quick review, the PPL's definition of Covered Code raises the virus-swallows-the-entire-image problem, I think. Any chance we can get someone to agree to dual-license? Otherwise, it seems we should adopt instead the unambiguously Squeak-L code.
On Tuesday, November 27, 2001, at 03:22 PM, Duane Maxwell wrote:
Andrew C. Greenberg asks:
How is the VW parser licensed?
From what I can figure out from following the various links on minnow to the Camp Smalltalk page, it's apparently released under something called the "ParcPlace Public License", which claims to be derived from the Mozilla Public License V 1.1.
Text can be found at http://www.parcplace.com/support/opensource/PPL-1.0.html
-- Duane
-- Alan Knight [|], Cincom Smalltalk Development knight@acm.org aknight@cincom.com http://www.cincom.com/scripts/smalltalk.exe/downloads/index.asp
Lex,
Maybe we should agree on a Squeak native (lean mean DTD skipping, SAX) parser and point to the VW port for applications that need the more complete implementation the VW port offers.
From an outsider's perspective, this seems like a really strange strategy. Code size isn't a terribly big deal -- the thing the Squeak community is most constrained by is programmer time.
Code size isn't but complexity is. Usually these two go hand in hand and therefore it's no strange argument at all. In fact I'd argue that programmer time is (in this particular case) mostly dictated by the complexity involved in the parser itself - most people will want to do pretty simple stuff.
Cheers, - Andreas
On Tue, 27 Nov 2001, Andreas Raab wrote: [snip]
From an outsider's perspective, this seems like a really strange strategy. Code size isn't a terribly big deal -- the thing the Squeak community is most constrained by is programmer time.
Code size isn't but complexity is. Usually these two go hand in hand and therefore it's no strange argument at all. In fact I'd argue that programmer time is (in this particular case) mostly dictated by the complexity involved in the parser itself - most people will want to do pretty simple stuff.
Well there's complexity and there's complexity. And there's various interfaces to manage that complexity. And dealing with missing functionality can be more complex than dealing with unneeded functionality.
It makes sense, in general, to have a SAX layer with a useful interface for generating application objects. One kind of application object is a DOM-like (in the sense of representing most of the Infoset) tree. As long as you support all the Infoset features, it shouldn't be that difficult to support whatever interface the application programmer wants to see. In other words, the parser isn't as interesting, generally speaking, as the output *except* that you might want the parser to take care of a bunch of standard tasks (validation against DTDs or Schemas is just one example) *or* you need certain programming or performance characteristics (e.g., the Jabber needs mentioned earlier).
So, what do you put in the base image? What *are* we standardizing?
One reason to work with VWXML parse nodes, even given all their ugliness, is that you can easily port your application to VisualWorks or any Smalltalk that supports a parser that generates those nodes. And vice versa.
This seems like *some* sort of win to me :)
OTOH, what would *really* be nice is Unicode support. Why don't we get that first, and then argue about the other layers? ;)
Hmm. Looking at VW5i.4, most of the node names look reasonable. There are a few with underscores still, but I bet I can get Steve to change them....
OYTOH, I don't see any problem having multiple XML parsers/node sets, etc. Picking one to bless with bundling is a purely political matter at this point: What do we want to "force" folks to use (at least by default). Anything that gets pulled in will be *very* hard to avoid if you're doing XML stuff.
This goes even if it's modularized. The psycho-social impact remains the same.
The VisualWorks parser/node set supports, overall, a larger community and isn't technically horrible. (To be precise, it's rather featureful, though not complete. It seems to be reasonably nippy. It's flexible. It's under active development. The variety of interfaces seems sane if not wholly exciting or beautiful.)
Cheers, Bijan Parsia.
Bijan,
Well there's complexity and there's complexity. And there's various interfaces to manage that complexity.
Actually it's partly those interfaces that create the complexity. If you take the ExoBox parser then there's just an XMLNode. Period. Not a whole set of nodes with interfaces and relations to learn.
And dealing with missing functionality can be more complex than dealing with unneeded functionality.
Depends on your practical needs. I attempted to get all three versions (ExoBox, YaX, and VWXML) going with a simple XML file and while the simpler ones (ExoBox, YaX) worked out of the box, I got various errors with VWXML. I _think_ this is because the VWXML parser was trying to validate against a (non-existing) DTD - but that's exactly what I was talking about wrt complexity. Two of the three just worked. One did not. As a person who's just trying to read some XML file (which will be 99% of all common applications) it seems pretty complex to me that I have to figure out why one of the three just doesn't like this file. And now I _have_ to look at each of these interfaces and try to understand what's going on inside.
So, what do you put in the base image? What *are* we standardizing?
The simplest thing that could possibly work. Obviously ;-)
One reason to work with VWXML parse nodes, even given all their ugliness, is that you can easily port your application to VisualWorks or any Smalltalk that supports a parser that generates those nodes. And vice versa.
This seems like *some* sort of win to me :)
Again, it depends. I seriously doubt that the majority of users will primarily look at how well this stuff ports - and then, it doesn't seem like a big deal to me to port either the ExoBox or the YaX parser to VW. And that's exactly _because_ of their simplicity ;-)
Those people who are intrinsically concerned about how to port their stuff between VW and Squeak are completely free to use the VWXML parser. Those who don't are probably much better served with one of the simpler models.
Cheers, - Andreas
On Tue, 27 Nov 2001, Andreas Raab wrote:
From an outsider's perspective, this seems like a really strange strategy. Code size isn't a terribly big deal -- the thing the Squeak community is most constrained by is programmer time.
Code size isn't but complexity is. Usually these two go hand in hand and therefore it's no strange argument at all. In fact I'd argue that programmer time is (in this particular case) mostly dictated by the complexity involved in the parser itself - most people will want to do pretty simple stuff.
I totally agree. When I evaluated the available XML parsers out there this summer (for a "real" project), I went with Duane's package because it was very simple and straightforward to use, but had the features I needed. I think that such a mindset would be a good idea for whoever chooses which XML parser goes into the image.
Regards, Aaron
Aaron Reichow :: UMD ACM Pres :: http://www.d.umn.edu/~reic0024/ "civilization is a limitless multiplication of unnecessary necessities." :: mark twain
Code size isn't but complexity is. Usually these two go hand in hand and therefore it's no strange argument at all. In fact I'd argue that programmer time is (in this particular case) mostly dictated by the complexity involved in the parser itself - most people will want to do pretty simple stuff.
This is true. But VWXML does seem to have a basic SAX interface. You subclass SAXDriver and implement methods like startElement:attributes: and endElement: (not the exact names).
-Lex
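For readers who have not used a SAX-style interface, the subclass-and-override pattern mentioned above looks roughly like this. The sketch uses Python's standard xml.sax module as a stand-in; it is not the VWXML SAXDriver API, and the class name TreeBuilder and the dictionary layout of the nodes are invented for the example.

```python
import xml.sax


class TreeBuilder(xml.sax.ContentHandler):
    """Override the element callbacks to build a tree of plain dicts and lists."""

    def __init__(self):
        super().__init__()
        self.root = None
        self.open_nodes = []  # stack of elements that are still open

    def startElement(self, name, attrs):
        node = {"name": name, "attributes": dict(attrs.items()), "children": []}
        if self.open_nodes:
            self.open_nodes[-1]["children"].append(node)
        else:
            self.root = node
        self.open_nodes.append(node)

    def endElement(self, name):
        self.open_nodes.pop()

    def characters(self, content):
        # The parser may deliver text in several chunks; each non-blank
        # chunk is kept as a separate string child.
        text = content.strip()
        if text and self.open_nodes:
            self.open_nodes[-1]["children"].append(text)


def parse_to_tree(xml_text):
    handler = TreeBuilder()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.root
```

The application decides what the callbacks build, so the same event layer can feed a generic tree, as here, or application objects directly.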
Let me put in a word for the VisualWorks one...it's probably the most complete. It would let us share work with the VisualWorks and the Camp Smalltalk communities (and with Cincom itself). Cincom is definitely all the way behind it (they're even putting together some (hork) XML Schema stuff). I'm pretty sure they're open sourcing all bits, so we have an XSLT processor, XPath support, etc.
OK, then I get to advocate the exobox one :)
The exobox parser is a complete well-formedness, non-validating parser minus Unicode support - every obscure little syntax weirdness is handled, even if the result is eventually dropped on the floor. It is set up much like a SAX-style parser, in that what actually happens to the result of the parse is handled through overridden methods. There is a subclass provided that constructs a tree built from OrderedCollections (for nodes) and Dictionaries (for attributes). Also, the exobox parser handles some peculiar cases, including the very tricky Jabber one, where an entire session is in fact one XML stream - in other words, the parser can spit out subtrees immediately as they close rather than the entire tree, without blocking on waiting for the next token.
Plus it's released unambiguously under the Squeak license and doesn't require convincing anyone to make Squeak-friendly changes. It's not, however, a speed demon, but that's fixable.
-- Duane
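The Jabber case described above, where one never-ending document has to yield each subtree as soon as it closes, can be illustrated with Python's standard XMLPullParser. This is only an analogue of the behaviour, not the exobox code; the StreamReader class and the depth-counting approach are assumptions made for the sketch.

```python
import xml.etree.ElementTree as ET


class StreamReader:
    """Hands back each direct child of the root as soon as it closes."""

    def __init__(self):
        self.parser = ET.XMLPullParser(events=("start", "end"))
        self.depth = 0

    def feed(self, chunk):
        self.parser.feed(chunk)
        # Newer Python/Expat builds may defer parsing of buffered input;
        # flush() exists only on recent versions, so call it if present.
        flush = getattr(self.parser, "flush", None)
        if flush is not None:
            flush()
        closed = []
        for event, elem in self.parser.read_events():
            if event == "start":
                self.depth += 1
            else:
                self.depth -= 1
                if self.depth == 1:  # a child of the still-open root closed
                    closed.append(elem)
        return closed
```

Feeding "<stream><message><body>hi</body>" returns nothing, and feeding "</message>" afterwards returns the completed message element, even though the enclosing stream element is never closed.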
I'm really puzzled why you are working with such a VM; restricting to 16-bit OOPs is really very ancient technology. Is this some old version of LittleSmalltalk or something? Why not use a more modern system, such as Squeak?
A while back I was toying with a VM that used 16-bit object pointers with a LOOM-style translation to a much larger object space. The thought was that perhaps the speed increase from having the active object working set always in L1/L2 processor cache would offset the overhead of periodically needing to shuffle objects between the small, fast object space and the large, slower one.
This thinking came right after doing a high-speed networking project where performance REALLY was hurt by TLB misses, and the reality sank in that common systems can only do 5-10 million random memory accesses per second across a large memory space. This assumes a TLB miss, which requires 4 memory bus clocks of access latency and 3 more memory bus clocks to finish a cache line burst, and then, after loading the TLB entry (which was an L2 cache miss), the same 7 clocks to get the actual memory location, for a total of 14 bus clocks for the processor to read 4 bytes. At a typical memory bus clock speed of 100 MHz, this ends up being a usable memory bandwidth of only about 28 Mbytes/sec. There is a critical ratio between how many TLB entries you have, total memory space, and the randomness of access. As I remember, having something like 256k of L2 cache with 256 MBytes of total memory was beginning to show this problem, basically when ALL the TLB entries don't fit in L2 cache. Now that memory costs $75/Gigabyte, and L2 caches are the same size as before, and processors are running at 2 GHz, this might be worth revisiting.
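The arithmetic behind those figures can be checked in a few lines. This Python fragment only restates the numbers given above (4 + 3 bus clocks per fetch, two fetches per read, 100 MHz bus, 4 bytes per read); the constant and function names are made up for the example.

```python
BUS_CLOCK_HZ = 100_000_000   # 100 MHz memory bus
ACCESS_LATENCY_CLOCKS = 4    # bus clocks before the first data arrives
BURST_CLOCKS = 3             # further clocks to finish the cache line burst
FETCHES_PER_READ = 2         # one fetch for the TLB entry, one for the data
BYTES_PER_READ = 4


def random_access_cost():
    """Return (clocks per read, reads per second, bytes per second)."""
    clocks = FETCHES_PER_READ * (ACCESS_LATENCY_CLOCKS + BURST_CLOCKS)
    reads_per_second = BUS_CLOCK_HZ / clocks
    return clocks, reads_per_second, reads_per_second * BYTES_PER_READ


clocks, reads, bandwidth = random_access_cost()
print(clocks)                      # 14
print(round(reads / 1e6, 2))       # 7.14 (million reads per second)
print(round(bandwidth / 1e6, 1))   # 28.6 (Mbytes per second)
```

About 7 million reads per second falls inside the 5-10 million range quoted, and 28.6 Mbytes/sec agrees with the "about 28" figure.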
This VM never got past the curious idea state, as I never got around to profiling a Smalltalk system to measure the randomness of memory accesses and the object working set size. I did have a name for the idea, which was a reduced address space architecture (RASA). It was just another variation of "make the common operations (access to objects in the object working set) go fast, in exchange for slowing down less common operations". It's possible I could now profile things using the Squeak VM simulator!
For a VM that has limited local resources (like a cheap microcontroller), and swaps objects across a network, I could imagine 16-bit object pointers with a LOOM virtual memory might be very attractive. Even a VM that has limited RAM but lots of slower flash (for most of the object space) might find this an attractive architecture.
In a perfect universe, all object pointers would be unique across the universe (128 bits is enough), and all that stuff we call files (local or remote) would just be referenced by some object pointer. Isn't this what Ted Nelson always wanted?
- Jan
On Saturday 24 November 2001 20:29, Dan Ingalls wrote:
Hey, Tim, go easy on the guy. It saves space. The first *decade* of Squeak's lineage was 16-bit pointers.
Indeed. 32K non integer objects are more than enough for many embedded applications. Even Squeak 1.16 had little more than this: 36K objects. Squeak 3.1 has ten times as much, but can hardly be considered a "light" environment.
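The 32K figure follows from spending one of the 16 bits on a tag that separates integers from object pointers, leaving 15 bits for either. A minimal Python sketch of such an encoding follows; putting the tag in the low bit and using it to mark integers is an assumption for illustration, as are the function names.

```python
MAX_OBJECTS = 1 << 15        # 15 pointer bits -> 32768 distinct objects
INT_MIN, INT_MAX = -(1 << 14), (1 << 14) - 1   # 15-bit signed integers


def encode_int(n):
    if not INT_MIN <= n <= INT_MAX:
        raise OverflowError("needs extended precision")
    return ((n & 0x7FFF) << 1) | 1   # tag bit 1 marks an integer (assumed)


def encode_pointer(index):
    if not 0 <= index < MAX_OBJECTS:
        raise OverflowError("object table full")
    return index << 1                # tag bit 0 marks an object pointer


def decode(oop):
    value = oop >> 1
    if oop & 1:
        if value > INT_MAX:          # restore the sign of the 15-bit value
            value -= 1 << 15
        return ("int", value)
    return ("pointer", value)
```

Integers outside -16384..16383 no longer fit in a slot, which is where the extended precision mentioned earlier in the thread comes in.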
I am doing a 16 bit VM and am quite happy with it. Since it is closer to Forth than to Smalltalk I won't go into details here.
But why not get adventurous and use 8 bits to encode a pointer or an integer. You'll need to use extended precision more often, but you'll have learned something by the time you're done.
The first Smalltalk VM I designed (1984) had 5 bit pointers/integers. Actually, it was 4 data bits and 1 "last nibble" bit for variable length pointers. I did learn a lot, in particular why the HP41 calculator (my inspiration for this) was so slooooow.
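One way to picture the 4-plus-1-bit scheme: a value is split into 4-bit groups, and the fifth bit is set only on the final group. The Python sketch below is a guess at the general shape, not the 1984 design; the bit position of the flag and the most-significant-first ordering are assumptions.

```python
LAST_FLAG = 0x10   # fifth bit marks the last nibble (position assumed)


def encode(value):
    """Split a non-negative integer into 5-bit groups, most significant first."""
    if value < 0:
        raise ValueError("only non-negative values in this sketch")
    nibbles = []
    while True:
        nibbles.append(value & 0xF)
        value >>= 4
        if value == 0:
            break
    nibbles.reverse()
    nibbles[-1] |= LAST_FLAG
    return nibbles


def decode(groups):
    value = 0
    for group in groups:
        value = (value << 4) | (group & 0xF)
        if group & LAST_FLAG:
            return value
    raise ValueError("missing last-nibble flag")
```

Here encode(0x2A) gives [0x02, 0x1A]. Every access has to walk the groups one at a time, which hints at why such a design would be slow.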
Maybe you could blow it into an FPGA.
Exactly what I am doing. A Xilinx Spartan II 15 only costs $7 and is enough for a tiny but relatively high performance VM. Tim is quite right about having an indirection on every memory access (object table) being a very bad idea. But you only have to put up with this if you use a normal CPU. When you roll your own, you can have a virtually addressed OO cache like the Mushroom and then the indirection only slows down cache misses, not every access.
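A software model of that idea, for illustration only: the cache is keyed by object pointer and field index, so a hit never touches the object table, and only a miss pays for the table lookup. This is a Python sketch with invented names, not the Mushroom design or any real hardware.

```python
class ObjectMemory:
    def __init__(self, object_table, heap):
        self.object_table = object_table   # oop -> heap address
        self.heap = heap                   # heap address -> list of fields
        self.cache = {}                    # (oop, field index) -> value
        self.table_lookups = 0             # counts indirections actually paid

    def read_field(self, oop, index):
        key = (oop, index)
        if key in self.cache:              # hit: no object table access
            return self.cache[key]
        self.table_lookups += 1            # miss: go through the object table
        address = self.object_table[oop]
        value = self.heap[address][index]
        self.cache[key] = value
        return value
```

Reading the same field twice costs one table lookup, not two. A real cache would also need invalidation when objects move or fields are written, which this sketch leaves out.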
Just kidding.
Not
Me neither. We live in interesting times, so let's have some fun!
-- Jecel
squeak-dev@lists.squeakfoundation.org