"Andreas Raab" Andreas.Raab@gmx.de wrote:
> Folks,
> I noticed them coming into Squeak a while ago and ever since I have wanted to ask this question: what the heck are they good for?! ;-) I don't see them
Well, I was also implementing them for fun, and there are some nice PDFs on the net somewhere (Google ought to find them for you) explaining them in detail, including a revised concurrent version (which the implementation in Squeak is not, I assume). Note: there might be factual errors here; I am no expert:
Basically it is a list with overlaid pointers that let you "skip ahead" in the list. These pointers are created using randomness, so you end up with search properties much like those of a balanced binary tree, but you don't have to worry about balancing, and insertions and deletions are much easier to implement. It is a damn clever and rather "obvious" solution, but who would even think of using randomness in a data structure? :-)
As far as I can tell it is a very good alternative to balanced binary trees (the much easier implementation being the prime advantage, I believe); performance-wise they are much the same. I read somewhere that database implementations still might prefer B-trees because of some property that makes them more efficient when it comes to distribution on secondary memory, but I don't know the details on that.
So... seems like a good structure to have in Squeak - even if nobody uses it yet! :-)
regards, Göran
> used anywhere and I've never heard of 'em before (yeah, well, my introductory courses in algorithms and data structures were plenty of years back ;-) but somebody found them important enough to write and include them. Since I like data structures (in particular efficient ones) I'd like to hear more about the places in which SkipLists could and should be used instead of other approaches. Pointers to comparisons would be nice, examples in Squeak even nicer ;-)
> Cheers,
>   - Andreas
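Göran's description of the structure can be sketched concretely. Below is a minimal, single-threaded skip list in Python — not the Squeak implementation from the image, which isn't shown here; the MAX_LEVEL cap and the 0.5 promotion probability are illustrative choices:

```python
import random

class Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * (level + 1)  # one "skip ahead" pointer per level

class SkipList:
    MAX_LEVEL = 16   # illustrative cap on tower height
    P = 0.5          # probability of promoting a node one level up

    def __init__(self):
        self.level = 0
        self.head = Node(None, self.MAX_LEVEL)  # sentinel; its key is never compared

    def random_level(self):
        # This coin-flipping is the "randomness in a data structure" trick:
        # on average half the nodes reach level 1, a quarter level 2, etc.
        lvl = 0
        while random.random() < self.P and lvl < self.MAX_LEVEL:
            lvl += 1
        return lvl

    def search(self, key):
        node = self.head
        # Start at the highest level and drop down, skipping ahead where possible.
        for i in range(self.level, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.key == key

    def insert(self, key):
        # Remember, per level, the last node before the insertion point.
        update = [None] * (self.MAX_LEVEL + 1)
        node = self.head
        for i in range(self.level, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        lvl = self.random_level()
        if lvl > self.level:
            for i in range(self.level + 1, lvl + 1):
                update[i] = self.head
            self.level = lvl
        new = Node(key, lvl)
        # Splice the new node in at every level of its tower.
        for i in range(lvl + 1):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new
```

Note how insertion is just a search plus pointer splicing — no rotations or rebalancing, which is the "much easier to implement" point above. Search is expected O(log n) regardless of which levels the coin flips assign.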
--- goran.hultgren@bluefish.se wrote:
> Basically it is a list with overlaid pointers that let you "skip ahead" in the list. These pointers are created using randomness, so you end up with search properties much like those of a balanced binary tree, but you don't have to worry about balancing, and insertions and deletions are much easier to implement. It is a damn clever and rather "obvious" solution, but who would even think of using randomness in a data structure? :-)
Well... The original "skip list"-like solution used fixed intervals between nodes instead of random ones. But then insertion was a nightmare and pretty inefficient — it was just like managing a completely balanced tree. So a guy named Pugh came up with the idea of having random intervals to facilitate insertions while keeping logarithmic search time...
Do I have any example of a "good reason to use a skip list" instead of (insert favorite data structure here)? No! But they're conceptually so cool! If you find any, let me know!!!
=====
----------------------
Benoit St-Jean
bstjean@yahoo.com
ICQ: 130611319 / Yahoo Mssngr: bstjean
http://cactus.swiki.net
"We're only immortal for a limited time", Neil Peart
----------------------
> As far as I can tell it is a very good alternative to balanced binary trees (much easier implementation being prime advantage I believe) - performance wise they are much the same. I read somewhere that database implementations still might prefer Btrees because of some property that make them more efficient when it comes to distribution on secondary memory - but I don't know the details on that.
On disk, the exact number of dereferences makes a big difference. With a binary tree, you divide your search space by 2 with each dereference; with a btree, you divide it by a larger number. I'd guess it's similar for skip lists, though I've only heard about them, not studied them.
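To put rough numbers on "a larger number": with n keys, the number of dereferences is about log base b of n for branching factor b. A quick sketch — the 256-way fan-out is just an assumed example, not a figure from the thread:

```python
import math

n = 1_000_000  # keys in the index

# Binary tree: each dereference halves the search space.
binary_derefs = math.ceil(math.log2(n))     # about 20 pointer chases

# B-tree with 256-way nodes: each disk read divides the space by 256.
btree_derefs = math.ceil(math.log(n, 256))  # about 3 disk reads
```

Three seeks versus twenty is exactly the property that makes B-trees attractive on secondary storage.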
Hmm, with "main" memory being so much slower than cache, btrees might even start making sense for in-memory structures. But not necessarily for Squeak, where an "array of foo's" won't necessarily have all of the foo's next to each other, anyway, and thus where accessing items within a node still causes fetches to distant parts of memory.
Anyway, I'd rather work on something higher-level. This hyper-optimization stuff is only useful on small portions of most of the programs I ever work on.
-Lex
On Sun, 30 Sep 2001, Lex Spoon wrote:
> On disk, the exact number of dereferences makes a big difference. With
It's the number and the predictability of the dereferences. If you can preload the data, the count doesn't matter. In this case, the cost of loading 16k versus 16 bytes is so small that it's faster to load the 16k and do 10 binary-tree comparisons than to do 10 disk seeks.
> a binary tree, you divide your search space by 2 with each dereference; with a btree, you divide it by a larger number. I'd guess it's similar for skip lists, though I've only heard about them, not studied them.
> Hmm, with "main" memory being so much slower than cache, btrees might even start making sense for in-memory structures. But not necessarily
Nope. Most cache lines are 16 bytes, so you pay a lot more for the extra comparisons. Nor do you save anything with the ability to prefetch; you cannot predict which branches you'll need. You might get semi-close with splay trees, though, because they exploit locality of access, so the effective number of accesses is much better.
I'll probably eventually implement them.
> for Squeak, where an "array of foo's" won't necessarily have all of the foo's next to each other, anyway, and thus where accessing items within a node still causes fetches to distant parts of memory.
Distant parts don't hurt; cache lines are 16 bytes. The closest they could get to hurting is if they hit the TLB too heavily. (The x86 TLB has 64 entries for 4kb pages, i.e. 256k covered; an access outside that costs about 3 memory lookups to read in the new page table entry for that 4kb. PTE lookups can be cached.)
> Anyway, I'd rather work on something higher-level. This hyper-optimization stuff is only useful on small portions of most of the programs I ever work on.
Exactly. The O(log n) will beat your O(n) every time once you start getting a lot of data. If it does in profiling, then implement the O(log n) algorithm and contribute it.
Don't discount profiling. Profiling has found places in my network stream library where invoking #originalContents instead of #contents sped it up by 3x.
And this little beauty: fiddling with one line of code has led to speedups of 30% in the queueing code.
I found out that ``readPosition > (writeLimit/2)'' is about 30 times slower than ``readPosition * 2 > writeLimit'' (if writeLimit is randomly distributed; if writeLimit is always odd, it's 60x).
Which was the absolute last thing I expected to be consuming 20% of my CPU time. Profiling is useful; use it. :)
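The two predicates are mathematically equivalent; the likely culprit for the slowdown is that in Smalltalk `/` answers an exact Fraction when the division isn't whole, so the slow form allocates and compares rationals on every odd writeLimit. A Python sketch, with `fractions.Fraction` standing in for Smalltalk's exact division (the function names are mine):

```python
from fractions import Fraction

def slow_check(readPosition, writeLimit):
    # Mirrors ``readPosition > (writeLimit/2)``: when writeLimit is odd,
    # Smalltalk's / answers a Fraction object, so every call allocates
    # and compares an exact rational instead of an integer.
    return readPosition > Fraction(writeLimit, 2)

def fast_check(readPosition, writeLimit):
    # Mirrors ``readPosition * 2 > writeLimit``: pure integer arithmetic,
    # no intermediate objects.
    return readPosition * 2 > writeLimit

# The rewrite is safe because the two predicates agree for all integers.
for rp in range(50):
    for wl in range(50):
        assert slow_check(rp, wl) == fast_check(rp, wl)
```

Multiplying both sides by 2 keeps everything in integers, which is the whole trick — the comparison itself was never the expensive part.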
Scott
-- No DVD movie will ever enter the public domain, nor will any CD. The last CD and the last DVD will have moldered away decades before they leave copyright. This is not encouraging the creation of knowledge in the public domain.
Scott A Crosby crosby@qwes.math.cmu.edu is widely believed to have written:
> Nope. Most cachelines are 16 bytes.
I doubt it. ARM uses a variety of cache line sizes ranging from 0 to 256 bytes. I think PPC uses 32 bytes. Sparc uses a variety depending on step level (and just to add fun, the line range spec in the flush instruction varies; some levels use a closed interval, some a half-open interval. Imagine the amusement derived from debugging a code generator when Sun forgets to tell you this).
tim
squeak-dev@lists.squeakfoundation.org