When I was looking at GST vs. Ruby benchmarks today, http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=gst&am... I came across a link at the bottom to the original "Design Principles Behind Smalltalk" paper by Dan Ingalls, see: http://users.ipa.net/~dwighth/smalltalk/byte_augc81/design_principles_behind...
This essay attempts to look at Dan's 1981 essay and move beyond it, especially by considering how to support creativity by a group instead of creativity by an isolated individual, and also by calling into question "objects" as the sole major metaphor for a system supporting creativity. Some of this thinking about "objects" is informed by the late William Kent's work, especially Kent's book "Data & Reality": http://www.bkent.net/ http://www.bkent.net/Doc/darxrp.htm Presumably the original paper reflects not just Dan's work and thinking, but that of Alan Kay and the larger Learning Research Group at Xerox PARC at the time, but I will refer to it as Dan's writing, because his is the only name on it.
Mainly I will consider the first half of the paper. This essay is perhaps a little in the spirit of 'The Rise of "Worse is Better"', http://www.ai.mit.edu/docs/articles/good-news/subsection3.2.1.html and is intended to help in understanding why, say, Python has been so successful in capturing the hearts and minds of the last decade of development, running core systems from Google to NASA, whereas Squeak Smalltalk has remained a niche project during that time. And this is true even though we all know that Squeak is better than Python in oh so many ways (with a more expandable and more self-documenting syntax using key:words: instead of functions, better transparency from top to bottom of the system, a better core graphics engine, a better community in terms of very bright people capable of handling a high level of abstraction, better core tools, a more consistent language model, better streams and number classes, a more portable VM, better dynamic development where you can code in the debugger and restart a method instead of an application, and so on). Still, for all those Squeak advantages, I think the same applies to Squeak Smalltalk as Richard Gabriel of 'The Rise of "Worse is Better"' says of Lisp: "... one can conclude only that the Lisp community needs to seriously rethink its position on Lisp design."
That "Worse is Better" paper probably has had little effect on changing Lisp the language, and I doubt this note will have much effect on Squeak the system. :-) Ultimately languages (and the mailing lists that support them) are somewhat self-selecting -- if you have major problems with the language or paradigm, you probably are not using Squeak or on the Squeak development list. Still, I found it of value to me to write up these issues, in terms of thinking of the next generation of tools and users, and I hope some Squeakers out there find it of value to read.
First off, I agree with Dan's stated goal of a quarter century back: "The purpose of the Smalltalk project is to provide computer support for the creative spirit in everyone." So, overall, there is no difference in goal when broadly construed.
This essay will outline some points of disagreement over how to best support that goal. Some of this disagreement comes from approaching the notion of how to support creativity from a different perspective, especially given how the computing landscape has changed due to the very success of the object-oriented and networked GUI paradigms which Smalltalk (and the Alto it was developed on) pioneered. As Steve Jobs said: http://americanhistory.si.edu/collections/comphist/sj1.html "SJ: ... I saw their early computer called the Alto which was a phenomenal computer and they actually showed me three things there that they had working in 1976. I saw them in 1979. Things that took really until a few years ago for us to fully recreate, for the industry to fully recreate in this case with NeXTStep. However, I didn't see all three of those things. I only saw the first one which was so incredible to me that it saturated me. It blinded me to see the other two. It took me years to recreate them and rediscover them and incorporate them back into the model but they were very far ahead in their thinking. They didn't have it totally right, but they had the germ of the idea of all three things. And the three things were graphical user interfaces, object oriented computing and networking."
=== creativity by an individual versus creativity by a group ===
Dan wrote in the paper: "If a system is to serve the creative spirit, it must be entirely comprehensible to a single individual."
I strongly disagree with this, as much as I still agree with his later statement of: "Any barrier that exists between the user and some part of the system will eventually be a barrier to creative expression."
The disagreement comes from considering the idea of creativity of a community, which involves building on the work of others (work which may be beyond your ability to either fully comprehend or modify). Or, to put it another way: is what is best for the individual in conflict with what is best for the group? And if so, which should win? While ideally any system should have no barriers to the individual user, in practice it may, and yet it may still support a large amount of creativity by the group, sometimes directly because of those barriers. Another way to think of a barrier is as an "interface" or a "firewall", and those have some positive connotations. Even Smalltalk's very success is due in part to creating a strong barrier at run-time between the user image of objects and the VM which supports it; this is a barrier which people have complained about, with the phrase "if you can't crash it, you're not doing the driving." :-)
Consider, for example, the success of Python, which is a mostly object-oriented language core written in C, where lots of other libraries have been bolted onto it by the community. It is this widespread availability of useful libraries which drives Python adoption more than any other thing. (Same with Perl adoption.) Python has other ideas which prove popular, like its use of significant whitespace (as Occam used), having dictionaries built in and easy to use, and looking a lot like C, but it is the libraries and the related modularity which are among the biggest wins for Python. Well, and that you can write a program in a few lines in a text editor that uses those great libraries, making it easy to build small things with little learning. I think this sentiment of focusing on empowering the individual primarily, indirectly at the expense of empowering the community, is why Squeak Smalltalk has so often suffered from poor modularity, why, for example, it took so long to get namespaces and such into the mainstream, and also why it has struggled to have as many libraries as Python offers.
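As a minimal sketch of that "few lines in a text editor using great libraries" point (modern Python shown; the URL is just a placeholder, not anything from the original discussion): fetch a page and count word frequencies using only the standard library.

    # Everything here comes "batteries included"; the individual writes
    # almost nothing, the group's libraries do the heavy lifting.
    from collections import Counter
    from urllib.request import urlopen

    text = urlopen("http://example.com/").read().decode("utf-8", "replace")
    for word, count in Counter(text.split()).most_common(5):
        print(word, count)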
So while I think Dan's original goal is a nice ideal, in practice it is not needed in the extreme, since the creativity of a group sharing, say, a community mailing list will still move beyond the creativity of any individual in the group. So, it is *more* important in the internet age to have techniques for supporting group creativity, including modularity, than it is for the system to not have any barriers. That is probably one reason a system like Python is actually used in practice by more people to do creative things than Smalltalk. Python makes it easy for many individuals to do small things. While Python as a language and as an environment in practice does not scale as well as Smalltalk, the aggregate amount of all those individuals doing small things overwhelms what any one Smalltalker can do (or even a small group of them stepping on each other's toes and watching their contributions suffer from "bit rot").
Another issue here is that Dan was writing 25 years ago in the context of a *proprietary* system. So, availability of all the code within that system to the user was essential for creativity, even if the code was controlled by someone else, since otherwise the individual never had access to the code at all. But, when working with open systems based on free software, the code and the tools to work with it are accessible to anyone, even if they are in other languages or supported by other communities (say, the GCC community) than the community one is currently working in (say, GNU Smalltalk).
Again, to contrast with Python, Squeak wants to run the show, but Python plays nice with all the other free tools of the GNU/Linux ecosystem. When you use Python, your environment is not just Python; it is really more like GNU/Linux. So, free (as in "freedom") software -- with accompanying free licenses like the GPL that work as de-facto constitutions for collaborative communities -- has shifted the landscape, and the development ideals may need to shift with it. This is another reason why Python, which has always been free and had a core community with those values, has been able to succeed so quickly over the past decade, whereas Smalltalk, which was originally proprietary, has struggled, even though a free-ish Smalltalk like Squeak is in many ways much more open to modification than Python.
=== technical versus conceptual barriers ===
Another issue here is that Dan is talking about "technical" barriers, which are not the only, or even the biggest, barriers to creativity. There often exist "conceptual" barriers. It is in the "failure of the imagination" that we face our biggest hurdles. Imagination is indeed "the ultimate resource".
For example, the code to generate the VM in Squeak needs a certain mindset to understand; one has to think about the domain of bit manipulations. Even though the syntax looks like Smalltalk, the domain is very different from your run-of-the-mill GUI application or eToy. While it is indeed an innovation to use Smalltalk syntax in an uncommon domain, and indeed Smalltalk's syntax is a marvel which can make many "domain-specific languages" unnecessary, one cannot get around the conceptual barrier of a programmer understanding a new domain, as much as a familiar syntax might help with the task.
Thus, barriers will always exist in any programming system, since all interesting programs probably address new domains (or old domains in new ways). So, again, while the goal Dan defined twenty-five years ago is "ideal", it is an ideal that can never be reached because of conceptual barriers encountered when working in multiple domains, even in a pure Smalltalk system which minimizes arbitrary technical barriers like differing syntax. Forcing everyone to work in Smalltalk using Smalltalk tools, as good as they are, means that other innovations developed in other languages with other tools, for example, Java, are lost to the Squeak community. Yes, in theory, anything is possible, especially with Squeak's interface to loadable modules; I am speaking more of tone and emphasis and culture here.
And, if it is so often the conceptual barrier that is the ultimate hurdle, then are technical barriers of different syntax and different tools really so big? Many humans become fluent in multiple human languages and their accompanying cultures, which is typically a harder thing than learning new computer languages. If one needs to switch mental gears conceptually to work on the VM, then is it *really* so bad if the VM is written directly in C like GNU Smalltalk does? Now, I know a lot of positive arguments can be made for the utility and convenience of the Squeak VM written in Smalltalk (especially given C's quirks as a not-quite-cross-platform language), but in the end, my point is, keeping everything in one syntax may not really save that much time for the community, all things considered. Even when the syntax is the same, the underlying domain semantics may be very different, and those semantics, the meaning of all the objects, are what take the time to learn. To build a new VM, one still needs to spend a long time understanding what a VM is and how it could work, and no choice of familiar tools or use of one single syntax will make that much easier (a little easier, yes). A better choice of abstraction might make maintaining a VM easier for those who get the abstraction, but not a choice of language by itself, all other things being equal. Were the Squeak VM coded in some other portable language (like Java or Free Pascal or OCaml), it might not be much more trouble to maintain -- and such a VM might even be easier to develop, as one could use the complete tool set of that other system to debug the VM in the language it was maintained in, rather than facing the technical barrier :-) of seeing C code for the VM in the debugger instead of the original Smalltalk source it was translated from. Granted, if the Squeak VM were coded in, say, OCaml, a VM maintainer would face the barrier of learning that language and its paradigms, but I would argue that the barrier would remain more conceptual than technical, and the syntax problem would be the lesser issue.
Right now, I think Squeak on the JVM, which Talks2 is a step towards, could be a really big win for the Squeak community, and translating the VM from an abstract representation (in Smalltalk) to a specific language is a big win there. Still, the VM could have been in any translatable abstraction (XML, Lisp, an ANTLR parse tree from a VM-specific language, Parrot, etc.) and generating Java would still be easily doable (though of course a Smalltalk encoding is preferable for Smalltalkers). Also, it is not clear, even if this is a win, whether it is a big enough win to justify the other costs of making VMs harder to debug and maintain through an intermediate translation step, compared to just working in a language that is more cross-platform by design than C, where somebody else does the hard work of maintaining that other platform. Again, here is the issue of community support versus individual support, and related assumptions.
=== Language versus "group computation" ===
I really like the "Figure 1" diagram in the original paper, and it remains a useful illumination of the problem area. Still, the bold statement "Purpose of Language: To provide a framework for communication" may not be the entire picture. What if one drops the big idea of "Language" entirely and focuses instead on "computation"? Consider if two or more people are not so much engaging in "language" as they talk, but instead are engaging in a "group computation", where utterances between group members play a facilitating role. So, from this point of view, the major goal has to be allowing the group to compute effectively, whatever that takes. Language is one aspect of this. But then, so are licenses. So are communications channels. So are formal and informal community processes. And so on to all of sociology and politics. In a way, by elevating "language" as a paradigm, many of these other aspects could be missed. Squeak the community has certainly suffered on some of these issues at various times (though recent efforts, especially on relicensing, even if it just had PR value, are a hopeful sign).
Also, consider that the latest thinking in cognitive psychology and AI suggests the human brain simultaneously thinks about problems in multiple representations, and chooses second-by-second the representation that pays off the most in making progress towards goals. So, for example, we may look out the window at a rainy day with a goal of going to the store by car, and we may simultaneously imagine becoming wet as a sort of 3D world simulation of rain falling on our heads as we venture to the car in our imagination, engage in formal logic based on linguistic experience ("if rain, then take umbrella"), use neural net pattern matching to get the most common behavior (it just feels right to reach for the umbrella based on the gestalt of the situation), plus we may be mentally making a 2D map of the route and areas with obstacles (rain between self and car) and considering ways to make progress through the 2D representation of a rain obstacle. So here we might have four different representations in use simultaneously, each of which has parsed the world differently, perhaps into "objects" of various sorts or perhaps not. One or more of them may prove most useful and drive our behavior for that moment.
Language is only playing a direct role in one of those representations in this case: the formal symbolic-logical process. Language may play a role in the other representations as well, as we internally reflect with language (generating internal questions like "Why do I feel like reaching for the umbrella?" or "How can I overcome the rain barrier?"). However, the other representational schemes may also be applied to the formal linguistic-symbolic representation, or to each other.
In short, we now know that viewing the mind as solely about "language" is an overly simplistic way of thinking about it. And if the paradigm has grown, then so too should our computer support systems, in order to honor the insight in the original paper of: "The mechanisms of human thought and communication have been engineered for millions of years, and we should respect them as being of sound design. Moreover, since we must work with this design for the next million years, it will save time if we make our computer models compatible with the mind, rather than the other way around."
One of those forces shaping the mechanisms of human thought has been how it is the *group* which survives in the wilderness; the lone *individual* is rapidly picked off by accidents (say, a broken leg) or runs into trouble (say, a pack of coyotes) beyond his or her individual ability to cope. (That's one reason it's foolish to think you can survive an apocalyptic disaster long term by running away to the wilderness on your own.) When a village defends itself against a large pack of coyotes, even with verbal shouts and grunts to the coyotes or between villagers, what is going on is, in some ways, primarily a coyote defense "computation" involving all the villagers and all their thinking (which may be operating lots of simultaneous decision making models), not just a "discussion" among villagers about coyotes, as useful as language may be in helping that larger group computation come to a successful conclusion. So in a tool to enhance group creativity, we must consider all the ways to enhance these creative group computations, and those go beyond just supporting a common language.
=== objects are illusions, but useful ones ===
In my undergraduate work in psychology I wrote a senior paper in 1985 entitled "Why intelligence: Object, Evolution, Stability, and Model", where I argued the impression of a world of well-defined objects is an illusion, but a useful one. Considered in the context of the section above, we can also see that how you parse the world into objects may depend on the particular goal you have (reaching your car without being wet) or the particular approach you are taking to reaching the goal (either the strategy, walking outside, or any helping tool used, like a neural net or 2D map). Yet the world is the same, even as what we consider to be an "object" may vary from time to time; in one situation "rain" might be an object, in another a "rain drop" might be an object, in another the weather might be of little interest. So objects are a *convenience* for reaching goals (in terms of internal states), not reality (which our best physics says is more continuous than anything else in terms of quantum probabilities, or at best, more conventionally, a particle-wave duality). So objects, as tools of thought, have no meaning apart from the context in which we create them -- and the contexts include our viewpoints, our goals, our tools, our history, our relations to the community, and so on.
Consider Dan's statement of "A computer language should support the concept of "object" and provide a uniform means for referring to the objects in its universe." That appears to me to make the classical mistake of thinking the universe has only one parsing into one object hierarchy and that the objects exist in some sort of Platonic ideal. See Plato's "Allegory of the Cave" for the best example of the mistaken notion of only one true parsing, even though as social commentary it may still be accurate. :-) http://faculty.washington.edu/smcohen/320/cave.htm http://www.ship.edu/~cgboeree/platoscave.html As discussed above, the world does not have just one unique parsing into objects. Or, to bend Plato's allegory: that we sometimes find apparently discrete "shadows" useful to perceive and think about as "objects" does not mean there really are discrete ideal things out there casting those shadows, with any sort of one-to-one correspondence. Again, this is not to say objects are not useful, just that they are a tool.
To use an example from the paper, when Dan wrote: "Every time you want to talk about "that chair over there", you must repeat the entire processes of distinguishing that chair. This is where the act of reference comes in: we can associate a unique identifier with an object, and, from that time on, only the mention of that identifier is necessary to refer to the original object." That sounds really nice on the surface. But consider: what if the "chair" is glued to the floor? You think it is an "object" but there is no clear real boundary between it and the floor. And when you attempt to move it, what if the floor boards come up with it? You now have an entity you are manipulating which is not quite a "chair" and not quite a "floor" -- what is it? There is no neat "class" to put it in. Clearly you, the reader, can think about this entity, so the human brain supports this fluidity in changing our definitions of objects without requiring a one-to-one mapping to ideal classes to think about them; but a computer language like Smalltalk would have many problems representing this. William Kent, in the book _Data & Reality_, discusses these sorts of problems at length.
Sure, you could make a new object for the combined entity, but what if you then decided to take apart the chair into cushions, legs (with floor boards still glued on), a back, and lots of bolts? Now you have lots of new objects? Sure, but then how do you refer to all the objects and all their relations in all possible permutations consistently? Your mind can do it easily; a Smalltalk class hierarchy and related application would struggle to do it, at best, as the sketch below suggests.
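Here is a toy sketch of that difficulty (in Python, all names invented for illustration): a fixed class-per-kind taxonomy has no good slot for the pried-up entity, while a looser "entity with roles" representation lets the parsing shift with our goals.

    class Chair: pass
    class Floor: pass

    # The thing we pried up is neither a Chair nor a Floor.  One escape
    # is to drop the fixed taxonomy and tag an entity with fluid roles:
    class Entity:
        def __init__(self, *roles):
            self.roles = set(roles)

    odd_thing = Entity("chair", "floorboards")
    print("chair" in odd_thing.roles)   # True -- it can be *seen as* a chair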
Sure, there are design patterns for some of these things (like "Facade"), but they are not completely reflected in a system which has an overly static notion of "object". Smalltalk has some ways to deal with these things, like "becomes:", but that does not deal with this problem in all its generality. You can simultaneously think about an original chair, a chair with floor boards stuck on it, and a chair taken apart -- so your mind is capable of much more imaginative representational power than a simple notion of "objects".
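(The nearest everyday analogue of "becomes:" in Python terms is reassigning an object's class in place; a hedged sketch, which re-parses the one object but, as noted, cannot express several simultaneous parsings at once:)

    class Chair: pass

    class ChairWithFloorboards:
        def describe(self):
            return "a chair with floorboards still glued on"

    thing = Chair()
    thing.__class__ = ChairWithFloorboards   # the one object is "re-parsed"
    print(thing.describe())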
Again, discrete objects are a useful tool to think with, but they are not the only tool, and they are not as stable a tool as one might think at first glance. Objects are useful within contexts. Yet Smalltalk lacks a formal notion of an object having a context (or imaginative world) which defines its meaning. When people (including Alan Kay) talk about Smalltalk, they often say something to the effect that objects are self-contained. But clearly they are not. Their meaning emerges out of their interactions with a world of other objects. Yet modern Smalltalks have not formalized the notion of a world of objects beyond a very coarse-grained kitchen-sink "image". It would seem one needs finer-grained contexts, be they "worlds", "modules", or some other thing.
=== talking to an object vs. manipulating it ===
Consider this statement from the paper: "Computing should be viewed as an intrinsic capability of objects that can be uniformly invoked by sending messages." It sounds uniform (from an implementation point of view), yet it violates the human notion of how most of our time is spent actually interacting with "objects". When we use language, we are generally talking to ourselves or other people; most items in the world don't respond directly to language. We use our hands or feet or whatever to manipulate them -- to move them or change their internal configuration. Not every object in the real world has a name or knows what the best inspector is for it; in fact, very few of them do, except perhaps people. When we pick up a rock, we try different tools on it if we want to observe it. We classify it ourselves; we don't ask it its class. We may stick a label on it and put it in a museum, but that is an active effort of categorization. And we may later change our minds about how to label it. Or we may break it into two parts (plus some rock fragments), and grind one of the parts up into rock dust and put some of the dust through various chemical processes over a period of years (say, if it was originally a "moon rock"). A moon rock does not know how to perform chemical analysis on itself or even split itself in two, yet Smalltalk philosophy encourages making models of reality as if a moon rock did.
I think this issue shows why languages like Lisp or Python (a Lisp derivative in some ways) or even C++ have hung on so well, both as philosophies and communities. In those languages you often have data structures which are operated on externally by large subroutines. And people who like these languages claim they like to sometimes do OO when they want it (OO meaning behavior emerges from a lot of interacting objects), and at other times to do these sorts of external manipulations on sets of inert objects using complex routines. Manipulating otherwise inert-seeming objects according to our fancy of the moment is something that people are comfortable with, and likely a big part of our mind is structured to do that well. Yes, we do talk to people or certain animals (or now certain devices). But we also do a lot of manipulation by hand (or foot etc.) and classification by eye (or ear or touch).
So, this suggests perhaps it is a mistake to have an object hierarchy where at the top everything knows its name or how to put data into its own slots. What's wrong with, say, asking the VM to put data into an object's slots? Or asking the VM for the ID of an object? Why should objects be expected to be so smart when we programmers are surrounded in the real world by objects which are usually quite dumb? This is a violation of the good principles that Dan starts out the paper with -- to make a system which maps well onto how humans think. Humans both talk and manipulate, so it would seem a system should support both styles of interaction. Granted, it is almost trivial in Smalltalk to reach into other objects and manipulate them, but my point is that Smalltalk is not presented that way, and such interactions are generally discouraged as bad practice. Somehow I feel this issue needs to be revisited. Among newer GUI interfaces, like Morphic, there is an emphasis on direct manipulation. Yet, somehow, this notion is discouraged in programming. There is a paradigm conflict here which needs to be addressed.
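What "manipulating a dumb object from outside" looks like is easy to show; a small Python sketch (names invented), where the object itself knows nothing and the runtime hands us its slots, its identity, and its classification on request:

    class Rock: pass                      # deliberately dumb: no methods

    rock = Rock()
    setattr(rock, "mass_grams", 142)      # we put data into its slot
    print(getattr(rock, "mass_grams"))    # we read it back out
    print(id(rock))                       # the runtime, not the rock, supplies its ID
    print(type(rock).__name__)            # we classify it; it never "answers"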
=== summing up ===
I don't have time or energy right now to go into the rest of this excellent paper in detail; much is either on the value of modularity, which I agree with (even as Squeak Smalltalk may not have enough of it in practice), or implementation strategy or GUI (which is a whole other can of worms).
But let me say these criticisms are made with (perhaps) 20/20 hindsight a quarter century later. Dan himself may have come to these insights or better ones by now; no doubt, like many people contemplating their earlier work of decades gone by, he can be both proud of it and embarrassed by it at the same time (I know I am). As Dan said insightfully in the conclusion: "There are clearly other aspects to human thought that have not been addressed in this paper. These must be identified as metaphors that can complement the existing models of the language." This essay is intended along those very lines Dan mapped out so long ago.
For its time, the original paper is a remarkable achievement, as is Smalltalk-80. It is only because of such great work that we can think about moving forward onto even greater projects.
But what I find most illuminating, stumbling across this paper again (I probably read it in Byte way back when), is that now, in retrospect, it seems to explain both the ways Smalltalk would *succeed* spectacularly in the goal of making (*some*) individuals more creative (though contrast this with Howard Gardner's theory of *multiple intelligences*, including non-language ones; see the list here: http://www.infed.org/thinkers/gardner.htm ) and also the ways Smalltalk would *fail* (somewhat) in making groups more creative (which it was not designed to do), compared to, say, Python (which is not as scalable for an individual user's creative projects, but has other advantages in a group context).
Subconsciously it may be these sorts of issues that motivate would-be Squeakers to have an interest in, say, Python. And these sorts of issues may be implicitly behind some of the specific issues Squeak has wrestled with both as an implementation and as a community. Obviously, the Squeak community, especially with, say, Monticello or Croquet, is trying to bridge this gap to support group creativity. OpenAugment, based on Squeak, is indirectly another such project. http://www.openaugment.org/ So it is not as if there are no attempts to recognize some of these issues and move forward. But perhaps the original "Design Principles Behind Smalltalk" paper (as it unconsciously resonates in the Smalltalk community, and even in the minds of core Squeakers) now holds Smalltalk (and Squeak) back, as much as that paper propelled Smalltalk forward for a quarter century.
--Paul Fernhout (I hereby place this essay under the GPL, version 2 or later; and also the GFDL with no invariant sections).
From: "Paul D. Fernhout" pdfernhout@kurtz-fernhout.com Reply-To: The general-purpose Squeak developers listsqueak-dev@lists.squeakfoundation.org To: The general-purpose Squeak developers listsqueak-dev@lists.squeakfoundation.org Subject: Design Principles Behind Smalltalk, Revisited Date: Mon, 25 Dec 2006 17:10:47 -0500
http://www.ai.mit.edu/docs/articles/good-news/subsection3.2.1.html and is intended to help in understanding why, say, Python has been so successful in capturing the hearts and minds of the last decade of development
One doesn't have to look far for this. C became popular for no other reason than it was "close enough" and we could use it "right now" to build systems. The more advanced languages used resources when there were no resources to use.
"The rise of worse is better" largely misses the point here. The point is: getting 80% today is infinitely better then 100% "some day". And all the rest is just the incredible weight of "backward compatibility".
And the concept of backward compatibility isn't just software and hardware. It extends to workers as well. The average programmer is just not very good (and I don't speak about the worth of the people as human beings; there are just so many in the field with no interest in it other than money, which is totally OK -- it just doesn't make for good programmers). The cost of moving people who are barely keeping up from C++ to Java isn't so bad. It actually makes things simpler: just the same syntax again with much of what they didn't understand taken out. But moving these folks away from a C based syntax is out of the question. And getting rid of them in favor of more talented programmers would be just as out of the question.
Again, to contrast with Python, Squeak wants to run the show, but Python plays nice with all the other free tools of the GNU/Linux ecosystem.
I keep on seeing this, but it appears largely overstated. Java has its own VM, threads, etc. as well. And it is easier to connect to the outside world in Squeak than Java, because in Java you are in "you're on your own!" land. In Squeak you always were, so there is no need to be afraid of this step if you need it. In at least Squeak and Dolphin Smalltalk you can call "extern C" style functions directly from Smalltalk (though in Squeak you need to load FFI first). That is at least as good as any of the other languages.
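(For comparison, the analogous "call an extern C function directly" facility in Python is the standard ctypes module; a minimal sketch, assuming a Unix-like system where the C library can be located:)

    import ctypes, ctypes.util

    # Load the platform C library and call printf on it directly.
    libc = ctypes.CDLL(ctypes.util.find_library("c"))
    libc.printf(b"2 + 3 = %d\n", 5)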
And if you mean more to address the tools, well yes you *can* edit Java code in vi if you really want to. But no one really wants to. And if your interface to the language is through some program anyway, then the "barrier" of the code not being on the file system disappears.
Forcing everyone to work in Smalltalk using Smalltalk tools, as good as they are, means that other innovations developed in other languages with other tools, for example, Java, are lost to the Squeak community.
Um... What innovations in Java?
Many humans become fluent in multiple human languages and their accompanying cultures, which is typically a harder thing than learning new computer languages. If one needs to switch mental gears conceptually to work on the VM, then is it *really* so bad if the VM is written directly in C like GNU Smalltalk does?
Typically? It is harder in every case, no matter how badly designed the programming language.
And what do you want to gain here? If the Squeak community came out today and said "OK! You can write the Squeak VM in anything you want, we don't care", they wouldn't suddenly get volunteers knocking the doors down to work on Squeak. They would only lose the people who can work on the VM today (not because these people *can't* do it, but because they wouldn't want to anymore).
While I agree that Squeak is not required to be written in a subset of Smalltalk, it *is*, and changing that won't gain anything. Getting Squeak to run on Strongtalk might, but I haven't seen anyone forbidding that.
Right now, I think Squeak on the JVM, which Talks2 is a step towards, could be a really big win for the Squeak community, and translating the VM from an abstract representation (in Smalltalk) to a specific language is a big win there. Still, the VM could have been in any translatable abstraction (XML, Lisp, an ANTLR parse tree from a VM-specific language, Parrot, etc.) and generating Java would still be easily doable (though of course a Smalltalk encoding is preferable for Smalltalkers).
Java isn't the end-all/be-all here. Microsoft is moving to a more dynamic VM already, and because of this Java will be forced to as well. Java has always been behind pre-existing technologies and this area is no different. If you want to move into the future it is best not to follow a group that is always behind.
=== objects are illusions, but useful ones ===
In my undergraduate work in psychology I wrote a senior paper in 1985 entitled "Why intelligence: Object, Evolution, Stability, and Model", where I argued the impression of a world of well-defined objects is an illusion, but a useful one. Considered in the context of the section above, we can also see that how you parse the world into objects may depend on the particular goal you have (reaching your car without being wet) or the particular approach you are taking to reaching the goal (either the strategy, walking outside, or any helping tool used, like a neural net or 2D map). Yet the world is the same, even as what we consider to be an "object" may vary from time to time; in one situation "rain" might be an object, in another a "rain drop" might be an object, in another the weather might be of little interest. So objects are a *convenience* for reaching goals (in terms of internal states), not reality (which our best physics says is more continuous than anything else in terms of quantum probabilities, or at best, more conventionally, a particle-wave duality). So objects, as tools of thought, have no meaning apart from the context in which we create them -- and the contexts include our viewpoints, our goals, our tools, our history, our relations to the community, and so on.
To me this was the most insightful point in the whole essay. Though, honestly I thought this was pretty well understood. Object Orientation is simply a way of organizing code in a way that makes sense from the perspective of the problem domain it is related to. But since programming is a task of managing complexity, correct organization is a critical piece of the puzzle.
But this observation is the reason OO databases haven't really taken off: an OO database will tend to model things how *your* application wants to see them. A traditional relational DBA will model things in the most generic way he can so that *all* the applications can build the view they need easily. Relational DBAs tend to be of the viewpoint: the data will exist for the life of the company, while the applications that access it come and go like the tide. And one only needs to look at the huge Java rewrites going on to know they are right.
Consider Dan's statement of "A computer language should support the concept of "object" and provide a uniform means for referring to the objects in its universe." That appears to me to make the classical mistake of thinking the universe has only one parsing into one object hierarchy and that the objects exist in some sort of Platonic ideal.
Actually I think this applies more to C++-derived OO languages (e.g. Java). It is those languages that have huge hierarchies of things that are not that related, due to the brain-dead typing systems. In Smalltalk the only hierarchy that has to be is inheriting from Object. And you don't even have to do that.
But I think this works just fine: We are choosing to code something, so we have to model it in the point of view appropriate to how we are going to solve the problem. And this implies some organization technique. And among organization techniques, (correct) OO has had the most success in my opinion.
<chair part snipped> There is no neat "class" to put it in.
I wouldn't expect it to be in a class. I would expect classes to know how to stick to each other. :)
Clearly you, the reader, can think about this entity, so the human brain supports this fluidity in changing our definitions of objects without requiring a one-to-one mapping to ideal classes to think about them; but a computer language like Smalltalk would have many problems representing this. William Kent, in the book _Data & Reality_, discusses these sorts of problems at length.
I disagree with where the focus is placed here. An entity typically does have just one name and would make sense to be called one thing in the system. What you are describing sounds more like interface protocols. This might be an area that could use more research, but honestly I would want to know what is bought by formalizing this existing practice more (e.g. making protocols first class objects themselves or something).
For an example of what I mean, in case it isn't that clear, we could think about Lists. They have a collection protocol: a series of messages that conform to what other collections can do. But they could also have a "stack" protocol: a series of messages for treating the list as though it were a stack.
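(The same two-protocols point in Python terms, as a small sketch: one list object answers both a "collection" protocol and a "stack" protocol:)

    items = [1, 2, 3]

    # collection protocol: length, membership, iteration
    print(len(items), 2 in items, [x * 2 for x in items])

    # stack protocol: the same object viewed through push/pop messages
    items.append(4)      # push
    print(items.pop())   # pop -> 4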
This could be seen as what we do in real life. Due to necessity I may find myself driving a nail into a piece of wood with a screwdriver. But I would never call what is in my hand a hammer. I would simply be using its "blunt object" interface momentarily.
Thanks, JJ
J J wrote:
One doesn't have to look far for this. C became popular for no other reason than it was "close enough" and we could use it "right now" to build systems. The more advanced languages used resources when there were no resources to use.
Paul Graham has an essay on why languages become popular: http://www.paulgraham.com/popular.html In the case of both C and C++, one should not discount the weight of AT&T, one of the largest and most widespread and visible companies of the time (as it ran a telephone monopoly). Similarly, without the backing of both Sun and IBM, Java might well never have taken off. Clearly Smalltalk was much better than Java in many ways when Java was released: http://www.oreillynet.com/ruby/blog/2006/01/bambi_meets_godzilla.html And, before Java, people were actively converting to it as a "COBOL for the 1990s"; and C++ and Smalltalk were both so different from COBOL that there was no huge difference in ease of understanding either syntax for COBOL programmers; in fact, if anything, Smalltalk was closer to COBOL's use of complete words without arbitrary abbreviations.
"The rise of worse is better" largely misses the point here. The point is: getting 80% today is infinitely better then 100% "some day". And all the rest is just the incredible weight of "backward compatibility".
And the concept of backward compatibility isn't just software and hardware. It extends to workers as well. The average programmer is just not very good (and I don't speak about the worth of the people as human beings; there are just so many in the field with no interest in it other than money, which is totally OK -- it just doesn't make for good programmers). The cost of moving people who are barely keeping up from C++ to Java isn't so bad. It actually makes things simpler: just the same syntax again with much of what they didn't understand taken out. But moving these folks away from a C based syntax is out of the question. And getting rid of them in favor of more talented programmers would be just as out of the question.
Well, it is also true that one big issue is that an Algol-like syntax with operator precedence (times over plus) is taught in K-12 schools. It is a big advantage for a computer language to build on that, even though that precedence is arbitrary and Smalltalk is more consistent. And you are right about how Java seemed an easy move for C++ programmers. Of course, now Ruby seems an easy move for Java programmers (and much of Ruby is based on Smalltalk ideas), so in time we may see Ruby developers making the leap to a more self-documenting and flexible syntax. :-)
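(The precedence point fits in one line; Python follows the school convention, while Smalltalk evaluates binary messages plainly left to right:)

    print(2 + 3 * 4)   # 14 in Python; in Smalltalk, 2 + 3 * 4 evaluates to 20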
Still, Smalltalk syntax was supposedly designed to be easy for kids to learn. It is not that hard to learn the syntax; I've helped people in business learn it. It takes at most a week to become proficient in it (and often just a day). What is hard is to learn all the libraries. But, with more and more programmers learning things like Java or Python or Ruby, all systems with rich libraries (Ruby's being almost exactly Smalltalk's in many ways), making the leap to a new syntax would be a minor investment (and one worth taking, because Smalltalk syntax is more extensible and self-documenting than any of those other languages').
People are changing languages all the time. People have moved to Python; people are moving to Ruby; people have even moved to languages like Perl, which has a much more tortured syntax, or PHP, which has much more limited libraries. People learned HTML out of the blue because they wanted to do web sites, and HTML is a much harder syntax to work in than Smalltalk's in many ways (though you can edit in vi and then see immediate results in your local web browser). So, why are people not moving to Smalltalk (Squeak especially)? People in the Python or Perl or PHP or Ruby camps are not bemoaning "backward compatibility" as the reason for limited success and adoption. While everything you say is true, it is not true enough IMHO to be the main reason. What are the other reasons, and how can they be addressed to produce a popular free Smalltalk?
Again, to contrast with Python, Squeak wants to run the show, but Python plays nice with all the other free tools of the GNU/Linux ecosystem.
I keep on seeing this, but it appears largely overstated. Java has its own VM, threads, etc. as well. And it is easier to connect to the outside world in Squeak than Java, because in Java you are in "you're on your own!" land. In Squeak you always were, so there is no need to be afraid of this step if you need it. In at least Squeak and Dolphin Smalltalk you can call "extern C" style functions directly from Smalltalk (though in Squeak you need to load FFI first). That is at least as good as any of the other languages.
True. Though there can still be a difference in "culture" of the communities surrounding a language. Clearly Smalltalk's (or Squeak's) culture is very different than Python's. I wrote something on that here, in terms of how the cultures of the communities relate to their histories: http://mail.python.org/pipermail/edu-sig/2006-December/007476.html
And if you mean more to address the tools, well yes you *can* edit Java code in vi if you really want to. But no one really wants to. And if your interface to the language is through some program anyway, then the "barrier" of the code not being on the file system disappears.
Well, there is a bigger difference here between Python (which I mentioned) and Java (which you mentioned). Python plays nicer with UNIX-y systems than Java in many ways, mostly because Python is smaller, historically had a faster startup time, and earlier had more comprehensive libraries for interfacing with UNIX-y libraries. My point was more about Python, which is billed as a "glue" language -- something to glue together your C libraries.
Java is different, as you point out. However, Java is so different, and received so much attention, and incorporated so many Smalltalk-pioneered ideas in the JVM design and class libraries (Swing), that ten years after it was introduced, it finally mostly works right as a self-contained environment. Not quite VisualWorks, but darn close in many ways by now, and it is free as in beer and is becoming free as in freedom (GPL). :-)
But for both Java and Python, being able to be easily edited in vi (or emacs) and being able to use a conventional text-oriented version control system were indeed big wins, as they reduced the learning curve and initial commitment to new ideas. Being able to use the familiar file manager to look at code was also of value. And going beyond vi, the fact that Java IDEs started to look like C++ IDEs was another big win on familiarity. And seeing each class in a separate file in the good old reliable file system was also comforting -- at least you knew where your source was, and could use grep or other tools to search and manipulate it and back it up in a familiar fashion. Talks2 shows this is possible -- having a directory of Smalltalk class files. It is possible to generate text files from an image -- any Smalltalk can typically export such classes. And it isn't that hard to export instances as text either (I made something in Python that does it for instances in that language; any Smalltalk could do much the same), which gives you an image defined by textual program code to rebuild a world of objects.
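A toy sketch of that "instances exported as text" idea (all names invented for illustration; this is not the actual code I wrote): each object writes itself out as a line of program text, and executing the text rebuilds the world.

    class Thing:
        def __init__(self, name, mass):
            self.name, self.mass = name, mass
        def export(self):
            # Emit a line of source code that recreates this instance.
            return "world.append(Thing(%r, %r))" % (self.name, self.mass)

    world = [Thing("rock", 142), Thing("chair", 5200)]
    source = "\n".join(obj.export() for obj in world)   # the textual "image"

    world = []      # an empty world...
    exec(source)    # ...rebuilt from program text
    print([t.name for t in world])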
Forcing everyone to work in Smalltalk using Smalltalk tools, as good as they are, means that other innovations developed in other languages with other tools, for example, Java, are lost to the Squeak community.
Um... What innovations in Java?
Extensive, tested, and debugged libraries on a variety of topics.
Many humans become fluent in multiple human languages and their accompanying cultures, which is typically a harder thing than learning new computer languages. If one needs to switch mental gears conceptually to work on the VM, then is it *really* so bad if the VM is written directly in C like GNU Smalltalk does?
Typically? It is harder in every case, no matter how badly designed the programming language.
Well, Spanish to Portuguese might be easier than COBOL to OCaml? But COBOL to OCaml is hard for different reasons than syntax. :-)
And what do you want to gain here? If the Squeak community came out today and said "OK! You can write the Squeak VM in anything you want, we don't care", they wouldn't suddenly get volunteers knocking the doors down to work on Squeak. They would only lose the people who can work on the VM today (not because these people *can't* do it, but because they wouldn't want to anymore).
While I agree that Squeak is not required to be written in a subset of Smalltalk, it *is*, and changing that won't gain anything. Getting Squeak to run on Strongtalk might, but I haven't seen anyone forbidding that.
My point here wasn't that Squeak should change; it was just an example of how being different and staying entirely in Smalltalk might not have been a big win, compared to just having a VM written in, say, C. There remains the "conceptual" barrier of the VM domain, even as the "technical" one of syntax is removed.
I am not against translating a VM from an abstract representation in, say, Smalltalk. I think it is a clever idea, especially since it has already been done. And with some more work, it might even gain the elegance of, say, ANTLR's plugin for Eclipse, or ANTLRWorks, where you can step through the abstraction in an IDE without seeing the underlying code (Java in ANTLR's case). (Maybe Squeak can already do this by now?)
Still, having said that, a Smalltalk VM is so simple -- consider this 47K Public Domain one that does most of the work (from the Java version of A Little Smalltalk, now called SmallWorld): http://budd.eecs.oregonstate.edu/~budd/SmallWorld/Source/SmallObject.java -- so how hard would it be to maintain a Smalltalk VM in the original Java?
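To gesture at how small the core of such a VM really is, here is a toy stack-machine interpreter in a few lines of Python (the opcodes are invented for illustration; this is not SmallWorld's instruction set):

    def run(code):
        stack = []
        for op, arg in code:
            if op == "push":
                stack.append(arg)
            elif op == "add":
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
            elif op == "print":
                print(stack.pop())
        return stack

    run([("push", 2), ("push", 3), ("add", None), ("print", None)])   # -> 5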
Translating primitives into C or Java, like for sound manipulation, seems like a bigger win. But even then, you have to be writing that code (or rewriting that code) in such a non-Smalltalk way semantically that it is still not clear to me if there is a lot of value in it, especially when the alternative might be to just call an existing sound synthesis library written in Java or C. We now have Java as a good cross-platform language with performance equivalent to C++; it would have been a harder choice ten years previously as to what cross-platform language to use if not C, with all its quirks (Free Pascal?).
Right now, I think Squeak on the JVM, which Talks2 is a step towards, could be a really big win for the Squeak community, and translating the VM from an abstract representation (in Smalltalk) to a specific language is a big win there. Still, the VM could have been in any translatable abstraction (XML, Lisp, an ANTLR parse tree from a VM-specific language, Parrot, etc.) and generating Java would still be easily doable (though of course a Smalltalk encoding is preferable for Smalltalkers).
Java isn't the end-all/be-all here. Microsoft is moving to a more dynamic VM already, and because of this Java will be forced to as well. Java has always been behind pre-existing technologies and this area is no different. If you want to move into the future it is best not to follow a group that is always behind.
The value of Squeak on Java is a separate issue. The value is mostly to be able to reduce deployment overhead, especially for systems that mix Smalltalk and faster native-y code written in Java or another JVM language; Talks2 already did a lot of this work.
But here again is an issue of culture. Who cares if Sun is "behind"; or if Squeak runs 30% slower without some extra dynamic dispatch opcode in the JVM? Speed is not Squeak's main problem. Being able to leverage Sun's JVM and the fact that you can call AWT classes in the same way for any platform Java runs on is a big win for Squeak IMHO, as it would reduce the maintenance burden of it in terms of complexity of the common code base, and would also make it easy to install one common package for any platform Java runs on. Ten years ago, or even five, I myself would have laughed at the value of this idea (as Java was so buggy and unstable and slow). But most of the bugs have been fixed, the 1.5 JVM shares memory across JVMs and does dynamic translation for speed, so Java finally, now that it is going free under the GPL, has the potential to be a great cross-platform tool where you get both a common base GUI window system as well as the ability to deliver fast primitives written in Java, as well as access to a lot of libraries someone else has already written and debugged for you.
The Squeak community could admit that it would be a big win to leverage that "pink plane" success, even if it is "behind" and decide to move forward on top of it, but in other "blue plane" directions. Or it can continue to spend a lot of time dealing with time consuming basic issues relating to packaging and testing C code for lots of platforms (which essentially just duplicates the work the Java community is doing, but not as well because of more limited people power).
.NET is a non-starter because it is proprietary (and may be covered by patents). And I would not make this suggestion without basing it on Sun's move to the GPL for Java. There are several JVM Smalltalks already, of course: http://www.robert-tolksdorf.de/vmlanguages.html But none have the power of Squeak. And, building on Squeak's strengths, it could be an opportune time to also shake off licensing problems, say by carefully comparing with and using GNU Smalltalk code when possible, or by using an approach like Bistro's to leverage Java libraries temporarily until replacement versions in Smalltalk could be written in a true "clean room" fashion.
But the bigger point, along the lines of this main "revisited" thread, is that building on others work in a comprehensive way, like having a Squeak on top of Java, even though it has been done somewhat with the excellent Talks2, is something that goes against the grain of the community (and quite possibly to its disadvantage).
Python, by contrast, runs on the JVM, using Jython, and has great integration with Java. It has issues, and lags the main release, but overall it is production quality (at least in earlier releases); and since Java is such a difficult language to develop in because it is so verbose with braces and passing through exceptions and types and such, Jython may well be the big thing that makes Java continue to succeed. :-) From: http://www.jython.org/Project/index.html " Jython, lest you do not know of it, is the most compelling weapon the Java platform has for its survival into the 21st century:-) —Sean McGrath, CTO, Propylon"
Why not have Squeak in that role too? But the deeper question is, why is it not there already, and why has, say, Talks2 not gotten more effort behind it? I think that has more to do with community issues and licensing issues than technology issues. (I myself would build on Talks2 right now, except it is stuck in the same licensing ambiguity Squeak is; I'm hoping that when Squeak gets that cleared up for itself, Talks2 might follow.)
=== objects are illusions, but useful ones ===
[snip]
To me this was the most insightful point in the whole essay. Though, honestly I thought this was pretty well understood. Object Orientation is simply a way of organizing code in a way that makes sense from the perspective of the problem domain it is related to. But since programming is a task of managing complexity, correct organization is a critical piece of the puzzle.
When one thinks deeply about this, perhaps your point about organization is the big missing piece of the puzzle. Yes, you are right, people build models of systems with objects, and should admit those models are imperfect. But there is no formal support for this process in the environment, or between people, other than using basic Smalltalk tools (Browser, Debugger, maybe Refactoring tools). Well, I guess you could use one of the formal OO modelling approaches, like CRC cards, but even that is oriented to getting one model -- not to managing a variety of possible representations to be used simultaneously as appropriate. Perhaps a next generation of OO systems needs to explicitly support this process somehow. How, I do not know; I just have the question here, not the solution. I do think having objects point to a context or world is perhaps a start, and I did that in a couple of frameworks I have made in either Python or Smalltalk.
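The kind of thing I mean by objects pointing to a context is simple to sketch (a minimal Python illustration with invented names; not the actual frameworks I mentioned):

    class World:
        def __init__(self, name):
            self.name, self.objects = name, []
        def add(self, obj):
            obj.world = self            # every object knows its context
            self.objects.append(obj)
            return obj

    class Thing:
        def __init__(self, label):
            self.label = label

    kitchen = World("kitchen-sink image")
    chair = kitchen.add(Thing("chair"))
    print(chair.world.name)             # the object's meaning is scoped to a world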
But this observation is the reason OO databases haven't really taken off: an OO database will tend to model things how *your* application wants to see them. A traditional relational DBA will model things in the most generic way he can so that *all* the applications can build the view they need easily. Relational DBAs tend to be of the viewpoint: the data will exist for the life of the company, while the applications that access it come and go like the tide. And one only needs to look at the huge Java rewrites going on to know they are right.
Good point.
Consider Dan's statement of "A computer language should support the concept of "object" and provide a uniform means for referring to the objects in its universe." That appears to me to have made a classical mistake of thinking the universe has only one parsing into one object hierarchy and that the objects exist in some sort of Platonic ideal.
Actually I think this applies more to C++-derived OO languages (e.g. Java). It is those languages that have huge hierarchies of things that are not that related, due to the brain-dead typing systems. In Smalltalk the only hierarchy that has to be is inheriting from Object. And you don't even have to do that.
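For example, in Squeak the chain can bottom out almost immediately. ProtoObject sits above Object with a nil superclass, and you can subclass it directly; a quick sketch to try in a workspace:

    ProtoObject subclass: #MinimalThing
        instanceVariableNames: ''
        classVariableNames: ''
        poolDictionaries: ''
        category: 'Example-Minimal'.

    ProtoObject superclass.   "answers nil -- the hierarchy really does end here"

So there is no forced, deep taxonomy of the universe; the tree is as shallow as you care to make it.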
But I think this works just fine: We are choosing to code something, so we have to model it in the point of view appropriate to how we are going to solve the problem. And this implies some organization technique. And among organization techniques, (correct) OO has had the most success in my opinion.
All true, but if you look at how people teach OO, and how people talk about it, especially in Smalltalk circles, I think the community and its culture is somehow at odds with a greater flexibility. It's hard for me to detail this precisely; it more has to do with tone. Certainly I like the Smalltalk approach; it is just not enough.
<chair part snipped> There is no neat "class" to put it in.
I wouldn't expect it to be in a class. I would expect classes to know how to stick to each other. :)
Clearly you, the reader, can think about this entity, so the human brain supports this fluidity in changing our definitions of objects without requiring a one-to-one mapping to ideal classes to think about them; but a computer language like Smalltalk would have many problems representing this. William Kent, in the book _Data & Reality_, discusses these sorts of problems at length.
I disagree with where the focus is placed here. An entity typically does have just one name and would make sense to be called one thing in the system. What you are describing sounds more like interface protocols. This might be an area that could use more research, but honestly I would want to know what is bought by formalizing this existing practice more (e.g. making protocols first class objects themselves or something).
For an example of what I mean, in case it isn't that clear, we could think about Lists. They have a collection protocol: a series of messages that conform to what other collections can do. But they could also have a "stack" protocol: a series of messages for treating the list as though it were a stack.
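In Squeak that looks something like this (a minimal sketch; OrderedCollection already answers both sets of messages):

    | list |
    list := OrderedCollection new.

    "collection protocol"
    list add: 1; add: 2; add: 3.
    list collect: [:each | each * 2].    "an OrderedCollection(2 4 6)"

    "stack protocol: the same object seen through different messages"
    list addLast: 4.      "push"
    list removeLast.      "pop -- answers 4"
    list last.            "top -- answers 3"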
This could be seen as what we do in real life. Due to necessity I may find myself driving a nail into a piece of wood with a screwdriver. But I would never call what is in my hand a hammer. I would simply be using its "blunt object" interface momentarily.
That is one of the reasons I have been attracted to Prototypes, which attempt to address issues like: http://en.wikipedia.org/wiki/Self_programming_language "Experience with early OO languages like Smalltalk showed that this sort of issue came up again and again. Systems would tend to grow to a point and then become very rigid, as the basic classes deep below the programmer's code grew to be simply "wrong". Without some way to easily change the original class, serious problems could arise"
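You can even sketch the prototype idea inside Smalltalk itself, using doesNotUnderstand: to delegate to a parent object, Self-style. This is a toy illustration only (the Proto class and its protocol are made up here, and slots hold plain values, ignoring message arguments):

    Object subclass: #Proto
        instanceVariableNames: 'parent slots'
        classVariableNames: ''
        poolDictionaries: ''
        category: 'Example-Prototypes'.

    Proto>>initialize
        slots := IdentityDictionary new.

    Proto>>parent: aProto
        parent := aProto

    Proto>>at: aSelector put: aValue
        "Add or override a slot at run time -- no class edit required."
        slots at: aSelector put: aValue

    Proto>>doesNotUnderstand: aMessage
        "Answer our own slot if we have one; otherwise delegate up the parent chain."
        (slots includesKey: aMessage selector)
            ifTrue: [^ slots at: aMessage selector].
        parent isNil
            ifFalse: [^ parent perform: aMessage selector withArguments: aMessage arguments].
        ^ super doesNotUnderstand: aMessage

With something like this, "fixing" a deep object means changing a slot in a parent at run time, rather than rewriting a class everyone inherits from.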
But prototypes have other problems, the biggest being the difficulty of being self-documenting the way classes are. As someone put it to me, "if you want to share something, you probably have to name it".
Formalizing protocols is one possible idea. Dan mentions it in his paper.
I think the solutions to this issue lie in deeper directions. Classes or instances or prototypes could be building blocks, perhaps, but we could use other abstractions and better tools somehow. What these are, I do not know for sure. Still, like Bill Kent, I think these may lie in the direction of being able to model "relations" somehow.
--Paul Fernhout
On Tue, 26 Dec 2006 05:52:26 -0800, Paul D. Fernhout pdfernhout@kurtz-fernhout.com wrote:
In the case of both C and C++, one should not discount the wight of AT&T,
http://en.wikipedia.org/wiki/Wight
Indeed. :-)
Well, it is also true that one big issue is that an Algol-like syntax with operator precedence (times over plus) is taught in K-12 school. It is a big advantage for a computer language to build on that, even though that precedence is arbitrary and Smalltalk is more consistent.
I learned operator precedence in programming, not in math. I'm sitting among college graduates--in the IT department--right now who give me a blank stare when I say "operator precedence". One guy knows it has to do with parentheses. My favorite (tongue-in-cheek) response was "That means the user comes first." (And as a professional programmer, my rule has always been: Don't count on your ability to remember operator precedence. C++ has, what, 17 levels of precedence?)
I guess my point is, I don't consider "operator precedence" to be a significant advantage. Smalltalk works the way I think; I have to actively (admittedly easily at this point) allow for operator precedence. And I don't bury my Smalltalk code in parentheses, yeah!
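To spell out the difference for anyone following along: in Smalltalk, binary messages evaluate strictly left to right, so

    2 + 3 * 4.      "20 in Smalltalk: (2 + 3) * 4"
    2 + (3 * 4).    "14 -- parenthesize when you really do want times first"

whereas a C-family language answers 14 for the unparenthesized form, because * binds tighter than +. One simple rule versus a table of precedence levels.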
And you are right on how Java seemed an easy move for C++ programmers.
The prevalence of "C-like" syntax has convinced me over the years that C programmers are wusses. They apparently won't try anything that doesn't look like something they already know.
===Blake===
From: "Paul D. Fernhout" pdfernhout@kurtz-fernhout.com Reply-To: The general-purpose Squeak developers listsqueak-dev@lists.squeakfoundation.org To: The general-purpose Squeak developers listsqueak-dev@lists.squeakfoundation.org Subject: Re: Design Principles Behind Smalltalk, Revisited Date: Tue, 26 Dec 2006 08:52:26 -0500
Paul Graham has an essay on why languages become popular: http://www.paulgraham.com/popular.html In the case of both C and C++, one should not discount the wight of AT&T, one of the largest and most widespread and visible companies of the time (as it ran a telephone monopoly).
Well I realize this had something to do with it as well. But I still think the "have something now" played the biggest role. And even with the conversion from COBOL, C++ was the most widely used language for a while. If you go Smalltalk then you have to train your existing folks, but if you go with C++ or Java you can leverage that huge base of programmers.
Well, it is also true that one big issue is that an Algol-like syntax with operator precedence (times over plus) is taught in K-12 school. It is a big advantage for a computer language to build on that, even though that precedence is arbitrary and Smalltalk is more consistent.
I actually find the operator precedence irrelevant. Personally I always ignored it and wrote the expression to read left-to-right, how I wanted it evaluated. And I did this in school as well, when I was learning it. I was accustomed to the left-right orientation; why learn a new one that applies in just one area?
And you are right on how Java seemed an easy move for C++ programmers. Of course, now Ruby seems an easy move for Java programmers (and much of Ruby is based on Smalltalk ideas), so in a matter of time, we may see Ruby developers making the leap to a more self-documenting and flexible syntax. :-)
Let's hope!
Still, Smalltalk syntax was supposedly designed to be easy for kids to learn. It is not that hard to learn the syntax. I've helped people in business learn it. It takes at most a week to become proficient in it (and often just a day). What is hard is to learn all the libraries. But, with more and more programmers learning things like Java or Python or Ruby, all systems with rich libraries (Ruby's being almost exactly Smalltalk's in many ways), making the leap to a new syntax would be a minor investment (and one worth taking, because Smalltalk syntax is more extensible and self-documenting than any of those other languages').
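A quick illustration of what "self-documenting" buys you at the call site, in plain Squeak:

    | d |
    d := Dictionary new.
    d at: #color put: 'blue'.       "each argument is named by a keyword"
    'squeak' copyFrom: 2 to: 4.     "answers 'que' -- compare substring(1, 4),
                                     where you must remember what each position means"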
You are right, the libraries are a challenge to learn. But this is so in any language that has any code for it at all. For example, nearly any Haskell tutorial you look at mentions the countless times the author has rewritten something that was in the standard (never mind external) libraries.
So, why not people moving to Smalltalk (Squeak especially)? People in Python or Perl or PHP or Ruby camps are not bemoaning "backward compatibility" as the reason for limited success and adoption.
But keep in mind, Python is in the same boat as Java. I.e., if you know C++, it's not much of a jump to learn. And Perl is basically just bash/ksh++. It is a language by and for sysadmins (of which there were/are a great many). So if you had already been fighting with *sh all this time, Perl wasn't such a departure.
While everything you say is true, it is not true enough IMHO to be the main reason. What are the others, and how can they be addressed to produce a popular free Smalltalk?
Honestly, I think we are just in the age of "the killer app". You have to have something everyone needs and no other language provides, to draw people to you. After that, the other things start to matter.
The things I see that squeak needs to address are: database access (I hear this is being worked on by Alan Knight), advanced JIT-ish technology (I'm expecting huge things from Exupery in this area), better thread handling and the ability to take advantage of native threads (don't know the status of this), modularity (I think Ralph Johnson will bring us a long way with this one).
Java is different, as you point out. However, Java is so different, and received so much attention, and incorporated so many Smalltalk-pioneered ideas in the JVM design and class libraries (Swing), that ten years after it was introduced, it finally mostly works right as a self-contained environment. Not quite VisualWorks, but darn close in many ways by now, and it is free as in beer and is becoming free as in freedom (GPL). :-)
I think going with the GPL was actually a very bad move. As far as I know, the only people who are going to care about that will be people on the fringe (who are, of course, irrelevant). They could have gone with BSD or MIT instead.
But for both Java and Python, being able to be easily edited in vi (or emacs) or being able to use a conventional text oriented version control system were indeed big wins, as they reduced the learning curve and initial commitment to new ideas. Being able to use the familiar file manager to look at code was also of value. And going beyond vi, the fact that Java IDEs started to look like C++ IDEs was another big win on familiarity. And seeing each class in a separate file in the good old reliable file system was also comforting -- at least you knew where your source is, and could use grep or other tools to search and manipulate it and back it up in a familiar fashion.
Well these are good points, but I can tell you, after being in Smalltalk for a while and then doing some Python at work, I had a feeling of panic. I realized I was going to have to *do something* if I wanted revision control. In Squeak it just happens, and I never think about it unless I need to revert.
The grep and all the things you mentioned are a matter of training, but yes, a project to narrow the gap until people get off the crutches probably wouldn't hurt.
Extensive, tested, and debugged libraries on a variety of topics.
Well, first I would call this "work", not innovation. :) If Smalltalk had the number of bodies working on it that Java has had, we would have cured cancer by now.
And this isn't only a smalltalk compatibility issue. *No one* that isn't on the JVM can use these.
Typically? It is harder in every case, no matter how badly designed the programming language.
Well, Spanish to Portuguese might be easier than COBOL to OCaml? But COBOL to OCaml is hard for different reasons than syntax. :-)
Actually there is a story somewhere of a student who learned fluent Spanish and then took a trip to Brazil. He figured it was close enough, and it was. He adjusted very quickly. But he has a terrible time speaking Spanish now, since the languages *are* so close. Human language is just incredibly complicated. You really do, as you mentioned, have to absorb some of the culture as well to get fluent.
My point here wasn't that Squeak should change; it was just an example of how being different and staying entirely in Smalltalk might not have been a big win, compared to just having a VM written in, say, C. There remains the "conceptual" barrier of the VM domain, even as the "technical" one of syntax is removed.
I know what you mean. I just think in this case it doesn't really apply. If the VM had been written in C then the people who worked on it probably wouldn't have. And that would have just resulted in no squeak at all.
Translating primitives into C or Java, like for sound manipulation, seems like a bigger win. But even then, you have to be writing that code (or rewriting that code) in such a non-Smalltalk way semantically that it is still not clear to me if there is a lot of value in it. Especially when the alternative might be to just call an existing sound synthesis library written in Java or C. We now have Java for a good cross-platform language with equivalent to C++ performance, so it would have been a harder choice ten years previously as to what cross-platform language to use if not C with all its quirks (Free Pascal?).
Well I think projects like Exupery will be important here. If it works out, then Smalltalk code will be able to compete with C or Java in many cases. And Java is no better as a cross-platform language than Smalltalk is. They are both running on VMs.
Now if we ignore Java the language and consider the VM itself as the computer that our languages compile to, then OK. But I don't think Java has the best VM. The problem is, the Java VM is made for a static language, and dynamic languages have to be bolted on. Microsoft had the same problem since they just wanted to basically fork Java. But as I understand it, both are moving toward the idea of having the VM be for dynamic languages and building static languages on top of that. If that is the case then something like Strongtalk is already ahead of the game.
But here again is an issue of culture. Who cares if Sun is "behind"; or if Squeak runs 30% slower without some extra dynamic dispatch opcode in the JVM? Speed is not Squeak's main problem. Being able to leverage Sun's JVM and the fact that you can call AWT classes in the same way for any platform Java runs on is a big win for Squeak IMHO, as it would reduce the maintenance burden of it in terms of complexity of the common code base, and would also make it easy to install one common package for any platform Java runs on. Ten years ago, or even five, I myself would have laughed at the value of this idea (as Java was so buggy and unstable and slow). But most of the bugs have been fixed, the 1.5 JVM shares memory across JVMs and does dynamic translation for speed, so Java finally, now that it is going free under the GPL, has the potential to be a great cross-platform tool where you get both a common base GUI window system as well as the ability to deliver fast primitives written in Java, as well as access to a lot of libraries someone else has already written and debugged for you.
The Squeak community could admit that it would be a big win to leverage that "pink plane" success, even if it is "behind" and decide to move forward on top of it, but in other "blue plane" directions. Or it can continue to spend a lot of time dealing with time consuming basic issues relating to packaging and testing C code for lots of platforms (which essentially just duplicates the work the Java community is doing, but not as well because of more limited people power).
Well those are all good points.
dot net is a non-starter because it is proprietary (and may be covered by patents). And I would not make this suggestion without basing it on Sun's move to the GPL for Java. There are several JVM Smalltalks already, of course.
Don't make the mistake of assuming the world is how we really *want* it to be. dot net is getting more popular all the time and may end up beating Java in the end. Linux has been GPL from the start, but Windows is still the king of the desktop and growing in the server realm.
And I would be hesitant *because* of Sun using the GPL. The license has the reputation for being viral (even if it isn't anymore) and therefore many companies avoid it. For example, I work for a very large company which will only allow GPL code to be used in isolation (i.e. as a stand-alone program), never as something to build on top of, for fear of having to give away trade secrets.
A license that says you *must* make source code available isn't any more free than what Microsoft provides. It is just restricted in a different way.
http://www.robert-tolksdorf.de/vmlanguages.html But none have the power of Squeak. And, building on Squeak's strengths, it could be an opportune time to also shake off licensing problems, say by carefully comparing with and using GNU Smalltalk code when possible, or by using an approach like Bistro to leverage Java libraries temporarily until replacement versions in Smalltalk could be written in a true "clean room" fashion.
Well, gcc can optionally output XML instead of assembly code. I wonder about using something like this to convert C projects to Smalltalk directly. And this may work for all the gcc compilers (e.g. Java).
Why not have Squeak in that role too? But the deeper question is, why is it not there already, and why has, say, Talks2 not gotten more effort behind it?
It is a matter of people time. Everyone always asks "why hasn't <the project I'm interested in> gotten more effort behind it?". There is no free effort left to get behind it. And these questions won't inspire like a Braveheart speech before a battle. All I can suggest is: if you believe in it, get behind it. Hopefully others will follow, but don't expect it. I am personally looking at ways to pay to get work done in Squeak that I want to see done but don't have the time to do. Maybe with rentacoder.com or something.
And I think that issue has more to do with community and licensing issues than technology issues. (I myself would build on Talks2 right now, except it is stuck in the same licensing ambiguity Squeak is; I'm hoping that when Squeak gets that cleared up for itself, Talks2 might follow).
I did a quick look at the Talks2 page and it said you are granted all the rights to anything you write on it, to sell or not sell. What is ambiguous about that? It sounds as free as it gets to me.
Thanks, J
On Dec 26, 2006, at 3:18 , J J wrote:
Again, to contrast with Python, Squeak wants to run the show, but Python plays nice with all the other free tools of the GNU/Linux ecosystem.
I keep on seeing this, but it appears largely overstated. Java has its own VM, threads, etc. as well.
Yes, Java. I think Python is very different from Java in this context, as Java also wants to run the show, and I think this is where it is quite similar to Smalltalk. Python on the other hand is quite happy to play along with others, just like Ruby, Perl and, of course, C.
[more java comparison]
And if you mean more to address the tools, well yes you *can* edit Java code in vi if you really want to. But no one really wants to. And if your interface to the language is through some program anyway, then the "barrier" of the code not being on the file system disappears.
Once again, I think that Java is not a valid substitute for Python in this context. In my experience, hacking Python or Ruby in vi is not just doable but quite useful. I can't say the same for Java.
Marcel
The reason I chose Java to use as a reference was because by every chart I have seen it is by far the most used language and therefore an example of "success" (well, and it fits my arguments better :). And what I mean by "success" is, regardless of what you or I think of the language, the number of people using it have validated some of the ideas that were questioned in the original email.
Paul,
if you haven't seen it already, you might find the description of the Us variation of Self (which makes it a kind of Smalltalk in my view) interesting:
http://citeseer.ist.psu.edu/5049.html
You might also want to see Dan's Squeak-on-JVM (I don't have Java here, so I can't test whether this URL still works):
http://Weather-Dimensions.com/Dan/ForwardToThePast.jnlp
Note that some people were interested in getting Squeak to run on top of Strongtalk's VM, and that would match your request for a VM not written in Smalltalk (the Strongtalk VM is written in C++).
While I can see how Python might seem like a success to be emulated from a Squeaker's viewpoint, I am sure the notion would seem very funny to a VisualBasic programmer.
I think there is still an important role for systems which can be understood by a single person (an entirely separate issue from making it easy or not to collaborate). There are huge advantages to being able to just pick up black boxes created by other people, but there are also costs: such systems tend to grow exponentially in size for linear gains in functionality.
-- Jecel
Hello Paul,
Interesting article. I am still looking at the linked material.
Have you looked at Lua?
A very interesting language that I like a lot. It is very small, has some Smalltalk-like characteristics, but is a prototype language. Actually it seems to handle various paradigms quite well. It has objects, prototypes, modules, closures, coroutines, tail recursion, etc., and a very clean and small implementation in C. Very, very portable, embeddable, and very fast. Compare it to Ruby, Python and GST on Alioth. It integrates easily and well with libraries written in C, etc. It is the scripting language built into SciTE.
Its biggest weakness, for me and many, is that it does not currently have a rich set of libraries. But I believe that is very doable. I would love to see Lua with a rich set of libraries like Python's. But it is not without libraries. One would just need to see if it has what one needs.
I much prefer it as a language to both Python and Ruby. Its syntax is clean and nice. I find it easier to think in Lua than either Python or Ruby. And as one who is not a computer professional, the fact that I can read the book over and over is a big plus. I finally understand closures. :) (at least as presented in PIL2)
I am hoping that someday soon libraries could be written for Lua so it has as rich a system as Python or Ruby. A rich Lua system arriving around or before Python 3, Ruby 2, etc. would provide for an interesting alternative. And I don't believe it would have to equal Python's or Ruby's to be an excellent alternative to Python or Ruby.
It isn't perfect. But I think its problems are definitely fixable.
Lua is MIT licensed. So it is compatible with anything you want to do.
Just wanted to toss that out there. Didn't quite mean to get so evangelistic on the Squeak-list. But oh well. I love Squeak. But I've learned to love Lua also. I just tolerate Python, no love. :)
Again thanks for the essay.
Jimmie Houchin Homeschooling father of 9 ;) Yes I read your writings on edu-sig.
I've looked at Lua a little, but I really like Smalltalk syntax. :-) It seems there are several prototype-based systems (including Io the language) but they all seem to start out thinking Smalltalk (or Self) keyword syntax is a problem, whereas for me I see it as a solution.
All the best. You might find our free software, especially the PlantStudio program, of interest for homeschooling for your kids. :-) That's the biggest thing I want to port to a dynamic language like Squeak or Python (from Delphi).
--Paul Fernhout
Jimmie Houchin wrote:
[snip]
Paul D. Fernhout wrote:
I've looked at Lua a little, but I really like Smalltalk syntax. :-) It seems there are several prototype-based systems (including Io the language) but they all seem to start out thinking Smalltalk (or Self) keyword syntax is a problem, whereas for me I see it as a solution.
All the best. You might find our free software, especially the PlantStudio program, of interest for homeschooling for your kids. :-) That's the biggest thing I want to port to a dynamic language like Squeak or Python (from Delphi).
I too love the Smalltalk syntax. It enabled me to learn Smalltalk rapidly. The environment kept me comfortable as I learned the libraries and gave me tremendous opportunities to learn. But the syntax is attached to Smalltalk. Now that isn't a bad thing necessarily. But you are looking at options which are not Smalltalk, including Python and Ruby, which were mentioned in your essay. And I know you have worked extensively with Python, even tho' seemingly with angst. :)
In the options category I offered Lua. Lua is as much like Smalltalk as some alternatives, and more than others. 1-based indexing. Yeah! Blocks. do ... end. No, it isn't Smalltalk, but if non-Smalltalks are an option, then consider Lua. :) I think it is much closer to the philosophy of Smalltalk and Self, and also to their syntax, than either Python or Ruby. Now, I know you have a significant investment in Python. But if alternatives are an option, consider improving Lua. :)
If not, then please tell us: what lacketh an Apache-licensed Squeak for you to use it? What compels you to look beyond Squeak? And if you look beyond Squeak, then the Smalltalk syntax isn't an option. As an option Lua is nice. But it does desire libraries.
Unfortunately, I can't try anything that's not Mac OS X or Linux. I don't do Windows. So until your software is ported, I can't try it.
Portability is one of the beautiful things about Squeak. Lua also is incredibly portable. But graphics depend on the specific library. But of course you're accustomed to that. ;)
Squeak and/or Lua are very nice options.
Options are good.
Jimmie
Jimmie-
I'll agree options are good; on the other hand, there is only so much time for exploring them (especially when you have kids, though older ones can do some of the exploring for you, I guess. :-)
You're not the first person to recommend Lua to me, so I've installed a later version, and will look some more at it. In the past I have evaluated Lua mainly from the documentation, from the point of view of being a VM to put other Squeak-like work on, back when I was also looking at Parrot (and the JVM comes out higher for me as a value proposition, now that Sun has announced its move to GPL for Java code).
Just out of musing, let us consider this issue of picking what system to build on abstractly out of all the known options.
There is a such a thing as a language or programming system being "entrenched" in a company (or even in a programmer's toolbox in some sense). (Entrenched literally meaning "dug in" or "in a trench"). http://wordnet.princeton.edu/perl/webwn?s=entrenched
New things rarely start out "entrenched" -- unless, in a few special cases, they have, say, lots of sales and marketing money behind them (Britney Spears? :-) or Java marketed by Sun and IBM). Or they somehow spin off from another already entrenched group (say, Microsoft taking the torch from IBM when IBM carelessly picked an OS for its internally unimportant IBM PC, or Java gaining ground being linked with Netscape browsers, or Java syntax looking a lot like C++). So, as an unusual case, Java started out "entrenched" in several ways, and only got more so, as it was actually used over the past decade (very painfully at first). That's one reason Smalltalk did not have much of a chance against Java in 1996 -- Java was one of those rare systems that was entrenched multiple ways from the start, and Smalltalk had missed its chance to stop Java (and the major Smalltalk player, ParcPlace etc., actually embraced Java for a time and tried to reposition itself as a Java tools company!)
Now, most systems do not go from nonexistent to entrenched overnight. Even Java had various steps internally from VisualWorks/ST-80 (runtime fees too high! :-) to Green to Oak to Java. Yes, I say Smalltalk, because from what rumors I hear, Sun did want to use Smalltalk for the set top box first, but was rebuffed out of greed. You see there that a somewhat-entrenched Smalltalk had a chance to nip Java in the bud, but a failure of management vision coupled with short-term commercial greed got in the way IMHO. And all Smalltalkers have paid the price -- both for that failure of vision but also for our own personal decisions to couple our fortunes (money wise and aesthetics wise) to a commercial vendor. That is the main reason why the personal copy I purchased of VisualWorks + ENVY Client + ENVY Server (at almost $10000 all together) sits gathering dust on a shelf and instead I use Python and explore Squeak or other free Smalltalks; I just don't want to be on that path anymore -- entrusting my fate mainly to someone else's greed; it has brought me and many other Smalltalkers years of pain of having to work in other languages and related environments like Java (or, even, to a lesser extent, Python, when you know what a programming environment could be like). Maintaining large complex business apps in Smalltalk was often fun; maintaining them in Java is mostly work. (Python is somewhere in the middle.)
The usual progression to becoming "entrenched" is probably somewhat like this:
* ignored.
* experimental (you downloaded it and are playing with it).
* first useful task (you actually did something with it, probably where you don't need to maintain the code, like reformatting a text file).
* multiple minor useful tasks.
* first major task (you did something very important or profitable with it).
* multiple major tasks that need to be supported.
By the last phase, the tool has become "entrenched".
I think an important "design principle" should be that a system should easily move from one stage here to the next. :-) Because, all other things being equal, ultimately, entrenched systems are easier to use. :-)
Things can also become "un-entrenched" when the applications they support get replaced or diminish in importance. VisualWorks was entrenched in many large companies in the early 1990s; it is less so now, even as it remains a niche.
While people often talk about "the industry", in reality the programming industry is a lot like a big city, with lots of niches and variety. Sure there is a big Java convenience store on every corner, but that does not mean there is not a nice Smalltalk boutique on 5th Avenue doing a brisk business in dynamic objects. So, being entrenched is often relative to the person or company under consideration (if you live on "5th Avenue" in this hypothetical programming city, Smalltalk is then a lot more convenient than Java).
For Lua or Squeak to make it all the way to entrenched for me personally, it needs to make it through all those phases. Python has already done that for me, and so is "entrenched" for me. However, in the past, VisualWorks was entrenched, just like Delphi, C, and some other programming systems, which have all fallen out of daily use with me for one reason or another. Squeak itself was even almost entrenched for me about six years ago (but then lost out to Python as I found I could convince people in industry to try Python, whereas most people would not even look at Squeak, both for licensing reasons and also for syntax issues).
To progress from ignored to experimental, through useful, and then to entrenched, is not an easy thing. Tools either need to be so easy to install and use (Python, the programming language?) in terms of low hurdles at each stage that you can make it over one hurdle to the next easily, or they need to be so powerful or fun to use (Squeak, the idea processor to boost creativity?) that people are motivated to jump higher hurdles. Ideally, things are both powerful and easy to install, use, deploy, and support; unfortunately Squeak doesn't fall into this category for the masses (being powerful but quirky and unstable and incomplete and still problematically licensed).
For me, while I'll still consider learning other systems, Python (mostly Jython these days) is the entrenched tool with an Algol syntax. Lua just has no hope I can see of displacing Python for me anytime soon (barring external forces or some new need I come up with). Sure, I just downloaded the latest version and will experiment with it (so, it has made it to the first phase beyond "ignored"), but it has a long road to go. And it is not clear to me that it offers a big enough value proposition to move forward for me. Granted, if say "COBOL" or "C++" was what was entrenched for me, then Lua might have a level playing field with Python and Ruby and Smalltalk (ignoring user community size).
This is not a slur on the value of Lua. This is just to say that to compete head-to-head against an entrenched alternative (say like Python/Jython vs. Java in many businesses today) a programming language has to offer not just 10% more value (1.1X), but more like 300% more value (3X). Python (as Jython) does offer that 300% over Java for most Java developers who will use it. Lua does not offer 300% more value to me than Python (at least as far as I can see right now; time will tell). No language is perfect; but some meet current needs better than others; as needs change, so does the landscape of sensible possibilities.
Squeak's big value proposition is that it is four things:
* an effective self-documenting programming language,
* a set of cross-platform libraries and runtimes,
* a complete IDE, including a fantastic debugger and source code control,
* an idea processor for enhancing creativity (its ultimate purpose).
And that is why I find Python, or for that matter probably Lua, unsatisfying in contrast. Python has the first, being a language that has been called "executable pseudo code", for sure, but the other three areas are where it starts to falter and then fall short. For example, it's difficult to alter a running Python program, and almost impossible to restart a function, which is easy to do in Squeak and a big part of Smalltalk productivity -- especially in large programs. I had no problem maintaining huge things in VisualWorks+ENVY or participating in a development process involving lots of people (many of them relatively inexperienced), but even a mid-sized Python program developed by one or two experienced people begins to get a bit of a bear to maintain and refactor and incrementally improve (not impossible, just harder than I know a similar application in Smalltalk would be).
However, issues like familiarity, stability, completeness, modularity, and licensing have trumped those other Squeak advantages (for me), which is why I usually turn to Python. Most projects I do are not that huge anyway, or can be refactored into smaller parts. So, to be a better value proposition, Squeak (or Lua :-) either has to become an even more compelling "idea processor" (e.g. even beyond OpenAugment), or gain those other advantages Python has, or really *now* needs to do both, since for me Python is already "entrenched" and Squeak now needs to be 300% better to compete against it. Ten years ago, Squeak could have been as good as Python; now it needs to be vastly superior. "Just as good" is not "good enough" when the alternative is already "entrenched". Granted, however, some specific issues about Squeak (licensing, GUI feel, internal complexity) have kept me from giving it much of a chance to grow on me as an idea processor.
Now, I'm only bothering to write this not to point out Squeak-ish competitive disadvantages by themselves, but because I'm willing to put a little work into those directions -- since I still believe in a lot of the Smalltalk ideals (and remember its accomplishments when I used it) and still prefer keyword syntax (though I still have to trade that off against getting other stuff done now with Python). Perhaps a Squeak 1.1 under the Apache license on the JVM, the same way Dan did a Squeak 2.2, http://weather-dimensions.com/Dan/ForwardToThePast.jnlp might be a start. Then I could leverage Java's ability to provide some of those other things (stability, after ten years of Sun working on it; ease of installability, one-click web start; and so on). Essentially, an "idea processor" for the JVM? Unfortunately, Squeak 1.1 reaches so far back (and I remember the early problems with 1.13) that there is a lot of work to bring 1.1 back up to something really usable. And even then that system might be in the same position Lua is in now: great, but not "entrenched". :-)
And, as for your question on license, what matters to me in that regard is being able to put GPL'd applications on top of the platform. (I don't care that much about the licensing of the platform otherwise, as long as it is "free"). If that is possible with an Apache version of Squeak, that's fine. If it is not, then that is a big difficulty. I like the GPL as a constitution for defining cooperation on an application and have used it before with success to that end. About half the free software out there is under the GPL, another big chunk is under a GPL compatible license (X/MIT or BSD), and then a smaller part is GPL-incompatible (Apache, etc.). http://www.fsf.org/licensing/licenses/ And it's a sad situation, as the FSF itself writes about the Apache 2.0 license: "This is a free software license but it is incompatible with the GPL. The Apache Software License is incompatible with the GPL because it has a specific requirement that is not in the GPL: it has certain patent termination cases that the GPL does not require. (We don't think those patent termination cases are inherently a bad idea, but nonetheless they are incompatible with the GNU GPL.)" Here is Apache's take on this: http://www.apache.org/licenses/GPL-compatibility.html Anyway, it's still not clear to me what the licensing issues are for an Apache-licensed Squeak in terms of derived works built on top of it, and then how that interacts with added GPL code. But, after having been burned before on this, I prefer to know exactly what the most likely licensing implications are up front before committing a major effort to something (so I don't find out afterwards I can't release it).
--Paul Fernhout (By the way, our garden simulator and other free software runs under Wine under GNU/Linux, last I tried it).
Hello Paul,
Paul D. Fernhout wrote: [snip]
Just out of musing, let us consider this issue of picking what system to build on abstractly out of all the known options.
There is a such a thing as a language or programming system being "entrenched" in a company (or even in a programmer's toolbox in some sense). (Entrenched literally meaning "dug in" or "in a trench"). http://wordnet.princeton.edu/perl/webwn?s=entrenched
Ok. Didn't see that as a requirement or variable in the previous discussion. As one who currently does no professional programming and gets to pick what language he wants to do a project in, it is a non-issue.
[snip]
Now, most systems do not go from nonexistent to entrenched overnight. Even Java had various steps internally from VisualWorks/ST-80 (runtime fees too high! :-) to Green to Oak to Java. Yes, I say Smalltalk, because from what rumors I hear, Sun did want to use Smalltalk for the set top box first, but was rebuffed out of greed. You see there that a somewhat-entrenched Smalltalk had a chance to nip Java in the bud, but a failure of management vision coupled with short-term commercial greed got in the way IMHO. And all Smalltalkers have paid the price -- both for that failure of vision but also for our own personal decisions to couple our fortunes (money wise and aesthetics wise) to a commercial vendor. That is the main reason why the personal copy I purchased of VisualWorks + ENVY Client + ENVY Server (at almost $10000 all together) sits gathering dust on a shelf and instead I use Python and explore Squeak or other free Smalltalks; I just don't want to be on that path anymore -- entrusting my fate mainly to someone else's greed; it has brought me and many other Smalltalkers years of pain of having to work in other languages and related environments like Java (or, even, to a lesser extent, Python, when you know what a programming environment could be like). Maintaining large complex business apps in Smalltalk was often fun; maintaining them in Java is mostly work. (Python is somewhere in the middle.)
Sad story. And I understand completely. My first programming experience was with Prograph CPX on the Mac. $1500 (not quite your investment) was very significant to me, and the company did not honor its obligations and went out of business. My next one was Optima++. It too (as a product) went into oblivion. Ugh. Two compelling reasons for my desire for Open Source software.
I think an important "design principle" should be that a system should easily move from one stage here to the next. :-) Because, all other things being equal, ultimately, entrenched systems are easier to use. :-)
True, very true. But for things done on a personal level, I am a big believer in "build it and they will come". That is, provided there is a compelling product being offered. So started Python. Being entrenched wasn't its goal. But entrenched it is. Having something compelling and worth being entrenched is the important part, unless you can provide dollars or politics with which to cause entrenching, i.e., Java.
[snip]
For me, while I'll still consider learning other systems, Python (mostly Jython these days) is the entrenched tool with an Algol syntax. Lua just has no hope I can see of displacing Python for me anytime soon (barring external forces or some new need I come up with).
That's fine. As the one picking the language I use or play with, I place a higher value on the pleasure or beauty of the language than on its being entrenched anywhere but with me. Squeak is the only thing entrenched with me. I can use Python but don't really enjoy doing so.
I enjoy functional programming and Lua offers that nicely. I love that Lua uses 1 based indexing, has blocks, closures, coroutines. I love that it is small so it can be learned to a reasonably high level quickly.
I've tended to use Python more functionally than OO. So Lua fits me better in that regard.
For some strange reason, when in Squeak I jump right in creating classes and methods. ...
In Python I've never created a class. I just create methods and data structures. Strange. And I have never liked Python's OO. It just doesn't feel right to do len('string') instead of 'string'.length.
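(For the record, the contrast is something like this, taking Squeak for the Smalltalk side:

    'string' size.    "answers 6 -- the receiver comes first, then the message"

versus Python's len('string'), where a global function swallows the receiver.)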
For that reason I've often considered Ruby. But I've stumbled and never acquired the taste for @$@@LineNoise.
Oh well, so much for the vagaries of personal whims. :)
[snip]
This is not a slur on the value of Lua.
Never thought so.
This is just to say that to compete head-to-head against an entrenched alternative (say like Python/Jython vs. Java in many businesses today) a programming language has to offer not just 10% more value (1.1X), but more like 300% more value (3X). Python (as Jython) does offer that 300% over Java for most Java developers who will use it. Lua does not offer 300% more value to me than Python (at least as far as I can see right now; time will tell). No language is perfect; but some meet current needs better than others; as needs change, so does the landscape of sensible possibilities.
Will not argue or dispute that.
Squeak's big value proposition is that it is four things:
- an effective self-documenting programming language,
- a set of cross-platform libraries and runtimes,
- a complete IDE, including a fantastic debugger and source code control,
- an idea processor for enhancing creativity (its ultimate purpose).
Agreed.
And that is why I find Python, or for that matter probably Lua, unsatisfying in contrast. Python has the first, being a language that has been called "executable pseudo code", for sure, but the other three areas are where it starts to falter and then fall short. For example, it's difficult to alter a running Python program, and almost impossible to restart a function, which is easy to do in Squeak and a big part of Smalltalk productivity -- especially in large programs. I had no problem maintaining huge things in VisualWorks+ENVY or participating in a development process involving lots of people (many of them relatively inexperienced), but even a mid-sized Python program developed by one or two experienced people begins to get a bit of a bear to maintain and refactor and incrementally improve (not impossible, just harder than I know a similar application in Smalltalk would be).
However, issues like familiarity, stability, completeness, modularity, and licensing have trumped those other Squeak advantages (for me), which is why I usually turn to Python. Most projects I do are not that huge anyway, or can be refactored into smaller parts. So, to be a better value proposition, Squeak (or Lua :-) either has to become an even more compelling "idea processor" (e.g. even beyond OpenAugment), or gain those other advantages Python has, or really *now* needs to do both, since for me Python is already "entrenched" and Squeak now needs to be 300% better to compete against it. Ten years ago, Squeak could have been as good as Python; now it needs to be vastly superior. "Just as good" is not "good enough" when the alternative is already "entrenched". Granted, however, some specific issues about Squeak (licensing, GUI feel, internal complexity) have kept me from giving it much of a chance to grow on me as an idea processor.
Understood.
Now, I'm only bothering to write this not to point out Squeak-ish competitive disadvantages by themselves, but because I'm willing to put a little work into those directions -- since I still believe in a lot of the Smalltalk ideals (and remember its accomplishments when I used it) and still prefer keyword syntax (though I still have to trade that off against getting other stuff done now with Python). Perhaps a Squeak 1.1 under the Apache license on the JVM, the same way Dan did a Squeak 2.2, http://weather-dimensions.com/Dan/ForwardToThePast.jnlp might be a start. Then I could leverage Java's ability to provide some of those other things (stability, after ten years of Sun working on it; ease of installability, one-click web start; and so on). Essentially, an "idea processor" for the JVM? Unfortunately, Squeak 1.1 reaches so far back (and I remember the early problems with 1.13) that there is a lot of work to bring 1.1 back up to something really usable. And even then that system might be in the same position Lua is in now: great, but not "entrenched". :-)
I don't believe you need to start with Squeak 1.1. If I am not mistaken, Viewpoints is attempting to relicense the whole of Squeak, having gotten Apple to relicense its part. It is trying to move the whole of core Squeak into an Apache (Apple) + MIT (the rest) license situation.
The nice thing is that Squeak is compelling enough that I think most issues will be addressed. Despite the fact that Python has owned you for a while, it doesn't seem to have owned your heart. You keep looking back to Squeak, hoping and longing.
I would just say that, given the opportunity on a project, if the compelling option (Squeak) is anywhere in the running against the entrenched one (Python), give it a chance.
Personally, due to my quirk above, I've even considered prototyping in Squeak and porting to Python on some things. Squeak is just such a comfortable place to work, even if for entrenched or other reasons you cannot deploy with it. It is a great place to work out the ideas and thoughts until you have a fruitful conclusion. Then if necessary, deploy to the entrenched option. If not, enjoy the compelling one, and it's good enough. Might reduce the head banging. At least while in the thinking-it-through process. (not sure if I'm preaching to you or me. ;)
And, as for your question on license, what matters to me in that regard is being able to put GPL'd applications on top of the platform. (I don't care that much about the licensing of the platform otherwise, as long as it is "free"). If that is possible with an Apache version of Squeak, that's fine. If it is not, then that is a big difficulty. I like the GPL as a constitution for defining cooperation on an application and have used it before with success to that end. About half the free software out there is under the GPL, another big chunk is under a GPL compatible license (X/MIT or BSD), and then a smaller part is GPL-incompatible (Apache, etc.).
I would think GPL on top of Squeak is doable if your GPL code is separate from Squeak and the GPL only applies to your code. But I'm no expert, and I prefer MIT to GPL.
--Paul Fernhout (By the way, our garden simulator and other free software runs under Wine under GNU/Linux, last I tried it).
Alas, I am the only Linux user. My wife and children use Mac with OS X on PowerPC machines.
Jimmie
From: Jimmie Houchin j.squeak@cyberhaus.us Reply-To: The general-purpose Squeak developers listsqueak-dev@lists.squeakfoundation.org To: The general-purpose Squeak developers listsqueak-dev@lists.squeakfoundation.org Subject: Re: Design Principles Behind Smalltalk, Revisited Date: Thu, 28 Dec 2006 11:36:02 -0600
I've tended to use Python more functionally than OO. So Lua fits me better in that regard.
Have you looked at Haskell? It is purely functional and amazingly expressive. Behind Smalltalk, it is probably my second favorite at this point.
J J wrote:
I've tended to use Python more functionally than OO. So Lua fits me better in that regard.
Have you looked at Haskell? It is purely functional and amazingly expressive. Behind Smalltalk, it is probably my second favorite at this point.
Yes I have, but not in a while. I do need to revisit it. It looked interesting. I asked a few questions on the mailing list. And for the project I am currently working on it didn't seem to be the most practical tool at that time.
I am doing lots of text processing. A few million objects and several gigabytes of text. Constant daily text retrieval and processing.
But I will tell you this much. In this thread you flipped my world upside down. :)
I've been spending time thinking about how I wanted to manage all my data. Now, I'm not a professional programmer and have no explicit training.
I've avoided RDBMSes because I read a lot about the Object Relational mismatch in the Squeak, Ruby, Python, etc. mailing lists. So how do I store my millions of objects, and search and access them? I could easily store them in files and search via Swish-e. But managing millions of files in the file system is a kludge. Ugh. So I've been thinking that I'm working harder on a kludge than it would be to learn SQL and use PostgreSQL.
And then you write: """But this observation is the reason OO databases haven't really taken off: An OO database will tend to model things how *your* application wants to see them. A traditional relational DBA will model things in the most generic way he can so that *all* the applications can build the view they need easily. Relational DBA's tend to be of the view point: The data will exist for the life of the company, while the applications that access it come and go like the tide. And one only needs to look at the huge Java rewrites going on to know they are right."""
This stood out for me: """Relational DBA's tend to be of the view point: The data will exist for the life of the company, while the applications that access it come and go like the tide."""
I've been chewing on that. And it just rang true to me. Wow!!!
And I thought about my entire computing experience. I have all kinds of data and documents for which I've changed the application accessing them many, many times. But the data format is paramount. And as I thought about my projects: still true.
So with that nudge from you, I sit at my desk right now reading one of my several SQL books. Thanks. :)
I know for smaller datasets options increase. But I'm feeling good about an RDB for this one. Now that I've had a little tweak to my thinking. :)
Thanks again.
Jimmie
From: Todd Blanchard <tblanchard@mac.com> Reply-To: The general-purpose Squeak developers list <squeak-dev@lists.squeakfoundation.org> To: The general-purpose Squeak developers list <squeak-dev@lists.squeakfoundation.org> Subject: Re: Design Principles Behind Smalltalk, Revisited Date: Sun, 31 Dec 2006 21:26:22 -0800
On Dec 30, 2006, at 9:09 AM, Jimmie Houchin wrote:
I've avoided RDBMSs because I read a lot about the Object-Relational mismatch on the Squeak, Ruby, Python, etc. mailing lists. So how do I store my millions of objects, and how do I search and access them? I could easily store them in files and search via Swish-e. But managing millions of files in the file system is a kludge. Ugh. So I've been thinking that I'm working harder on a kludge than it would be to learn SQL and use PostgreSQL.
So with that nudge from you, I sit at my desk right now reading one of my several SQL books. Thanks. :)
Put down the book.
You want to load up GLORP. It rocks. It is as easy to work with as an OODB, but much more flexible, and it is backed by PostgreSQL. Grab a copy of Squeak, load the PostgreSQL client, then load up GLORP. You're golden.
Except you need a meta model. You can write one in GLORP, or you can build one with a GUI like Apple's EOModeler - free with WebObjects. The EOGlorp package will let GLORP work off of your EOModel files. Once you have your meta model, GLORP is just like working with objects. You write queries in Smalltalk like
aDatabase readOneOf: User where: [:user | user login = 'jhouchin'].
-Todd Blanchard
Tools like GLORP are very nice: they save you writing SQL directly. But look at your line of code: it is SQL in message form.
I wasn't talking about using embedded SQL in code. I was talking about the back end data store. IMO the data is often best modeled relationally. Then you can set up any views you want and then use something like GLORP to access it.
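To give a flavor of what writing such a meta model directly in GLORP involves, here is a minimal sketch. It follows the DescriptorSystem conventions as I recall them from the GLORP tutorial, so treat the exact selectors as an assumption; the User class and its table and field names are purely illustrative.

DescriptorSystem subclass: #BlogDescriptorSystem
	instanceVariableNames: ''
	classVariableNames: ''
	poolDictionaries: ''
	category: 'Blog-Persistence'

"Describe the USER table."
BlogDescriptorSystem >> tableForUSER: aTable
	(aTable createFieldNamed: 'id' type: platform integer) bePrimaryKey.
	aTable createFieldNamed: 'login' type: (platform varChar: 50)

"Declare which attributes User has..."
BlogDescriptorSystem >> classModelForUser: aClassModel
	aClassModel newAttributeNamed: #id.
	aClassModel newAttributeNamed: #login

"...and map each attribute to its field."
BlogDescriptorSystem >> descriptorForUser: aDescriptor
	| table |
	table := self tableNamed: 'USER'.
	aDescriptor table: table.
	(aDescriptor newMapping: DirectMapping) from: #id to: (table fieldNamed: 'id').
	(aDescriptor newMapping: DirectMapping) from: #login to: (table fieldNamed: 'login')

With that in place, a GLORP session can answer Todd's query above, reading a row back as a mapped User object.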
On Jan 1, 2007, at 1:23 AM, J J wrote:
From: Todd Blanchard tblanchard@mac.com
aDatabase readOneOf: User where: [:user | user login = 'jhouchin'].
-Todd Blanchard
Tools like GLORP are very nice: they save you writing SQL directly. But look at your line of code: it is SQL in message form.
You know, that is a good point. I think it would be easy to emulate the Squeak collection operations though, i.e.
database users detect: [:ea | ea login = 'todd'] -> readOneOf: User where: ...
database users select: [...] -> readManyOf: ... reject: ...
Basically treating each entity as a collection. That might be worth doing and should be pretty easy. Good idea.
-Todd Blanchard
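As a rough sketch of how that emulation might look (everything here - the wrapper class, the #users accessor, the session protocol - is invented for illustration): each mapped entity is exposed as a pseudo-collection that rewrites the familiar collection protocol into GLORP reads.

Object subclass: #EntityCollection
	instanceVariableNames: 'session entityClass'
	classVariableNames: ''
	poolDictionaries: ''
	category: 'Glorp-Sketch'

EntityCollection class >> on: aSession for: aClass
	^self new setSession: aSession entityClass: aClass

EntityCollection >> setSession: aSession entityClass: aClass
	session := aSession.
	entityClass := aClass

"detect: becomes a single-object read."
EntityCollection >> detect: aBlock
	^session readOneOf: entityClass where: aBlock

"select: becomes a multi-object read."
EntityCollection >> select: aBlock
	^session read: entityClass where: aBlock

"reject: negates the condition inside the query block."
EntityCollection >> reject: aBlock
	^session read: entityClass where: [:each | (aBlock value: each) not]

Then 'database users' would answer an EntityCollection on User, and 'database users detect: [:ea | ea login = ''todd'']' would come down to exactly the readOneOf:where: above.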
Hi!
Todd Blanchard tblanchard@mac.com wrote:
[SNIP of Todd's message, quoted in full above]
Magma uses a similar trick in its query capabilities (in order to make seemingly iterative block-code actually generate a query with optimization and index-support). See here: http://wiki.squeak.org/squeak/5859
(search for #where: down that page)
I don't have time right now to post in this thread; let me just mention that I disagree with JJ :) regarding the arguments for using an RDB instead of an ODB. There are of course arguments in both directions - depending on context - but IMHO the lifecycle argument is not as clear cut as described.
regards, Göran
From: goran@krampe.se Reply-To: The general-purpose Squeak developers list <squeak-dev@lists.squeakfoundation.org> To: The general-purpose Squeak developers list <squeak-dev@lists.squeakfoundation.org> Subject: Re: Design Principles Behind Smalltalk, Revisited Date: Tue, 2 Jan 2007 10:56:23 +0200
I don't have time right now to post in this thread; let me just mention that I disagree with JJ :) regarding the arguments for using an RDB instead of an ODB. There are of course arguments in both directions - depending on context - but IMHO the lifecycle argument is not as clear cut as described.
Well, let me clarify my position a little. I don't feel that ODBs are useless or anything. Things you see in the Rails demos should probably have been in an ODB (or even just objects, as Ramon's "blog in 15 minutes" showed). I simply believe in the right tool for the right job, and you can't beat an RDB in its domain.
It depends on what you are doing. Sometimes in a powerful language like smalltalk you just keep your data in objects and let image persistence handle it. Sometimes you want a little more so you write the data out to files. Sometimes you want to go even further, and this is when an ODB can be a great solution.
But at the enterprise level (i.e. lots of different programs over a large organization) I still see RDBMS as the winner. And the reason I see it this way is simply: SQL/RDB can be seen as a DSL system for dealing with set data. There is a tremendous amount of power built into it for this particular domain that would be difficult to make more concise in another way. I suppose it is just a question of how comfortable one is with SQL.
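To make the set-DSL point concrete, compare a simple join written by hand over in-image collections (the collections and attribute names here are invented) with its SQL equivalent, which is a single declarative statement:

"SQL: SELECT o.* FROM orders o
      JOIN customers c ON c.id = o.customerId
      WHERE c.region = 'West'"
westernOrders := orders select: [:o |
	| customer |
	customer := customers
		detect: [:c | c id = o customerId]
		ifNone: [nil].
	customer notNil and: [customer region = 'West']].

And this is the easy case: add a union or a grouping and the hand-written version grows, while the SQL stays one statement.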
J J wrote:
But at the enterprise level (i.e. lots of different programs over a large organization) I still see RDBMS as the winner. And the reason I see it this way is simply: SQL/RDB can be seen as a DSL system for dealing with set data. There is a tremendous amount of power built into it for this particular domain that would be difficult to make more concise in another way. I suppose it is just a question of how comfortable one is with SQL.
I have never had the opportunity to work with an Object Oriented Database (such as GemStone) used for integration between multiple applications, as an RDB would be. I suppose it could offer a very efficient, simple and powerful solution for integration.
Does anyone have some feedback?
J J wrote:
... I simply believe in the right tool for the right job,
and you can't beat an RDB in its domain. ...
That's something I've never really understood: what is the domain in which Relational Databases excel?
- Data too large to fit in memory? Well, most uses today may have been too large to fit in memory 20 years ago, but aren't today. And even for really big data sets today, networks are much faster than disk drives, so a distributed database (e.g., a DHT) will be faster. Sanity check: Do you think Google uses an RDB for storing indexes and a cache of the WWW?
- Transactional processing with rollback, three-phase commit, etc? Again, these don't appear to actually be used by the application servers that get connected to the databases today. And if they were, would this be a property of relational databases per se? Finally, in a world with great distributed computing power, is centralized transaction processing really a superior model?
- Set processing? I'm not sure what you mean by set data, JJ. I've seen set theory taught in a procedural style, a functional style, and in an object oriented style, but outside of ERP system training classes, I've never seen it taught in a relational style. I'm not even sure what that means. (Tables with other than one key, ...) That's not a proof that relational is worse, but it does suggest to me that the premise is worth questioning.
- Working with other applications that are designed to use RDB's? Maybe, but that's a tautology, no?
I'm under the impression (could be wrong) that RDBMS were created to solve a particular problem that may or may not have been true at the time, but which is no longer the situation today. And what are called RDBMS no longer actually conform to the original problem/solution space anyway.
Regards, -Howard
Funny, I just blogged about this.
http://www.blackbagops.net/?p=93
On Jan 2, 2007, at 6:18 AM, Howard Stearns wrote:
[SNIP of Howard's message, quoted in full above]
-----Original Message----- From: squeak-dev-bounces@lists.squeakfoundation.org [mailto:squeak-dev-bounces@lists.squeakfoundation.org] On Behalf Of Todd Blanchard Sent: Tuesday, January 02, 2007 7:17 AM To: The general-purpose Squeak developers list Subject: Re: relational for what? [was: Design Principles Behind Smalltalk, Revisited]
Funny, I just blogged about this.
Todd,
As a long-time user of Object Databases (and now a vendor's employee) I found your comments interesting. A few responses with regard to GemStone/Smalltalk:
- Object accesses do not need to be done within a transaction.
- Locking is at the object (not page) level (and can be optimistic or pessimistic).
- Classes are available that allow things like a queue to have multiple producers and a single consumer without conflicts.
- A number of classes have automatic retry built in.
- GemBuilder provides client-side in-memory user-level caching.
- Schema migration is very flexible. You can have multiple versions of a class, each of which may have live instances. The database does not have to be off-line to update the schema.
- Most image-level bug fixes can be applied with the system in production.
- Users can be assigned to groups and objects can be assigned security based on owner/group/world.
- Garbage collection is quite sophisticated and is not adversely impacted by schema changes.
James
I agree that RDBMSs tend to be knee-jerk reactions that produce as many problems as they solve. My favorite alternative is not a real OODBMS, but instead a pattern that is best exemplified by Prevayler, a Java framework. The main idea is to represent your data as objects, and to ensure that every change to the data is represented by a Command. Executing a Command will cause it to write itself out on a log. You get persistence by periodically (once a day, perhaps) writing all your objects out to disk and recovering from crashes by restarting from the last checkpoint and then replaying the log of Commands. You get multiuser access by implementing the transactions inside the system, making them fast (no disk access) and just having a single lock for the whole system.
There are lots of things this doesn't give you. You don't get a query language. This is a big deal in Java, not so big a deal in Smalltalk, because Smalltalk makes a pretty good ad-hoc query language (for Smalltalk programmers). You don't get multilanguage access. The data must all fit in memory, or suddenly your assumptions of instantaneous transactions break down. You have to be a decent programmer, though it really isn't very hard, and if you let your just barely decent programmers build a toy system to learn the pattern then they should do fine. Lots of people learn the pattern by working on a production system, but that is probably a bad idea for all patterns, not just this one.
I did this in Smalltalk long before Prevayler was invented. In fact, Smalltalk-80 has always used this pattern. Smalltalk programs are stored this way. Smalltalk programs are classes and methods, not the ASCII stored on disk. The ASCII stored on disk is several things, including a printable representation with things like comments that programmers need but the computer doesn't. But the changes file, in particular, is a log, and when your image crashes, you often will replay the log to get back to the version of the image at the time your system crashed. The real data is in the image; the log is just to make sure your changes are persistent.
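For concreteness, here is a minimal Smalltalk sketch of the command-log pattern Ralph describes. The class names, the model, and the log protocol are all invented for illustration; the single global lock and the log-before-apply ordering are the essential parts.

"Every change to the model is a command object that knows how to apply itself."
Object subclass: #DepositCommand
	instanceVariableNames: 'accountId amount'
	classVariableNames: ''
	poolDictionaries: ''
	category: 'Prevalence-Sketch'

DepositCommand >> executeOn: aBank
	(aBank accountAt: accountId) deposit: amount

Object subclass: #PrevalentSystem
	instanceVariableNames: 'model log lock'
	classVariableNames: ''
	poolDictionaries: ''
	category: 'Prevalence-Sketch'

PrevalentSystem >> initialize
	model := Bank new.
	lock := Semaphore forMutualExclusion.
	log := self openCommandLog	"hypothetical: answers a serializing stream on a file"

"Log first, then apply, under one system-wide lock."
PrevalentSystem >> execute: aCommand
	lock critical: [
		log nextPut: aCommand.
		aCommand executeOn: model]

"Crash recovery: restore the last checkpoint, then replay the log."
PrevalentSystem >> recoverFrom: aCheckpoint commands: loggedCommands
	model := aCheckpoint.
	loggedCommands do: [:each | each executeOn: model]

Checkpointing is then just serializing the whole model to disk periodically, exactly as Ralph says.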
But this message stream is about what RDBMSs are good for, and I'd like to address that. First, even though SQL is rightly criticised, it is a standard query language that enables people who are not intimately familiar with the data to access it to make reports, browse the data, or write simple applications. Most groups I've seen have only programmers using SQL and so don't take advantage of this, but I've seen shops where secretaries used SQL or query-by-example tools to make reports for their bosses, so it can be done. I suppose an OO database or a Prevayler-like system could provide a query-by-example tool, too, but I have never seen one.
Second, even though the use of an RDBMS as the glue for a system is rightly criticised, this is common practice. It tends to produce a big ball of mud, but for many organizations, this seems to be the best they can do. See http://www.laputan.org/mud/ One advantage of using the RDBMS as the glue is that it is supported by nearly every language and programming environment. I think that the growing use of SOA will make this less important, because people will use XML and web services as the glue rather than a database.
Third, data in an RDBMS is a lot like plain text. It is more or less human readable. It stands up to abuse pretty well, tolerating null fields, non-normalized data, and use of special characters to store several values in one field. For the past few years, I have had undergraduate teams migrating databases for a city government. The students are always amazed at how bad the data is. I laugh at them. All databases contain bad data, and it is important for the system to tolerate it.
An RDBMS works best with relatively simple data models. One of its weaknesses is trees, since you have to make a separate query for each descent. It also has problems with versioned data, i.e. data with a date or date range as part of the key. But it can deal pretty well with the usual set of objects that represent the state of the business, and another set of objects that represent important events. For example, a bank has deposit accounts and loans to customers, and it records deposits, cash withdrawals, computation of interest, payments, and checks written to other organizations. Huge amounts of data are OK for an RDBMS, but complex data models tend to cause trouble.
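The tree weakness is easy to see from the client side. Fetching a subtree stored as a parent/child table costs one query per descent - this sketch uses GLORP-style session calls, and the TreeNode class and its attributes are illustrative:

"One round trip per level: a tree N levels deep means N waves of queries."
collectSubtreeOf: aNode
	| children |
	children := session
		read: TreeNode
		where: [:each | each parentId = aNode id].
	^children
		inject: (OrderedCollection with: aNode)
		into: [:all :child | all addAll: (self collectSubtreeOf: child); yourself]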
It is wrong to think that persistence = RDBMS. Architects should also consider XML, a Prevayler-like system, binary files, OODBMS. Each has advantages and disadvantages. An architect needs to have experience with all these technologies to make a good decision. Of course, which one is best often depends on what is going to happen ten years in the future, which is impossible to predict. It is good to encapsulate this decision so that it can be changed. This is another advantage of a SOA; your clients don't care how you store the data.
In the end, technology decisions on large projects depend as much on politics as on technical reasons. RDBMSs are the standard, the safe course. "Nobody ever got fired for buying Oracle". They are usually not chosen for technical reasons. There are times when they really are the best technical choice, but they are used a lot more often than that.
-Ralph
Hi all!
Todd Blanchard tblanchard@mac.com wrote:
Funny, I just blogged about this.
And a "response" from me:
http://goran.krampe.se/blog/Bits/ODBvsRDB.rdoc
But... let me ramble a bit about the RDB life cycle stuff.
JJ IIRC talks about making a "proper" relational model and then letting multiple apps written in various languages over time operate on it - or parts of it. The idea is that the data "lives forever" and the apps come and go.
Is this idea really based on real world observations? I dunno, I have only been exposed to a few "enterprises" so my experience is of course not valid for proofs but I have a feeling that it goes more like this:
1. Someone builds an app. Or hey, the company buys one. A big business system, or whatever. It has tons of interesting data in an RDB. It is not object oriented and it has few or very bad interfaces to the outside world.
2. Another app, bought or homemade, wants to use that data - or even manipulate it! No one at the company has ever thought of the concept of encapsulation - so what the heck, let's go straight to the source - use SQL right into the RDB - these table and column names don't look so hard to grok... For readonly queries we will hopefully get it right; for manipulations we damn sure *hope* to get it right.
3. And yet another app pops up, putting its fingers in the cookie jar too, and so it goes. Eventually we have tons of apps written in a bunch of languages using/abusing the RDB, adding tables of their own, breaking a few rules here and there perhaps, not using the proper SQL and so on. It might be tempting to say this is BY DESIGN and that this is GOOD, but I think that is often a reconstruction of the truth.
I also don't think that first app ever really dies without the DB going down with it. I also don't think you *first* design the DB, then build apps to use it. Nope, it is that first app that comes with the DB and the DB can't just stand on its own without it. Sure, you might *rewrite* that app using the same DB - but have you ever seen that being actually done? Some of the apps that came afterwards may go, but the original system typically is only *replaced including the DB* with something else when it gets unbearable.
Using the RDB as a sharing ground for applications is IMHO really, really bad. Sure, it *works* kinda, but very fast you end up with replicated SQL statements all over the place. Then someone says "stored procedures" and hey... why not consider OBJECTS? There is probably a reason why people are so worked up about Services these days. :)
Just my 3 cents of course.
regards, Göran
PS. On a given day and context lots of factors come into play. I just don't buy simple answers about RDBs being superior for enterprises based on these particular arguments. There are large mission critical systems built using ODBs running at "Enterprises". If it fits they rock.
On Jan 2, 2007, at 1:36 PM, goran@krampe.se wrote:
Using the RDB as a sharing ground for applications is IMHO really, really bad. Sure, it *works* kinda, but very fast you end up with replicated SQL statements all over the place. Then someone says "stored procedures" and hey... why not consider OBJECTS? There is probably a reason why people are so worked up about Services these days. :)
So if it is objects instead of tables - how is this different? Uh, and the alternative would be what? Take a typical company that makes and sells stuff.
They have customers (hopefully).
The marketing guys want the customer's demographics and contacts to generate targeted messages. The accounting people want to know their credit status, payments, and order totals. The inventory/production planning guys don't really care who is buying what, but they want to see how much of each thing is going out the door. The product development people are looking for trends that signal new kinds of demand. The sales guys want recent order history, contact event logs, etc.
There are many cross cutting concerns.
If you take the naive object model you probably have Customers->>Accounts->>Orders->>Items-----(CatalogItem)->InventoryStatus
Works for most traversals; you put the customers in a dictionary at the root by some identifier. But for the people who process orders or do shipping, this model is a drag. They just want orders and items, and navigating to all the orders by searching starting at customers is nuts. So maybe you add a second root for orders for them. Then there's the inventory stuff....
Everybody wants a different view with different entry points. I'm talking enterprise architecture here - bigger than systems which is bigger than applications.
Relational databases don't care about your relationships or roots - anything can be a root. Anything can be correlated. Any number of object models can map to a well-normalized schema.
RDBMS systems have a couple of nice properties - you can produce lots of different views tailored to a viewpoint/area of responsibility. They guarantee data consistency and integrity. Something I find lacking from OO solutions.
Here's a fun game. Build an OO db. Get a lot of data. Over time, deprecate one class and start using another in its place. Give it to another developer who doesn't know the entire history. One day he deletes a class because it appears not to be referenced in the image anymore. 6 months later, try to traverse the entire db to build a report and find errors: 'no such class for record'. What will you do?
This has happened BTW to me. If I have long lived data, with different classes of users and different areas of responsibilities, I want the RDBMS because it is maximally flexible while providing the highest guarantees of data integrity and consistency. The problems I've heard described are the result of poor design and unwillingness to refactor as the business changes and grows.
FWIW I have worked at everything from 5 person Mac software companies (anyone remember Tempo II?) to telcos, aerospace, government agencies, and the world's largest internet retailer (a scenario where the relational database turns out to be not the best fit overall). My solution selection hierarchy as the amount of data grows runs:
1) In the image
2) Image segments or PLists in directories
3) RDBMS/ORM like GLORP
4) RDBMS with optimized SQL
5) SOA
I'm pretty sour on OODBMS's based on my long running experiences with them.
-Todd Blanchard
Todd Blanchard tblanchard@mac.com wrote:
On Jan 2, 2007, at 1:36 PM, goran@krampe.se wrote:
Using the RDB as a sharing ground for applications is IMHO really, really bad. Sure, it *works* kinda, but very fast you end up with replicated SQL statements all over the place. Then someone says "stored procedures" and hey... why not consider OBJECTS? There is probably a reason why people are so worked up about Services these days. :)
So if it is objects instead of tables - how is this different?
Objects offer encapsulation and sharable behavior. Tables offer just shared data.
Uh, and the alternative would be what? Take a typical company that makes and sells stuff.
[SNIP of quick description]
There are many cross cutting concerns.
[SNIP]
RDBMS systems have a couple of nice properties - you can produce lots of different views tailored to a viewpoint/area of responsibility. They guarantee data consistency and integrity. Something I find lacking from OO solutions.
I am not saying that I have a Grand Solution. I agree that an ODB is focused on having a Real object model at the core instead of a bunch of loosely interrelated tables that can be viewed in 1000 different ways. I still believe the object model offers real value in the form of proper behavior, a better and more natural model, reuse of business rules and so on.
But I agree that ODBs do not offer that "twist and turn"-ability, I just often see that as an advantage instead of a disadvantage. :)
One project I was involved in was interesting - we built a proper, good object model in GemStone for quite a complicated domain. Then we added an "export tool" on top of it which could produce tabular data from it - you picked what aspects you wanted, calculated or not - and got it out as a batch job. Then you could analyze that to your heart's content in an OLAP tool on the side.
It would be interesting to know if there are any ODBs or ORDBs that offer that ability - but online instead of offline. Most of the use cases you mentioned were about getting information (and not writing).
regards, Göran
FWIW,
The usual answer to this is to encapsulate behavior behind an API written as stored procedures and forbid direct table access.
On Jan 3, 2007, at 12:26 AM, goran@krampe.se wrote:
So if it is objects instead of tables - how is this different?
Objects offer encapsulation and sharable behavior. Tables offer just shared data.
Hi!
Todd Blanchard tblanchard@mac.com wrote:
FWIW,
The usual answer to this is to encapsulate behavior behind an api written as stored procedures and forbid direct table access.
Yes, I kinda wrote that.... :) I wrote:
Then someone says "stored procedures" and hey... why not consider OBJECTS? There is probably a reason why people are so worked up about Services these days. :)
regards, Göran
A big +1 to most of this message (just not the anti-OODB stuff. They have not had any poison for me yet)
From: Todd Blanchard <tblanchard@mac.com> Reply-To: The general-purpose Squeak developers list <squeak-dev@lists.squeakfoundation.org> To: The general-purpose Squeak developers list <squeak-dev@lists.squeakfoundation.org> Subject: Re: relational for what? [was: Design Principles Behind Smalltalk, Revisited] Date: Tue, 2 Jan 2007 17:28:25 -0800
[SNIP of Todd's message, quoted in full above]
Based on the thoughtful responses I've gotten, I'll be taking a new look at some OODB technologies - but I'm pretty sure I've made the right move for my current project. Honestly, working with GLORP is no more or less complex than working with an OODB - they feel about the same to me - especially since I derive the schema from the meta model anyhow.
On Jan 3, 2007, at 12:47 PM, J J wrote:
A big +1 to most of this message (just not the anti-OODB stuff. They have not had any poison for me yet)
[SNIP of Todd's message, quoted in full above]
From: Howard Stearns <hstearns@wisc.edu> Reply-To: The general-purpose Squeak developers list <squeak-dev@lists.squeakfoundation.org> To: The general-purpose Squeak developers list <squeak-dev@lists.squeakfoundation.org> Subject: relational for what? [was: Design Principles Behind Smalltalk, Revisited] Date: Tue, 02 Jan 2007 08:18:24 -0600
J J wrote:
... I simply believe in the right tool for the right job,
and you can't beat an RDB in its domain. ...
That's something I've never really understood: what is the domain in which Relational Databases excel?
Handling large amounts of enterprise data. If you have never worked in a large company, you probably won't appreciate this. But in a large company you have a *lot* of data, and different applications want to see different parts of it. In an RDBMS this is no problem: you normalize the data and take one of a few strategies to supply it to the different consumers (e.g. views, stored procedures, etc.).
- Data too large to fit in memory? Well, most uses today may have been too
large to fit in memory 20 years ago, but aren't today. And even for really big data sets today, networks are much faster than disk drives, so a distributed database (e.g., a DHT) will be faster. Sanity check: Do you think Google uses an RDB for storing indexes and a cache of the WWW?
Are you serious with this (data too large to fit into memory)? And if you use a good RDBMS then you don't have to worry about disk speed or distribution. The DBA's can watch how the database is being used and tune this (i.e. partition the data and move it to another CPU, etc., etc.).
Oh, but you found one example where someone with a lot of data didn't use an RDB. I guess we can throw the whole technology sector in the trash. Sanity check: Google is trying to keep a current snapshot of all websites and run it on commodity hardware. You could do exactly the same thing with a lot fewer CPUs using a highly tuned, distributed RDBMS. They chose to hand-tune code instead of using an RDBMS.
- Transactional processing with rollback, three-phase commit, etc? Again,
these don't appear to actually be used by the application servers that get connected to the databases today. And if they were, would this be a property of relational databases per se?
What data point are you using? Sure, little blogs and things like that probably don't use it, and they probably are the majority of database users. But how much wealth (i.e. money and jobs) is being generated by those compared to larger companies?
All the applications I write at work absolutely require such functionality and I have no intention of writing it myself.
Finally, in a world with great distributed computing power, is centralized transaction processing really a superior model?
Some people seem to think so: http://lambda-the-ultimate.org/node/463
And there is more than that. I believe in that paper (don't have time to verify) they mention that hardware manufacturers are starting to take this approach as well, because fine-grained locking is so bad.
- Set processing? I'm not sure what you mean by set data, JJ. I've seen set
theory taught in a procedural style, a functional style, and in an object oriented style, but outside of ERP system training classes, I've never seen it taught in a relational style. I'm not even sure what that means. (Tables with other than one key, ...) That's not a proof that relational is worse, but it does suggest to me that the premise is worth questioning.
I thought this was the common way of expressing the data operations one does in an RDBMS. To give an example of the power: not too long ago I had to write a report about the state of various systems on the network in relation to the applications that run on them. My first approach was simply to read the data into objects and extract the data via coding. But after the requirements for the reports changed a couple of times, I got sick of hand-writing joins, unions, etc. and just downloaded a database. It took about 5 minutes to set up the schema and import all the data. After that I could quickly generate any report the requesters could dream up. Since SQL is effectively a DSL over relational data, my code changed from many statements to 1 per report.
- Working with other applications that are designed to use RDB's? Maybe,
but that's a tautology, no?
Again, one has to work in a large company to appreciate the nature of enterprise application development.
I'm under the impression (could be wrong) that RDBMS were created to solve a particular problem that may or may not have been true at the time, but which is no longer the situation today. And what are called RDBMS no longer actually conform to the original problem/solution space anyway.
I don't know what the first RDBMS was created for, but what they are today and have been for the span of my career is certainly not a solution to a problem no one has.
The fact is, there are two basic kinds of databases: Relational and Hierarchical (LDAP, OODB). Each is good at dealing with certain kinds of data and bad at others.
From: J J
From: Howard Stearns <hstearns@wisc.edu>
That's something I've never really understood: what is the domain in which Relational Databases excel?
Handling large amounts of enterprise data.
Handling and dynamically querying large amounts of data where the data format is not necessarily completely stable and ad-hoc query performance is important. "Large" here is "much larger than main memory of the machine(s) concerned". I routinely handle data sets of tens of gigs on current commodity hardware - storing the data in RAM would be somewhat faster, but too expensive for the available capital.
The strength of relational over other forms is in being able to form arbitrary joins *relatively* efficiently, and hence in being able to query across data many times larger than main memory without excessive disk traffic.
Google isn't a good counter-example, as the ad-hoc querying is missing. The types of queries done on the Google database are very limited and are well known in advance.
- Peter
Peter Crowther wrote:
[SNIP of Peter's message, quoted in full above]
My apologies for an ignorant and naive reply. So forgive me if I am way off base.
But it seems to me that being able to perform arbitrary joins relatively efficiently is a requisite of an RDBMS because an RDBMS requires you to arbitrarily partition your data in such a way as to require such joins.
Any time I've spent reading a book on SQL and speaking of "normalizing" my data, I've never liked what I read.
1st Normal Form requires atomicity: each attribute must contain a single value, not a set of values.
Since a list is a natural and common way of grouping things, it is by nature (IMO) an unnatural thing to decompose the list only so that I can recompose it later.
Things like that I don't believe are common to other methods of persistence. I may be wrong.
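For instance, an order's line items. In the image the list simply is an attribute; under 1NF the same list becomes rows in a child table that a query has to reassemble (GLORP-style sketch, names invented):

order items.	"in the image: just an OrderedCollection of LineItem objects"

"under 1NF: rows in a LINE_ITEM table, recomposed on demand"
items := session
	read: LineItem
	where: [:li | li orderId = order id].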
So I don't believe that comparing a requisite of an RDBMS (efficient joins) to other persistence methods (OODBs, the filesystem, etc.) which don't require such joins is a valid comparison, or at least not one in which the RDBMS wins.
Of course this is a simple argument and could be debated and go off into the reasons of RDB theory. But there's not enough room for that.
I also don't understand what queries could be performed with an RDBMS that couldn't be performed with Google. Or that couldn't be if Google partitioned its data for such queries. After all, the data set would have to be partitioned correctly for an RDB to perform said queries also.
Personally I've almost always been pleased with the performance of my Google queries. I've also been to many, many, many sites backed by an RDB in which the queries were horribly slow.
So I personally would be reticent to say Google made a wrong decision. (and yes, I know you didn't say so either. :)
Jimmie
Some people seem to think so: http://lambda-the-ultimate.org/node/463
Damn it, that's the wrong paper. I don't know what this one is about, but it might be similar. The paper I meant was on LTU somewhere within the last couple of weeks. I don't know how some of you people are so fast with links. My mind records the summary, not the title, so I can never find the same link in any reasonable amount of time. :(
Yes, I'm quite serious. I'm asking what kinds of problems RDBMS are uniquely best at solving (or at least no worse). I'm not asking whether they CAN be used for this problem or that. I'm asking this from an engineering/mathematics perspective, not a business ("we've always done things this way" or "we like this vendor") perspective.
I'm new to the Enterprise Software world, having been mostly in either industrial or "hard problem" software. But the 3-tier application architecture we use for financial processing at our 26 state campuses (University of Wisconsin) appears to me to be typical: large numbers of individual browsers (not communicating with each other) interact through a Web server farm with the Application Servers. The overall application is too large as implemented to allow the load to be accommodated, so it is divided by functional area into a farm of individual applications that do not talk directly to each other. This partitioning isn't very successful, because the users tend to do the same functional activities at the same times of day, so most of the applications sit idle while a few are at their limit.
I assumed that a single database was used so that the RDBMS could ensure data consistency between all these different applications. But it turns out that the Oracle database can't handle that, so instead, each functional area gets its own database. Most of the work done by the system (and most of the work of programmers like me) is to COPY data from one table to another at night when the system is otherwise quiet. [There is this Byzantine dance in which data is copied from one ledger to the next, with various checks against yet another set of ledgers. The whole thing is kept in sync by offsetting entries ("double entry") that are reconciled once a month or once a year when the system is shut down. Amazing.]
The whole thing is kludged so that nothing ends up handling more than a few gigs of records at a time. [Naively, it seems like the obvious solution for this (mathematically) is a hashing operation to keep the data evenly distributed over in-memory systems on a LAN, plus an in-memory cache of recently used chunks. But let's assume I'm missing something. The task here is to figure out what I'm not seeing.]
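For what it's worth, the naive scheme in that last bracket is only a few lines of Smalltalk; the node list and the fetch protocol here are hypothetical:

"Hash-partition records across in-memory nodes on the LAN."
nodeFor: aKey
	^nodes at: (aKey hash \\ nodes size) + 1

"One network hop to the owning node; a local cache of recently
 used chunks would sit in front of this."
lookup: aKey
	^(self nodeFor: aKey) fetch: aKey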
Maybe this isn't typical, but it is the architecture that Oracle and its PeopleSoft division pushes on us in their extensive training classes. And it appears to be the architecture discussed in the higher education IT conferences and Web sites in the U.S.
My experience with non-Enterprise Web/application software is also limited, but installations I've encountered since -- when did Phil and Alex's Excellent Web site come out? -- appear to also use partitioning to keep the working sets down to a few gigs.
My friends at Ab Initio won't tell me what they do or how they do it, but no one's claiming they use a RDBMS as Codd described it.
Anyway, either the data AS USED fits into memory or doesn't. If it does, then what benefit is the relational math providing? If it doesn't, then we have to ask whether the math techniques that were developed to provide efficient random access over disks 20 years ago are still valid. Is this still the fastest way? (Answer is no.) Is there some circumstance in which it is the fastest? Or the safest? Or allow us to do something that we could not do otherwise?
Having tools to allow a cult of specialists to break your own computing model (the relational calculus) is not a feature, but a signal that something is wrong.
I tried briefly to combine JJ's answer with Peter's to find an appropriate niche. (Again, I'm trying to look at the math, not fit and finish, availability of experienced programmers, color of brochure...) For example, there could be a class of problems for which the data set is a few tens of gigs and needs to be operated on as a whole, and for which queries are fairly arbitrary and exploratory, not production-oriented. Etc. But I haven't been able to come up with one that doesn't have better characteristics as a distributed system. Maybe if we define the problem as "and you only have one commodity box to do it on." That's fair. Maybe that's it? (Then we need to find an "enterprise" with only one box...)
J J wrote:
[SNIP of J J's message, quoted in full above]
Howard Stearns wrote:
Yes, I'm quite serious. I'm asking what kinds of problems RDBMS are uniquely best at solving (or at least no worse). I'm not asking whether they CAN be used for this problem or that. I'm asking this from an engineering/mathematics perspective, not a business ("we've always done things this way" or "we like this vendor") perspective.
The main benefit: They work. There is no question how to use them, apply them to problems, map them into different domains etc. This has all been worked out, there is nothing new to find out, just a book or two to read. From an engineering perspective that is vastly advantageous since it represents a solution with a proven track-record and no surprises.
Cheers, - Andreas
Of course. No question. Except, of course, where they don't. The 3-tier enterprise software scenario is -- to me -- an example of it NOT working.
I used to write expert system software. A fellow once asked, "But couldn't I do that with Fortran?" The answer was, "Yes, and you could do it with pencil and paper, too, but you wouldn't want to."
There's a whole bunch of problems for which pencil and paper are good enough, but maybe not ideal. Same for RDBMS. And there's all sorts of practical considerations in this range. Worse is Better, End to End, and whatever you like. No one is (I hope!) going to walk away from a solution in-hand that is good enough.
There are also problems for which pencil and paper really aren't suited. Same for RDBMS. They can be made to work with great expenditure of resources, chewing gum, baling wire, duct tape, vise grips, etc. And half of all enterprise IT projects fail. And yet even with this knowledge, there's still a 50% chance that you can make an RDBMS work on the wrong kind of problem if you throw enough money at it.
What I'm trying to do -- and of course, this isn't a Squeak question at all, but I hope it is a Squeak community question -- is try to learn what domain a perfectly running RDBMS is a good fit for by design, compared with a perfectly running alternative (even a hypothetical one).
Andreas Raab wrote:
Howard Stearns wrote:
Yes, I'm quite serious. I'm asking what kinds of problems RDBMS are uniquely best at solving (or at least no worse). I'm not asking whether they CAN be used for this problem or that. I'm asking this from an engineering/mathematics perspective, not a business ("we've always done things this way" or "we like this vendor") perspective.
The main benefit: They work. There is no question how to use them, apply them to problems, map them into different domains etc. This has all been worked out, there is nothing new to find out, just a book or two to read. From an engineering perspective that is vastly advantageous since it represents a solution with a proven track-record and no surprises.
Cheers,
- Andreas
On 1/2/07, Howard Stearns hstearns@wisc.edu wrote:
There are also problems for which pencil and paper really aren't suited. Same for RDBMS. They can be made to work with great expenditure of resources, chewing gum, baling wire, duct tape, vise grips, etc....
What I'm trying to do -- and of course, this isn't a Squeak question at all, but I hope it is a Squeak community question -- is try to learn what domain a perfectly running RDBMS is a good fit for by design, compared with a perfectly running alternative (even a hypothetical one).
I am not clear what you mean by "good fit by design".
When you asked in an earlier message "whether the math techniques that were developed to provide efficient random access over disks 20 years ago are still valid", were you referring to the math techniques of the relational model?
If so, my hunch is that you are framing the question upon an incorrect perception of the purpose of the relational calculus. My understanding is that the calculus, or specifically SQL, is a _problem statement language_, a way for engineers to specify what needs to be done, leaving the computer to figure out how to do it.
I wasn't doing this 20 years ago, but my reading of history is that engineers knew perfectly well how to make efficient use of disks, and when their employer bought the leading RDBMS they got a slow layer of murky proprietary code, with a shiny standardised data model and API.
In other words, RDBs make data access slower, _but_ make engineering easier for some problem domains.
David
Howard Stearns writes:
What I'm trying to do -- and of course, this isn't a Squeak question at all, but I hope it is a Squeak community question -- is try to learn what domain a perfectly running RDBMS is a good fit for by design, compared with a perfectly running alternative (even a hypothetical one).
I'd say: if you're placing the database schema at the center of your large system, or you're using the query facilities. Relational algebra is often just powerful enough to model commercially interesting systems. Its lack of expressive power makes it a very tractable system to manipulate, either during design or by a query optimizer.
The great strength of RDBMSes is that they are a mathematically decidable and complete system. If you can translate a problem into relational algebra you can always find a solution; however, such a system is not powerful enough to model arithmetic on the natural numbers.
Bryce
From: Howard Stearns hstearns@wisc.edu Reply-To: The general-purpose Squeak developers list squeak-dev@lists.squeakfoundation.org To: The general-purpose Squeak developers list squeak-dev@lists.squeakfoundation.org Subject: Re: relational for what? [was: Design Principles Behind Smalltalk, Revisited] Date: Tue, 02 Jan 2007 15:16:22 -0600
Of course. No question. Except, of course, where they don't. The 3-tier enterprise software scenario is -- to me -- an example of it NOT working.
I think this is due to your bad experiences with bad implementations.
I used to write expert system software. A fellow once asked, "But couldn't I do that with Fortran?" The answer was, "Yes, and you could do it with pencil and paper, too, but you wouldn't want to."
There's a whole bunch of problems for which pencil and paper are good enough, but maybe not ideal. Same for RDBMS. And there's all sorts of practical considerations in this range. Worse is Better, End to End, and whatever you like. No one is (I hope!) going to walk away from a solution in-hand that is good enough.
There are also problems for which pencil and paper really aren't suited. Same for RDBMS. They can be made to work with great expenditure of resources, chewing gum, baling wire, duct tape, vise grips, etc. And half of all enterprise IT projects fail. And yet even with this knowledge, there's still a 50% chance that you can make an RDBMS work on the wrong kind of problem if you throw enough money at it.
Completely agree.
What I'm trying to do -- and of course, this isn't a Squeak question at all, but I hope it is a Squeak community question -- is try to learn what domain a perfectly running RDBMS is a good fit for by design, compared with a perfectly running alternative (even a hypothetical one).
Programmer time. How long will it take to make the RDBMS run perfectly (for some definition of perfectly) vs. writing this alternative.
It is the same argument of using an existing DSL vs. just writing it by hand in your favorite language.
On Jan 3, 2007, at 13:39, J J wrote:
From: Howard Stearns hstearns@wisc.edu
Of course. No question. Except, of course, where they don't. The 3-tier enterprise software scenario is -- to me -- an example of it NOT working.
I think this is due to your bad experiences with bad implementations.
Obviously. And yours seems to be good experiences with good implementations. What does that show us? Apart from that both good and bad examples exist?
What I'm trying to do -- and of course, this isn't a Squeak question at all, but I hope it is a Squeak community question -- is try to learn what domain a perfectly running RDBMS is a good fit for by design, compared with a perfectly running alternative (even a hypothetical one).
Programmer time. How long will it take to make the RDBMS run perfectly (for some definition of perfectly) vs. writing this alternative.
It is the same argument of using an existing DSL vs. just writing it by hand in your favorite language.
Precisely. If the problem domain is a good fit for the RDBMS/DSL, data that naturally wants to be in 'tables', then it *may* be a win, even after factoring in the inevitable overhead of overcoming packaging mismatch. If the original problem is not naturally "table-oriented", and many are not, then it's just not going to be a win.
Cheers,
Marcel
From: Marcel Weiher marcel@metaobject.com Reply-To: The general-purpose Squeak developers list squeak-dev@lists.squeakfoundation.org To: The general-purpose Squeak developers list squeak-dev@lists.squeakfoundation.org Subject: Re: relational for what? [was: Design Principles Behind Smalltalk, Revisited] Date: Wed, 3 Jan 2007 16:36:42 -0800
Obviously. And yours seems to be good experiences with good implementations. What does that show us? Apart from that both good and bad examples exist?
Well, it didn't tell us anything; it just reminded us that there are many more bad IT people than good ones (or at least it sure seems so).
Precisely. If the problem domain is a good fit for the RDBMS/DSL, data that naturally wants to be in 'tables', then it *may* be a win, even after factoring in the inevitable overhead of overcoming packaging mismatch. If the original problem is not naturally "table-oriented", and many are not, then it's just not going to be a win.
I agree. Trying to fit non-relational data into a DB because "it's what we know" is bad.
On Jan 2, 2007, at 12:57, Andreas Raab wrote:
Howard Stearns wrote:
Yes, I'm quite serious. I'm asking what kinds of problems RDBMS are uniquely best at solving (or at least no worse). I'm not asking whether they CAN be used for this problem or that. I'm asking this from an engineering/mathematics perspective, not a business ("we've always done things this way" or "we like this vendor") perspective.
The main benefit: They work.
For some definition of 'work', yes. My experience so far is more that they are 'perceived' to work by enterprisey management-types. "If it's got a database (preferably Oracle), then it must be a real business system. Otherwise it's some weird toy." This perception is not to be dismissed lightly, but it isn't the same as actual technical merit, and it is not necessarily backed by facts.
There is no question how to use them, apply them to problems, map them into different domains etc.
Really? A pretty smart acquaintance of mine who does "enterprise" consulting with a company that's actually pretty damn good (and has a pretty good reputation and track record AFAICT) once asked the not entirely rhetorical question why, in this day and age, every CRUD application turns into a PhD thesis.
The Sports system I was supposed to improve had a very complicated schema, but all the interesting data had to be stored in serialized dictionaries anyway because it simply wasn't regular enough. During the development of the replacement (which doesn't use an RDB at all), we were presented with a 'standardized' relational schema for the domain. We fell over laughing. I don't think we could have printed it on an A0 sheet. And that isn't the only time I have seen this sort of thing.
I did come up with a schema that would have worked, and apparently our DBAs were quite impressed with it, but I wasn't, as it was really just a meta-model for defining arbitrary key-value pairs and relations between them.
This has all been worked out, there is nothing new to find out, just a book or two to read. From an engineering perspective that is vastly advantageous since it represents a solution with a proven track-record and no surprises.
At least not until you put the system into production and wonder why it doesn't actually work at all, or performs worse than 2 people doing the same job manually.
Marcel
On Tue, Jan 02, 2007 at 09:57:40PM +0100, Andreas Raab wrote:
Howard Stearns wrote:
Yes, I'm quite serious. I'm asking what kinds of problems RDBMS are uniquely best at solving (or at least no worse). I'm not asking whether they CAN be used for this problem or that. I'm asking this from an engineering/mathematics perspective, not a business ("we've always done things this way" or "we like this vendor") perspective.
The main benefit: They work. There is no question how to use them, apply them to problems, map them into different domains etc. This has all been worked out, there is nothing new to find out, just a book or two to read. From an engineering perspective that is vastly advantageous since it represents a solution with a proven track-record and no surprises.
Quite right from an engineering perspective. But "proven track-record and no surprises" is wrong, at least in the context of the larger organizations for which RDBMSs are considered appropriate. This has very little to do with technology, mathematics, or engineering, and lots to do with organizational behavior. An RDBMS scales extremely well, but the human organizations associated with it do not.
One lesson that I take from Squeak is that the way people interact with technology is important. It does not matter whether or not Squeak is "fast" if it helps people to work with ideas and solve problems quickly. More broadly, it does not matter if a technology (RDBMS or whatever) scales well if it leads people and organizations to behave as dysfunctional groups of "architects," "data analysts," and so forth.
Dave
p.s. Ralph Johnson's earlier reply on this thread is an excellent assessment, and would serve well as the last word on the topic. My sincere apologies for indulging in a further reply ;)
From: "David T. Lewis" lewis@mail.msen.com Reply-To: The general-purpose Squeak developers listsqueak-dev@lists.squeakfoundation.org To: The general-purpose Squeak developers listsqueak-dev@lists.squeakfoundation.org Subject: Re: relational for what? [was: Design Principles Behind Smalltalk,Revisited] Date: Thu, 4 Jan 2007 00:26:40 -0500
Quite right from an engineering perspective. But "proven track-record and no surprises" is wrong, at least in the context of the larger organizations for which RDBMSs are considered appropriate. This has very little to do with technology, mathematics, or engineering, and lots to do with organizational behavior. An RDBMS scales extremely well, but the human organizations associated with it do not.
Well that's true. The worst things I have seen in my career in this context were
1) DAs who apply silly standards to every table no matter what. We had some data that happened to have strings in it (the host names of computers), but since it was a string the DAs wanted us to break the string out into another table (or tables) so that we could internationalize our application. No matter how we explained it, they just replied with the "Data standards" document.
2) Developers (and by this I mean: The kind of person who probably uses Java and only knows the OO that Java has) who inflict their will on the tables. This is probably where most of the horror stories come from. Either the table was designed by them from the start, or they gradually made modifications to it that fit their world view. I have seen some pretty awful results from this one.
One lesson that I take from Squeak is that the way people interact with technology is important. It does not matter whether or not Squeak is "fast" if it helps people to work with ideas and solve problems quickly. More broadly, it does not matter if a technology (RDBMS or whatever) scales well if it leads people and organizations to behave as dysfunctional groups of "architects," "data analysts," and so forth.
Agreed.
p.s. Ralph Johnson's earlier reply on this thread is an excellent assessment, and would serve well as the last word on the topic. My sincere apologies for indulging in a further reply ;)
Agreed.
From: Howard Stearns
I'm asking what kinds of problems RDBMS are uniquely best at solving (or at least no worse).
If you could go from a clean slate for each unique problem, probably none. Same for almost any other widely-deployed technology - almost by definition, if it has been deployed outside its niche then it has been deployed in sub-optimal ways.
I'm not asking whether they CAN be used for this problem or that. I'm asking this from an engineering/mathematics perspective, not a business ("we've always done things this way" or "we like this vendor") perspective.
Ah. Theory :-). In theory, I agree with you. In reality, I agree with Andreas - RDBMSs are stable and widely understood, and they aren't *that* bad for quite a wide class of problems.
[Naively, it seems like the obvious solution for this (mathematically) is a hashing operation to keep the data evenly distributed over in-memory systems on a LAN, plus an in-memory cache of recently used chunks. But let's assume I'm missing something. The task here is to figure out what I'm not seeing.]
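[Howard's bracketed aside is concrete enough to sketch; here is a toy version in Python. The Node class and every name in it are invented stand-ins for real networked stores, not anyone's actual system.]

# Hash each key to pick one of several in-memory "nodes" on a LAN, with a
# local cache of recently used entries. Hashing spreads keys evenly; a real
# system must also handle node failure, rebalancing, and cache
# invalidation, none of which is shown here.
import hashlib
from functools import lru_cache

class Node:  # stand-in for an in-memory store on some LAN host
    def __init__(self):
        self.data = {}

NODES = [Node() for _ in range(4)]

def node_for(key):
    h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
    return NODES[h % len(NODES)]

@lru_cache(maxsize=1024)  # in-memory cache of recently used chunks
def fetch(key):
    return node_for(key).data.get(key)

node_for("account:42").data["account:42"] = {"balance": 100}
print(fetch("account:42"))  # {'balance': 100}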
Stability and incremental development. How long would it take to develop your system and get the showstopper defect rate down low enough for the system to be in line-of-business use? How would you extend your system when the next application area came along? How would you convince your funder (who wants some part of this system live *now*) to wait long enough to get the defects out?
Maybe this isn't typical
Alarmingly, it's not atypical. My day job involves a *lot* of plumbing - connecting up previously-incompatible data sources. This is because most organisations grow organically, and their IT systems grow organically with them. The systems are patch upon patch, and it's never possible to rip them out and start again.
Anyway, either the data AS USED fits into memory or doesn't.
I think that's naive. Could I instead propose "the data AS USED fits into memory plus what can reasonably be transferred via the mass storage subsystem"? For many of the apps I use, 98+% of the data accessed comes from RAM - but it's nice for the remaining 2% to be able to be 10x or 100x the size of RAM without major ill effects. However, are you looking at the correct boundary? Consider tape vs disk, L2 cache versus main memory, registers and L1 cache versus L2, etc. I would presume you could get even faster performance reading all this data into a mass of Athlon or Core L2 caches and using the HyperTransports to hook 'em together - why should we use this slow RAM stuff when we have this much faster on-chip capability? In other words, what's your rationale for picking RAM and disk as the boundary?
Is this still the fastest way? (Answer is no.)
No. Neither's your proposed approach of using main memory, I suspect. It may, however, be the fastest per dollar of expenditure on the end system.
Is there some circumstance in which it is the fastest? Or the safest? Or allow us to do something that we could not do otherwise?
The latter, yes: develop a sufficiently robust and functional application in a sufficiently short time with a sufficiently cheap set of developers.
Having tools to allow a cult of specialists to break your own computing model (the relational calculus) is not a feature, but a signal that something is wrong.
Agree entirely :-).
Maybe if we define the problem as "and you only have one commodity box to do it on." That's fair. Maybe that's it? (Then we need to find an "enterprise" with only one box...)
Or /n/ commodity boxes, where n is the capital the organisation can reasonably deploy in that area. I suspect you're coming from a background of solving "hard" problems, where throwing tin at the job is acceptable, to a world where return on investment determines whether a project can be justified or not. If it's not justifiable, it shouldn't get done - and there are plenty of quotes we've put in where we've been the cheapest, but the company's decided not to proceed because, actually, the cost of the system is more than they would ever save from using it. That's a pretty sharp razor for business applications, but ultimately it's the appropriate one to use - it avoids wasting capital and human effort to produce a shining solution when, ultimately, it would have been cheaper to use lots of monkeys with typewriters.
- Peter
Peter Crowther wrote:
...
lots of good comments. (Thanks.)
I suspect you're coming from a background of solving "hard" problems, where throwing tin at the job is acceptable, to a world where return on investment determines whether a project can be justified or not. ...
Heh. Bingo.
But the other side of the coin is that so many projects in this "easy problem" world are failures. A higher failure rate than the clean-slate hard problem world!
I've had a $300K hard-problem budget zeroed because, in part, the last easy-problem folks spent $26M (some say $52M) to implement a very standard three-tier system that didn't work. OK. I get that. This is the way it is, and I've got plenty to say about that, too, over a beer.
But I'm an engineer. I want to understand why the three-tier projects fail. And how to avoid that. I know that there are people who can make them succeed despite the math, using leadership and operations research and charm and ruthlessness and lots of money, or whatever. But that's not my domain. I may not have the choice to always pick the right tool for the job, but I do want to try to understand what makes something the right (or wrong) tool.
From: Howard Stearns
Peter Crowther wrote:
lots of good comments. (Thanks.)
Thanks for the response - didn't know how they'd be taken.
But the other side of the coin is that so many projects in this "easy problem" world are failures. A higher failure rate than the clean-slate hard problem world!
Yup. Because the (technically) "easy problems" are organisational nightmares, with vendor/customer politics, sales/tech politics, many stakeholders at the client who are using the system for political infighting and empire-building, and (in general) at least one key stakeholder who will do his/her best to sabotage the project as it's to their personal advantage to have it fail. And that's in a *small* project.
By contrast, the clean-slate projects typically have a few key stakeholders, clear and non-conflicting requirements, and less in the way of internal politics.
Consider the following (typical) example of an "easy" problem:
- The client invites tenders for a specified system;
- The salesperson wants their bonus, so deliberately tenders for less than they know the system will cost to develop;
- The developing company wins the bid;
- The salesperson gets their bonus, and moves on to the next sale;
- The project *cannot* be a win for both remaining sides, as it is not possible to bring it in with all features and within budget - and that's ignoring requirements creep;
- Somebody loses. Probably both sides lose: there's no profit in the job, and the end system doesn't do what the client wants.
Unless you consider the systems angle, you don't see the full system. The *full* system includes all the humans who interact to produce it, and therein lies a large chunk of the problem.
I want to understand why the three-tier projects fail. And how to avoid that. I know that there are people who can make them succeed despite the math, using leadership and operations research and charm and ruthlessness and lots of money, or whatever. But that's not my domain. I may not have the choice to always pick the right tool for the job, but I do want to try to understand what makes something the right (or wrong) tool.
From observation (perhaps with charcoal-tinted spectacles), I think you're looking at the end of the problem that can make a few percent of difference. If, instead, you look at all the messy leadership, charm, ruthlessness and money side, I think you're looking at the side where most of the difference in the success of a project is *actually* made. It's possible to do so, but you need to apply the principles of systems analysis... and competition, of course. Even if your company is sensible and bids at a level where they can do the job, they'll be undercut by a lying salestoad from a company who (eventually) can't.
Cynical? Moi? :-)
- Peter
On Jan 2, 2007, at 12:36, Howard Stearns wrote:
I'm new to the Enterprise Software world, having been mostly in either industrial or "hard problem" software. But the 3-tier application architecture we use for financial processing at our 26 state campuses (University of Wisconsin) appears to me to be typical: large numbers of individual browsers (not communicating with each other) interact through a Web server farm with the Application Servers. The overall application is too large as implemented to allow the load to be accommodated, so it is divided by functional area into a farm of individual applications that do not talk directly to each other. This partitioning isn't very successful, because the users tend to do the same functional activities at the same times of day, so most of the applications sit idle while a few are at their limit. I assumed that a single database was used so that the RDBMS could ensure data consistency between all these different applications.
This sounds so incredibly familiar, even if the domain is quite different. And I thought that financial processing would be the one area where RDBMSes would be able to shine...
But it turns out that the Oracle database can't handle that, so instead, each functional area gets its own database. Most of the work done by the system (and most of the work of programmers like me) is to COPY data from one table to another at night when the system is otherwise quiet.
And of course use various bits of
Maybe this isn't typical, but it is the architecture that Oracle and its PeopleSoft division pushes on us in their extensive training classes. And it appears to be the architecture discussed in the higher education IT conferences and Web sites in the U.S.
I am starting to fear that it *is* typical. Good thing I am now pretty much completely out of the enterprisey world. :-)
Cheers,
Marcel
From: Howard Stearns hstearns@wisc.edu Reply-To: The general-purpose Squeak developers list squeak-dev@lists.squeakfoundation.org To: The general-purpose Squeak developers list squeak-dev@lists.squeakfoundation.org Subject: Re: relational for what? [was: Design Principles Behind Smalltalk, Revisited] Date: Tue, 02 Jan 2007 14:36:22 -0600
Yes, I'm quite serious. I'm asking what kinds of problems RDBMS are uniquely best at solving (or at least no worse). I'm not asking whether they CAN be used for this problem or that. I'm asking this from an engineering/mathematics perspective, not a business ("we've always done things this way" or "we like this vendor") perspective.
<horror story omitted>
Honestly, it just seems to me like someone architected an awful system. I know, for example, that some databases (Oracle, I thought) can span a given DB across boxes, with different methods of partitioning (e.g. some tables here, some tables there, foreign keys between them, etc.).
You certainly shouldn't have to be copying data between tables. If nothing better, you could install MySQL everywhere and turn on replication.
Maybe this isn't typical, but it is the architecture that Oracle and its PeopleSoft division pushes on us in their extensive training classes. And it appears to be the architecture discussed in the higher education IT conferences and Web sites in the U.S.
Well, the big companies tend to push the most expensive option, not the one best for the data model. In my experience so far, I can think of no case where we accepted what the vendors proposed before some serious threats, etc.
Anyway, either the data AS USED fits into memory or doesn't. If it does, then what benefit is the relational math providing? If it doesn't, then we have to ask whether the math techniques that were developed to provide efficient random access over disks 20 years ago are still valid. Is this still the fastest way? (Answer is no.) Is there some circumstance in which it is the fastest? Or the safest? Or allow us to do something that we could not do otherwise?
I still don't think the question has anything to do with "in memory" vs. "not in memory" or "quickest way to access the disk". You can tune your RDBMS to try to cache as much as possible in memory, and then it becomes a contest of: is it faster for me to write all the code to do the joins, etc. or take what they already have for possibly a run-time speed hit.
Or maybe a speed gain, since the RDBMS can break up the table into different "spaces" and run the query simultaneously in different threads. Of course you can do that by hand, but then you are getting further behind what they already have.
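To illustrate the kind of work the engine takes off your hands, here is a rough hand-rolled version of that parallel scan in Python. The data, the partitioning, and the query are all invented, and in CPython the GIL limits how much true parallelism threads give you - a limitation a real database engine does not share:

# Split a table into "spaces" by hand, run the same filter/aggregate over
# each in parallel, and merge the partial results. This is exactly the code
# you stop writing and maintaining once the RDBMS does it for you.
from concurrent.futures import ThreadPoolExecutor

rows = [{"id": i, "amount": i % 7} for i in range(1_000_000)]
partitions = [rows[i::4] for i in range(4)]  # four hand-made "spaces"

def scan(part):
    return sum(r["amount"] for r in part if r["amount"] > 3)

with ThreadPoolExecutor(max_workers=4) as pool:
    total = sum(pool.map(scan, partitions))
print(total)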
I tried briefly to combine JJ's answer with Peter's to find an appropriate niche. (Again, I'm trying to look at the math, not fit and finish, availability of experienced programmers, color of brochure...) For example, there could be a class of problems for which the data set is a few tens of gigs and needs to be operated on as a whole, and for which queries are fairly arbitrary and exploratory, not production-oriented. Etc. But I haven't been able to come up with one that doesn't have better characteristics as a distributed system. Maybe if we define the problem as "and you only have one commodity box to do it on." That's fair. Maybe that's it? (Then we need to find an "enterprise" with only one box...)
Well, it's not going to be that ("you only have one commodity box"). When I said I think you could do what Google is doing with an RDBMS if you really wanted to, I wasn't thinking of a few commodity boxes. I was thinking of 4-10 really enormous boxes (but my understanding was that Google uses *lots* of computers to do their work, no?).
In other words, the RDBMS solution will be much more expensive, computer/software-wise, compared to what Google did.
(I see that the conversation has moved along since the time that I started to draft this, but here goes...)
On Jan 2, 2007, at 11:10 AM, J J wrote:
Sanity check: google is trying to keep a current snapshot of all websites and run it on commodity hardware. You could do exactly the same thing with a lot less CPU's using a highly tuned, distributed RDBMS. They chose to hand tune code instead of an RDBMS.
What, really? There are many possible reasons that Google don't use an RDBMS to index the web: stupidity, arrogance, excessive cost of an RDBMS, sound engineering decisions, or a combination of these.
According to the computer systems research community, Google has sound engineering reasons for its architecture; they have published papers at top conferences such as OSDI and SOSP. See http://labs.google.com/papers ("The Google File System" and "BigTable..." might be the most relevant to this conversation).
That's not to rule out the possibility of stupidity, arrogance, excessive cost, etc. But it does cast doubt on the unsubstantiated claim that Google could "do exactly the same thing with a lot less CPUs".
Finally, in a world with great distributed computing power, is centralized transaction processing really a superior model?
Some people seem to think so: http://lambda-the-ultimate.org/node/463
And there is more than that. I believe that paper (I don't have time to verify) mentions that hardware manufacturers are also starting to take this approach because fine-grained locking is so bad.
As you mentioned in a follow-up email, this wasn't the paper you meant. Although it has nothing whatsoever to do with RDBMSes, I would recommend anyone who has enough free time to learn enough Haskell to read that paper.
Did you happen to find the intended link?
- Working with other applications that are designed to use RDB's?
Maybe, but that's a tautology, no?
Again, one has to work in a large company to appreciate the nature of enterprise application development.
I have no doubt that you're right, but it doesn't answer the question: what is it that RDBs *fundamentally* get correct? It's quite like the easy but unsatisfying answer to "why is Smalltalk so great?"... "well, you can't appreciate it unless you've grokked Smalltalk".
Certainly RDBs are essential to the operations of the modern enterprise, but how much of this is because RDBs are really the best imaginable approach to this sort of thing, and how much is due to a complicated process of co-evolution that has resulted in the current enterprise software ecosystem?
Josh
From: Joshua Gargus
what is it that RDBs *fundamentally* get correct?
People find it easy to understand tabular data and to cross-reference between tables. Relational databases contain tabular data. So people find relational databases easy to understand compared to the alternatives. Other than the uniformity ("everything is either a tuple or an atomic value"), there's little else to commend them - but that ease of understanding has been enough, I think.
The rest of the system is optimisation to try to get relatively efficient use of the machine despite expressing the problem in ways that are easy for humans to understand. Oh, and trying to make up for the *dreadful* query language that IBM inflicted on the world with SQL. I learned the principles of relational systems with db++, a very odd relational system that used a very clean relational algebra where select, project and join were first-class operations. It's certainly helped me cut through the fog of SQL, where the principles are far less clear!
- Peter
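[Peter's select, project and join are easy to make first-class outside any database. A minimal sketch in Python over lists of dict-rows - this illustrates the algebra itself, not db++ or any real system:]

# select, project and natural join as ordinary functions over relations,
# where a relation is just a list of dict-rows. Illustrative only.
def select(relation, pred):
    return [row for row in relation if pred(row)]

def project(relation, *cols):
    return [{c: row[c] for c in cols} for row in relation]

def join(r, s):
    # natural join: keep row pairs that agree on all shared column names
    shared = set(r[0]) & set(s[0]) if r and s else set()
    return [{**a, **b} for a in r for b in s
            if all(a[c] == b[c] for c in shared)]

emp = [{"name": "ada", "dept": 1}, {"name": "bob", "dept": 2}]
dept = [{"dept": 1, "title": "R&D"}, {"dept": 2, "title": "Sales"}]
print(project(select(join(emp, dept), lambda row: row["title"] == "R&D"),
              "name"))  # [{'name': 'ada'}]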
From: Joshua Gargus schwa@fastmail.us Reply-To: The general-purpose Squeak developers list squeak-dev@lists.squeakfoundation.org To: The general-purpose Squeak developers list squeak-dev@lists.squeakfoundation.org Subject: Re: relational for what? [was: Design Principles Behind Smalltalk, Revisited] Date: Tue, 2 Jan 2007 15:21:48 -0800
What, really? There are many possible reasons that Google don't use an RDBMS to index the web: stupidity, arrogance, excessive cost of an RDBMS, sound engineering decisions, or a combination of these.
I'm not saying Google are idiots. Clearly not. I was basically just questioning using them as some sort of counterpoint against RDBMSs. I think you could do the same thing they did with an RDBMS, but not on a bunch of low-end computers. You would have to spend some cash.
According to the computer systems research community, Google has sound engineering reasons for its architecture; they have published papers at top conferences such as OSDI and SOSP. See http://labs.google.com/papers ("The Google File System" and "BigTable..." might be the most relevant to this conversation).
Yes of course. And look at what they are doing: Fault tolerant systems on a large number of commodity boxes. Almost the opposite of an RDBMS. :)
That's not to rule out the possibility of stupidity, arrogance, excessive cost, etc. But it does cast doubt on the unsubstantiated claim that Google could "do exactly the same thing with a lot less CPUs".
Well, it would be time-consuming (and probably expensive) to prove, but I still think the statement is OK. But it will be big boxes and big CPUs with lots of throughput.
As you mentioned in a follow-up email, this wasn't the paper you meant. Although it has nothing whatsoever to do with RDBMSes, I would recommend anyone who has enough free time to learn enough Haskell to read that paper.
Did you happen to find the intended link?
Yes, http://lambda-the-ultimate.org/node/1896
Certainly RDBs are essential to the operations of the modern enterprise, but how much of this is because RDBs are really the best imaginable approach to this sort of thing, and how much is due to a complicated process of co-evolution that has resulted in the current enterprise software ecosystem?
Here I think you envision more religious fervor behind my words than exists. It is nothing more than a "toolbox" issue for me. A problem comes up; what is the fastest way to solve it, weighed against the suspected length of the project and how scalable the solution needs to be? For me there are times I reach for the RDBMS. There are other times I would reach for an OODB (I plan to use Magma to persist my website). Or maybe a combination (I am *very* impressed by what I have seen from GLORP so far), or maybe just stick it in memory.
Which is going to be the best? Well, it is our job as engineers to weigh all the factors and answer that question, for each case in isolation.
On Jan 4, 2007, at 11:40 AM, J J wrote:
That's not to rule out the possibility of stupidity, arrogance, excessive cost, etc. But it does cast doubt on the unsubstantiated claim that Google could "do exactly the same thing with a lot less CPUs".
Well, it would be time-consuming (and probably expensive) to prove, but I still think the statement is OK. But it will be big boxes and big CPUs with lots of throughput.
Well, if you say so. I'm no expert.
As you mentioned in a follow-up email, this wasn't the paper you meant. Although it has nothing whatsoever to do with RDBMSes, I would recommend anyone who has enough free time to learn enough Haskell to read that paper.
Did you happen to find the intended link?
Thanks, that looks interesting. It actually is related to the original link.
Certainly RDBs are essential to the operations of the modern enterprise, but how much of this is because RDBs are really the best imaginable approach to this sort of thing, and how much is due to a complicated process of co-evolution that has resulted in the current enterprise software ecosystem?
Here I think you envision more religious fervor behind my words than exists.
My apologies, I can see how you might read it that way. I'm not saying that you are arguing that RDBs are the best imaginable approach; I was trying to re-state Howard's initial question. As I understood it, the question was not about whether an RDBMS is the appropriate choice in a given situation (given time and cost constraints, etc.), but whether we know enough now to make fundamentally better choices if we magically found ourselves with the resources to "burn the disk packs" and start over.
Josh
On Jan 2, 2007, at 11:10, J J wrote:
J J wrote:
... I simply believe in the right tool for the right job, and you can't beat an RDB in its domain. ...
That's something I've never really understood: what is the domain in which Relational Databases excel?
Handling large amounts of enterprise data. If you have never worked in a large company, you probably won't appreciate this.
Well, I have worked in a large-ish enterprise and my experience was that moving *away* from the RDB was central to improving performance around a hundred- to a thousandfold, with the bigger improvement for the project that completely eliminated the RDB.
But in a large company you have a *lot* of data, and different applications want to see different parts of it. In an RDBMS this is no problem, you normalize the data and take one of a few strategies to supply it to the different consumers (e.g. views, stored procedures, etc.).
Har har. Sorry, but I have seen very few actually reusable data models.
- Data too large to fit in memory? Well, most uses today may have been too large to fit in memory 20 years ago, but aren't today. And even for really big data sets today, networks are much faster than disk drives, so a distributed database (e.g., a DHT) will be faster. Sanity check: Do you think Google uses an RDB for storing indexes and a cache of the WWW?
Are you serious with this (data too large to fit into memory)? And if you use a good RDBMS then you don't have to worry about disk speed or distribution.
You are kidding, right?
The DBA's can watch how the database is being used and tune this (i.e. partition the data and move it to another CPU, etc., etc.).
To some limited extent, yes. But they can't work miracles, and neither can the DB. In fact, if you know your access patterns, you can (almost) always do better than the DB, simply because there are fewer layers between you and the code.
Oh, but you found one example where someone with a lot of data didn't use a RDB. I guess we can throw the whole technology sector in the trash. Sanity check: google is trying to keep a current snapshot of all websites and run it on commodity hardware. You could do exactly the same thing with a lot less CPU's using a highly tuned, distributed RDBMS.
That's a big claim, mister. Care to back it up?
Marcel
From: Marcel Weiher marcel@metaobject.com Reply-To: The general-purpose Squeak developers list squeak-dev@lists.squeakfoundation.org To: The general-purpose Squeak developers list squeak-dev@lists.squeakfoundation.org Subject: Re: relational for what? [was: Design Principles Behind Smalltalk, Revisited] Date: Tue, 2 Jan 2007 23:08:29 -0800
Well, I have worked in a large-ish enterprise and my experience was that moving *away* from the RDB was central to improving performance around a hundred- to a thousandfold, with the bigger improvement for the project that completely eliminated the RDB.
Har har. Sorry, but I have seen very few actually reusable data models.
You are kidding, right?
Who are you people getting for DBA's? :)
Oh, but you found one example where someone with a lot of data didn't use a RDB. I guess we can throw the whole technology sector in the trash. Sanity check: google is trying to keep a current snapshot of all websites and run it on commodity hardware. You could do exactly the same thing with a lot less CPU's using a highly tuned, distributed RDBMS.
That's a big claim, mister. Care to back it up?
And how do you propose I do that? I worked at a very, very large retailer for most of my career, and they kept basically every transaction ever for trending purposes. Now given the size of that company I would say it has to be at least as large as Google's data (probably quite a bit bigger). Now, they didn't turn over queries in fractions of a second, but keep in mind the general kind of queries they were dealing with. If they were limited to a subset of possible queries like Google is, I believe they could produce comparable times.
On Jan 3, 2007, at 13:01, J J wrote: [I wrote]
You are kidding, right?
Who are you people getting for DBA's? :)
Let's pull back some context here:
Are you serious with this (data too large to fit into memory)? And if you use a good RDBMS then you don't have to worry about disk speed or distribution.
Do you really, truly believe you don't have to worry about physical parameters such as disk speed just because there is an intermediate layer between you and your disk(s) called a RDBMS?
Then our vendor must have been really ignorant when they recommended that we get faster machines with faster disks in order to fix problems we were having where the database could not keep up with the (write) data rate. And reality must also not have known about this magical property of RDBMSs to make you immune to physical limitations, because getting faster disks *did* actually solve the problem.
Or are you saying that what happens is that you pay someone else to worry about those parameters?
Of course, using a database in that scenario was actually not necessary, and the benefits that the vendor touted for their database-based system were quite irrelevant in our application context. We could have built a far (a) faster (b) simpler (c) more reliable and (d) cheaper system had we not bought into the "must use RDBMS because it makes everything better" voodoo and just kept the relevant data in application memory. This might have been a small amount of extra programming initially, but would it ever have paid off in maintenance.
Especially since we didn't actually own the data, we were just getting a feed from somewhere else.
Had the data been ours, that would have been a different story, because data-integrity is one RDBMS myth that I still believe in, as it hasn't been beaten out of me yet by encounters with reality...
Marcel
From: Marcel Weiher marcel@metaobject.com Reply-To: The general-purpose Squeak developers list squeak-dev@lists.squeakfoundation.org To: The general-purpose Squeak developers list squeak-dev@lists.squeakfoundation.org Subject: Re: relational for what? [was: Design Principles Behind Smalltalk, Revisited] Date: Wed, 3 Jan 2007 20:41:58 -0800
Are you serious with this (data too large to fit into memory)? And if you use a good RDBMS then you don't have to worry about disk speed or distribution.
Do you really, truly believe you don't have to worry about physical parameters such as disk speed just because there is an intermediate layer between you and your disk(s) called a RDBMS?
Well no, someone has to worry about this. I guess when I said RDBM*S* I meant RDBMS *team*.
Or are you saying that what happens is that you pay someone else to worry about those parameters?
Sort of.
I just don't see the data point of "does the data fit into memory or not" as being relevant to the discussion. If you have relational-type data and you want to run various reports that look at the data in various different ways, what does "have it in memory" have to do with anything? Whether it fits or not, you still have to hand-write code that does relational joins and other things to deal with it.
My last (relevant) project would have easily fit in memory, but downloading MySQL, building 3 tables and loading up the data was *vastly* faster than hand-writing all that stuff for about 10 reports that had to be run one time.
Of course, using a database in that scenario was actually not necessary, and the benefits that the vendor touted for their database-based system were quite irrelevant in our application context.
From: Marcel Weiher
The DBA's can watch how the database is being used and tune this (i.e. partition the data and move it to another CPU, etc., etc.).
To some limited extent, yes. But they can't work miracles, and neither can the DB. In fact, if you know your access patterns, you can (almost) always do better than the DB, simply because there are fewer layers between you and the code.
[Note: I appear to be getting a slow and incomplete feed from squeak-dev - someone else may have said what's below or the discussion may have moved on. My apologies if so.]
*If you know your access patterns*, I agree. I suggest that in many/most practical business applications you do *not* know your access patterns, because the data is at the core of the business and will be used in unexpected ways over time. The old schema will change slowly; new pieces of schema will be bolted on at the edges as new applications are accreted; many, many ad-hoc reports will be written that need to run "fast enough" when they are required; and different functional areas of the business will wish to view the data in their own way. Optimising for any one set of access patterns counts as premature optimisation, and will come back and bite you later. I claim that one of the advantages of a normalised TP database is that it is agnostic about its access patterns, and that indexes (and indexed views) can be added later and changed dynamically to improve response times as the access patterns change over time.
RDBs are "good enough" for quite a wide class of problems, and most sensibly-designed relational databases have gentle fitness curves - as the requirements change over time (which they do), there are comparatively few changes that completely break the database. By contrast, a highly-optimised storage structure is fragile - a requirements change may completely screw the optimisation, or be screwed by it. I don't want "right", I want "good enough for the job".
- Peter
+1
From: "Peter Crowther" Peter@ozzard.org Reply-To: The general-purpose Squeak developers listsqueak-dev@lists.squeakfoundation.org To: "The general-purpose Squeak developers list"squeak-dev@lists.squeakfoundation.org Subject: RE: relational for what? [was: Design Principles Behind Smalltalk,Revisited] Date: Thu, 4 Jan 2007 08:28:49 -0000
From: Marcel Weiher
The DBA's can watch how the database is being used and tune this (i.e. partition the data and move it to another CPU, etc., etc.).
To some limited extent, yes. But they can't work miracles, and neither can the DB. In fact, if you know your access patterns, you can (almost) always do better than the DB, simply because there are fewer layers between you and the code.
[Note: I appear to be getting a slow and incomplete feed from squeak-dev - someone else may have said what's below or the discussion may have moved on. My apologies if so.]
*If you know your access patterns*, I agree. I suggest that in many/most practical business applications you do *not* know your access patterns, because the data is at the core of the business and will be used in unexpected ways over time. The old schema will change slowly; new pieces of schema will be bolted on at the edges as new applications are accreted; many, many ad-hoc reports will be written that need to run "fast enough" when they are required; and different functional areas of the business will wish to view the data in their own way. Optimising for any one set of access patterns counts as premature optimisation, and will come back and bite you later. I claim that one of the advantages of a normalised TP database is that it is agnostic about its access patterns, and that indexes (and indexed views) can be added later and changed dynamically to improve response times as the access patterns change over time.
RDBs are "good enough" for quite a wide class of problems, and most sensibly-designed relational databases have gentle fitness curves - as the requirements change over time (which they do), there are comparatively few changes that completely break the database. By contrast, a highly-optimised storage structure is fragile - a requirements change may completely screw the optimisation, or be screwed by it. I don't want "right", I want "good enough for the job".
- Peter
goran@krampe.se wrote:
Hi!
Todd Blanchard tblanchard@mac.com wrote:
Basically treating each entity as a collection. That might be worth doing and should be pretty easy. Good idea.
Magma uses a similar trick in its query capabilities (in order to make seemingly iterative block-code actually generate a query with optimization and index-support). See here: http://wiki.squeak.org/squeak/5859
And oh, I forgot the first time - Avi did ROE which is also very interesting: http://map.squeak.org/packagebyname/roe
regards, Göran
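[The shape of the trick Göran describes - code that looks like iterating over a collection but actually accumulates a query the backend can optimize - can be suggested in a few lines. This is an invented Python illustration, not Magma's or ROE's real API:]

# A query facade records comparisons instead of evaluating them, then emits
# SQL. All class and table names here are made up for the example.
class Field:
    def __init__(self, name):
        self.name = name
    def __eq__(self, value):  # record the comparison, don't test it
        return "%s = %r" % (self.name, value)

class Table:
    def __init__(self, name):
        self.name = name
    def where(self, condition):
        return "SELECT * FROM %s WHERE %s" % (self.name, condition)

people = Table("people")
age = Field("age")  # stand-in for an attribute of the entity
print(people.where(age == 30))  # SELECT * FROM people WHERE age = 30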
Paul,
Thanks for sharing this essay. I think it brings up many important topics which I'd like to comment on one at a time (or perhaps on my blog) ...
On 12/25/06, Paul D. Fernhout pdfernhout@kurtz-fernhout.com wrote:
When I was looking at GST vs. Ruby benchmarks today,
http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=gst&am... I came across a link at the bottom to the original "Design Principles Behind Smalltalk" paper by Dan Ingalls, see:
http://users.ipa.net/~dwighth/smalltalk/byte_augc81/design_principles_behind...
This essay attempts to look at Dan's 1981 essay and move beyond it, especially by considering supporting creativity by a group instead of creativity by an isolated individual, and also by calling into question "objects" as a sole major metaphor for a system supporting creativity. Some of this thinking about "objects" is informed by the late William Kent's work, especially Kent's book "Data & Reality": http://www.bkent.net/ http://www.bkent.net/Doc/darxrp.htm
<snip>
== objects are illusions, but useful ones ==
In my undergraduate work in psychology I wrote a senior paper in 1985 entitled: "Why intelligence: Object, Evolution, Stability, and Model" where I argued that the impression of a world of well-defined objects is an illusion, but a useful one. Considered in the context of the section above, we can also see that how you parse the world into objects may depend on the particular goal you have (reaching your car without being wet) or the particular approach you are taking to reaching the goal (either the strategy, walking outside, or any helping tool used, like a neural net or 2D map). Yet, the world is the same, even as what we consider to be an "object" may vary from time to time; in one situation "rain" might be an object, in another a "rain drop" might be an object, in another the weather might be of little interest. So objects are a *convenience* to reaching goals (in terms of internal states), not reality (which our best physics says is more continuous than anything else in terms of quantum probabilities, or at best, more conventionally a particle-wave duality). So objects, as tools of thought, then have no meaning apart from the context in which we create them -- and the contexts include our viewpoints, our goals, our tools, our history, our relations to the community, and so on.
While there are certainly valuable insights in "Data & Reality" and I would agree that some data objects are merely "tools of thought", *many* objects have meaning and exist independent of our view/model. Quantum physics does tell us that the boundaries of "things" are hard to define precisely, but "things" themselves as aggregates are held together by forces of nature, not by external views. A keyboard can be remapped in software and different people using it can have different views of the individual key "objects". Even the keyboard itself could be viewed differently - a word processor, game controller, or a cash register. However, any observer - human, machine or otherwise - of measurable physical characteristics of the keyboard will not see any changes. The wave-functions underlying all of the sub-atomic particles making up that keyboard have a unique history going back at least to just after the big bang.
Today, more and more so-called information systems are being used not just for description but to augment/affect the external world. In this evolving hyperlinked meshverse of simulation and "reality" (http://www.meshverse.com/2006/11/20/hyperlinking-reality/), data often enters into a symbiotic relationship with "reality" where changing views can change "reality". The "real" Mars Climate Orbiter (http://en.wikipedia.org/wiki/Mars_Climate_Orbiter) object was destroyed because it was dependent on the data a model object had. If one accepts that a paradigm shift is underway which Croquet offers something of value in, then there are important ramifications (http://croquet.funkencode.com/2006/04/24/the-64-billion-dollar-question/) for database and language choices.
Laurence
Laurence Rozier wrote:
While there are certainly valuable insights in "Data & Reality" and I would agree that some data objects are merely "tools of thought", *many* objects have meaning and exist independent of our view/model. Quantum physics does tell us that the boundaries of "things" are hard to define precisely, but "things" themselves as aggregates are held together by forces of nature, not by external views. A keyboard can be remapped in software and different people using it can have different views of the individual key "objects". Even the keyboard itself could be viewed differently - a word processor, game controller, or a cash register. However, any observer - human, machine or otherwise - of measurable physical characteristics of the keyboard will not see any changes. The wave-functions underlying all of the sub-atomic particles making up that keyboard have a unique history going back at least to just after the big bang.
Thanks for the other comments.
On the "keyboard" analogy:
Consider if you move to a Dvorak layout on your keyboard instead of Qwerty. Then you might need to pry off all the keycaps and move them around. Suddenly you do not have "a keyboard". What you have is a collection of keycaps (perhaps some broken in the process of removing them) plus a base (perhaps with a keyboard switch or two damaged by prying). Your mind has followed this situation, where something you thought was an object has now been decomposed into multiple items, some of which even have subitems or subareas which are not obviously removable (broken switches soldered on the keyboard base) yet behave differently. To model this requires a lot of subtlety, with boundaries not being obvious -- with the boundaries fluidly moving around depending on the questions we have or the intent we have or the actions we take.
Or, what if, say, a rabid ocelot has just wandered into your office? http://en.wikipedia.org/wiki/Ocelot Suddenly your mental model of your entire office might shift to -- what item can I throw at the foaming-mouthed ocelot to keep it away from me and give me enough time to escape past it through the door? The closest thing at hand is the keyboard. Suddenly your mental model of the keyboard needs to switch from "data entry device" to "ocelot management system". You have to think about issues like: will the cable be long enough if I throw it as-is, or will the plug disconnect from the computer if hurled with enough force? Or will the computer itself move with it if I toss it? And all in an instant. So, suddenly your whole mapping of the possibilities and uses of your keyboard needs to change, and in less time than it takes that rabid ocelot to move from your door to your desk. A typical ST80 simulation of a computer could not be used in that way, but your mind can do it easily and quickly.
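To make that concrete, here is a minimal Squeak-flavored sketch of what reifying "keyboard as ocelot management system" might look like (every class and selector name here is hypothetical, invented just for illustration, not from any existing library). The point is not that it cannot be written, but that the remapping your mind does for free has to be designed, named, and compiled first:

  Object subclass: #ProjectileRole
      instanceVariableNames: 'item'
      classVariableNames: ''
      poolDictionaries: ''
      category: 'Illustration-OcelotManagement'.

  ProjectileRole class >> on: anItem
      "Wrap an arbitrary item so we can reason about it as something to throw."
      ^ self new setItem: anItem

  ProjectileRole >> setItem: anItem
      item := anItem

  ProjectileRole >> effectiveRange
      "A tethered item is limited by its cable unless the plug would pull free."
      ^ ((item respondsTo: #cableLength) and: [item isPluggedIn])
          ifTrue: [item cableLength]
          ifFalse: [5 "meters, say -- a free throw across an office"]

And even this toy hard-codes exactly one question (range); the will-the-plug-disconnect question, the will-the-computer-follow question, and the ocelot itself would each need the same treatment.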
So there is a gap here between the flexibility of the way your brain models physical objects and processes and intent, and the way we build limited computer models using ST80. Your brain makes the switch in microseconds; it might take weeks or months to change a simulation of a keyboard as data input into a keyboard as thrown object (let alone model a rabid ocelot :-). And there remain subtle problems -- is the keyboard an independent "object" if you have to think about the cable and how it is attached to another independent "object", the computer? Perhaps this set of problems is just solvable with a good class library; if so, I haven't seen it yet. :-) Perhaps the latest version of Inform? http://www.inform-fiction.org/I7/Inform%207.html But even there, it seems like a lot of hand-crafting of rules specific to the needs of the story. Essentially, our minds' model of reality is much more subtle and fluid than that of "objects", even if we appear to be seeing them all the time. And it works so well we don't even notice these abrupt shifts in representation -- except perhaps when we laugh at a perspective-shifting joke. :-) Consider: http://www.funsulting.com/september_2004_newsletter.html From there: "Illegal aliens have always been a problem in the United States. Ask any [American] Indian."
Our mind has a much deeper and more flexible command of the notion of "objects" and "classes" in relation to "need" or "intent" than the Smalltalk environment has, at the very least. And these perspective shifts are often the basis of creativity. And it is exactly enhancing creativity which is Smalltalk's stated design goal. So maybe we need a software modeling environment for modeling jokes about objects and classes. :-) Again from the above link: """Research has linked the creative and humor portions of our brains. Several studies showed that humor leads to creativity. One of the most creative uses of humor is seen in the comedic style of Stephen Wright. His one liner’s take normal everyday concepts and show us a creative, and playful, way of seeing them. Here are some examples: “I spilled Spot Remover on my dog... Now he's gone.” “I went to a general store. They wouldn't let me buy anything specifically.” Many of us hear his jokes and immediately see the humor in the different perspective. Interestingly, by exposing ourselves to this kind of humor, we are also more likely to be creative. Since the creative process involves seeing new things or new points of view, humor is a logical jump starter to creativity.""" Maybe, ultimately, the problem with Smalltalk and its very rigid class-based view of the world is that it is too serious a programming environment? Maybe it needs to lighten up a little? Learn to laugh at itself? :-) How would one even begin to tell a joke (and get laughter in response) in Smalltalk-80?
As someone else in the thread put it, it is a general principle of mathematical model building that we are just making a simplification of reality for our purposes. I'll agree, but I will still not let Smalltalk off the hook -- since our mind is able to build and rebuild these models seemingly in an instant -- even in the punch line of a joke -- whereas Smalltalk coding takes a long time. And I only hold ST80 to such high standards as it aspires to them (forget about C++; no hope of a sense of humor there. :-)
There is some sort of mismatch going on here between the mind and Smalltalk's object model. What it is in its entirety I am not sure. But clearly the tools at hand in Smalltalk-80 can't match the mind's flexibility in object-oriented (and other) modeling. Yet it is very much a stated design goal in Dan's original paper to have the Smalltalk software environment be a good match for how the mind actually works. So, here, as exemplified by humor, we have a mismatch. Essentially, Smalltalk code isn't funny. :-)
Granted, eToys may be "fun", but that is not the same as being "funny". How could you tell a joke to eToys and have it laugh? Or how could eToys invent new jokes and tell them to you for your approval? Perhaps this starts to border on AI?
Anyway, writing this inspired me to Google on programs that invent jokes, and I got this: http://news.bbc.co.uk/1/hi/technology/5275544.stm """Computer scientists in Scotland developed the program for children who need to use computerised speech aids. The team said enabling non-speaking children to use puns and other jokes would help them to develop their language and communication skills. The researchers admitted some of the computer-generated puns were terrible, but said the children who had tried the technology loved them. ...Children using the software can choose a word or compound word, which will form some or all of the punch line, from the system's dictionary. The program then writes the joke's opener. It works by comparing the selected word with other words in its dictionary for phonetic similarity or concepts that link the words together, and then fits them into a pun template. ... Dr Waller said: "The kids have been superb, they have taken to the software like fish to water. They have been regaling everybody with their jokes." She said it seemed to have boosted their confidence as well as their language skills. "It gives these kids the ability to control conversations, perhaps for the first time, it gives them the ability to entertain other people. And their self-image improves too." """
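As an aside, the template approach the article describes is easy to caricature in a few lines of Smalltalk (a toy sketch with made-up data; real phonetic matching would need something like a rhyming dictionary, which I have crudely stubbed here as a shared word ending):

  | lexicon word match |
  lexicon := #('hotter' 'water' 'hat' 'mat').
  word := 'otter'.
  "Crude stand-in for phonetic similarity: another word sharing this word's ending."
  match := lexicon
      detect: [:each | (each ~= word) and: [each endsWith: (word last: 4)]]
      ifNone: [nil].
  match notNil ifTrue:
      [Transcript show: 'What do you get when an otter sits on a stove? A ',
          match, ' ', word, '!'; cr]

The charm of the real system, as described, is that the lexicon and templates are data, so the children can drive it without touching any code.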
Related web sites: http://www.computing.dundee.ac.uk/staff/awaller/research.asp http://groups.inf.ed.ac.uk/standup/ From the last: "We are exploring how humour may be used to help non-speaking children learn to use language more effectively. There is evidence to suggest that language play, including using puns and other jokes, has a beneficial effect on a child's developing language and communication skills. Children with communication impairments are often reliant on augmented communication aids in order to carry on conversations, but these aids give little scope for generating novel language. This inhibits experimentation with language and limits the trying out of humorous ideas, which can in turn have a stultifying effect on language development. We propose to address this deficiency in the language environment of the non-speaking child by providing a software tool which promotes humorous language play. Starting from our previous research on the automated generation of punning riddles, we will design and implement a program which allows the user to experiment with the construction of simple jokes. The user interface of this system will be specially designed to be accessible to children with communication and physical disabilities. We will then test the efficacy of the system by observing and evaluating the use of the software by the children."
Perhaps there in Dr. Waller's lab is the future of Smalltalk? :-)
Today, more and more so-called information systems are being used not just for description but to augment/affect the external world. In this evolving hyperlinked meshverse of simulation and "reality" (http://www.meshverse.com/2006/11/20/hyperlinking-reality/), data often enters into a symbiotic relationship with "reality", where changing views can change "reality". The "real" Mars Climate Orbiter (http://en.wikipedia.org/wiki/Mars_Climate_Orbiter) object was destroyed because it was dependent on the data a model object had. If one accepts that a paradigm shift is underway in which Croquet offers something of value, then there are important ramifications (http://croquet.funkencode.com/2006/04/24/the-64-billion-dollar-question/) for database and language choices.
Thanks for the links. I'll agree that as the "noosphere" or "nooverse" continues to develop, http://en.wikipedia.org/wiki/Pierre_Teilhard_de_Chardin http://en.wikipedia.org/wiki/Noosphere we'll see more bridging of mental models (incarnated in computers or not) and the physical world, where such abstract constructs have unexpected effects on the physical world. I hope this project has positive effects in that direction (it is intended to be a GPL'd matter replicator, which can reproduce itself): http://reprap.org/ Still, we have been seeing this link of model (data) and reality for some time, and not just on an individual level -- I'm sure we've all had to deal with government bureaucracies or corporate hierarchies or classroom settings where our problem or need did not match the pigeonholes or procedures the organization had for dealing with individuals (especially creative ones. :-) How does a bureaucracy deal with humor? It often can't. Consider: "The Soviet Joke Book" http://www.st-andrews.ac.uk/~pv/courses/sovrus/jokes.html An anecdote told during the Brezhnev era: Stalin, Khrushchev and Brezhnev were all travelling together in a railway carriage, when unexpectedly the train stopped. Stalin put his head out of the window and shouted, "Shoot the driver!" But the train didn't start moving. Khrushchev then shouted, "Rehabilitate the driver!" But it still didn't move. Brezhnev then said, "Comrades, Comrades, let's draw the curtains, turn on the gramophone and let's pretend we're moving!" After Gorbachev came to power another line was added, in which he suggests: "Comrades, let's get out and push."
The history of Smalltalk-80 is itself an example of that -- ST80 didn't fit Steve Jobs' model when he saw it, so he ignored most of it, and gave us only the GUI window part in the Macintosh. Or, considering my comments above, essentially Steve did not get most of the joke. :-) The idea of making source and development tools available to end users did not match the notion of run-time fees, so we ended up with an absurd focus on "packaging" and "image stripping" and "shrinking" even to this day; so again, considering the above, ParcPlace did not see the humor in a free Smalltalk. :-) But now we do.
--Paul Fernhout
On Wed, 03 Jan 2007 09:54:35 -0500, "Paul D. Fernhout" pdfernhout@kurtz-fernhout.com wrote:
There is some sort of mismatch going on here between the mind and Smalltalk's object model. What it is in its entirety I am not sure. But clearly the tools at hand in Smalltalk-80 can't match the mind's flexibility in object-oriented (and other) modeling. Yet it is very much a stated design goal in Dan's original paper to have the Smalltalk software environment be a good match for how the mind actually works. So, here, as exemplified by humor, we have a mismatch. Essentially, Smalltalk code isn't funny. :-)
I'm working on some serious AI research right now, using Squeak (of course). My idea of the brain (in terms of how we model it) is a virtual machine, with very little Smalltalk code, and huge amounts of data that gets stored and indexed. You can't model things like humor and emotions and such in code - it gets modeled in data.
http://www.bioloid.info/tiki/tiki-index.php?page=MicroRaptor if anyone is interested...
Later, Jon
-------------------------------------------------------------- Jon Hylands Jon@huv.com http://www.huv.com/jon
Project: Micro Raptor (Small Biped Velociraptor Robot) http://www.huv.com/blog
Jon Hylands wrote:
On Wed, 03 Jan 2007 09:54:35 -0500, "Paul D. Fernhout" pdfernhout@kurtz-fernhout.com wrote:
Essentially, Smalltalk code isn't funny. :-)
I'm working on some serious AI research right now, using Squeak (of course). My idea of the brain (in terms of how we model it) is a virtual machine, with very little Smalltalk code, and huge amounts of data that gets stored and indexed. You can't model things like humor and emotions and such in code - it gets modeled in data.
Of course, as LISP often shows, or Squeak's VM generation system, the line between code and data can often get blurry. :-)
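For instance, in a stock Squeak image (if I remember the protocol right), a string is data until you hand it to the compiler, and a compiled method is itself an ordinary object you can inspect:

  Compiler evaluate: 'Transcript show: ''Hello from a string''; cr'.
  (Object >> #printString) inspect. "a CompiledMethod -- code as a first-class object"

Jon's data-driven brain model may end up with the same pleasant ambiguity: once the indexed data drives behavior, it is code in everything but name.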
http://www.bioloid.info/tiki/tiki-index.php?page=MicroRaptor
Interesting project. I'll be curious over time how you see Squeak needing to change or expand to better support your AI and robotics related goals.
--Paul Fernhout