About the state of the Celeste index - A thread a couple of months ago reviewed some of the same issues. John Maloney was involved with the original Babar and mentioned reading the index on demand and storing the index as binary as two ways to make it faster.
I've recently changed the index to use binary storage. Unfortunately, this required quite a lot of refactoring, Celeste was being edited at the time, and I lost the image, etc., so the code is somewhere between lost and hopelessly out of date and incompatible. However, I did reach a few conclusions:
1. Binary is indeed faster than ASCII. To be precise, not having to convert the Integers (msgId, message time, length and offset) is faster. This is as John remembered. The parts of the index that are strings (subject, from, to, etc.) remain the same speed (at best, see next point).
2. Checking for CrLfs can be very expensive, particularly for strings in a binary file, where CrLfStream tries to figure out the right mode anew for each string.
3. Binary done right is still no panacea. It is a significant improvement, but no big breakthrough.
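To make points 1 and 2 concrete, here is a minimal sketch in Python rather than Squeak. The field widths, the byte order and all function names are assumptions for illustration, not the actual Celeste index format. The integers are stored fixed-width so nothing has to be converted from digits, and the strings are length-prefixed so nothing has to scan them for line ends.

```python
import struct

# Hypothetical fixed-width layout for the integer fields of one index entry:
# msgId, message time, length, offset. The widths are assumptions.
ENTRY = struct.Struct(">IIIQ")

def pack_entry(msg_id, time, length, offset):
    """Binary form: four integers stored as-is, no digit conversion."""
    return ENTRY.pack(msg_id, time, length, offset)

def unpack_entry(data):
    """Read the four integers back from their fixed-width binary form."""
    return ENTRY.unpack(data)

def ascii_entry(msg_id, time, length, offset):
    """ASCII form for comparison: every integer is written as digits."""
    return f"{msg_id} {time} {length} {offset}\n".encode("ascii")

def parse_ascii_entry(line):
    """ASCII form for comparison: every integer is re-parsed from digits."""
    return tuple(int(field) for field in line.split())

def pack_string(text):
    """Length-prefixed string, so the reader never looks for CR or LF."""
    raw = text.encode("utf-8")
    return struct.pack(">H", len(raw)) + raw

def unpack_string(data, pos):
    """Return (string, position just past it) for a length-prefixed string."""
    (size,) = struct.unpack_from(">H", data, pos)
    start = pos + 2
    return data[start:start + size].decode("utf-8"), start + size
```

The string fields still have to be copied and decoded either way, which fits the observation that only the integer fields get faster.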
A few ideas that came to mind as a result:
1. Organize the index according to categories, so we can read only the current category, as per Lex.
2. Write the index in such a way that we can read only as much of a category as is needed (for example, the latest 200 messages). The conceptually simple way is to write the index backwards (latest first); the practical way is probably more along the lines of holding, for each category, the offsets of its index entries.
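A rough sketch of the second idea, again in Python and under assumptions: the index is an append-only file of fixed-size entries, and a separate table (here a plain dict named category_offsets, a hypothetical name) records for each category where its entries sit in that file. Reading the latest N messages of a category then touches only N entries.

```python
import io
import struct

# Same hypothetical fixed-width entry as before: msgId, time, length, offset.
ENTRY = struct.Struct(">IIIQ")

def append_entry(index_file, category_offsets, category, entry):
    """Append one entry and remember its position under its category."""
    index_file.seek(0, io.SEEK_END)
    position = index_file.tell()
    index_file.write(ENTRY.pack(*entry))
    category_offsets.setdefault(category, []).append(position)

def read_latest(index_file, category_offsets, category, count):
    """Read at most `count` entries of one category, newest first."""
    if count <= 0:
        return []
    positions = category_offsets.get(category, [])[-count:]
    result = []
    for position in reversed(positions):
        index_file.seek(position)
        result.append(ENTRY.unpack(index_file.read(ENTRY.size)))
    return result
```

In a real file the per-category offset table would itself have to be stored and logged somewhere; this sketch keeps it in memory only.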
On second thought, the index serves two purposes. It keeps the msgID -> message file offsets mapping, which is pretty expensive to recalculate (it means scanning the message file). It also caches some useful header info for all messages, which could be calculated on demand instead. So another solution would be to eliminate the msgID, and keep the offsets directly in the categories file (so we would need to move the logging mechanism to it, for update performance and safety). This means that nothing more than the message and category files would be needed. The header cache function, should we find it necessary, could be done by a separate file keeping TOC entries for the last 1000 or so entries we've had to calculate. My guess is that for most activations of Celeste, just those 1000 entries would cover everything the user wants to see. If not, the data is reparsed from the message text.
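Here is a minimal sketch of that header cache, in Python and with invented names (HeaderCache, toc_entry). It assumes the categories file hands us (offset, length) pairs into the message file, keeps only the most recently used entries in memory, and reparses headers from the message text on a miss. Persisting the cache to its own file is left out.

```python
from collections import OrderedDict
from email.parser import BytesHeaderParser

class HeaderCache:
    """Bounded cache of TOC entries keyed by (offset, length) in the message file."""

    def __init__(self, message_file, capacity=1000):
        self.message_file = message_file
        self.capacity = capacity
        self.entries = OrderedDict()   # least recently used entry comes first
        self.misses = 0                # how often we had to reparse

    def toc_entry(self, offset, length):
        key = (offset, length)
        if key in self.entries:
            self.entries.move_to_end(key)      # mark as recently used
            return self.entries[key]
        # Cache miss: reparse the headers from the message text itself.
        self.misses += 1
        self.message_file.seek(offset)
        raw = self.message_file.read(length)
        headers = BytesHeaderParser().parsebytes(raw)
        entry = {name: headers.get(name, "")
                 for name in ("Subject", "From", "To", "Date")}
        self.entries[key] = entry
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # drop the least recently used
        return entry
```

With this shape, a category is just a list of (offset, length) pairs, and showing it means calling toc_entry for the messages actually on screen.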
Whew. I hadn't planned on all that when I started writing.
Poke holes, anybody?
Daniel
"Lex Spoon" lex@cc.gatech.edu wrote:
Address book handling would be really excellent. There should at least be the *option* of keeping the address book locally, so it shouldn't *require* LDAP. But LDAP is cool, too, if someone wants to put it together. One issue is what to do if there are multiple address approaches floating around....
My favorite feature, by the way, is to pick up addresses and names automatically as email is viewed, like the "Big Brother Database (BBDB)" does.
Along with address book handling, come a lot of, err, opportunities to improve the composition window. It's kinda lame sitting in a GUI system and being forced to edit messages as raw text files.
FWIW, though, even nicer than an address book would be a faster mail database. Loading the entire index into memory is pretty slow right now -- it should only load index entries for the category I'm looking at. If .unclassified. ceases being a pseudo-category, then so be it. Nowadays, compacting is very fast, and .unclassified. could simply be built when a compaction is run.
-Lex
"Stephan B. Wessels" stephan.wessels@sdrc.com wrote:
I wrote some address book code for Celeste, including a Netscape address book importer, but got distracted by paying work and had to suspend it. If anyone wants to help we could probably get at least that part working in a weekend.
- Steve
Mike Rutenberg wrote:
As far as I know, everyone is doing this manually right now. I certainly am. I want to change this though.
LDAP is something I know nothing about, but might be a good option especially if you have an existing (corporate?) address book.
An interface to an external address book is also a possibility.
There are some very interesting options to use the message index information as a fast, automatically collected database of "important" email addresses. This was done previously by JWZ's (?) Big Brother Database for emacs mail reading. I do this manually by using the "Participants Filter" to find the email address of my intended recipient. I tried some experiments with this last week but have not finished them.
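One way the automatic collection could look, sketched in Python. It assumes each index entry is available as a dict with "from" and "to" strings (a hypothetical shape, not Celeste's actual index entry), and it ranks addresses simply by how often they appear.

```python
from collections import Counter
from email.utils import getaddresses

def collect_addresses(index_entries):
    """Return (name, address, count) tuples, most frequently seen first."""
    counts = Counter()
    names = {}
    for entry in index_entries:
        fields = [entry.get("from", ""), entry.get("to", "")]
        for name, address in getaddresses(fields):
            if not address:
                continue                      # skip empty or unparsable fields
            key = address.lower()
            counts[key] += 1
            if name:
                names.setdefault(key, name)   # keep the first display name seen
    return [(names.get(address, ""), address, count)
            for address, count in counts.most_common()]
```

Frequency is only one possible measure of "important"; recency or whether I ever replied to the address would be others.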
Mike
squeak-dev@lists.squeakfoundation.org