Sunday, September 27, 2009

OBM Barcamp

We spent Friday afternoon & Saturday at a barcamp dedicated to OBM with some co-workers from Paris.

At the end of the first afternoon, I was pretty disappointed :
  • Lots of talking
  • No actions
My first impression was that I had lost an afternoon of MiniG bugfixing. The second day changed my opinion.

Mehdi & Guillaume merged most of the patches we maintained for our main production setup.

Sylvain worked with Erwan to figure out the best way to make LemonLDAP + OBM + MiniG work out of the box.

David & Vincent made some improvements to the 2.3 calendar.

Tony worked on obm-automation improvements to allow running update.pl at login time for auto-provisioning of mailboxes & LDAP entries.

I worked with Michel & Guillaume to figure out how to handle really big mailboxes in MiniG. Michel's mailbox is a 20GB one. To give you an idea, our Cyrus setup in Toulouse is only 35GB for 44 mailboxes & lots of mailshares.

Michel's mailbox was a very interesting test case. It triggered a "too many open files" error pretty rapidly. We identified a very old bug : the class that reads the MiniG configuration files left a file descriptor open.
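For illustration, here is a minimal sketch of a configuration reader that cannot leak its descriptor. It is not the actual MiniG class: the name ConfigLoader is hypothetical, and it assumes the configuration is a Java properties file read on a Java version that has try-with-resources (older code would need an explicit finally block instead).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical loader, shown only to illustrate the fix; not the MiniG code.
final class ConfigLoader {
    static Properties load(Path file) throws IOException {
        Properties props = new Properties();
        // try-with-resources closes the stream (and its file descriptor)
        // on every path out of the block, including when load() throws.
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in);
        }
        return props;
    }
}
```

With this shape, reading the configuration thousands of times keeps the number of open descriptors constant.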

This jumbo mailbox has other properties : it is big enough to trigger bugs that only occurred on our Toulouse production setup after more than 24h. As we are still unsure whether the deadlock timeout that occurs on some JDBC transactions is Derby's fault or MiniG's fault, I just committed the switch from an internal Derby database to an external PostgreSQL instance. This change will tell us one thing: whether the bug was in Derby.

Some easier to fix bugs were also identified :
  • Advanced search mishandled searches on subject with two keywords
  • Tony helped me fix the filter that was used for autocomplete when ldapContacts.pl is active
Some decisions were taken:
  • We confirmed that OBM feature & ui freeze will be effective on Oct 15th.
  • Contact search in OBM & MiniG will move from SQL queries to a SOLR core (we're not sure yet if it will meet the 2.3 freeze date). We will also try to do the same for events.
  • We are going to integrate a real LemonLDAP SSO provider in MiniG, one that will not require the OBM SSO subsystem.

If I consider the complete barcamp session, it was needed & productive.

Wednesday, September 23, 2009

Spinner of death (tm) Improvement

Sometimes things can go wrong in MiniG, even for legitimate reasons :

  • You suspended your laptop and your minig session is expired when your browser comes back to life

  • Your beloved administrator restarted a tomcat server without notifying anybody

  • Your network was down more than 3 minutes (MiniG frontend session duration)



This case will be handled in one of the next releases.



Yes, the texts are temporary and TartifletteMode is just what an expired session is called in the code.

Thursday, September 17, 2009

(Temporary) Epic Fail

Changing MiniG mail grouping is more tricky than expected.

The previous algorithm worked like this :
- read the subject
- compute its root, "Fwd: Re: Hello sent the 2009-08-07" becomes "hello sent the xxxx-xx-xx"
- simply use an equality comparison on the subject roots.
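The root computation above can be sketched roughly as follows. This is an assumption-laden reconstruction rather than the original MiniG code: the class and method names are made up, and the exact list of stripped prefixes (Re:, Fwd:, Fw:) is a guess based on the example.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper; names and prefix list are assumptions.
final class SubjectRoot {
    // One or more leading reply/forward markers: "Re:", "Fwd:" or "Fw:".
    private static final Pattern PREFIXES =
        Pattern.compile("^(?:\\s*(?:re|fwd?)\\s*:)+\\s*", Pattern.CASE_INSENSITIVE);

    static String rootOf(String subject) {
        String stripped = PREFIXES.matcher(subject).replaceFirst("");
        // Lower-case, then mask every digit so that dates or counters
        // in the subject do not split one conversation into several.
        return stripped.toLowerCase(Locale.ROOT).replaceAll("[0-9]", "x").trim();
    }
}
```

Two messages then land in the same group when rootOf() returns equal strings for their subjects, which is also why unrelated mails that differ only by a number end up grouped together.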

The first version with real threads used a hacked version of subject grouping. It was just a hack on the "equality comparison": instead of comparing subject roots, it used lists of Message-ID headers and compared the differences between the two lists. It worked. For mailboxes < 1000 messages. Sylvain's 100k-mail Junk folder took more than 1 hour to process. We needed a linear algorithm to do the thread grouping.

The new algorithm works on other headers : Message-ID and In-Reply-To.

It works this way. We maintain a ThreadRoot list which holds the known conversations together with the Message-IDs they contain.

When we detect that messages were added to, changed in or deleted from a folder, we first process removals. When a ThreadRoot has no more messages, we mark it as dead and flag it for removal.

Then we process updates, this is the tricky part.

We have 2 lists :
- List<ThreadRoot>, all the known thread roots not flagged as dead.
- a List<RawMessage> called leafCandidates where we have all the not yet processed messages.

We then run :


    int unmerged = 0;
    int merged = -1;
    // keep merging as long as a pass shrinks the candidate list
    while (merged < unmerged) {
        unmerged = leafCandidates.size();
        doMerge();
        merged = leafCandidates.size();
    }


Then doMerge() does all the job :


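    for (RawMessage r : leafCandidates) {
        // rootIds: lookup from a known message-id to its ThreadRoot
        ThreadRoot threadRoot = rootIds.get(r.messageId);
        if (threadRoot != null) {
            // message already known, only its flags may have changed
            flagUpdate(threadRoot, r);
        } else if (r.inReplyTo == null) {
            // not a reply, so it starts a new conversation
            createThreadRoot(r);
        } else {
            ThreadRoot tr = rootIds.get(r.inReplyTo);
            if (tr != null) {
                tr.merge(r);
            } else {
                // corner cases, most problems are not here
                // example : a mail with an In-Reply-To whose parent mail is deleted
            }
        }
    }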


This algorithm is mostly linear... Its real complexity depends on X, Y and Z, where X is how much new mail you receive, Y is how many replies to existing emails you receive, and Z is how often the user changes flags on existing emails. It can easily process Sylvain's 100k spams in 20 seconds, so let's assume the complexity is OK.
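To make the merge loop concrete, here is a small self-contained sketch of the same idea. It is a simplified model under stated assumptions, not the MiniG implementation: RawMessage and ThreadRoot are reduced to the two headers that matter, rootIds is assumed to be a HashMap from Message-ID to conversation, processed candidates are assumed to be removed from the list, and flag updates are left out entirely.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Simplified model of the thread grouping; all types here are stand-ins.
final class ThreadGrouper {
    static final class RawMessage {
        final String messageId;
        final String inReplyTo; // null when the mail is not a reply
        RawMessage(String messageId, String inReplyTo) {
            this.messageId = messageId;
            this.inReplyTo = inReplyTo;
        }
    }

    static final class ThreadRoot {
        final List<String> messageIds = new ArrayList<>();
    }

    // Message-ID -> conversation holding it; hash lookups keep one pass linear.
    final Map<String, ThreadRoot> rootIds = new HashMap<>();
    final List<ThreadRoot> roots = new ArrayList<>();

    /** Runs merge passes until one no longer shrinks the list; returns the leftovers. */
    List<RawMessage> group(List<RawMessage> incoming) {
        List<RawMessage> leafCandidates = new ArrayList<>(incoming);
        int before;
        do {
            before = leafCandidates.size();
            doMerge(leafCandidates);
        } while (leafCandidates.size() < before);
        return leafCandidates; // replies whose parent is unknown or deleted
    }

    private void doMerge(List<RawMessage> leafCandidates) {
        for (Iterator<RawMessage> it = leafCandidates.iterator(); it.hasNext();) {
            RawMessage r = it.next();
            if (rootIds.containsKey(r.messageId)) {
                it.remove(); // already threaded; flag handling is not modelled here
            } else if (r.inReplyTo == null) {
                ThreadRoot tr = new ThreadRoot(); // starts a new conversation
                roots.add(tr);
                attach(tr, r);
                it.remove();
            } else {
                ThreadRoot tr = rootIds.get(r.inReplyTo);
                if (tr != null) {
                    attach(tr, r);
                    it.remove();
                }
                // otherwise the parent is not known yet: retry on the next pass
            }
        }
    }

    private void attach(ThreadRoot tr, RawMessage r) {
        tr.messageIds.add(r.messageId);
        rootIds.put(r.messageId, tr);
    }
}
```

In this sketch a reply that shows up before its parent costs one extra pass, so a long chain delivered in reverse order is the slow case; mail that arrives roughly in order is handled in one or two passes.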

Most MiniG problems on our trunk version are in the "flagUpdate(threadRoot, r)" process. This creates all kinds of strange bugs : conversations that still look read, mark as read/unread that only works "sometimes".

Flags are still broken, but the new algorithm is pretty promising. The load on our test & production mail server is lighter, really lighter. Some tuning of the last corner cases will make it a very good change. In fact the new code, while still buggy, is so fast that it exposed race conditions in the indexing code ;-)

Sunday, September 13, 2009

Long time no blog

Massive changes are coming to MiniG. Real thread sorting, per-message actions, etc.

We added a second minig installation to our test setup. We now have a MiniG setup running on our production mail server, and another one running on a lenny 64bit kvm. Both are using our production mail server.

The one running in kvm allows a first round of testing on live mailboxes. Performance testing is easier on this one, as the IMAP load and the Java load are cleanly split across two servers.

With this new setup I already identified an easy optimisation : the IMAP SELECT command forces Cyrus to write a log line. Remembering the currently selected IMAP folder in the org.minig.imap lib, so that a redundant SELECT is skipped, might provide noticeable speed improvements to initial mailbox indexing.
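A rough sketch of that optimisation, assuming nothing about the real org.minig.imap API: ImapCommands below is a hypothetical stand-in for whatever object actually sends commands to the server, and SelectCache is an invented wrapper name.

```java
// Hypothetical wrapper; not part of org.minig.imap.
final class SelectCache {
    /** Stand-in for the component that really talks to the IMAP server. */
    interface ImapCommands {
        void select(String folder);
    }

    private final ImapCommands delegate;
    private String selected; // null until the first SELECT on this connection

    SelectCache(ImapCommands delegate) {
        this.delegate = delegate;
    }

    /** Sends SELECT only when the requested folder is not the current one. */
    void select(String folder) {
        if (folder.equals(selected)) {
            return; // already selected: no round trip, no server-side log line
        }
        delegate.select(folder);
        selected = folder;
    }

    /** To be called when the connection is closed or re-established. */
    void reset() {
        selected = null;
    }
}
```

One thing to keep in mind with this design: a real SELECT also refreshes the mailbox state the server reports, so code that relied on re-selecting to notice new mail would need another way to poll, such as NOOP.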

Tuesday, September 08, 2009

Big minig bug hunt

I'm focused on chasing minig bugs.

We clearly improved our testing flow. Releases are tested every day by ff3 users, ff3.5 users and chromium users (Sylvain & I dropped firefox now that 64bit builds are available). IE8 is often tested on our TSE server. I still need to find an easy way to do IE6 testing.


What's coming into minig :

- real thread grouping. User testing showed that "subject+ignore numbers" is only manageable for technical people.

- composer improvements (I'm fighting with GWT RichTextArea, and learning things I didn't want to know about browser differences).

- per-email actions. Delete one, Star one, Print one. Add sender to contacts, etc.

- next/previous buttons. I don't read my email like that, but a lot of people seem to think that when you read a conversation, next has a meaning.

- folder management improvements. Rename & move.

And lots of needed bugfixes......