
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r23936
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-10-26 17:39:24


On Oct 26, 2010, at 1:27 PM, Ralph Castain wrote:

> I think we can do the old libevent for now as the trunk doesn't exploit the new 2.0 features yet (though I have some implemented in a branch that is now on hold). However, if we can fix shared memory quickly (and Sam appears to have something that works, though isn't fully verified yet), and can resolve the performance question quickly, I would MUCH rather not waste my time on retrofitting 1.4!

Sorry I had to drop off the call today (tornados in my area!).

After digging around in the new libevent a bit, I found the problem -- it's exactly what I said in my first mail: libevent called poll() with an infinite timeout. I talked with Brian and we're pretty sure we have the right solution. I committed it in r23957.
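For readers unfamiliar with the failure mode: poll() takes a timeout in milliseconds, where -1 means "wait indefinitely" and 0 means "return immediately". A progress loop that has to keep cycling through other transports (e.g., shared memory) can't afford to go to sleep inside poll(). The sketch below is a generic illustration with hypothetical helpers `progress_once()` and `poll_demo()`. It is not the r23957 change itself, which isn't shown here.

```c
#include <assert.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical progress-engine helper (not Open MPI or libevent code).
 * Checks one descriptor for readability and must never block, so it
 * refuses an infinite timeout: poll() with timeout -1 waits until an
 * event arrives, which stalls any caller that expects to return and
 * service other transports in the meantime. */
static int progress_once(int fd, int timeout_ms)
{
    assert(timeout_ms >= 0); /* -1 would mean "block forever" */

    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int n = poll(&pfd, 1, timeout_ms);
    if (n < 0) {
        perror("poll");
        return -1;
    }
    return n; /* number of ready descriptors: 0 or 1 here */
}

/* Runs the check on an empty pipe, then again after one byte is
 * written. Returns 0 on success and stores both poll() results. */
static int poll_demo(int *idle, int *ready)
{
    int fds[2];
    if (pipe(fds) != 0) {
        perror("pipe");
        return -1;
    }

    *idle = progress_once(fds[0], 0);   /* no data yet: returns 0 at once */

    if (write(fds[1], "x", 1) != 1) {
        perror("write");
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    *ready = progress_once(fds[0], 0);  /* data pending: returns 1 */

    close(fds[0]);
    close(fds[1]);
    return 0;
}
```

In a debug build, calling `progress_once(fd, -1)` trips the assertion instead of hanging, which makes the "never block here" contract explicit at the call site.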

Ralph committed a performance fix in r23956 (i.e., disable libevent's threading support -- we need to evaluate what this means for MPI_THREAD_MULTIPLE). Testing shows that this puts us back in the right performance ballpark; attached are 2 graphs of NetPIPE that I ran on 2 wolfdale-class machines at Cisco. I ran with the trunk HEAD (after the libevent fix commits from today) and with a commit from before all the libevent upgrades.

*** Confirmation of this data from another site would be greatly appreciated.

In short, the graphs show:

- TCP BTL performance over gigE and IPoIB is the same (between the 2 machines)
- SM BTL performance is a skosh lower in the new libevent (on 1 machine)

Note that these were DEBUG builds -- optimized builds would be a little better (particularly in SM latency). Ralph and I discussed a performance tweak that he's going to implement tonight. We think/hope it will put the SM latency/bandwidth right back where it was before the upgrade -- i.e., we think it'll erase the small performance difference.
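On the r23956 change that disables libevent's threading support: roughly speaking, libevent 2.0 only allocates its internal locks when a threading implementation has been registered (e.g., via evthread_use_pthreads()), and each lock/unlock site is skipped when the lock is NULL. The C sketch below imitates that pattern with made-up names (`event_base_sketch`, `loop_once()`, `run_loop()`). It is not libevent's code, and this mail doesn't say exactly how r23956 turns threading off. It shows both sides of the trade-off: the unlocked path makes no lock calls per loop iteration, but it also leaves the base unprotected if several threads drive it, which is why MPI_THREAD_MULTIPLE needs a closer look.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lock hooks; names are illustrative, not libevent's API. */
typedef void (*lock_fn)(void *lock);

struct event_base_sketch {
    void   *lock;        /* NULL when threading support is disabled */
    lock_fn do_lock;
    lock_fn do_unlock;
    long    iterations;  /* shared state the lock would protect */
};

static long lock_calls; /* counts lock + unlock invocations */

static void counting_lock(void *lock)   { (void)lock; lock_calls++; }
static void counting_unlock(void *lock) { (void)lock; lock_calls++; }

/* One pass of the event loop: take the lock only if one exists.
 * With threading disabled this is a plain increment, but nothing
 * protects the base if two threads run the loop concurrently. */
static void loop_once(struct event_base_sketch *base)
{
    assert(base->lock == NULL || (base->do_lock && base->do_unlock));

    if (base->lock != NULL)
        base->do_lock(base->lock);
    base->iterations++;
    if (base->lock != NULL)
        base->do_unlock(base->lock);
}

/* Runs n loop passes and returns how many lock/unlock calls were made. */
static long run_loop(struct event_base_sketch *base, int n)
{
    lock_calls = 0;
    for (int i = 0; i < n; i++)
        loop_once(base);
    return lock_calls;
}
```

With a lock registered, 1000 passes cost 2000 lock/unlock calls; without one, they cost none. That per-iteration overhead is the kind of cost that shows up in small-message latency.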

-----

As such, given that everything *seems* to be working properly, and *seems* to be back at the old performance level, I personally don't think it's worth it to do a libevent component of the old version. I had thought it would be an easy component to do, but apparently it's not (i.e., it would be 2-3 days' worth of work, which doesn't seem worth it to me). I think our time would be better spent tuning up the new libevent properly.

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/