
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] matching code rewrite in OB1
From: Gleb Natapov (glebn_at_[hidden])
Date: 2007-12-12 15:20:49


On Wed, Dec 12, 2007 at 11:57:11AM -0500, Jeff Squyres wrote:
> Gleb --
>
> How about making a tarball with this patch in it that can be thrown at
> everyone's MTT? (we can put the tarball on www.open-mpi.org somewhere)
I don't have access to www.open-mpi.org, but I can send you the patch.
I can send you a tarball too, but I'd prefer not to abuse email.

>
>
> On Dec 11, 2007, at 4:14 PM, Richard Graham wrote:
>
> > I will reiterate my concern. The code that is there now is mostly nine
> > years old (with some mods made when it was brought over to Open MPI). It
> > took about 2 months of testing on systems with 5-13 way network
> > parallelism to track down all KNOWN race conditions. This code is at the
> > center of MPI correctness, so I am VERY concerned about changing it w/o
> > some very strong reasons. Not opposed, just very cautious.
> >
> > Rich
> >
> >
> > On 12/11/07 11:47 AM, "Gleb Natapov" <glebn_at_[hidden]> wrote:
> >
> >> On Tue, Dec 11, 2007 at 08:36:42AM -0800, Andrew Friedley wrote:
> >>> Possibly, though I have results from a benchmark I've written
> >>> indicating the reordering happens at the sender. I believe I found it
> >>> was due to the QP striping trick I use to get more bandwidth -- if you
> >>> back down to one QP (there's a define in the code you can change), the
> >>> reordering rate drops.
> >> Ah, OK. My assumption came just from looking at the code, so I may be
> >> wrong.
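
(Schematically, the striping described above is something like the sketch
below -- made-up names and QP count, not the actual ofud code. The point is
that ordering is only guaranteed per QP, so two fragments with consecutive
sequence numbers striped onto different QPs can arrive in either order, which
would also explain why backing down to one QP drops the reordering rate.)

#include <stdint.h>

#define NUM_QPS 4                      /* made-up value; the real count is a
                                        * #define in the UD BTL              */

struct frag {
    uint32_t seq;                      /* sequence number assigned in send
                                        * order                              */
    /* ... send descriptors ... */
};

static unsigned next_qp = 0;

/* Pick the QP for this fragment round-robin to aggregate bandwidth.  Each QP
 * is an independent ordered channel, so fragments with consecutive sequence
 * numbers that land on different QPs may complete, and be received, in
 * either order. */
static unsigned stripe_fragment(const struct frag *f)
{
    unsigned qp = next_qp;
    next_qp = (next_qp + 1) % NUM_QPS;
    (void)f;                           /* the real code posts a send here    */
    return qp;
}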
> >>
> >>>
> >>> Also I do not make any recursive calls to progress -- at least not
> >>> directly in the BTL; I can't speak for the upper layers. The reason I
> >>> do many completions at once is that it is a big help in turning around
> >>> receive buffers, making it harder to run out of buffers and drop
> >>> frags. I want to say there was some performance benefit as well but I
> >>> can't say for sure.
> >> Currently the upper layers of Open MPI may call the BTL progress
> >> function recursively. I hope this will change some day.
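
(The batching itself is clear enough; schematically it is just something
like the sketch below -- hypothetical handler names, not the actual btl_ofud
source.)

#include <infiniband/verbs.h>

#define NUM_WC 500                      /* the MCA_BTL_UD_NUM_WC value above */

/* Hypothetical stand-ins for the real BTL callbacks. */
static void handle_recv(struct ibv_wc *wc)             { (void)wc; }
static void repost_recv_buffer(struct ibv_wc *wc)      { (void)wc; }
static void handle_send_completion(struct ibv_wc *wc)  { (void)wc; }

/* Poll up to NUM_WC completions in one call and process them one by one,
 * reposting receive buffers as we go so they are not exhausted. */
static void progress_once(struct ibv_cq *cq)
{
    struct ibv_wc wc[NUM_WC];
    int n = ibv_poll_cq(cq, NUM_WC, wc);

    for (int i = 0; i < n; i++) {
        if (wc[i].status != IBV_WC_SUCCESS)
            continue;                   /* error path omitted in this sketch */
        if (wc[i].opcode & IBV_WC_RECV) {
            handle_recv(&wc[i]);        /* hand the fragment up              */
            repost_recv_buffer(&wc[i]); /* recycle the receive buffer        */
        } else {
            handle_send_completion(&wc[i]);
        }
    }
}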
> >>
> >>>
> >>> Andrew
> >>>
> >>> Gleb Natapov wrote:
> >>>> On Tue, Dec 11, 2007 at 08:03:52AM -0800, Andrew Friedley wrote:
> >>>>> Try UD, frags are reordered at a very high rate so should be a
> >>>>> good test.
> >>>> Good idea, I'll try this. BTW I think the reason for such a high rate
> >>>> of reordering in UD is that it polls for MCA_BTL_UD_NUM_WC completions
> >>>> (500) and processes them one by one, and if the progress function is
> >>>> called recursively the next 500 completions will be reordered versus
> >>>> the previous completions (reordering happens on the receiver, not the
> >>>> sender).
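
(In other words, the hazard is roughly the following -- pure illustration
with made-up helpers, not actual Open MPI code.)

struct completion { unsigned seq; };

/* Hypothetical stubs; the real code polls the CQ and hands frags upward. */
static int poll_completions(struct completion *wc, int max)
{
    (void)wc; (void)max;
    return 0;
}

static void hand_to_pml(const struct completion *c)
{
    (void)c;
}

static void progress(void)
{
    struct completion wc[500];
    int n = poll_completions(wc, 500);  /* batch k, in arrival order         */

    for (int i = 0; i < n; i++) {
        hand_to_pml(&wc[i]);            /* if anything on this path re-enters
                                         * progress(), batch k+1 is delivered
                                         * in full before wc[i+1..n-1] of
                                         * batch k -- which is exactly the
                                         * receive-side reordering described
                                         * above                             */
    }
}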
> >>>>
> >>>>> Andrew
> >>>>>
> >>>>> Richard Graham wrote:
> >>>>>> Gleb,
> >>>>>> I would suggest that, before this is checked in, this be tested on
> >>>>>> a system that has N-way network parallelism, where N is as large as
> >>>>>> you can find. This is a key bit of code for MPI correctness, and
> >>>>>> out-of-order operations will break it, so you want to maximize the
> >>>>>> chance for such operations.
> >>>>>>
> >>>>>> Rich
> >>>>>>
> >>>>>>
> >>>>>> On 12/11/07 10:54 AM, "Gleb Natapov" <glebn_at_[hidden]> wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I did a rewrite of the matching code in OB1. I made it much
> >>>>>>> simpler and 2 times smaller (which is good: less code, less bugs).
> >>>>>>> I also got rid of the huge macros -- very helpful if you need to
> >>>>>>> debug something. There is no performance degradation; actually I
> >>>>>>> even see a very small performance improvement. I ran MTT with this
> >>>>>>> patch and the results are the same as on trunk. I would like to
> >>>>>>> commit this to the trunk. The patch is attached for everybody to
> >>>>>>> try.
> >>>>>>>
> >>>>>>> --
> >>>>>>> Gleb.
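
(For anyone curious what the matching code actually has to do: at its heart
it is per-peer sequence numbers plus an out-of-order list, very roughly as in
the sketch below. This is illustrative only -- the real OB1 code also matches
tag/communicator against the posted and wildcard receive queues, handles
sequence wraparound, and so on.)

#include <stdint.h>
#include <stddef.h>

struct frag {
    uint16_t     seq;          /* per-peer sequence number set by the sender */
    struct frag *next;
};

struct peer {
    uint16_t     expected_seq; /* next sequence number allowed to match      */
    struct frag *ooo_list;     /* fragments that arrived ahead of their turn */
};

/* Hypothetical stand-in for matching one fragment against the posted and
 * unexpected receive queues. */
static void match_one(struct peer *p, struct frag *f) { (void)p; (void)f; }

/* Called for every arriving fragment from this peer. */
static void handle_arrival(struct peer *p, struct frag *f)
{
    if (f->seq != p->expected_seq) {   /* too early: park it and wait        */
        f->next     = p->ooo_list;
        p->ooo_list = f;
        return;
    }

    match_one(p, f);                   /* in order: match it now             */
    p->expected_seq++;                 /* (wraparound ignored in this sketch) */

    /* The gap may now be filled: keep pulling parked fragments that have
     * become matchable, in sequence order. */
    for (int progressed = 1; progressed; ) {
        progressed = 0;
        for (struct frag **fp = &p->ooo_list; *fp != NULL; fp = &(*fp)->next) {
            if ((*fp)->seq == p->expected_seq) {
                struct frag *m = *fp;
                *fp = m->next;         /* unlink                             */
                match_one(p, m);
                p->expected_seq++;
                progressed = 1;
                break;                 /* list changed: restart the scan     */
            }
        }
    }
}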
> >>>>
> >>>> --
> >>>> Gleb.
> >>
> >> --
> >> Gleb.
> >
>
>
> --
> Jeff Squyres
> Cisco Systems

--
			Gleb.