Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Persistent Communication using MPI_SEND_INIT, MPI_RECV_INIT etc.
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2013-03-26 21:08:42


On Mar 25, 2013, at 10:21 PM, Timothy Stitt <Timothy.Stitt.9_at_[hidden]> wrote:

> I've inherited an MPI code that was written ~8-10 years ago

Always a fun situation to be in. :-)

> and it predominately uses MPI persistent communication routines for data transfers e.g. MPI_SEND_INIT, MPI_RECV_INIT, MPI_START etc. I was just wondering if using persistent communication calls is still the most efficient/scalable way to perform communication when the communication pattern is known and fixed amongst neighborhood processes? We regularly run the code across an IB network so would there be a benefit to rewrite the code using another approach (e.g. MPI one-sided communication)?

The answer is: it depends. :-)

Persistent is not a bad choice. It separates one-time setup from the actual communication, and OMPI does actually optimize that reasonably well. Hence, when you MPI_START a request, it's a pretty short hop until you get down to the actual network stack and start sending messages (not that normal MPI_SEND is expensive, mind you...).

That being said, there are *many* factors that contribute to overall MPI performance. Persistent vs. non-persistent communication is one; buffer re-use for large messages is another (especially over OpenFabrics networks, which you have/use, IIRC); pre-posting receives is yet another; and so on.

Which of these matters most depends on your communication pattern, how much registered memory you have available (*** be sure to see http://www.open-mpi.org/faq/?category=openfabrics#ib-low-reg-mem -- even if you're not [yet] seeing those warnings ***), the network distance between your processes, etc.

I'm not a huge fan of MPI one-sided, but there are definitely applications which fit its model quite well. And I'm told by people that I trust to understand that stuff much more deeply than me that the new MPI-3 one-sided stuff is *good* -- albeit complex. Before trying to use that stuff, be sure to read the MPI-3 one-sided chapter (not MPI-2.2 -- one-sided got revamped in MPI-3) and really understand it. Open MPI's MPI-3 one-sided implementation is not yet complete, but Brian is actively working on it.

There are also the neighborhood collectives that were introduced in MPI-3, which may or may not help you. We don't have these ready yet, either. I believe MPICH may have implementations of the neighborhood collectives; I don't know if they've had time to optimize them yet or not (you should ask them) -- i.e., I don't know if they're more optimized than a bunch of MPI_SEND_INIT calls at the beginning of time followed by periodic calls to MPI_STARTALL. YMMV -- you'll probably need to do some benchmarking to figure out what's best for your application.

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/