
Subject: Re: [OMPI users] Progress of the asynchronous messages
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-11-06 13:57:01


Have a look at the FAQ; we discuss quite a few of these kinds of issues:

- http://www.open-mpi.org/faq/?category=tuning
- http://www.open-mpi.org/faq/?category=openfabrics

More specifically, what Eugene is saying is correct -- OMPI has made
tradeoffs for various, complicated reasons. One of the things that we
sacrificed in the common case was communication/computation overlap on
OpenFabrics networks.

If you want good overlap, set the MCA parameter mpi_leave_pinned to 1
(on OpenFabrics networks). This will effectively move the bulk of the
message passing progress (but not all of it) down to the hardware.
Hence, when you sleep/do real computations while looping over
MPI_TEST, the message is probably actually being progressed in the
background. You won't see this kind of overlap with other transports
such as shared memory or TCP because we don't have hardware assist in
these cases.
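
As a rough sketch (untested; the buffer size and the computation routine
are just placeholders), the pattern looks something like this in C:

#include <mpi.h>
#include <stdlib.h>

/* placeholder for the application's real computation */
static void do_some_work(void) { }

int main(int argc, char **argv)
{
    int rank, done = 0;
    const int count = 1 << 20;            /* 1M doubles; size is illustrative */
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    double *buf = (double *) malloc(count * sizeof(double));

    if (rank == 0) {
        MPI_Isend(buf, count, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
        while (!done) {
            do_some_work();               /* compute while the send is in flight */
            MPI_Test(&req, &done, MPI_STATUS_IGNORE);  /* gives OMPI a chance to progress */
        }
    } else if (rank == 1) {
        MPI_Recv(buf, count, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}

To try the overlap on an OpenFabrics network, set the MCA parameter on the
command line, e.g. "mpirun --mca mpi_leave_pinned 1 -np 2 ./overlap", and
compare against a run without it.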

On Nov 6, 2008, at 12:52 PM, Eugene Loh wrote:

> vladimir marjanovic wrote:
>>
>> In order to overlap communication and computation I don't want to
>> use MPI_Wait.
> Right. One thing to keep in mind is that there are two ways of
> overlapping communication and computation. One is you start a send
> (MPI_Isend), you do a bunch of computation while the message is
> being sent, and then after the message has been sent you call
> MPI_Wait just to clean up. This assumes that the MPI implementation
> can send a message while control of the program has been returned to
> you. The experts can give you the fine print, but my simple
> assertion is, "This doesn't usually happen."
>
> Rather, the MPI implementation typically will send data only when
> your code is in some MPI call. That's why you have to call MPI_Test
> periodically... or some other MPI function.
>> For sure the message is being decomposed into chunks, and the chunk
>> size is probably defined by an environment variable.
>> Maybe you know how I can control the chunk size?
> I don't. Try running "ompi_info -a" and looking through the
> parameters. For the shared-memory BTL, it's
> mca_btl_sm_max_frag_size. I also see something like
> coll_sm_fragment_size. Maybe look at the parameters that have
> "btl_openib_max" in their names.
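
One more pointer on the fragment-size parameters mentioned above: the exact
names vary between Open MPI releases, so rather than guessing, something
like the following should show what is actually available in your build:

    ompi_info --param btl sm | grep -i frag
    ompi_info --param btl openib | grep -i frag

Treat that output, not the names quoted above, as the list of tunables for
your version.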

-- 
Jeff Squyres
Cisco Systems