Open MPI User's Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI users] Progress of the asynchronous messages
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-11-06 14:00:07

George is right -- you *can* do this, but it is *not advised* (you'll
likely run out of memory or other resources pretty quickly -- if you
can run at all!). :-)

Try mpi_leave_pinned, and check out those FAQ sections that I sent,
particularly the OpenFabrics section, for how to specifically tune
various behaviors of the openib BTL.

On Nov 6, 2008, at 1:52 PM, George Bosilca wrote:

> In order to get good performance out of your test application, the
> whole message has to be sent in just one fragment. The reason is
> that as long as there is no progress thread for the MPI library
> (internal to the library), there is no way to make progress
> while your code is outside of MPI calls.
> Now, I can explain how to do this, but trust me, this is an ugly
> hack that makes your application MPI-implementation specific, i.e.
> not portable in terms of performance. But I guess this decision is
> up to you. The really bad thing that might happen is that, if
> the receiver is slower than the sender, you will buffer all
> these eager messages in the receiver's memory (what a
> waste), you will use a lot more memory copies, and you give up the
> possibility of using the RMA features available on your network. So
> yes, your specific code may eventually run faster, but the
> price to pay is way too expensive [from my perspective].
> Here is how you can do this: Based on the network you use (openib
> in this case), the parameter selecting the first fragment size is
> called *_eager_limit. Run "ompi_info --param btl openib" and grep for
> eager_limit to figure out the name of the parameter, then set it using
> "--mca <name> <value>" to the value that you want. As an example, I
> think this will work for openib: "--mca btl_openib_eager_limit
> 8388648" (8388608 + 40 for internal headers).
> george.
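
The arithmetic behind George's example value can be sketched as follows. This is only an illustration: the 40-byte header figure is the one George quotes for openib and may differ between Open MPI versions, and the helper names `eager_limit_for` and `mca_args` are hypothetical, not part of any Open MPI tool.

```python
from typing import List

# Header overhead quoted in the thread for openib (assumption: it may
# differ in other Open MPI versions or for other BTLs).
EAGER_HEADER_BYTES = 40


def eager_limit_for(payload_bytes: int,
                    header_bytes: int = EAGER_HEADER_BYTES) -> int:
    """Eager limit needed for payload_bytes to fit in a single fragment."""
    if payload_bytes < 0:
        raise ValueError("payload_bytes must be non-negative")
    return payload_bytes + header_bytes


def mca_args(param: str, value: int) -> List[str]:
    """Build the '--mca <name> <value>' arguments for the mpirun command line."""
    return ["--mca", param, str(value)]


# An 8 MiB message, as in George's example.
limit = eager_limit_for(8 * 1024 * 1024)
print(" ".join(mca_args("btl_openib_eager_limit", limit)))
# prints: --mca btl_openib_eager_limit 8388648
```

The printed arguments would be appended to the mpirun command line. All of George's caveats about receiver-side buffering and extra memory copies still apply.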
> On Nov 6, 2008, at 12:52 PM, Eugene Loh wrote:
>> vladimir marjanovic wrote:
>>> In order to overlap communication and computation I don't want to
>>> use MPI_Wait.
>> Right. One thing to keep in mind is that there are two ways of
>> overlapping communication and computation. One is you start a send
>> (MPI_Isend), you do a bunch of computation while the message is
>> being sent, and then after the message has been sent you call
>> MPI_Wait just to clean up. This assumes that the MPI
>> implementation can send a message while control of the program has
>> been returned to you. The experts can give you the fine print, but
>> my simple assertion is, "This doesn't usually happen."
>> Rather, the MPI implementation typically will send data only when
>> your code is in some MPI call. That's why you have to call
>> MPI_Test periodically... or some other MPI function.
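
The polling pattern Eugene describes can be sketched roughly as below. The function name `compute_with_polling` and its parameters are hypothetical; `request_done` stands in for any nonblocking completion check such as MPI_Test, so the sketch does not depend on a particular MPI binding.

```python
from typing import Callable, Iterable


def compute_with_polling(
    request_done: Callable[[], bool],
    work_chunks: Iterable[Callable[[], None]],
) -> bool:
    """Run the computation in small chunks, polling the request in between.

    Each call to request_done re-enters the MPI library briefly, which
    gives it a chance to push the pending message forward.
    Returns True if the request completed before the work ran out.
    """
    done = False
    for chunk in work_chunks:
        chunk()  # one slice of computation
        if not done:
            done = request_done()  # stop polling once it has completed
    return done
```

With mpi4py (an assumption; the thread does not mention it), this could be used as `req = comm.Isend(buf, dest=1)` followed by `compute_with_polling(req.Test, chunks)`, with a final `req.Wait()` if it returns False. How finely to slice the computation is a tuning choice: polling too rarely stalls the transfer, polling too often costs compute time.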
>>> For sure the message is being decomposed into chunks and the size
>>> of a chunk is probably defined by an environment variable.
>>> Do you know how I can control the chunk size?
>> I don't. Try running "ompi_info -a" and looking through the
>> parameters. For the shared-memory BTL, it's
>> mca_btl_sm_max_frag_size. I also see something like
>> coll_sm_fragment_size. Maybe look at the parameters that have
>> "btl_openib_max" in their names.
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]

Jeff Squyres
Cisco Systems