
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] hcoll destruction via MPI attribute
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2014-01-10 09:55:44


On Jan 10, 2014, at 9:49 AM, George Bosilca <bosilca_at_[hidden]> wrote:

> As I said, this is the case today. There are ongoing discussions in the MPI Forum to relax the wording of MPI_Comm_free, as most MPI implementations do not rely on the strict “collective” behavior of MPI_Comm_free (in the sense that it has to be called by all processes, but not necessarily at the same time).

That will be an interesting discussion. I look forward to your proposal. :-)

>> I still agree with this point, though — even though COMM_FREE is collective, you could still get into ordering / deadlock issues if you're (effectively) doing communication inside it.
>
> As long as the call is collective and the same attributes exist on all communicators, I don’t see how a deadlock is possible. My wording was more a precaution for the future than a restriction for today.

Here's an example:

-----
MPI_Comm comm;
// comm is set up as an hcoll-enabled communicator
if (rank == x) {
    MPI_Send(..., y, tag, MPI_COMM_WORLD);
    MPI_Comm_free(&comm);
} else if (rank == y) {
    MPI_Comm_free(&comm);
    MPI_Recv(..., x, tag, MPI_COMM_WORLD);
}
------

If the hcoll teardown inside COMM_FREE blocks waiting for the peer COMM_FREEs in the other processes of the communicator (e.g., due to blocking communication), then rank y blocks in its COMM_FREE waiting for rank x, while rank x blocks in MPI_SEND waiting for the MPI_RECV that rank y will never post -- so rank x never invokes its COMM_FREE, and both ranks deadlock.
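To make the dependency cycle concrete, here is a minimal simulation in Python (not MPI code): a threading.Event stands in for the blocking Send/Recv pair, a Barrier stands in for a COMM_FREE teardown that waits for all peers, and the rank_x/rank_y names mirror the snippet above. All names here are illustrative, not part of any MPI API.

```python
import threading

# Stand-ins for the blocking MPI calls in the example:
recv_posted = threading.Event()      # rank y's MPI_Recv being posted
free_barrier = threading.Barrier(2)  # a COMM_FREE teardown that waits for all peers

def rank_x():
    # MPI_Send(..., y, ...): blocks until rank y posts the matching receive
    recv_posted.wait()
    # MPI_Comm_free(&comm): blocks until every rank enters the teardown
    free_barrier.wait()

def rank_y():
    # MPI_Comm_free(&comm) first: blocks in the collective teardown,
    # waiting for rank x, which is itself stuck in MPI_Send...
    free_barrier.wait()
    # ...so this "MPI_Recv" is never posted
    recv_posted.set()

tx = threading.Thread(target=rank_x, daemon=True)
ty = threading.Thread(target=rank_y, daemon=True)
tx.start()
ty.start()
tx.join(timeout=1.0)
ty.join(timeout=1.0)

# Both threads are still blocked after the timeout: a deadlock.
deadlocked = tx.is_alive() and ty.is_alive()
print("deadlock:", deadlocked)
```

Running this shows both threads still blocked after the timeout, reproducing the cycle: rank y waits in the teardown for rank x, and rank x waits in the send for rank y.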

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/