
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] hcoll destruction via MPI attribute
From: George Bosilca (bosilca_at_[hidden])
Date: 2014-01-10 09:19:06


On Jan 10, 2014, at 14:50 , Jeff Squyres (jsquyres) <jsquyres_at_[hidden]> wrote:

> On Jan 9, 2014, at 12:05 PM, Joshua Ladd <joshual_at_[hidden]> wrote:
>
>> [Josh] We have a recursive doubling algorithm in progress, implemented with PML send/recvs or, more accurately, with "RTE_isend/RTE_irecv" functions, which, in the case of OMPI, are PML calls.
>
> Does that mean that you’ll be blocking (effectively) in the communicator destruction function?

I’m not sure I understand what you mean by the “communicator destruction function”. I can see two options here: the user perspective (MPI_Comm_free) or the OMPI perspective (the communicator destructor). As I explained in my previous email, if they post requests on the communicator, then the communicator destructor will never be called until they cancel their pending requests. Thus, it is critical that they clean up their internal state as early as possible in the MPI_Comm_free teardown sequence, and here the attribute is a perfect approach.
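To make the ordering argument above concrete, here is a small MPI-free toy model in C. It is a sketch only, not Open MPI code: all `toy_*` names are hypothetical, and the reference-counting scheme (one reference for the user's handle plus one per pending request) is an assumption chosen to mirror the behavior described above, where the destructor cannot run while requests are still pending.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical types, NOT Open MPI's data structures. */
typedef struct toy_comm toy_comm;
typedef void (*toy_attr_delete_fn)(toy_comm *comm, void *attr_val);

struct toy_comm {
    int refcount;                  /* assumed: 1 for the user handle + 1 per pending request */
    int destroyed;                 /* set to 1 once the "destructor" has run */
    toy_attr_delete_fn delete_fn;  /* attribute delete callback, may be NULL */
    void *attr_val;                /* value passed to the delete callback */
};

typedef struct {
    toy_comm *comm;
    int pending;
} toy_request;

/* Drop one reference; the "destructor" runs only when the count hits zero. */
static void toy_comm_release(toy_comm *c)
{
    if (--c->refcount == 0) {
        c->destroyed = 1;
    }
}

void toy_comm_init(toy_comm *c)
{
    c->refcount = 1;
    c->destroyed = 0;
    c->delete_fn = NULL;
    c->attr_val = NULL;
}

/* Posting a request takes a reference on the communicator. */
void toy_post_request(toy_comm *c, toy_request *r)
{
    r->comm = c;
    r->pending = 1;
    c->refcount++;
}

/* Cancelling a pending request gives that reference back. */
void toy_cancel_request(toy_request *r)
{
    if (r->pending) {
        r->pending = 0;
        toy_comm_release(r->comm);
    }
}

/* Models the user-level free: delete callbacks run first, then the
 * user's reference is dropped. */
void toy_comm_free(toy_comm *c)
{
    if (c->delete_fn != NULL) {
        toy_attr_delete_fn fn = c->delete_fn;
        c->delete_fn = NULL;
        fn(c, c->attr_val);
    }
    toy_comm_release(c);
}

/* Delete callback: attr_val points at the library's pending request. */
void cancel_pending_cb(toy_comm *comm, void *attr_val)
{
    (void)comm;
    toy_cancel_request((toy_request *)attr_val);
}

/* Returns 1 if the destructor ran during toy_comm_free, 0 otherwise. */
int toy_demo(int use_attribute)
{
    toy_comm comm;
    toy_request req;

    toy_comm_init(&comm);
    toy_post_request(&comm, &req);
    if (use_attribute) {
        comm.delete_fn = cancel_pending_cb;
        comm.attr_val = &req;
    }
    toy_comm_free(&comm);
    return comm.destroyed;
}
```

In this model, `toy_demo(1)` reaches the destructor because the callback cancels the pending request before the user's reference is dropped, while `toy_demo(0)` leaves the communicator alive with one outstanding reference.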

> I *think* that's ok, but I'm not 100% sure... Brian / George / Nathan: can you confirm?
>
> I ask because the standard does not specify what is allowed in attribute callback functions, which, by omission, means that *everything* is allowed, but I don't know how well tested the code paths are that invoke arbitrary MPI (PML) functionality inside communicator teardown.

From the perspective of the MPI 3.0 standard and the current code of Open MPI, this approach is perfectly legal and should work.

However, one should keep in mind that MPI_Comm_free does not have to be a collective function, so making any kind of collective assumption, or performing collective communication, inside the attribute destructor might lead to deadlocks in future versions. In other words, if the only thing you do in the attribute destructor is tear down locally posted requests, then you are safe. If you send data using the communicator, then you are definitely playing dangerously close to the safety line.

  George.
