If this changes the behavior of MPI_Abort so that it only aborts the processes on the specified communicator, how does it not affect the default user experience (today it aborts everything)?
If we accept that MPI_Abort will only abort the processes in the given communicator, what happens to the other processes in the same MPI_COMM_WORLD that are not in the communicator passed to MPI_Abort? What about all the other connected processes (connectivity as defined in Section 10.5.4 of the MPI standard)? Do they see this as a fault?
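To make the scenario concrete, here is a toy model in plain Python. It is not Open MPI code: the function name `aborted_ranks` and the `whole_job` flag are made up for illustration, and the 8-rank job with an even-ranks communicator is just an assumed example. It only contrasts the two semantics being discussed, "abort the whole job" (what users see today) versus "abort only the group of comm".

```python
def aborted_ranks(world_size, comm_group, caller, whole_job):
    """Toy model: which world ranks are terminated when `caller`
    calls abort on a communicator whose group is `comm_group`.

    whole_job=True  models today's user-visible behavior (the whole job dies).
    whole_job=False models aborting only the group of the communicator.
    """
    if caller not in comm_group:
        raise ValueError("caller must be a member of the communicator group")
    if whole_job:
        return set(range(world_size))
    return set(comm_group)


world_size = 8                # assumed size of MPI_COMM_WORLD
comm_group = {0, 2, 4, 6}     # assumed sub-communicator: the even ranks

today = aborted_ranks(world_size, comm_group, caller=2, whole_job=True)
scoped = aborted_ranks(world_size, comm_group, caller=2, whole_job=False)
survivors = set(range(world_size)) - scoped

print("aborted today:   ", sorted(today))
print("aborted if scoped:", sorted(scoped))
print("left running:     ", sorted(survivors))
```

The ranks in `survivors` are the ones my questions are about: under the scoped semantics they are still running, and it is not clear to me what they should observe.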
On Jun 9, 2011, at 16:32, Josh Hursey wrote:
> WHAT: Fix missing code in MPI_Abort
> WHY: MPI_Abort is missing logic to ask for termination of the process
> group defined by the communicator
> WHERE: Mostly orte/mca/errmgr
> WHEN: Open MPI trunk
> TIMEOUT: Tuesday, June 14, 2011 (after teleconf)
> A bitbucket branch is available here (last sync to r24757 of trunk)
> In the MPI Standard (v2.2) Section 8.7 after the introduction of
> MPI_Abort, it states:
> "This routine makes a best attempt to abort all tasks in the group of comm."
> Open MPI currently only calls orte_errmgr.abort() to abort the calling
> process itself. The code to ask for the abort of the other processes
> in the group defined by the communicator is commented out. Since one
> process calling abort currently causes all processes in the job to
> abort, it has not been a big deal. However as the group starts
> exploring better resilience in the OMPI layer (with further support
> from the ORTE layer) this aspect of MPI_Abort will become more
> necessary to get right.
> This branch adds back the logic necessary for a single process calling
> MPI_Abort to request, from ORTE errmgr, that a defined subgroup of
> processes be aborted. Once the request is sent to the HNP, the local
> process then calls abort on itself. The HNP requests that the defined
> subgroup of processes be terminated using the existing plm mechanisms
> for doing so.
> This change has no effect on the current default user experienced
> behavior of MPI_Abort.
> Joshua Hursey
> Postdoctoral Research Associate
> Oak Ridge National Laboratory