Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite
From: Ralph Castain (rhc_at_[hidden])
Date: 2014-05-27 20:46:38


Since you ignored my response, I'll reiterate and clarify it here. The problem in the case of loop_spawn is that the parent process remains "connected" to children after the child has finalized and died. Hence, when the parent attempts to finalize, it tries to "disconnect" itself from processes that no longer exist - and that is what generates the error message.

So the issue in that case appears to be that "finalize" is not marking the child process as "disconnected", thus leaving the parent thinking that it needs to disconnect when it finally ends.
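
To make the "connected" notion concrete, here is a toy model in plain C. It is only a sketch and not Open MPI code: world_t, comm_create, comm_disconnect and procs_connected are hypothetical names invented for this illustration, and processes are modeled as bits in a mask. It assumes the MPI definition that two processes are connected when some chain of still-existing communicators links them, so dropping one communicator does not by itself disconnect the processes.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not Open MPI code: every name here is hypothetical. */
#define MAX_PROCS 8
#define MAX_COMMS 8

typedef struct {
    unsigned members[MAX_COMMS]; /* bitmask of process ranks in each communicator */
    bool alive[MAX_COMMS];       /* false once the communicator is disconnected */
    int ncomms;
} world_t;

/* Register a communicator spanning the processes in `mask`; returns its index. */
int comm_create(world_t *w, unsigned mask)
{
    assert(w->ncomms < MAX_COMMS);
    w->members[w->ncomms] = mask;
    w->alive[w->ncomms] = true;
    return w->ncomms++;
}

/* Model MPI_Comm_disconnect: only this one communicator goes away. */
void comm_disconnect(world_t *w, int c)
{
    assert(c >= 0 && c < w->ncomms);
    w->alive[c] = false;
}

/* a and b are connected if a chain of live communicators links them. */
bool procs_connected(const world_t *w, int a, int b)
{
    assert(a >= 0 && a < MAX_PROCS && b >= 0 && b < MAX_PROCS);
    unsigned reach = 1u << a;   /* set of processes reachable from a */
    bool grew = true;
    while (grew) {              /* repeat until no communicator adds new members */
        grew = false;
        for (int c = 0; c < w->ncomms; c++) {
            if (w->alive[c] && (w->members[c] & reach) &&
                (w->members[c] | reach) != reach) {
                reach |= w->members[c];
                grew = true;
            }
        }
    }
    return ((reach >> b) & 1u) != 0;
}
```

In this model, if a parent (rank 0) and a child (rank 1) share two communicators, say an intercommunicator and a second one derived from it, then calling comm_disconnect on the first still leaves procs_connected(&w, 0, 1) true. Only once the last shared communicator is gone are the two no longer connected, and that is the state finalize has to track correctly.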

On May 27, 2014, at 5:33 PM, Jeff Squyres (jsquyres) <jsquyres_at_[hidden]> wrote:

> Note that MPI says that COMM_DISCONNECT simply disconnects that individual communicator. It does *not* guarantee that the processes involved will be fully disconnected.
>
> So I think that the freeing of communicators is good app behavior, but it is not required by the MPI spec.
>
> If OMPI is requiring this for correct termination, then something is wrong. MPI_FINALIZE is supposed to be collective across all connected MPI procs -- and if the parent and spawned procs in this test are still connected (because they have not disconnected all communicators between them), the FINALIZE is supposed to be collective across all of them.
>
> This means that FINALIZE is allowed to block if it needs to, such that OMPI sending control messages to procs that are still "connected" (in the MPI sense) should never cause a race condition.
>
> As such, this sounds like an OMPI bug.
>
>
>
>
> On May 27, 2014, at 2:27 AM, Gilles Gouaillardet <gilles.gouaillardet_at_[hidden]> wrote:
>
>> Folks,
>>
>> currently, the dynamic/intercomm_create test from the ibm test suite outputs the following message:
>>
>> dpm_base_disconnect_init: error -12 in isend to process 1
>>
>> the root cause is that task 0 tries to send messages to tasks that have already exited.
>>
>> one way of seeing things is that this is an application issue :
>> task 0 should have MPI_Comm_free'd all its communicators before calling MPI_Comm_disconnect.
>> This can be achieved via the attached patch
>>
>> another way of seeing things is that this is a bug in Open MPI.
>> In this case, what would be the right approach?
>> - automatically free communicators (if needed) when MPI_Comm_disconnect is invoked ?
>> - simply remove communicators (if needed) from ompi_mpi_communicators when MPI_Comm_disconnect is invoked ?
>> /* this causes a memory leak, but the application can be seen as responsible for it */
>> - other ?
>>
>> Thanks in advance for your feedback,
>>
>> Gilles
>> <intercomm_create.patch>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/05/14847.php
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/05/14875.php