Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RFC: about dynamic/intercomm_create test from ibm test suite
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2014-05-27 20:33:45


Note that MPI says that COMM_DISCONNECT simply disconnects that individual communicator. It does *not* guarantee that the processes involved will be fully disconnected.

So I think that the freeing of communicators is good app behavior, but it is not required by the MPI spec.

If OMPI is requiring this for correct termination, then something is wrong. MPI_FINALIZE is supposed to be collective across all connected MPI procs -- and if the parent and spawned procs in this test are still connected (because they have not disconnected all communicators between them), the FINALIZE is supposed to be collective across all of them.

This means that FINALIZE is allowed to block if it needs to, so OMPI sending control messages to procs that are still "connected" (in the MPI sense) should never cause a race condition.

As such, this sounds like an OMPI bug.
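
To make the "connected" notion concrete, here is a small toy model in plain C. It makes no MPI calls: it only tracks which processes belong to which live communicators and treats two processes as connected when a direct or indirect communication path exists between them. All names (world_model, model_comm_create, and so on) are hypothetical, and the scenario in the usage note (one parent, two spawned children, an intercommunicator plus one communicator derived from it) is an assumption for illustration, not a transcription of the ibm test.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_PROCS 8
#define MAX_COMMS 8

/* Toy model (not Open MPI code): each live communicator is a
 * membership bitmap over process ids. */
typedef struct {
    bool live[MAX_COMMS];
    bool member[MAX_COMMS][MAX_PROCS];
} world_model;

void model_init(world_model *w)
{
    memset(w, 0, sizeof *w);
}

/* Register a communicator spanning procs[0..n-1].
 * Returns its slot id, or -1 if the table is full. */
int model_comm_create(world_model *w, const int *procs, int n)
{
    for (int c = 0; c < MAX_COMMS; c++) {
        if (w->live[c]) {
            continue;
        }
        w->live[c] = true;
        memset(w->member[c], 0, sizeof w->member[c]);
        for (int i = 0; i < n; i++) {
            assert(procs[i] >= 0 && procs[i] < MAX_PROCS);
            w->member[c][procs[i]] = true;
        }
        return c;
    }
    return -1;
}

/* Models disconnecting ONE communicator; every other communicator
 * that spans the same processes is left untouched. */
void model_comm_disconnect(world_model *w, int c)
{
    assert(c >= 0 && c < MAX_COMMS && w->live[c]);
    w->live[c] = false;
}

/* Two processes are connected if some chain of live communicators
 * links them. Simple flood fill starting from process a. */
bool model_connected(const world_model *w, int a, int b)
{
    bool seen[MAX_PROCS] = { false };
    int stack[MAX_PROCS]; /* each process is pushed at most once */
    int top = 0;

    assert(a >= 0 && a < MAX_PROCS && b >= 0 && b < MAX_PROCS);
    seen[a] = true;
    stack[top++] = a;
    while (top > 0) {
        int p = stack[--top];
        if (p == b) {
            return true;
        }
        for (int c = 0; c < MAX_COMMS; c++) {
            if (!w->live[c] || !w->member[c][p]) {
                continue;
            }
            for (int q = 0; q < MAX_PROCS; q++) {
                if (w->member[c][q] && !seen[q]) {
                    seen[q] = true;
                    stack[top++] = q;
                }
            }
        }
    }
    return false;
}
```

Usage, under the assumptions above: create {0} as the parent's world, {1,2} as the children's world, and two communicators over {0,1,2} (the intercommunicator and one derived from it). After model_comm_disconnect on the intercommunicator alone, model_connected(&w, 0, 1) still returns true, so in MPI terms FINALIZE would still be collective across parent and children. Only once the derived communicator is gone as well does the parent drop out of the children's connected set.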

On May 27, 2014, at 2:27 AM, Gilles Gouaillardet <gilles.gouaillardet_at_[hidden]> wrote:

> Folks,
>
> currently, the dynamic/intercomm_create test from the ibm test suite outputs the following message:
>
> dpm_base_disconnect_init: error -12 in isend to process 1
>
> the root cause is that task 0 tries to send messages to tasks that have already exited.
>
> one way of seeing things is that this is an application issue:
> task 0 should have MPI_Comm_free'd all its communicators before calling MPI_Comm_disconnect.
> This can be achieved via the attached patch.
>
> another way of seeing things is that this is a bug in OpenMPI.
> In this case, what would be the right approach?
> - automatically free communicators (if needed) when MPI_Comm_disconnect is invoked ?
> - simply remove communicators (if needed) from ompi_mpi_communicators when MPI_Comm_disconnect is invoked ?
> /* this causes a memory leak, but the application can be seen as responsible for it */
> - other ?
>
> Thanks in advance for your feedback,
>
> Gilles
> <intercomm_create.patch>
> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/05/14847.php

-- 
Jeff Squyres
jsquyres_at_[hidden]