
Subject: Re: [OMPI devel] barrier before calling del_procs
From: Yossi Etigin (yosefe_at_[hidden])
Date: 2014-07-21 13:10:56


I see. But on the v1.8 branch, in r31869, Ralph reverted the commit that moved del_procs() after the barrier:
  "Revert r31851 until we can resolve how to close these leaks without causing the usnic BTL to fail during disconnect of intercommunicators
   Refs #4643"
Also, we need an rte barrier after del_procs(), because otherwise rank A could call pml_finalize() before rank B has finished disconnecting from rank A.

I think the order in finalize should be like this (rough sketch after the list):
        1. mpi_barrier(world)
        2. del_procs()
        3. rte_barrier()
        4. pml_finalize()
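
To make the intent concrete, here is a rough C sketch of that ordering. The names pml_del_procs(), rte_barrier() and pml_finalize() are illustrative placeholders for the corresponding PML/RTE calls, not the actual Open MPI symbols:

        #include <mpi.h>

        /* Placeholders for the relevant Open MPI internals -- the real
         * functions live in the PML and RTE frameworks and have different
         * names and signatures. */
        void pml_del_procs(void);
        void rte_barrier(void);
        void pml_finalize(void);

        static void proposed_finalize_order(void)
        {
            /* 1. MPI barrier on COMM_WORLD: no rank starts disconnecting
             *    while another rank may still be sending to it. */
            MPI_Barrier(MPI_COMM_WORLD);

            /* 2. Each rank releases its remote procs / disconnects. */
            pml_del_procs();

            /* 3. rte barrier: every rank has finished disconnecting before
             *    anyone tears down the PML, so rank A cannot call
             *    pml_finalize() while rank B is still disconnecting from it. */
            rte_barrier();

            /* 4. Only now is it safe to finalize the PML. */
            pml_finalize();
        }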

-----Original Message-----
From: Nathan Hjelm [mailto:hjelmn_at_[hidden]]
Sent: Monday, July 21, 2014 8:01 PM
To: Open MPI Developers
Cc: Yossi Etigin
Subject: Re: [OMPI devel] barrier before calling del_procs

I should add that it is an rte barrier and not an MPI barrier for technical reasons.

-Nathan

On Mon, Jul 21, 2014 at 09:42:53AM -0700, Ralph Castain wrote:
> We already have an rte barrier before del procs
>
> Sent from my iPhone
> On Jul 21, 2014, at 8:21 AM, Yossi Etigin <yosefe_at_[hidden]> wrote:
>
> Hi,
>
>
>
> We get occasional hangs with MTL/MXM during finalize, because a global
> synchronization is needed before calling del_procs.
>
> e.g. rank A may call del_procs() and disconnect from rank B, while rank B
> is still working.
>
> What do you think about adding an MPI barrier on COMM_WORLD before
> calling del_procs()?
>
>

> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/07/15204.php