Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Problem with MPI_Barrier (Inter-communicator)
From: Edgar Gabriel (gabriel_at_[hidden])
Date: 2012-04-05 16:10:53


Just to confirm: I ran our test suite for inter-communicator
collective operations and communicator duplication, and everything still
works. In particular, comm_dup on an intercommunicator is not
fundamentally broken; it worked in my tests.

Seeing your code, and what precisely it does, would help me hunt the
problem down, since I am otherwise unable to reproduce it.

Also, which version of Open MPI did you use?

Thanks
Edgar

On 4/4/2012 3:09 PM, Thatyene Louise Alves de Souza Ramos wrote:
> Hi Edgar, thank you for the response.
>
> Unfortunately, I've tried with and without this option. In both cases
> the result was the same... =(
>
> On Wed, Apr 4, 2012 at 5:04 PM, Edgar Gabriel <gabriel_at_[hidden]> wrote:
>
> Did you try to start the program with the --mca coll ^inter switch that
> I mentioned? Collective dup for intercommunicators should work; it's
> probably again the bcast over a communicator of size 1 that is causing
> the hang, and you could avoid it with the flag that I mentioned above.
>
> Also, if you could attach your test code, that would help in hunting
> things down.
>
> Thanks
> Edgar
>
> On 4/4/2012 2:18 PM, Thatyene Louise Alves de Souza Ramos wrote:
> > Hi there.
> >
> > I've run some tests related to the problem reported by Rodrigo. I
> > think, though I'd rather be wrong, that collective calls like Create
> > and Dup do not work with intercommunicators. I tried this in the
> > client group:
> >
> > MPI::Intercomm tmp_inter_comm;
> >
> > tmp_inter_comm = server_comm.Create(server_comm.Get_group().Excl(1, &rank));
> >
> > if (server_comm.Get_rank() != rank)
> >     server_comm = tmp_inter_comm.Dup();
> > else
> >     server_comm = MPI::COMM_NULL;
> >
> > The server_comm is the original intercommunicator with the server
> > group.
> >
> > I've noticed that the program hangs in the Dup call. It seems that the
> > tmp_inter_comm created without one process still includes this process,
> > because the other processes are waiting for it to call Dup too.
> >
> > What do you think?
> >
> > On Wed, Mar 28, 2012 at 6:03 PM, Edgar Gabriel <gabriel_at_[hidden]> wrote:
> >
> > It just uses a different algorithm, which avoids the bcast on a
> > communicator of size 1 (which is causing the problem here).
> >
> > Thanks
> > Edgar
> >
> > On 3/28/2012 12:08 PM, Rodrigo Oliveira wrote:
> > > Hi Edgar,
> > >
> > > I tested the execution of my code using the option -mca coll ^inter
> > > as you suggested and the program worked fine, even when I use 1
> > > server instance.
> > >
> > > What does this parameter change? I did not find an explanation of
> > > how the coll inter module is used.
> > >
> > > Thanks a lot for your attention and for the solution.
> > >
> > > Best regards,
> > >
> > > Rodrigo Oliveira
> > >
> > > On Tue, Mar 27, 2012 at 1:10 PM, Rodrigo Oliveira
> > > <rsilva.oliveira_at_[hidden]> wrote:
> > >
> > >
> > > Hi Edgar.
> > >
> > > Thanks for the response. I just do not understand why the Barrier
> > > works before I remove one of the client processes.
> > >
> > > I tried it with 1 server and 3 clients and it worked properly.
> > > After I removed 1 of the clients, it stopped working. So the
> > > removal is affecting the functionality of Barrier, I guess.
> > >
> > > Does anyone have an idea?
> > >
> > >
> > > On Mon, Mar 26, 2012 at 12:34 PM, Edgar Gabriel
> > > <gabriel_at_[hidden]> wrote:
> > >
> > > I do not recall what the agreement was on how to treat the size=1
> > >
> > >
> > >
> > >
> > >
> > > _______________________________________________
> > > users mailing list
> > > users_at_[hidden]
> > > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> --
> Edgar Gabriel
> Associate Professor
> Parallel Software Technologies Lab http://pstl.cs.uh.edu
> Department of Computer Science University of Houston
> Philip G. Hoffman Hall, Room 524 Houston, TX-77204, USA
> Tel: +1 (713) 743-3857
> Fax: +1 (713) 743-3335
>

-- 
Edgar Gabriel
Associate Professor
Parallel Software Technologies Lab      http://pstl.cs.uh.edu
Department of Computer Science          University of Houston
Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335
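
The Excl/Create/Dup pattern quoted above only works if every process agrees on who is still a member of the reduced group. As a rough illustration, here is a minimal sketch in plain C++ that models only that bookkeeping. It makes no MPI calls, and the names `excl`, `must_call_dup`, and `new_rank` are hypothetical helpers invented for this sketch; they are not part of Open MPI or of the MPI bindings. It assumes, as the MPI standard describes for MPI_Group_excl, that the surviving processes keep their relative order.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model of a group exclusion (not an MPI call): returns the old
// ranks that remain after removing `excluded` from a group of `group_size`
// processes. Order is preserved, so the index of each surviving old rank in
// the result is its rank in the reduced group.
std::vector<int> excl(int group_size, const std::vector<int>& excluded) {
    assert(group_size >= 0);
    std::vector<int> kept;
    for (int r = 0; r < group_size; ++r) {
        if (std::find(excluded.begin(), excluded.end(), r) == excluded.end()) {
            kept.push_back(r);
        }
    }
    return kept;
}

// True if old rank `r` is a member of the reduced group and therefore has to
// take part in any collective (such as Dup) on a communicator built from it.
bool must_call_dup(const std::vector<int>& kept, int r) {
    return std::find(kept.begin(), kept.end(), r) != kept.end();
}

// Rank of old rank `r` inside the reduced group, or -1 if it was excluded
// (the -1 is this sketch's stand-in for "not a member").
int new_rank(const std::vector<int>& kept, int r) {
    auto it = std::find(kept.begin(), kept.end(), r);
    return it == kept.end() ? -1 : static_cast<int>(it - kept.begin());
}
```

With three clients and rank 1 excluded, `excl(3, {1})` keeps old ranks 0 and 2, which become ranks 0 and 1 of the reduced group. Only those two should call Dup on tmp_inter_comm; the excluded process gets MPI::COMM_NULL back from Create and has to skip the call. Keep in mind that in real MPI code, Create on an intercommunicator is collective over both groups, so the server side has to call it as well.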