Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Problem with MPI_Barrier (Inter-communicator)
From: Thatyene Louise Alves de Souza Ramos (thatyene_at_[hidden])
Date: 2012-04-09 19:00:27


Hi Edgar, sorry about the late response. I've been travelling without
Internet access.

Well, I took the code Rodrigo provided and modified the client to call Dup
right after creating the new intercommunicator that excludes one process.
That is, I just replaced lines 54-55 in the *removeRank* method with my
if-else block.

I tried this because calling Create a second time, after the first Create,
did not work, and I thought the problem might be the communicator itself. So
I tried duplicating the intercommunicator to see if that worked.

Thanks.

Thatyene Ramos.

On Thu, Apr 5, 2012 at 5:10 PM, Edgar Gabriel <gabriel_at_[hidden]> wrote:

> So, just to confirm: I ran our test suite for intercommunicator
> collective operations and communicator duplication, and everything still
> works. In particular, comm_dup on an intercommunicator is not
> fundamentally broken; it worked in my tests.
>
> Having your code, so that I can see precisely what it does, would help me
> hunt the problem down, since otherwise I am not able to reproduce it.
>
> Also, which version of Open MPI did you use?
>
> Thanks
> Edgar
>
> On 4/4/2012 3:09 PM, Thatyene Louise Alves de Souza Ramos wrote:
> > Hi Edgar, thank you for the response.
> >
> > Unfortunately, I've tried both with and without this option. In both
> > cases the result was the same... =(
> >
> > On Wed, Apr 4, 2012 at 5:04 PM, Edgar Gabriel <gabriel_at_[hidden]> wrote:
> >
> > Did you try starting the program with the --mca coll ^inter switch that
> > I mentioned? Collective dup for intercommunicators should work; it's
> > probably again the bcast over a communicator of size 1 that is causing
> > the hang, and you could avoid it with the flag I mentioned above.
> >
> > Also, if you could attach your test code, that would help in hunting
> > things down.
> >
> > Thanks
> > Edgar
> >
> > On 4/4/2012 2:18 PM, Thatyene Louise Alves de Souza Ramos wrote:
> > > Hi there.
> > >
> > > I've run some tests related to the problem reported by Rodrigo, and I
> > > think (I'd rather be wrong) that collective calls like Create and Dup
> > > do not work with intercommunicators. I tried this in the client group:
> > >
> > >     MPI::Intercomm tmp_inter_comm;
> > >
> > >     tmp_inter_comm = server_comm.Create(server_comm.Get_group().Excl(1, &rank));
> > >
> > >     if (server_comm.Get_rank() != rank)
> > >         server_comm = tmp_inter_comm.Dup();
> > >     else
> > >         server_comm = MPI::COMM_NULL;
> > >
> > > server_comm is the original intercommunicator with the server group.
> > >
> > > I've noticed that the program hangs in the Dup call. It seems that
> > > tmp_inter_comm, created without one process, still contains that
> > > process, because the other processes are waiting for it to call Dup
> > > too.
> > >
> > > What do you think?
> > >
> > > On Wed, Mar 28, 2012 at 6:03 PM, Edgar Gabriel <gabriel_at_[hidden]> wrote:
> > >
> > > It just uses a different algorithm that avoids the bcast on a
> > > communicator of size 1 (which is what causes the problem here).
> > >
> > > Thanks
> > > Edgar
> > >
> > > On 3/28/2012 12:08 PM, Rodrigo Oliveira wrote:
> > > > Hi Edgar,
> > > >
> > > > I ran my code with the option -mca coll ^inter as you suggested, and
> > > > the program worked fine, even with 1 server instance.
> > > >
> > > > What does this parameter change? I could not find an explanation of
> > > > what the coll inter module is used for.
> > > >
> > > > Thanks a lot for your attention and for the solution.
> > > >
> > > > Best regards,
> > > >
> > > > Rodrigo Oliveira
> > > >
> > > > On Tue, Mar 27, 2012 at 1:10 PM, Rodrigo Oliveira <rsilva.oliveira_at_[hidden]> wrote:
> > > >
> > > >
> > > > Hi Edgar.
> > > >
> > > > Thanks for the response. I just don't understand why the Barrier
> > > > works before I remove one of the client processes.
> > > >
> > > > I tried it with 1 server and 3 clients and it worked properly. After
> > > > I removed 1 of the clients, it stopped working. So the removal is
> > > > affecting the Barrier, I guess.
> > > >
> > > > Does anyone have an idea?
> > > >
> > > >
> > > > On Mon, Mar 26, 2012 at 12:34 PM, Edgar Gabriel <gabriel_at_[hidden]> wrote:
> > > >
> > > > I do not recall what the agreement was on how to treat the size=1
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > _______________________________________________
> > > > users mailing list
> > > > users_at_[hidden]
> > > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> > --
> > Edgar Gabriel
> > Associate Professor
> > Parallel Software Technologies Lab http://pstl.cs.uh.edu
> > Department of Computer Science University of Houston
> > Philip G. Hoffman Hall, Room 524 Houston, TX-77204, USA
> > Tel: +1 (713) 743-3857
> > Fax: +1 (713) 743-3335