
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Problem-Bug with MPI_Intercomm_create()
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2011-11-04 08:17:40


After some discussion on the devel list, I opened https://svn.open-mpi.org/trac/ompi/ticket/2904 to track the issue.

On Oct 25, 2011, at 12:08 PM, Ralph Castain wrote:

> FWIW: I have tracked this problem down. The fix is a little more complicated than I'd like, so I'm going to have to ping some other folks to make sure we concur on the approach before doing anything.
>
> On Oct 25, 2011, at 8:20 AM, Ralph Castain wrote:
>
>> I still see it failing the test George provided on the trunk. I'm unaware of anyone looking further into it, though, as the prior discussion seemed to just end.
>>
>> On Oct 25, 2011, at 7:01 AM, orel wrote:
>>
>>> Dears,
>>>
>>> For several days I have been trying to use advanced MPI-2 features in the following scenario:
>>>
>>> 1) a master code A (of size NPA) spawns (MPI_Comm_spawn()) two slave
>>> codes B (of size NPB) and C (of size NPC), providing intercomms A-B and A-C;
>>> 2) I create intracomms AB and AC by merging these intercomms (a minimal sketch of steps 1 and 2 is given below);
>>> 3) then I create intercomm AB-C by calling MPI_Intercomm_create(), using AC as the bridge communicator...
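>>>
>>> As a minimal sketch of steps 1 and 2 on the A side (the slave executable names "./b" and "./c" are only placeholders, and I skip error checking):
>>>
>>> MPI_Comm intercommAB, intercommAC;   /* spawn intercomms A-B and A-C   */
>>> MPI_Comm intracommAB, intracommAC;   /* merged intracomms              */
>>>
>>> /* 1) the processes of A collectively spawn the two slave codes */
>>> MPI_Comm_spawn("./b", MPI_ARGV_NULL, NPB, MPI_INFO_NULL, 0,
>>>                MPI_COMM_WORLD, &intercommAB, MPI_ERRCODES_IGNORE);
>>> MPI_Comm_spawn("./c", MPI_ARGV_NULL, NPC, MPI_INFO_NULL, 0,
>>>                MPI_COMM_WORLD, &intercommAC, MPI_ERRCODES_IGNORE);
>>>
>>> /* 2) merge each intercomm; high = 0 here so the A ranks come first, */
>>> /*    while the spawned children call MPI_Intercomm_merge() with 1   */
>>> MPI_Intercomm_merge(intercommAB, 0, &intracommAB);
>>> MPI_Intercomm_merge(intercommAC, 0, &intracommAC);
>>>
>>> The MPI_Intercomm_create() calls of step 3 are: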
>>>
>>> MPI_Comm intercommABC;
>>> A: MPI_Intercomm_create(intracommAB, 0, intracommAC,  NPA, TAG, &intercommABC);
>>> B: MPI_Intercomm_create(intracommAB, 0, MPI_COMM_NULL, 0,  TAG, &intercommABC);
>>> C: MPI_Intercomm_create(intracommC,  0, intracommAC,   0,  TAG, &intercommABC);
>>>
>>> In these calls, A0 and C0 act as the local leaders for AB and C respectively,
>>> and C0 and A0 play the roles of remote leader in the bridge intracomm AC.
>>>
>>> 4) MPI_Barrier(intercommABC);
>>> 5) I merge intercomm AB-C into intracomm ABC (written out in the sketch below);
>>> 6) MPI_Barrier(intracommABC);
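>>>
>>> As a sketch (the value of "high" in the merge only fixes the rank ordering; I use 0 on the A+B side):
>>>
>>> MPI_Comm intracommABC;
>>>
>>> /* 4) synchronize over the new (A+B)-C intercommunicator           */
>>> MPI_Barrier(intercommABC);
>>>
>>> /* 5) merge the intercomm into one intracomm covering A, B and C   */
>>> MPI_Intercomm_merge(intercommABC, 0, &intracommABC);
>>>
>>> /* 6) this is the collective call that fails (see the error below) */
>>> MPI_Barrier(intracommABC);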
>>>
>>> My BUG: These calls succeed, but when I try to use intracommABC for a collective communication such as MPI_Barrier(),
>>> I get the following error:
>>>
>>> *** An error occurred in MPI_Barrier
>>> *** on communicator
>>> *** MPI_ERR_INTERN: internal error
>>> *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
>>>
>>>
>>> I tried with the Open MPI trunk, 1.5.3, 1.5.4, and MPICH2 1.4.1p1.
>>>
>>> My code works perfectly if the intracomms A, B and C are obtained by MPI_Comm_split() instead of MPI_Comm_spawn() (a sketch of that variant follows)!
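>>>
>>> For comparison, a minimal sketch of the split-based variant (variable names are only illustrative, and the color assignment is just one way to carve a single MPI_COMM_WORLD of size NPA+NPB+NPC into codes A, B and C):
>>>
>>> int rank;
>>> MPI_Comm intracommLocal;   /* this process's own code: A, B or C */
>>> MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>
>>> /* color 0 -> code A, color 1 -> code B, color 2 -> code C */
>>> int color = (rank < NPA) ? 0 : (rank < NPA + NPB) ? 1 : 2;
>>> MPI_Comm_split(MPI_COMM_WORLD, color, rank, &intracommLocal);
>>>
>>> /* the A-B and A-C intercomms are then built with MPI_Intercomm_create() */
>>> /* over MPI_COMM_WORLD instead of coming from MPI_Comm_spawn();           */
>>> /* the rest of the code is unchanged.                                     */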
>>>
>>>
>>> I found the same problem in a previous thread on the OMPI users mailing list:
>>>
>>> => http://www.open-mpi.org/community/lists/users/2011/06/16711.php
>>>
>>> Is that bug/problem currently under investigation? :-)
>>>
>>> I can provide detailed code, but the example given by George Bosilca in that previous thread produces the same error...
>>>
>>> Thank you for your help...
>>>
>>> --
>>> Aurélien Esnard
>>> University Bordeaux 1 / LaBRI / INRIA (France)
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/