Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Problem-Bug with MPI_Intercomm_create()
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2012-05-18 08:06:12


I'm afraid there's no news so far. :-\

On May 14, 2012, at 4:21 PM, Aurélien Esnard wrote:

>
> Hi,
>
> No news, good news... ?
>
> Aurélien Esnard :-)
>
> On 11/04/2011 01:17 PM, Jeff Squyres wrote:
>> After some discussion on the devel list, I opened https://svn.open-mpi.org/trac/ompi/ticket/2904 to track the issue.
>>
>>
>> On Oct 25, 2011, at 12:08 PM, Ralph Castain wrote:
>>
>>> FWIW: I have tracked this problem down. The fix is a little more complicated than I'd like, so I'm going to have to ping some other folks to ensure we concur on the approach before doing anything.
>>>
>>> On Oct 25, 2011, at 8:20 AM, Ralph Castain wrote:
>>>
>>>> I still see it failing the test George provided on the trunk. I'm unaware of anyone looking further into it, though, as the prior discussion seemed to just end.
>>>>
>>>> On Oct 25, 2011, at 7:01 AM, orel wrote:
>>>>
>>>>> Dear all,
>>>>>
>>>>> For several days I have been trying to use advanced MPI-2 features in the following scenario:
>>>>>
>>>>> 1) a master code A (of size NPA) spawns (MPI_Comm_spawn()) two slave
>>>>> codes B (of size NPB) and C (of size NPC), giving intercomms A-B and A-C;
>>>>> 2) I create intracomms AB and AC by merging these intercomms;
>>>>> 3) then I create intercomm AB-C by calling MPI_Intercomm_create(), using AC as the bridge communicator...
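>>>>>
>>>>> For reference, steps 1) and 2) on code A look roughly like this (the executable names "./codeB" and "./codeC" are placeholders, and NPB/NPC are the process counts from the description; B and C do the matching MPI_Comm_get_parent() + MPI_Intercomm_merge() on their side):
>>>>>
>>>>> MPI_Comm intercommAB, intercommAC, intracommAB, intracommAC;
>>>>> /* 1) spawn the two slave codes from A's MPI_COMM_WORLD */
>>>>> MPI_Comm_spawn("./codeB", MPI_ARGV_NULL, NPB, MPI_INFO_NULL, 0,
>>>>>                MPI_COMM_WORLD, &intercommAB, MPI_ERRCODES_IGNORE);
>>>>> MPI_Comm_spawn("./codeC", MPI_ARGV_NULL, NPC, MPI_INFO_NULL, 0,
>>>>>                MPI_COMM_WORLD, &intercommAC, MPI_ERRCODES_IGNORE);
>>>>> /* 2) merge each intercomm into an intracomm; high = 0 puts A's ranks first */
>>>>> MPI_Intercomm_merge(intercommAB, 0, &intracommAB);
>>>>> MPI_Intercomm_merge(intercommAC, 0, &intracommAC);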
>>>>>
>>>>> MPI_Comm intercommABC;
>>>>> A: MPI_Intercomm_create(intracommAB, 0, intracommAC, NPA, TAG, &intercommABC);
>>>>> B: MPI_Intercomm_create(intracommAB, 0, MPI_COMM_NULL, 0, TAG, &intercommABC);
>>>>> C: MPI_Intercomm_create(intracommC,  0, intracommAC,  0, TAG, &intercommABC);
>>>>>
>>>>> In these calls, A0 and C0 play the roles of local leader for AB and C, respectively.
>>>>> C0 and A0 play the roles of remote leader in the bridge intracomm AC.
>>>>>
>>>>> 4) MPI_Barrier(intercommABC);
>>>>> 5) I merge intercomm AB-C into intracomm ABC;
>>>>> 6) MPI_Barrier(intracommABC);
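>>>>>
>>>>> Concretely, steps 5) and 6) are just the following (the value of the high argument should not matter here):
>>>>>
>>>>> /* 5) merge the AB-C intercomm into a single intracomm spanning A, B and C */
>>>>> MPI_Comm intracommABC;
>>>>> MPI_Intercomm_merge(intercommABC, 0, &intracommABC);
>>>>> /* 6) the collective that fails */
>>>>> MPI_Barrier(intracommABC);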
>>>>>
>>>>> My BUG: These calls succeed, but when I try to use intracommABC for a collective communication like MPI_Barrier(),
>>>>> I get the following error:
>>>>>
>>>>> *** An error occurred in MPI_Barrier
>>>>> *** on communicator
>>>>> *** MPI_ERR_INTERN: internal error
>>>>> *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
>>>>>
>>>>>
>>>>> I have tried with the Open MPI trunk, 1.5.3, 1.5.4, and MPICH2 1.4.1p1.
>>>>>
>>>>> My code works perfectly if intracomms A, B, and C are obtained by MPI_Comm_split() instead of MPI_Comm_spawn()!
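>>>>>
>>>>> In that working variant, all processes start in the same MPI_COMM_WORLD and the communicators are built with something like the following (the contiguous rank ranges and the NPA/NPB variables are just for illustration):
>>>>>
>>>>> int rank;
>>>>> MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>> /* which code am I? 0 = A, 1 = B, 2 = C (ranks assumed contiguous) */
>>>>> int code = (rank < NPA) ? 0 : (rank < NPA + NPB) ? 1 : 2;
>>>>> MPI_Comm intracommLocal, intracommAB, intracommAC;
>>>>> MPI_Comm_split(MPI_COMM_WORLD, code, rank, &intracommLocal);                         /* A, B or C */
>>>>> MPI_Comm_split(MPI_COMM_WORLD, code <= 1 ? 0 : MPI_UNDEFINED, rank, &intracommAB);   /* A + B     */
>>>>> MPI_Comm_split(MPI_COMM_WORLD, code != 1 ? 0 : MPI_UNDEFINED, rank, &intracommAC);   /* A + C     */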
>>>>>
>>>>>
>>>>> I found the same problem in a previous thread on the OMPI users mailing list:
>>>>>
>>>>> => http://www.open-mpi.org/community/lists/users/2011/06/16711.php
>>>>>
>>>>> Is this bug/problem currently under investigation? :-)
>>>>>
>>>>> I can provide detailed code, but the example given by George Bosilca in that previous thread produces the same error...
>>>>>
>>>>> Thank you for your help...
>>>>>
>>>>> --
>>>>> Aurélien Esnard
>>>>> University Bordeaux 1 / LaBRI / INRIA (France)
>>>
>>
>

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/