Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Problem with MPI_Intercomm_create
From: Edgar Gabriel (gabriel_at_[hidden])
Date: 2011-06-07 11:31:51


On 6/7/2011 10:23 AM, George Bosilca wrote:
>
> On Jun 7, 2011, at 11:00 , Edgar Gabriel wrote:
>
>> George,
>>
>> I did not look over all the details of your test, but it looks to
>> me like you are violating one of the requirements of
>> intercomm_create, namely the requirement that the two groups have to
>> be disjoint. In your case the parent process(es) are part of both
>> local intra-communicators, aren't they?
>
> The two groups of the two local communicators are disjoint. One
> contains A,B while the other contains only C. The bridge communicator
> contains A,C.
>
> I'm confident my example is supposed to work. At least for Open MPI
> the error is under the hood, as the resulting inter-communicator is
> valid but contains NULL endpoints for the remote process.

I'll come back to that later; I am not yet convinced that your code is
correct :-) Your local groups might be disjoint, but I am worried about
the ranks of the remote leader in your example. They cannot be 0 from
both groups' perspectives.
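
A minimal sketch of the rank concern, in plain C with no MPI calls. It
assumes a communicator can be modelled as an array of process ids whose
index is the rank, and that the bridge communicator orders its members
as A, C; that ordering, the PROC_* ids, and the helper names rank_of and
remote_leader_ok are all illustrative assumptions, not part of MPI or of
George's attachment. The check is that remote_leader, read as a rank in
the peer communicator, names a process of the other group:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical process ids for the A, B, C example in this thread. */
enum { PROC_A = 0, PROC_B = 1, PROC_C = 2 };

/* Rank of `proc` in a communicator modelled as an array of process ids
 * (array index == rank); returns -1 if `proc` is not a member. */
int rank_of(const int *comm, size_t n, int proc)
{
    assert(comm != NULL || n == 0);
    for (size_t i = 0; i < n; ++i)
        if (comm[i] == proc)
            return (int)i;
    return -1;
}

/* True if `remote_leader`, read as a rank in `peer`, names a process
 * that belongs to `remote` and does not belong to `local`. */
bool remote_leader_ok(const int *peer, size_t npeer, int remote_leader,
                      const int *local, size_t nlocal,
                      const int *remote, size_t nremote)
{
    if (remote_leader < 0 || (size_t)remote_leader >= npeer)
        return false;
    int proc = peer[remote_leader];
    return rank_of(remote, nremote, proc) >= 0 &&
           rank_of(local, nlocal, proc) < 0;
}
```

Under these assumptions, with bridge = {A, C}, the group {A, B} has to
name rank 1 (C) as its remote leader, while the group {C} has to name
rank 0 (A). Passing 0 on both sides would make {A, B} point at its own
member A.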

>
> Regarding the claim that the two leaders should be separate processes,
> you will not find any wording about this in the current version of
> the standard. In MPI 1.1 there were two contradictory sentences about
> this, one stating that the two groups can be disjoint, while the other
> claimed that the two leaders can be the same process. After
> discussion, the agreement was that the two groups have to be
> disjoint, and the standard has been amended to match the agreement.

I realized that this is a non-issue. If the two local groups are
disjoint, there is no way that the two local leaders are the same process.
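
A minimal sketch of that argument, again in plain C with no MPI calls.
It assumes a group can be modelled as an array of process ids indexed
by rank; groups_disjoint and same_leader are hypothetical helper names
used only for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if no process id appears in both groups. */
bool groups_disjoint(const int *a, size_t na, const int *b, size_t nb)
{
    assert((a != NULL || na == 0) && (b != NULL || nb == 0));
    for (size_t i = 0; i < na; ++i)
        for (size_t j = 0; j < nb; ++j)
            if (a[i] == b[j])
                return false;
    return true;
}

/* True if the leader of group a (rank la) and the leader of group b
 * (rank lb) are the same process. */
bool same_leader(const int *a, size_t na, int la,
                 const int *b, size_t nb, int lb)
{
    assert(la >= 0 && (size_t)la < na && lb >= 0 && (size_t)lb < nb);
    return a[la] == b[lb];
}
```

With intra0 = {T0, T1, T2} and intra1 = {T0, T3, T4} from the original
post, groups_disjoint returns false because T0 is in both groups. With
{A, B} and {C} it returns true, and same_leader is then false for every
choice of leader ranks.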

Thanks
Edgar

>
> george.
>
>
>>
>> I just have MPI-1.1 at hand right now, but here is what it says:
>> ----
>>
>> Overlap of local and remote groups that are bound into an
>> inter-communicator is prohibited. If there is overlap, then the
>> program is erroneous and is likely to deadlock.
>>
>> ----
>>
>> So the bottom line is that the two local intra-communicators that
>> are being used have to be disjoint, and the bridgecomm needs to be
>> a communicator in which at least one process from each of the two
>> disjoint groups is able to talk to the other.
>> Interestingly, I did not find a sentence on whether it is allowed to
>> be the same process, or whether the two local leaders need to be
>> separate processes...
>>
>>
>> Thanks Edgar
>>
>>
>> On 6/7/2011 12:57 AM, George Bosilca wrote:
>>> Frederic,
>>>
>>> Attached you will find an example that is supposed to work. The
>>> main difference from your code is on T3, T4, where you have
>>> swapped the local and remote comm. As depicted in the picture
>>> attached below, during the 3rd step you will create the intercomm
>>> between ab and c (no overlap) using ac as a bridge communicator
>>> (here the two roots, a and c, can exchange messages).
>>>
>>> Based on the MPI 2.2 standard, especially on the paragraph in the
>>> PS, the attached code should have worked. Unfortunately, I
>>> couldn't run it successfully with either Open MPI trunk or
>>> MPICH2 1.4rc1.
>>>
>>> george.
>>>
>>> PS: Here is what the MPI standard states about
>>> MPI_Intercomm_create:
>>>> The function MPI_INTERCOMM_CREATE can be used to create an
>>>> inter-communicator from two existing intra-communicators, in
>>>> the following situation: At least one selected member from each
>>>> group (the “group leader”) has the ability to communicate with
>>>> the selected member from the other group; that is, a “peer”
>>>> communicator exists to which both leaders belong, and each
>>>> leader knows the rank of the other leader in this peer
>>>> communicator. Furthermore, members of each group know the rank
>>>> of their leader.
>>>
>>> On Jun 1, 2011, at 05:00 , Frédéric Feyel wrote:
>>>
>>>> Hello,
>>>>
>>>> I have a problem using MPI_Intercomm_create.
>>>>
>>>> I have 5 tasks, let's say T0, T1, T2, T3, T4, resulting from two
>>>> spawn operations by T0.
>>>>
>>>> So I have two intra-communicators:
>>>>
>>>> intra0 contains: T0, T1, T2
>>>> intra1 contains: T0, T3, T4
>>>>
>>>> my goal is to make a collective loop to build a single
>>>> intra-communicator containing T0, T1, T2, T3, T4
>>>>
>>>> I tried to do it using MPI_Intercomm_create and
>>>> MPI_Intercomm_merge calls, but without success (I always get MPI
>>>> internal errors).
>>>>
>>>> What I am doing:
>>>>
>>>> on T0 :
>>>> *******
>>>>
>>>> MPI_Intercomm_create(intra0,0,intra1,0,1,&new_com)
>>>>
>>>> on T1 and T2 :
>>>> **************
>>>>
>>>> MPI_Intercomm_create(intra0,0,MPI_COMM_WORLD,0,1,&new_com)
>>>>
>>>> on T3 and T4 :
>>>> **************
>>>>
>>>> MPI_Intercomm_create(intra1,0,MPI_COMM_WORLD,0,1,&new_com)
>>>>
>>>>
>>>> I'm certainly missing something. Could anybody help me to solve
>>>> this problem ?
>>>>
>>>> Best regards,
>>>>
>>>> Frédéric.
>>>>
>>>> PS: of course I did an extensive web search without finding
>>>> anything useful on my problem.
>>>>
>>>> _______________________________________________ users mailing
>>>> list users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
>

-- 
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab      http://pstl.cs.uh.edu
Department of Computer Science          University of Houston
Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335