Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Intercomm Merge
From: George Bosilca (bosilca_at_[hidden])
Date: 2013-09-17 17:01:57


Ralph,

On Sep 17, 2013, at 20:13 , Ralph Castain <rhc_at_[hidden]> wrote:

> I guess we could argue this for a while, but I personally don't care how it gets fixed. The issue here is that (a) you promised to provide a "better" fix nearly a year ago, (b) it never happened, and (c) a user who has patiently waited all this time has asked if we could please fix it.

There seems to be some misunderstanding here. Believe me, I have neither the time for nor any interest in vain arguments, but in this case I was not arguing. I was just trying to be polite and explain the problem so that people can understand the real issue and the fact that there is no room for argument. Period. Your patch is not correct, as it addresses a nonexistent issue in MPI. But it appears that I somehow failed to make myself clear, and you took my message as some kind of joke.

> It now works, but if you want to provide a better solution, please do - I have no issue with it. However, until you do, I propose to use what we have.

It does not work. It fixes a minimal corner case without addressing the real problem. You can leave it in the trunk if you so wish, but it definitely should not make it into 1.7.

Let me try another approach. A complete test case for this is to add an MPI_Barrier on the inter-communicator created by MPI_Intercomm_create, right before the call to MPI_Intercomm_merge that merges it (this MPI_Barrier should be added in both the parent and the child code). If this test passes with the current patch, then I misunderstood your patch and I'm entirely in the wrong.
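
For readers who want to try this, here is a minimal sketch of such a test. It is not the reproducer from the ticket: the setup (rank 0 spawning one child over MPI_COMM_SELF, then using the merged parent/child communicator as the bridge for MPI_Intercomm_create), the tag value BRIDGE_TAG, and the final MPI_Allreduce check are all assumptions made for illustration. Only the placement of MPI_Barrier on the inter-communicator before MPI_Intercomm_merge comes from the description above.

```c
/* Sketch only: barrier on the inter-communicator before merging it.
 * The spawn-then-bridge setup below is an assumption, not the ticket's test. */
#include <mpi.h>
#include <stdio.h>

#define BRIDGE_TAG 42   /* arbitrary tag, hypothetical */

int main(int argc, char **argv)
{
    MPI_Comm parent, bridge = MPI_COMM_NULL, inter, merged;
    int is_child, one = 1, count = 0, size = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);
    is_child = (parent != MPI_COMM_NULL);

    if (!is_child) {
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            /* Only rank 0 spawns the child, so the other parents
             * start with no connection to it. */
            MPI_Comm spawn_inter;
            MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
                           MPI_COMM_SELF, &spawn_inter, MPI_ERRCODES_IGNORE);
            /* Bridge: parent rank 0 is rank 0, the child is rank 1. */
            MPI_Intercomm_merge(spawn_inter, 0, &bridge);
        }
        /* peer_comm and remote_leader are significant only at the
         * local leader (rank 0); other parents pass MPI_COMM_NULL. */
        MPI_Intercomm_create(MPI_COMM_WORLD, 0, bridge, 1, BRIDGE_TAG, &inter);
    } else {
        MPI_Intercomm_merge(parent, 1, &bridge);
        MPI_Intercomm_create(MPI_COMM_SELF, 0, bridge, 0, BRIDGE_TAG, &inter);
    }

    /* The barrier under discussion: every process in both groups
     * must already be able to communicate over the inter-communicator. */
    MPI_Barrier(inter);
    MPI_Intercomm_merge(inter, is_child, &merged);

    /* Simple check that all processes in the merged communicator can talk. */
    MPI_Allreduce(&one, &count, 1, MPI_INT, MPI_SUM, merged);
    MPI_Comm_size(merged, &size);
    printf("%s: merged size %d, allreduce count %d\n",
           is_child ? "child" : "parent", size, count);

    MPI_Comm_free(&merged);
    MPI_Comm_free(&inter);
    if (bridge != MPI_COMM_NULL) {
        MPI_Comm_free(&bridge);
    }
    MPI_Finalize();
    return 0;
}
```

Assuming the binary can be found through argv[0] on every node, launching it with two or more parent processes (for example with mpirun -np 2) places at least one parent outside the spawn call. If the barrier hangs or fails, the problem lies in MPI_Intercomm_create and not in the merge.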

> As for the commit message, I really have no interest in spending time debating the proper way to say something. :-)

I do; words have meaning for a good reason. Please read Section 6.6.2 of the MPI standard, and you will understand why your commit message slightly distorts what the MPI standard says.

George.

>
>
> On Sep 17, 2013, at 10:40 AM, George Bosilca <bosilca_at_[hidden]> wrote:
>
>> Ralph,
>>
>> I don't think your patch is addressing the right issue. In fact, your commit treats the wrong symptom instead of addressing the core issue that generates the problem. Let me explain this in terms of MPI.
>>
>> The MPI_Intercomm_merge function transforms an inter-comm into an intra-comm, basically a two-group world into a single-group world. Under the MPI standard, the two groups handled by this function should already be able to talk to each other in this inter-comm. So your patch fixes a nonexistent problem, as the processes were already supposed to be able to communicate with each other before the MPI_Intercomm_merge. The real issue (which was highlighted in the original email exchange) is that during MPI_Intercomm_create the bridge communicator is not used to correctly exchange the modex of the two groups of processes.
>>
>> In addition I have two smaller issues related to this patch.
>>
>> 1. The commit message is misleading, at least from the MPI standpoint.
>>
>> 2. This function is one of the few MPI-2 dynamic process functions that can be solved purely at the OMPI layer, without a need for extra functionality from the RTE. The infrastructure for the correct solution is already in the trunk; what is missing is the correct exchange of the complete modex information of the two groups, instead of exchanging only their OMPI_ARCH.
>>
>> Since the band-aid is not really solving the right problem, I propose removing this patch from the trunk and blocking the pending CMR until a better solution is found.
>>
>> Thanks,
>> George.
>>
>>
>> On Sep 15, 2013, at 17:01 , Ralph Castain <rhc_at_[hidden]> wrote:
>>
>>> I fixed it and have filed a cmr to move it to 1.7.3
>>>
>>> Thanks for your patience, and for reminding me
>>> Ralph
>>>
>>> On Sep 13, 2013, at 12:05 PM, Suraj Prabhakaran <suraj.prabhakaran_at_[hidden]> wrote:
>>>
>>>> Dear Ralph, that would be great if you could give it a try. We have been hoping for it for a year now and it could greatly benefit us if this is fixed!! :-)
>>>>
>>>> Thanks!
>>>> Suraj
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Sep 13, 2013 at 5:39 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>>> It has been a low priority issue, and hence not resolved yet. I doubt it will make 1.7.3, though if you need it, I'll give it a try.
>>>>
>>>> On Sep 13, 2013, at 7:21 AM, Suraj Prabhakaran <suraj.prabhakaran_at_[hidden]> wrote:
>>>>
>>>> > Hello,
>>>> >
>>>> > Is there a plan to fix the problem with MPI_Intercomm_merge with 1.7.3 as stated in this ticket? We are really in need of this at the moment. Any hints?
>>>> >
>>>> > We face the following problem.
>>>> >
>>>> > Parents (x and y) spawn child (z). (all of them execute on separate nodes)
>>>> > x is the root.
>>>> > x,y and z do an MPI_Intercomm_merge.
>>>> > x and z are able to communicate properly.
>>>> > But y and z are not able to communicate after the merge.
>>>> >
>>>> > Is this bug in high priority for the next release?
>>>> >
>>>> > https://svn.open-mpi.org/trac/ompi/ticket/2904
>>>> >
>>>> > Best,
>>>> > Suraj
>>>> >
>>>> >
>>>> > _______________________________________________
>>>> > devel mailing list
>>>> > devel_at_[hidden]
>>>> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>
>>>> --
>>>> Regards,
>>>> Suraj Prabhakaran