Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Intercomm Merge
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-09-17 14:13:02


I guess we could argue this for a while, but I personally don't care how it gets fixed. The issue here is that (a) you promised to provide a "better" fix nearly a year ago, (b) it never happened, and (c) a user who has patiently waited all this time has asked if we could please fix it.

It now works, but if you want to provide a better solution, please do - I have no issue with it. However, until you do, I propose to use what we have.

As for the commit message, I really have no interest in spending time debating the proper way to say something. :-)

On Sep 17, 2013, at 10:40 AM, George Bosilca <bosilca_at_[hidden]> wrote:

> Ralph,
>
> I don't think your patch is addressing the right issue. In fact, your commit treats the wrong symptom instead of addressing the core issue that generates the problem. Let me explain this in terms of MPI.
>
> The MPI_Intercomm_merge function transforms an inter-comm into an intra-comm, basically a two-group world into a single-group world. Under the MPI standard, the two groups handled by this function should already be able to talk to each other in this inter-comm. So your patch fixes a nonexistent problem, as the processes were already supposed to be able to communicate with each other before the MPI_Intercomm_merge. The real issue (which was highlighted in the original email exchange) is that during MPI_Intercomm_create the bridge communicator is not used to correctly exchange the modex of the two groups of processes.
>
> In addition I have two smaller issues related to this patch.
>
> 1. The commit message is misleading, at least from the MPI standpoint.
>
> 2. This function is one of the few MPI-2 dynamic processing functions that can be solved purely at the OMPI layer, without a need for extra functionality from the RTE. The infrastructure of the correct solution is already in the trunk; what is missing is the correct exchange of the complete modex information of the two groups instead of exchanging only their OMPI_ARCH.
>
> Because the band-aid is not really solving the right problem, I propose removing this patch from the trunk and blocking the pending CMR until a better solution is found.
>
> Thanks,
> George.
>
>
> On Sep 15, 2013, at 17:01 , Ralph Castain <rhc_at_[hidden]> wrote:
>
>> I fixed it and have filed a CMR to move it to 1.7.3.
>>
>> Thanks for your patience, and for reminding me
>> Ralph
>>
>> On Sep 13, 2013, at 12:05 PM, Suraj Prabhakaran <suraj.prabhakaran_at_[hidden]> wrote:
>>
>>> Dear Ralph, it would be great if you could give it a try. We have been hoping for this for a year now, and it would greatly benefit us if this were fixed!! :-)
>>>
>>> Thanks!
>>> Suraj
>>>
>>>
>>>
>>>
>>> On Fri, Sep 13, 2013 at 5:39 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>> It has been a low priority issue, and hence not resolved yet. I doubt it will make 1.7.3, though if you need it, I'll give it a try.
>>>
>>> On Sep 13, 2013, at 7:21 AM, Suraj Prabhakaran <suraj.prabhakaran_at_[hidden]> wrote:
>>>
>>> > Hello,
>>> >
>>> > Is there a plan to fix the problem with MPI_Intercomm_merge in 1.7.3, as stated in this ticket? We are really in need of this at the moment. Any hints?
>>> >
>>> > We face the following problem.
>>> >
>>> > Parents (x and y) spawn a child (z). All of them execute on separate nodes.
>>> > x is the root.
>>> > x, y, and z do an MPI_Intercomm_merge.
>>> > x and z are able to communicate properly.
>>> > But y and z are not able to communicate after the merge.
>>> >
>>> > Is this bug a high priority for the next release?
>>> >
>>> > https://svn.open-mpi.org/trac/ompi/ticket/2904
>>> >
>>> > Best,
>>> > Suraj
>>> >
>>> >
>>> > _______________________________________________
>>> > devel mailing list
>>> > devel_at_[hidden]
>>> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>
>>> --
>>> Regards,
>>> Suraj Prabhakaran