Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Intercomm Merge
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-09-18 13:18:14


Actually, we wouldn't have to modify the interface - we'd just have to define a DB_RTE flag and OR it with the DB_INTERNAL/DB_EXTERNAL one. We'd need to modify the "fetch" routines to take the flag so we fetch the right things, but that's a simple change.
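
Concretely, something like this (a sketch only; the flag value and call shapes here are illustrative, not the actual db framework API):

-----
/* new flag, OR-able with the existing scope flags */
#define DB_RTE 0x04 /* hypothetical value */

/* store: tag RTE-specific keys at store time */
db.store(proc, DB_INTERNAL | DB_RTE, key, value, type);

/* fetch: pass a flag mask so a caller can ask for only the
 * non-RTE keys, i.e. anything stored without DB_RTE set */
db.fetch(proc, DB_INTERNAL | DB_EXTERNAL, key, &value);
-----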

On Sep 18, 2013, at 10:12 AM, Ralph Castain <rhc_at_[hidden]> wrote:

> I struggled with that myself when doing my earlier patch - it's part of the reason why I added the dpm API.
>
> I don't know how to update the locality without referencing RTE-specific keys, so maybe the best thing would be to provide some kind of hook into the db that asks for all the non-RTE keys? It would be simple to add that capability, though we'd have to modify the interface so we can specify "RTE key" when doing the initial store.
>
> The "internal" flag is used to avoid re-sending data to the system when running under PMI. We "store" our own data as "external" in the PMI components so the data gets pushed out, then fetch it using PMI and store it "internal" to put it in our internal hash. So "internal" doesn't mean "non-RTE".
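>
> In rough pseudocode (illustrative call shapes, not the actual component code):
>
> -----
> /* PMI db component: push our own data out via PMI */
> db.store(my_proc, DB_EXTERNAL, key, value, type);
>
> /* on first fetch of a peer's key: pull it via PMI, then cache it
>  * "internal" in our local hash so we never ask PMI for it again */
> pmi_get(peer_proc, key, &value);
> db.store(peer_proc, DB_INTERNAL, key, value, type);
> -----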
>
>
> On Sep 18, 2013, at 10:02 AM, George Bosilca <bosilca_at_[hidden]> wrote:
>
>> I hit send too early.
>>
>> Now that we move the entire "local" modex, is there any way to trim it down, or to replace the entries that are no longer correct? Like the locality?
>>
>> George.
>>
>> On Sep 18, 2013, at 18:53 , George Bosilca <bosilca_at_[hidden]> wrote:
>>
>>> Regarding your comment on the Trac bug, I noticed there is a DB_INTERNAL flag. While I see how to set it, I could not figure out any way to get it back.
>>>
>>> With the required modification of the DB API, can't we take advantage of it?
>>>
>>> George.
>>>
>>>
>>> On Sep 18, 2013, at 18:52 , Ralph Castain <rhc_at_[hidden]> wrote:
>>>
>>>> Thanks George - much appreciated
>>>>
>>>> On Sep 18, 2013, at 9:49 AM, George Bosilca <bosilca_at_[hidden]> wrote:
>>>>
>>>>> The test case was broken. I just pushed a fix.
>>>>>
>>>>> George.
>>>>>
>>>>> On Sep 18, 2013, at 16:49 , Ralph Castain <rhc_at_[hidden]> wrote:
>>>>>
>>>>>> It hangs with any np > 1.
>>>>>>
>>>>>> However, I'm not sure whether that's an issue with the test or with the underlying implementation.
>>>>>>
>>>>>> On Sep 18, 2013, at 7:40 AM, "Jeff Squyres (jsquyres)" <jsquyres_at_[hidden]> wrote:
>>>>>>
>>>>>>> Does it hang when you run with -np 4?
>>>>>>>
>>>>>>> Sent from my phone. No type good.
>>>>>>>
>>>>>>> On Sep 18, 2013, at 4:10 PM, "Ralph Castain" <rhc_at_[hidden]> wrote:
>>>>>>>
>>>>>>>> Strange - it works fine for me on my Mac. However, I see one difference - I only run it with np=1.
>>>>>>>>
>>>>>>>> On Sep 18, 2013, at 2:22 AM, Jeff Squyres (jsquyres) <jsquyres_at_[hidden]> wrote:
>>>>>>>>
>>>>>>>>> On Sep 18, 2013, at 9:33 AM, George Bosilca <bosilca_at_[hidden]> wrote:
>>>>>>>>>
>>>>>>>>>> 1. sm doesn't work between spawned processes, so you must have another network enabled.
>>>>>>>>>
>>>>>>>>> I know :-). I have tcp available as well (OMPI will abort if you only run with sm,self because the comm_spawn will fail with unreachable errors -- I just tested/proved this to myself).
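>>>>>>>>>
>>>>>>>>> For reference, this is the kind of invocation that triggers the abort (test program name hypothetical):
>>>>>>>>>
>>>>>>>>> -----
>>>>>>>>> ❯❯❯ mpirun --mca btl sm,self -np 2 ./spawn_test
>>>>>>>>> [aborts with unreachable-process errors when the test calls MPI_Comm_spawn]
>>>>>>>>> -----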
>>>>>>>>>
>>>>>>>>> 2. Don't use the test case attached to my email; I left in an xterm-based spawn and the debugging code, so it can't work without xterm support. Instead, try the test case from the trunk, the one committed by Ralph.
>>>>>>>>>
>>>>>>>>> I didn't see any "xterm" strings in there, but ok. :-) I ran with orte/test/mpi/intercomm_create.c, and that hangs for me as well:
>>>>>>>>>
>>>>>>>>> -----
>>>>>>>>> ❯❯❯ mpicc intercomm_create.c -o intercomm_create
>>>>>>>>> ❯❯❯ mpirun -np 4 intercomm_create
>>>>>>>>> b: MPI_Intercomm_create( intra, 0, intra, MPI_COMM_NULL, 201, &inter) [rank 4]
>>>>>>>>> b: MPI_Intercomm_create( intra, 0, intra, MPI_COMM_NULL, 201, &inter) [rank 5]
>>>>>>>>> b: MPI_Intercomm_create( intra, 0, intra, MPI_COMM_NULL, 201, &inter) [rank 6]
>>>>>>>>> b: MPI_Intercomm_create( intra, 0, intra, MPI_COMM_NULL, 201, &inter) [rank 7]
>>>>>>>>> c: MPI_Intercomm_create( MPI_COMM_WORLD, 0, intra, 0, 201, &inter) [rank 4]
>>>>>>>>> c: MPI_Intercomm_create( MPI_COMM_WORLD, 0, intra, 0, 201, &inter) [rank 5]
>>>>>>>>> c: MPI_Intercomm_create( MPI_COMM_WORLD, 0, intra, 0, 201, &inter) [rank 6]
>>>>>>>>> c: MPI_Intercomm_create( MPI_COMM_WORLD, 0, intra, 0, 201, &inter) [rank 7]
>>>>>>>>> a: MPI_Intercomm_create( ab_intra, 0, ac_intra, 0, 201, &inter) (0)
>>>>>>>>> a: MPI_Intercomm_create( ab_intra, 0, ac_intra, 0, 201, &inter) (0)
>>>>>>>>> a: MPI_Intercomm_create( ab_intra, 0, ac_intra, 0, 201, &inter) (0)
>>>>>>>>> a: MPI_Intercomm_create( ab_intra, 0, ac_intra, 0, 201, &inter) (0)
>>>>>>>>> [hang]
>>>>>>>>> -----
>>>>>>>>>
>>>>>>>>> Similarly, on my Mac, it hangs with no output:
>>>>>>>>>
>>>>>>>>> -----
>>>>>>>>> ❯❯❯ mpicc intercomm_create.c -o intercomm_create
>>>>>>>>> ❯❯❯ mpirun -np 4 intercomm_create
>>>>>>>>> [hang]
>>>>>>>>> -----
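>>>>>>>>>
>>>>>>>>> For anyone reproducing this, the basic spawn/merge pattern this family of tests exercises boils down to something like the following (a minimal sketch, not the actual intercomm_create.c):
>>>>>>>>>
>>>>>>>>> -----
>>>>>>>>> #include <mpi.h>
>>>>>>>>>
>>>>>>>>> int main(int argc, char **argv) {
>>>>>>>>>     MPI_Comm parent, inter, merged;
>>>>>>>>>     MPI_Init(&argc, &argv);
>>>>>>>>>     MPI_Comm_get_parent(&parent);
>>>>>>>>>     if (MPI_COMM_NULL == parent) {
>>>>>>>>>         /* parent side: spawn 2 children, yielding an inter-communicator */
>>>>>>>>>         MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0,
>>>>>>>>>                        MPI_COMM_WORLD, &inter, MPI_ERRCODES_IGNORE);
>>>>>>>>>     } else {
>>>>>>>>>         /* child side: the inter-communicator comes from the parent */
>>>>>>>>>         inter = parent;
>>>>>>>>>     }
>>>>>>>>>     /* merge both groups into one intra-communicator; children go "high" */
>>>>>>>>>     MPI_Intercomm_merge(inter, MPI_COMM_NULL != parent, &merged);
>>>>>>>>>     MPI_Barrier(merged);
>>>>>>>>>     MPI_Comm_free(&merged);
>>>>>>>>>     MPI_Comm_disconnect(&inter);
>>>>>>>>>     MPI_Finalize();
>>>>>>>>>     return 0;
>>>>>>>>> }
>>>>>>>>> -----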
>>>>>>>>>
>>>>>>>>>> George.
>>>>>>>>>>
>>>>>>>>>> On Sep 18, 2013, at 07:53 , "Jeff Squyres (jsquyres)" <jsquyres_at_[hidden]> wrote:
>>>>>>>>>>
>>>>>>>>>>> George --
>>>>>>>>>>>
>>>>>>>>>>> When I build the SVN trunk (r29201) on 64-bit Linux, your attached test case hangs:
>>>>>>>>>>>
>>>>>>>>>>> -----
>>>>>>>>>>> ❯❯❯ mpicc intercomm_create.c -o intercomm_create
>>>>>>>>>>> ❯❯❯ mpirun -np 4 intercomm_create
>>>>>>>>>>> b: MPI_Intercomm_create( intra, 0, intra, MPI_COMM_NULL, 201, &inter) [rank 4]
>>>>>>>>>>> b: MPI_Intercomm_create( intra, 0, intra, MPI_COMM_NULL, 201, &inter) [rank 5]
>>>>>>>>>>> b: MPI_Intercomm_create( intra, 0, intra, MPI_COMM_NULL, 201, &inter) [rank 6]
>>>>>>>>>>> b: MPI_Intercomm_create( intra, 0, intra, MPI_COMM_NULL, 201, &inter) [rank 7]
>>>>>>>>>>> a: MPI_Intercomm_create( ab_intra, 0, ac_intra, 0, 201, &inter) (0)
>>>>>>>>>>> a: MPI_Intercomm_create( ab_intra, 0, ac_intra, 0, 201, &inter) (0)
>>>>>>>>>>> a: MPI_Intercomm_create( ab_intra, 0, ac_intra, 0, 201, &inter) (0)
>>>>>>>>>>> a: MPI_Intercomm_create( ab_intra, 0, ac_intra, 0, 201, &inter) (0)
>>>>>>>>>>> c: MPI_Intercomm_create( MPI_COMM_WORLD, 0, intra, 0, 201, &inter) [rank 4]
>>>>>>>>>>> c: MPI_Intercomm_create( MPI_COMM_WORLD, 0, intra, 0, 201, &inter) [rank 5]
>>>>>>>>>>> c: MPI_Intercomm_create( MPI_COMM_WORLD, 0, intra, 0, 201, &inter) [rank 6]
>>>>>>>>>>> c: MPI_Intercomm_create( MPI_COMM_WORLD, 0, intra, 0, 201, &inter) [rank 7]
>>>>>>>>>>> [hang]
>>>>>>>>>>> -----
>>>>>>>>>>>
>>>>>>>>>>> On my Mac, it hangs without printing anything:
>>>>>>>>>>>
>>>>>>>>>>> -----
>>>>>>>>>>> ❯❯❯ mpicc intercomm_create.c -o intercomm_create
>>>>>>>>>>> ❯❯❯ mpirun -np 4 intercomm_create
>>>>>>>>>>> [hang]
>>>>>>>>>>> -----
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Sep 18, 2013, at 1:48 AM, George Bosilca <bosilca_at_[hidden]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Here is a quick (and definitely not the cleanest) patch that addresses the MPI_Intercomm issue at the MPI level. It should be applied after the removal of r29166.
>>>>>>>>>>>>
>>>>>>>>>>>> I also added the corrected test case, which stresses the corner cases by doing a barrier at every inter-comm creation and a clean disconnect:
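>>>>>>>>>>>>
>>>>>>>>>>>> The pattern in the test is roughly (a sketch, not the literal test code):
>>>>>>>>>>>>
>>>>>>>>>>>> -----
>>>>>>>>>>>> MPI_Intercomm_create(local, 0, peer, 0, tag, &inter);
>>>>>>>>>>>> MPI_Barrier(inter);          /* synchronize right at creation */
>>>>>>>>>>>> /* ... use the inter-communicator ... */
>>>>>>>>>>>> MPI_Comm_disconnect(&inter); /* clean disconnect, not just a free */
>>>>>>>>>>>> -----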