Open MPI User's Mailing List Archives


From: Donald Kerr (Don.Kerr_at_[hidden])
Date: 2007-05-09 10:13:02


Looking at that section it appears that we store the port value locally
in udapl_addr and use the local copy, so changing the udapl attribute
may not be doing anything for the BTL. I will run some tests as well.

-DON

Andrew Friedley wrote:

>OK, strange but good. Yeah I wouldn't be surprised if something has
>been changed, though I wouldn't know what, and I don't have time right
>now to go digging :( Maybe Don Kerr knows something?
>
>Andrew
>
>
>Boris Bierbaum wrote:
>
>
>>I've run the whole IMB Benchmark Suite on 2, 3, and 4 nodes with 2
>>processes per node and --mca btl udapl,self. I didn't encounter any problems.
>>
>>The comment above line 197 says that dat_ep_query() returns wrong port
>>numbers (which it does indeed), but I can't find any call to
>>dat_ep_query() in the uDAPL BTL code. Maybe the comment is out of date?
>>
>>Boris
>>
>>
>>Andrew Friedley wrote:
>>
>>
>>>You say that fixes the problem. Does it work even when running more than
>>>one MPI process per node? (That is the case the hack fixes.) Simply
>>>running mpirun with a -np parameter higher than the number of nodes you
>>>have set up should trigger this case; make sure to use '-mca btl
>>>udapl,self' (i.e., not SM or anything else).
>>>
>>>Andrew
>>>
>>>Boris Bierbaum wrote:
>>>
>>>
>>>>It has been explained in a different thread on [ofa-general] that the
>>>>problem lies in a combination of the OpenIB-cma provider not setting the
>>>>local and remote port numbers on endpoints correctly and Open MPI
>>>>stepping over the IA to save the port number to circumvent this problem,
>>>>thereby confusing the provider.
>>>>
>>>>I commented out line 197 in ompi/mca/btl/udapl/btl_udapl.c (Open MPI
>>>>1.2.1 release) and this fixes the problem. As the problem in the
>>>>provider is currently being fixed, the whole saving of the port number
>>>>in the uDAPL BTL code will be unnecessary in the future.
>>>>
>>>>Steve Wise wrote:
>>>>
>>>>
>>>>>>>Can the UDAPL OFED wizards shed any light on the error messages that
>>>>>>>are listed below? In particular, these seem to be worrisome:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> setup_listener Permission denied
>>>>>>>>
>>>>>>>>
>>>>>>> setup_listener Address already in use
>>>>>>>
>>>>>>>
>>>>>>These failures are from rdma_cm_bind indicating the port is already
>>>>>>bound to this IA address. How are you creating the service point?
>>>>>>dat_psp_create or dat_psp_create_any? If it is psp_create_any then you
>>>>>>will see some failures until it gets to a free port. That is normal.
>>>>>>Just make sure your create call returns DAT_SUCCESS.
>>>>>>
>>>>>>
>>>>>>
>>>>>Arlin, why doesn't dapl_psp_create_any() just pass a port of zero down
>>>>>and let the rdma-cma pick an available port number?
>>>>>
>>>>>
>>>>>
>>>>>_______________________________________________
>>>>>general mailing list
>>>>>general_at_[hidden]
>>>>>http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>>>>>
>>>>>To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
>>>>>
>>>>>
>>>>>
>>>_______________________________________________
>>>users mailing list
>>>users_at_[hidden]
>>>http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>>
>>>
>>
>>
>
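
The port-selection question raised in the thread (scanning for a free port,
as dat_psp_create_any is described as doing, versus passing port zero and
letting the lower layer choose) can be illustrated with ordinary BSD TCP
sockets. This is only an analogy and a minimal sketch: it does not use the
uDAPL or rdma-cm APIs, and the helper names bind_loopback, bind_by_scanning,
and bind_any_port are hypothetical, not taken from Open MPI or OFED.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind a TCP socket to the loopback address on `port`
 * (host byte order). Returns the socket descriptor, or -1 on failure. */
static int bind_loopback(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd); /* e.g. EADDRINUSE when the port is already bound */
        return -1;
    }
    return fd;
}

/* Strategy 1: try ports first..last in order until one binds. Failures
 * along the way are expected and simply skipped. */
static int bind_by_scanning(unsigned short first, unsigned short last,
                            unsigned short *chosen)
{
    for (unsigned int p = first; p <= last; p++) {
        int fd = bind_loopback((unsigned short)p);
        if (fd >= 0) {
            *chosen = (unsigned short)p;
            return fd;
        }
    }
    return -1; /* no free port in the range */
}

/* Strategy 2: bind to port 0 so the kernel picks a free port, then read
 * the assigned port back with getsockname(). */
static int bind_any_port(unsigned short *chosen)
{
    int fd = bind_loopback(0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *chosen = ntohs(addr.sin_port);
    return fd;
}
```

With plain sockets, the port-zero variant never trips over ports that are
already taken, whereas the scanning variant absorbs such failures until it
reaches a free port. Whether rdma-cm behaves the same way when handed port
zero is the question posed to Arlin in the thread.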