Open MPI User's Mailing List Archives

From: Boris Bierbaum (boris_at_[hidden])
Date: 2007-05-09 09:50:30

I've run the whole IMB Benchmark Suite on 2, 3, and 4 nodes with 2
processes per node and --mca btl udapl,self. I didn't encounter any problems.
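
For reference, a run like the one described above could be launched roughly as follows. This is only a sketch, not the exact command that was used: the hostfile name `hosts` (assumed to list each node with two slots) and the benchmark path `./IMB-MPI1` are assumptions, and the script only prints the command line so it can be checked before running it.

```shell
#!/bin/sh
# Sketch: build an mpirun command line for N nodes with 2 processes each,
# restricted to the uDAPL and self BTLs (no shared memory).
NODES=4                     # 2, 3, or 4 in the runs described above
PPN=2                       # processes per node
NP=$((NODES * PPN))         # total ranks; more than NODES, so nodes are shared

# "hosts" and "./IMB-MPI1" are hypothetical names; adjust to the local setup.
CMD="mpirun -np $NP --hostfile hosts --mca btl udapl,self ./IMB-MPI1"

# Print the command instead of executing it.
echo "$CMD"
```

Because -np is larger than the number of nodes, at least one node hosts more than one process, which is the case the port-number workaround targets.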

The comment above line 197 says that dat_ep_query() returns wrong port
numbers (which it does indeed), but I can't find any call to
dat_ep_query() in the uDAPL BTL code. Maybe the comment is out of date?


Andrew Friedley wrote:
> You say that fixes the problem, does it work even when running more than
> one MPI process per node? (that is the case the hack fixes) Simply
> doing an mpirun with a -np parameter higher than the number of nodes you
> have set up should trigger this case, and making sure to use '-mca btl
> udapl,self' (i.e. not SM or anything else).
> Andrew
> Boris Bierbaum wrote:
>> It has been explained in a different thread on [ofa-general] that the
>> problem lies in a combination of the OpenIB-cma provider not setting the
>> local and remote port numbers on endpoints correctly and Open MPI
>> stepping over the IA to save the port number to circumvent this problem,
>> thereby confusing the provider.
>> I commented out line 197 in ompi/mca/btl/udapl/btl_udapl.c (Open MPI
>> 1.2.1 release) and this fixes the problem. As the problem in the
>> provider is currently being fixed, the whole saving of the port number
>> in the uDAPL BTL code will be unnecessary in the future.
>> Steve Wise wrote:
>>>>> Can the UDAPL OFED wizards shed any light on the error messages that
>>>>> are listed below? In particular, these seem to be worrisome:
>>>>>> setup_listener Permission denied
>>>>> setup_listener Address already in use
>>>> These failures are from rdma_cm_bind indicating the port is already
>>>> bound to this IA address. How are you creating the service point?
>>>> dat_psp_create or dat_psp_create_any? If it is psp_create_any then you
>>>> will see some failures until it gets to a free port. That is normal.
>>>> Just make sure your create call returns DAT_SUCCESS.
>>> Arlin, why doesn't dapl_psp_create_any() just pass a port of zero down
>>> and let the rdma-cma pick an available port number?

|  _  RWTH | Boris Bierbaum
|_|_`_     | Lehrstuhl fuer Betriebssysteme
   | |_) _  | RWTH Aachen D-52056 Aachen
     |_)(_` | Tel: +49-241-80-27805
        ._) | Fax: +49-241-80-22339