Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Problem moving from 1.4 to 1.6
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2014-06-28 11:10:55


You might well be able to work around this by running:

  mpirun --mca btl ^openib,udapl ...

This excludes both the openib and udapl BTLs, both of which use the same librdmacm.

If this doesn't solve the problem, then please send the info Ralph asked for, and we'll dig deeper...
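
For reference, here is a rough sketch of the same exclusion set through the environment rather than on the command line, followed by a check of which BTL components the installation was built with. It assumes a Bourne-style shell and relies on Open MPI reading MCA parameters from variables named OMPI_MCA_<param>. The grep pattern is only a guess at a convenient filter for the ompi_info output.

```shell
# Environment-variable form of "--mca btl ^openib,udapl".
# The leading ^ makes the list exclusive: every BTL except these is eligible.
export OMPI_MCA_btl='^openib,udapl'

# Show the BTL components this installation provides, if ompi_info is on PATH.
# An "MCA btl: udapl" line would confirm that uDAPL support was built in.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info | grep 'MCA btl' || true
fi
```

With the variable exported, a plain mpirun invocation should pick up the exclusion without the --mca argument. A value given on the command line takes precedence over the environment.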

On Jun 27, 2014, at 3:41 PM, Ralph Castain <rhc_at_[hidden]> wrote:

> Let me steer you on a different course. Can you run "ompi_info" and paste the output here? It looks to me like someone installed a version that includes uDAPL support, so you may have to disable some additional things to get it to run.
>
>
> On Jun 27, 2014, at 9:53 AM, Jeffrey A Cummings <Jeffrey.A.Cummings_at_[hidden]> wrote:
>
>> We have recently upgraded our cluster to a version of Linux which comes with Open MPI version 1.6.2.
>>
>> An application which ran previously (using some version of 1.4) now errors out with the following messages:
>>
>> librdmacm: Fatal: no RDMA devices found
>> librdmacm: Fatal: no RDMA devices found
>> librdmacm: Fatal: no RDMA devices found
>> --------------------------------------------------------------------------
>> WARNING: Failed to open "OpenIB-cma" [DAT_INTERNAL_ERROR:].
>> This may be a real error or it may be an invalid entry in the uDAPL
>> Registry which is contained in the dat.conf file. Contact your local
>> System Administrator to confirm the availability of the interfaces in
>> the dat.conf file.
>> --------------------------------------------------------------------------
>> [tupile:25363] 2 more processes have sent help message help-mpi-btl-udapl.txt / dat_ia_open fail
>> [tupile:25363] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>>
>> The mpirun command line contains the argument '--mca btl ^openib', which I thought told Open MPI not to look for the IB interface.
>>
>> Can anyone suggest what the problem might be? Did the relevant syntax change between versions 1.4 and 1.6?
>>
>>
>> Jeffrey A. Cummings
>> Engineering Specialist
>> Performance Modeling and Analysis Department
>> Systems Analysis and Simulation Subdivision
>> Systems Engineering Division
>> Engineering and Technology Group
>> The Aerospace Corporation
>> 571-307-4220
>> jeffrey.a.cummings_at_[hidden]
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: http://www.open-mpi.org/community/lists/users/2014/06/24721.php
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: http://www.open-mpi.org/community/lists/users/2014/06/24727.php

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/