Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Running openmpi jobs on two system-librdmacm: couldn't read ABI version
From: Syed Ahsan Ali (ahsanshah01_at_[hidden])
Date: 2013-03-26 03:54:43


It may be because the other system is running an upgraded version of Linux
that does not have the InfiniBand drivers installed. Is there any solution?

On Tue, Mar 26, 2013 at 12:42 PM, Syed Ahsan Ali <ahsanshah01_at_[hidden]> wrote:

> I tried this, but mpirun exits with the following error:
>
> mpirun -np 40 /home/MET/hrm/bin/hrm
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> librdmacm: couldn't read ABI version.
> CMA: unable to get RDMA device list
> CMA: unable to get RDMA device list
> CMA: unable to get RDMA device list
> CMA: unable to get RDMA device list
> librdmacm: assuming: 4
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> CMA: unable to get RDMA device list
> librdmacm: couldn't read ABI version.
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> --------------------------------------------------------------------------
> [[33095,1],8]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
> Module: OpenFabrics (openib)
> Host: pmd04.pakmet.com
> Another transport will be used instead, although this may result in
> lower performance.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> At least one pair of MPI processes are unable to reach each other for
> MPI communications. This means that no Open MPI device has indicated
> that it can be used to communicate between these processes. This is
> an error; Open MPI requires that all MPI processes be able to reach
> each other. This error can sometimes be the result of forgetting to
> specify the "self" BTL.
> Process 1 ([[33095,1],28]) is on host:
> compute-02-00.private02.pakmet.com
> Process 2 ([[33095,1],0]) is on host: pmd02
> BTLs attempted: openib self sm
> Your MPI job is now going to abort; sorry.
> --------------------------------------------------------------------------
>
>
> Ahsan
>
> On Fri, Mar 22, 2013 at 7:09 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>
>>
>> On Mar 22, 2013, at 3:42 AM, Syed Ahsan Ali <ahsanshah01_at_[hidden]>
>> wrote:
>>
>> Actually, due to some database corruption I am not able to add any new
>> nodes to the cluster from the installer node, so I want to run parallel
>> jobs on more nodes without adding them to the existing cluster.
>> You are right that the binaries must be present on the remote node as well.
>> Is this possible through NFS, just as the compute nodes are NFS-mounted
>> from the installer node?
>>
>>
>> Sure - OMPI doesn't care how the binaries got there. Just so long as they
>> are present on the compute node.
>>
>>
>> Ahsan
>>
>>
>> On Fri, Mar 22, 2013 at 3:33 PM, Reuti <reuti_at_[hidden]> wrote:
>>
>>> On 22.03.2013 at 10:14, Syed Ahsan Ali wrote:
>>>
>>> > I have a very basic question. If we want to run an mpirun job on two
>>> systems that are not part of the cluster, how can we make it possible?
>>> Can a host that is not a compute node, but rather a standalone system,
>>> be specified to mpirun?
>>>
>>> Sure, the machines can be specified as arguments to `mpiexec`. But do you
>>> want to run applications just between these two machines, or should they
>>> participate in a larger parallel job with machines of the cluster? In that
>>> case, a direct network connection between the outside and the inside of the
>>> cluster is necessary, via some kind of forwarding if these are separate
>>> networks.
>>>
>>> Also, the paths to the started binaries may be different if the two
>>> machines do not share the same /home with the cluster, and this needs to
>>> be honored.
>>>
>>> If you are using a queuing system and want to route jobs to machines
>>> outside the configured cluster, it's necessary to negotiate with the admin
>>> to allow jobs to be scheduled there.
>>>
>>> -- Reuti
>>>
>>>
>>> > Thanks
>>> > Ahsan
>>> > _______________________________________________
>>> > users mailing list
>>> > users_at_[hidden]
>>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>
>>
>>
>> --
>> Syed Ahsan Ali Bokhari
>> Electronic Engineer (EE)
>>
>> Research & Development Division
>> Pakistan Meteorological Department H-8/4, Islamabad.
>> Phone # off +92518358714
>> Cell # +923155145014
>>
>>
>>
>>
>
>
>

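If the upgraded node really lacks a working InfiniBand stack, one option is to keep Open MPI off the `openib` BTL and run over TCP, using the `btl` MCA parameter with `tcp` added to the `self` and `sm` components listed in the error output. The Python helper below is a minimal sketch under that assumption, not a tested fix: the host names and slot counts are hypothetical placeholders, and the functions `format_hostfile` and `build_mpirun_cmd` are illustrative names. It only prints the hostfile contents and the `mpirun` command line rather than launching anything.

```python
import shlex


def format_hostfile(hosts):
    """Return Open MPI hostfile text: one 'hostname slots=N' line per host."""
    return "".join(f"{name} slots={slots}\n" for name, slots in hosts)


def build_mpirun_cmd(nprocs, hostfile, program, btls=("tcp", "sm", "self")):
    """Build the mpirun argument list, restricting transports via --mca btl.

    Assumption: openib is left out because one node has no usable
    InfiniBand stack, so all inter-node traffic goes over TCP.
    """
    return [
        "mpirun",
        "--mca", "btl", ",".join(btls),
        "--hostfile", str(hostfile),
        "-np", str(nprocs),
        program,
    ]


if __name__ == "__main__":
    # Hypothetical hosts and slot counts; replace with the real machines.
    hosts = [("pmd02", 20), ("pmd04", 20)]
    print(format_hostfile(hosts), end="")
    cmd = build_mpirun_cmd(40, "hosts.txt", "/home/MET/hrm/bin/hrm")
    # Print the command instead of running it; subprocess.run(cmd) would launch it.
    print(shlex.join(cmd))
```

Save the printed host lines as `hosts.txt` before running the command. Listing the BTLs explicitly is one choice; `--mca btl ^openib` instead excludes only `openib` and leaves the rest of the selection to Open MPI. Either way, as Reuti points out, TCP traffic must still be able to flow between the private cluster network (e.g. `compute-02-00.private02.pakmet.com`) and the outside machine.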