Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Running openmpi jobs on two system-librdmacm: couldn't read ABI version
From: Syed Ahsan Ali (ahsanshah01_at_[hidden])
Date: 2013-03-26 03:42:51


I tried this, but mpirun exits with this error:

mpirun -np 40 /home/MET/hrm/bin/hrm
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
librdmacm: couldn't read ABI version.
CMA: unable to get RDMA device list
CMA: unable to get RDMA device list
CMA: unable to get RDMA device list
CMA: unable to get RDMA device list
librdmacm: assuming: 4
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
CMA: unable to get RDMA device list
librdmacm: couldn't read ABI version.
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
librdmacm: assuming: 4
CMA: unable to get RDMA device list
--------------------------------------------------------------------------
[[33095,1],8]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
Module: OpenFabrics (openib)
  Host: pmd04.pakmet.com
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications. This means that no Open MPI device has indicated
that it can be used to communicate between these processes. This is
an error; Open MPI requires that all MPI processes be able to reach
each other. This error can sometimes be the result of forgetting to
specify the "self" BTL.
  Process 1 ([[33095,1],28]) is on host: compute-02-00.private02.pakmet.com
  Process 2 ([[33095,1],0]) is on host: pmd02
  BTLs attempted: openib self sm
Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
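The librdmacm/openib messages above typically mean the nodes have no usable InfiniBand/RDMA hardware or drivers. Not part of the original thread, but the usual workaround is to exclude the openib BTL so Open MPI falls back to TCP and shared memory; a sketch, reusing the application path from the run above:

```shell
# Exclude the InfiniBand (openib) BTL; Open MPI then selects from the
# remaining transports (TCP, shared memory, self).
mpirun --mca btl ^openib -np 40 /home/MET/hrm/bin/hrm

# Equivalent explicit form: allow only these BTLs.
mpirun --mca btl self,sm,tcp -np 40 /home/MET/hrm/bin/hrm
```

With openib excluded, the "unable to reach each other" abort should also go away, since all processes can then agree on the tcp BTL.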

Ahsan

On Fri, Mar 22, 2013 at 7:09 PM, Ralph Castain <rhc_at_[hidden]> wrote:

>
> On Mar 22, 2013, at 3:42 AM, Syed Ahsan Ali <ahsanshah01_at_[hidden]> wrote:
>
> Actually, due to some database corruption I am not able to add any new
> node to the cluster from the installer node, so I want to run the parallel
> job on more nodes without adding them to the existing cluster.
> You are right that the binaries must be present on the remote node as well.
> Is this possible through NFS, just as the compute nodes are NFS-mounted
> from the installer node?
>
>
> Sure - OMPI doesn't care how the binaries got there. Just so long as they
> are present on the compute node.
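Sharing the binaries over NFS, as discussed here, might look like the following sketch. The export network and options are illustrative, not from the thread; `pmd02` is the head node named in the error output above:

```shell
# On the head node (e.g. pmd02), add the shared directory to /etc/exports:
#   /home  192.168.0.0/24(ro,no_root_squash)
# then re-export all entries:
exportfs -ra

# On the new node, mount the share at the same path, so the binary's
# absolute path is identical on every machine:
mount -t nfs pmd02:/home /home
```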
>
>
> Ahsan
>
>
> On Fri, Mar 22, 2013 at 3:33 PM, Reuti <reuti_at_[hidden]> wrote:
>
>> Am 22.03.2013 um 10:14 schrieb Syed Ahsan Ali:
>>
>> > I have a very basic question. If we want to run an mpirun job on two
>> > systems which are not part of a cluster, how can we make that possible?
>> > Can a host be specified to mpirun which is not a compute node, but
>> > rather a stand-alone system?
>>
>> Sure, the machines can be specified as arguments to `mpiexec`. But do you
>> want to run applications just between these two machines, or should they
>> participate in a larger parallel job with machines of the cluster? In the
>> latter case a direct network connection between the outside and the inside
>> of the cluster is necessary, by some kind of forwarding if these are
>> separate networks.
>>
>> Also, the paths to the started binaries may differ if the two machines do
>> not share the same /home with the cluster, and this needs to be honored.
>>
>> In case you are using a queuing system and want to route jobs to machines
>> outside of the set-up cluster, it is necessary to negotiate with the admin
>> to allow jobs to be scheduled there.
>>
>> -- Reuti
>>
>>
>> > Thanks
>> > Ahsan
>> > _______________________________________________
>> > users mailing list
>> > users_at_[hidden]
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>>
>
>
>
> --
> Syed Ahsan Ali Bokhari
> Electronic Engineer (EE)
>
> Research & Development Division
> Pakistan Meteorological Department H-8/4, Islamabad.
> Phone # off +92518358714
> Cell # +923155145014
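Reuti's point above, specifying standalone machines directly to `mpiexec`, can be sketched as follows. Host names, slot counts, and the application path are placeholders, not taken from the thread:

```shell
# Hypothetical hostfile listing the two standalone machines and how many
# processes (slots) each should run:
cat > myhosts <<'EOF'
nodeA slots=4
nodeB slots=4
EOF

# Launch 8 ranks across the two machines; the binary must exist at the
# same path on both (e.g. via a shared NFS /home).
mpiexec -np 8 --hostfile myhosts /path/to/app
```

This requires passwordless SSH from the launching machine to each listed host, and a network route between them if they sit on separate networks.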