Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Strange OpenMPI messages
From: TERRY DONTJE (terry.dontje_at_[hidden])
Date: 2012-02-15 05:46:21


Do you get any interfaces shown when you run "ibstat" on any of the
nodes your job is spawned on?
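
If ibstat is not installed, a rough alternative is to look at what the kernel
exposes. The script below is only a sketch under stated assumptions: a Linux
node where RDMA devices are listed under /sys/class/infiniband. The function
name has_ib_devices and the SYSFS_IB variable are made up for illustration,
and the "^openib" exclusion applies to Open MPI releases that still ship the
openib BTL.

```shell
#!/bin/sh
# Sketch: report whether this node exposes any RDMA (InfiniBand/iWARP) devices.
# Assumption: Linux, where such devices appear under /sys/class/infiniband.
# SYSFS_IB is a hypothetical override so the check can be pointed elsewhere.
SYSFS_IB="${SYSFS_IB:-/sys/class/infiniband}"

has_ib_devices() {
    # True only if the directory exists and contains at least one entry.
    [ -d "$SYSFS_IB" ] && [ -n "$(ls -A "$SYSFS_IB" 2>/dev/null)" ]
}

if has_ib_devices; then
    echo "RDMA devices found: $(ls "$SYSFS_IB" | tr '\n' ' ')"
    # ibstat may not be installed; run it only if it is on the PATH.
    if command -v ibstat >/dev/null 2>&1; then
        ibstat
    fi
else
    echo "No RDMA devices found on $(hostname)"
    # Excluding the openib BTL avoids the "no relevant network interfaces" warning.
    echo "To skip the openib BTL: mpirun --mca btl ^openib -np 4 ./my_mpi_program"
fi
```

Run it on each host listed in the hostfile. If every node reports no devices,
the warning is harmless and the openib module can simply be excluded.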

--td

On 2/15/2012 1:27 AM, Tohiko Looka wrote:
> Mm... This is really strange
> I don't have that service, and there is no ib* output in 'ifconfig -a'
> or 'InfiniBand' in 'lspci',
> which makes me believe that I don't have such a network. I also
> checked an identical computer on the same network, with the same
> results.
>
> What's strange is that these messages didn't use to show up, and they
> don't show up on that identical computer, only on mine, even though
> both computers have the same hardware and Open MPI version and are on
> the same network.
>
> I guess I can safely ignore these warnings and run on Ethernet, but it
> would be nice to know what happened there, in case anybody has an idea.
>
> Thank you,
>
> On Wed, Feb 15, 2012 at 12:52 AM, Gustavo Correa <gus_at_[hidden]> wrote:
>
> Hi Tohiko
>
> An OpenFabrics network is most commonly InfiniBand, a.k.a. IB.
> To check if the compute nodes have IB interfaces, try:
>
> lspci [and search the output for InfiniBand]
>
> To see if the IB interface is configured try:
>
> ifconfig -a [and search the output for ib0, ib1, or similar]
>
> To check if the OFED module is up try:
>
> 'service openibd status'
>
>
> As an alternative, you could also try to run your program over
> Ethernet, avoiding InfiniBand, in case you don't have IB or it is
> somehow broken.
> Ethernet is slower than InfiniBand, though.
>
> Try something like this:
>
> mpiexec -mca btl tcp,sm,self -np 4 ./my_mpi_program
>
> I hope this helps,
> Gus Correa
>
> On Feb 14, 2012, at 4:02 PM, Tohiko Looka wrote:
>
> > Sorry for the noob question, but how do I check my network type
> > and whether the OFED service is running correctly? And how do I run it?
> >
> > Thank you,
> >
> > On Tue, Feb 14, 2012 at 2:14 PM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> > Do you have an OpenFabrics-based network? (e.g., InfiniBand or iWarp)
> >
> > If so, this error message usually means that OFED is either
> > installed incorrectly, or is not running properly (e.g., its
> > services didn't get started properly upon boot).
> >
> > If you don't have an OpenFabrics-based network, then it usually
> > means that you have OpenFabrics services running when you really
> > shouldn't (because you don't have any OpenFabrics-based devices).
> >
> >
> > On Feb 14, 2012, at 4:48 AM, Tohiko Looka wrote:
> >
> > > Greetings,
> > >
> > > Until today I was running my Open MPI applications with no
> > > errors/warnings.
> > > Today I restarted my computer (possibly after an automatic
> > > Open MPI update) and got these warnings when running my program:
> > > [tohiko_at_kw12614 1d]$ mpirun -x LD_LIBRARY_PATH -hostfile hosts -np 10 hello
> > > librdmacm: couldn't read ABI version.
> > > librdmacm: assuming: 4
> > > CMA: unable to get RDMA device list
> > > --------------------------------------------------------------------------
> > > [[21652,1],0]: A high-performance Open MPI point-to-point messaging module
> > > was unable to find any relevant network interfaces:
> > >
> > > Module: OpenFabrics (openib)
> > > Host: kw12614
> > >
> > > Another transport will be used instead, although this may result in
> > > lower performance.
> > > --------------------------------------------------------------------------
> > > [kw12614:03195] 10 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
> > > [kw12614:03195] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
> > >
> > >
> > > Is this normal? And how come it happened now?
> > > -- Tohiko
> > > _______________________________________________
> > > users mailing list
> > > users_at_[hidden]
> > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
> > --
> > Jeff Squyres
> > jsquyres_at_[hidden]
> > For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/
> >

-- 
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.dontje_at_[hidden]