Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Strange OpenMPI messages
From: Tohiko Looka (tohiko.looka_at_[hidden])
Date: 2012-02-15 12:53:15


Gustavo,

I will definitely try to compile OpenMPI myself and see if the problem
persists.
Regarding your note on homogeneous nodes: I tried to do that as much as
possible,
but I had no control over two nodes, and each of them had a different setup.
As Jeff suggested, using .bashrc seems to solve the issue.

Thanks

On Wed, Feb 15, 2012 at 6:52 PM, Gustavo Correa <gus_at_[hidden]> wrote:

> Hi Tohiko
>
> If you compiled Open MPI on a computer with IB hardware,
> then copied the installation tree to another machine,
> or if you installed from an RPM or other package generated on a
> machine with IB, your OpenMPI will have IB enabled, I think, even if the
> machine where it is running does not have IB.
>
> This is a matter of taste, but here is what I think,
> regarding a previous question you sent.
> I would rather compile Open MPI from source, on the machine[s] where it
> will run, and install it with the same path on all machines {or in a single
> NFS shared directory}, to make things simpler.
> I would use the most homogeneous set of machines possible, to avoid too
> many headaches.
> I.e. use the least common denominator, so to speak.
> Say, everything x86_64, all with Ethernet only [or all with IB + Ethernet,
> but you don't seem to have IB, at least not on all machines].
>
> I hope this helps,
> Gus Correa
>
> On Feb 15, 2012, at 1:27 AM, Tohiko Looka wrote:
>
> > Mm... This is really strange.
> > I don't have that service, and there is no ib* output in 'ifconfig -a' or
> > 'InfiniBand' in 'lspci',
> > which makes me believe that I don't have such a network. I also checked
> > on an identical computer on the same network, with the same results.
> >
> > What's strange is that these messages didn't use to show up, and they
> > don't show up on that identical computer; only on mine. Even though both
> > computers have the same hardware and Open MPI version and are on the same
> > network.
> >
> > I guess I can safely ignore these warnings and run on Ethernet, but it
> > would be nice to know what happened there, in case anybody has an idea.
> >
> > Thank you,
> >
> > On Wed, Feb 15, 2012 at 12:52 AM, Gustavo Correa <gus_at_[hidden]> wrote:
> > Hi Tohiko
> >
> > OpenFabrics network a.k.a. InfiniBand a.k.a. IB.
> > To check if the compute nodes have IB interfaces, try:
> >
> > lspci [and search the output for InfiniBand]
> >
> > To see if the IB interface is configured try:
> >
> > ifconfig -a [and search the output for ib0, ib1, or similar]
> >
> > To check if the OFED module is up try:
> >
> > 'service openibd status'
> >
> >
> > As an alternative, you could also try to run your program over Ethernet,
> > avoiding InfiniBand,
> > in case you don't have IB or if somehow it is broken.
> > It is slower than InfiniBand, though.
> >
> > Try something like this:
> >
> > mpiexec -mca btl tcp,sm,self -np 4 ./my_mpi_program
> >
> > I hope this helps,
> > Gus Correa
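
The first two checks listed above (searching the `lspci` output for InfiniBand and the `ifconfig -a` output for ib0, ib1, or similar) can be wrapped in a small script. This is only a sketch: the function names `has_ib_pci`, `has_ib_iface`, and `ib_summary` are made up for illustration and are not part of Open MPI or OFED. The functions take the command output as text, so the parsing can be tried on a machine without IB hardware.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Open MPI or OFED). The first two read
# command output on stdin and report the result through their exit status.

# Succeeds if `lspci` output mentions an InfiniBand device (case-insensitive).
has_ib_pci() {
    grep -qi 'infiniband'
}

# Succeeds if `ifconfig -a` output has an interface named ib0, ib1, ...
has_ib_iface() {
    grep -Eq '^ib[0-9]+'
}

# Prints a one-line summary built from the two checks above.
#   $1 = lspci output, $2 = ifconfig -a output
ib_summary() {
    if printf '%s\n' "$1" | has_ib_pci; then
        if printf '%s\n' "$2" | has_ib_iface; then
            echo "IB hardware present, interface configured"
        else
            echo "IB hardware present, no ib* interface"
        fi
    else
        echo "no IB hardware found"
    fi
}
```

On a real node it would be called as `ib_summary "$(lspci)" "$(ifconfig -a)"`. The third check, `service openibd status`, is left out because it depends on how OFED was installed on the machine.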
> >
> > On Feb 14, 2012, at 4:02 PM, Tohiko Looka wrote:
> >
> > > Sorry for the noob question, but how do I check my network type and
> > > whether the OFED service is running correctly or not? And how do I run it?
> > >
> > > Thank you,
> > >
> > > On Tue, Feb 14, 2012 at 2:14 PM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> > > Do you have an OpenFabrics-based network? (e.g., InfiniBand or iWarp)
> > >
> > > If so, this error message usually means that OFED is either installed
> > > incorrectly, or is not running properly (e.g., its services didn't get
> > > started properly upon boot).
> > >
> > > If you don't have an OpenFabrics-based network, then it usually means
> > > that you have OpenFabrics services running when you really shouldn't
> > > (because you don't have any OpenFabrics-based devices).
> > >
> > >
> > > On Feb 14, 2012, at 4:48 AM, Tohiko Looka wrote:
> > >
> > > > Greetings,
> > > >
> > > > Until today I was running my openmpi applications with no
> > > > errors/warnings
> > > > Today I restarted my computer (possibly after an automatic openmpi
> > > > update) and got these warnings when
> > > > running my program
> > > > [tohiko_at_kw12614 1d]$ mpirun -x LD_LIBRARY_PATH -hostfile hosts -np 10 hello
> > > > librdmacm: couldn't read ABI version.
> > > > librdmacm: assuming: 4
> > > > CMA: unable to get RDMA device list
> > > >
> > > > --------------------------------------------------------------------------
> > > > [[21652,1],0]: A high-performance Open MPI point-to-point messaging module
> > > > was unable to find any relevant network interfaces:
> > > >
> > > > Module: OpenFabrics (openib)
> > > > Host: kw12614
> > > >
> > > > Another transport will be used instead, although this may result in
> > > > lower performance.
> > > > --------------------------------------------------------------------------
> > > > [kw12614:03195] 10 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
> > > > [kw12614:03195] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
> > > >
> > > >
> > > > Is this normal? And how come it happened now?
> > > > -- Tohiko
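
The warning above comes from the openib module finding no usable interface. One way to keep Open MPI from trying that module on Ethernet-only nodes is to list the transports explicitly, as in the `mpiexec -mca btl tcp,sm,self` example elsewhere in this thread. The sketch below assumes a hypothetical helper named `btl_flags`; it is not an Open MPI command, only a wrapper that prints the option string.

```shell
#!/bin/sh
# Hypothetical helper: print the BTL options to pass to mpiexec/mpirun.
#   $1 = "yes" if every node has working InfiniBand, anything else otherwise.
btl_flags() {
    if [ "$1" = "yes" ]; then
        # Print nothing: let Open MPI choose its transports itself.
        :
    else
        # Same component list as the mpiexec example in this thread:
        # TCP between nodes, shared memory within a node, self for loopback.
        echo "-mca btl tcp,sm,self"
    fi
}
```

A possible invocation, reusing the command line from the message above, is `mpirun $(btl_flags no) -mca orte_base_help_aggregate 0 -x LD_LIBRARY_PATH -hostfile hosts -np 10 hello`. The `orte_base_help_aggregate` parameter is the one named in the warning text; setting it to 0 should print every help message instead of the aggregated "10 more processes" line.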
> > > > _______________________________________________
> > > > users mailing list
> > > > users_at_[hidden]
> > > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> > >
> > >
> > > --
> > > Jeff Squyres
> > > jsquyres_at_[hidden]
> > > For corporate legal information go to:
> > > http://www.cisco.com/web/about/doing_business/legal/cri/
> > >
> > >
> > > _______________________________________________
> > > users mailing list
> > > users_at_[hidden]
> > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> > >
> > > _______________________________________________
> > > users mailing list
> > > users_at_[hidden]
> > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>