Open MPI User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-10-09 16:05:30

If you do not have IB hardware, you might want to permanently disable
the IB support. You can do this by setting an MCA parameter or
simply removing the $prefix/lib/openmpi/mca_btl_openib.* files. This
will suppress the warning that you're seeing.
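
A minimal sketch of the MCA-parameter route, assuming the relevant parameter is `btl` and that a leading `^` excludes the named component. The per-user file path shown is the usual location, but verify it against your installation; the `mpirun` line is left as a comment and its process count is only a placeholder.

```shell
# Persistently exclude the openib BTL for this user.
# Assumption: Open MPI reads per-user MCA settings from this file.
conf_dir="${HOME}/.openmpi"
conf_file="${conf_dir}/mca-params.conf"
mkdir -p "$conf_dir"

# Append the setting only if it is not already present.
grep -qxF 'btl = ^openib' "$conf_file" 2>/dev/null \
    || printf '%s\n' 'btl = ^openib' >> "$conf_file"

# One-off alternatives (not executed here):
#   mpirun --mca btl ^openib -np 4 ./repdig_mpi
#   export OMPI_MCA_btl=^openib
```

Either form should keep the openib component from being opened at all, so the "unable to find any HCAs" message is never produced.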

As for your problem with MPI_SEND, do you know that your program is
correct? I.e., it's a little odd that you're failing directly in
sendSeeds, not in an MPI function. Are you getting a core dump that
you can examine, or can you attach a debugger to see where exactly it
is failing?
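
One way to start, sketched under assumptions: signal 8 with si_code FPE_INTDIV normally indicates an integer divide fault (typically a division by zero), and the trace gives the faulting address inside sendSeeds. The binary name and address below are copied from the quoted trace; the `./` location of the binary and the availability of `addr2line` are assumptions, and the output is only useful if the program was built with `-g`.

```shell
# Binary name and faulting address as reported in the stack trace.
bin=./repdig_mpi
addr=0x40cc2d

# Allow core files in this shell; a Torque job script would need the same line.
ulimit -c unlimited 2>/dev/null || echo "could not raise the core file limit"

if [ -x "$bin" ] && command -v addr2line >/dev/null 2>&1; then
    # Map the faulting address to a function name and source line.
    addr2line -f -e "$bin" "$addr"
    resolved=yes
else
    echo "skipping: $bin or addr2line is not available here"
    resolved=no
fi

# With a core file in hand (its name varies by system):
#   gdb ./repdig_mpi <corefile>    then type: bt
```

If the reported line contains an integer division or modulo, printing the divisor under both launch methods would show whether it differs when the job is started through Torque.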

On Oct 4, 2007, at 8:36 PM, Jim Kusznir wrote:

> Hi all:
> I'm having trouble getting torque/maui working with OpenMPI.
> Currently, I am getting hard failures when an MPI_Send is called.
> When run without qsub (no torque/maui), the MPI job runs fine, so
> it's something that qsub/torque/maui is doing (I think). Here's the
> error:
> libibverbs: Fatal: couldn't open sysfs class 'infiniband_verbs'.
> --------------------------------------------------------------------------
> [0,1,0]: OpenIB on host localhost was unable to find any HCAs.
> Another transport will be used instead, although this may result in
> lower performance.
> --------------------------------------------------------------------------
> Signal:8 info.si_errno:0(Success) si_code:1(FPE_INTDIV)
> Failing at addr:0x40cc2d
> [0] func:/usr/lib64/openmpi/ [0x3ecfb21dc5]
> [1] func:/lib64/tls/ [0x3ed040c4f0]
> [2] func:repdig_mpi(sendSeeds+0x3d) [0x40cc2d]
> [3] func:repdig_mpi(main+0x3b6) [0x40c026]
> [4] func:/lib64/tls/ [0x3ecfd1c3fb]
> [5] func:repdig_mpi [0x4030ea]
> *** End of error message ***
> I don't really know where to begin looking. I know in the stack trace
> the actual problem is occurring in #2 (sendSeeds), but that is a basic
> MPI_Send(), and works when not using torque.
> My system (installed from Rocks 4.3) does not have infiniband; I think
> I just figured out how to disable it; in any case, the same warning
> shows up when not running it through torque, and the job runs
> successfully.
> Thoughts/suggestions?
> Thanks!
> --Jim

Jeff Squyres
Cisco Systems