On Oct 15, 2009, at 2:14 AM, Sangamesh B wrote:
> I've run ibpingpong tests. They are working fine.
Sorry for the delay in replying.
Good.
> Are there any additional tests available which will make sure that
> "there is no problem with IB software and Open MPI. The problem is
> with Application or IB hardware"?
George pointed out that using "--mca btl openib,self" will only allow
OMPI to use those two transports (InfiniBand and the process-loopback
"self" device). So you should be good there -- with those command line
options, it'll either run on IB or it will fail to run if the IB is not
working.
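For reference, here is a minimal sketch of the two usual ways to apply that restriction, assuming a bash-like shell. `./my_app` and the process count are placeholders, not taken from this thread. The environment-variable form relies on Open MPI reading MCA parameters from variables named `OMPI_MCA_<param>`.

```shell
# Restrict Open MPI to the openib (InfiniBand) and self (loopback) BTLs.
# ./my_app and -np 4 are placeholders for your own application and size.
#
# 1. Per run, on the mpirun command line:
#      mpirun --mca btl openib,self -np 4 ./my_app
#
# 2. For every mpirun launched from this shell: Open MPI also picks up
#    MCA parameters from environment variables named OMPI_MCA_<param>.
export OMPI_MCA_btl=openib,self
```

Either way, if the openib BTL cannot be used, the job should abort rather than silently fall back to TCP.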
Unfortunately, OMPI currently only has a negative acknowledgement when
you're *not* using high-performance networks -- it doesn't give you a
positive acknowledgement when it *is* using a high-performance network
(because this is the much more common case).
> Because we've faced some critical problems:
>
> http://www.open-mpi.org/community/lists/users/2009/10/10843.php
This one *appears* to be an application issue. But there was no
information provided beyond the initial posting, so it's impossible to
say.
> http://www.open-mpi.org/community/lists/users/2009/09/10700.php
Pasha had a good reply to this post:
http://www.open-mpi.org/community/lists/users/2009/09/10705.php
If he's right (and he usually is :-) ), then one of your IB ports went
from ACTIVE to DOWN during the run, potentially indicating bad
hardware (i.e., Open MPI simply reported the error -- it's possible,
even likely, that Open MPI didn't *cause* the error). Pasha suggested using
ibdiagnet to verify your fabric. Failing that, you might want to
contact your IB/cluster vendor for assistance with a layer-0
diagnostic of your IB fabric.
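ibdiagnet is the thorough fabric check. As a quicker first look on each node, the `ibstat` tool (from infiniband-diags) prints a `State:` line per port. The sketch below is a minimal illustration, assuming ibstat's usual text layout (`CA '<name>'`, then indented `Port N:` and `State:` lines); the function name `check_ib_ports` is hypothetical.

```shell
# check_ib_ports (hypothetical helper): read ibstat-style text on stdin,
# print every port whose State is not "Active", and return 1 if any
# such port was found (0 otherwise).
check_ib_ports() {
    awk '
        # "CA '\''mlx4_0'\''" header: remember the HCA name, minus quotes.
        /^CA / { ca = $2; gsub(/[^A-Za-z0-9_]/, "", ca) }
        # "Port 1:" line: remember the port number, minus the colon.
        /^[ \t]*Port [0-9]+:/ { port = $2; sub(/:/, "", port) }
        # "State: Active" line (the "Physical state:" line does not match).
        /^[ \t]*State:/ {
            if ($2 != "Active") { printf "%s port %s: %s\n", ca, port, $2; bad = 1 }
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

Usage: `ibstat | check_ib_ports` on each node. A port reported as `Down` (or stuck in `Initializing`) is worth raising with your vendor alongside the ibdiagnet results.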
Hope that helps!
--
Jeff Squyres
jsquyres_at_[hidden]