Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] bizarre failure with IMB/openib
From: Dave Love (d.love_at_[hidden])
Date: 2011-03-21 11:19:51


Peter Kjellström <cap_at_[hidden]> writes:

> Are you sure you launched it correctly and that you have (re)built OpenMPI
> against your Redhat-5 ib stack?

Yes. I had to rebuild because I'd omitted openib when we only needed
psm. As I said, I did exactly the same thing successfully with PMB
(initially because I wanted to try an old binary, and PMB was lying
around).

>> Your MPI job is now going to abort; sorry.
> ...
>> [lvgig116:07931] 19 more processes have sent help message
>> help-mca-bml-r2.txt / unreachable proc [lvgig116:07931] Set MCA parameter
>
> Seems to me that OpenMPI gave up because it didn't succeed in initializing any
> inter-node btl/mtl.

Sure, but why won't it load the btl under IMB when it will under PMB
(and other codes like XHPL), and how do I get any diagnostics?

My boss has just stumbled upon a reference while looking for something
else. It looks as if it's an OFED bug entry, but I can't find an
operational version of an OFED tracker, or any reference to the bug
other than (the equivalent of)
http://lists.openfabrics.org/pipermail/ewg/2010-March/014983.html :

  1976 maj jsquyres at cisco.com errors running IMB over openmpi-1.4.1

I guess Jeff will enlighten me if/when he spots this. (Thanks in
advance, obviously.)