Peter Kjellström <cap_at_[hidden]> writes:
> Are you sure you launched it correctly and that you have (re)built OpenMPI
> against your Redhat-5 ib stack?
Yes. I had to rebuild because I'd omitted openib when we only needed
psm. As I said, I did exactly the same thing successfully with PMB
(initially because I wanted to try an old binary, and PMB was lying
around).
>> Your MPI job is now going to abort; sorry.
> ...
>> [lvgig116:07931] 19 more processes have sent help message
>> help-mca-bml-r2.txt / unreachable proc [lvgig116:07931] Set MCA parameter
>
> Seems to me that OpenMPI gave up because it didn't succeed in initializing any
> inter-node btl/mtl.
Sure, but why won't it load the btl under IMB when it will under PMB
(and other codes like XHPL), and how do I get any diagnostics?
My boss has just stumbled upon a reference while looking for something
else. It looks as if it's an OFED bug entry, but I can't find an
operational version of an OFED tracker or any other reference to the bug
than (the equivalent of)
http://lists.openfabrics.org/pipermail/ewg/2010-March/014983.html :
1976 maj jsquyres at cisco.com errors running IMB over openmpi-1.4.1
I guess Jeff will enlighten me if/when he spots this. (Thanks in
advance, obviously.)