On Mar 5, 2009, at 7:05 PM, Shinta Bonnefoy wrote:
> Thanks, the option --mca btl ^openib works fine !
>
> Half of the cluster has Infiniband/OpenFabrics (from node49 to node96)
> and the other half (nodes from 01 to 48) doesn't.
>
Aaaaahhhhh... this explains things. I suspect we have not tested the
"some have OF, some do not" code paths well; I'm guessing we're
hitting a corner case during shutdown.
> I just wanted to make openmpi run over ethernet/tcp first.
>
> I will try to make it run using OpenFabrics but I guess I need to
> recompile another package to do so?
>
No. Open MPI hides its dependencies on networking libraries such as
OF inside its plugins, so you don't need to recompile your application;
you just run with or without the "--mca btl ^openib" switch.
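As a rough sketch of what that looks like in practice (hosts.txt,
./my_app, and the process count are placeholders, not taken from your
setup, and the MIXED variable is just an illustration):

```shell
# Placeholders (not from this thread): hosts.txt, ./my_app, and -np 16.
# Set MIXED=1 if the job spans nodes with and without OpenFabrics hardware.
MIXED=1

if [ "$MIXED" = "1" ]; then
    BTL_ARGS="--mca btl ^openib"   # exclude the openib plugin at run time
else
    BTL_ARGS=""                    # let Open MPI choose its BTLs itself
fi

# Same binary either way; only the mpirun arguments change.
CMD="mpirun $BTL_ARGS -np 16 --hostfile hosts.txt ./my_app"
echo "$CMD"
```

The snippet only prints the command line; run the printed line yourself
to actually launch the job.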
> If I mix some nodes with OpenFabrics and some others that don't have
> OpenFabrics, I should use the option "--mca btl ^openib", right?
>
For now, yes. We should fix this, though; the fix won't be in
1.3.1, but possibly in 1.3.2.
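If typing the switch on every run gets tedious, MCA parameters can
also be set outside the command line. A minimal sketch, assuming a
Bourne-style shell:

```shell
# Assumption: a Bourne-style shell (sh/bash).
# Equivalent to passing "--mca btl ^openib" on every mpirun command line.
export OMPI_MCA_btl='^openib'

# Alternative: a per-user parameter file,
# $HOME/.openmpi/mca-params.conf, containing the single line:
#   btl = ^openib
```

Remember to unset the variable (or remove the line from the file) for
jobs that run only on the OpenFabrics nodes, or openib stays excluded.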
> And if I use exclusively similar nodes (either only non-OpenFabrics or
> only OpenFabrics), I don't have to use the option anymore.
>
Correct. On the OpenFabrics nodes, OMPI will then automatically choose
to use the openib BTL.
> But over OpenFabrics, will openmpi automatically use the Infiniband
> hardware?
>
Yes.
I'm guessing that there's only a problem when you have a job that
spans nodes with and without OF hardware, but all with the OF software
stack. I'll file a bug about this and see what we can do.
--
Jeff Squyres
Cisco Systems