Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Runtime error only on one node.
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-03-05 20:44:41


On Mar 5, 2009, at 7:05 PM, Shinta Bonnefoy wrote:

> Thanks, the option --mca btl ^openib works fine!
>
> Half of the cluster has InfiniBand/OpenFabrics (node49 to node96)
> and the other half (node01 to node48) doesn't.
>

Aaaaahhhhh... this explains things. I suspect we have not tested the
"some nodes have OpenFabrics (OF), some do not" code paths well; I'm
guessing we're hitting a corner case during shutdown.

> I just wanted to make Open MPI run over Ethernet/TCP first.
>
> I will try to make it run using OpenFabrics, but I guess I need to
> recompile another package to do so?
>

No. Open MPI hides the dependencies on networking libraries such as
OF in its plugins. So you don't need to recompile your application;
you just run with or without the ^openib switch.

> If I mix nodes that have OpenFabrics with nodes that don't, I should
> use the option "--mca btl ^openib", right?
>

For now, yes. We should fix this, though. The fix won't be in 1.3.1,
but possibly in 1.3.2.

> And if I use exclusively similar nodes (either all non-OpenFabrics or
> all OpenFabrics), I don't have to use the option anymore.
>

Correct. OMPI will then automatically choose to use the openib BTL.
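
As a rough sketch of the two cases above (a job spanning mixed nodes
versus a job on uniform nodes), the snippet below only prints the
mpirun command line each case would use. The helper name
build_mpirun_cmd, the application ./my_mpi_app, and the process count
are made up for illustration; only the "--mca btl ^openib" option comes
from this thread.

```shell
#!/bin/sh
# Sketch only: pick mpirun arguments depending on whether the job spans
# nodes with and without OpenFabrics hardware.
# Hypothetical names (not from the thread): build_mpirun_cmd,
# ./my_mpi_app, and the process count of 4.

build_mpirun_cmd() {
    cluster_kind=$1   # "mixed" or "uniform"
    nprocs=$2         # number of MPI processes
    app=$3            # path to the MPI executable

    if [ "$cluster_kind" = "mixed" ]; then
        # Exclude the openib BTL so no node tries to use OpenFabrics.
        echo "mpirun --mca btl ^openib -np $nprocs $app"
    else
        # Uniform nodes: let Open MPI choose its BTLs automatically.
        echo "mpirun -np $nprocs $app"
    fi
}

# Print (do not run) the command line for each case.
build_mpirun_cmd mixed 4 ./my_mpi_app
build_mpirun_cmd uniform 4 ./my_mpi_app
```

In both cases the application binary is the same; only the mpirun
arguments change. If you type the command by hand, some shells treat
"^" specially, so quoting it as '^openib' is the safe choice.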

> But over OpenFabrics, will Open MPI automatically use the InfiniBand
> hardware?
>

Yes.

I'm guessing that there's only a problem when you have a job that
spans nodes with and without OF hardware, but all with the OF software
stack. I'll file a bug about this and see what we can do.

-- 
Jeff Squyres
Cisco Systems