Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Debugging Runtime/Ethernet Problems
From: Lloyd Brown (lloyd_brown_at_[hidden])
Date: 2013-09-20 13:00:33

1 - How do I check which BTLs are available? Something like
"ompi_info | grep -i btl"? If so, here's the list:

> MCA btl: ofud (MCA v2.0, API v2.0, Component v1.6.3)
> MCA btl: openib (MCA v2.0, API v2.0, Component v1.6.3)
> MCA btl: self (MCA v2.0, API v2.0, Component v1.6.3)
> MCA btl: sm (MCA v2.0, API v2.0, Component v1.6.3)
> MCA btl: tcp (MCA v2.0, API v2.0, Component v1.6.3)

2 - The IP interfaces on all nodes are:
- em1 - Ethernet - IP in the range
- ib0 - IPoIB (only on IB-enabled nodes) - IP in the range
- lo - loopback -

And I think that Jeff is absolutely right. This syntax did work:

> mpirun --mca btl ^openib --mca btl_tcp_if_exclude, ./osu_bw

And this one too, which is basically equivalent in this case:

> mpirun --mca btl ^openib --mca btl_tcp_if_exclude ib0,lo ./osu_bw

It is interesting to me, though, that I need to explicitly exclude
lo/ in this case. When I'm on an Ethernet-only node and just run the
plain "mpirun ./appname", I don't have to exclude anything, and it
figures out to use em1 rather than lo.
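
A likely explanation, offered as a sketch rather than a definitive answer: btl_tcp_if_exclude has a built-in default that already covers loopback, and passing the parameter on the command line replaces that default instead of adding to it. Under that assumption, any custom exclude list has to name lo again. The helper below (build_exclude is a hypothetical name, not part of Open MPI) builds such a list and always keeps lo in it; the mpirun line is only echoed, not run.

```shell
#!/bin/sh
# Hypothetical helper: build a value for btl_tcp_if_exclude that always
# contains "lo", on the assumption that setting the parameter overrides
# the default exclude list instead of extending it.
build_exclude() {
    list="lo"
    for ifc in "$@"; do
        case ",$list," in
            *",$ifc,"*) ;;               # already in the list, skip it
            *) list="$list,$ifc" ;;      # append a new interface name
        esac
    done
    printf '%s\n' "$list"
}

# Show the resulting command line without launching anything.
echo mpirun --mca btl '^openib' --mca btl_tcp_if_exclude "$(build_exclude ib0)" ./osu_bw
```

On the 1.6 series listed above, "ompi_info --param btl tcp" should show the current value of btl_tcp_if_exclude, which is one way to check this assumption on a given install.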

Lloyd Brown
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University

On 09/20/2013 10:31 AM, Jeff Squyres (jsquyres) wrote:
> On Sep 20, 2013, at 12:27 PM, Lloyd Brown <lloyd_brown_at_[hidden]> wrote:
>> Interesting. I was taking the approach of "only exclude what you're
>> certain you don't want" (the native IB and TCP/IPoIB stuff) since I
>> wasn't confident enough in my knowledge of the OpenMPI internals, to
>> know what I should explicitly include.
>> However, taking Jeff's suggestion, this does seem to work, and gives me
>> the expected Ethernet performance:
>> "mpirun --mca btl tcp,sm,self --mca btl_tcp_if_include em1 ./osu_bw"
>> So, in short, I'm still not sure why my exclude syntax doesn't work.
> Check two things:
> 1. What BTLs are available? Is there some other BTL that may be used instead of openib?
> 2. (this one is more likely) What IP interfaces are available on all nodes? The most obvious guess here is that you didn't exclude, and OMPI found this interface on all nodes, and therefore assumed that it was routable/usable on all nodes. Hence, one quick experiment might be to try your exclude syntax again, but *also* exclude