1 - How do I check which BTLs are available? Something like "ompi_info | grep
-i btl"? If so, here's the list:
> MCA btl: ofud (MCA v2.0, API v2.0, Component v1.6.3)
> MCA btl: openib (MCA v2.0, API v2.0, Component v1.6.3)
> MCA btl: self (MCA v2.0, API v2.0, Component v1.6.3)
> MCA btl: sm (MCA v2.0, API v2.0, Component v1.6.3)
> MCA btl: tcp (MCA v2.0, API v2.0, Component v1.6.3)
2 - The IP interfaces on all nodes are:
- em1 - Ethernet - IP in the 192.168.216.0/22 range
- ib0 - IPoIB (only on IB-enabled nodes) - IP in the 192.168.212.0/22 range
- lo - loopback - 127.0.0.1/8
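
As a rough illustration of item 1, the component names can be pulled out of the "ompi_info | grep -i btl" listing with a few lines of Python. This is only a sketch: the sample text is the listing above, and the helper name btl_names is made up for this example, not part of Open MPI.

```python
import re

# Sample text copied from the listing in item 1 (quote markers kept as-is).
sample = """\
> MCA btl: ofud (MCA v2.0, API v2.0, Component v1.6.3)
> MCA btl: openib (MCA v2.0, API v2.0, Component v1.6.3)
> MCA btl: self (MCA v2.0, API v2.0, Component v1.6.3)
> MCA btl: sm (MCA v2.0, API v2.0, Component v1.6.3)
> MCA btl: tcp (MCA v2.0, API v2.0, Component v1.6.3)
"""


def btl_names(text):
    """Return the component name from each 'MCA btl: <name> (...)' line.

    Hypothetical helper for this example only.
    """
    return re.findall(r"MCA btl:\s+(\S+)", text)


names = btl_names(sample)
print(names)
```

With openib removed from that list (the "^openib" case), tcp, sm and self are still present, and so is ofud.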
And I think that Jeff is absolutely right. This syntax did work:
> mpirun --mca btl ^openib --mca btl_tcp_if_exclude 192.168.212.0/22,127.0.0.1/8 ./osu_bw
And this one too, which is basically equivalent in this case:
> mpirun --mca btl ^openib --mca btl_tcp_if_exclude ib0,lo ./osu_bw
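
To see why the two forms come out the same on these nodes, here is a small Python sketch of the subnet arithmetic. It only models which interface addresses fall inside the excluded CIDR ranges; it is not Open MPI's actual matching code. The host addresses (the ".10" parts) and the function name excluded_interfaces are assumptions made up for the example; only the subnets come from item 2.

```python
import ipaddress

# Hypothetical addresses for one IB-enabled node. Only the subnets are
# taken from the interface list above; the host parts are invented.
interfaces = {
    "em1": "192.168.216.10/22",
    "ib0": "192.168.212.10/22",
    "lo": "127.0.0.1/8",
}


def excluded_interfaces(interfaces, exclude_cidrs):
    """Return the sorted names of interfaces whose address lies in any excluded range."""
    # strict=False accepts a host address such as 127.0.0.1/8 and
    # normalizes it to the containing network (127.0.0.0/8).
    nets = [ipaddress.ip_network(c, strict=False) for c in exclude_cidrs]
    hits = []
    for name, addr in interfaces.items():
        ip = ipaddress.ip_interface(addr).ip
        if any(ip in net for net in nets):
            hits.append(name)
    return sorted(hits)


result = excluded_interfaces(interfaces, ["192.168.212.0/22", "127.0.0.1/8"])
print(result)  # ['ib0', 'lo']
```

Under those assumptions the CIDR list selects exactly ib0 and lo, which is the same set the second command names directly, and em1 is left alone.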
It is interesting to me, though, that I need to explicitly exclude
lo/127.0.0.1 in this case. When I'm on an Ethernet-only node and just
run the plain "mpirun ./appname", I don't have to exclude anything, and
it figures out that it should use em1 and not lo.
Fulton Supercomputing Lab
Brigham Young University
On 09/20/2013 10:31 AM, Jeff Squyres (jsquyres) wrote:
> On Sep 20, 2013, at 12:27 PM, Lloyd Brown <lloyd_brown_at_[hidden]> wrote:
>> Interesting. I was taking the approach of "only exclude what you're
>> certain you don't want" (the native IB and TCP/IPoIB stuff) since I
>> wasn't confident enough in my knowledge of the OpenMPI internals, to
>> know what I should explicitly include.
>> However, taking Jeff's suggestion, this does seem to work, and gives me
>> the expected Ethernet performance:
>> "mpirun --mca btl tcp,sm,self --mca btl_tcp_if_include em1 ./osu_bw"
>> So, in short, I'm still not sure why my exclude syntax doesn't work.
> Check two things:
> 1. What BTLs are available? Is there some other BTL that may be used instead of openib?
> 2. (this one is more likely) What IP interfaces are available on all nodes? The most obvious guess here is that you didn't exclude 127.0.0.1/8, and OMPI found this interface on all nodes, and therefore assumed that it was routable/usable on all nodes. Hence, one quick experiment might be to try your exclude syntax again, but *also* exclude 127.0.0.1/8.