On 2/10/2012 11:50 AM, Jeff Squyres wrote:
I've done the latter and hit the same problem. There were
instructions somewhere on the web that I found that told one how to
This is an open question to OMPI developers...
It looks like RHEL (and maybe others?) adds the "virbr0" IP interface when Xen is activated. This IP interface is only used to communicate with the local Xen instance(s); it is not used to communicate over the real network.
In a case that I saw, the interface is created, set to "up", and is given an IP address in the 192.168.1.x range. This was done by default -- all the user had done was either say "yes, I want Xen enabled", or he didn't say he wanted it *disabled* (I'm not sure which).
What happens to that value if you then set btl_tcp_if_exclude to
some value on the mpirun command line? This brings me to
something that has annoyed me for a while. It would be nice to
have a conf file listing interface names to exclude that is not
interpreted as a btl_tcp_if_exclude option. For example, certain
Sun machines (a long time ago) had interfaces that went to the
diagnostic processor and caused an issue similar to the virbr0 one.
So we started delivering a conf file that set btl_tcp_if_exclude,
but that precluded anyone from setting btl_tcp_if_include. If
we had a file in which one could specify the set of interfaces to
use or exclude, while still allowing the user to operate on the
result of that set, that would be nice.
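
A minimal sketch of the two-stage idea described above, in Python rather than OMPI's C, purely to illustrate the intended semantics. The function name resolve_interfaces and its parameters are hypothetical and do not correspond to anything in Open MPI. The assumption is that a site-level exclusion list is applied first, and that the user's include or exclude filter then operates only on what remains.

```python
def resolve_interfaces(available, site_excluded, user_include=None, user_exclude=None):
    """Return the interfaces eligible for use (hypothetical resolution order).

    available     -- interface names found on the host, in discovery order
    site_excluded -- names from a site-level file (assumed; not an OMPI feature)
    user_include  -- optional user filter: keep only these names
    user_exclude  -- optional user filter: drop these names
    """
    # Keep the existing rule that include and exclude cannot both be given.
    if user_include and user_exclude:
        raise ValueError("include and exclude are mutually exclusive")

    # Stage 1: the site-level list removes known-bad interfaces up front.
    blocked = set(site_excluded)
    baseline = [name for name in available if name not in blocked]

    # Stage 2: the user's filter operates on the result of stage 1.
    if user_include:
        wanted = set(user_include)
        return [name for name in baseline if name in wanted]
    if user_exclude:
        unwanted = set(user_exclude)
        return [name for name in baseline if name not in unwanted]
    return baseline
```

With this ordering, a site file that lists virbr0 would no longer stop a user from giving an include list, because the site list is not itself treated as an exclude option.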
This causes a problem if you have Xen enabled on multiple machines in an OMPI job. OMPI will see the 192.168.1.x address and see that it's "up", so it'll add it to the eligible subnets that can be used. When OMPI sees that its peer processes also have 192.168.1.x, it'll try to use that network for OOB/BTL traffic -- which will fail, because these are local-only interfaces.
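
To make the failure mode concrete, here is a simplified model of subnet matching between two hosts, written in Python with the standard ipaddress module. It is not OMPI's actual reachability code. The function name shared_subnets, the /24 prefix lengths, and the eth0 addresses are assumptions for illustration; only the 192.168.1.x range on virbr0 comes from the report above.

```python
import ipaddress


def shared_subnets(local_ifaces, peer_ifaces):
    """List (local_name, peer_name, network) for interfaces on the same subnet.

    Both arguments map interface name -> "address/prefix" strings.
    """
    matches = []
    for local_name, local_addr in local_ifaces.items():
        local_net = ipaddress.ip_interface(local_addr).network
        for peer_name, peer_addr in peer_ifaces.items():
            # A subnet match alone says nothing about real connectivity.
            if ipaddress.ip_interface(peer_addr).network == local_net:
                matches.append((local_name, peer_name, str(local_net)))
    return matches


# Two hosts, each with a real NIC and a local-only virbr0 (addresses assumed).
host_a = {"eth0": "10.0.0.11/24", "virbr0": "192.168.1.1/24"}
host_b = {"eth0": "10.0.0.12/24", "virbr0": "192.168.1.1/24"}

for match in shared_subnets(host_a, host_b):
    print(match)
```

In this model the virbr0 pair is reported as sharing a subnet just like the eth0 pair, even though the two bridges are not connected to each other.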
Should we add "virbr0" to the default value for [btl|oob]_tcp_if_exclude?
Or is there another way to detect that an interface is local-only and should not be used for OOB/BTL communication?
See this post on the user's list:
Terry D. Dontje | Principal
Engineering | +1.781.442.2631
95 Network Drive, Burlington, MA 01803