This is an open question to OMPI developers...
It looks like RHEL (and maybe others?) adds the "virbr0" IP interface when Xen is activated. This IP interface is only used to communicate with the local Xen instance(s); it is not used to communicate over the real network.
In a case that I saw, the interface was created, set to "up", and given an IP address in the 192.168.1.x range. This was done by default -- all the user had done was either say "yes, I want Xen enabled" or fail to say that he wanted it *disabled* (I'm not sure which).
This causes a problem if you have Xen enabled on multiple machines in an OMPI job. OMPI will see the 192.168.1.x address and see that it's "up", so it'll add it to the eligible subnets that can be used. When OMPI sees that its peer processes also have 192.168.1.x, it'll try to use that network for OOB/BTL traffic -- which will fail, because these are local-only interfaces.
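To make the failure mode concrete, here is a rough Python sketch of subnet matching by address alone. It is not OMPI's actual code; the function name `shared_subnets`, the interface tables, and the addresses are all made up for illustration. The point is only that two hosts with a Xen-created virbr0 look like they share a usable network.

```python
import ipaddress


def shared_subnets(local_ifaces, peer_ifaces):
    """Return the subnets that appear on both hosts, judged by address alone."""
    local_nets = {ipaddress.ip_interface(a).network for a in local_ifaces.values()}
    peer_nets = {ipaddress.ip_interface(a).network for a in peer_ifaces.values()}
    return local_nets & peer_nets


# Hypothetical interface tables for two Xen-enabled hosts (illustrative only).
host_a = {"eth0": "10.0.0.5/24", "virbr0": "192.168.1.1/24"}
host_b = {"eth0": "10.0.0.6/24", "virbr0": "192.168.1.1/24"}

# Both the real network and the local-only virbr0 subnet are reported as
# shared, even though virbr0 traffic never leaves either machine.
for net in sorted(shared_subnets(host_a, host_b), key=str):
    print(net)
```

Nothing in the address or the "up" flag distinguishes the virbr0 subnet from the real one, which is why the match looks valid to both sides.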
Should we add "virbr0" to the default value for [btl|oob]_tcp_if_exclude?
Or is there another way to detect that an interface is local-only and should not be used for OOB/BTL communication?
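One possible heuristic, sketched in Python rather than in OMPI's C: on Linux, a bridge device exposes a "bridge" directory under /sys/class/net/<name>/, so bridge interfaces such as virbr0 can be listed without hard-coding names. This is only a sketch under that assumption. The helper names `bridge_interfaces` and `exclude_value` are hypothetical, and the heuristic would be wrong for a bridge that carries real traffic (for example, one that enslaves a physical NIC).

```python
import os


def bridge_interfaces(sysfs_net="/sys/class/net"):
    """Return the names of interfaces that Linux reports as bridge devices."""
    if not os.path.isdir(sysfs_net):
        return []  # not Linux, or sysfs is not mounted
    return sorted(
        name
        for name in os.listdir(sysfs_net)
        # Bridge devices have a "bridge" subdirectory in sysfs.
        if os.path.isdir(os.path.join(sysfs_net, name, "bridge"))
    )


def exclude_value(always_excluded=("lo",), sysfs_net="/sys/class/net"):
    """Build a comma-separated interface list suitable for an if_exclude value."""
    names = list(always_excluded)
    names += [n for n in bridge_interfaces(sysfs_net) if n not in names]
    return ",".join(names)


if __name__ == "__main__":
    print(exclude_value())
```

The resulting string could be passed by hand, e.g. `mpirun --mca btl_tcp_if_exclude <value> --mca oob_tcp_if_exclude <value> ...`, as a workaround until a default is agreed on.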
See this post on the users list: