Somewhere between r16924 and r16950 on the trunk,
things broke for our Big Red machine. The problem
occurs when no specific transport is requested on the
mpirun command line and both MX and TCP are available.
It was caused by r16942, which adjusted the
exclusivity of TCP and SCTP. And guess what... the
exclusivity of the MX BTL is now lower than that of the TCP BTL.
Glancing at the other BTLs' btl_exclusivity settings in the
*_component.c or *_mca.c files shows me that portals
would also be affected.

  mca_btl_tcp_module.super.btl_exclusivity = MCA_BTL_EXCLUSIVITY_LOW + 100;
  mca_btl_portals_module.super.btl_exclusivity = 60;
  mca_btl_mx_module.super.btl_exclusivity = 50;

I think we should fix this by changing portals and mx to use
MCA_BTL_EXCLUSIVITY_DEFAULT instead of 60 and 50, respectively,
since ofud, openib, and gm already use MCA_BTL_EXCLUSIVITY_DEFAULT,
which happens to be 1024.
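
To make the ordering concrete, here is a rough standalone sketch (not
the actual BML/BTL selection code) that just picks the entry with the
highest exclusivity. It assumes that the BTL with the highest
exclusivity wins when several can reach a peer, and that
MCA_BTL_EXCLUSIVITY_LOW is 0; only the value 1024 for the default is
stated above. All sketch_* names are made up for illustration.

```c
#include <stddef.h>

/* Stand-ins for the real macros. DEFAULT = 1024 is taken from the
 * post; LOW = 0 is an assumption made for this sketch. */
#define SKETCH_EXCLUSIVITY_LOW     0
#define SKETCH_EXCLUSIVITY_DEFAULT 1024

/* Hypothetical, simplified view of a BTL: a name and its exclusivity. */
struct sketch_btl {
    const char *name;
    int exclusivity;
};

/* Return the entry with the highest exclusivity, or NULL if n == 0.
 * On a tie the earlier entry is kept. */
static const struct sketch_btl *
sketch_pick(const struct sketch_btl *btls, size_t n)
{
    const struct sketch_btl *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (best == NULL || btls[i].exclusivity > best->exclusivity) {
            best = &btls[i];
        }
    }
    return best;
}

/* A machine with both TCP and MX, using the values currently on the trunk. */
static const struct sketch_btl sketch_current[] = {
    { "tcp", SKETCH_EXCLUSIVITY_LOW + 100 },
    { "mx",  50 },
};

/* The same machine with MX moved to the default exclusivity. */
static const struct sketch_btl sketch_proposed[] = {
    { "tcp", SKETCH_EXCLUSIVITY_LOW + 100 },
    { "mx",  SKETCH_EXCLUSIVITY_DEFAULT },
};
```

Under those assumptions, sketch_pick(sketch_current, 2) returns the tcp
entry (100 beats 50), while sketch_pick(sketch_proposed, 2) returns the
mx entry (1024 beats 100). The same reasoning would apply to portals
at 60.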
I'll apply a fix later today if I have time...
P.S. - For completeness, here is a sampling of MTT results
showing the problem:
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
tmattox_at_[hidden] || timattox_at_[hidden]
I'm a bright... http://www.the-brights.net/