I am currently profiling a simple case where I replace multiple send/receive
(S/R) calls with Allgather calls, and it would _seem_ that the simple S/R calls
are faster. Now, *before* I come to any conclusion on this, one of the
pieces I am missing is more detail on how, if, and when the tuned coll MCA
component is selected. In other words, can I assume the tuned versions are used by
default? I skimmed through the well-documented source code, but before I
can even start to analyze the impact of the replacement (on a small cluster),
I need to know how and when the tuned coll MCA component is selected.