In 1.3 and some of the latest 1.2.x versions, tuned is the default
component for collectives. However, the tuned collectives currently in
the trunk are optimized for high-performance networks (such as IB or
MX), and they do not deliver the best performance on slower devices.
To experiment with the different implementations of allgather, set the
following MCA parameters, either in $(HOME)/.openmpi/mca-params.conf or
on the command line:
1) coll_tuned_use_dynamic_rules to 1, to enable fine-grained selection
of the algorithms;
2) coll_tuned_allgather_algorithm to a value between 0 and 6 (once the
dynamic rules are enabled, 'ompi_info --param coll tuned' lists which
algorithm each value corresponds to).
This allows you to select a specific algorithm for allgather. You can
tune it further by adjusting the fanout (for tree topologies) and the
segment size (for pipelined algorithms).
On Oct 3, 2008, at 8:48 AM, Eric Thibodeau wrote:
> Hello all,
> I am currently profiling a simple case where I replace multiple S/R
> calls with Allgather calls, and it would _seem_ the simple S/R calls
> are faster. Now, *before* I come to any conclusion on this, one of
> the pieces I am missing is more detail on how/if/when the tuned coll
> MCA is selected. In other words, can I assume the tuned versions are
> used by default? I skimmed through the well-documented source code,
> but before I can even start to analyze the replacement's impact (in a
> small cluster), I need to know how and when the tuned coll MCA is
> used/selected.
> users mailing list