"Iliev, Hristo" <Iliev_at_[hidden]> writes:
> Hi Dave,
> Is it MPI_ALLTOALL or MPI_ALLTOALLV that runs slower?
Well, the output says MPI_ALLTOALL, but this prompted me to check, and
it turns out that it's lumping both together.
> If it is the latter,
> the reason could be that the default implementation of MPI_ALLTOALLV in
> 1.6.5 is different from that in 1.5.4. To switch back to the previous one,
> --mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_alltoallv_algorithm 1
Yes, that does it.
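For anyone else hitting this, a full invocation with those parameters
might look like the following (the application name and rank count here
are just placeholders for your own job):

```shell
# Force the "tuned" coll component to use dynamic rules, then select
# algorithm 1 (the pre-1.6.5 default) for MPI_ALLTOALLV.
mpirun -np 64 \
    --mca coll_tuned_use_dynamic_rules 1 \
    --mca coll_tuned_alltoallv_algorithm 1 \
    ./my_app
```

The same parameters can also be set in $HOME/.openmpi/mca-params.conf
if you want them applied to every run without editing job scripts.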
Can someone comment generally on the situations in which the new default
algorithm wins or loses?
I suspect where I'm seeing it lose (on dual-socket Sandy Bridge, QDR IB)
is representative of a lot of chemistry code, which tends to be a major
consumer of academic HPC cycles. If so, this probably merits a note in
the release notes or FAQ.
> The logic that selects the MPI_ALLTOALL implementation is the same in both
> versions, although the pairwise implementation in 1.6.5 is a bit different.
> The difference should have negligible effects though.
> Note that coll_tuned_use_dynamic_rules has to be enabled in order for the
> MCA parameters that allow you to select the algorithms to be registered.
Ah, thanks. This now seems familiar, but still obscure.
> Therefore you have to use ompi_info as follows:
> ompi_info --mca coll_tuned_use_dynamic_rules 1 --param coll tuned
> Hope that helps!