"Iliev, Hristo" <Iliev_at_[hidden]> writes:
> Hi Dave,
> Is it MPI_ALLTOALL or MPI_ALLTOALLV that runs slower?
Well, the output says MPI_ALLTOALL, but this prompted me to check, and
it turns out that it's lumping both together.
> If it is the latter,
> the reason could be that the default implementation of MPI_ALLTOALLV in
> 1.6.5 is different from that in 1.5.4. To switch back to the previous one, use:
> --mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_alltoallv_algorithm 1
Yes, that does it.
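
For anyone wanting to reproduce this, a minimal sketch of how those two parameters fit into a full launch line. The rank count and the binary name (./my_app) are placeholders and not taken from this thread; only the two --mca settings come from the message above.

```shell
# Hypothetical job parameters: adjust the rank count and binary for your run.
NP=16
APP=./my_app

# MCA settings quoted above: enable dynamic rules, then select
# alltoallv algorithm 1 (the previous default, per the message above).
MCA_ARGS="--mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_alltoallv_algorithm 1"

# Assemble and print the command; run it by hand once it looks right.
CMD="mpirun $MCA_ARGS -np $NP $APP"
echo "$CMD"
```

Timing the same job with and without $MCA_ARGS should show whether the alltoallv change accounts for the difference on a given machine.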
Can someone comment generally on the situations in which the new default
I suspect where I'm seeing it lose (on dual-socket Sandy Bridge, QDR IB)
is representative of a lot of chemistry code which tends to be a/the
major consumer of academic HPC cycles. If so, this probably merits an
> The logic that selects the MPI_ALLTOALL implementation is the same in both
> versions, although the pairwise implementation in 1.6.5 is a bit different.
> The difference should have negligible effects though.
> Note that coll_tuned_use_dynamic_rules has to be enabled in order for the MCA
> parameters that allow you to select the algorithms to be registered.
Ah, thanks. This now seems familiar, but still obscure.
> Therefore you have to use ompi_info as follows:
> ompi_info --mca coll_tuned_use_dynamic_rules 1 --param coll tuned
> Hope that helps!
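
As a possible alternative to repeating the flags, Open MPI also reads MCA parameters from environment variables named OMPI_MCA_<parameter>. The sketch below assumes that mechanism applies to these two parameters in the same way as the --mca flags, and that ompi_info picks up the environment setting; the grep filter is only a convenience and is not from the thread.

```shell
# Same settings as the --mca flags above, expressed as environment variables.
export OMPI_MCA_coll_tuned_use_dynamic_rules=1
export OMPI_MCA_coll_tuned_alltoallv_algorithm=1

# If ompi_info is on the PATH, list the tuned alltoallv parameters.
# With dynamic rules enabled, the algorithm-selection parameters should
# be registered and therefore appear in the output.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info --param coll tuned | grep -i alltoallv || true
else
    echo "ompi_info not found; skipping parameter listing"
fi
```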