Ashley Pittman wrote:
> On Sat, 2008-08-16 at 08:03 -0400, Jeff Squyres wrote:
>> - large all to all operations are very stressful on the network, even
>> if you have very low latency / high bandwidth networking such as DDR IB
>> - if you only have 1 IB HCA in a machine with 8 cores, the problem
>> becomes even more difficult because all 8 of your MPI processes will
>> be hammering the HCA with read and write requests; it's a simple I/O
>> resource contention issue
> That alone doesn't explain the sudden jump (drop) in performance.
>> - there are several different algorithms in Open MPI for performing
>> alltoall, but they were not tuned for ppn>4 (honestly, they were tuned
>> for ppn=1, but they still usually work "well enough" for ppn<=4). In
>> Open MPI v1.3, we introduce the "hierarch" collective module, which
>> should greatly help with ppn>4 kinds of scenarios for collectives
>> (including, at least to some degree, all to all)
> Is there a way to tell or influence which algorithm is used in the
> current case? Looking through the code I can see several but cannot see
> how to tune the thresholds.
The answer is "sort of." By default, Open MPI selects among the
algorithms using precompiled thresholds.
However, if you want to experiment with the different algorithms within
the tuned component, you can force Open MPI to use a specific one.
That algorithm is then used for every call to that collective.
For example, to tell it to use the "pairwise" alltoall algorithm, you
would do this:
> mpirun -np 2 --mca coll_tuned_use_dynamic_rules 1 \
>     --mca coll_tuned_alltoall_algorithm 2 a.out
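
If you want to compare the algorithms head to head, a simple sweep like
the one below works. This is only a sketch: "alltoall_bench" is a
hypothetical benchmark binary, and the range of valid algorithm numbers
(and which algorithm each number maps to) depends on your Open MPI
version, so check ompi_info first. As I recall, algorithm 0 means "use
the default decision logic."

```shell
#!/bin/sh
# Sketch: run a benchmark once per tuned alltoall algorithm.
# "./alltoall_bench" is a hypothetical MPI benchmark binary; the
# range 0-4 is an assumption -- query ompi_info for the real count.
for alg in 0 1 2 3 4; do
    echo "=== alltoall algorithm $alg ==="
    mpirun -np 16 \
        --mca coll_tuned_use_dynamic_rules 1 \
        --mca coll_tuned_alltoall_algorithm "$alg" \
        ./alltoall_bench
done
```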
To see the different algorithms, you can look through the code or try
to glean them from a call to ompi_info:
> ompi_info -all -mca coll_tuned_use_dynamic_rules 1 | grep alltoall
You can also supply a file that overrides the precompiled thresholds.
I have only lightly tested that feature, but it should work.
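
For reference, that file is passed in via the
coll_tuned_dynamic_rules_filename MCA parameter. The file format itself
is version-specific and only lightly documented, so the sketch below
only shows how to point Open MPI at it; "my_rules.conf" is a
hypothetical file you would write after checking the coll tuned
source for the expected layout.

```shell
# Sketch: point the tuned component at a custom decision-rules file.
# "./my_rules.conf" is a hypothetical rules file; its format is
# version-specific -- consult the coll tuned component for details.
mpirun -np 16 \
    --mca coll_tuned_use_dynamic_rules 1 \
    --mca coll_tuned_dynamic_rules_filename ./my_rules.conf \
    a.out
```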