Open MPI logo

Open MPI User's Mailing List Archives

  |   Home   |   Support   |   FAQ   |   all Open MPI User's mailing list

Subject: Re: [OMPI users] MPI_Allreduce()
From: George Bosilca (bosilca_at_[hidden])
Date: 2008-03-13 11:16:44

Collective communication was a hot topic for the last [at least one]
decade, and it is still today. Just minimizing the number of messages
is not generally the best approach. The collectives are sensitive to
the network characteristics and to the amount of data in a very
complex way. The best approach is a well balanced algorithms, where
the number of messages and the amount of data on each message is
computed based on the network properties. This paper can give you an
idea about this (

In terms of theoretical complexity, the best Allreduce algorithm I'm
aware about allow you to decrease the complexity to O(log(size)) +
O(count), where size is the size of the message and count is the
number of participants in the communicator. You can find a reference
to this algorithm here (

Even the Alltoall can be optimized, but unfortunately not as much as
the other collective. This collective is considered as the most
difficult to implement and to optimize, even on homogeneous
environments. Another interesting paper on alltoall is

Is there anything more complex than a alltoall ... Well you can take a
look in the same family of collectives communication patterns to


On Mar 13, 2008, at 9:29 AM, Brock Palen wrote:

> Yeah, I know what you mean about if you have to use a 'all to all'
> use MPI_Alltoall() don't roll your own.
> So on paper, alltoall at first glance appears to be: n*(n-1) -> n^2-
> n -> n^2 (for large n).
> Allreduce appears to be simplest, n point to points followed by a
> bcast(). Which can be simplified to gather + bcast.
> Last I knew MPI_Bcast() was log(n) and gather is (n). So for
> allreduce I get:
> n+log(n)
> I guess I am confused how to get alltoall() down from n^2.
> Thanks.
> Brock Palen
> Center for Advanced Computing
> brockp_at_[hidden]
> (734)936-1985
> On Mar 12, 2008, at 6:05 PM, Aurélien Bouteiller wrote:
>> If you can avoid them it is better to avoid them. However it is
>> always
>> better to use a MPI_Alltoall than coding your own all to all with
>> point to point, and in some algorithms you *need* to make a all to
>> all
>> communication. What you should understand by "avoid all to all" is
>> not
>> avoid MPI_alltoall, but choose a mathematic algorithm that does not
>> need all to all.
>> The algorithmic complexity of AllReduce is the same as AlltoAll.
>> Aurelien
>> Le 12 mars 08 à 17:01, Brock Palen a écrit :
>>> I have always been told that calls like MPI_Barrior()
>>> MPI_Allreduce()
>>> and MPI_Alltoall() should be avoided.
>>> I understand MPI_Alltoall() as it goes n*(n-1) sends and thus grows
>>> very very quickly. MPI_Barrior() is very latency sensitive and
>>> generally is not needed in most cases I have seen it used.
>>> But why MPI_Allreduce()?
>>> What other functions should generally be avoided?
>>> Sorry this is kinda off topic for the list :-)
>>> Brock Palen
>>> Center for Advanced Computing
>>> brockp_at_[hidden]
>>> (734)936-1985
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
> _______________________________________________
> users mailing list
> users_at_[hidden]

  • application/pkcs7-signature attachment: smime.p7s