Open MPI User's Mailing List Archives

From: Carsten Kutzner (ckutzne_at_[hidden])
Date: 2005-12-19 07:26:13


Hello,

I am desperately trying to get better all-to-all performance on Gbit
Ethernet (flow control is enabled). I have been experimenting with
several all-to-all schemes and have been able to reduce congestion by
communicating in an ordered fashion.

For example, the simplest scheme looks like this:

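   /* sendbuf, recvbuf: float arrays with ncpu blocks each
      (sendcount resp. recvcount floats per block);
      cpuid = rank in comm, ncpu = number of ranks in comm */
   MPI_Status status;
   int i, dest, source;

   for (i=0; i<ncpu; i++)
   {
     /* step i: send block "dest" to the rank i places ahead ... */
     dest = (cpuid + i) % ncpu;
     /* ... and receive block "source" from the rank i places behind */
     source = (ncpu + cpuid - i) % ncpu;

     MPI_Sendrecv(sendbuf+dest  *sendcount, sendcount, sendtype, dest,   0,
                  recvbuf+source*recvcount, recvcount, recvtype, source, 0,
                  comm, &status);
   }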
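The ordering of this loop can be checked without MPI. The plain-C sketch below (the helpers `dest_of`, `source_of` and `schedule_is_consistent` are hypothetical names introduced only for illustration) mirrors the index arithmetic of the loop and verifies that in every step i the pairing is consistent: the rank that r sends to computes r as its source, and every rank is the destination of exactly one sender. So each rank sends one message and receives one message per step.

```c
#include <stdlib.h>

/* Hypothetical helpers mirroring the index arithmetic of the loop above. */
static int dest_of(int cpuid, int i, int ncpu)   { return (cpuid + i) % ncpu; }
static int source_of(int cpuid, int i, int ncpu) { return (ncpu + cpuid - i) % ncpu; }

/* Returns 1 if, for every step i in [0, ncpu):
   (a) the destination d of rank r computes r as its source, and
   (b) every rank is the destination of exactly one sender.
   Returns 0 otherwise (or if the scratch allocation fails). */
int schedule_is_consistent(int ncpu)
{
    int *hits = malloc((size_t)ncpu * sizeof *hits);
    if (hits == NULL) return 0;

    for (int i = 0; i < ncpu; i++) {
        for (int r = 0; r < ncpu; r++) hits[r] = 0;

        for (int r = 0; r < ncpu; r++) {
            int d = dest_of(r, i, ncpu);
            if (source_of(d, i, ncpu) != r) { free(hits); return 0; }
            hits[d]++;
        }
        for (int r = 0; r < ncpu; r++)
            if (hits[r] != 1) { free(hits); return 0; }
    }
    free(hits);
    return 1;
}
```

Note that step i = 0 is a self-exchange (dest = source = cpuid), so only ncpu - 1 of the steps actually move data across the network.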

For sendcount=32768 and sendtype=float (i.e. 131072 bytes per message) the
time such an all-to-all takes is (average over 100 runs, standard deviation
in parentheses):

SENDRECV ALLTOALL on 16 PROCS
     32768 floats took 0.036783 (0.008798) seconds. Min: 0.034175 max: 0.123684
SENDRECV ALLTOALL on 32 PROCS
     32768 floats took 0.082687 (0.035920) seconds. Min: 0.071915 max: 0.285299

For comparison:
MPI_Alltoall on 16 PROCS
     32768 floats took 0.057936 (0.073605) seconds. Min: 0.027218 max: 0.275988
MPI_Alltoall on 32 PROCS
     32768 floats took 0.137835 (0.100580) seconds. Min: 0.055607 max: 0.412144
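To put these timings in perspective, they can be converted into an effective per-process send rate. Assuming each process sends one 131072-byte block to each of the other ncpu - 1 processes (the i = 0 step of the sendrecv loop is a local self-exchange), the sketch below divides that volume by an average time. The function name is hypothetical; the inputs are the averages reported above, and the result is in MB/s with 1 MB = 10^6 bytes.

```c
/* Hypothetical helper: effective per-process send rate in MB/s (10^6 bytes/s),
   counting only the ncpu - 1 blocks that actually leave the process. */
double effective_mb_per_s(int ncpu, long block_bytes, double seconds)
{
    return (double)(ncpu - 1) * (double)block_bytes / seconds / 1e6;
}
```

With the averages above this gives roughly 53 MB/s (16 procs) and 49 MB/s (32 procs) for the sendrecv scheme, and roughly 34 MB/s and 29 MB/s for MPI_Alltoall, compared with the nominal 125 MB/s (1 Gbit/s) of Gigabit Ethernet. Because the MPI_Alltoall runs have large standard deviations, these average-based rates understate its best cases.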

On average, the sendrecv all-to-all performs better for these message sizes,
but on 32 CPUs (on 32 nodes) there is still congestion. When I try to
separate the communication phases by putting an MPI_Barrier(MPI_COMM_WORLD)
after the MPI_Sendrecv, the congestion gets even worse:

SENDRECV ALLTOALL on 32 PROCS, with Barrier:
     32768 floats took 0.179162 (0.136885) seconds. Min: 0.091028 max: 0.729049

How can a barrier lead to more congestion???

Thanks in advance for helpful comments,
   Carsten

---------------------------------------------------
Dr. Carsten Kutzner
Max Planck Institute for Biophysical Chemistry
Theoretical and Computational Biophysics Department
Am Fassberg 11
37077 Goettingen, Germany
Tel. +49-551-2012313, Fax: +49-551-2012302
eMail ckutzne_at_[hidden]
http://www.gwdg.de/~ckutzne