
Open MPI User's Mailing List Archives



From: Tony Ladd (ladd_at_[hidden])
Date: 2006-06-30 08:35:58


Thanks for the reply; I realize you guys must be really busy with the recent
release of Open MPI. I tried 1.1 and I don't get error messages any more, but
the code now hangs with no error or exit, so I am not sure if this is the same
issue or something else. I am enclosing my source code. I compiled with icc
and linked against an icc-compiled version of openmpi-1.1.

My program is a set of network benchmarks (a crude kind of netpipe) that
checks typical message passing patterns in my application codes.
Typical output is:

 32 CPU's: sync call time = 1003.0

                            time                                 rate (Mbytes/s)                     bandwidth (MBits/s)
 loop buffers  size     XC       XE       GS       MS        XC       XE       GS       MS        XC       XE       GS       MS
    1      64 16384  2.48e-02 1.99e-02 1.21e+00 3.88e-02  4.23e+01 5.28e+01 8.65e-01 2.70e+01  1.08e+04 1.35e+04 4.43e+02 1.38e+04
    2      64 16384  2.17e-02 2.09e-02 1.21e+00 4.10e-02  4.82e+01 5.02e+01 8.65e-01 2.56e+01  1.23e+04 1.29e+04 4.43e+02 1.31e+04
    3      64 16384  2.20e-02 1.99e-02 1.01e+00 3.95e-02  4.77e+01 5.27e+01 1.04e+00 2.65e+01  1.22e+04 1.35e+04 5.33e+02 1.36e+04
    4      64 16384  2.16e-02 1.96e-02 1.25e+00 4.00e-02  4.85e+01 5.36e+01 8.37e-01 2.62e+01  1.24e+04 1.37e+04 4.28e+02 1.34e+04
    5      64 16384  2.25e-02 2.00e-02 1.25e+00 4.07e-02  4.66e+01 5.24e+01 8.39e-01 2.57e+01  1.19e+04 1.34e+04 4.30e+02 1.32e+04
    6      64 16384  2.19e-02 1.99e-02 1.29e+00 4.05e-02  4.79e+01 5.28e+01 8.14e-01 2.59e+01  1.23e+04 1.35e+04 4.17e+02 1.33e+04
    7      64 16384  2.19e-02 2.06e-02 1.25e+00 4.03e-02  4.79e+01 5.09e+01 8.38e-01 2.60e+01  1.23e+04 1.30e+04 4.29e+02 1.33e+04
    8      64 16384  2.24e-02 2.06e-02 1.25e+00 4.01e-02  4.69e+01 5.09e+01 8.39e-01 2.62e+01  1.20e+04 1.30e+04 4.30e+02 1.34e+04
    9      64 16384  4.29e-01 2.01e-02 6.35e-01 3.98e-02  2.45e+00 5.22e+01 1.65e+00 2.64e+01  6.26e+02 1.34e+04 8.46e+02 1.35e+04
   10      64 16384  2.16e-02 2.06e-02 8.87e-01 4.00e-02  4.85e+01 5.09e+01 1.18e+00 2.62e+01  1.24e+04 1.30e+04 6.05e+02 1.34e+04

Time is the total for all 64 buffers. Rate is one way across one link (# of
1) XC is a bidirectional ring exchange: each processor sends to the right
and receives from the left.
2) XE is an edge exchange: pairs of nodes exchange data, with each one
sending and receiving.
3) GS is MPI_AllReduce.
4) MS is my version of MPI_AllReduce. It splits the vector into Np blocks
(Np is the number of processors); each processor then acts as a head node
for one block. This uses the full bandwidth all the time, unlike AllReduce,
which thins out as it gets to the top of the binary tree. On a 64-node
Infiniband system MS is about 5X faster than GS; in theory it would be 6X,
i.e. log_2(64).
Here it is 25X; I'm not sure why it is so much. But MS seems to be the cause
of the hangups with messages > 64K. I can run the other benchmarks OK, but
this one seems to hang for large messages. I think the problem is at least
partly due to the switch. All MS is doing is point-to-point communications,
but unfortunately it sometimes requires high bandwidth between ASICs. At
first it exchanges data between near neighbors in MPI_COMM_WORLD, but it
must progressively span wider gaps between nodes as it goes up the various
binary trees; after a while this requires extensive traffic between ASICs.
This seems to be a problem on both my HP 2724 and the Extreme Networks
Summit400t-48. I am currently working with Extreme to try to resolve the
switch issue. As I say, the code ran great on Infiniband, but I think those
switches have hardware flow control. Finally, I checked the code again under
LAM and it ran OK. Slow, but no hangs.
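For reference, the MS scheme described in item 4 above (split the vector into
Np blocks, let rank p reduce block p, then redistribute the reduced blocks) is
effectively a reduce-scatter followed by an allgather. Here is a small serial
Python simulation of just the data movement; it is purely illustrative — the
function name and structure are my own, not taken from the attached src.tgz:

```python
def block_allreduce(vectors):
    """Simulate the MS block AllReduce on one machine.

    vectors: one equal-length list of numbers per simulated rank.
    Returns the per-rank results, which should all equal the global sum.
    """
    np_ = len(vectors)           # number of simulated ranks (Np)
    n = len(vectors[0])
    assert n % np_ == 0, "vector length must divide evenly into Np blocks"
    blk = n // np_

    # Reduce-scatter phase: rank p acts as head node for block p and
    # sums that block's entries across all ranks.
    heads = []
    for p in range(np_):
        lo, hi = p * blk, (p + 1) * blk
        heads.append([sum(v[i] for v in vectors) for i in range(lo, hi)])

    # Allgather phase: every rank collects every reduced block, so each
    # rank ends up with the full reduced vector.
    result = [x for block in heads for x in block]
    return [list(result) for _ in range(np_)]
```

Because every rank heads one block, both phases keep all Np links busy at
once, instead of idling more and more links toward the root of a binary tree.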

To run the code, compile and type:
mpirun -np 32 -machinefile hosts src/netbench 8
The 8 means 2^8 bytes (ie 256K). This was enough to hang every time on my

You can also edit the header file (header.h). MAX_LOOPS is how many times it
runs each test (currently 10); NUM_BUF is the number of buffers in each test
(must be more than the number of processors); SYNC defines the global sync
frequency (every SYNC buffers); NUM_SYNC is the number of sequential barrier
calls it uses to determine the mean barrier call time. You can also switch
the various tests on and off, which can be useful for debugging.


Tony Ladd
Professor, Chemical Engineering
University of Florida
PO Box 116005
Gainesville, FL 32611-6005

Tel: 352-392-6509
FAX: 352-392-9513
Email: tladd_at_[hidden]

  • application/x-compressed attachment: src.tgz