
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] IMB-MPI broadcast test stalls for large core counts: debug ideas?
From: Randolph Pullen (randolph_pullen_at_[hidden])
Date: 2010-08-22 22:57:25


It's a long shot, but could it be related to the total data volume?
i.e. 524288 * 80 = 41943040 bytes active in the cluster.

Can you exceed this 41943040-byte data volume with a smaller message repeated more often, or a larger one repeated less often?
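To make the suggestion concrete, here is a minimal sketch of the arithmetic. It takes "data volume" to mean message size times repetitions, as in the calculation above, and works out how many repetitions each message size needs to go past that figure. The function name and the list of sizes are only illustrative.

```python
# Reference volume from the run that stalled: message size times repetitions.
STALL_SIZE = 524288
STALL_REPS = 80
REFERENCE_VOLUME = STALL_SIZE * STALL_REPS  # 41943040 bytes


def reps_to_exceed(message_size, volume=REFERENCE_VOLUME):
    """Smallest repetition count whose total volume is strictly above `volume`."""
    return volume // message_size + 1


if __name__ == "__main__":
    # Illustrative sizes only; pick whatever sizes the benchmark can run.
    for size in (65536, 131072, 262144, 524288, 1048576):
        reps = reps_to_exceed(size)
        print(f"{size:>8} bytes x {reps:>4} reps = {size * reps} bytes")
```

If a smaller message with proportionally more repetitions also stalls near the same total, that would point towards volume rather than message size.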

--- On Fri, 20/8/10, Rahul Nabar <rpnabar_at_[hidden]> wrote:

From: Rahul Nabar <rpnabar_at_[hidden]>
Subject: [OMPI users] IMB-MPI broadcast test stalls for large core counts: debug ideas?
To: "Open MPI Users" <users_at_[hidden]>
Received: Friday, 20 August, 2010, 12:03 PM

My Intel IMB-MPI tests stall, but only in very specific cases: larger
packet sizes combined with large core counts. It only happens for the
bcast, gather and exchange tests, and only at the larger core counts
(~256 cores). Other tests like pingpong and sendrecv run fine even with
larger core counts.

For example, this bcast test hangs consistently at the 524288-byte packet
size when invoked on 256 cores. The same test runs fine on 128 cores.

NP=256;mpirun  -np $NP --host [ 32_HOSTS_8_core_each]  -mca btl
openib,sm,self    /mpitests/imb/src/IMB-MPI1 -npmin $NP  bcast

       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.02         0.02         0.02
            1          130        26.94        27.59        27.25
            2          130        26.44        27.09        26.77
            4          130        75.98        81.07        76.75
            8          130        28.41        29.06        28.74
           16          130        28.70        29.39        29.03
           32          130        28.48        29.15        28.85
           64          130        30.10        30.86        30.48
          128          130        31.62        32.41        32.01
          256          130        31.08        31.72        31.42
          512          130        31.79        32.58        32.13
         1024          130        33.22        34.06        33.65
         2048          130        66.21        67.61        67.21
         4096          130        79.14        80.86        80.37
         8192          130       103.38       105.21       104.70
        16384          130       160.82       163.67       162.97
        32768          130       516.11       541.75       533.46
        65536          130      1044.09      1063.63      1052.88
       131072          130      1740.09      1750.12      1746.78
       262144          130      3587.23      3598.52      3594.52
       524288           80      4000.99      6669.65      5737.78
The test stalled at this point for at least 5 minutes, after which I killed it.
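When comparing many runs like this, it can help to pull the last printed message size out of each log automatically. The following is a rough sketch that assumes the output has the column layout shown above (a byte count, a repetition count, then one or more decimal timing columns); the function names are hypothetical and not part of IMB.

```python
import re

# One result row: a byte count, a repetition count, then decimal timing columns.
ROW = re.compile(r"^\s*(\d+)\s+(\d+)((?:\s+\d+\.\d+)+)\s*$")


def parse_rows(text):
    """Return (bytes, repetitions, [timings]) for each result row in `text`."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            timings = [float(t) for t in match.group(3).split()]
            rows.append((int(match.group(1)), int(match.group(2)), timings))
    return rows


def last_completed_size(text):
    """Message size on the last printed row, or None if no row was printed."""
    rows = parse_rows(text)
    return rows[-1][0] if rows else None


# Two rows copied from the output above, used as sample input.
SAMPLE = """\
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
       262144          130      3587.23      3598.52      3594.52
       524288           80      4000.99      6669.65      5737.78
"""

if __name__ == "__main__":
    print(last_completed_size(SAMPLE))
```

Header lines and free-text lines do not match the row pattern, so they are skipped.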

I did more extensive testing for various combinations of test type and
core count (see below). I know exactly when the tests fail, but I
still cannot see a trend in this data. Any pointers or further debug
ideas? I have padb installed and have collected core dumps, if that
would help. One example is below:

http://dl.dropbox.com/u/118481/padb.log.new.new.txt

System Details:
Intel Nehalem 2.2 GHz
10Gig Ethernet Chelsio Cards and Cisco Nexus Switch. Using the OFED drivers.
CentOS 5.4
Open MPI: 1.4.1 / Open RTE: 1.4.1 / OPAL: 1.4.1

------------------------------------------------------------------
bcast:
    NP256    hangs
    NP128    OK

Note: "bcast" mostly hangs at:

       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
       524288           80      2682.61      4408.94      3880.68
------------------------------------------------------------------
sendrecv:
    NP256    OK
------------------------------------------------------------------
gather:
    NP256    hangs
    NP128    hangs
    NP64    hangs
    NP32    OK

Note: "gather" always hangs at the following line of the test:
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
[snip]
         4096         1000       525.80       527.69       526.79
------------------------------------------------------------------
exchange:
    NP256    hangs
    NP128    OK

Note: "exchange" always hangs at:

#bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
8192         1000       109.65       110.79       110.37       282.08
------------------------------------------------------------------
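One way to look for a trend in the matrix above is to put the hang points side by side. The sketch below multiplies the message size on the last printed line by the process count. Treating that product as a proxy for the data involved in one call is purely an assumption made for comparison; the real traffic depends on the algorithm each benchmark and collective uses.

```python
# (benchmark, process count, message size on the last printed line),
# copied from the hang reports above.
HANGS = [
    ("bcast", 256, 524288),
    ("gather", 256, 4096),
    ("gather", 128, 4096),
    ("gather", 64, 4096),
    ("exchange", 256, 8192),
]


def size_times_np(hangs):
    """Map (benchmark, np) to message size multiplied by process count."""
    return {(name, np): size * np for name, np, size in hangs}


if __name__ == "__main__":
    products = size_times_np(HANGS)
    # Print smallest product first so any common threshold would stand out.
    for (name, np), product in sorted(products.items(), key=lambda kv: kv[1]):
        print(f"{name:<8} NP{np:<3} size x NP = {product}")
```

On this proxy the hang points range from 262144 (gather, NP64) to 134217728 (bcast, NP256), so it does not by itself suggest a single common threshold.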

Note: I kept the --host string the same (all 32 servers) and changed
only the NPMIN, in case this matters for how the processes are
mapped out.