Subject: [OMPI users] IMB-MPI broadcast test stalls for large core counts: debug ideas?
From: Rahul Nabar (rpnabar_at_[hidden])
Date: 2010-08-19 22:03:24


My Intel IMB-MPI tests stall, but only in very specific cases: larger
packet sizes + large core counts. This only happens for the bcast,
gather, and exchange tests, and only at the larger core counts (~256
cores). Other tests like pingpong and sendrecv run fine even with
larger core counts.

e.g., this bcast test hangs consistently at the 524288-byte packet
size when invoked on 256 cores. The same test runs fine on 128 cores.

NP=256; mpirun -np $NP --host [32_HOSTS_8_core_each] -mca btl openib,sm,self \
    /mpitests/imb/src/IMB-MPI1 -npmin $NP bcast

       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.02         0.02         0.02
            1          130        26.94        27.59        27.25
            2          130        26.44        27.09        26.77
            4          130        75.98        81.07        76.75
            8          130        28.41        29.06        28.74
           16          130        28.70        29.39        29.03
           32          130        28.48        29.15        28.85
           64          130        30.10        30.86        30.48
          128          130        31.62        32.41        32.01
          256          130        31.08        31.72        31.42
          512          130        31.79        32.58        32.13
         1024          130        33.22        34.06        33.65
         2048          130        66.21        67.61        67.21
         4096          130        79.14        80.86        80.37
         8192          130       103.38       105.21       104.70
        16384          130       160.82       163.67       162.97
        32768          130       516.11       541.75       533.46
        65536          130      1044.09      1063.63      1052.88
       131072          130      1740.09      1750.12      1746.78
       262144          130      3587.23      3598.52      3594.52
       524288           80      4000.99      6669.65      5737.78
The test stalled at this point for at least 5 minutes, at which point I killed it.
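
One variant I still plan to try, to rule out the openib/iWARP path:
the same run over the plain TCP BTL. A sketch, with the same
placeholder host string as above:

    NP=256; mpirun -np $NP --host [32_HOSTS_8_core_each] -mca btl tcp,sm,self \
        /mpitests/imb/src/IMB-MPI1 -npmin $NP bcast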

I did more extensive testing for various combinations of test type
and core count (see below). I know exactly when the tests fail, but I
still cannot see a trend in the data. Any pointers or further
debugging ideas? I do have padb installed and have collected core
dumps, if that would help. One example below:

http://dl.dropbox.com/u/118481/padb.log.new.new.txt
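
For completeness, the traces come from a padb invocation along these
lines (flags as I understand them from the padb docs, so treat this
as a sketch rather than verbatim):

    padb --all --stack-trace --tree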

System Details:
Intel Nehalem 2.2 GHz
10GigE Chelsio cards and a Cisco Nexus switch, using the OFED drivers
CentOS 5.4
Open MPI: 1.4.1 / Open RTE: 1.4.1 / OPAL: 1.4.1
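
One debug idea I'm considering myself: pinning the tuned collective
component to a fixed bcast algorithm, in case the algorithm switch
around the 512 KB message size is what triggers the hang. A sketch
(the available algorithm numbers would come from ompi_info, which I
haven't dug through yet):

    # list the tuned-collective parameters and algorithm choices
    ompi_info --param coll tuned

    # then force one bcast algorithm, e.g. algorithm 1:
    mpirun -np $NP --host [32_HOSTS_8_core_each] \
        -mca coll_tuned_use_dynamic_rules 1 -mca coll_tuned_bcast_algorithm 1 \
        /mpitests/imb/src/IMB-MPI1 -npmin $NP bcast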

------------------------------------------------------------------
bcast:
    NP256 hangs
    NP128 OK

Note: "bcast" mostly hangs at:

       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
       524288           80      2682.61      4408.94      3880.68
------------------------------------------------------------------
sendrecv:
    NP256 OK
------------------------------------------------------------------
gather:
    NP256 hangs
    NP128 hangs
    NP64 hangs
    NP32 OK

Note: "gather" always hangs at the following line of the test:
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
[snip]
         4096         1000       525.80       527.69       526.79
------------------------------------------------------------------
exchange:
    NP256 hangs
    NP128 OK

Note: "exchange" always hangs at:

       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
         8192         1000       109.65       110.79       110.37       282.08
------------------------------------------------------------------

Note: I kept the --host string the same (all 32 servers) and just
changed NPMIN, in case this matters for how the procs are mapped out.
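
To double-check that mapping, I can dump it with mpirun's
--display-map option (assuming 1.4.1 supports it; a sketch):

    mpirun -np $NP --host [32_HOSTS_8_core_each] --display-map \
        -mca btl openib,sm,self /mpitests/imb/src/IMB-MPI1 -npmin $NP bcast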