Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] IMB-MPI broadcast test stalls for large core counts: debug ideas?
From: Rahul Nabar (rpnabar_at_[hidden])
Date: 2010-08-25 12:26:45


On Thu, Aug 19, 2010 at 9:03 PM, Rahul Nabar <rpnabar_at_[hidden]> wrote:
> ------------------------------------------------------------------
> gather:
>    NP256    hangs
>    NP128    hangs
>    NP64    hangs
>    NP32    OK
>
> Note: "gather" always hangs at the following line of the test:
>       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
> [snip]
>         4096         1000       525.80       527.69       526.79
> ------------------------------------------------------------------

What I thought was a permanent "hang" in the NP64 "gather" test was,
in fact, an exceedingly long stall. After waiting more than 7
minutes, the test ran to completion. What is surprising is the _huge_
jump in times from the 4096-byte to the 8192-byte message size. It's
a step change from about 275 to 1380 usecs. Any ideas what could cause
this, and whether it could be related to the other "hangs" I am
seeing? We are using jumbo frames with an MTU of 9000, so that was one
thought I had for this transition.
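
To put a number on that step, here is a minimal Python sketch that parses IMB-style result rows and reports the ratio of t_avg between successive message sizes. The rows are copied from the NP64 gather output in this message; the helper names (parse_rows, step_ratios) are made up for illustration and are not part of IMB. The zero-byte row is skipped because no data is transferred there.

```python
# Rows copied from the NP64 gather output (subset around the jump).
SAMPLE = """\
         1024 1000 101.69 102.01 101.84
         2048 1000 146.94 147.41 147.18
         4096 1000 275.61 276.45 276.04
         8192 13 1380.54 1607.84 1522.64
        16384 13 1497.09 1749.46 1656.61
        32768 13 2055.61 2380.37 2259.50
"""


def parse_rows(text):
    """Return (bytes, reps, t_min, t_max, t_avg) tuples; skip non-data lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue
        try:
            nbytes, reps = int(parts[0]), int(parts[1])
            t_min, t_max, t_avg = (float(p) for p in parts[2:])
        except ValueError:
            continue  # header or comment line
        rows.append((nbytes, reps, t_min, t_max, t_avg))
    return rows


def step_ratios(rows):
    """Return (bytes, t_avg ratio vs previous size, previous reps, reps)."""
    out = []
    prev = None
    for row in rows:
        if row[0] == 0:
            continue  # zero-byte row carries no payload
        if prev is not None:
            out.append((row[0], row[4] / prev[4], prev[1], row[1]))
        prev = row
    return out


if __name__ == "__main__":
    for nbytes, ratio, prev_reps, reps in step_ratios(parse_rows(SAMPLE)):
        print(f"{nbytes:>8} bytes: t_avg x{ratio:.2f} (reps {prev_reps} -> {reps})")
```

On these rows the largest ratio falls at 8192 bytes, which is also the row where the repetition count drops from 1000 to 13.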

On the other hand, this doesn't seem to be the case with the "hang"
for the NP256 bcast test. That one stayed hung for more than an hour,
at which point I killed it.

Just to make sure this isn't some quirk or bug in the Intel IMB test
suite: are there any alternative test suites that I could run on my
cluster? I am a bit wary of IMB because I have found no active forums
or mailing lists that focus on it, so I can't really get in touch
with any users or developers who might have insight into how these
benchmarks run.

7m22.972s
# /opt/src/mpitests/imb/src/IMB-MPI1 -npmin 64 gather

# Minimum message length in bytes: 0
# Maximum message length in bytes: 4194304
#
# MPI_Datatype : MPI_BYTE
# MPI_Datatype for reductions : MPI_FLOAT
# MPI_Op : MPI_SUM
#
#

# List of Benchmarks to run:

# Gather

#----------------------------------------------------------------
# Benchmarking Gather
# #processes = 64
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.02         0.03         0.02
            1         1000        68.72        68.95        68.84
            2         1000        69.16        69.39        69.28
            4         1000        68.85        69.08        68.97
            8         1000        69.02        69.25        69.14
           16         1000        70.29        70.51        70.40
           32         1000        72.14        72.38        72.27
           64         1000        70.99        71.24        71.12
          128         1000        72.59        72.84        72.72
          256         1000        76.00        76.26        76.14
          512         1000        84.92        85.21        85.06
         1024         1000       101.69       102.01       101.84
         2048         1000       146.94       147.41       147.18
         4096         1000       275.61       276.45       276.04
         8192           13      1380.54      1607.84      1522.64
        16384           13      1497.09      1749.46      1656.61
        32768           13      2055.61      2380.37      2259.50
        65536           13      4553.46      5002.70      4837.14
       131072           13      7720.76      8926.69      8483.07
       262144           13     10423.99     12027.23     11440.07
       524288           13     19456.94     22369.62     21317.78
      1048576           13     38228.53     43892.99     41880.94
      2097152           13     99705.55    119614.62    115667.49
      4194304           10    425823.38    496396.78    468326.45
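
As a rough cross-check, the sketch below adds up repetitions times t_avg over the table and compares the total with the 7m22.972s wall time. It assumes that figure is the `time` output for this same run, and it counts only the timed repetitions as reported, so warm-up, synchronization, and startup are not included. The names ROWS, WALL_SECONDS, and timed_seconds are illustrative only.

```python
# (bytes, repetitions, t_avg in usec), copied from the table above.
ROWS = [
    (0, 1000, 0.02), (1, 1000, 68.84), (2, 1000, 69.28), (4, 1000, 68.97),
    (8, 1000, 69.14), (16, 1000, 70.40), (32, 1000, 72.27), (64, 1000, 71.12),
    (128, 1000, 72.72), (256, 1000, 76.14), (512, 1000, 85.06),
    (1024, 1000, 101.84), (2048, 1000, 147.18), (4096, 1000, 276.04),
    (8192, 13, 1522.64), (16384, 13, 1656.61), (32768, 13, 2259.50),
    (65536, 13, 4837.14), (131072, 13, 8483.07), (262144, 13, 11440.07),
    (524288, 13, 21317.78), (1048576, 13, 41880.94),
    (2097152, 13, 115667.49), (4194304, 10, 468326.45),
]

# Assumption: 7m22.972s is the wall-clock time of this run.
WALL_SECONDS = 7 * 60 + 22.972


def timed_seconds(rows):
    """Sum repetitions * t_avg over all rows, converted from usec to seconds."""
    return sum(reps * t_avg for _, reps, t_avg in rows) / 1e6


if __name__ == "__main__":
    timed = timed_seconds(ROWS)
    print(f"timed repetitions: {timed:.2f} s")
    print(f"wall clock:        {WALL_SECONDS:.2f} s")
    print(f"unaccounted:       {WALL_SECONDS - timed:.2f} s")
```

If the assumption holds, the reported repetitions cover only a few seconds of the run, so most of the wall time would lie outside the measured loops. The output alone does not show where.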