Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Running OpenMPI on SGI Altix with 4096 cores: very poor performance
From: Gilbert Grosdidier (Gilbert.Grosdidier_at_[hidden])
Date: 2010-12-22 13:29:16


Good evening Eugene,

  First, thanks for trying to help me.

  I already gave a try to a profiling tool, namely IPM, which is rather
simple to use. Some output for a 1024-core run follows. Unfortunately,
I am not yet able to produce the equivalent chart with MPT.

#IPMv0.983####################################################################
#
# command : unknown (completed)
# host : r34i0n0/x86_64_Linux mpi_tasks : 1024 on 128 nodes
# start : 12/21/10/13:18:09 wallclock : 3357.308618 sec
# stop : 12/21/10/14:14:06 %comm : 27.67
# gbytes : 0.00000e+00 total gflop/sec : 0.00000e+00 total
#
##############################################################################
# region : * [ntasks] = 1024
#
#                        [total]         <avg>          min            max
# entries                1024            1              1              1
# wallclock              3.43754e+06     3356.98        3356.83        3357.31
# user                   2.82831e+06     2762.02        2622.04        2923.37
# system                 376230          367.412        174.603        492.919
# mpi                    951328          929.031        633.137        1052.86
# %comm                                  27.6719        18.8601        31.363
# gflop/sec              0               0              0              0
# gbytes                 0               0              0              0
#
#
#                        [time]          [calls]        <%mpi>         <%wall>
# MPI_Waitall            741683          7.91081e+07    77.96          21.58
# MPI_Allreduce          114057          2.53665e+07    11.99          3.32
# MPI_Recv               40164.7         2048           4.22           1.17
# MPI_Isend              27420.6         6.53513e+08    2.88           0.80
# MPI_Barrier            25113.5         2048           2.64           0.73
# MPI_Sendrecv           2123.6          212992         0.22           0.06
# MPI_Irecv              464.616         6.53513e+08    0.05           0.01
# MPI_Reduce             215.447         171008         0.02           0.01
# MPI_Bcast              85.0198         1024           0.01           0.00
# MPI_Send               0.377043        2048           0.00           0.00
# MPI_Comm_rank          0.000744925     4096           0.00           0.00
# MPI_Comm_size          0.000252183     1024           0.00           0.00
###############################################################################

  It seems to my non-expert eye that MPI_Waitall is dominant among the
MPI calls, but not for the overall application; however, I will have to
compare with MPT before concluding.
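
  If most of the MPI time really does sit in MPI_Waitall, a stripped-down
kernel of the kind Eugene suggests below could perhaps be put together
along these lines. This is only a rough sketch: the ring-neighbour
pattern, the 64 kB buffer size and the iteration count are placeholders,
not taken from the real application.

/* Rough sketch of a micro-benchmark mimicking repeated non-blocking
 * exchanges: each rank posts Isend/Irecv to its two ring neighbours
 * and waits on all requests, many times over, reusing the same buffers. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define MSG_BYTES (64 * 1024)   /* placeholder message size       */
#define NITER     10000         /* placeholder number of exchanges */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int left  = (rank - 1 + size) % size;
    int right = (rank + 1) % size;

    /* Same buffers reused every iteration, as in the application. */
    char *sbuf_l = malloc(MSG_BYTES), *sbuf_r = malloc(MSG_BYTES);
    char *rbuf_l = malloc(MSG_BYTES), *rbuf_r = malloc(MSG_BYTES);

    double t0 = MPI_Wtime();
    for (int it = 0; it < NITER; it++) {
        MPI_Request req[4];
        MPI_Irecv(rbuf_l, MSG_BYTES, MPI_BYTE, left,  0, MPI_COMM_WORLD, &req[0]);
        MPI_Irecv(rbuf_r, MSG_BYTES, MPI_BYTE, right, 1, MPI_COMM_WORLD, &req[1]);
        MPI_Isend(sbuf_l, MSG_BYTES, MPI_BYTE, left,  1, MPI_COMM_WORLD, &req[2]);
        MPI_Isend(sbuf_r, MSG_BYTES, MPI_BYTE, right, 0, MPI_COMM_WORLD, &req[3]);
        MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
    }
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("%d iterations in %.3f s\n", NITER, t1 - t0);

    free(sbuf_l); free(sbuf_r); free(rbuf_l); free(rbuf_r);
    MPI_Finalize();
    return 0;
}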

  Thanks again for your suggestions, which I'll address one by one.

  Best, G.

On 22/12/2010 at 18:50, Eugene Loh wrote:
> Can you isolate a bit more where the time is being spent? The
> performance effect you're describing appears to be drastic. Have you
> profiled the code? Some choices of tools can be found in the FAQ
> http://www.open-mpi.org/faq/?category=perftools The results may be
> "uninteresting" (all time spent in your MPI_Waitall calls, for
> example), but it'd be good to rule out other possibilities (e.g., I've
> seen cases where it's the non-MPI time that's the culprit).
>
> If all the time is spent in MPI_Waitall, then I wonder if it would be
> possible for you to reproduce the problem with just some
> MPI_Isend|Irecv|Waitall calls that mimic your program. E.g., "lots of
> short messages", or "lots of long messages", etc. It sounds like
> there is some repeated set of MPI exchanges, so maybe that set can be
> extracted and run without the complexities of the application.
>
> Anyhow, some profiling might help guide one to the problem.
>
> Gilbert Grosdidier wrote:
>
>> There is indeed a high rate of communication. But the buffer
>> size is always the same for a given pair of processes, and I thought
>> that mpi_leave_pinned should avoid freeing the memory in this case.
>> Am I wrong?
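
For reference, mpi_leave_pinned is an Open MPI MCA parameter that can be
switched on from the mpirun command line; the core count and application
name in this sketch are placeholders only:

  # ask Open MPI to keep registered (pinned) buffers cached for reuse
  mpirun -np 4096 --mca mpi_leave_pinned 1 ./my_app

  # check how the parameter is currently set
  ompi_info --param mpi all | grep leave_pinned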