Subject: Re: [OMPI users] scaling problem with openmpi
From: Roman Martonak (r.martonak_at_[hidden])
Date: 2009-05-19 09:42:33


On Tue, May 19, 2009 at 3:29 PM, Peter Kjellstrom <cap_at_[hidden]> wrote:
> On Tuesday 19 May 2009, Roman Martonak wrote:
> ...
>> openmpi-1.3.2                           time per one MD step is 3.66 s
>>    ELAPSED TIME :    0 HOURS  1 MINUTES 25.90 SECONDS
>>  = ALL TO ALL COMM           102033. BYTES               4221.  =
>>  = ALL TO ALL COMM             7.802  MB/S          55.200 SEC  =
> ...
>> mvapich-1.1.0                            time per one MD step is 2.55 s
>>    ELAPSED TIME :    0 HOURS  1 MINUTES  0.65 SECONDS
>>  = ALL TO ALL COMM           102033. BYTES               4221.  =
>>  = ALL TO ALL COMM            14.815  MB/S          29.070 SEC  =
> ...
>> Intel MPI 3.2.1.009                 time per one MD step is 1.58 s
>>    ELAPSED TIME :    0 HOURS  0 MINUTES 38.16 SECONDS
>>  = ALL TO ALL COMM           102033. BYTES               4221.  =
>>  = ALL TO ALL COMM            38.696  MB/S          11.130 SEC  =
> ...
>> Clearly the whole difference is basically in the ALL TO ALL COMM time.
>> Running on 1 blade (8 cores) all three MPI implementations have very
>> similar times per step of about 8.6 s.
>
> My guess is that what you see is the difference in MPI_Alltoall performance
> across the different MPI implementations (running in your env. on your hw.).
>
> You could write a trivial loop like this and try it on the three MPIs:
>
>  MPI_init
>  for i in 1 to 4221
>   MPI_Alltoall(size=102033, ...)
>  MPI_finalize
>
> And time it to confirm this.
>
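
A standalone version of that loop, as a minimal sketch in plain C with MPI
(the 102033-byte length and 4221 calls are taken from the CPMD report above;
treating the reported length as the total buffer per call, split evenly
across ranks, is my assumption about how CPMD counts it):

  /* Times repeated MPI_Alltoall calls with the sizes from the CPMD report. */
  #include <mpi.h>
  #include <stdio.h>
  #include <stdlib.h>

  int main(int argc, char **argv)
  {
      int nprocs, rank, i;
      MPI_Init(&argc, &argv);
      MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      int count = 102033 / nprocs;         /* bytes sent to each rank */
      char *sendbuf = calloc((size_t)count * nprocs, 1);
      char *recvbuf = calloc((size_t)count * nprocs, 1);

      MPI_Barrier(MPI_COMM_WORLD);         /* start all ranks together */
      double t0 = MPI_Wtime();
      for (i = 0; i < 4221; i++)           /* number of calls in the report */
          MPI_Alltoall(sendbuf, count, MPI_BYTE,
                       recvbuf, count, MPI_BYTE, MPI_COMM_WORLD);
      double t1 = MPI_Wtime();

      if (rank == 0)
          printf("4221 alltoalls of ~%d bytes: %.3f s\n",
                 count * nprocs, t1 - t0);

      free(sendbuf);
      free(recvbuf);
      MPI_Finalize();
      return 0;
  }

Compiling this with each MPI's mpicc and running it on the same nodes should
show whether MPI_Alltoall alone reproduces the 55 s / 29 s / 11 s spread
seen above.
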
>> For CPMD I found that using the keyword TASKGROUP,
>> which introduces a different way of parallelization, it is possible to
>> improve on the Open MPI time substantially and lower it from 3.66 s
>> to 1.67 s, almost to the value found with Intel MPI.
>
> I guess this changed the kind of communication that is done, so you no longer
> have to do the 4221 x 100 KB alltoalls that seem to hurt Open MPI so much.

With TASKGROUP=2 the summary looks as follows

       CPU TIME : 0 HOURS 0 MINUTES 42.09 SECONDS
   ELAPSED TIME : 0 HOURS 0 MINUTES 44.01 SECONDS
 *** CPMD| SIZE OF THE PROGRAM IS 73532/ 322740 kBYTES ***

 PROGRAM CPMD ENDED AT: Tue May 19 11:16:18 2009

 ================================================================
 =  COMMUNICATION TASK  AVERAGE MESSAGE LENGTH  NUMBER OF CALLS  =
 =  SEND/RECEIVE                 8585. BYTES             48447.  =
 =  BROADCAST                   19063. BYTES               396.  =
 =  GLOBAL SUMMATION           103463. BYTES               372.  =
 =  GLOBAL MULTIPLICATION           0. BYTES                 1.  =
 =  ALL TO ALL COMM            231821. BYTES              4221.  =
 =  PERFORMANCE                                     TOTAL TIME   =
 =  SEND/RECEIVE                193.459  MB/S         2.150 SEC  =
 =  BROADCAST                    10.785  MB/S         0.700 SEC  =
 =  GLOBAL SUMMATION            339.605  MB/S         0.680 SEC  =
 =  GLOBAL MULTIPLICATION         0.000  MB/S         0.001 SEC  =
 =  ALL TO ALL COMM              82.716  MB/S        11.830 SEC  =
 =  SYNCHRONISATION                                   2.360 SEC  =
 ================================================================

Roman