
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Help configuring openmpi
From: Juan Carlos Larroya Huguet (JC.LARROYA_at_[hidden])
Date: 2008-05-13 15:15:19


Hi,

Thanks very much Jeff, you solved my problem. The new CPU times are correct.
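
In case it is useful to others, the parameter can be passed on the mpirun
command line; a minimal sketch only (the executable name and process count
below are placeholders for my own case):

   mpirun --mca pml_ob1_use_early_completion 0 -np 32 ./my_solver

i.e. setting the pml_ob1_use_early_completion MCA parameter to 0, as Jeff
suggested.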

output.000: temps apres petits calculs = 161.828640937805
output.001: temps apres petits calculs = 167.412606000900
output.002: temps apres petits calculs = 161.822407007217
output.003: temps apres petits calculs = 159.414180994034
output.004: temps apres petits calculs = 158.233778953552
output.005: temps apres petits calculs = 158.775961160660
output.006: temps apres petits calculs = 160.206702947617
output.007: temps apres petits calculs = 158.072614192963
output.008: temps apres petits calculs = 159.688425064087
output.009: temps apres petits calculs = 158.696867942810
output.010: temps apres petits calculs = 158.287634849548
output.011: temps apres petits calculs = 160.931638002396
output.012: temps apres petits calculs = 160.669780969620
output.013: temps apres petits calculs = 161.221219062805
output.014: temps apres petits calculs = 161.696250915527
output.015: temps apres petits calculs = 164.311156034470
output.016: temps apres petits calculs = 177.722136020660
output.017: temps apres petits calculs = 160.300070047379
output.018: temps apres petits calculs = 164.753610849380
output.019: temps apres petits calculs = 158.875360965729
output.020: temps apres petits calculs = 158.453947067261
output.021: temps apres petits calculs = 160.183310031891
output.022: temps apres petits calculs = 158.966534852982
output.023: temps apres petits calculs = 159.750366926193
output.024: temps apres petits calculs = 158.936643123627
output.025: temps apres petits calculs = 161.162981987000
output.026: temps apres petits calculs = 159.347134828568
output.027: temps apres petits calculs = 169.814289808273
output.028: temps apres petits calculs = 161.617573976517
output.029: temps apres petits calculs = 158.314706087112
output.030: temps apres petits calculs = 158.700573205948
output.031: temps apres petits calculs = 166.480212926865

Thanks again

JC

PS: I was working with Open MPI 1.2.5; to test your suggestion I moved
to version 1.2.6... I tried to install Open MPI in my own path using
configure --prefix=my_path, but make install kept installing to the
default path /usr/local. I did not see this problem with version
1.2.5... To work around it, I just modified the ac_default_prefix
variable in the configure file to point to my path... Maybe you can
forward this issue to the right person/mailing list...
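
In case it helps to reproduce the issue, the build sequence was roughly
the following (my_path stands for my own install directory):

   ./configure --prefix=my_path
   make all
   make install    # with 1.2.6 this still installed under /usr/local for me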

Jeff Squyres wrote:
> If OMPI is spinning consuming 100% of your CPU, it usually means that
> some MPI function call is polling waiting for completion. Given the
> pattern you are seeing, I'm wondering if some Open MPI collective call
> is not finishing until you re-enter the MPI progression engine.
>
> Specifically, is your pattern like this:
>
> - some MPI collective function
> - enter a long period of computation involving no MPI calls
> - call another MPI function
>
> If so, you could well be getting bitten by what is known as an "early
> completion" optimization in the Open MPI v1.2 series that allows us to
> lower our latency slightly in some cases. In OMPI v1.2.6, we added an
> MCA parameter to disable this behavior: set the
> pml_ob1_use_early_completion MCA parameter to 0 and try your app again.
>
> This parameter is unnecessary in the [upcoming] v1.3 series; we
> changed how completions are done such that this should not be an issue.
>
>
> On May 12, 2008, at 9:52 AM, Juan Carlos Larroya Huguet wrote:
>
>
>> Hi,
>>
>> I'm using Open MPI on a Linux cluster (Itanium 64, Intel compilers, 8
>> processors per node as 4 duals) on which Open MPI is not the default
>> (I mean supported) MPI-2 implementation. Open MPI was installed easily
>> on the cluster, but I think there is a problem with the configuration.
>>
>> I'm using two MPI codes. The first is a CFD code with a master/slave
>> structure... I have done some calculations on 128 processors: 1 master
>> process and 127 slaves. Open MPI is slightly more efficient than the
>> supported MPI-2 version.
>>
>> Then I moved to a second solver (radiant heat transfer)... In this
>> case, all the processors are doing the same thing. I have found that
>> after the initial data-reading phase, some processors start to work
>> hard while the others, even though they are consuming 99% of CPU, are
>> waiting for something! In fact, 15 processes out of 32 are working
>> (all the processes are consuming 99% of CPU...); as soon as they
>> finish their calculation, another 12 processes start to do the job,
>> and when those 12 start to finish, the remaining 4 do the job....
>>
>> Looking at the computational times, with the official MPI-2 version
>> on the cluster I obtained...
>>
>> output.000: temps apres petits calculs = 170.445202827454
>> output.001: temps apres petits calculs = 170.657078027725
>> output.002: temps apres petits calculs = 168.880963802338
>> output.003: temps apres petits calculs = 172.611718893051
>> output.004: temps apres petits calculs = 169.420207977295
>> output.005: temps apres petits calculs = 168.880684852600
>> output.006: temps apres petits calculs = 170.222792863846
>> output.007: temps apres petits calculs = 172.987339973450
>> output.008: temps apres petits calculs = 170.321479082108
>> output.009: temps apres petits calculs = 167.417831182480
>> output.010: temps apres petits calculs = 170.633100032806
>> output.011: temps apres petits calculs = 168.988963842392
>> output.012: temps apres petits calculs = 166.893934011459
>> output.013: temps apres petits calculs = 169.844722032547
>> output.014: temps apres petits calculs = 169.541869163513
>> output.015: temps apres petits calculs = 166.023182868958
>> output.016: temps apres petits calculs = 166.047858953476
>> output.017: temps apres petits calculs = 166.298271894455
>> output.018: temps apres petits calculs = 166.990653991699
>> output.019: temps apres petits calculs = 170.565690040588
>> output.020: temps apres petits calculs = 170.455694913864
>> output.021: temps apres petits calculs = 170.545780897141
>> output.022: temps apres petits calculs = 165.962821960449
>> output.023: temps apres petits calculs = 169.934472084045
>> output.024: temps apres petits calculs = 170.169304847717
>> output.025: temps apres petits calculs = 172.316897153854
>> output.026: temps apres petits calculs = 166.030095100403
>> output.027: temps apres petits calculs = 168.219340801239
>> output.028: temps apres petits calculs = 165.486129045486
>> output.029: temps apres petits calculs = 165.923212051392
>> output.030: temps apres petits calculs = 165.996737957001
>> output.031: temps apres petits calculs = 167.544650793076
>>
>> All the processes consume more or less the same CPU time.
>>
>> With Open MPI I obtained:
>>
>> output.000: temps apres petits calculs = 158.906322956085
>> output.001: temps apres petits calculs = 160.753660202026
>> output.002: temps apres petits calculs = 161.286659002304
>> output.003: temps apres petits calculs = 169.431221961975
>> output.004: temps apres petits calculs = 163.511161088943
>> output.005: temps apres petits calculs = 160.547757863998
>> output.006: temps apres petits calculs = 161.222673892975
>> output.007: temps apres petits calculs = 325.977787017822
>> output.008: temps apres petits calculs = 321.527663946152
>> output.009: temps apres petits calculs = 326.429191827774
>> output.010: temps apres petits calculs = 321.229686975479
>> output.011: temps apres petits calculs = 160.507288932800
>> output.012: temps apres petits calculs = 158.480596065521
>> output.013: temps apres petits calculs = 169.135869979858
>> output.014: temps apres petits calculs = 158.526450872421
>> output.015: temps apres petits calculs = 486.637645006180
>> output.016: temps apres petits calculs = 483.884088993073
>> output.017: temps apres petits calculs = 480.200496196747
>> output.018: temps apres petits calculs = 483.166898012161
>> output.019: temps apres petits calculs = 323.687628030777
>> output.020: temps apres petits calculs = 319.833092927933
>> output.021: temps apres petits calculs = 329.558218955994
>> output.022: temps apres petits calculs = 329.199027061462
>> output.023: temps apres petits calculs = 322.116630077362
>> output.024: temps apres petits calculs = 322.238983869553
>> output.025: temps apres petits calculs = 322.890433073044
>> output.026: temps apres petits calculs = 322.439801216125
>> output.027: temps apres petits calculs = 157.899522066116
>> output.028: temps apres petits calculs = 159.247365951538
>> output.029: temps apres petits calculs = 158.351451158524
>> output.030: temps apres petits calculs = 158.714610815048
>> output.031: temps apres petits calculs = 480.177379846573
>>
>> 15 processes have similar times (close to those obtained with the
>> official MPI), then 12, then 4, as explained previously.
>>
>> I suppose we need to tune the configuration of Open MPI. Do you know
>> how to do that?
>>
>> Thanks in advance
>>
>> JC
>>
>>