Open MPI User's Mailing List Archives


Subject: [OMPI users] Help configuring openmpi
From: Juan Carlos Larroya Huguet (JC.LARROYA_at_[hidden])
Date: 2008-05-12 09:52:13


Hi,

I'm using Open MPI on a Linux cluster (Itanium 64, Intel compilers, 8
processors per node (4 dual)) on which Open MPI is not the default (i.e.,
supported) MPI-2 implementation. Open MPI installed easily on the
cluster, but I think there is a problem with the configuration.

I'm using two MPI codes. The first is a CFD code with a master/slave
structure; I have run some calculations on 128 processors (1 master
process and 127 slaves), and Open MPI is slightly more efficient than the
supported MPI-2 version.

Then I moved to a second solver (radiative heat transfer), in which all
the processors do the same thing. I have found that after the initial
data-reading phase, some processors start to work hard while the others,
even though they show 99% CPU usage, are waiting for something. Out of 32
processes, 15 are actually working (all of the processes show 99% CPU
usage); as soon as those finish their calculation, another group of
processes (12 of them) starts to do the job, and when those 12 begin to
finish, the remaining 4 do the job.

Looking at the computational times, with the official MPI-2 version on
the cluster I obtain:

output.000: temps apres petits calculs = 170.445202827454
output.001: temps apres petits calculs = 170.657078027725
output.002: temps apres petits calculs = 168.880963802338
output.003: temps apres petits calculs = 172.611718893051
output.004: temps apres petits calculs = 169.420207977295
output.005: temps apres petits calculs = 168.880684852600
output.006: temps apres petits calculs = 170.222792863846
output.007: temps apres petits calculs = 172.987339973450
output.008: temps apres petits calculs = 170.321479082108
output.009: temps apres petits calculs = 167.417831182480
output.010: temps apres petits calculs = 170.633100032806
output.011: temps apres petits calculs = 168.988963842392
output.012: temps apres petits calculs = 166.893934011459
output.013: temps apres petits calculs = 169.844722032547
output.014: temps apres petits calculs = 169.541869163513
output.015: temps apres petits calculs = 166.023182868958
output.016: temps apres petits calculs = 166.047858953476
output.017: temps apres petits calculs = 166.298271894455
output.018: temps apres petits calculs = 166.990653991699
output.019: temps apres petits calculs = 170.565690040588
output.020: temps apres petits calculs = 170.455694913864
output.021: temps apres petits calculs = 170.545780897141
output.022: temps apres petits calculs = 165.962821960449
output.023: temps apres petits calculs = 169.934472084045
output.024: temps apres petits calculs = 170.169304847717
output.025: temps apres petits calculs = 172.316897153854
output.026: temps apres petits calculs = 166.030095100403
output.027: temps apres petits calculs = 168.219340801239
output.028: temps apres petits calculs = 165.486129045486
output.029: temps apres petits calculs = 165.923212051392
output.030: temps apres petits calculs = 165.996737957001
output.031: temps apres petits calculs = 167.544650793076

i.e., all the processes consume roughly the same CPU time, whereas with
Open MPI I obtained:

output.000: temps apres petits calculs = 158.906322956085
output.001: temps apres petits calculs = 160.753660202026
output.002: temps apres petits calculs = 161.286659002304
output.003: temps apres petits calculs = 169.431221961975
output.004: temps apres petits calculs = 163.511161088943
output.005: temps apres petits calculs = 160.547757863998
output.006: temps apres petits calculs = 161.222673892975
output.007: temps apres petits calculs = 325.977787017822
output.008: temps apres petits calculs = 321.527663946152
output.009: temps apres petits calculs = 326.429191827774
output.010: temps apres petits calculs = 321.229686975479
output.011: temps apres petits calculs = 160.507288932800
output.012: temps apres petits calculs = 158.480596065521
output.013: temps apres petits calculs = 169.135869979858
output.014: temps apres petits calculs = 158.526450872421
output.015: temps apres petits calculs = 486.637645006180
output.016: temps apres petits calculs = 483.884088993073
output.017: temps apres petits calculs = 480.200496196747
output.018: temps apres petits calculs = 483.166898012161
output.019: temps apres petits calculs = 323.687628030777
output.020: temps apres petits calculs = 319.833092927933
output.021: temps apres petits calculs = 329.558218955994
output.022: temps apres petits calculs = 329.199027061462
output.023: temps apres petits calculs = 322.116630077362
output.024: temps apres petits calculs = 322.238983869553
output.025: temps apres petits calculs = 322.890433073044
output.026: temps apres petits calculs = 322.439801216125
output.027: temps apres petits calculs = 157.899522066116
output.028: temps apres petits calculs = 159.247365951538
output.029: temps apres petits calculs = 158.351451158524
output.030: temps apres petits calculs = 158.714610815048
output.031: temps apres petits calculs = 480.177379846573

15 processes have times similar to those obtained with the official MPI,
then 12, then 4, as explained previously.

I suppose that the Open MPI configuration needs tuning. Do you know how
to do this?
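For what it's worth, I wondered whether processor affinity or rank
placement might be involved. As a sketch (assuming an Open MPI 1.2.x
installation; the executable name `solver` and the hostfile path are
placeholders, not my actual setup):

```shell
# Hypothetical launch lines -- './solver' and 'my_hosts' are placeholders.

# Pin each MPI process to a processor so processes do not migrate
# between CPUs on a node (Open MPI 1.2-era MCA parameter):
mpirun -np 32 --hostfile my_hosts \
       --mca mpi_paffinity_alone 1 \
       ./solver

# Print how ranks are mapped onto the nodes before launching,
# to check whether the placement looks balanced:
mpirun -np 32 --hostfile my_hosts --display-map ./solver
```

Is something along these lines the right direction, or is there another
configuration parameter I should be looking at?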

Thanks in advance

JC