
Subject: Re: [OMPI users] MPI_COMM_DUP freeze with OpenMPI 1.4.1
From: francoise.roch_at_[hidden]
Date: 2011-05-13 08:31:11


Hi,

The debugger traces were captured while the different tasks were blocked.
Before the MPI_COMM_DUP, the color MPI_UNDEFINED is assigned to the
master process, so that MPI_COMM_SPLIT constructs a new communicator
that does not contain the master (on the master, MPI_COMM_SPLIT returns
MPI_COMM_NULL). The master process therefore does not call the
MPI_COMM_DUP routine and is not blocked at that instruction, but further
on in the program, at a barrier call. The master's behaviour is normal:
it waits for the slaves, which are all blocked in MPI_COMM_DUP.

Here is the MUMPS portion of code (in the zmumps_part1.F file) where the
slaves call MPI_COMM_DUP; id%PAR and MASTER have both been initialized
to 0 beforehand:

      CALL MPI_COMM_SIZE( id%COMM, id%NPROCS, IERR )
      IF ( id%PAR .eq. 0 ) THEN
         IF ( id%MYID .eq. MASTER ) THEN
            color = MPI_UNDEFINED
         ELSE
            color = 0
         END IF
         CALL MPI_COMM_SPLIT( id%COMM, color, 0,
     &                        id%COMM_NODES, IERR )
         id%NSLAVES = id%NPROCS - 1
      ELSE
         CALL MPI_COMM_DUP( id%COMM, id%COMM_NODES, IERR )
         id%NSLAVES = id%NPROCS
      END IF

      IF ( id%PAR .ne. 0 .or. id%MYID .NE. MASTER ) THEN
         CALL MPI_COMM_DUP( id%COMM_NODES, id%COMM_LOAD, IERR )
      END IF

------

In our case (id%PAR = 0), only the second MPI_COMM_DUP call is executed,
and only on the slaves.
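
For what it is worth, below is a minimal standalone sketch of the same
communicator pattern (my own reduction for testing, not the MUMPS
source; the program name and variables are mine), which may help to
reproduce the hang outside of MUMPS:

      PROGRAM dup_test
C     Sketch of the MUMPS communicator pattern for the id%PAR = 0 case.
C     This is a test reduction, not the MUMPS source.
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      INTEGER MASTER
      PARAMETER ( MASTER = 0 )
      INTEGER myid, nprocs, color, ierr
      INTEGER comm_nodes, comm_load
      CALL MPI_INIT( ierr )
      CALL MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr )
      CALL MPI_COMM_SIZE( MPI_COMM_WORLD, nprocs, ierr )
C     Exclude the master from the slaves' communicator; with color
C     MPI_UNDEFINED, MPI_COMM_SPLIT returns MPI_COMM_NULL there.
      IF ( myid .EQ. MASTER ) THEN
         color = MPI_UNDEFINED
      ELSE
         color = 0
      END IF
      CALL MPI_COMM_SPLIT(MPI_COMM_WORLD, color, 0, comm_nodes, ierr)
C     Only the slaves duplicate the new communicator, as in MUMPS.
      IF ( myid .NE. MASTER ) THEN
         CALL MPI_COMM_DUP( comm_nodes, comm_load, ierr )
         CALL MPI_COMM_FREE( comm_load, ierr )
         CALL MPI_COMM_FREE( comm_nodes, ierr )
      END IF
C     All ranks, master included, meet at the barrier afterwards.
      CALL MPI_BARRIER( MPI_COMM_WORLD, ierr )
      IF ( myid .EQ. MASTER ) PRINT *, 'dup_test completed'
      CALL MPI_FINALIZE( ierr )
      END

I would run it over the same two nodes with the same mpirun options as
the real program, e.g.:

mpirun -np 24 --machinefile $OAR_NODE_FILE -mca plm_rsh_agent "oarsh"
--mca btl self,openib ./dup_test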

The MUMPS library and our program are compiled with Intel Fortran 12,
and I have also tested the -O1 option, with no more success.

Françoise.

George Bosilca wrote:
> On May 10, 2011, at 08:10 , Tim Prince wrote:
>
>
>> On 5/10/2011 6:43 AM, francoise.roch_at_[hidden] wrote:
>>
>>> Hi,
>>>
>>> I compiled a parallel program with OpenMPI 1.4.1 (built with the Intel
>>> compilers 12 from the composerxe package). The program is linked to the
>>> MUMPS library 4.9.2, compiled with the same compilers and linked with
>>> Intel MKL. The OS is Linux Debian.
>>> There is no error in compiling or running the job, but the program
>>> freezes inside a call to the "zmumps" routine, when the slave processes
>>> call the MPI_COMM_DUP routine.
>>>
>>> The program is executed on 2 nodes of 12 cores each (Westmere
>>> processors) with the following command:
>>>
>>> mpirun -np 24 --machinefile $OAR_NODE_FILE -mca plm_rsh_agent "oarsh"
>>> --mca btl self,openib -x LD_LIBRARY_PATH ./prog
>>>
>>> We have 12 processes running on each node. We submit the job with the
>>> OAR batch scheduler (the $OAR_NODE_FILE variable and the "oarsh"
>>> command are specific to this scheduler and usually work well with
>>> Open MPI).
>>>
>>> Via gdb, we can see that the slaves are blocked in MPI_COMM_DUP:
>>>
>
> Francoise,
>
> Based on your traces the workers and the master are not doing the same MPI call. The workers are blocked in an MPI_Comm_dup in sub_pbdirect_init.f90:44, while the master is blocked in an MPI_Barrier in sub_pbdirect_init.f90:62. Can you verify that the slaves and the master are calling the MPI_Barrier and the MPI_Comm_dup in the same logical order?
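
To answer the question about the ordering: schematically (a sketch of
the call sequence as I understand it from our traces, not the actual
sub_pbdirect_init.f90 source), all ranks call zmumps first and the
barrier afterwards:

C     All ranks enter zmumps (sub_pbdirect_init.f90:44); with
C     id%PAR = 0 the master only does the MPI_COMM_SPLIT and returns,
C     while the slaves block in the MPI_COMM_DUP at zmumps_part1.F:144.
      CALL ZMUMPS( id )
C     The master reaches this barrier on id%COMM
C     (sub_pbdirect_init.f90:62) and waits for the slaves.
      CALL MPI_BARRIER( id%COMM, IERR )

The MPI_COMM_DUP is over COMM_NODES, of which the master is not a
member, so it should not need the master to progress.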
>
> george.
>
>
>
>>> (gdb) where
>>> #0 0x00002b32c1533113 in poll () from /lib/libc.so.6
>>> #1 0x0000000000adf52c in poll_dispatch ()
>>> #2 0x0000000000adcea3 in opal_event_loop ()
>>> #3 0x0000000000ad69f9 in opal_progress ()
>>> #4 0x0000000000a34b4e in mca_pml_ob1_recv ()
>>> #5 0x00000000009b0768 in
>>> ompi_coll_tuned_allreduce_intra_recursivedoubling ()
>>> #6 0x00000000009ac829 in ompi_coll_tuned_allreduce_intra_dec_fixed ()
>>> #7 0x000000000097e271 in ompi_comm_allreduce_intra ()
>>> #8 0x000000000097dd06 in ompi_comm_nextcid ()
>>> #9 0x000000000097be01 in ompi_comm_dup ()
>>> #10 0x00000000009a0785 in PMPI_Comm_dup ()
>>> #11 0x000000000097931d in pmpi_comm_dup__ ()
>>> #12 0x0000000000644251 in zmumps (id=...) at zmumps_part1.F:144
>>> #13 0x00000000004c0d03 in sub_pbdirect_init (id=..., matrix_build=...)
>>> at sub_pbdirect_init.f90:44
>>> #14 0x0000000000628706 in fwt2d_elas_v2 () at fwt2d_elas.f90:1048
>>>
>>>
>>> The master waits further on:
>>>
>>> (gdb) where
>>> #0 0x00002b9dc9f3e113 in poll () from /lib/libc.so.6
>>> #1 0x0000000000adf52c in poll_dispatch ()
>>> #2 0x0000000000adcea3 in opal_event_loop ()
>>> #3 0x0000000000ad69f9 in opal_progress ()
>>> #4 0x000000000098f294 in ompi_request_default_wait_all ()
>>> #5 0x0000000000a06e56 in ompi_coll_tuned_sendrecv_actual ()
>>> #6 0x00000000009ab8e3 in ompi_coll_tuned_barrier_intra_bruck ()
>>> #7 0x00000000009ac926 in ompi_coll_tuned_barrier_intra_dec_fixed ()
>>> #8 0x00000000009a0b20 in PMPI_Barrier ()
>>> #9 0x0000000000978c93 in pmpi_barrier__ ()
>>> #10 0x00000000004c0dc4 in sub_pbdirect_init (id=..., matrix_build=...)
>>> at sub_pbdirect_init.f90:62
>>> #11 0x0000000000628706 in fwt2d_elas_v2 () at fwt2d_elas.f90:1048
>>>
>>>
>>> Remark:
>>> The same code compiled and ran well with the Intel MPI library, from
>>> the same Intel package, on the same nodes.
>>>
>>>
>> Did you try compiling with equivalent options in each compiler? For example (supposing you had gcc 4.6),
>> gcc -O3 -funroll-loops --param max-unroll-times=2 -march=corei7
>> would be equivalent (as closely as I know) to
>> icc -fp-model source -msse4.2 -ansi-alias
>>
>> As you should be aware, default settings in icc are more closely equivalent to
>> gcc -O3 -ffast-math -fno-cx-limited-range -funroll-loops --param max-unroll-times=2 -fno-strict-aliasing
>>
>> The options I suggest as an upper limit are probably more aggressive than most people have used successfully with OpenMPI.
>>
>> As to run-time MPI options, Intel MPI has affinity with Westmere awareness turned on by default. I suppose testing without affinity settings, particularly when banging against all hyperthreads, is a more severe test of your application. Don't you get better results at 1 rank per core?
>> --
>> Tim Prince
>
> "To preserve the freedom of the human mind then and freedom of the press, every spirit should be ready to devote itself to martyrdom; for as long as we may think as we will, and speak as we think, the condition of man will proceed in improvement."
> -- Thomas Jefferson, 1799