Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] MPI_COMM_DUP freeze with OpenMPI 1.4.1
From: francoise.roch_at_[hidden]
Date: 2011-05-13 08:31:11


Hi,

The debugger traces are captured while the different tasks are blocked.
Before the MPI_COMM_DUP, the color MPI_UNDEFINED is assigned to the
master process, and MPI_COMM_SPLIT constructs a new communicator that
does not contain the master.
The master process does not call the MPI_COMM_DUP routine, so it is not
blocked at that instruction but further on in the program, at a barrier
call. The master's behaviour is normal: it waits for the slaves, which
are all blocked in MPI_COMM_DUP.
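To illustrate the kind of ordering problem George asks about below, the situation can be modelled outside MPI as a per-rank list of collective calls: ranks sharing a communicator must issue their collectives in the same order, or each side blocks waiting for the other. This is a plain-Python sketch with made-up rank names, not MUMPS code and not real MPI:

```python
# Toy model (plain Python, not MPI): each rank posts its sequence of
# collective calls; a potential deadlock shows up as soon as the ranks
# are not all entering the same collective at the same step.

def first_mismatch(sequences):
    """Return the first step at which the ranks' collective sequences
    diverge, or None if every rank issues the same calls in order."""
    for step, calls in enumerate(zip(*sequences.values())):
        if len(set(calls)) > 1:
            return step
    return None

# id%PAR = 0: the master skips MPI_COMM_DUP and goes straight to the
# barrier, while the slaves enter MPI_COMM_DUP first.
calls = {
    "master":  ["MPI_Barrier"],
    "slave-1": ["MPI_Comm_dup", "MPI_Barrier"],
    "slave-2": ["MPI_Comm_dup", "MPI_Barrier"],
}
print(first_mismatch(calls))  # 0: the very first collective differs
```

In the real program the slaves' MPI_Comm_dup runs on the slaves-only communicator, so it should in principle complete without the master; the model only shows why mismatched ordering on a shared communicator would hang.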

Here is the MUMPS portion of code (in the zmumps_part1.F file) where the
slaves call MPI_COMM_DUP; id%PAR and MASTER are initialized to 0 beforehand:

      CALL MPI_COMM_SIZE( id%COMM, id%NPROCS, IERR )
      IF ( id%PAR .eq. 0 ) THEN
        IF ( id%MYID .eq. MASTER ) THEN
          color = MPI_UNDEFINED
        ELSE
          color = 0
        END IF
        CALL MPI_COMM_SPLIT( id%COMM, color, 0,
     &                       id%COMM_NODES, IERR )
        id%NSLAVES = id%NPROCS - 1
      ELSE
        CALL MPI_COMM_DUP( id%COMM, id%COMM_NODES, IERR )
        id%NSLAVES = id%NPROCS
      END IF

      IF ( id%PAR .ne. 0 .or. id%MYID .NE. MASTER ) THEN
        CALL MPI_COMM_DUP( id%COMM_NODES, id%COMM_LOAD, IERR )
      END IF

------

In our case (id%PAR = 0), only the second MPI_COMM_DUP call is executed
on the slaves.
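For reference, in MPI_COMM_SPLIT a process that passes MPI_UNDEFINED as its color receives MPI_COMM_NULL and is left out of every new communicator, which is how the master is excluded above. A plain-Python model of that grouping (not real MPI; the sentinel value and rank numbers are illustrative only):

```python
MPI_UNDEFINED = -1  # illustrative sentinel; the real value is implementation-defined

def comm_split(colors):
    """Model of MPI_Comm_split: colors maps rank -> color; ranks that pass
    MPI_UNDEFINED get no communicator (MPI_COMM_NULL in real MPI)."""
    groups = {}
    for rank in sorted(colors):
        color = colors[rank]
        if color == MPI_UNDEFINED:
            continue  # this rank joins no new communicator
        groups.setdefault(color, []).append(rank)
    return groups

# id%PAR = 0 with 4 ranks: the master (rank 0) passes MPI_UNDEFINED,
# the slaves pass color 0, so COMM_NODES holds only ranks 1..3.
colors = {0: MPI_UNDEFINED, 1: 0, 2: 0, 3: 0}
print(comm_split(colors))  # {0: [1, 2, 3]}
```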

The MUMPS library and our program are compiled with Intel Fortran 12, and I
have tested the -O1 option with no more success.

Françoise.

George Bosilca wrote:
> On May 10, 2011, at 08:10 , Tim Prince wrote:
>
>
>> On 5/10/2011 6:43 AM, francoise.roch_at_[hidden] wrote:
>>
>>> Hi,
>>>
>>> I compiled a parallel program with Open MPI 1.4.1 (built with the Intel
>>> compilers 12 from the composerxe package). This program is linked to the
>>> MUMPS library 4.9.2, compiled with the same compilers and linked with Intel MKL.
>>> The OS is Linux Debian.
>>> No error occurs in compiling or running the job, but the program freezes inside
>>> a call to the "zmumps" routine when the slave processes call the MPI_COMM_DUP
>>> routine.
>>>
>>> The program is executed on 2 nodes of 12 cores each (westmere
>>> processors) with the following command :
>>>
>>> mpirun -np 24 --machinefile $OAR_NODE_FILE -mca plm_rsh_agent "oarsh"
>>> --mca btl self,openib -x LD_LIBRARY_PATH ./prog
>>>
>>> We have 12 processes running on each node. We submit the job with the OAR
>>> batch scheduler (the $OAR_NODE_FILE variable and "oarsh" command are
>>> specific to this scheduler and usually work well with Open MPI).
>>>
>>> Via gdb, we can see that the slaves are blocked in MPI_COMM_DUP:
>>>
>
> Francoise,
>
> Based on your traces the workers and the master are not doing the same MPI call. The workers are blocked in an MPI_Comm_dup in sub_pbdirect_init.f90:44, while the master is blocked in an MPI_Barrier in sub_pbdirect_init.f90:62. Can you verify that the slaves and the master are calling the MPI_Barrier and the MPI_Comm_dup in the same logical order?
>
> george.
>
>
>
>>> (gdb) where
>>> #0 0x00002b32c1533113 in poll () from /lib/libc.so.6
>>> #1 0x0000000000adf52c in poll_dispatch ()
>>> #2 0x0000000000adcea3 in opal_event_loop ()
>>> #3 0x0000000000ad69f9 in opal_progress ()
>>> #4 0x0000000000a34b4e in mca_pml_ob1_recv ()
>>> #5 0x00000000009b0768 in
>>> ompi_coll_tuned_allreduce_intra_recursivedoubling ()
>>> #6 0x00000000009ac829 in ompi_coll_tuned_allreduce_intra_dec_fixed ()
>>> #7 0x000000000097e271 in ompi_comm_allreduce_intra ()
>>> #8 0x000000000097dd06 in ompi_comm_nextcid ()
>>> #9 0x000000000097be01 in ompi_comm_dup ()
>>> #10 0x00000000009a0785 in PMPI_Comm_dup ()
>>> #11 0x000000000097931d in pmpi_comm_dup__ ()
>>> #12 0x0000000000644251 in zmumps (id=...) at zmumps_part1.F:144
>>> #13 0x00000000004c0d03 in sub_pbdirect_init (id=..., matrix_build=...)
>>> at sub_pbdirect_init.f90:44
>>> #14 0x0000000000628706 in fwt2d_elas_v2 () at fwt2d_elas.f90:1048
>>>
>>>
>>> the master waits further on:
>>>
>>> (gdb) where
>>> #0 0x00002b9dc9f3e113 in poll () from /lib/libc.so.6
>>> #1 0x0000000000adf52c in poll_dispatch ()
>>> #2 0x0000000000adcea3 in opal_event_loop ()
>>> #3 0x0000000000ad69f9 in opal_progress ()
>>> #4 0x000000000098f294 in ompi_request_default_wait_all ()
>>> #5 0x0000000000a06e56 in ompi_coll_tuned_sendrecv_actual ()
>>> #6 0x00000000009ab8e3 in ompi_coll_tuned_barrier_intra_bruck ()
>>> #7 0x00000000009ac926 in ompi_coll_tuned_barrier_intra_dec_fixed ()
>>> #8 0x00000000009a0b20 in PMPI_Barrier ()
>>> #9 0x0000000000978c93 in pmpi_barrier__ ()
>>> #10 0x00000000004c0dc4 in sub_pbdirect_init (id=..., matrix_build=...)
>>> at sub_pbdirect_init.f90:62
>>> #11 0x0000000000628706 in fwt2d_elas_v2 () at fwt2d_elas.f90:1048
>>>
>>>
>>> Remark :
>>> The same code compiled and run well with intel MPI library, from the
>>> same intel package, on the same nodes.
>>>
>>>
>> Did you try compiling with equivalent options in each compiler? For example, (supposing you had gcc 4.6)
>> gcc -O3 -funroll-loops --param max-unroll-times=2 -march=corei7
>> would be equivalent (as closely as I know) to
>> icc -fp-model source -msse4.2 -ansi-alias
>>
>> As you should be aware, default settings in icc are more closely equivalent to
>> gcc -O3 -ffast-math -fno-cx-limited-range -funroll-loops --param max-unroll-times=2 -fno-strict-aliasing
>>
>> The options I suggest as an upper limit are probably more aggressive than most people have used successfully with OpenMPI.
>>
>> As to run-time MPI options, Intel MPI has affinity with Westmere awareness turned on by default. I suppose testing without affinity settings, particularly when banging against all hyperthreads, is a more severe test of your application. Don't you get better results at 1 rank per core?
>> --
>> Tim Prince
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
> "To preserve the freedom of the human mind then and freedom of the press, every spirit should be ready to devote itself to martyrdom; for as long as we may think as we will, and speak as we think, the condition of man will proceed in improvement."
> -- Thomas Jefferson, 1799
>
>