
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] MPI_COMM_DUP freeze with OpenMPI 1.4.1
From: Tim Prince (n8tm_at_[hidden])
Date: 2011-05-10 11:10:51


On 5/10/2011 6:43 AM, francoise.roch_at_[hidden] wrote:
>
> Hi,
>
> I compiled a parallel program with OpenMPI 1.4.1 (built with the Intel
> 12 compilers from the Composer XE package). This program is linked against
> the MUMPS library 4.9.2, compiled with the same compilers and linked with
> Intel MKL. The OS is Debian Linux.
> There is no error while compiling or launching the job, but the program
> freezes inside a call to the "zmumps" routine, when the slave processes
> call the MPI_COMM_DUP routine.
>
> The program is executed on 2 nodes of 12 cores each (Westmere
> processors) with the following command:
>
> mpirun -np 24 --machinefile $OAR_NODE_FILE -mca plm_rsh_agent "oarsh"
> --mca btl self,openib -x LD_LIBRARY_PATH ./prog
>
> We have 12 processes running on each node. We submit the job with the OAR
> batch scheduler (the $OAR_NODE_FILE variable and the "oarsh" command are
> specific to this scheduler and usually work well with Open MPI).
>
> Via gdb, on the slaves, we can see that they are blocked in MPI_COMM_DUP:
>
> (gdb) where
> #0 0x00002b32c1533113 in poll () from /lib/libc.so.6
> #1 0x0000000000adf52c in poll_dispatch ()
> #2 0x0000000000adcea3 in opal_event_loop ()
> #3 0x0000000000ad69f9 in opal_progress ()
> #4 0x0000000000a34b4e in mca_pml_ob1_recv ()
> #5 0x00000000009b0768 in
> ompi_coll_tuned_allreduce_intra_recursivedoubling ()
> #6 0x00000000009ac829 in ompi_coll_tuned_allreduce_intra_dec_fixed ()
> #7 0x000000000097e271 in ompi_comm_allreduce_intra ()
> #8 0x000000000097dd06 in ompi_comm_nextcid ()
> #9 0x000000000097be01 in ompi_comm_dup ()
> #10 0x00000000009a0785 in PMPI_Comm_dup ()
> #11 0x000000000097931d in pmpi_comm_dup__ ()
> #12 0x0000000000644251 in zmumps (id=...) at zmumps_part1.F:144
> #13 0x00000000004c0d03 in sub_pbdirect_init (id=..., matrix_build=...)
> at sub_pbdirect_init.f90:44
> #14 0x0000000000628706 in fwt2d_elas_v2 () at fwt2d_elas.f90:1048
>
>
> the master is waiting further on:
>
> (gdb) where
> #0 0x00002b9dc9f3e113 in poll () from /lib/libc.so.6
> #1 0x0000000000adf52c in poll_dispatch ()
> #2 0x0000000000adcea3 in opal_event_loop ()
> #3 0x0000000000ad69f9 in opal_progress ()
> #4 0x000000000098f294 in ompi_request_default_wait_all ()
> #5 0x0000000000a06e56 in ompi_coll_tuned_sendrecv_actual ()
> #6 0x00000000009ab8e3 in ompi_coll_tuned_barrier_intra_bruck ()
> #7 0x00000000009ac926 in ompi_coll_tuned_barrier_intra_dec_fixed ()
> #8 0x00000000009a0b20 in PMPI_Barrier ()
> #9 0x0000000000978c93 in pmpi_barrier__ ()
> #10 0x00000000004c0dc4 in sub_pbdirect_init (id=..., matrix_build=...)
> at sub_pbdirect_init.f90:62
> #11 0x0000000000628706 in fwt2d_elas_v2 () at fwt2d_elas.f90:1048
>
>
> Remark:
> The same code compiled and ran well with the Intel MPI library, from the
> same Intel package, on the same nodes.
>
Did you try compiling with equivalent options in each compiler? For
example (supposing you had gcc 4.6),
gcc -O3 -funroll-loops --param max-unroll-times=2 -march=corei7
would be equivalent (as closely as I know) to
icc -fp-model source -msse4.2 -ansi-alias

As you should be aware, default settings in icc are more closely
equivalent to
gcc -O3 -ffast-math -fno-cx-limited-range -funroll-loops --param
max-unroll-times=2 -fno-strict-aliasing
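
If you want to rule out optimization differences, one experiment would be
to rebuild the application with the more cautious Intel options above and
rerun.  A sketch only -- the -O2 level is my own conservative choice, the
source file names are just the ones visible in your backtrace, and your
real Makefile will have its own link line for MUMPS and MKL:

   FCFLAGS="-O2 -fp-model source -msse4.2 -ansi-alias"
   mpif90 $FCFLAGS -c sub_pbdirect_init.f90 fwt2d_elas.f90
   mpif90 -o prog *.o  <your existing MUMPS/MKL link flags>

then see whether the behavior in MPI_COMM_DUP changes at all.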

The options I suggest as an upper limit are probably more aggressive
than most people have used successfully with OpenMPI.

As to run-time MPI options, Intel MPI has affinity with Westmere
awareness turned on by default. I suppose testing without affinity
settings, particularly when banging against all hyperthreads, is a more
severe test of your application. Don't you get better results at 1
rank per core?
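
If you want to try explicit pinning with Open MPI 1.4.x, a sketch based on
your existing command line (mpi_paffinity_alone is the MCA parameter I
would try first; check with ompi_info on your installation that it is
available there):

   mpirun -np 24 --machinefile $OAR_NODE_FILE -mca plm_rsh_agent "oarsh" \
          --mca btl self,openib --mca mpi_paffinity_alone 1 \
          -x LD_LIBRARY_PATH ./prog

With 12 ranks on each 12-core node the ranks at least stay pinned instead
of migrating; whether that lands exactly one rank per physical core depends
on how the logical processors are numbered on your nodes.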

-- 
Tim Prince