Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] MPI_THREAD_MULTIPLE testing on trunk
From: Paul Kapinos (kapinos_at_[hidden])
Date: 2012-08-03 07:43:22


Christopher,
I cannot reproduce your problem on my freshly installed 1.6.1rc2. I've used the
attached program, which is essentially your test case with a few modifications
in order to make it compilable.
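
For reference, a minimal sketch of what such a test program could look like,
reconstructed from the output below (the actual attachment may differ in
details; the print format and hostname handling here are my guesses):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, size, namelen;
    char hostname[MPI_MAX_PROCESSOR_NAME];

    /* Request full thread support; 'provided' reports what the library
     * actually grants (MPI_THREAD_MULTIPLE is 3 in Open MPI's headers). */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(hostname, &namelen);

    printf("%s(%i) of %i provided=(%i)\n", hostname, rank, size, provided);

    /* The barrier is where the hang was reported when preconnect is off. */
    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}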

But what I do see is that there seems to be a small timeout somewhere in the
initialization stage: if you start processes on nodes in another IB island
without explicitly defining which interface has to be used for the startup
communication, it hangs for some 20 seconds. (I think Open MPI tries to
communicate over unconnected Ethernet interfaces and runs into a timeout.)
Thus we use this:
-mca oob_tcp_if_include ib0 -mca btl_tcp_if_include ib0

Nevertheless, I cannot reproduce your initial issue with 1.6.1rc2 in our
environment.

Best
Paul Kapinos

$ time /opt/MPI/openmpi-1.6.1rc2mt/linux/intel/bin/mpiexec -mca
oob_tcp_if_include ib0 -mca btl_tcp_if_include ib0 -np 4 -H
linuxscc005,linuxscc004 a.out
linuxscc004.rz.RWTH-Aachen.DE(3) of 4 provided=(3)
linuxscc005.rz.RWTH-Aachen.DE(0) of 4 provided=(3)
linuxscc004.rz.RWTH-Aachen.DE(1) of 4 provided=(3)
linuxscc005.rz.RWTH-Aachen.DE(2) of 4 provided=(3)
/opt/MPI/openmpi-1.6.1rc2mt/linux/intel/bin/mpiexec -mca oob_tcp_if_include
0.06s user 0.09s system 9% cpu 1.608 total

$ time /opt/MPI/openmpi-1.6.1rc2mt/linux/intel/bin/mpiexec -np 4 -H
linuxscc005,linuxscc004 a.out
linuxscc004.rz.RWTH-Aachen.DE(1) of 4 provided=(3)
linuxscc004.rz.RWTH-Aachen.DE(3) of 4 provided=(3)
linuxscc005.rz.RWTH-Aachen.DE(0) of 4 provided=(3)
linuxscc005.rz.RWTH-Aachen.DE(2) of 4 provided=(3)
/opt/MPI/openmpi-1.6.1rc2mt/linux/intel/bin/mpiexec -np 4 -H a.out 0.04s user
0.10s system 0% cpu 23.600 total

On 08/03/12 09:29, Christopher Yeoh wrote:
> I've narrowed it down to a very simple test case
> (you don't need to explicitly spawn any threads).
> Just need a program like:
....
> If it's run with "--mpi-preconnect_mpi 1" then it hangs in MPI_Init_thread. If not,
> then it hangs in MPI_Barrier. Get a backtrace that looks like this (with the former):

-- 
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915