
Open MPI User's Mailing List Archives


From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-10-31 08:31:32


MPI_THREAD_MULTIPLE support does not work in the Open MPI 1.2 series.
Try turning it off.

On Oct 30, 2007, at 12:17 AM, Neeraj Chourasia wrote:

> Hi folks,
>
> I have been seeing some nasty behaviour in MPI_Send/Recv
> with large datasets (8 MB) when OpenMP and Open MPI are used
> together over an IB interconnect. A program is attached.
>
> The code first calls MPI_Init_thread() and then the OpenMP
> thread creation API. The program works fine if communication goes
> in one direction only [thread 0 of process 0 sending some data to
> any thread of process 1], but it hangs if both sides try to send
> data (8 MB) over the IB interconnect.
>
> Interestingly, the program works fine if we send
> smaller messages (1 MB or below).
>
> I see this with
>
> openmpi-1.2 or openmpi-1.2.4 (compiled with --enable-mpi-threads)
> ofed 1.2
> 2.6.9-42.4sp.XCsmp
> icc (Intel Compiler)
>
> compiled as
> mpicc -O3 -openmp temp.c
> run as
> mpirun -np 2 -hostfile nodelist a.out
>
> The error I am getting is
>
> ----------------------------------------------------------------------
>
> [0,1,1][btl_openib_component.c:
> 1199:btl_openib_component_progress] from n129 to: n115 error
> polling LP CQ with status LOCAL PROTOCOL ERROR status number 4 for
> wr_id 6391728 opcode 0
> [0,1,1][btl_openib_component.c:1199:btl_openib_component_progress]
> from n129 to: n115 error polling LP CQ with status WORK REQUEST
> FLUSHED ERROR status number 5 for wr_id 7058304 opcode 128
> [0,1,0][btl_openib_component.c:1199:btl_openib_component_progress]
> from n115 to: n129 [0,1,0][btl_openib_component.c:
> 1199:btl_openib_component_progress] from n115 to: n129 error
> polling LP CQ with status WORK REQUEST FLUSHED ERROR status number
> 5 for wr_id 6854256 opcode 128
> error polling LP CQ with status LOCAL LENGTH ERROR status number 1
> for wr_id 6920112 opcode 0
>
>
> ----------------------------------------------------------------------
>
>
> Is anyone else seeing something similar? Any ideas for workarounds?
> As a point of reference, the program works fine if we force
> Open MPI to select the TCP interconnect using --mca btl tcp,self.
>
> -Neeraj
>
> <temp.c>

-- 
Jeff Squyres
Cisco Systems