Thanks for your reply, but the program runs fine over the TCP interconnect with the same data size, and also over IB with a small data size, say 1 MB. So I don't think the problem is in Open MPI; it has something to do with the IB logic, which probably doesn't work well with threads.

I also tried the program with MPI_THREAD_SERIALIZED, but in vain.
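
As an aside on reading the error output quoted below: the "status number" values look like InfiniBand work completion status codes. Here is a minimal sketch, assuming the numbers correspond to libibverbs' enum ibv_wc_status. The texts are copied from the log itself, and wc_status_text is a hypothetical helper, not part of Open MPI or OFED:

```c
/* Hypothetical helper (not part of Open MPI, OFED, or libibverbs).
 * Maps the "status number" printed by btl_openib_component_progress
 * to the text that appears next to it in the quoted log.
 * Only the values seen in this thread, plus 0 for success, are covered;
 * the assumption is that they follow enum ibv_wc_status. */
static const char *wc_status_text(int status)
{
    switch (status) {
    case 0:  return "SUCCESS";
    case 1:  return "LOCAL LENGTH ERROR";
    case 4:  return "LOCAL PROTOCOL ERROR";
    case 5:  return "WORK REQUEST FLUSHED ERROR";
    default: return "UNKNOWN";   /* not seen in this thread */
    }
}
```

A WORK REQUEST FLUSHED ERROR (5) is usually a follow-on error: once a queue pair enters the error state, its outstanding work requests are flushed. So the entries with status 4 and 1 are probably the ones worth looking at first.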

When is version 1.3 scheduled to be released? Would it fix such issues?

Correct me if I am wrong.

-Neeraj

On Wed, 31 Oct 2007 05:31:32 -0700 Open MPI Users wrote

THREAD_MULTIPLE support does not work in the 1.2 series. Try turning it off.

On Oct 30, 2007, at 12:17 AM, Neeraj Chourasia wrote:

> Hi folks,
>
> I have been seeing some nasty behaviour in MPI_Send/Recv
> with a large dataset (8 MB) when OpenMP and Open MPI are used
> together over the IB interconnect. Attached is a program.
>
> The code first calls MPI_Init_thread(), followed by the OpenMP
> thread creation API. The program works fine if we do single-side
> communication [thread 0 of process 0 sending some data to any
> thread of process 1], but it hangs if both sides try to send
> data (8 MB) over the IB interconnect.
>
> Interestingly, the program works fine if we send
> short data (1 MB or below).
>
> I see this with
>
> openmpi-1.2 or openmpi-1.2.4 (compiled with --enable-mpi-threads)
> ofed 1.2
> 2.6.9-42.4sp.XCsmp
> icc (Intel Compiler)
>
> compiled as
> mpicc -O3 -openmp temp.c
> run as
> mpirun -np 2 -hostfile nodelist a.out
>
> The error I am getting is
>
> ----------------------------------------------------------------------
>
> [0,1,1][btl_openib_component.c:1199:btl_openib_component_progress] from n129 to: n115 error polling LP CQ with status LOCAL PROTOCOL ERROR status number 4 for wr_id 6391728 opcode 0
> [0,1,1][btl_openib_component.c:1199:btl_openib_component_progress] from n129 to: n115 error polling LP CQ with status WORK REQUEST FLUSHED ERROR status number 5 for wr_id 7058304 opcode 128
> [0,1,0][btl_openib_component.c:1199:btl_openib_component_progress] from n115 to: n129 [0,1,0][btl_openib_component.c:1199:btl_openib_component_progress] from n115 to: n129 error polling LP CQ with status WORK REQUEST FLUSHED ERROR status number 5 for wr_id 6854256 opcode 128
> error polling LP CQ with status LOCAL LENGTH ERROR status number 1 for wr_id 6920112 opcode 0
>
> ----------------------------------------------------------------------
>
> Is anyone else seeing something similar? Any ideas for workarounds?
> As a point of reference, the program works fine if we force
> Open MPI to select the TCP interconnect using --mca btl tcp,self.
>
> -Neeraj
>
> _______________________________________________
> users mailing list
> users@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users





--
Jeff Squyres
Cisco Systems




