
Open MPI User's Mailing List Archives


From: Neeraj Chourasia (neeraj_ch1_at_[hidden])
Date: 2007-10-30 03:17:08


Hi folks,

I have been seeing some nasty behaviour in MPI_Send/Recv with a large dataset (8 MB) when OpenMP and Open MPI are used together over the IB interconnect. Attached is a program that reproduces it.

The code first calls MPI_Init_thread(), followed by the OpenMP thread-creation API. The program works fine if the communication is one-directional [thread 0 of process 0 sending some data to any thread of process 1], but it hangs if both sides try to send data (8 MB) over the IB interconnect.

Interestingly, the program works fine if we send short data (1 MB or below).
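The attached temp.c is not reproduced here; the following is only a rough sketch of the pattern described above (thread count, tags, message ordering, and buffer handling are illustrative and may differ from the real attachment):

    /*
     * Rough sketch only -- not the actual attached temp.c.
     * Two ranks, each spawning OpenMP threads; every thread exchanges an
     * 8 MB message with the same-numbered thread on the peer rank.
     */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define MSG_BYTES (8 * 1024 * 1024)   /* 8 MB per message */
    #define NTHREADS  4                   /* illustrative thread count */

    int main(int argc, char **argv)
    {
        int provided, rank;

        /* Request full thread support before creating OpenMP threads. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        if (provided < MPI_THREAD_MULTIPLE) {
            fprintf(stderr, "MPI_THREAD_MULTIPLE not provided (got %d)\n", provided);
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        #pragma omp parallel num_threads(NTHREADS)
        {
            int tid  = omp_get_thread_num();
            int peer = 1 - rank;                 /* assumes exactly 2 ranks */
            char *sendbuf = malloc(MSG_BYTES);
            char *recvbuf = malloc(MSG_BYTES);
            memset(sendbuf, tid, MSG_BYTES);

            /* Both ranks send 8 MB from every thread (ordered so the hang
             * cannot be blamed on a plain send/send deadlock); this
             * bidirectional large-message case is the one that hangs. */
            if (rank == 0) {
                MPI_Send(sendbuf, MSG_BYTES, MPI_CHAR, peer, tid, MPI_COMM_WORLD);
                MPI_Recv(recvbuf, MSG_BYTES, MPI_CHAR, peer, tid, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else {
                MPI_Recv(recvbuf, MSG_BYTES, MPI_CHAR, peer, tid, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(sendbuf, MSG_BYTES, MPI_CHAR, peer, tid, MPI_COMM_WORLD);
            }

            free(sendbuf);
            free(recvbuf);
        }

        MPI_Finalize();
        return 0;
    }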
I see this with:

    openmpi-1.2 or openmpi-1.2.4 (compiled with --enable-mpi-threads)
    OFED 1.2
    kernel 2.6.9-42.4sp.XCsmp
    icc (Intel Compiler)

Compiled as:

    mpicc -O3 -openmp temp.c

Run as:

    mpirun -np 2 -hostfile nodelist a.out

The error I am getting is:

------------------------------------------------------------------------
[0,1,1][btl_openib_component.c:1199:btl_openib_component_progress] from n129 to: n115 error polling LP CQ with status LOCAL PROTOCOL ERROR status number 4 for wr_id 6391728 opcode 0
[0,1,1][btl_openib_component.c:1199:btl_openib_component_progress] from n129 to: n115 error polling LP CQ with status WORK REQUEST FLUSHED ERROR status number 5 for wr_id 7058304 opcode 128
[0,1,0][btl_openib_component.c:1199:btl_openib_component_progress] from n115 to: n129 error polling LP CQ with status LOCAL LENGTH ERROR status number 1 for wr_id 6920112 opcode 0
[0,1,0][btl_openib_component.c:1199:btl_openib_component_progress] from n115 to: n129 error polling LP CQ with status WORK REQUEST FLUSHED ERROR status number 5 for wr_id 6854256 opcode 128
------------------------------------------------------------------------

Is anyone else seeing something similar? Any ideas for workarounds?

As a point of reference, the program works fine if we force Open MPI to select the TCP interconnect using --mca btl tcp,self.

-Neeraj
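P.S. For completeness, the run that works is the same command forced onto the TCP BTL, e.g.:

    mpirun --mca btl tcp,self -np 2 -hostfile nodelist a.out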