
Open MPI User's Mailing List Archives


From: Neeraj Chourasia (neeraj_ch1_at_[hidden])
Date: 2007-11-01 00:52:08


Thanks for your reply, but the program runs fine over the TCP interconnect with the same data size, and also over IB with a small data size (say 1 MB). So I don't think the problem is in Open MPI; it has something to do with the IB logic, which probably doesn't work well with threads. I also tried the program with MPI_THREAD_SERIALIZED, but in vain.

When is version 1.3 scheduled to be released? Would it fix such issues?

Correct me if I am wrong.

-Neeraj

On Wed, 31 Oct 2007 05:31:32 -0700, Open MPI Users wrote:

THREAD_MULTIPLE support does not work in the 1.2 series. Try turning it off.

On Oct 30, 2007, at 12:17 AM, Neeraj Chourasia wrote:

> Hi folks,
>
> I have been seeing some nasty behaviour in MPI_Send/Recv with a large
> dataset (8 MB) when OpenMP and Open MPI are used together over the IB
> interconnect. Attached is a program.
>
> The code first calls MPI_Init_thread(), followed by the OpenMP thread
> creation API. The program works fine if only one side sends [thread 0
> of process 0 sending some data to any thread of process 1], but it
> hangs if both sides try to send data (8 MB) over the IB interconnect.
>
> Interestingly, the program works fine if we send short data (1 MB or
> below).
>
> I see this with
>
> openmpi-1.2 or openmpi-1.2.4 (compiled with --enable-mpi-threads)
> ofed 1.2
> 2.6.9-42.4sp.XCsmp
> icc (Intel Compiler)
>
> compiled as
> mpicc -O3 -openmp temp.c
> run as
> mpirun -np 2 -hostfile nodelist a.out
>
> The error I am getting is
>
> ----------------------------------------------------------------------
> [0,1,1][btl_openib_component.c:1199:btl_openib_component_progress] from n129 to: n115 error polling LP CQ with status LOCAL PROTOCOL ERROR status number 4 for wr_id 6391728 opcode 0
> [0,1,1][btl_openib_component.c:1199:btl_openib_component_progress] from n129 to: n115 error polling LP CQ with status WORK REQUEST FLUSHED ERROR status number 5 for wr_id 7058304 opcode 128
> [0,1,0][btl_openib_component.c:1199:btl_openib_component_progress] from n115 to: n129 [0,1,0][btl_openib_component.c:1199:btl_openib_component_progress] from n115 to: n129 error polling LP CQ with status WORK REQUEST FLUSHED ERROR status number 5 for wr_id 6854256 opcode 128
> error polling LP CQ with status LOCAL LENGTH ERROR status number 1 for wr_id 6920112 opcode 0
> ----------------------------------------------------------------------
>
> Is anyone else seeing something similar? Any ideas for workarounds?
> As a point of reference, the program works fine if we force Open MPI
> to select the TCP interconnect using --mca btl tcp,self.
>
> -Neeraj
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Jeff Squyres
Cisco Systems