Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Re :Re: OpenMP and OpenMPI Issue
From: Jack Galloway (jackg_at_[hidden])
Date: 2012-07-20 13:33:06

Jeff Squyres <jsquyres <at>> writes:

> On Oct 31, 2007, at 9:52 PM, Neeraj Chourasia wrote:
> > but the program runs on the TCP interconnect with the same
> > data size, and also on IB with a small data size, say 1MB. So I don't
> > think the problem is in Open MPI; it has something to do with the IB
> > logic, which probably doesn't work well with threads.
> Open MPI's TCP nominally supports threads, but I'd be surprised if it
> works consistently (i.e., it has not been tested thoroughly). The
> Open MPI IB code definitely does not yet work with threads.
> > I also tried the program with MPI_THREAD_SERIALIZED, but in vain.
> Open MPI currently treats this as no different than THREAD_SINGLE;
> the problem is that you'll still have multiple different threads
> calling MPI simultaneously with your program.
> > When is the version 1.3 scheduled to be released? Would it fix
> > such issues?
> No. We had been planning to make THREAD_MULTIPLE support available
> in the 1.3 series, but there honestly has not been enough customer
> demand for it to justify putting in the resources / spending the
> time to finish it in Open MPI. THREAD_MULTIPLE is
> still on the long-term roadmap, but it will not be included in the
> 1.4 series.

This is an old thread, but I'm curious whether this is supported now. I have
a large hybrid MPI/OpenMP code that is having trouble over our InfiniBand
network. I'm running a fairly large problem (it uses about 18GB), and partway
in, I get the following errors:

[[929,1],0][btl_openib_component.c:3238:handle_wc] from tebow to: tebow416 error
polling LP CQ with status LOCAL LENGTH ERROR status number 1 for wr_id 103761776
opcode 128 vendor error 105 qp_idx 3
mpirun has exited due to process rank 0 with PID 29873 on
node tebow exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).

This seems very similar to the question that started this thread. Since we're
now on version 1.4.5, I was wondering whether there is any better help for this
(compiler options, run-time flags, or anything else), or whether someone has
encountered this problem and solved it.