One possibility: if you increase the number of processes in the job, and they all interconnect, then the IB interface can (I believe) run out of memory at some point. IIRC, the answer was to reduce the size of each queue pair (QP) so that you could support a larger number of them.

You should find info about controlling QP size in the IB FAQ area on the OMPI web site, I believe.
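To make the scaling argument concrete, here is a rough bash sketch of how the QP count on one node grows with job size. It is only an estimate under stated assumptions: on-node peers are assumed to talk over shared memory (so a process needs QPs only to processes on other nodes), and each remote connection is assumed to use a fixed number of QPs. `qps_per_node`, `qps_per_peer`, the 16 processes per node and the job sizes are hypothetical values, not Smoky's configuration or the openib BTL's defaults.

```shell
#!/usr/bin/env bash
# Rough estimate of InfiniBand queue pairs (QPs) held on one node when
# every MPI process ends up connected to every other process.
# ASSUMPTIONS (hypothetical, for illustration only):
#   - on-node peers use shared memory, so only remote peers need QPs
#   - each remote peer connection uses a fixed number of QPs (qps_per_peer)

# Usage: qps_per_node <total_procs> <procs_per_node> <qps_per_peer>
qps_per_node() {
  local total=$1 per_node=$2 qps_per_peer=$3
  # Each local process opens QPs to the (total - per_node) remote processes.
  echo $(( per_node * (total - per_node) * qps_per_peer ))
}

# Same node layout and QPs per peer, growing job size.
for total in 64 256 1024; do
  echo "procs=$total qps_per_node=$(qps_per_node "$total" 16 4)"
done
```

With the node layout fixed, the per-node QP count grows linearly with the total number of processes, so the memory behind those QPs grows with it; making each QP smaller is what lets more of them fit. The openib parameters available on a given install can be listed with `ompi_info --param btl openib`.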

On Jun 23, 2011, at 7:56 AM, Mathieu Gontier wrote:


Thanks for the answer.
I am testing with OpenMPI-1.4.3; my computation is currently queued. However, I did not see anything in the release notes obviously related to my issue. Have you seen anything that could solve it?
I am also going to submit my computation with --mca mpi_leave_pinned 0, but do you have any idea how it affects performance, compared to using Ethernet?

Many thanks for your support.

On 06/23/2011 03:01 PM, Josh Hursey wrote:
I wonder if this is related to memory pinning. Can you try turning off
leave-pinned mode and see whether the problem persists? (This may affect
performance, but should avoid the crash.)
  mpirun ... --mca mpi_leave_pinned 0 ...

Also, it looks like Smoky has a slightly newer version of the 1.4
branch that you should try to switch to if you can. The following
command will show you all of the available installs on that machine:
  shell$ module avail ompi

For a list of supported compilers for that version, try the 'show' option:
  shell$ module show ompi/1.4.3

module-whatis	 This module configures your environment to make Open MPI 1.4.3 available.
Supported Compilers:

Let me know if that helps.


On Wed, Jun 22, 2011 at 4:16 AM, Mathieu Gontier wrote:
Dear all,

First of all, my apologies for posting this message to both the bug
and users mailing lists, but for the moment I do not know whether it is a bug!

I am running a structured CFD flow solver at ORNL, and I have access to a
small cluster (Smoky) that uses OpenMPI-1.4.2 with InfiniBand by default.
Recently we increased the size of our models, and since then we have
run into many InfiniBand-related problems. The most serious is a
hard crash with the following error message:

error creating qp errno says Cannot allocate memory

If we force the solver to use Ethernet (mpirun -mca btl ^openib), the
computation works correctly, although very slowly (a single iteration takes
ages). Do you have any idea what could be causing these problems?

If it is due to a bug or a limitation in OpenMPI, do you think version
1.4.3, the upcoming 1.4.4, or any 1.5 version could solve the problem? I read
the release notes, but I did not see any obvious patch that would fix my
problem. The system administrator is ready to compile a new package for us,
but I do not want to ask for too many installs.


Mathieu Gontier
skype: mathieu_gontier
users mailing list
