Open MPI User's Mailing List Archives

Subject: [OMPI users] [ompi-1.4.2] Infiniband issue on smoky @ ornl
From: Mathieu Gontier (mathieu.gontier_at_[hidden])
Date: 2011-06-22 04:16:33


Dear all,

First of all, my apologies for posting this message to both the bug and
user mailing lists. For the moment, I do not know whether it is a bug.

I am running a structured CFD flow solver at ORNL, where I have access
to a small cluster (Smoky) that uses Open MPI 1.4.2 with InfiniBand by
default. We recently increased the size of our models, and since then
we have run into many InfiniBand-related problems. The most serious
one is a hard crash with the following error message:

[smoky45][[60998,1],32][/sw/sources/ompi/1.4.2/ompi/mca/btl/openib/connect/btl_openib_connect_oob.c:464:qp_create_one]
error creating qp errno says Cannot allocate memory

If we force the solver to use Ethernet (mpirun -mca btl ^openib), the
computation runs correctly, although very slowly (a single iteration
takes ages). Do you have any idea what could be causing these problems?
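
For reference, the fallback launch line has roughly the following shape. This is only a sketch: the solver path and the process count are placeholders, not our real values, and the script prints the command instead of running it.

```shell
#!/bin/sh
# Placeholders (assumptions): neither value is taken from the real job script.
SOLVER=./cfd_solver
NP=64

# "-mca btl ^openib" excludes the openib (InfiniBand) BTL, so Open MPI
# falls back to the remaining transports, which means TCP between nodes.
CMD="mpirun -np $NP -mca btl ^openib $SOLVER"

# Print the command rather than launching it, so it can be checked first.
echo "$CMD"
```

Removing the "-mca btl ^openib" part gives the default InfiniBand run, which is the one that crashes.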

If this is due to a bug or a limitation in Open MPI, do you think that
version 1.4.3, the upcoming 1.4.4, or any 1.5 release could solve the
problem? I read the release notes, but I did not see any obvious fix
that addresses it. The system administrator is ready to build a new
package for us, but I do not want to ask him to install too many of them.

Thanks.

-- 
Mathieu Gontier
skype: mathieu_gontier