Dear all,
First of all, my apologies for posting this message to both the bug
and user mailing lists, but at the moment I do not know whether it
is a bug!
I am running a CFD structured flow solver at ORNL, and I have
access to a small cluster (Smoky) that uses OpenMPI-1.4.2 with
InfiniBand by default. We recently increased the size of our models,
and since then we have run into many InfiniBand-related
problems. The most serious one is a hard crash with the
following error message:
[smoky45][[60998,1],32][/sw/sources/ompi/1.4.2/ompi/mca/btl/openib/connect/btl_openib_connect_oob.c:464:qp_create_one]
error creating qp errno says Cannot allocate memory
If we force the solver to use Ethernet (mpirun -mca btl ^openib),
the computation runs correctly, although very slowly (a single
iteration takes ages). Do you have any idea what could be causing
these problems?
If it is due to a bug or a limitation in OpenMPI, do you think
version 1.4.3, the upcoming 1.4.4, or any 1.5 release could solve the
problem? I read the release notes, but I did not see any obvious
fix for my problem. The system administrator is ready
to build a new package for us, but I do not want to ask for
too many of them.
Thanks.
--
Mathieu Gontier
skype: mathieu_gontier