
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] [ompi-1.4.2] Infiniband issue on smoky @ ornl
From: Mathieu Gontier (mathieu.gontier_at_[hidden])
Date: 2011-06-23 10:22:31


Thanks for your answer. It makes sense.
Sorry if my question seems silly, but what does QP mean? It is difficult
to read the FAQ without knowing that!


On 06/23/2011 04:00 PM, Ralph Castain wrote:
> One possibility: if you increase the number of processes in the job,
> and they all interconnect, then the IB interface can (I believe) run
> out of memory at some point. IIRC, the answer was to reduce the size
> of the QPs so that you could support a larger number of them.
> You should find info about controlling QP size in the IB FAQ area on
> the OMPI web site, I believe.
> On Jun 23, 2011, at 7:56 AM, Mathieu Gontier wrote:
>> Hello,
>> Thanks for the answer.
>> I am testing with OpenMPI-1.4.3: my computation is queuing. But I have
>> not seen anything obviously related to my issue. Have you read
>> something which could solve it?
>> I am going to submit my computation with --mca mpi_leave_pinned 0,
>> but do you have any idea how it affects the performance? Compared to
>> using Ethernet?
>> Many thanks for your support.
>> On 06/23/2011 03:01 PM, Josh Hursey wrote:
>>> I wonder if this is related to memory pinning. Can you try turning off
>>> the leave pinned, and see if the problem persists (this may affect
>>> performance, but should avoid the crash):
>>> mpirun ... --mca mpi_leave_pinned 0 ...
>>> Also it looks like Smoky has a slightly newer version of the 1.4
>>> branch that you should try to switch to if you can. The following
>>> command will show you all of the available installs on that machine:
>>> shell$ module avail ompi
>>> For a list of supported compilers for that version try the 'show' option:
>>> shell$ module show ompi/1.4.3
>>> -------------------------------------------------------------------
>>> /sw/smoky/modulefiles-centos/ompi/1.4.3:
>>> module-whatis This module configures your environment to make Open
>>> MPI 1.4.3 available.
>>> Supported Compilers:
>>> pathscale/3.2.99
>>> pathscale/3.2
>>> pgi/10.9
>>> pgi/10.4
>>> intel/11.1.072
>>> gcc/4.4.4
>>> gcc/4.4.3
>>> -------------------------------------------------------------------
>>> Let me know if that helps.
>>> Josh
>>> On Wed, Jun 22, 2011 at 4:16 AM, Mathieu Gontier
>>> <mathieu.gontier_at_[hidden]> wrote:
>>>> Dear all,
>>>> First of all, my apologies for posting this message to both the bug
>>>> and users mailing lists. For the moment, I do not know whether it is a bug!
>>>> I am running a CFD structured flow solver at ORNL, and I have access to a
>>>> small cluster (Smoky) using OpenMPI-1.4.2 with InfiniBand by default.
>>>> Recently we increased the size of our models, and since that time we have
>>>> run into many InfiniBand-related problems. The most serious problem is a
>>>> hard crash with the following error message:
>>>> [smoky45][[60998,1],32][/sw/sources/ompi/1.4.2/ompi/mca/btl/openib/connect/btl_openib_connect_oob.c:464:qp_create_one]
>>>> error creating qp errno says Cannot allocate memory
>>>> If we force the solver to use Ethernet (mpirun -mca btl ^openib) the
>>>> computation works correctly, although very slowly (a single iteration takes
>>>> ages). Do you have any idea what could be causing these problems?
>>>> If it is due to a bug or a limitation in OpenMPI, do you think version
>>>> 1.4.3, the coming 1.4.4 or any 1.5 version could solve the problem? I read
>>>> the release notes, but I did not see any obvious patch which could fix my
>>>> problem. The system administrator is ready to compile a new package for us,
>>>> but I do not want to ask for too many of them.
>>>> Thanks.
>>>> --
>>>> Mathieu Gontier
>>>> skype: mathieu_gontier
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>> --
>> Mathieu Gontier
>> skype: mathieu_gontier
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
> _______________________________________________
> users mailing list
> users_at_[hidden]

Mathieu Gontier
skype: mathieu_gontier
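
Ralph's point in the quoted reply is that a fully interconnected job can exhaust the memory available for QPs (queue pairs) as the process count grows. Some rough arithmetic makes this concrete. The sketch below is not Open MPI code and does not query the hardware. It only multiplies out a worst-case count, and every input is an assumption chosen for illustration: 16 processes per node, 4 QPs opened per remote peer, and every process talking to every process on other nodes. The helper name estimate_qps is hypothetical.

```shell
#!/bin/sh
# Rough upper bound on the number of InfiniBand queue pairs (QPs) one node
# must hold when every process connects to every process on other nodes.
# All numbers used here are illustrative assumptions, not measured values.

# estimate_qps TOTAL_PROCS PROCS_PER_NODE QPS_PER_PEER
estimate_qps() {
    total=$1
    per_node=$2
    qps_per_peer=$3
    # Only peers on other nodes are counted; peers on the same node are
    # assumed to communicate without going through the IB interface.
    remote_peers=$((total - per_node))
    echo $((per_node * remote_peers * qps_per_peer))
}

# Show how the per-node bound grows with the size of the job.
for n in 64 128 256 512; do
    echo "$n processes: up to $(estimate_qps "$n" 16 4) QPs per node"
done
```

Under these assumptions the per-node count grows linearly with the job size, so a limit that is never reached with a small model can be hit once the job is scaled up. Lowering either the number of QPs per peer or the resources reserved for each QP pushes that limit further out, which is the direction Ralph suggests looking at in the FAQ.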