Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] [ompi-1.4.2] Infiniband issue on smoky @ ornl
From: Ralph Castain (rhc_at_[hidden])
Date: 2011-06-23 10:00:59


One possibility: if you increase the number of processes in the job, and they all interconnect, then the IB interface can (I believe) run out of memory at some point. IIRC, the answer was to reduce the size of the QPs so that you could support a larger number of them.

You should find info about controlling QP size in the IB FAQ area on the OMPI web site, I believe.
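Here is a rough calculation that makes the scaling argument above concrete. It assumes, hypothetically, that every process posts a fixed number of receive buffers on a dedicated queue pair (QP) to every other process. The buffer size and count are illustrative values, not Open MPI's actual defaults. Running `ompi_info --param btl openib` lists the real openib BTL parameters on a given install. Under this assumption, memory per process grows linearly with job size, and shrinking either the buffer size or the number of buffers per QP lowers it proportionally.

```shell
#!/bin/sh
# Hypothetical estimate of receive-buffer memory one process needs when it
# keeps a per-peer QP to every other process in the job.
# BUF_BYTES and BUFS_PER_QP are ILLUSTRATIVE ASSUMPTIONS, not Open MPI defaults.

BUF_BYTES=65536      # assumed size of one receive buffer (64 KiB)
BUFS_PER_QP=256      # assumed number of receive buffers posted per QP

# Print the estimated MiB of receive buffers for a job of $1 processes.
per_process_mib() {
    nprocs=$1
    peers=$((nprocs - 1))                       # one QP per remote peer
    bytes=$((peers * BUFS_PER_QP * BUF_BYTES))  # total posted buffer bytes
    echo $((bytes / 1048576))                   # convert bytes to MiB
}

for n in 16 64 256 1024; do
    echo "$n processes: ~$(per_process_mib "$n") MiB of receive buffers per process"
done
```

With these made-up numbers, each peer costs 16 MiB, so a 1024-process job would need about 16 GiB per process for receive buffers alone. Halving `BUFS_PER_QP` or `BUF_BYTES` halves that figure, which is the effect of the "smaller QPs" advice.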

On Jun 23, 2011, at 7:56 AM, Mathieu Gontier wrote:

> Hello,
>
> Thanks for the answer.
> I am testing with OpenMPI-1.4.3; my computation is queued. However, I did not see anything in the release notes obviously related to my issue. Have you seen anything that could solve it?
> I am going to submit my computation with --mca mpi_leave_pinned 0, but do you have any idea how it affects performance, compared to using Ethernet?
>
> Many thanks for your support.
>
> On 06/23/2011 03:01 PM, Josh Hursey wrote:
>>
>> I wonder if this is related to memory pinning. Can you try turning off
>> leave pinned and see if the problem persists? This may affect
>> performance, but should avoid the crash:
>> mpirun ... --mca mpi_leave_pinned 0 ...
>>
>> Also it looks like Smoky has a slightly newer version of the 1.4
>> branch that you should try to switch to if you can. The following
>> command will show you all of the available installs on that machine:
>> shell$ module avail ompi
>>
>> For a list of supported compilers for that version try the 'show' option:
>> shell$ module show ompi/1.4.3
>> -------------------------------------------------------------------
>> /sw/smoky/modulefiles-centos/ompi/1.4.3:
>>
>> module-whatis This module configures your environment to make Open
>> MPI 1.4.3 available.
>> Supported Compilers:
>> pathscale/3.2.99
>> pathscale/3.2
>> pgi/10.9
>> pgi/10.4
>> intel/11.1.072
>> gcc/4.4.4
>> gcc/4.4.3
>> -------------------------------------------------------------------
>>
>> Let me know if that helps.
>>
>> Josh
>>
>>
>> On Wed, Jun 22, 2011 at 4:16 AM, Mathieu Gontier
>> <mathieu.gontier_at_[hidden]> wrote:
>>> Dear all,
>>>
>>> First of all, my apologies for posting this message to both the bug
>>> and user mailing lists, but for the moment I do not know if it is a bug!
>>>
>>> I am running a structured CFD flow solver at ORNL, and I have access to a
>>> small cluster (Smoky) that uses OpenMPI-1.4.2 with InfiniBand by default.
>>> Recently we increased the size of our models, and since then we have
>>> run into many InfiniBand-related problems. The most serious is a
>>> hard crash with the following error message:
>>>
>>> [smoky45][[60998,1],32][/sw/sources/ompi/1.4.2/ompi/mca/btl/openib/connect/btl_openib_connect_oob.c:464:qp_create_one]
>>> error creating qp errno says Cannot allocate memory
>>>
>>> If we force the solver to use Ethernet (mpirun -mca btl ^openib), the
>>> computation works correctly, although very slowly (a single iteration takes
>>> ages). Do you have any idea what could be causing these problems?
>>>
>>> If it is due to a bug or a limitation in OpenMPI, do you think version
>>> 1.4.3, the upcoming 1.4.4, or any 1.5 version could solve the problem? I read
>>> the release notes, but I did not see any obvious patch that would fix my
>>> problem. The system administrator is ready to compile a new package for us,
>>> but I do not want to ask for too many installs.
>>>
>>> Thanks.
>>> --
>>>
>>> Mathieu Gontier
>>> skype: mathieu_gontier
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>
>>
>
> --
>
> Mathieu Gontier
> skype: mathieu_gontier
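The two workarounds tried in this thread (`--mca mpi_leave_pinned 0` and `-mca btl ^openib`) can also be set through the environment. Open MPI reads an MCA parameter from a variable named `OMPI_MCA_<parameter name>`, and a value given with `--mca` on the `mpirun` command line takes precedence over it. The sketch below wraps each workaround in a shell function. The function names, the process count, and the `./solver` binary are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch: select one of the two workarounds from this thread via
# OMPI_MCA_<name> environment variables instead of --mca flags.

# Workaround 1: keep InfiniBand, but disable leave-pinned registration.
use_ib_without_leave_pinned() {
    unset OMPI_MCA_btl                    # let Open MPI pick BTLs as usual
    export OMPI_MCA_mpi_leave_pinned=0
}

# Workaround 2: exclude the openib BTL entirely (falls back to TCP/Ethernet).
use_ethernet_only() {
    unset OMPI_MCA_mpi_leave_pinned
    export OMPI_MCA_btl='^openib'         # quoted so the shell keeps the caret
}

use_ib_without_leave_pinned
# mpirun -np 64 ./solver                  # hypothetical job launch
```

Switching between the two then only requires calling the other function before `mpirun`, which keeps the job script's launch line unchanged.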