
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] [ompi-1.4.2] Infiniband issue on smoky @ ornl
From: Samuel K. Gutierrez (samuel_at_[hidden])
Date: 2011-06-23 10:32:05


Hi,

What happens when you don't run with per-peer queue pairs? Try:

-mca btl_openib_receive_queues S,4096,128:S,12288,128:S,65536,128
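That parameter replaces the default per-peer (P) queue pairs with shared receive queues (S), so receive-buffer memory scales with the number of queue specifications rather than with the number of peers, which is what typically exhausts memory at larger process counts. A minimal sketch of the full invocation, assuming a hypothetical solver binary and process count (only the receive-queues value above comes from this thread):

```shell
# Sketch: run with shared receive queues (SRQs) only, instead of the
# default per-peer queue pairs. Each S entry is <type>,<buffer size in
# bytes>,<number of buffers>. Binary name and -np value are placeholders.
mpirun -np 64 \
    -mca btl_openib_receive_queues S,4096,128:S,12288,128:S,65536,128 \
    ./flow_solver
```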

--
Samuel K. Gutierrez
Los Alamos National Laboratory
On Jun 23, 2011, at 7:56 AM, Mathieu Gontier wrote:
> Hello, 
> 
> Thanks for the answer.
> I am testing with Open MPI 1.4.3: my computation is queued, but I did not read anything obviously related to my issue. Have you read anything that could solve it?
> I am going to submit my computation with --mca mpi_leave_pinned 0, but do you have any idea how it affects performance? Compared to using Ethernet?
> 
> Many thanks for your support. 
> 
> On 06/23/2011 03:01 PM, Josh Hursey wrote:
>> 
>> I wonder if this is related to memory pinning. Can you try turning off
>> the leave pinned, and see if the problem persists (this may affect
>> performance, but should avoid the crash):
>>   mpirun ... --mca mpi_leave_pinned 0 ...
>> 
>> Also it looks like Smoky has a slightly newer version of the 1.4
>> branch that you should try to switch to if you can. The following
>> command will show you all of the available installs on that machine:
>>   shell$ module avail ompi
>> 
>> For a list of supported compilers for that version try the 'show' option:
>> shell$ module show ompi/1.4.3
>> -------------------------------------------------------------------
>> /sw/smoky/modulefiles-centos/ompi/1.4.3:
>> 
>> module-whatis	 This module configures your environment to make Open
>> MPI 1.4.3 available.
>> Supported Compilers:
>>      pathscale/3.2.99
>>      pathscale/3.2
>>      pgi/10.9
>>      pgi/10.4
>>      intel/11.1.072
>>      gcc/4.4.4
>>      gcc/4.4.3
>> -------------------------------------------------------------------
>> 
>> Let me know if that helps.
>> 
>> Josh
>> 
>> 
>> On Wed, Jun 22, 2011 at 4:16 AM, Mathieu Gontier
>> <mathieu.gontier_at_[hidden]> wrote:
>>> Dear all,
>>> 
>>> First of all, my apologies for posting this message to both the bug
>>> and user mailing lists, but for the moment I do not know whether it is a bug!
>>> 
>>> I am running a CFD structured flow solver at ORNL, where I have access to a
>>> small cluster (Smoky) using Open MPI 1.4.2 with InfiniBand by default.
>>> Recently we increased the size of our models, and since then we have
>>> run into many InfiniBand-related problems. The most serious problem is a
>>> hard crash with the following error message:
>>> 
>>> [smoky45][[60998,1],32][/sw/sources/ompi/1.4.2/ompi/mca/btl/openib/connect/btl_openib_connect_oob.c:464:qp_create_one]
>>> error creating qp errno says Cannot allocate memory
>>> 
>>> If we force the solver to use Ethernet (mpirun -mca btl ^openib) the
>>> computation works correctly, although very slowly (a single iteration takes
>>> ages). Do you have any idea what could be causing these problems?
>>> 
>>> If it is due to a bug or a limitation in Open MPI, do you think version
>>> 1.4.3, the upcoming 1.4.4, or any 1.5 version could solve the problem? I read
>>> the release notes, but I did not see any obvious patch that could fix my
>>> problem. The system administrator is ready to compile a new package for us,
>>> but I do not want to ask him to install too many of them.
>>> 
>>> Thanks.
>>> --
>>> 
>>> Mathieu Gontier
>>> skype: mathieu_gontier
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> 
>> 
>> 
> 
> -- 
> 
> Mathieu Gontier 
> skype: mathieu_gontier