Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Trouble with Memlock when using OpenIB on an SGI ICE Cluster
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2011-01-06 11:34:34


That might well be a good idea (create an MCA param for the number of send / receive CQEs).

It certainly seems that OMPI shouldn't be scaling *any* IB resource based on the number of peer processes without at least some kind of upper bound.

Perhaps an IB vendor should reply here...
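
For reference, if such a parameter existed it would presumably be set like any other openib BTL MCA parameter on the mpirun command line; the parameter name below is purely hypothetical, just to sketch the idea:

  # Hypothetical cap on the peer count used to size the reserved receive
  # descriptors -- NOT an existing Open MPI parameter:
  mpirun --mca btl openib,sm,self \
         --mca btl_openib_rd_rsv_peer_cap 32 \
         -np 8192 ./your_app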

On Dec 31, 2010, at 8:31 AM, Gilbert Grosdidier wrote:

> Bonjour,
>
> Back to this painful issue, partly because I found a workaround,
> and partly because I would like to help.
>
> The initial post was : http://www.open-mpi.org/community/lists/users/2010/11/14843.php
> where I reported the issue with OMPI 1.4.1; the behaviour is the same with 1.4.3.
>
> I spotted the culprit at line #274 of btl_openib.c, where I had to replace
> mca_btl_openib_component.qp_infos[qp].u.pp_qp.rd_rsv) * nprocs;
> with
> mca_btl_openib_component.qp_infos[qp].u.pp_qp.rd_rsv) * 32;
> mainly because nprocs = 4096 or 8192 in our case, which led to a huge
> memlock resource requirement.
>
> Since I don't believe there is a relevant MCA parameter to control this value precisely
> (am I wrong?), I would suggest adding such a switch.
>
> It happens to work because the number of peers for any given node (apart from rank 0) is very low,
> and it is definitely useful when all-to-all communication is not required on a big cluster.
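>
> As a rough back-of-the-envelope sketch of how this scales (the rd_rsv value
> and per-buffer size below are assumed placeholders, not the actual values
> from the receive-queue specification):
>
> NPROCS=8192
> RD_RSV=4         # assumed reserved receive descriptors per peer
> BUF_SIZE=65536   # assumed registered buffer size, in bytes
> echo $(( NPROCS * RD_RSV * BUF_SIZE / 1024 / 1024 )) MiB   # -> 2048 MiB per rank
>
> With only 4 GB of RAM per core, locking that much memory just for reserved
> receive buffers is clearly not sustainable, whereas a fixed cap keeps it small.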
>
> Could someone comment on this ?
>
> More info on request.
>
> Thanks, Happy New Year to you all, G.
>
>
>
> On 29/11/2010 at 16:58, Gilbert Grosdidier wrote:
>> Bonjour John,
>>
>> Thanks for your feedback, but my investigations so far have not helped:
>> the memlock limit on the compute nodes is actually set to unlimited.
>> This most probably means that even if the openib BTL hits some memory allocation
>> limit, the message I got is misleading, because the memlock resource is already unlimited.
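>>
>> (One way to double-check the limit actually seen by the launched MPI
>> processes, rather than by an interactive shell, is to run ulimit through
>> mpirun itself, e.g.:
>>
>> mpirun --pernode bash -c 'echo "$(hostname): $(ulimit -l)"'
>>
>> since limits set in the login environment are not always inherited by
>> daemon-launched jobs.)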
>>
>> My working assumption is then that the BTL allocation mechanism is stopped
>> by the memlock resource being exhausted, for example because the BTL is
>> attempting to create too many buffers. I tried to explore this assumption by decreasing:
>> - btl_ofud_rd_num down to 32 or even 16
>> - btl_openib_cq_size down to 256 or even 64
>> but to no avail.
>>
>> So, I am asking for help about which other parameters could lead to (locked?) memory exhaustion,
>> knowing that the current memlock wall shows up when:
>> - I run with 4096 or 8192 cores (with 2048, everything is fine)
>> - there are 4 GB of RAM available per core
>> - each core communicates with no more than 8 neighbours, and those neighbours
>> stay the same for the whole lifetime of the job.
>>
>> Does this trigger any ideas for anyone?
>>
>>
>> Thanks in advance, Best, Gilbert.
>>
>>
>> On 20 Nov 2010 at 19:27, John Hearns wrote:
>>
>>> On 20 November 2010 16:31, Gilbert Grosdidier wrote:
>>>> Bonjour,
>>>
>>> Bonjour Gilbert.
>>>
>>> I manage ICE clusters also.
>>>
>>> Please could you have a look at /etc/init.d/pbs on the compute blades?
>>>
>>>
>>>
>>> Do you have something like:
>>>
>>> if [ "${PBS_START_MOM}" -gt 0 ] ; then
>>>     if check_prog "mom" ; then
>>>         echo "PBS mom already running."
>>>     else
>>>         check_maxsys
>>>         site_mom_startup
>>>         if [ -f /etc/sgi-release -o -f /etc/sgi-compute-node-release ] ; then
>>>             MEMLOCKLIM=`ulimit -l`
>>>             NOFILESLIM=`ulimit -n`
>>>             STACKLIM=`ulimit -s`
>>>             ulimit -l unlimited
>>>             ulimit -n 16384
>>>             ulimit -s unlimited
>>>         fi
>>
>>
>
> --
> Cordialement, Gilbert.
>
> --
> *---------------------------------------------------------------------*
> Gilbert Grosdidier
> Gilbert.Grosdidier_at_[hidden]
>
> LAL / IN2P3 / CNRS Phone : +33 1 6446 8909
> Faculté des Sciences, Bat. 200 Fax : +33 1 6446 8546
> B.P. 34, F-91898 Orsay Cedex (FRANCE)
> *---------------------------------------------------------------------*
>

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/