Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Trouble with Memlock when using OpenIB on an SGI ICE Cluster
From: Gilbert Grosdidier (Gilbert.Grosdidier_at_[hidden])
Date: 2010-12-31 08:31:23


Hello,

  Back to this painful issue, partly because I found a workaround,
and partly because I would like to help.

  The initial post was:
http://www.open-mpi.org/community/lists/users/2010/11/14843.php
where I reported the problem with OMPI 1.4.1; it is the same with 1.4.3.

  I tracked the culprit down to line #274 of btl_openib.c, where I had to
replace
mca_btl_openib_component.qp_infos[qp].u.pp_qp.rd_rsv) * nprocs;
with
mca_btl_openib_component.qp_infos[qp].u.pp_qp.rd_rsv) * 32;
mainly because nprocs is 4096 or 8192 in our case, which leads to a huge
memlock resource requirement.
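
  To make the scaling concrete, here is a tiny back-of-the-envelope
sketch of the arithmetic I have in mind. It is NOT the actual btl_openib
accounting, and the qps, rd_rsv and buffer-size numbers in it are made
up purely for illustration; the only point is that the reserved,
registered (hence memlocked) receive buffers grow linearly with nprocs:

    #include <stdio.h>

    /* Rough model only: each per-peer (pp) QP reserves rd_rsv receive
     * buffers for every peer, so the registered buffers scale with the
     * number of peers.  All constants are illustrative, not OMPI defaults. */
    static double reserved_mib(long peers, long qps, long rd_rsv, long buf_bytes)
    {
        return (double)qps * rd_rsv * buf_bytes * peers / (1024.0 * 1024.0);
    }

    int main(void)
    {
        long qps = 4, rd_rsv = 8, buf_bytes = 65536;  /* illustrative only */

        printf("nprocs = 8192 : %8.0f MiB locked per rank\n",
               reserved_mib(8192, qps, rd_rsv, buf_bytes));
        printf("capped at 32  : %8.0f MiB locked per rank\n",
               reserved_mib(32, qps, rd_rsv, buf_bytes));
        return 0;
    }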

  Since I don't believe there is an MCA parameter that controls this
value precisely (am I wrong?), I would suggest adding such a switch.
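
  For what it is worth, here is a minimal sketch of what such a switch
might look like, assuming the 1.4-era mca_base_param_reg_int() helper
from opal/mca/base/mca_base_param.h; the parameter name
(btl_openib_max_rd_rsv_peers) and the way it would be folded into the
line #274 computation are only my proposal, not existing Open MPI code:

    /* Hypothetical sketch, not existing Open MPI code: register a cap
     * on the peer count used when sizing the pp_qp rd_rsv reservation,
     * instead of the hard-coded 32 in my workaround.  This would live
     * in btl_openib_mca.c, which already includes the needed headers. */
    #include "opal/mca/base/mca_base_param.h"
    #include "btl_openib.h"

    static int max_rd_rsv_peers;

    static void register_max_rd_rsv_peers(void)
    {
        /* default 0 = keep today's behaviour and use the full nprocs */
        mca_base_param_reg_int(&mca_btl_openib_component.super.btl_version,
                               "max_rd_rsv_peers",
                               "Cap on the peer count used to size the per-peer "
                               "rd_rsv reservation (0 = use the number of procs)",
                               false, false, 0, &max_rd_rsv_peers);
    }

    /* ...and around line #274, something along the lines of:
     *
     *   int peers = (max_rd_rsv_peers > 0 && max_rd_rsv_peers < nprocs)
     *                   ? max_rd_rsv_peers : nprocs;
     *   ... mca_btl_openib_component.qp_infos[qp].u.pp_qp.rd_rsv) * peers;
     */

  Users could then keep the default behaviour (safe for all-to-all) and
only set something like "--mca btl_openib_max_rd_rsv_peers 32" on jobs
with a sparse, fixed communication pattern such as ours.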

  This happens to work because the number of peers for a given node
(apart from rank 0) is very low, and it is definitely useful when
all-to-all communication is not required on a big cluster.

  Could someone comment on this?

  More info on request.

  Thanks, Happy New Year to you all, G.

On 29/11/2010 at 16:58, Gilbert Grosdidier wrote:
> Hello John,
>
> Thanks for your feedback, but my investigations so far did not help:
> the memlock limit on the compute nodes is actually set to unlimited.
> This most probably means that even if btl_openib hits some memory
> allocation limit, the message I got is inaccurate, because the memlock
> resource is indeed already unlimited.
>
> In that case, the BTL allocation mechanism seems to be stopped by the
> memlock resource being exhausted, presumably because it is attempting
> to create too many buffers. I tried to explore this assumption by
> decreasing:
> - btl_ofud_rd_num down to 32 or even 16
> - btl_openib_cq_size down to 256 or even 64
> but to no avail.
>
> So, I am asking for help about which other parameters could lead to
> (locked?) memory exhaustion, knowing that the current memlock wall
> shows up when:
> - I run with 4096 or 8192 cores (with 2048, it is fine)
> - there are 4 GB of RAM available per core
> - each core communicates with no more than 8 neighbours, and they
> stay the same for the whole lifetime of the job.
>
> Does this trigger any ideas for anyone?
>
>
> Thanks in advance, Best, Gilbert.
>
>
> On 20 Nov 2010 at 19:27, John Hearns wrote:
>
>> On 20 November 2010 16:31, Gilbert Grosdidier wrote:
>>> Hello,
>>
>> Hello Gilbert.
>>
>> I manage ICE clusters also.
>>
>> Please could you have a look at /etc/init.d/pbs on the compute blades?
>>
>>
>>
>> Do you have something like:
>>
>> if [ "${PBS_START_MOM}" -gt 0 ] ; then
>> if check_prog "mom" ; then
>> echo "PBS mom already running."
>> else
>> check_maxsys
>> site_mom_startup
>> if [ -f /etc/sgi-release -o -f /etc/sgi-compute-node-release ]
>> ; then
>> MEMLOCKLIM=`ulimit -l`
>> NOFILESLIM=`ulimit -n`
>> STACKLIM=`ulimit -s`
>> ulimit -l unlimited
>> ulimit -n 16384
>> ulimit -s unlimited
>> fi
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
  Regards,   Gilbert.
--
*---------------------------------------------------------------------*
   Gilbert Grosdidier             Gilbert.Grosdidier_at_[hidden]
   LAL / IN2P3 / CNRS                 Phone : +33 1 6446 8909
   Faculté des Sciences, Bat. 200     Fax   : +33 1 6446 8546
   B.P. 34, F-91898 Orsay Cedex (FRANCE)
*---------------------------------------------------------------------*