Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Over committing?
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2011-04-14 07:27:18


I think the next reasonable step is to use some kind of diagnostic to find out where and why the application is hung. padb is a great free/open source tool that can be used here.
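For example, running it on the node where the job was launched and asking for a merged stack trace will usually show exactly which call every rank is sitting in. The exact options depend on your padb version and resource manager (check padb --help), but something along these lines:

  $ padb --all --stack-trace --tree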

On Apr 14, 2011, at 4:46 AM, Rushton Martin wrote:

> I forwarded your question to the code custodian and received the
> following reply (GRIM is the major code, the one which shows the
> problem): "I've not tried the debugger but GRIM does have a number of
> mpi_barrier calls in it so I would think we are safe there. There is of
> course a performance downside with an over-use of barriers! As mentioned
> in the e-trail."
>
>
> Martin Rushton
> HPC System Manager, Weapons Technologies
> Tel: 01959 514777, Mobile: 07939 219057
> email: jmrushton_at_[hidden]
> www.QinetiQ.com
> QinetiQ - Delivering customer-focused solutions
>
> Please consider the environment before printing this email.
> -----Original Message-----
> From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On
> Behalf Of Ralph Castain
> Sent: 14 April 2011 04:55
> To: Open MPI Users
> Subject: Re: [OMPI users] Over committing?
>
> Have you folks used a debugger such as TotalView or padb to look at
> these stalls?
>
> I ask because we discovered a long time ago that MPI collectives can
> "hang" in the scenario you describe. It is caused by one rank falling
> behind, and then never catching up due to resource allocations - i.e.,
> once you fall behind due to the processor being used by something else,
> you never catch up.
>
> The code that causes this is generally a loop around a collective such
> as Allreduce. The solution was to inject a "barrier" operation in the
> loop periodically, thus ensuring that all ranks had an opportunity to
> catch up.
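>
> As an illustration only (not GRIM's or Open MPI's actual code), the
> pattern being described looks roughly like this in C, with a barrier
> injected by hand every SYNC_EVERY iterations; the interval of 100 and
> all names here are arbitrary:
>
> #include <mpi.h>
>
> #define SYNC_EVERY 100   /* arbitrary re-synchronisation interval */
>
> int main(int argc, char **argv)
> {
>     int i, rank;
>     double local, global;
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     local = (double)rank;
>
>     for (i = 0; i < 10000; i++) {
>         /* Every rank joins a collective each iteration.  A rank that
>          * loses its CPU to another process falls behind while the
>          * others keep queueing unexpected messages for it. */
>         MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
>                       MPI_COMM_WORLD);
>
>         /* Periodic barrier: forces all ranks back into step.  This is
>          * what the coll_sync MCA parameters below do automatically. */
>         if ((i + 1) % SYNC_EVERY == 0)
>             MPI_Barrier(MPI_COMM_WORLD);
>     }
>
>     MPI_Finalize();
>     return 0;
> }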
>
> There is an MCA param you can set that will inject the barrier - it
> specifies to inject it every N collective operations (either before or
> after the Nth op):
>
> -mca coll_sync_barrier_before N
>
> or
>
> -mca coll_sync_barrier_after N
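>
> For example, combined with the mpirun line used earlier in this thread
> (N=100 here is only an illustrative value - tune it for your job):
>
> $ mpirun -np 24 -machinefile $PBS_NODEFILE -mca coll_sync_barrier_before 100 <executable>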
>
> It'll slow the job down a little, depending upon how often you inject
> the barrier. But it did allow us to run jobs reliably to completion when
> the code involved such issues.
>
>
> On Apr 13, 2011, at 10:07 AM, Rushton Martin wrote:
>
>> The 16-core figure refers to the x3755-m2s. We have a mix of 3550s and
>> 3755s in the cluster.
>>
>> It could be memory, but I think not. The jobs are well within memory
>> capacity, and the memory is mainly static. If we were running out of
>> memory, these jobs would be the first to show it. Larger jobs run on the
>> 3755s, which as well as more memory have local disks for paging to.
>>
>>
>> Martin Rushton
>> HPC System Manager, Weapons Technologies
>> Tel: 01959 514777, Mobile: 07939 219057
>> email: jmrushton_at_[hidden]
>> www.QinetiQ.com
>> QinetiQ - Delivering customer-focused solutions
>>
>> Please consider the environment before printing this email.
>> -----Original Message-----
>> From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]]
>> On Behalf Of Reuti
>> Sent: 13 April 2011 16:53
>> To: Open MPI Users
>> Subject: Re: [OMPI users] Over committing?
>>
>> On 13.04.2011 at 17:09, Rushton Martin wrote:
>>
>>> Version 1.3.2
>>>
>>> Consider a job that will run with 28 processes. The user submits it
>>> with:
>>>
>>> $ qsub -l nodes=4:ppn=7 ...
>>>
>>> which reserves 7 cores on (in this case) each of x3550x014,
>>> x3550x015, x3550x016 and x3550x020. Torque generates a file
>>> (PBS_NODEFILE) which lists each node 7 times.
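>>>
>>> For illustration, the generated PBS_NODEFILE for that reservation
>>> would contain 28 lines, each hostname repeated 7 times:
>>>
>>> x3550x014
>>> x3550x014
>>> (five more x3550x014 lines)
>>> x3550x015
>>> (and so on for x3550x016 and x3550x020)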
>>>
>>> The mpirun command given within the batch script is:
>>>
>>> $ mpirun -np 28 -machinefile $PBS_NODEFILE <executable>
>>>
>>> This is what I would refer to as 7+7+7+7, and it runs fine.
>>>
>>> The problem occurs if, for instance, a 24-core job is attempted. qsub
>>> gets nodes=3:ppn=8 and mpirun has -np 24. The job is now running on
>>> three nodes, using all eight cores on each node - 8+8+8. This sort of
>>> job will eventually hang and has to be killed off.
>>>
>>> Cores  Nodes  Ppn   Result
>>> -----  -----  ----  ------
>>>     8      1  any   works
>>>     8     >1  1-7   works
>>>     8     >1  8     hangs
>>>    16      1  any   works
>>>    16     >1  1-15  works
>>>    16     >1  16    hangs
>>
>> How many cores do you have in each system? Looks like 8 is the maximum
>> IBM offers from their datasheet, and still you can request 16 per
>> node?
>>
>> Can it be a memory problem?
>>
>> -- Reuti
>>
>>
>>> We have also tried test jobs on 8+7 (or 7+8) with inconclusive
>>> results. Some of the live jobs run for a month or more, and cut-down
>>> versions do not model well.
>>>
>>> Martin Rushton
>>> HPC System Manager, Weapons Technologies
>>> Tel: 01959 514777, Mobile: 07939 219057
>>> email: jmrushton_at_[hidden]
>>> www.QinetiQ.com
>>> QinetiQ - Delivering customer-focused solutions
>>>
>>> Please consider the environment before printing this email.
>>> -----Original Message-----
>>> From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]]
>>> On Behalf Of Ralph Castain
>>> Sent: 13 April 2011 15:34
>>> To: Open MPI Users
>>> Subject: Re: [OMPI users] Over committing?
>>>
>>>
>>> On Apr 13, 2011, at 8:13 AM, Rushton Martin wrote:
>>>
>>>> The bulk of our compute nodes are 8 cores (twin 4-core IBM x3550-m2).
>>>> Jobs are submitted by Torque/MOAB. When run with up to np=8 there is
>>>> good performance. Attempting to run with more processors brings
>>>> problems: specifically, if any one node of a group of nodes has all 8
>>>> cores in use, the job hangs. For instance, running with 14 cores (7+7)
>>>> is fine, but running with 16 (8+8) hangs.
>>>>
>>>> From the FAQs I note the issues of over committing and aggressive
>>>> scheduling. Is it possible for mpirun (or orted on the remote nodes)
>>>> to be blocked from progressing by a fully committed node? We have a
>>>> few x3755-m2 machines with 16 cores, and we have detected a similar
>>>> issue with 16+16.
>>>
>>> I'm not entirely sure I understand your notation, but we have never
>>> seen an issue when running with fully loaded nodes (i.e., where the
>>> number of MPI procs on the node = the number of cores).
>>>
>>> What version of OMPI are you using? Are you binding the procs?

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/