Open MPI User's Mailing List Archives

Subject: [OMPI users] machines swapping in running job[Scanned]
From: Arif Ali (aali_at_[hidden])
Date: 2008-04-22 12:10:10


Hi list,

I had a similar problem last year with IMB, when the job would just
hang on a PowerPC cluster; Jeff Squyres gave me many pointers on
parameters to change to fix it. Now, on another cluster that I am
building, the IMB job hangs in the same place, and the machines in the
cluster also start swapping at the time of the hang. Following Jeff's
suggestions, I have tried the following MCA parameters:

btl_openib_flags=1
btl_openib_ib_timeout=20
mpool_base_verbose=1
mpool_base_use_mem_hooks=1
btl_openib_eager_limit=3072
#btl_openib_eager_limit=4096
btl_openib_max_send_size=12288
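
For a one-off test, the same settings can be passed to mpirun as `--mca <name> <value>` pairs instead of through a parameter file. Below is a minimal sketch that converts the list above into such arguments, skipping the commented-out line. The helper names `mca_args` and `mpirun_command`, the process count of 4, and the `./IMB-MPI1` binary path are assumptions for illustration only, not part of the setup described here.

```python
# Parameter list as given above, including the commented-out entry.
PARAMS = """\
btl_openib_flags=1
btl_openib_ib_timeout=20
mpool_base_verbose=1
mpool_base_use_mem_hooks=1
btl_openib_eager_limit=3072
#btl_openib_eager_limit=4096
btl_openib_max_send_size=12288
"""


def mca_args(text):
    """Turn 'name=value' lines into a flat ['--mca', name, value, ...] list."""
    args = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and lines disabled with '#'.
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        args += ["--mca", name.strip(), value.strip()]
    return args


def mpirun_command(text, nprocs, program):
    """Build an mpirun argument list (hypothetical process count and program)."""
    return ["mpirun", "-np", str(nprocs)] + mca_args(text) + [program]


if __name__ == "__main__":
    # Print the command rather than running it; nothing is executed here.
    print(" ".join(mpirun_command(PARAMS, 4, "./IMB-MPI1")))
```

Printing the command rather than executing it makes it easy to check which parameters are actually active before launching a run.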

After setting these parameters, the machines still swapped, but a lot less
than before, and the job got a lot further and ran to completion. Are
there any further suggestions on parameters that can be tweaked to stop
these machines from swapping?

I am also seeing the same swapping issue when running the HPCC benchmark.
When it reaches MPIRandomAccess, all machines swap until we can no longer
access them, and we have to reboot them.

OS: SLES 10
Kernel: 2.6.16.46-0.12-smp
OFED release: 1.3
openmpi: 1.2.5 and 1.2.6 using btl openib
Switch: TopSpin
SM: on TopSpin switch
Ulimit has been set to unlimited as suggested in the FAQ

One thing to note: both jobs run with no problems using TCP.
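
The TCP versus openib comparison comes down to which BTL components mpirun is told to use. A minimal sketch of building the two command lines follows. The helper name `btl_command`, the process count, and the `./IMB-MPI1` path are assumptions for illustration; the `self` component is included so that a process can send to itself.

```python
def btl_command(btl, nprocs, program):
    """Build an mpirun argument list restricted to one network BTL plus 'self'.

    'btl' is the component name, e.g. 'tcp' or 'openib'. The process count
    and program path are placeholders chosen by the caller.
    """
    return ["mpirun", "-np", str(nprocs), "--mca", "btl", btl + ",self", program]


if __name__ == "__main__":
    # Print both variants side by side; nothing is executed here.
    for btl in ("tcp", "openib"):
        print(" ".join(btl_command(btl, 4, "./IMB-MPI1")))
```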

regards,

-- 
Arif Ali
Software Engineer
OCF plc
Mobile: +44 (0)7970 148 122         
DDI:    +44 (0)114 257 2240
Office: +44 (0)114 257 2200         
Fax:    +44 (0)114 257 0022
Email:  aali_at_[hidden]              
Web:    http://www.ocf.co.uk
Support Phone:   +44 (0)845 702 3829
Support E-mail:  support_at_[hidden]
Skype:  arif_ali80                  
MSN:    aali_at_[hidden]