Open MPI User's Mailing List Archives

Subject: [OMPI users] Doubts about the hpcc MPIRandomAccess problem
From: Ramiro Alba Queipo (raq_at_[hidden])
Date: 2008-09-10 07:17:48


Hello everybody:

I had the same problem described in the thread
http://www.open-mpi.org/community/lists/users/2008/05/5601.php, which I
solved by setting the btl_openib_free_list_max MCA parameter to 2048. I
still have some doubts and related problems that I would like to raise:

1) Does this problem affect only the hpcc MPIRandomAccess test, or
can it happen with any other code?

2) Should I set this parameter to some value by default? Would
performance be affected? What should I take into account when tuning this
parameter (if needed) for our home-made applications?

3) I am using the jfs file system on our cluster nodes, and occasionally it
gets corrupted or put into a read-only state when jobs run into memory
problems, as with the hpcc MPIRandomAccess test or with our home-made
code.
 a) How can memory problems caused by user code corrupt the / and/or
/home file systems?
 b) Is this related to libibverbs bypassing the kernel TCP stack (I had
to set /dev/infiniband/uverbs0 rw for everybody)?
 c) Should I change to the ext3 file system?
 d) Should I change other parameters according to
http://www.open-mpi.org/faq/?category=openfabrics#limiting-registered-memory-usage ?

I have a newly started InfiniBand cluster on standby, so any
comments or advice would be very welcome.

***********************************************************************
My environment info is:

1) OpenFabrics: the version included in the distribution
2) Linux distribution: Ubuntu 7.04
   uname -a -> Linux jff202 2.6.20-16-server #2 SMP Tue Feb 12 02:16:56 UTC 2008 x86_64 GNU/Linux

3) Subnet manager: OpenSM 3.1.11 from OFED 1.3 installed on the cluster
server with Ubuntu 8.04

4) ulimit -l -> unlimited

5) The MCA parameters that I have modified
in /etc/openmpi/openmpi-mca-params.conf are:

mpi_paffinity_alone = 1
pls_rsh_agent = rsh
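
As a sketch of how the workaround could be made a default in that same
file (the last line is not in my file above, and 2048 is only the value
that got the test to run here, not a tuned recommendation):

```
# /etc/openmpi/openmpi-mca-params.conf (sketch, not the current file)
mpi_paffinity_alone = 1
pls_rsh_agent = rsh
# Assumption: cap the openib BTL free list as in the linked thread
btl_openib_free_list_max = 2048
```

The same value can also be passed for a single run with mpirun's --mca
option instead of changing the system-wide file.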

Thanks in advance, and regards.
