Open MPI User's Mailing List Archives

Subject: [OMPI users] Doubts about the hpcc MPIRandomAccess problem
From: Ramiro Alba Queipo (raq_at_[hidden])
Date: 2008-09-10 07:17:48


Hello everybody:

I had the same problem described in the thread
http://www.open-mpi.org/community/lists/users/2008/05/5601.php, which I
solved by setting the btl_openib_free_list_max MCA parameter to 2048, but I
have some doubts and related problems that I would like to raise:

1) Does this problem affect only the hpcc MPIRandomAccess test, or can
it happen with other codes as well?

2) Should I set this parameter to some value by default? Would
performance be affected? What should I take into account when tuning
this parameter (if needed) for our in-house applications?

3) I am using the JFS file system on our cluster nodes, and occasionally
it became corrupted or was put into a read-only state when jobs ran into
memory problems, such as with hpcc MPIRandomAccess or with our in-house
code.
 a) How can memory problems caused by user code corrupt the / and/or
/home file systems?
 b) Is this related to libibverbs bypassing the kernel TCP stack (I had
to make /dev/infiniband/uverbs0 readable and writable by everybody)?
 c) Should I switch to the ext3 file system?
 d) Should I change other parameters, as described in the FAQ entry below?
http://www.open-mpi.org/faq/?category=openfabrics#limiting-registered-memory-usage

I have a newly built InfiniBand cluster on standby, so any comments or
advice would be welcome.

***********************************************************************
My environment is:

1) OpenFabrics: the stack included in the distribution
2) Linux distribution: Ubuntu 7.04
   uname -a -> Linux jff202 2.6.20-16-server #2 SMP Tue Feb 12 02:16:56 UTC 2008 x86_64 GNU/Linux

3) Subnet manager: OpenSM 3.1.11 from OFED 1.3, installed on the cluster
server, which runs Ubuntu 8.04

4) ulimit -l -> unlimited

5) The MCA parameters that I have modified in
/etc/openmpi/openmpi-mca-params.conf are:

mpi_paffinity_alone = 1
pls_rsh_agent = rsh
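Regarding question 2: if the btl_openib_free_list_max workaround were made the default for every job, I assume it would simply go in this same file next to the settings above. A sketch of what the file would then contain, assuming the same "name = value" syntax and that lines starting with '#' are treated as comments; 2048 is only the value that made MPIRandomAccess pass here, not a tuned recommendation:

```
# /etc/openmpi/openmpi-mca-params.conf (sketch, not a recommended default)
mpi_paffinity_alone = 1
pls_rsh_agent = rsh
# Workaround for the hpcc MPIRandomAccess failure discussed in
# http://www.open-mpi.org/community/lists/users/2008/05/5601.php
# 2048 is simply the value that worked on this cluster.
btl_openib_free_list_max = 2048
```

Alternatively, the parameter could be set for a single run on the command line (mpirun --mca btl_openib_free_list_max 2048 ...), which would leave the default unchanged until its performance impact is better understood.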

Thanks in advance. Regards,
