Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] 1.6.1rc1 posted
From: Paul Kapinos (kapinos_at_[hidden])
Date: 2012-08-03 18:24:15


Jeff, All,

testing our well-known example of the registered memory problem (see
http://www.open-mpi.org/community/lists/users/2012/02/18565.php) on a
freshly installed 1.6.1rc2, I found that "Fall back to send/receive semantics"
does not always work. However, the behaviour has changed:

1.5.3 and older: MPI processes hang and block the IB interface(s) forever

1.6.1rc2: a) MPI processes run through (if the chunk size is less than 8 GB), with
or without a warning; or
           b) MPI processes die (if the chunk size is more than 8 GB)
Note that the same program which dies in (b) runs fine over IPoIB (-mca btl
^openib). However, the performance is very bad in this case: some 1100 sec.
instead of about a minute.

Reproducing: compile the attached file and run it on nodes with >=24 GB RAM and
     log_num_mtt : 20
     log_mtts_per_seg: 3
(=32 GB of registerable memory, our default values):
$ mpiexec ....<one proc per node> .... a.out 1080000000 1080000001
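
For reference, the 32 GB figure can be reproduced with a small sketch. It uses the formula given in the FAQ entry linked in this thread (registerable memory = 2^log_num_mtt * 2^log_mtts_per_seg * page size). The 4 KiB page size and the function name are assumptions made for illustration, not something taken from the attached test program.

```python
# Sketch: registerable memory implied by the two kernel module parameters.
# Assumption: 4 KiB pages (typical on x86_64 Linux); adjust if yours differ.

GIB = 1 << 30
MIB = 1 << 20


def max_registerable_bytes(log_num_mtt, log_mtts_per_seg, page_size=4096):
    """Return the upper bound on registerable memory in bytes."""
    return (1 << log_num_mtt) * (1 << log_mtts_per_seg) * page_size


if __name__ == "__main__":
    reg = max_registerable_bytes(log_num_mtt=20, log_mtts_per_seg=3)
    print(reg // GIB, "GiB")   # 32 GiB
    print(reg // MIB, "MiB")   # 32768 MiB, matching the warning text
```

Raising either parameter by one doubles the result, which is why changing only one of them is enough.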

Well, we know about the need to raise the value of one of these parameters, but
I wanted to let you know that your workaround for the problem is still not
100% perfect, only 99%.

Best,
Paul Kapinos

P.S: A note about the informative warning:
--------------------------------------------------------------------------
WARNING: It appears that your OpenFabrics subsystem is configured to only
allow registering part of your physical memory.
....
   Registerable memory: 32768 MiB
   Total memory: 98293 MiB
--------------------------------------------------------------------------
On a node with 24 GB this warning did not appear, although the max. size of
registered memory (32 GB) is only about 1.3x of RAM, while
http://www.open-mpi.org/faq/?category=openfabrics#ib-low-reg-mem
recommends at least 2x the RAM size.

Should this warning not appear in all cases where registered memory < 2x RAM?
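
The check being asked for can be sketched as follows. This is only an illustration of the proposed "registerable memory < 2x RAM" rule, not the condition Open MPI actually uses to print the warning. The function name is hypothetical, and the 24 GB node is assumed to report exactly 24 * 1024 MiB.

```python
# Sketch of the proposed rule: warn whenever registerable memory is below
# `factor` times the physical RAM. Hypothetical helper, not Open MPI code.

def below_recommendation(registerable_mib, total_mib, factor=2):
    """True if registerable memory is less than factor * total RAM."""
    return registerable_mib < factor * total_mib


if __name__ == "__main__":
    # Node from the warning above: 32768 MiB registerable, 98293 MiB RAM.
    print(below_recommendation(32768, 98293))       # True
    # 24 GB node (assumed to be 24 * 1024 MiB): also below 2x RAM.
    print(below_recommendation(32768, 24 * 1024))   # True
```

Under this rule both nodes would trigger the warning, whereas in the run described above only the larger node did.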

On 07/28/12 04:20, Jeff Squyres wrote:
> - A bunch of changes to eliminate hangs on OpenFabrics-based networks.
> Users with Mellanox hardware are ***STRONGLY ENCOURAGED*** to check
> their registered memory kernel module settings to ensure that the OS
> will allow registering more than 8GB of memory. See this FAQ item
> for details:
>
> http://www.open-mpi.org/faq/?category=openfabrics#ib-low-reg-mem
>
> - Fall back to send/receive semantics if registered memory is
> unavailable for RDMA.

-- 
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915