Jeff, All,
Testing our well-known example of the registered memory problem (see
http://www.open-mpi.org/community/lists/users/2012/02/18565.php) on a
freshly installed 1.6.1rc2, I found that "Fall back to send/receive semantics"
does not always work. However, the behaviour has changed:
1.5.3 and older: MPI processes hang and block the IB interface(s) forever.
1.6.1rc2: a) MPI processes run through, with or without a warning, if the
             chunk size is less than 8 GB; or
          b) MPI processes die if the chunk size is more than 8 GB.
Note that the same program that dies in (b) runs fine over IPoIB (-mca btl
^openib). However, the performance is very bad in this case: some 1100 sec.
instead of about a minute.
Reproducing: compile the attached file and run it on nodes with >= 24 GB of RAM
and
log_num_mtt     : 20
log_mtts_per_seg: 3
(= 32 GB, our default values):
$ mpiexec ....<one proc per node> .... a.out 1080000000 1080000001
Well, we know that one of these parameters needs to be raised, but I wanted to
let you know that your workaround for the problem is still not 100% perfect,
only 99%.
Best,
Paul Kapinos
P.S.: A note about the informative warning:
--------------------------------------------------------------------------
WARNING: It appears that your OpenFabrics subsystem is configured to only
allow registering part of your physical memory.
....
Registerable memory: 32768 MiB
Total memory: 98293 MiB
--------------------------------------------------------------------------
On the node with 24 GB this warning did not appear, although the maximum size of
registered memory (32 GB) is only about 1.33x the RAM, while
http://www.open-mpi.org/faq/?category=openfabrics#ib-low-reg-mem
recommends at least 2x the RAM size.
Shouldn't this warning appear in all cases where registerable memory < 2x RAM?
On 07/28/12 04:20, Jeff Squyres wrote:
> - A bunch of changes to eliminate hangs on OpenFabrics-based networks.
> Users with Mellanox hardware are ***STRONGLY ENCOURAGED*** to check
> their registered memory kernel module settings to ensure that the OS
> will allow registering more than 8GB of memory. See this FAQ item
> for details:
>
> http://www.open-mpi.org/faq/?category=openfabrics#ib-low-reg-mem
>
> - Fall back to send/receive semantics if registered memory is
> unavailable for RDMA.
--
Dipl.-Inform. Paul Kapinos - High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23, D 52074 Aachen (Germany)
Tel: +49 241/80-24915