(cross-post to 'users' and 'devel' mailing lists)
Dear Open MPI developers,
A long time ago I reported an error in Open MPI:
http://www.open-mpi.org/community/lists/users/2012/02/18565.php
Well, in 1.6 the behaviour has changed: the test case no longer hangs forever
blocking an InfiniBand interface; it seems to run through, and now this error
message is printed:
--------------------------------------------------------------------------
The OpenFabrics (openib) BTL failed to register memory in the driver.
Please check /var/log/messages or dmesg for driver specific failure
reason.
The failure occured here:
Local host:
Device: mlx4_0
Function: openib_reg_mr()
Errno says: Cannot allocate memory
You may need to consult with your system administrator to get this
problem fixed.
--------------------------------------------------------------------------
Looking into the FAQ
http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
gives us no hint about what is wrong. The locked memory limit is unlimited:
--------------------------------------------------------------------------
pk224850_at_linuxbdc02:~[502]$ cat /etc/security/limits.conf | grep memlock
# - memlock - max locked-in-memory address space (KB)
* hard memlock unlimited
* soft memlock unlimited
--------------------------------------------------------------------------
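The entries in /etc/security/limits.conf are applied by PAM at login, so processes started through a batch system daemon or a remote launcher may run with different limits than an interactive shell. To rule this out, the limit can be checked from inside a running process. A minimal sketch in Python, not part of the original test case (the helper names `memlock_limits` and `format_limit` are made up for illustration):

```python
import resource


def format_limit(value):
    """Render an rlimit value as 'unlimited' or as a byte count."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value} bytes"


def memlock_limits():
    """Return the (soft, hard) RLIMIT_MEMLOCK pair of the current process."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


if __name__ == "__main__":
    soft, hard = memlock_limits()
    print("soft memlock:", format_limit(soft))
    print("hard memlock:", format_limit(hard))
```

Launching this script the same way as the MPI job (same launcher, same nodes) would show whether the "unlimited" setting actually reaches the processes on the compute nodes.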
Could it still be an Open MPI issue? Are you interested in reproducing this?
Best,
Paul Kapinos
P.S.: The same test with Intel MPI cannot run using DAPL, but runs fine over
'ofa' (= native verbs, as Open MPI uses them). So I believe the problem is rooted
in the communication pattern of the program: it sends very LARGE messages to many
or all other processes. (The program performs a matrix transposition of a
distributed matrix.)
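To give a feeling for the message sizes such a transposition produces, here is a rough back-of-the-envelope sketch in Python. It is only an illustration under stated assumptions: a square n x n matrix of 8-byte elements, distributed in blocks of n/p rows over p processes, with n divisible by p. The function names and the example sizes are hypothetical and are not taken from the actual test case.

```python
def transpose_message_bytes(n, p, elem_bytes=8):
    """Bytes one process sends to ONE peer when transposing an n x n matrix
    distributed in blocks of n/p rows (assumes n is divisible by p)."""
    if n % p != 0:
        raise ValueError("this sketch assumes n is divisible by p")
    block = n // p  # each pairwise exchange moves a block x block tile
    return block * block * elem_bytes


def total_send_bytes(n, p, elem_bytes=8):
    """Bytes one process sends to all of its p - 1 peers together."""
    return (p - 1) * transpose_message_bytes(n, p, elem_bytes)


if __name__ == "__main__":
    n, p = 65536, 64  # hypothetical sizes, chosen only for illustration
    print("bytes per message:", transpose_message_bytes(n, p))
    print("bytes sent per process:", total_send_bytes(n, p))
```

With these example numbers each process sends 8 MiB to each of its 63 peers, roughly 504 MiB in total, which illustrates why the amount of memory involved in the communication can become very large with this pattern.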
--
Dipl.-Inform. Paul Kapinos - High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23, D 52074 Aachen (Germany)
Tel: +49 241/80-24915