Open MPI User's Mailing List Archives

Subject: [OMPI users] Still bothered / cannot run an application
From: Paul Kapinos (kapinos_at_[hidden])
Date: 2012-07-12 12:04:07


(cross-posted to the 'users' and 'devel' mailing lists)

Dear Open MPI developers,
some time ago I reported an error in Open MPI:
http://www.open-mpi.org/community/lists/users/2012/02/18565.php

Well, in 1.6 the behaviour has changed: the test case no longer hangs forever
and blocks an InfiniBand interface, but seems to run through, and now this error
message is printed:
--------------------------------------------------------------------------
The OpenFabrics (openib) BTL failed to register memory in the driver.
Please check /var/log/messages or dmesg for driver specific failure
reason.
The failure occured here:

   Local host: mlx4_0
   Device: openib_reg_mr
   Function: Cannot allocate memory()
   Errno says:

You may need to consult with your system administrator to get this
problem fixed.
--------------------------------------------------------------------------

Looking into the FAQ
http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
gives us no hint about what is wrong. The locked memory limit is set to unlimited:
--------------------------------------------------------------------------
pk224850@linuxbdc02:~[502]$ cat /etc/security/limits.conf | grep memlock
# - memlock - max locked-in-memory address space (KB)
* hard memlock unlimited
* soft memlock unlimited
--------------------------------------------------------------------------
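
Note that /etc/security/limits.conf only shows the configured values. The limit that a running process actually sees can differ, for example when the process is started by a daemon rather than from a login shell. As a minimal sketch (Python is chosen here only for illustration, and the function names are hypothetical), the effective limit can be queried from inside a process like this:

```python
import resource


def format_limit(value):
    """Render an rlimit value in KB, or 'unlimited', similar to `ulimit -l`."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    # getrlimit reports RLIMIT_MEMLOCK in bytes; convert to KB.
    return f"{value // 1024} KB"


def memlock_limits():
    """Return the (soft, hard) locked-memory limits of the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return format_limit(soft), format_limit(hard)


if __name__ == "__main__":
    soft, hard = memlock_limits()
    print(f"memlock soft={soft} hard={hard}")
```

Running this under the same launcher as the MPI processes (an assumption about how one would use it, not something done in this report) would show whether the "unlimited" setting really reaches them.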

Could it still be an Open MPI issue? Are you interested in reproducing this?

Best,
Paul Kapinos

P.S.: The same test with Intel MPI cannot run using DAPL, but runs fine over
'ofa' (= native verbs, as Open MPI uses them). So I believe the problem is rooted
in the communication pattern of the program: it sends very LARGE messages to many
or all other processes. (The program performs a matrix transposition of a
distributed matrix.)
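
To illustrate the communication pattern described in the P.S.: transposing a matrix that is distributed by rows requires every process to send one block to every other process, so the per-peer message size grows with the square of the local slab height. The sketch below simulates this exchange in plain Python without MPI. The row-slab layout, the function names and the 8-byte element size are assumptions for illustration and are not taken from the actual test program.

```python
def distribute_rows(matrix, nprocs):
    """Split an N x N matrix into nprocs contiguous row slabs (N divisible by nprocs)."""
    rows = len(matrix) // nprocs
    return [matrix[r * rows:(r + 1) * rows] for r in range(nprocs)]


def alltoall_transpose(slabs):
    """Simulate the all-to-all block exchange behind a distributed transpose."""
    nprocs = len(slabs)
    rows = len(slabs[0])
    # Each source rank cuts its slab into nprocs square blocks, one per destination.
    outbox = [
        [[row[d * rows:(d + 1) * rows] for row in slab] for d in range(nprocs)]
        for slab in slabs
    ]
    result = []
    for d in range(nprocs):
        new_slab = [[] for _ in range(rows)]
        # Destination rank d receives one block from every source rank s,
        # transposes it locally and appends it to its new row slab.
        for s in range(nprocs):
            block = outbox[s][d]
            for j in range(rows):
                new_slab[j].extend(block[i][j] for i in range(rows))
        result.append(new_slab)
    return result


def bytes_per_peer(n, nprocs, elem_size=8):
    """Size of the single block each rank sends to each peer (elem_size is assumed)."""
    return (n // nprocs) ** 2 * elem_size


if __name__ == "__main__":
    n, nprocs = 4, 2  # tiny hypothetical sizes, only to show the data movement
    matrix = [[r * n + c for c in range(n)] for r in range(n)]
    for slab in alltoall_transpose(distribute_rows(matrix, nprocs)):
        for row in slab:
            print(row)
    print(bytes_per_peer(n, nprocs), "bytes per peer")
```

With realistic matrix sizes each of these blocks becomes a large message, and every process has one such message in flight to each of its peers.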

-- 
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915