You may try disabling the registration cache, which may relieve pressure on memory resources:
--mca mpi_leave_pinned 0
You can find a few more details here: http://www.open-mpi.org/faq/?category=openfabrics#large-message-leave-pinned
Note that with this option you may observe a drop in bandwidth performance.
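For reference, a minimal sketch of two ways the option could be applied. It assumes the same placeholders as the quoted command line (numproc, ./app) and relies on Open MPI's convention of reading MCA parameters from OMPI_MCA_-prefixed environment variables. The export line uses Bourne-shell syntax; the csh form is noted in a comment.

```shell
# Sketch only: "numproc" and "./app" are placeholders from the original post.

# 1) Set the MCA parameter through the environment (sh/bash syntax).
#    In csh/tcsh the equivalent would be: setenv OMPI_MCA_mpi_leave_pinned 0
export OMPI_MCA_mpi_leave_pinned=0

# 2) Or pass it directly on the mpirun command line
#    (left commented out so the snippet is safe to paste anywhere):
# mpirun --mca mpi_leave_pinned 0 --bind-to-core -np numproc ./app
```

Either form should be enough on its own. Comparing node memory and bandwidth with and without the setting would show whether the trade-off is acceptable for the write phase.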
Pavel (Pasha) Shamis
Computer Science Research Group
Computer Science and Math Division
Oak Ridge National Laboratory
On Jul 5, 2013, at 3:33 PM, Ben <Benjamin.M.Auer_at_[hidden]> wrote:
> I'm part of a team that maintains a global climate model running under
> mpi. Recently we have been trying it out with different mpi stacks
> at high resolution/processor counts.
> At one point in the code there is a large number of mpi_isends/mpi_recv
> (tens to hundreds of thousands) when data distributed across all mpi
> processes must be collected on a particular processor or processors to be
> transformed to a new resolution before writing. At first the model was
> crashing with a message:
> "A process failed to create a queue pair. This usually means either the
> device has run out of queue pairs (too many connections) or there are
> insufficient resources available to allocate a queue pair (out of
> memory). The latter can happen if either 1) insufficient memory is
> available, or 2) no more physical memory can be registered with the device."
> when it hit the part of code with the send/receives. Watching the node
> memory in an xterm I could see the memory skyrocket and fill the node.
> Somewhere we found a suggestion to try using the xrc queues
> (http://www.open-mpi.org/faq/?category=openfabrics#ib-xrc) to get around
> this problem and indeed running with
> setenv OMPI_MCA_btl_openib_receive_queues
> mpirun --bind-to-core -np numproc ./app
> allowed the model to successfully run. It still seems to use a large
> amount of memory when it writes (on the order of several GB). Does
> anyone have any suggestions on how to tweak the settings to
> help with memory use?
> Ben Auer, PhD SSAI, Scientific Programmer/Analyst
> NASA GSFC, Global Modeling and Assimilation Office
> Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD 20771
> Phone: 301-286-9176 Fax: 301-614-6246