Open MPI User's Mailing List Archives

Subject: [OMPI users] mpi_leave_pinned is dangerous
From: Jens Glaser (jglaser_at_[hidden])
Date: 2012-11-03 23:41:43


Hi,

I am working on a CUDA/MPI application. It uses page-locked host buffers allocated with cudaHostAlloc(..., cudaHostAllocDefault); data is copied from the device into these buffers before calling MPI.
The application, a particle simulation, reproducibly crashed or produced undefined behavior at large particle numbers, and I could not explain why.
After considerable debugging time (trying two different MPI libraries, MVAPICH2 1.9a and Open MPI 1.6.1), I discovered Open MPI's mpi_leave_pinned parameter.
Setting mpi_leave_pinned to 0 solved my problem: the crash did not occur again. So far, excellent!
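
For reference, here is a minimal sketch of how the parameter can be set to 0 through Open MPI's usual MCA parameter mechanisms. The application name ./my_app and the process count are placeholders, not taken from my actual setup, and the final check only runs if ompi_info is on the PATH.

```shell
# Option 1: environment variable. Open MPI reads MCA parameters from
# variables named OMPI_MCA_<parameter>.
export OMPI_MCA_mpi_leave_pinned=0

# Option 2: pass it per run on the mpirun command line
# (./my_app and -np 4 are placeholders):
#   mpirun --mca mpi_leave_pinned 0 -np 4 ./my_app

# Option 3: make it persistent for one user by adding this line to
# $HOME/.openmpi/mca-params.conf:
#   mpi_leave_pinned = 0

# Show the value Open MPI will pick up, if ompi_info is installed.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info --param mpi all | grep -A 1 '"mpi_leave_pinned"' || true
fi
```

With the environment variable set, ompi_info should report the data source as the environment rather than the default value.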

I do have a request, however. After looking at the output of

$ ompi_info --param mpi all

I get
                 MCA mpi: parameter "mpi_leave_pinned" (current value: <-1>, data source: default
                          value)
                          Whether to use the "leave pinned" protocol or not. Enabling this
                          setting can help bandwidth performance when repeatedly sending and
                          receiving large messages with the same buffers over RDMA-based networks
                          (0 = do not use "leave pinned" protocol, 1 = use "leave pinned"
                          protocol, -1 = allow network to choose at runtime).

This seems to indicate that, by default, the network chooses at runtime whether to enable or disable the "leave pinned" protocol. In my case, this default setting turns out to be disastrous.
Also, the FAQ is somewhat ambiguous about this parameter: in one place it states that mpi_leave_pinned is off by default, but in another that the default is -1 (as above).

http://www.open-mpi.org/faq/?category=openfabrics#large-message-leave-pinned
http://www.open-mpi.org/faq/?category=openfabrics#setting-mpi-leave-pinned-1.3.2

Can anyone please explain the intricacies of this parameter, and the ramifications and benefits of this particular default value?

Thanks
Jens