Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Question about RDMA
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-06-17 12:28:44

On Jun 6, 2008, at 6:03 AM, Gabriele Fatigati wrote:

> Hi Jeff,

Sorry for the delay in replying -- I was on vacation all last week.

> Thanks for your reply. I did understand the previous points about
> RDMA. Still using SKaMPI, I tried running with mpi_leave_pinned = 1,
> as you suggested, but the execution time is very similar to the
> previous case.
> Does that mean SKaMPI reallocates its buffer every time? For
> example, the "MPI_Bcast-length" test over 128 procs repeats the
> collective about 28 times, increasing the buffer size at each step
> by an internal formula, up to a final buffer size of 2097152 K.

It could be that SKaMPI does re-alloc its buffers for every call -- I
have not looked at the internals of SKaMPI in quite a long time.

It could also be that OMPI is not using the mpi_leave_pinned support.
Are you building OMPI with the memory manager? OMPI needs that memory
manager (ptmalloc2, in the case of Linux) to be able to properly
provide mpi_leave_pinned support. You should be able to run
"ompi_info | grep malloc" and see something like this:

               MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component

If that line doesn't show, then OMPI was not built with the memory
manager support, and mpi_leave_pinned will have no effect.
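
If you want to script that check, here is a minimal sketch. It only looks for a line starting with "MCA memory:" in whatever text you hand it; the function name has_memory_manager is made up for this example, and the sample line is simply the (truncated) one shown above.

```python
def has_memory_manager(ompi_info_output: str) -> bool:
    """Return True if any line of the given text names an MCA memory component."""
    for line in ompi_info_output.splitlines():
        # ompi_info indents its component lines, so strip before comparing.
        if line.strip().startswith("MCA memory:"):
            return True
    return False


# Sample input: the truncated line quoted in this mail, used as-is.
sample = "               MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component"

print(has_memory_manager(sample))  # True
print(has_memory_manager(""))      # False
```

On a real system you could feed it the output of subprocess.run(["ompi_info"], capture_output=True, text=True).stdout, assuming ompi_info is in your PATH.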

> Since there is no advantage with leave_pinned = 1, does it mean that
> SKaMPI doesn't allocate a 2097152 K buffer initially, but instead
> allocates a small buffer and reallocates it every time with a
> larger size? Is that possible? If not, what is the cause of the
> similar performance?

It *could* mean that SKaMPI doesn't re-use the same large buffer for
subsequent MPI operations. An examination of SKaMPI's code should
pretty easily be able to tell if this is the case.
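
To illustrate why buffer re-use matters here, below is a toy model, not OMPI's actual implementation. It assumes a registration cache keyed on (address, length) and counts how often a buffer would have to be pinned. The class name, the addresses, and the sizes are all invented for the example; the 28 iterations come from the test described above.

```python
class ToyRegistrationCache:
    """Toy model: remembers which (address, length) regions are already pinned."""

    def __init__(self):
        self.pinned = set()
        self.registrations = 0  # number of (expensive) pin operations performed

    def send(self, address, length):
        key = (address, length)
        if key not in self.pinned:
            # Cache miss: the region has to be pinned before it can be used.
            self.pinned.add(key)
            self.registrations += 1
        # Cache hit: nothing to do, the earlier registration is re-used.


def count_registrations(buffers):
    cache = ToyRegistrationCache()
    for address, length in buffers:
        cache.send(address, length)
    return cache.registrations


# Case 1: the same buffer is re-used for all 28 iterations.
reused = [(0x1000, 2048)] * 28

# Case 2: a new, larger buffer at a different (hypothetical) address each time.
fresh = [(0x1000 + i * 0x100000, 1024 * (i + 1)) for i in range(28)]

print(count_registrations(reused))  # 1
print(count_registrations(fresh))   # 28
```

In the second case every call pays the pinning cost again, so in this model leaving memory pinned would not help, which would be consistent with the similar timings you are seeing.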

It could also be that OMPI is using internal buffers for a pipelined
broadcast -- I'll have to check with George on that.

> Another question: RDMA pipeline protocol for long messages, in
> OpenMPI 1.2.6 is setting by default?

I can't quite parse that question. OMPI v1.2.6 uses the pipelined
protocol for long messages by default. It uses a slightly different
protocol when mpi_leave_pinned is active. Both of these should be
described in the OMPI FAQ.

Jeff Squyres
Cisco Systems