On Fri, 23 Apr 2010 at 11:29:53, George Bosilca wrote:
> If you use any kind of high-performance network that requires memory
> registration for communication, then the high cost of
> MPI_Alloc_mem will be hidden by the communication. However, the
> MPI_Alloc_mem function seems horribly complicated to me, as we do the
> whole "find-the-right-allocator" step every time instead of caching
> it. While this might be improved, I'm pretty sure the major part of
> the overhead comes from the registration itself.
> The MPI_Alloc_mem function allocates the memory and then registers it
> with the high-speed interconnect (InfiniBand, for example). If you
> don't have IB, then this should not happen. You can try to force the
> mpool to nothing, or disable the pinning
> (mpi_leave_pinned=0,mpi_leave_pinned_pipeline=0) to see if this affects
> performance.
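
To illustrate the "find-the-right-allocator" caching idea mentioned above, here is a minimal sketch in C. It is not Open MPI's actual mpool code: the allocator_t type, the allocators table, find_allocator() and cached_allocator() are all hypothetical names made up for this example, and the cache is not thread-safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator descriptor; NOT Open MPI's real mpool structures. */
typedef struct {
    const char *name;
    int priority;                     /* higher priority wins the selection */
    void *(*alloc)(size_t size);
    void (*release)(void *ptr);
} allocator_t;

static void *plain_alloc(size_t size) { return malloc(size); }
static void plain_release(void *ptr) { free(ptr); }

/* Illustrative table of candidates; both entries simply wrap malloc/free. */
static const allocator_t allocators[] = {
    { "fallback",  0,  plain_alloc, plain_release },
    { "preferred", 10, plain_alloc, plain_release },
};

int lookup_count = 0;                 /* number of times the table was searched */

/* The expensive step: scan every candidate and pick the best one. */
const allocator_t *find_allocator(void)
{
    size_t n = sizeof(allocators) / sizeof(allocators[0]);
    const allocator_t *best = &allocators[0];

    assert(n > 0);
    lookup_count++;
    for (size_t i = 1; i < n; i++) {
        if (allocators[i].priority > best->priority)
            best = &allocators[i];
    }
    return best;
}

/* Run the search once and reuse the result on every later call.
 * Not thread-safe: a real implementation would need synchronization. */
const allocator_t *cached_allocator(void)
{
    static const allocator_t *cached = NULL;

    if (cached == NULL)
        cached = find_allocator();
    return cached;
}
```

With this layout, repeated calls to cached_allocator() search the table only once. As noted above, this would only remove the selection overhead; the registration cost itself would remain.
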
I have an IB cluster with 32-core nodes. A large part of my
communication goes through sm, so systematically registering buffers
with IB kills performance for nothing.
Following your tip, I disabled the pinning (using "mpirun -mca
mpi_leave_pinned 0 -mca mpi_leave_pinned_pipeline 0").
The cycle (MPI_Alloc_mem/MPI_Free_mem) now takes 120 us, while
(malloc/free) takes 1 us.
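
For anyone who wants to reproduce this kind of measurement, here is a rough sketch of a timing helper in C. It is not the exact program I used: time_cycle_us(), the wrapper functions and the USE_MPI macro are hypothetical names chosen for this example. Without USE_MPI it only times malloc/free; with it (and after MPI_Init) the MPI wrappers can be passed instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

typedef int (*alloc_fn)(size_t size, void **ptr);   /* returns 0 on success */
typedef void (*free_fn)(void *ptr);

int malloc_wrapper(size_t size, void **ptr)
{
    *ptr = malloc(size);
    return *ptr == NULL ? -1 : 0;
}

void free_wrapper(void *ptr) { free(ptr); }

#ifdef USE_MPI   /* hypothetical macro: define it only when building with MPI */
#include <mpi.h>
int mpi_alloc_wrapper(size_t size, void **ptr)
{
    return MPI_Alloc_mem((MPI_Aint)size, MPI_INFO_NULL, ptr) == MPI_SUCCESS ? 0 : -1;
}

void mpi_free_wrapper(void *ptr) { MPI_Free_mem(ptr); }
#endif

/* Keeps the compiler from optimizing the allocate/free pair away. */
static void *volatile sink;

static double elapsed_us(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec) * 1e6
         + (double)(end.tv_nsec - start.tv_nsec) / 1e3;
}

/* Average wall-clock time in microseconds of one allocate/free cycle,
 * or -1.0 if an allocation fails. The buffer is never written to. */
double time_cycle_us(alloc_fn do_alloc, free_fn do_free,
                     size_t size, int iterations)
{
    struct timespec start, end;

    assert(iterations > 0);
    timespec_get(&start, TIME_UTC);          /* C11 wall-clock timestamp */
    for (int i = 0; i < iterations; i++) {
        void *buf = NULL;
        if (do_alloc(size, &buf) != 0)
            return -1.0;
        sink = buf;
        do_free(buf);
    }
    timespec_get(&end, TIME_UTC);
    return elapsed_us(start, end) / iterations;
}
```

A call such as time_cycle_us(malloc_wrapper, free_wrapper, 1 << 20, 1000) gives the average for a 1 MB buffer. The numbers will of course depend on the machine, the C library and the MCA parameters in use.
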
In all cases, a program calling MPI_Sendrecv_replace() is heavily
penalized by these calls to MPI_Alloc_mem/MPI_Free_mem.
That's why I proposed going back to the malloc/free scheme in this
> On Apr 22, 2010, at 08:50, Pascal Deveze wrote:
>> Hi all,
>> The sendrecv_replace in Open MPI seems to allocate/free memory with
>> I measured the time to allocate/free a buffer of 1MB.
>> MPI_Alloc_mem/MPI_Free_mem take 350 us, while malloc/free take only 8 us.
>> malloc/free in ompi/mpi/c/sendrecv_replace.c was replaced by
>> MPI_Alloc_mem/MPI_Free_mem in this commit:
>> user: twoodall
>> date: Thu Sep 22 16:43:17 2005 0000
>> summary: use MPI_Alloc_mem/MPI_Free_mem for internally allocated
>> Is there a real reason to use these functions, or can we move back to
>> malloc/free?
>> Is there a problem with my configuration that would explain such slow
>> performance with MPI_Alloc_mem?