
Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Question about RDMA
From: Gabriele Fatigati (g.fatigati_at_[hidden])
Date: 2008-06-06 06:03:19


Hi Jeff,
thanks for your reply. I understood your answers to my previous questions
about RDMA. With SKaMPI, I also tried running with mpi_leave_pinned = 1, as
you suggested, but in this case too the execution time is very similar to the
previous case.

Does this mean that SKaMPI reallocates its buffer every time? For example, in
the "MPI_Bcast-length" test over 128 procs, the collective is repeated about
28 times, with the buffer size increasing at each step according to an
internal formula, up to a final buffer size of 2097152 K.

Since leave_pinned = 1 brings no advantage, does that mean SKaMPI does not
allocate a 2097152 K buffer up front, but instead allocates a small buffer
and reallocates a larger one at every step? Is that possible? If not, what is
causing the similar performance?
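The effect described above can be illustrated with a toy model of a registration cache, which is the idea behind mpi_leave_pinned: a memory registration is kept after use and reused when a later buffer falls inside an already registered range. This is a hypothetical Python sketch, not Open MPI's code; the class and function names, the made-up addresses and the 1 MiB length are assumptions for illustration only. It shows why a buffer reused at the same address can benefit, while a buffer allocated at a new address every time cannot.

```python
from dataclasses import dataclass, field


@dataclass
class RegistrationCache:
    """Toy model (hypothetical, not Open MPI's data structure) of keeping
    memory registrations after use, as mpi_leave_pinned does in spirit."""
    regions: list = field(default_factory=list)  # registered (start, length) pairs
    hits: int = 0
    misses: int = 0

    def ensure_registered(self, addr: int, length: int) -> None:
        # Reuse any existing registration that fully covers [addr, addr + length).
        for start, size in self.regions:
            if start <= addr and addr + length <= start + size:
                self.hits += 1
                return
        # No covering registration: this is where the costly pinning would happen.
        self.misses += 1
        self.regions.append((addr, length))


def run(addresses, length):
    """Register one buffer per repetition; return (hits, misses)."""
    cache = RegistrationCache()
    for addr in addresses:
        cache.ensure_registered(addr, length)
    return cache.hits, cache.misses


REPS = 10
LENGTH = 1 << 20              # 1 MiB per message (illustrative)
BASE = 0x7F0000000000         # made-up virtual address

reused = [BASE] * REPS                                 # same buffer every time
fresh = [BASE + i * 2 * LENGTH for i in range(REPS)]   # new buffer every time

print("reused:", run(reused, LENGTH))   # (9, 1)
print("fresh: ", run(fresh, LENGTH))    # (0, 10)
```

In this model, a sweep over one buffer with growing lengths would still miss once for each new, larger length, because no earlier registration covers it; the model does not say how SKaMPI actually allocates its buffers.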

Another question: is the RDMA pipeline protocol for long messages enabled
by default in Open MPI 1.2.6?

> 2008/6/6 Jeff Squyres <jsquyres_at_[hidden]>:
>
>> Note that "eager" RDMA is only used for short messages -- it's not
>> really relevant to whether the same user buffers are re-used or not
>> (the mpi_leave_pinned parameter for long messages is only useful if
>> long buffers are re-used). See this FAQ item:
>>
>> http://www.open-mpi.org/faq/?category=openfabrics#ib-small-message-rdma
>>
>> For benchmarks (like SKAMPI) that re-use long buffers, you might want
>> to use the mpi_leave_pinned MCA parameter:
>>
>>
>> http://www.open-mpi.org/faq/?category=openfabrics#large-message-leave-pinned
>> http://www.open-mpi.org/faq/?category=tuning#running-perf-numbers
>>
>>
>> On Jun 5, 2008, at 9:47 AM, Gabriele Fatigati wrote:
>>
>> >
>> > Hi,
>> > I'm testing the SKaMPI benchmark on an IBM Blade system over InfiniBand.
>> > The Open MPI version is 1.2.6.
>> > I tried to disable RDMA by setting btl_openib_use_eager_rdma = 0.
>> > However, I noticed that the execution time of MPI collectives differs
>> > very little between RDMA enabled and disabled. Before testing, I
>> > expected the execution time to be longer with RDMA off.
>> >
>> > So I suppose that the SKaMPI benchmark continuously reallocates its
>> > buffers, which prevents any benefit from the RDMA protocol. Indeed, if
>> > the initial buffer address changes every time, the memory pages have to
>> > be registered over and over, which degrades performance.
>> >
>> > I used the RDMA pipeline protocol. This protocol should make no
>> > assumptions about whether the application reuses its source and target
>> > buffers. But is that always true?
>> > The network parameters are listed below.
>> >
>> > MCA btl: parameter "btl_openib_mpool" (current value: "rdma")
>> > MCA btl: parameter "btl_openib_ib_max_rdma_dst_ops" (current value: "4")
>> > MCA btl: parameter "btl_openib_use_eager_rdma" (current value: "1")
>> > MCA btl: parameter "btl_openib_eager_rdma_threshold" (current value: "16")
>> > MCA btl: parameter "btl_openib_max_eager_rdma" (current value: "16")
>> > MCA btl: parameter "btl_openib_eager_rdma_num" (current value: "16")
>> > MCA btl: parameter "btl_openib_min_rdma_size" (current value: "1048576")
>> > MCA btl: parameter "btl_openib_max_rdma_size" (current value: "1048576")
>> >
>> > --
>> > Gabriele Fatigati
>> >
>> > CINECA Systems & Tecnologies Department
>> >
>> > Supercomputing Group
>> >
>> > Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>> >
>> > www.cineca.it Tel: +39 051 6171722
>> >
>> > g.fatigati_at_[hidden]
>> > _______________________________________________
>> > users mailing list
>> > users_at_[hidden]
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>> --
>> Jeff Squyres
>> Cisco Systems
>>
>>
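Jeff's point that eager RDMA only concerns short messages can be sketched as a toy dispatch function. It is a hypothetical Python model, not Open MPI's logic: the EAGER_LIMIT value and the path names are made up, and it ignores the other btl_openib_*eager_rdma* parameters listed in the original message. It shows why turning btl_openib_use_eager_rdma off would not be expected to change the path taken by long messages, such as the large MPI_Bcast steps in the SKaMPI test.

```python
# Toy model only: EAGER_LIMIT is a made-up cutoff, not an Open MPI default.
EAGER_LIMIT = 12 * 1024  # bytes (hypothetical)


def send_path(length: int, use_eager_rdma: bool) -> str:
    """Return the (toy) path a message of `length` bytes takes.

    Short messages are copied into buffers the library registered in
    advance, so reuse of the user buffer does not matter for them; the
    eager RDMA flag only changes how that copy reaches the peer. Long
    messages use a rendezvous path in which the user buffer itself has
    to be registered, which is the case mpi_leave_pinned targets.
    """
    if length <= EAGER_LIMIT:
        return "eager-rdma" if use_eager_rdma else "eager-send/recv"
    return "rendezvous-rdma"


for length in (1024, 2 * 1024 * 1024):
    for flag in (True, False):
        print(f"{length:>8} bytes, use_eager_rdma={int(flag)}: {send_path(length, flag)}")
```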
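The parameter list in the original message looks like `ompi_info --param btl openib` output. As a convenience, a small hypothetical Python helper can pull parameter names and current values out of such text, e.g. to compare the settings of two runs. The function name is made up, and the regular expression assumes exactly the `MCA <type>: parameter "<name>" (current value: "<value>")` shape shown in that list, with the mail-quoting markers removed.

```python
import re

# Matches entries shaped like:
#   MCA btl: parameter "btl_openib_mpool" (current value: "rdma")
PARAM_RE = re.compile(
    r'MCA\s+(\w+):\s+parameter\s+"([^"]+)"\s+\(current value:\s+"([^"]*)"\)'
)


def parse_mca_params(text: str) -> dict:
    """Return {parameter name: current value} for every matching entry."""
    return {m.group(2): m.group(3) for m in PARAM_RE.finditer(text)}


sample = '''
MCA btl: parameter "btl_openib_use_eager_rdma" (current value: "1")
MCA btl: parameter "btl_openib_min_rdma_size" (current value: "1048576")
'''
params = parse_mca_params(sample)
print(params["btl_openib_use_eager_rdma"])   # 1
```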

-- 
Gabriele Fatigati
CINECA Systems & Tecnologies Department
Supercomputing Group
Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
www.cineca.it Tel: +39 051 6171722
g.fatigati_at_[hidden]