Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Question about RDMA
From: Gabriele Fatigati (g.fatigati_at_[hidden])
Date: 2008-06-06 06:03:19


Hi Jeff,
thanks for your reply. I understood your answers to my previous questions
about RDMA. Still with SKaMPI, I tried running with mpi_leave_pinned = 1, as
you suggested, but in this case too the execution time is very similar to
the previous case.
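A minimal sketch of how such an A/B comparison can be launched, assuming the benchmark binary is `./skampi` (a placeholder, not SKaMPI's documented command line) and that parameters are passed with mpirun's `--mca <name> <value>` option. The helper `mpirun_argv` is hypothetical; it only builds the argument list and does not run anything. Keeping every setting except `mpi_leave_pinned` fixed is what makes the two timings comparable.

```python
import shlex


def mpirun_argv(nprocs, program, mca=None):
    """Build an mpirun command line that passes MCA parameters via --mca.

    `program` is the benchmark command as a list of strings; the "./skampi"
    used below is a hypothetical placeholder.
    """
    argv = ["mpirun", "-np", str(nprocs)]
    for name, value in (mca or {}).items():
        argv += ["--mca", name, str(value)]
    return argv + list(program)


# Two runs that differ only in mpi_leave_pinned.
baseline = mpirun_argv(128, ["./skampi"], {"mpi_leave_pinned": 0})
pinned = mpirun_argv(128, ["./skampi"], {"mpi_leave_pinned": 1})
print(shlex.join(baseline))
print(shlex.join(pinned))
```

Open MPI also reads MCA parameters from environment variables named `OMPI_MCA_<param>`, so exporting `OMPI_MCA_mpi_leave_pinned=1` is an alternative to the command-line flag.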

Does this mean that SKaMPI reallocates the buffer every time? For example,
in the "MPI_Bcast-length" test over 128 processes, the collective is
repeated about 28 times, with the buffer size increased at each step by an
internal formula, up to a final buffer size of 2097152 K.

Since mpi_leave_pinned = 1 gives no advantage, could it be that SKaMPI does
not allocate a 2097152 K buffer up front, but instead allocates a small
buffer and reallocates a larger one at every step? Is that possible? If
not, what causes the similar performance?
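The question above can be illustrated with a toy model of a registration cache, the mechanism that leave-pinned behavior relies on for long messages. This is a hypothetical sketch, not Open MPI's implementation: regions are tracked as (address, length) pairs, there is no eviction or page rounding, and a request hits only if an already-pinned region fully covers it. It compares a benchmark that carves every message out of one buffer at a fixed address with one that uses a buffer at a new address on every call; the size sweep is illustrative, not SKaMPI's formula.

```python
from dataclasses import dataclass, field


@dataclass
class RegistrationCache:
    """Toy model of a leave-pinned registration cache (hypothetical).

    No eviction, no page rounding: a request hits only when a region that
    is already registered fully covers [addr, addr + length).
    """
    regions: list = field(default_factory=list)
    registrations: int = 0
    hits: int = 0

    def acquire(self, addr, length):
        for start, size in self.regions:
            if start <= addr and addr + length <= start + size:
                self.hits += 1  # already pinned: no registration cost
                return
        # Miss: pay for a registration and leave the region pinned.
        self.regions.append((addr, length))
        self.registrations += 1


def simulate(sizes, repeats, reuse_buffer):
    """Return (registrations, hits) for a growing-size sweep."""
    cache = RegistrationCache()
    addr = 0x10000000  # arbitrary base address of the first buffer
    for size in sizes:
        for _ in range(repeats):
            if not reuse_buffer:
                addr += 1 << 24  # new buffer 16 MiB further on, every call
            cache.acquire(addr, size)
    return cache.registrations, cache.hits


sizes = [2 ** k for k in range(12, 22)]  # illustrative: 4 KiB .. 2 MiB, 10 steps
print(simulate(sizes, repeats=4, reuse_buffer=True))   # (10, 30)
print(simulate(sizes, repeats=4, reuse_buffer=False))  # (40, 0)
```

In this model, reusing one buffer still costs one registration each time the size grows, but every repetition at that size is then free; a new address per call pays on every call. Real allocators may or may not return the same address after a free and a new allocation, so reallocation alone does not guarantee cache misses.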

Another question: is the RDMA pipeline protocol for long messages enabled by
default in Open MPI 1.2.6?

> 2008/6/6 Jeff Squyres <jsquyres_at_[hidden]>:
>
>> Note that "eager" RDMA is only used for short messages -- it's not
>> really relevant to whether the same user buffers are re-used or not
>> (the mpi_leave_pinned parameter for long messages is only useful if
>> long buffers are re-used). See this FAQ item:
>>
>> http://www.open-mpi.org/faq/?category=openfabrics#ib-small-message-rdma
>>
>> For benchmarks (like SKAMPI) that re-use long buffers, you might want
>> to use the mpi_leave_pinned MCA parameter:
>>
>>
>> http://www.open-mpi.org/faq/?category=openfabrics#large-message-leave-pinned
>> http://www.open-mpi.org/faq/?category=tuning#running-perf-numbers
>>
>>
>> On Jun 5, 2008, at 9:47 AM, Gabriele Fatigati wrote:
>>
>> >
>> > Hi,
>> > I'm testing the SKaMPI benchmark on an IBM Blade system over InfiniBand.
>> > The Open MPI version is 1.2.6.
>> > I tried to disable RDMA by setting btl_openib_use_eager_rdma = 0.
>> > However, I noticed that the execution time of MPI collectives differs
>> > very little between RDMA on and off. Before testing, I expected the
>> > execution time to be longer with RDMA off.
>> >
>> > So I suppose that the SKaMPI benchmark continually reallocates its
>> > buffers, which prevents any benefit from the RDMA protocol. Indeed, if
>> > the buffer's starting address changes every time, memory pages have to
>> > be registered over and over, degrading performance.
>> >
>> > I used the RDMA pipeline protocol. This protocol should make no
>> > assumptions about whether the application reuses source and target
>> > buffers. But is that always true?
>> > The network parameters are listed below.
>> >
>> > MCA btl: parameter "btl_openib_mpool" (current value: "rdma")
>> > MCA btl: parameter "btl_openib_ib_max_rdma_dst_ops" (current value: "4")
>> > MCA btl: parameter "btl_openib_use_eager_rdma" (current value: "1")
>> > MCA btl: parameter "btl_openib_eager_rdma_threshold" (current value: "16")
>> > MCA btl: parameter "btl_openib_max_eager_rdma" (current value: "16")
>> > MCA btl: parameter "btl_openib_eager_rdma_num" (current value: "16")
>> > MCA btl: parameter "btl_openib_min_rdma_size" (current value: "1048576")
>> > MCA btl: parameter "btl_openib_max_rdma_size" (current value: "1048576")
>> >
>> > --
>> > Gabriele Fatigati
>> >
>> > CINECA Systems & Tecnologies Department
>> >
>> > Supercomputing Group
>> >
>> > Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>> >
>> > www.cineca.it Tel: +39 051 6171722
>> >
>> > g.fatigati_at_[hidden]
>> > _______________________________________________
>> > users mailing list
>> > users_at_[hidden]
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>> --
>> Jeff Squyres
>> Cisco Systems
>>
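The parameter listing quoted in the original message uses the form `MCA <framework>: parameter "<name>" (current value: "<value>")`, presumably copied from `ompi_info` output. A small sketch for turning such a pasted listing back into a dictionary, including entries that a mail client wrapped mid-name or between the label and the value. The regular expression and the helper `parse_mca_params` are assumptions written against the text shown in this thread, not a documented output grammar.

```python
import re

# One `MCA <framework>: parameter "<name>" (current value: "<value>")` entry.
PARAM_RE = re.compile(
    r'MCA\s+(\w+):\s+parameter\s+"([^"]+)"\s+\(current value:\s*"([^"]*)"\)'
)


def parse_mca_params(text):
    """Map parameter name -> current value from a pasted listing."""
    # Drop e-mail quote markers ("> ", ">> > ") and rejoin wrapped lines.
    lines = [re.sub(r"^[>\s]+", "", line) for line in text.splitlines()]
    flat = " ".join(lines)
    params = {}
    for _framework, name, value in PARAM_RE.findall(flat):
        # MCA names contain no whitespace, so any inside came from wrapping.
        params["".join(name.split())] = value
    return params


listing = '''>> > MCA btl: parameter "btl_openib_mpool" (current value: "rdma")
>> > MCA btl: parameter "btl_openib_ib_max_rdma_dst
>> > _ops" (current value: "4")
>> > MCA btl: parameter "btl_openib_min_rdma_size" (current value:
>> > "1048576")'''
print(parse_mca_params(listing))
```

Values are kept as strings, since the listing itself quotes every value.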
