
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] RFC: OB1 optimizations
From: Paul Hargrove (phhargrove_at_[hidden])
Date: 2014-01-08 11:59:34


Never mind; Nathan just clarified that the results are not comparable.

-Paul [Sent from my phone]
On Jan 8, 2014 8:58 AM, "Paul Hargrove" <phhargrove_at_[hidden]> wrote:

> Interestingly enough, the 4 MB latency actually improved significantly
> relative to the initial numbers.
>
> -Paul [Sent from my phone]
> On Jan 8, 2014 8:50 AM, "George Bosilca" <bosilca_at_[hidden]> wrote:
>
>> These results are much worse than the ones you sent in your previous email. What is the reason?
>> What is the reason?
>>
>> George.
>>
>> On Jan 8, 2014, at 17:33 , Nathan Hjelm <hjelmn_at_[hidden]> wrote:
>>
>> > Ah, good catch. A new version is attached that should eliminate the race
>> > window for the multi-threaded case. Performance numbers are still
>> > looking really good. We beat mvapich2 in the small message ping-pong by
>> > a good margin. See the results below. The latency difference for large
>> > messages is probably due to a difference in the max send size for vader
>> > vs mvapich.
>> >
>> > To answer Pasha's question: I don't see a noticeable difference in
>> > performance for btls with no sendi function (this includes
>> > ugni). OpenIB should get a boost. I will test that once I get an
>> > allocation.
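
Presumably the fast path simply falls back when a BTL exposes no sendi function, which is why such BTLs neither gain nor lose anything here. A minimal sketch of that kind of capability check, with invented type and field names rather than the real Open MPI declarations:

    /* Illustrative only: hypothetical names, not the actual Open MPI structures. */
    #include <stddef.h>

    typedef int (*sendi_fn_t)(void *btl, const void *buf, size_t len);

    typedef struct {
        sendi_fn_t btl_sendi;   /* NULL when the BTL has no sendi implementation */
    } btl_module_t;

    static int try_inline_send(btl_module_t *btl, const void *buf, size_t len)
    {
        if (NULL == btl->btl_sendi) {
            return -1;                        /* no sendi: caller takes the normal path */
        }
        return btl->btl_sendi(btl, buf, len); /* attempt to put the data on the wire now */
    }
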
>> >
>> > CPU: Xeon E5-2670 @ 2.60 GHz
>> >
>> > Open MPI (-mca btl vader,self):
>> > # OSU MPI Latency Test v4.1
>> > # Size Latency (us)
>> > 0 0.17
>> > 1 0.19
>> > 2 0.19
>> > 4 0.19
>> > 8 0.19
>> > 16 0.19
>> > 32 0.19
>> > 64 0.40
>> > 128 0.40
>> > 256 0.43
>> > 512 0.52
>> > 1024 0.67
>> > 2048 0.94
>> > 4096 1.44
>> > 8192 2.04
>> > 16384 3.47
>> > 32768 6.10
>> > 65536 9.38
>> > 131072 16.47
>> > 262144 29.63
>> > 524288 54.81
>> > 1048576 106.63
>> > 2097152 206.84
>> > 4194304 421.26
>> >
>> >
>> > mvapich2 1.9:
>> > # OSU MPI Latency Test
>> > # Size Latency (us)
>> > 0 0.23
>> > 1 0.23
>> > 2 0.23
>> > 4 0.23
>> > 8 0.23
>> > 16 0.28
>> > 32 0.28
>> > 64 0.39
>> > 128 0.40
>> > 256 0.40
>> > 512 0.42
>> > 1024 0.51
>> > 2048 0.71
>> > 4096 1.02
>> > 8192 1.60
>> > 16384 3.47
>> > 32768 5.05
>> > 65536 8.06
>> > 131072 14.82
>> > 262144 28.15
>> > 524288 53.69
>> > 1048576 127.47
>> > 2097152 235.58
>> > 4194304 683.90
>> >
>> >
>> > -Nathan
>> >
>> > On Tue, Jan 07, 2014 at 06:23:13PM -0700, George Bosilca wrote:
>> >> The local request is not correctly released, which leads to an assert in
>> >> debug mode. This is because you avoid calling MCA_PML_BASE_RECV_REQUEST_FINI,
>> >> which leaves the request in an ACTIVE state, a condition carefully
>> >> checked during the call to the destructor.
>> >>
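
The failure mode George points out can be pictured as a small state machine: the blocking-receive shortcut leaves the request marked active, and the debug-build destructor asserts on exactly that condition. A rough illustration with invented names (not the real MCA_PML_BASE_RECV_REQUEST_FINI or request destructor):

    /* Illustrative only: invented names, not the actual ob1 request code. */
    #include <assert.h>

    typedef enum { REQ_INVALID, REQ_INACTIVE, REQ_ACTIVE } req_state_t;

    typedef struct { req_state_t state; } recv_request_t;

    /* Returns the request to a destructible state; skipping this step is
     * what leaves the object ACTIVE. */
    static void request_fini(recv_request_t *req) { req->state = REQ_INACTIVE; }

    /* Debug builds verify the request was finalized before destruction. */
    static void request_destruct(recv_request_t *req)
    {
        assert(REQ_ACTIVE != req->state);
        req->state = REQ_INVALID;
    }

Skipping request_fini() before request_destruct() is exactly the pattern that trips the assert in a debug build.
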
>> >> I attached a second patch that fixes the issue above and implements a
>> >> similar optimization for the blocking send.
>> >>
>> >> Unfortunately, this is not enough. The mca_pml_ob1_send_inline
>> >> optimization is horribly wrong in the multithreaded case, as it alters
>> >> the send_sequence without storing it. If you create a gap in the
>> >> send_sequence, a deadlock will __definitively__ occur. I strongly suggest
>> >> you turn off the mca_pml_ob1_send_inline optimization in the multithreaded
>> >> case. All the other optimizations should be safe in all cases.
>> >>
>> >> George.
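
One way to read the hazard George describes (an interpretation, with invented names rather than the actual mca_pml_ob1_send_inline code): the inline path reserves a slot in the per-peer sequence space, and if that slot never goes on the wire while a fallback reserves a fresh one, the receiver's in-order matching stalls forever on the missing number. A sketch under that assumption:

    /* Illustrative only: invented names, not the actual ob1 code. */
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    typedef struct {
        atomic_int send_sequence;   /* per-peer ordering counter */
    } peer_t;

    /* Stand-in for a sendi attempt that may decline the message. */
    static bool try_eager_send(peer_t *p, int seq) { (void)p; (void)seq; return false; }
    static void queue_send(peer_t *p, int seq)     { (void)p; printf("queued seq %d\n", seq); }

    /* Hazardous pattern: a sequence number is reserved but never sent, and
     * the fallback reserves a second one, leaving a permanent gap that the
     * receiver's in-order matching will wait on forever. */
    static void send_inline_broken(peer_t *p)
    {
        int seq = atomic_fetch_add(&p->send_sequence, 1);
        if (try_eager_send(p, seq)) return;
        queue_send(p, atomic_fetch_add(&p->send_sequence, 1));  /* gap left at 'seq' */
    }

    /* Safe pattern: the fallback reuses the sequence number already reserved. */
    static void send_inline_safe(peer_t *p)
    {
        int seq = atomic_fetch_add(&p->send_sequence, 1);
        if (try_eager_send(p, seq)) return;
        queue_send(p, seq);                                      /* no gap */
    }

    int main(void)
    {
        peer_t p;
        atomic_init(&p.send_sequence, 0);
        send_inline_broken(&p);   /* leaves a hole at sequence 0 */
        send_inline_safe(&p);
        return 0;
    }
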
>> >>
>> >> On Jan 8, 2014, at 01:15 , Shamis, Pavel <shamisp_at_[hidden]> wrote:
>> >>
>> >>> Overall it looks good. It would be helpful to validate performance
>> >>> numbers for other interconnects as well.
>> >>> -Pasha
>> >>>
>> >>>> -----Original Message-----
>> >>>> From: devel [mailto:devel-bounces_at_[hidden]] On Behalf Of Nathan
>> >>>> Hjelm
>> >>>> Sent: Tuesday, January 07, 2014 6:45 PM
>> >>>> To: Open MPI Developers List
>> >>>> Subject: [OMPI devel] RFC: OB1 optimizations
>> >>>>
>> >>>> What: Push some ob1 optimizations to the trunk and 1.7.5.
>> >>>>
>> >>>> Why: This patch contains two optimizations:
>> >>>>
>> >>>> - Introduce a fast send path for blocking send calls. This path uses
>> >>>> the btl sendi function to put the data on the wire without the need
>> >>>> for setting up a send request. In the case of btl/vader this can
>> >>>> also avoid allocating/initializing a new fragment. With btl/vader
>> >>>> this optimization improves small message latency by 50-200ns in
>> >>>> ping-pong type benchmarks. Larger messages may take a small hit in
>> >>>> the range of 10-20ns.
>> >>>>
>> >>>> - Use a stack-allocated receive request for blocking receives. This
>> >>>> optimization saves the extra instructions associated with accessing
>> >>>> the receive request free list. I was able to get another 50-200ns
>> >>>> improvement in the small-message ping-pong with this optimization.
>> >>>> I see no hit for larger messages. (See the sketch after this list.)
>> >>>>
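
A rough sketch of the shape of the two optimizations described above (all names invented for illustration; this is not the attached patch): the blocking send tries a sendi-style call before any send request exists, and the blocking receive keeps its request on the caller's stack instead of taking one from a free list.

    /* Illustrative only: invented names, not the actual ob1/vader code. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    typedef struct { char payload[256]; } recv_request_t;   /* normally a free-list object */

    /* Stand-ins for the BTL and matching layers. */
    static bool btl_sendi(const void *buf, size_t len) { (void)buf; return len <= 64; }
    static int  request_based_send(const void *buf, size_t len) { (void)buf; (void)len; return 0; }
    static int  wait_for_match(recv_request_t *req, void *buf, size_t len) { (void)req; (void)buf; (void)len; return 0; }

    /* Blocking send: put the data on the wire immediately if the BTL accepts
     * it; only fall back to building a full send request when it declines. */
    static int blocking_send(const void *buf, size_t len)
    {
        if (btl_sendi(buf, len)) {
            return 0;                        /* fast path: no send request created */
        }
        return request_based_send(buf, len); /* usual request-based path */
    }

    /* Blocking receive: the request lives on the stack for the duration of the
     * call, skipping the free-list get/return round trip entirely. */
    static int blocking_recv(void *buf, size_t len)
    {
        recv_request_t req;
        memset(&req, 0, sizeof(req));        /* init in place of the free-list constructor */
        return wait_for_match(&req, buf, len);
    }
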
>> >>>> When: These changes touch the critical path in ob1 and are targeted for
>> >>>> 1.7.5. As such I will set a moderately long timeout. Timeout set for
>> >>>> next Friday (Jan 17).
>> >>>>
>> >>>> Some results from osu_latency on haswell:
>> >>>>
>> >>>> [hjelmn_at_cn143 pt2pt]$ mpirun -n 2 --bind-to core -mca btl vader,self
>> >>>> ./osu_latency
>> >>>> # OSU MPI Latency Test v4.0.1
>> >>>> # Size Latency (us)
>> >>>> 0 0.11
>> >>>> 1 0.14
>> >>>> 2 0.14
>> >>>> 4 0.14
>> >>>> 8 0.14
>> >>>> 16 0.14
>> >>>> 32 0.15
>> >>>> 64 0.18
>> >>>> 128 0.36
>> >>>> 256 0.37
>> >>>> 512 0.46
>> >>>> 1024 0.56
>> >>>> 2048 0.80
>> >>>> 4096 1.12
>> >>>> 8192 1.68
>> >>>> 16384 2.98
>> >>>> 32768 5.10
>> >>>> 65536 8.12
>> >>>> 131072 14.07
>> >>>> 262144 25.30
>> >>>> 524288 47.40
>> >>>> 1048576 91.71
>> >>>> 2097152 195.56
>> >>>> 4194304 487.05
>> >>>>
>> >>>>
>> >>>> Patch Attached.
>> >>>>
>> >>>> -Nathan
>> >>
>> >
>> >
>> >
>> > <ob1_optimization_take3.patch>
>>