
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] trac 1857: SM btl hangs when msg >=4k, Performance degradation ???
From: Lenny Verkhovsky (lenny.verkhovsky_at_[hidden])
Date: 2009-04-12 09:11:09


Sorry, guys, I tested it on the 1.3 branch; the trunk version (1.4a1r20980) seems to
be fixed.

BUT,

the default value of mpool_sm_min_size in 1.4a1r20980 is 67108864 (64 MB)

When I set it to 0, there is a performance degradation. Is that OK?

$LD_LIBRARY_PATH=~/work/svn/ompi/trunk/build_x86-64/install/lib/
install/bin/mpirun -np 2 -mca btl sm,self -mca mpool_sm_min_size 0
~/work/svn/hpc/tools/benchmarks/OMB-3.1.1/osu_bw
# OSU MPI Bandwidth Test v3.1.1
# Size Bandwidth (MB/s)
1 1.20
2 3.39
4 6.93
8 14.09
16 27.80
32 50.58
64 101.08
128 173.23
256 257.81
512 436.86
1024 674.51
2048 856.80
4096 573.87
8192 607.55
16384 660.58
32768 685.23
65536 687.45
131072 690.52
262144 687.48
524288 676.77
1048576 675.74
2097152 676.89
4194304 677.28
lennyb_at_dellix7 ~/work/svn/ompi/trunk/build_x86-64
$LD_LIBRARY_PATH=~/work/svn/ompi/trunk/build_x86-64/install/lib/
install/bin/mpirun -np 2 -mca btl sm,self
~/work/svn/hpc/tools/benchmarks/OMB-3.1.1/osu_bw
# OSU MPI Bandwidth Test v3.1.1
# Size Bandwidth (MB/s)
1 1.72
2 3.70
4 7.43
8 13.45
16 29.83
32 52.66
64 105.08
128 181.16
256 288.16
512 426.83
1024 690.21
2048 867.00
4096 567.53
8192 667.35
16384 806.97
32768 892.95
65536 989.62
131072 1009.25
262144 1018.35
524288 1037.32
1048576 1048.75
2097152 1057.51
4194304 1062.16

Lenny.

On 4/12/09, Lenny Verkhovsky <lenny.verkhovsky_at_[hidden]> wrote:
>
> r20980: it still gets stuck
>
> LD_LIBRARY_PATH=~/work/svn/hpc/dev/ompi_1_3_trunk/build_x86-64/install/lib/
> ~/work/svn/hpc/dev/ompi_1_3_trunk/build_x86-64/install/bin/mpirun -np 2 -mca
> btl self,sm ./osu_bw
>
> # OSU MPI Bandwidth Test v3.1.1
> # Size Bandwidth (MB/s)
> 1 1.46
> 2 3.66
> 4 7.29
> 8 14.64
> 16 29.44
> 32 56.94
> 64 112.25
> 128 189.02
> 256 278.26
> 512 448.58
> 1024 686.25
> 2048 865.27
>
>
>
> On 4/8/09, Jeff Squyres <jsquyres_at_[hidden]> wrote:
>>
>> Ditto -- works for me too. Huzzah!
>>
>>
>> On Apr 7, 2009, at 8:39 PM, Eugene Loh wrote:
>>
>> George Bosilca wrote:
>>>
>>> > This is interesting. I cannot trigger this deadlock on my AMD cluster
>>> > even when I set the sm_min_size to zero. However, on an Intel cluster
>>> > this can be triggered pretty easily.
>>> >
>>> > Anyway, I think I finally understood where the problem is coming
>>> > from. r20952 and r20953 are commits that, in addition to the ones
>>> > from yesterday, fix the problem. Please read the log on r20953 to see
>>> > how this happens.
>>> >
>>> > Of course, please stress it before we move it to the 1.3 branch.
>>>
>>> Okay, this fix works for me.
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>
>>
>>
>> --
>> Jeff Squyres
>> Cisco Systems
>>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>
>