Hello Daniel and list,
Could it be a problem with memory bandwidth or contention on multi-core nodes?
It has been reported on many mailing lists (mpich, beowulf, etc.).
Here it seems to happen on dual-processor dual-core machines with our
memory-intensive programs.
Have you checked what happens to the shared memory runs as you
increase the number of active cores/processes?
Would it help to set the processor affinity in the shared memory runs?
http://www.open-mpi.org/faq/?category=building#build-paffinity
http://www.open-mpi.org/faq/?category=tuning#using-paffinity
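
To illustrate what pinning a process to a core means at the operating-system
level, here is a minimal Python sketch. It assumes Linux (os.sched_setaffinity
is not available elsewhere), and the helper name pin_to_cpu is made up for
this example. It is not how Open MPI sets affinity; the FAQ entries above
describe the actual build and run-time options.

```python
import os

def pin_to_cpu(cpu):
    """Restrict the calling process to a single CPU and return the new affinity set."""
    # pid 0 means "the calling process" (Linux-only API, assumed available here)
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)

if __name__ == "__main__":
    # Only CPUs the process is already allowed to use are valid targets
    allowed = sorted(os.sched_getaffinity(0))
    print("allowed CPUs before pinning:", allowed)
    print("allowed CPUs after pinning: ", sorted(pin_to_cpu(allowed[0])))
```

Once pinned, the scheduler can no longer migrate the process between cores,
which is the effect one would hope to get for each MPI rank in the shared
memory runs.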
Gus Correa
--
---------------------------------------------------------------------
Gustavo J. Ponce Correa, PhD - Email: gus_at_[hidden]
Lamont-Doherty Earth Observatory - Columbia University
P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------
Daniël Mantione wrote:
>Hello,
>
>I'm troubleshooting a weird benchmark situation in which having the sm btl
>enabled gives me worse results than disabling it.
>
>For example, this on a single compute node with 2*Xeon5420, 8 GB RAM and a
>ConnectX gen2 IB card, with OFED 1.3 and OpenMPI 1.2.6 as software setup:
>
>[cvsupport_at_extern src]$ mpirun -np 8 --mca btl self,sm,openib -hostfile \
>hostfile ./IMB-MPI1.openmpi -npmin 8 PingPong
>
>#---------------------------------------------------
># Benchmarking PingPong
># #processes = 2
># ( 6 additional processes waiting in MPI_Barrier)
>#---------------------------------------------------
>   #bytes  #repetitions    t[usec]  Mbytes/sec
>        0          1000       0.87        0.00
>        1          1000       0.98        0.97
>        2          1000       0.97        1.96
>        4          1000       0.99        3.87
>        8          1000       0.98        7.78
>       16          1000       1.15       13.33
>       32          1000       1.13       26.93
>       64          1000       1.12       54.42
>      128          1000       1.27       96.31
>      256          1000       1.55      157.01
>      512          1000       2.04      239.00
>     1024          1000       2.75      355.62
>     2048          1000       4.58      426.40
>     4096          1000       7.12      548.93
>     8192          1000      11.29      692.14
>    16384          1000      18.83      829.75
>    32768          1000      34.57      904.08
>    65536           640      60.73     1029.22
>   131072           320     112.06     1115.43
>   262144           160     215.48     1160.21
>   524288            80     423.34     1181.09
>  1048576            40     858.18     1165.26
>  2097152            20    1744.15     1146.69
>  4194304            10    4055.60      986.29
>
>Now, when disabling the sm btl, the score is:
>
>#---------------------------------------------------
># Benchmarking PingPong
># #processes = 2
># ( 6 additional processes waiting in MPI_Barrier)
>#---------------------------------------------------
>   #bytes  #repetitions    t[usec]  Mbytes/sec
>        0          1000       1.08        0.00
>        1          1000       1.42        0.67
>        2          1000       1.19        1.60
>        4          1000       1.21        3.14
>        8          1000       1.61        4.75
>       16          1000       1.30       11.70
>       32          1000       1.32       23.13
>       64          1000       1.61       37.97
>      128          1000       2.80       43.53
>      256          1000       3.21       76.05
>      512          1000       4.06      120.15
>     1024          1000       5.03      194.21
>     2048          1000       7.15      273.05
>     4096          1000      10.05      388.55
>     8192          1000      16.02      487.76
>    16384          1000      29.63      527.41
>    32768          1000      51.23      610.03
>    65536           640      92.26      677.43
>   131072           320     141.03      886.36
>   262144           160     233.62     1070.14
>   524288            80     434.56     1150.60
>  1048576            40     818.84     1221.24
>  2097152            20    1403.75     1424.76
>  4194304            10    2523.40     1585.16
>
>
>Now, I do have fast Infiniband, but I can't believe that the openib btl is
>supposed to be faster than the sm btl. Does anyone know whether
>something can be tuned here?
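
One way to read the two tables above side by side is sketched below in
Python. The timings are copied from the larger rows of the two tables. The
bandwidth formula (bytes / 2**20 / seconds) is an assumption, although it
does reproduce the Mbytes/sec column shown; the function names are made up
for illustration.

```python
# (message size in bytes, t[usec] with sm enabled, t[usec] with sm disabled),
# copied from the two PingPong tables in this message
ROWS = [
    (65536, 60.73, 92.26),
    (131072, 112.06, 141.03),
    (262144, 215.48, 233.62),
    (524288, 423.34, 434.56),
    (1048576, 858.18, 818.84),
    (2097152, 1744.15, 1403.75),
    (4194304, 4055.60, 2523.40),
]

def mbytes_per_sec(nbytes, t_usec):
    """Bandwidth in units of 2**20 bytes per second (assumed IMB convention)."""
    return nbytes / 2**20 / (t_usec * 1e-6)

def crossover(rows):
    """Smallest listed message size at which the run without sm is faster, else None."""
    for nbytes, t_sm, t_nosm in rows:
        if t_nosm < t_sm:
            return nbytes
    return None

if __name__ == "__main__":
    print(f"{'#bytes':>8} {'sm':>9} {'no sm':>9}")
    for nbytes, t_sm, t_nosm in ROWS:
        print(f"{nbytes:>8} {mbytes_per_sec(nbytes, t_sm):9.2f} "
              f"{mbytes_per_sec(nbytes, t_nosm):9.2f}")
    print("run without sm becomes faster at", crossover(ROWS), "bytes")
```

With these numbers the sm btl is ahead for every size up to 524288 bytes,
and the run without sm only wins from 1048576 bytes upward, so the anomaly
is confined to the largest messages.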
>
>Best regards,
>
>Daniël Mantione
>