
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] barrier problem
From: Shamis, Pavel (shamisp_at_[hidden])
Date: 2012-03-23 10:11:05


Pavel,

MVAPICH implements multicore-optimized collectives, which perform substantially better than the default algorithms.
FYI, the ORNL team is working on a new high-performance collectives framework for OMPI; it provides a significant boost in collective performance.
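As a side note, the collective components compiled into a particular Open MPI build, and their tunable MCA parameters, can be inspected with ompi_info (the exact component set depends on the installation):

ompi_info | grep coll
ompi_info --param coll all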

Regards,

Pavel (Pasha) Shamis

---
Application Performance Tools Group
Computer Science and Math Division
Oak Ridge National Laboratory
On Mar 23, 2012, at 9:17 AM, Pavel Mezentsev wrote:
I've been comparing 1.5.4 and 1.5.5rc3 with the same parameters, which is why I didn't use --bind-to-core. I checked, and using --bind-to-core improved the result compared to 1.5.4:
#repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
         1000        84.96        85.08        85.02
So I guess that with 1.5.5 the processes move from core to core within a node even though I use all the cores, right? Then why does 1.5.4 behave differently?
I need --bind-to-core in some cases, and that's why I need 1.5.5rc3 instead of the more stable 1.5.4. I know I can use numactl explicitly, but --bind-to-core is more convenient :)
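(For reference, the numactl alternative needs a small per-rank wrapper; a rough sketch, assuming the OMPI_COMM_WORLD_LOCAL_RANK variable that Open MPI's launcher exports, with bind.sh as a made-up name:

#!/bin/sh
# bind.sh: pin each local rank to one core, then exec the real binary
exec numactl --physcpubind=$OMPI_COMM_WORLD_LOCAL_RANK "$@"

mpirun -hostfile hosts_all2all_2 -npernode 32 -np 256 ./bind.sh ./IMB-MPI1 barrier

whereas --bind-to-core makes mpirun do the equivalent pinning itself.)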
2012/3/23 Ralph Castain <rhc_at_[hidden]>
I don't see where you told OMPI to --bind-to-core. We don't automatically bind, so you have to explicitly tell us to do so.
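A quick way to check what actually happened is to add --report-bindings to the mpirun command line; a minimal sketch, reusing your hostfile and rank counts:

/opt/openmpi-1.5.5rc3/intel12/bin/mpirun --bind-to-core --report-bindings -hostfile hosts_all2all_2 -npernode 32 -np 256 openmpi-1.5.5rc3/intel12/IMB-MPI1 barrier

Each daemon then reports the core mask every local rank was bound to, so you can see whether binding took effect.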
On Mar 23, 2012, at 6:20 AM, Pavel Mezentsev wrote:
> Hello
>
> I'm doing some testing with IMB and discovered a strange thing:
>
> Since I have a system with the new AMD Opteron 6276 processors, I'm using 1.5.5rc3, which supports binding to cores.
>
> But when I run the barrier test from the Intel MPI Benchmarks, the best I get is:
> #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
>           598     15159.56     15211.05     15184.70
>  (/opt/openmpi-1.5.5rc3/intel12/bin/mpirun -x OMP_NUM_THREADS=1  -hostfile hosts_all2all_2 -npernode 32 --mca btl openib,sm,self -mca coll_tuned_use_dynamic_rules 1 -mca coll_tuned_barrier_algorithm 1 -np 256 openmpi-1.5.5rc3/intel12/IMB-MPI1 -off_cache 16,64 -msglog 1:16 -npmin 256 barrier)
>
> And with Open MPI 1.5.4 the result is much better:
> #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
>          1000       113.23       113.33       113.28
>
> (/opt/openmpi-1.5.4/intel12/bin/mpirun -x OMP_NUM_THREADS=1  -hostfile hosts_all2all_2 -npernode 32 --mca btl openib,sm,self -mca coll_tuned_use_dynamic_rules 1 -mca coll_tuned_barrier_algorithm 3 -np 256 openmpi-1.5.4/intel12/IMB-MPI1 -off_cache 16,64 -msglog 1:16 -npmin 256 barrier)
>
> And still I couldn't come close to the result I got with MVAPICH2:
> #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
>          1000        17.51        17.53        17.53
>
> (/opt/mvapich2-1.8/intel12/bin/mpiexec.hydra -env OMP_NUM_THREADS 1 -hostfile hosts_all2all_2 -np 256 mvapich2-1.8/intel12/IMB-MPI1 -mem 2 -off_cache 16,64 -msglog 1:16 -npmin 256 barrier)
>
> I don't know whether this is a bug or whether I'm doing something wrong. Is there a way to improve my results?
>
> Best regards,
> Pavel Mezentsev
>
>