Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] barrier problem
From: Pavel Mezentsev (pavel.mezentsev_at_[hidden])
Date: 2012-03-28 06:37:35


I took the best result from each version; that's why different algorithm
numbers were chosen.

I've studied the matter a bit further, and here's what I got.
With Open MPI 1.5.4, these are the average times (t_avg, usec) for each
value of coll_tuned_barrier_algorithm ($i):
/opt/openmpi-1.5.4/intel12/bin/mpirun -x OMP_NUM_THREADS=1 -hostfile hosts_all2all_4 \
    -npernode 32 --mca btl openib,sm,self \
    -mca coll_tuned_use_dynamic_rules 1 -mca coll_tuned_barrier_algorithm $i \
    -np 128 openmpi-1.5.4/intel12/IMB-MPI1 -off_cache 16,64 -msglog 1:16 -npmin 128 barrier
0 - 71.78
3 - 69.39
6 - 69.05
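The `$i` in the command above stands for the barrier algorithm number being swept. A minimal bash sketch of such a sweep, assuming the same hostfile, process counts and IMB options as above; the `MPIRUN`/`IMB` variables and the `run_barrier`/`sweep` function names are hypothetical, and setting `MPIRUN=echo` only prints the command lines instead of running them:

```bash
#!/bin/bash
# Sketch: sweep Open MPI's tuned barrier algorithms with the options used above.
# MPIRUN and IMB defaults are assumptions; set MPIRUN=echo to print commands only.
MPIRUN=${MPIRUN:-/opt/openmpi-1.5.4/intel12/bin/mpirun}
IMB=${IMB:-openmpi-1.5.4/intel12/IMB-MPI1}

# Run the IMB barrier benchmark with one forced barrier algorithm number.
run_barrier() {
  local alg=$1
  "$MPIRUN" -x OMP_NUM_THREADS=1 -hostfile hosts_all2all_4 -npernode 32 \
    --mca btl openib,sm,self \
    -mca coll_tuned_use_dynamic_rules 1 \
    -mca coll_tuned_barrier_algorithm "$alg" \
    -np 128 "$IMB" -off_cache 16,64 -msglog 1:16 -npmin 128 barrier
}

# Run the benchmark once for every algorithm number passed as an argument.
sweep() {
  local alg
  for alg in "$@"; do
    echo "== coll_tuned_barrier_algorithm $alg"
    run_barrier "$alg"
  done
}
```

For example, `sweep 0 3 6` would rerun the three algorithms listed above, each with a header line so the IMB output can be told apart.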

If I pin the processes with the following script:
#!/bin/bash

# Bind each rank to the core whose number equals its node-local rank
s=$OMPI_COMM_WORLD_NODE_RANK

exec numactl --physcpubind=$s --localalloc \
    openmpi-1.5.4/intel12/IMB-MPI1 -off_cache 16,64 -msglog 1:16 -npmin 128 barrier
Then the results improve:
0 - 51.96
3 - 52.39
6 - 28.64
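The same wrapper can be generalized so that it pins any program rather than a hard-coded IMB command line. This is only a sketch under the same assumptions as the script above (mpirun exports `OMPI_COMM_WORLD_NODE_RANK`, and node-local rank N should run on core N); the `pin_and_exec` name and the `NUMACTL` override are hypothetical:

```bash
#!/bin/bash
# Sketch: pin the calling MPI rank to one core, then exec the given command.
# Assumes mpirun exports OMPI_COMM_WORLD_NODE_RANK and that node-local rank N
# should run on core N, as in the original script. NUMACTL is overridable.
NUMACTL=${NUMACTL:-numactl}

pin_and_exec() {
  # Abort with a message if the variable is missing (not started by mpirun).
  local core=${OMPI_COMM_WORLD_NODE_RANK:?not launched by Open MPI mpirun}
  # Bind to the core, allocate memory locally, and replace this shell process.
  exec "$NUMACTL" --physcpubind="$core" --localalloc "$@"
}

# Only act when a command was given, e.g. ./pin.sh ./IMB-MPI1 barrier
if [ "$#" -gt 0 ]; then
  pin_and_exec "$@"
fi
```

Saved as, say, `pin.sh` (a hypothetical name), it would be launched as `mpirun ... ./pin.sh openmpi-1.5.4/intel12/IMB-MPI1 -off_cache 16,64 -msglog 1:16 -npmin 128 barrier`. Using `exec` means no extra bash process stays around for each rank.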

On openmpi-1.5.5rc3 without any binding the results are awful (14964.15 usec
at best). If I use the '--bind-to-core' flag, the results are almost the same
as with the binding script on 1.5.4:
0 - 52.85
3 - 52.69
6 - 23.34
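To put numbers on the effect of binding, the tables above can be compared per algorithm. A small bash/awk sketch, assuming each table is saved one `ALG - TIME` line per row exactly as written above; the `speedup` function name is hypothetical:

```bash
#!/bin/bash
# Sketch: per-algorithm speedup between two "ALG - TIME" tables.
# $1: baseline table (e.g. unpinned), $2: improved table (e.g. pinned).
speedup() {
  awk 'NR == FNR { base[$1] = $3; next }   # first file: remember baseline times
       ($1 in base) { printf "%s %.2f\n", $1, base[$1] / $3 }' "$1" "$2"
}
```

Fed the unpinned and pinned 1.5.4 tables above, it prints `0 1.38`, `3 1.32` and `6 2.41`, i.e. pinning helps algorithm 6 far more than the other two.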

So almost everything seems to work fine now. The only remaining problem is
that algorithm number 5 hangs.
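Because algorithm 5 hangs, a sweep over all algorithm numbers can get stuck. One way to keep going, sketched here under the assumption that GNU coreutils `timeout` is available on the launch node, is to run each mpirun under a time limit; the `run_guarded` name is hypothetical:

```bash
#!/bin/bash
# Sketch: run a command under GNU coreutils `timeout` so a hanging barrier
# algorithm is reported instead of stalling the whole sweep.
# Usage: run_guarded SECONDS COMMAND [ARGS...]
run_guarded() {
  local limit=$1
  shift
  timeout "$limit" "$@"
  local rc=$?
  # timeout exits with status 124 when the time limit was hit.
  if [ "$rc" -eq 124 ]; then
    echo "TIMEOUT after ${limit}s: $*" >&2
  fi
  return "$rc"
}
```

For instance, `run_guarded 600 mpirun ... -mca coll_tuned_barrier_algorithm 5 ...` would then report a timeout (exit status 124) instead of blocking; the 600 s limit is an arbitrary example.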

2012/3/28 Jeffrey Squyres <jsquyres_at_[hidden]>

> FWIW:
>
> 1. There were definitely some issues with binding to cores and process
> layouts on Opterons that should be fixed in the 1.5.5 that was finally
> released today.
>
> 2. It is strange that the performance of barrier is so much different
> between 1.5.4 and 1.5.5. Is there a reason you were choosing different
> algorithm numbers between the two? (one of your command lines had
> "coll_tuned_barrier_algorithm 1", the other had
> "coll_tuned_barrier_algorithm 3").
>
>
> On Mar 23, 2012, at 10:11 AM, Shamis, Pavel wrote:
>
> > Pavel,
> >
> > MVAPICH implements multicore-optimized collectives, which perform
> > substantially better than the default algorithms.
> > FYI, the ORNL team is working on a new high-performance collectives
> > framework for OMPI. The framework provides a significant boost in
> > collectives performance.
> >
> > Regards,
> >
> > Pavel (Pasha) Shamis
> > ---
> > Application Performance Tools Group
> > Computer Science and Math Division
> > Oak Ridge National Laboratory
> >
> >
> > On Mar 23, 2012, at 9:17 AM, Pavel Mezentsev wrote:
> >
> > I've been comparing 1.5.4 and 1.5.5rc3 with the same parameters; that's
> > why I didn't use --bind-to-core. I checked, and using --bind-to-core
> > improved the result compared to 1.5.4:
> > #repetitions t_min[usec] t_max[usec] t_avg[usec]
> > 1000 84.96 85.08 85.02
> >
> > So I guess with 1.5.5 the processes move from core to core within a node
> > even though I use all cores, right? Then why does 1.5.4 behave differently?
> >
> > I need --bind-to-core in some cases, and that's why I need 1.5.5rc3
> > instead of the more stable 1.5.4. I know that I can use numactl
> > explicitly, but --bind-to-core is more convenient :)
> >
> > 2012/3/23 Ralph Castain <rhc_at_[hidden]>
> > I don't see where you told OMPI to --bind-to-core. We don't
> > automatically bind, so you have to explicitly tell us to do so.
> >
> > On Mar 23, 2012, at 6:20 AM, Pavel Mezentsev wrote:
> >
> >> Hello
> >>
> >> I'm doing some testing with IMB and discovered a strange thing:
> >>
> >> Since I have a system with the new AMD Opteron 6276 processors, I'm using
> >> 1.5.5rc3 because it supports binding to cores.
> >>
> >> But when I run the barrier test from the Intel MPI Benchmarks, the best I
> >> get is:
> >> #repetitions t_min[usec] t_max[usec] t_avg[usec]
> >> 598 15159.56 15211.05 15184.70
> >> (/opt/openmpi-1.5.5rc3/intel12/bin/mpirun -x OMP_NUM_THREADS=1
> >> -hostfile hosts_all2all_2 -npernode 32 --mca btl openib,sm,self -mca
> >> coll_tuned_use_dynamic_rules 1 -mca coll_tuned_barrier_algorithm 1 -np 256
> >> openmpi-1.5.5rc3/intel12/IMB-MPI1 -off_cache 16,64 -msglog 1:16 -npmin 256
> >> barrier)
> >>
> >> And with openmpi 1.5.4 the result is much better:
> >> #repetitions t_min[usec] t_max[usec] t_avg[usec]
> >> 1000 113.23 113.33 113.28
> >>
> >> (/opt/openmpi-1.5.4/intel12/bin/mpirun -x OMP_NUM_THREADS=1 -hostfile
> >> hosts_all2all_2 -npernode 32 --mca btl openib,sm,self -mca
> >> coll_tuned_use_dynamic_rules 1 -mca coll_tuned_barrier_algorithm 3 -np 256
> >> openmpi-1.5.4/intel12/IMB-MPI1 -off_cache 16,64 -msglog 1:16 -npmin 256
> >> barrier)
> >>
> >> And still I couldn't come close to the result I got with MVAPICH:
> >> #repetitions t_min[usec] t_max[usec] t_avg[usec]
> >> 1000 17.51 17.53 17.53
> >>
> >> (/opt/mvapich2-1.8/intel12/bin/mpiexec.hydra -env OMP_NUM_THREADS 1
> >> -hostfile hosts_all2all_2 -np 256 mvapich2-1.8/intel12/IMB-MPI1 -mem 2
> >> -off_cache 16,64 -msglog 1:16 -npmin 256 barrier)
> >>
> >> I don't know whether this is a bug or I'm doing something the wrong way.
> >> Is there a way to improve my results?
> >>
> >> Best regards,
> >> Pavel Mezentsev
> >>
> >>
> >> _______________________________________________
> >> devel mailing list
> >> devel_at_[hidden]
> >> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
>