Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] 1.5.0 could be soon
From: Peter Kjellstrom (cap_at_[hidden])
Date: 2010-02-17 08:55:49


On Tuesday 16 February 2010, Jeff Squyres wrote:
> We've only got 2 "critical" 1.5.0 bugs left, and I think that those will
> both be closed out pretty soon.
>
> https://svn.open-mpi.org/trac/ompi/report/15
>
> Rainer and I both feel that a RC for 1.5.0 could be pretty soon.
>
> Does anyone have any heartburn with this? Does anyone have any things they
> still need to get in v1.5.0?

I noticed that 1.5a1r22627 still has a very suboptimal default selection of
(at least) alltoall algorithms. This has been mentioned several times since
the first major discussion [1], but nothing seems to have improved.

A short recap of the situation: by default, ompi switches from bruck to
basic-linear at ~100 bytes message size, and this is bad<tm>. The first set
of figures below is with vanilla ompi; the second set is with a dynamic
rules file [2] that forces bruck for all message sizes. For details on the
system, see [3].

The problem is equally visible on tcp and openib. A concrete result is that
Open MPI on IB is far slower than other MPIs on 1G ethernet for the affected
message sizes (100-3000 bytes).
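For anyone who wants to try this without writing a rules file: as far as I
can tell, the tuned component also lets you force the algorithm directly via
MCA parameters (algorithm 3 should be bruck for alltoall). Treat the exact
parameter names as an assumption and verify them against your build:

```shell
# Sketch only -- parameter names assumed from the coll_tuned component;
# check them with "ompi_info --param coll tuned" on your installation.
mpirun --host ... --bind-to-core \
    -mca coll_tuned_use_dynamic_rules 1 \
    -mca coll_tuned_alltoall_algorithm 3 \
    ./alltoall.ompi15a1r22627 profile.ompibadness
```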

[cap_at_n115 mpi]$ mpirun --host $(hostlist --expand -s','
$SLURM_JOB_NODELIST) --bind-to-core ./alltoall.ompi15a1r22627
profile.ompibadness
running in profile-from-file mode
bw for 400 x 1 B : 2.0 Mbytes/s time was: 24.9 ms
bw for 400 x 25 B : 52.8 Mbytes/s time was: 23.9 ms
bw for 400 x 50 B : 82.2 Mbytes/s time was: 30.7 ms
bw for 400 x 75 B : 90.4 Mbytes/s time was: 41.8 ms
bw for 400 x 100 B : 109.2 Mbytes/s time was: 46.1 ms
bw for 400 x 200 B : 4.8 Mbytes/s time was: 2.1 s
bw for 400 x 300 B : 7.0 Mbytes/s time was: 2.2 s
bw for 400 x 400 B : 9.8 Mbytes/s time was: 2.1 s
bw for 400 x 500 B : 12.3 Mbytes/s time was: 2.0 s
bw for 400 x 750 B : 18.5 Mbytes/s time was: 2.0 s
bw for 400 x 1000 B : 24.6 Mbytes/s time was: 2.0 s
bw for 400 x 1250 B : 29.9 Mbytes/s time was: 2.1 s
bw for 400 x 1500 B : 35.1 Mbytes/s time was: 2.2 s
bw for 400 x 2000 B : 45.5 Mbytes/s time was: 2.2 s
bw for 400 x 2500 B : 51.0 Mbytes/s time was: 2.5 s
bw for 400 x 3000 B : 113.6 Mbytes/s time was: 1.3 s
bw for 400 x 3500 B : 123.3 Mbytes/s time was: 1.4 s
bw for 400 x 4000 B : 135.7 Mbytes/s time was: 1.5 s
totaltime was: 25.8 s
[cap_at_n115 mpi]$ mpirun --host $(hostlist --expand -s','
$SLURM_JOB_NODELIST) --bind-to-core -mca coll_tuned_use_dynamic_rules 1 -mca
coll_tuned_dynamic_rules_filename ./dyn_rules ./alltoall.ompi15a1r22627
profile.ompibadness
running in profile-from-file mode
bw for 400 x 1 B : 2.1 Mbytes/s time was: 24.3 ms
bw for 400 x 25 B : 55.1 Mbytes/s time was: 22.9 ms
bw for 400 x 50 B : 82.6 Mbytes/s time was: 30.5 ms
bw for 400 x 75 B : 89.4 Mbytes/s time was: 42.3 ms
bw for 400 x 100 B : 109.9 Mbytes/s time was: 45.9 ms
bw for 400 x 200 B : 115.1 Mbytes/s time was: 87.6 ms
bw for 400 x 300 B : 117.8 Mbytes/s time was: 128.3 ms
bw for 400 x 400 B : 105.4 Mbytes/s time was: 191.2 ms
bw for 400 x 500 B : 113.4 Mbytes/s time was: 222.1 ms
bw for 400 x 750 B : 119.3 Mbytes/s time was: 316.9 ms
bw for 400 x 1000 B : 120.9 Mbytes/s time was: 416.9 ms
bw for 400 x 1250 B : 121.0 Mbytes/s time was: 520.6 ms
bw for 400 x 1500 B : 120.3 Mbytes/s time was: 628.2 ms
bw for 400 x 2000 B : 118.0 Mbytes/s time was: 854.1 ms
bw for 400 x 2500 B : 96.5 Mbytes/s time was: 1.3 s
bw for 400 x 3000 B : 107.4 Mbytes/s time was: 1.4 s
bw for 400 x 3500 B : 109.1 Mbytes/s time was: 1.6 s
bw for 400 x 4000 B : 109.2 Mbytes/s time was: 1.8 s
totaltime was: 9.7 s
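For reference, the reported totals work out to roughly a 2.7x overall
speedup from forcing bruck, and about 24x at the worst affected point
(400 x 200 B). A quick sanity check on the numbers above:

```python
# Speedup figures derived from the totals and per-size times reported
# above (a quick sanity check, not part of the benchmark itself).
default_total = 25.8   # s, vanilla ompi (default algorithm selection)
bruck_total = 9.7      # s, dynamic rules forcing bruck

overall_speedup = default_total / bruck_total
print(f"overall: {overall_speedup:.1f}x")   # ~2.7x

# Worst affected point, 400 x 200 B: 2.1 s vs 87.6 ms
worst_speedup = 2.1 / 0.0876
print(f"at 200 B: {worst_speedup:.0f}x")    # ~24x
```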

[1] [OMPI users] scaling problem with openmpi
  From: Roman Martonak <r.martonak_at_[hidden]>
  To: users_at_[hidden]
  Date: 2009-05-16 00:20

[2]:
 1 # num of collectives
 3 # ID = 3 Alltoall collective (ID in coll_tuned.h)
 1 # number of com sizes
 32 # comm size
 1 # number of msg sizes
 0 3 0 0 # for message size 0: algorithm 3 (bruck), topo 0, 0 segmentation
 # end of first collective

[3]:
 Open MPI: built with intel-11.1.074; the only configure options used were:
  --enable-orterun-prefix-by-default
  --prefix
 OS: CentOS-5.4 x86_64
 HW: dual E5520 nodes with IB (ConnectX)
 Job size: 8 nodes (that is, 64 cores/ranks)

/Peter