Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Open MPI v1.3.3rc1 has escaped
From: Peter Kjellstrom (cap_at_[hidden])
Date: 2009-07-10 06:54:26


On Friday 10 July 2009, Jeff Squyres wrote:
> http://www.open-mpi.org/software/ompi/v1.3/
>
> Please test!

Built and ran just like (*) 1.3.2 in my limited tests (that is, it worked
quite well).

OS: CentOS-5.3.x86_64 with its own OFED
HW: ConnectX-DDR on a Nehalem dual-quad platform
Size: 4 nodes
Compilers: Intel-11.0-074 (built with C/C++/F90, tested C and F90)

(*) It seems to still have the problem reported in:

 [OMPI users] scaling problem with openmpi
 From: Roman Martonak <r.martonak_at_[hidden]>
 To: users_at_[hidden]
 Date: 2009-05-16 00.20

That is, it selects basic-linear for alltoall when it should have picked
bruck, and the result is suckish performance:

as-shipped:

 $ mpirun -np 32 -host tbox13,tbox14,tbox15,tbox16 ./alltoall.openmpi133rc1 \
profile.short-small
 running in profile-from-file mode
 bw for 10000 x 0 B : 0.0 bytes/s time was: 142.1 us
 bw for 10000 x 1 B : 2.8 Mbytes/s time was: 224.0 ms
 bw for 10000 x 2 B : 5.5 Mbytes/s time was: 225.5 ms
 bw for 10000 x 4 B : 11.0 Mbytes/s time was: 225.6 ms
 bw for 10000 x 8 B : 23.6 Mbytes/s time was: 210.2 ms
 bw for 10000 x 16 B : 44.1 Mbytes/s time was: 224.9 ms
 bw for 10000 x 32 B : 79.2 Mbytes/s time was: 250.7 ms
 bw for 10000 x 64 B : 132.0 Mbytes/s time was: 300.6 ms
 bw for 10000 x 128 B : 195.7 Mbytes/s time was: 405.4 ms
 bw for 10000 x 256 B : 11.4 Mbytes/s time was: 14.0 s
 bw for 10000 x 512 B : 24.1 Mbytes/s time was: 13.2 s
 bw for 10000 x 1024 B : 53.6 Mbytes/s time was: 11.9 s
 totaltime was: 41.0 s

forcing bruck:

 $ mpirun -np 32 -mca coll_tuned_alltoall_algorithm 3 \
-mca coll_tuned_use_dynamic_rules 1 \
-host tbox13,tbox14,tbox15,tbox16 ./alltoall.openmpi133rc1 profile.short-small
 running in profile-from-file mode
 bw for 10000 x 0 B : 0.0 bytes/s time was: 142.1 us
 bw for 10000 x 1 B : 3.5 Mbytes/s time was: 176.8 ms
 bw for 10000 x 2 B : 6.9 Mbytes/s time was: 179.4 ms
 bw for 10000 x 4 B : 13.4 Mbytes/s time was: 184.5 ms
 bw for 10000 x 8 B : 24.3 Mbytes/s time was: 203.8 ms
 bw for 10000 x 16 B : 45.3 Mbytes/s time was: 219.0 ms
 bw for 10000 x 32 B : 81.0 Mbytes/s time was: 245.1 ms
 bw for 10000 x 64 B : 134.1 Mbytes/s time was: 295.9 ms
 bw for 10000 x 128 B : 198.3 Mbytes/s time was: 400.2 ms
 bw for 10000 x 256 B : 233.8 Mbytes/s time was: 679.0 ms
 bw for 10000 x 512 B : 281.5 Mbytes/s time was: 1.1 s
 bw for 10000 x 1024 B : 292.1 Mbytes/s time was: 2.2 s
 totaltime was: 5.9 s
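
To put a number on the difference between the two runs, here is a minimal
Python sketch that parses the "bw for ..." lines and reports how much slower
the as-shipped run is per message size. The function names (parse_times,
slowdown) and the regular expression are my own assumptions about the output
format shown above, not part of Open MPI or of the benchmark program; only
the four largest message sizes from the runs above are included as sample
input.

```python
import re

# Assumed line format, taken from the output above, e.g.
#   "bw for 10000 x 256 B : 11.4 Mbytes/s time was: 14.0 s"
LINE_RE = re.compile(
    r"bw for\s+(\d+)\s+x\s+(\d+)\s+B\s*:.*time was:\s*([\d.]+)\s*(us|ms|s)\b"
)
UNIT_TO_SECONDS = {"us": 1e-6, "ms": 1e-3, "s": 1.0}

# Sample lines copied from the two runs above.
AS_SHIPPED = """
 bw for 10000 x 128 B : 195.7 Mbytes/s time was: 405.4 ms
 bw for 10000 x 256 B : 11.4 Mbytes/s time was: 14.0 s
 bw for 10000 x 512 B : 24.1 Mbytes/s time was: 13.2 s
 bw for 10000 x 1024 B : 53.6 Mbytes/s time was: 11.9 s
"""

FORCED_BRUCK = """
 bw for 10000 x 128 B : 198.3 Mbytes/s time was: 400.2 ms
 bw for 10000 x 256 B : 233.8 Mbytes/s time was: 679.0 ms
 bw for 10000 x 512 B : 281.5 Mbytes/s time was: 1.1 s
 bw for 10000 x 1024 B : 292.1 Mbytes/s time was: 2.2 s
"""


def parse_times(text):
    """Return {message_size_in_bytes: elapsed_seconds} for each 'bw for' line."""
    times = {}
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            size = int(match.group(2))
            times[size] = float(match.group(3)) * UNIT_TO_SECONDS[match.group(4)]
    return times


def slowdown(default_text, forced_text):
    """Return {size: default_time / forced_time} for sizes present in both runs."""
    default = parse_times(default_text)
    forced = parse_times(forced_text)
    return {
        size: default[size] / forced[size]
        for size in sorted(default)
        if size in forced and forced[size] > 0
    }


# Print one line per message size, smallest first.
for size, ratio in slowdown(AS_SHIPPED, FORCED_BRUCK).items():
    print(f"{size:5d} B: as-shipped takes {ratio:.1f}x the time of forced bruck")
```

On these lines the two runs are nearly identical at 128 B, and the gap opens
at 256 B, which is where the as-shipped timings jump from milliseconds to
seconds.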

I didn't follow up on this, thinking it had been solved...

/Peter