George Bosilca wrote:
> Thanks for spending the time benchmarking Open MPI and for sending us the
> feedback. We know we have some issues in the 1.0.2 version, more precisely
> with the collective communications. We just looked inside the CMAQ code, and
> there are a lot of Reduce and Allreduce calls. Since it looks like the
> collectives are used intensively, it's normal that 1.0.2a4 is slower than
> MPICH (I expect the same behaviour for both MPICH1 and MPICH2). The
> collectives are now fixed in the nightly builds, and we are working toward
> moving them into the next stable release. Until then, if you can redo the
> benchmark with one of the nightly builds, that would be very useful. I'm
> confident that the results will improve considerably.
Hi. You're a brave guy even looking at CMAQ. =)
Anyway, here are the times on a few runs I did with Open MPI 1.1a1r887.
Basically, what I'm seeing is that my jobs run OK when they're local to one
machine, but as soon as they're split up between multiple machines,
performance can vary:
4 cpu jobs:
8 cpu jobs:
And by the way, I was doing some maintenance work on my machines this
weekend, so absolutely everyone was kicked off. I'm positive nothing
else was interfering with these jobs.
Also, someone had asked what my setup was, so here it is, basically:
HP ProCurve 2848 gigabit Ethernet switch.
Tyan K8S boards, with dual Opteron 246s, 2 GB of RAM, and built-in
Broadcom gigabit Ethernet adapters.
Rocks 4.0, with the latest updates from Red Hat, running a
Network attached storage via NFS.
I don't think my setup is the problem, though, as these jobs have
been running fine for a while now with MPICH.