Make sure you don't use a "debug" build of Open MPI. If you build from
trunk, the build system detects it and enables debug by default, which
really kills performance. Configuring with --disable-debug removes all
those nasty printfs and extra checks from the critical path.
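A minimal build sketch with debug disabled (the install prefix is illustrative; adjust to your site):

```shell
# Configure Open MPI without the debug instrumentation that slows the
# critical path, then build and install to a user-local prefix.
./configure --prefix=$HOME/openmpi-opt --disable-debug
make -j4
make install
```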
You can also run a simple ping-pong test (NetPIPE is a good one) to
make sure the numbers are sane. Depending on your processor model,
shared-memory latency should be around 0.45 us, while bandwidth should
reach 9 Gbit/s for messages larger than the cache.
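One way to run such a ping-pong test, assuming NetPIPE's MPI driver (NPmpi) has been built against the Open MPI installation under test:

```shell
# Run NetPIPE's MPI ping-pong between two ranks on the same node.
# Small-message lines show latency; large-message lines show bandwidth.
mpirun -np 2 ./NPmpi -o np.out
```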
On Oct 8, 2008, at 17:09, Brian Dobbins wrote:
> Hi guys,
> [From Eugene Loh:]
> OpenMPI - 25 m 39 s.
> MPICH2 - 15 m 53 s.
> With regards to your issue, do you have any indication when you get
> that 25m39s timing if there is a grotesque amount of time being
> spent in MPI calls? Or, is the slowdown due to non-MPI portions?
> Just to add my two cents: if this job can be run on fewer than 8
> processors (ideally, even on just 1), then I'd recommend doing so.
> That is, run it with OpenMPI and with MPICH2 on 1, 2 and 4
> processors as well. If the single-processor jobs still give vastly
> different timings, then perhaps Eugene is on the right track and it
> comes down to various computational optimizations, and not so much
> the message-passing, that's making the difference. Timings from 2-
> and 4-process runs might be interesting as well, to see how this
> difference changes with process count.
> I've seen differences between various MPI libraries before, but
> nothing quite this severe either. If I get the time, maybe I'll try
> to set up Gromacs tonight -- I've got both MPICH2 and OpenMPI
> installed here and can try to duplicate the runs. Sangamesh, is
> this a standard benchmark case that anyone can download and run?
> - Brian
> Brian Dobbins
> Yale Engineering HPC
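Brian's suggested comparison at 1, 2, and 4 processes could be scripted along these lines (the benchmark binary name is hypothetical; point mpirun at each MPI's install in turn):

```shell
# Time the same benchmark under one MPI at several process counts;
# repeat with the other MPI's mpirun on the PATH to compare.
for np in 1 2 4; do
  /usr/bin/time -p mpirun -np $np ./gromacs_bench   # hypothetical binary
done
```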
* Dr. Aurélien Bouteiller
* Sr. Research Associate at Innovative Computing Laboratory
* University of Tennessee
* 1122 Volunteer Boulevard, suite 350
* Knoxville, TN 37996
* 865 974 6321