variation 1: hostname
real 0m35.391s
variation 2: hostname slots=4
real 0m45.698s
variation 3: hostname slots=2
real 0m38.761s
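To make the three variations concrete, here is a minimal sketch that prints what each hostfile would look like. The hostnames node01 and node02 are placeholders (the real node names are not given here), and write_hostfile is just a helper name made up for illustration. The only Open MPI-specific part is the "slots=N" suffix, which tells mpirun how many processes may be placed on that host.

```shell
#!/bin/sh
# Sketch only: node01/node02 are placeholder hostnames, not the real cluster.
# write_hostfile is a hypothetical helper, not part of Open MPI.
write_hostfile() {
    # $1 = suffix appended to every hostname ("", " slots=4", " slots=2")
    # remaining arguments = hostnames, one hostfile line each
    suffix="$1"
    shift
    for h in "$@"; do
        printf '%s%s\n' "$h" "$suffix"
    done
}

echo "# variation 1: hostname"
write_hostfile "" node01 node02
echo "# variation 2: hostname slots=4"
write_hostfile " slots=4" node01 node02
echo "# variation 3: hostname slots=2"
write_hostfile " slots=2" node01 node02
```

Redirecting one of the write_hostfile calls into a file gives a hostfile that can be passed to mpirun with --hostfile.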
It seems that I get the best performance with variation 1, where the hostfile contains only the hostname, and I execute the command:
"mpirun --hostfile hostfile -np 32 <my_application>". Shockingly, that takes about 23% less wall-clock time (35.4s versus 45.7s) than when I use the hostfile with a syntax of "hostname slots=4".
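As a rough check, the relative differences can be recomputed from the "real" times listed above. This is only a sketch: pct_less is a helper name made up for illustration, and it reports how much less wall-clock time the first run took relative to the second.

```shell
#!/bin/sh
# pct_less A B: percent less wall-clock time of run A relative to run B.
# Hypothetical helper for illustration; times are in seconds.
pct_less() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (b - a) / b * 100 }'
}

pct_less 35.391 45.698   # hostname only vs. "hostname slots=4" -> 22.6
pct_less 35.391 38.761   # hostname only vs. "hostname slots=2" -> 8.7
```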
I also tried a few variations of my mpirun command; here are the times:
straight mpirun with no mca options
real 0m45.698s
and....
"-mca mpi_yield_when_idle 0"
real 0m44.912s
and....
"-mca mtl mx -mca pml cm"
real 0m45.002s
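For anyone wanting to repeat the comparison, the three command lines can be generated along these lines. This is a sketch under assumptions: ./my_application stands in for the real binary, the hostfile is assumed to be named "hostfile" as in the command quoted earlier, and run_variant is a made-up helper. By default it only prints the commands; setting RUN=1 runs them under "time".

```shell
#!/bin/bash
# APP is a placeholder; the actual application is not named in this post.
APP="${APP:-./my_application}"

# run_variant [extra mpirun options...]
# Hypothetical helper: prints the command, or times it when RUN=1.
run_variant() {
    if [ "${RUN:-0}" = "1" ]; then
        time mpirun --hostfile hostfile -np 32 "$@" "$APP"
    else
        echo mpirun --hostfile hostfile -np 32 "$@" "$APP"
    fi
}

run_variant                              # straight mpirun, no mca options
run_variant -mca mpi_yield_when_idle 0   # second variation
run_variant -mca mtl mx -mca pml cm      # third variation
```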
Warner Yuen
Scientific Computing Consultant
Apple Computer
email: wyuen@apple.com
Tel: 408.718.2859
Fax: 408.715.0133