The reported results here sound strange to me - any suggestions on what could be going on? This is a 16-core Linux system, with the BTLs set to openib,sm,self. The 1.3b version he is using is a little out of date, so I will update it to the latest state of the branch on Monday and ask him to retry.
In the meantime, has anyone seen behavior like this elsewhere?
PS. Just to clarify, the blue line in the graph is 1.3b. And no - he didn't tell me what the scales mean (I've asked for more info).
Begin forwarded message:
Date: December 12, 2008 3:28:30 PM MST
Subject: Re: 1.3 beta on lobo collectives
I looked at allreduce on 1.3b, 1.2.8, and mvapich 1.1.
With 1 process per node, all are about the same.
With 16 per node, openmpi is much worse. I imagine that mvapich does an
intra-node reduction in shared memory first, then goes out of node.
Does openmpi have a particular tuned version we should be using? 1.3
was supposed to have better collectives, as I recall.
(allreduce on a single MPI_DOUBLE)
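
To make the guess about mvapich concrete, here is a rough model of why reducing inside the node first could matter at 16 per node. It is only a sketch, not the algorithm either library actually uses: it assumes a power-of-two number of ranks and nodes, block placement (rank r sits on node r // ppn), and plain recursive doubling. The function names are made up for illustration. It counts how many messages cross the network for a flat allreduce versus a two-level one.

```python
def flat_allreduce(values, ppn):
    """Recursive-doubling sum over all ranks; returns (results, inter-node messages).

    Assumption: rank r lives on node r // ppn (block placement).
    """
    n = len(values)
    assert n & (n - 1) == 0, "sketch assumes a power-of-two rank count"
    vals = list(values)
    inter_node_msgs = 0
    dist = 1
    while dist < n:
        new_vals = list(vals)
        for r in range(n):
            partner = r ^ dist
            new_vals[r] = vals[r] + vals[partner]
            if r // ppn != partner // ppn:
                # The message from partner to r has to cross the network.
                inter_node_msgs += 1
        vals = new_vals
        dist *= 2
    return vals, inter_node_msgs


def hierarchical_allreduce(values, ppn):
    """Two-level sum: reduce inside each node, allreduce among one leader
    per node, then hand the result back to the local ranks."""
    n = len(values)
    assert n % ppn == 0, "every node must hold exactly ppn ranks"
    # Step 1: intra-node reduction (shared memory, no network traffic).
    node_sums = [sum(values[i:i + ppn]) for i in range(0, n, ppn)]
    # Step 2: leaders only, one rank per node, so every message is inter-node.
    leader_vals, inter_node_msgs = flat_allreduce(node_sums, ppn=1)
    # Step 3: intra-node broadcast of the final value.
    result = [leader_vals[r // ppn] for r in range(n)]
    return result, inter_node_msgs


if __name__ == "__main__":
    # Hypothetical layout: 4 nodes x 16 ranks, one double per rank.
    vals = [float(r) for r in range(64)]
    _, flat_msgs = flat_allreduce(vals, ppn=16)
    _, hier_msgs = hierarchical_allreduce(vals, ppn=16)
    print("inter-node messages, flat:", flat_msgs)
    print("inter-node messages, hierarchical:", hier_msgs)
```

In this model the flat version sends ppn times as many messages over the network as the two-level one, which would be consistent with seeing no difference at 1 per node and a large one at 16 per node. Whether that is what is actually happening on lobo is something only the real runs can show.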