Your code is obviously doing much more than just launching and wiring up, so it is difficult to assess the difference in speed between 1.6.5 and 1.7.3. My guess is that it has to do with changes in the MPI transport layer and nothing to do with whether PMI is used.
Likewise, I can't imagine any differences in wireup method accounting for the 500 seconds in execution time difference between the two versions when using the same launch method. I launch more than 10 nodes in far less time than that, so again I expect this has to do with something in the MPI layer.
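For reference, here is a quick sketch of the arithmetic behind those gaps, using only the wall-clock times from the quoted message below. The dictionary keys and the helper name gap are just labels I made up for illustration.

```python
# Wall-clock times in seconds, copied from the quoted results below.
runs = {
    "1.6.5 mpirun": 7842,
    "1.7.3a1r29103 mpirun": 8341,
    "1.7.3a1r29103 srun": 7476,
}


def gap(slow, fast):
    """Return (seconds saved, percent saved relative to the slower run)."""
    saved = slow - fast
    return saved, 100.0 * saved / slow


# Same launch method (mpirun), different Open MPI versions.
version_gap = gap(runs["1.7.3a1r29103 mpirun"], runs["1.6.5 mpirun"])
# Same Open MPI version, different launch methods.
launcher_gap = gap(runs["1.7.3a1r29103 mpirun"], runs["1.7.3a1r29103 srun"])

print("1.7.3 mpirun vs 1.6.5 mpirun: %d s (%.1f%%)" % version_gap)
print("1.7.3 mpirun vs 1.7.3 srun:   %d s (%.1f%%)" % launcher_gap)
```

Both gaps are hundreds of seconds, which is far more than launch and wireup could plausibly cost at this scale.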
The real question is why you see so much difference between launching via mpirun vs srun. Like I said, the launch and wireup times at such small scales are negligible, so somehow you are winding up selecting different MPI transport options. You can test this by just running "hello world" instead - I'll bet the mpirun vs srun time differences are a second or two at most.
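A minimal sketch of that check, assuming you already have an MPI "hello world" binary built against the Open MPI under test. The helper time_launch and the ./hello path are hypothetical names for illustration, and the process count is only an example matching the 64-core runs.

```python
import subprocess
import time


def time_launch(cmd, repeats=3):
    """Run cmd `repeats` times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises if the launcher or the program exits non-zero.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


# Hypothetical usage inside the same Slurm allocation; ./hello is an
# assumed MPI "hello world" binary, not something from this thread:
#   t_mpirun = time_launch(["mpirun", "-np", "64", "./hello"])
#   t_srun = time_launch(["srun", "-n", "64", "./hello"])
#   print("mpirun %.2f s, srun %.2f s" % (t_mpirun, t_srun))
```

If the two numbers come out within a second or two of each other, the launcher itself is not where the NAMD time is going.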
Perhaps Jeff or someone else can suggest some debug flags you could use to understand these differences?
On Sep 3, 2013, at 6:13 PM, Christopher Samuel <samuel_at_[hidden]> wrote:
> On 03/09/13 10:56, Ralph Castain wrote:
>> Yeah - --with-pmi=<path-to-pmi.h>
> Actually I found that just --with-pmi=/usr/local/slurm/latest worked. :-)
> I've got some initial numbers for 64 cores, as I mentioned the system
> I found this on initially is so busy at the moment I won't be able to
> run anything bigger for a while, so I'm going to move my testing to
> another system which is a bit quieter, but slower (it's Nehalem vs
> All the below tests are with the same NAMD 2.9 binary and within the
> same Slurm job so it runs on the same cores each time. It's nice to
> find that C code at least seems to be backwardly compatible!
> 64 cores over 18 nodes:
> Open-MPI 1.6.5 with mpirun - 7842 seconds
> Open-MPI 1.7.3a1r29103 with srun - 7522 seconds
> so that's about a 4% speedup.
> 64 cores over 10 nodes:
> Open-MPI 1.7.3a1r29103 with mpirun - 8341 seconds
> Open-MPI 1.7.3a1r29103 with srun - 7476 seconds
> So that's about 11% faster, and the mpirun speed has decreased though
> of course that's built using PMI so perhaps that's the cause?
> --
> Christopher Samuel Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: samuel_at_[hidden] Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/ http://twitter.com/vlsci