
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested
From: Christopher Samuel (samuel_at_[hidden])
Date: 2014-05-07 23:40:31



On 08/05/14 12:54, Ralph Castain wrote:

> I think there was one 2.6.x that was borked, and definitely
> problems in the 14.03.x line. Can't pinpoint it for you, though.

No worries, thanks.

> Sounds good. I'm going to have to dig deeper into those numbers,
> though, as they don't entirely add up to me. Once the job gets
> launched, the launch method itself should have no bearing on
> computational speed - IF all things are equal. In other words, if
> the process layout is the same, and the binding pattern is the
> same, then computational speed should be roughly equivalent
> regardless of how the procs were started.

Not sure if it's significant, but when mpirun was launching the
processes it used srun to start orted, which then started the MPI
ranks, whereas with PMI/PMI2 srun appeared to start the ranks directly.
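For illustration, the two launch paths look roughly like this (a sketch, not our actual job scripts; the binary name and rank count of 70 nodes x 16 ppn = 1120 are just placeholders):

```shell
# Path 1: mpirun under Slurm - mpirun uses srun to start one orted
# daemon per node, and each orted then forks/execs its local MPI ranks.
mpirun -np 1120 ./my_mpi_app

# Path 2: direct srun launch with PMI2 - srun itself starts every MPI
# rank on the compute nodes, with no orted daemons in between.
srun --mpi=pmi2 -n 1120 ./my_mpi_app
```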

> My guess is that your data might indicate a difference in the
> layout and/or binding pattern as opposed to PMI2 vs mpirun. At the
> scale you mention later in the thread (only 70 nodes x 16 ppn), the
> difference in launch timing would be zilch. So I'm betting you
> would find (upon further exploration) that (a) you might not have
> been binding processes when launching by mpirun, since we didn't
> bind by default until the 1.8 series, but were binding under direct
> srun launch, and (b) your process mapping would quite likely be
> different as we default to byslot mapping, and I believe srun
> defaults to bynode?
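If that's the explanation, a fairer comparison would pin down the mapping and binding explicitly on both sides rather than relying on either tool's defaults. A sketch, assuming the 1.6-era mpirun flag spellings and Slurm's --distribution/--cpu_bind options:

```shell
# Force bynode mapping and core binding under mpirun
# (1.6-series option names; 1.8 uses --map-by node --bind-to core).
mpirun -np 1120 -bynode -bind-to-core ./my_mpi_app

# Match it on the srun side: cyclic distribution is the round-robin
# (bynode-like) placement, plus core binding.
srun --mpi=pmi2 -n 1120 --distribution=cyclic --cpu_bind=cores ./my_mpi_app
```

With identical mapping and binding forced on both, any remaining speed difference would genuinely be down to the launch path rather than process layout.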

FWIW, all our environment modules for OMPI set:

setenv OMPI_MCA_orte_process_binding core

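To double-check what binding actually takes effect under each launcher, the modules could also ask OMPI to report it (a sketch; orte_report_bindings is the 1.6-era parameter name, if I remember right):

```shell
# In the environment module, alongside the existing binding setting:
setenv OMPI_MCA_orte_process_binding core
setenv OMPI_MCA_orte_report_bindings 1

# Each run then prints the per-rank binding at startup, so any
# mismatch between the mpirun and srun cases shows up immediately.
```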
> Might be worth another comparison run when someone has time.

Yeah, I'll try to queue up some more tests. Unfortunately the cluster
we tested on then is flat out at the moment, but I'll try to sneak in
a 64-core job using identical configs and compare mpirun, srun on its
own, and srun with PMI2.

All the best,
Chris
--
 Christopher Samuel Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: samuel_at_[hidden] Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/ http://twitter.com/vlsci
