
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slower than with mpirun
From: Christopher Samuel (samuel_at_[hidden])
Date: 2013-07-23 19:28:18



On 23/07/13 19:34, Joshua Ladd wrote:

> Hi, Chris

Hi Joshua,

I've quoted you in full as I don't think your message made it through
to the slurm-dev list (at least I've not received it from there yet).

> Funny you should mention this now. We identified and diagnosed the
> issue some time ago as a combination of SLURM's PMI1
> implementation and some of, what I'll call, OMPI's topology
> requirements (probably not the right word.) Here's what is
> happening, in a nutshell, when you launch with srun:
>
> 1. Each process pushes its endpoint data up to the PMI "cloud" via
> PMI put (I think it's about five or six puts; bottom line, O(1).)
> 2. Then executes a PMI commit and a PMI barrier to ensure all other
> processes have finished committing their data to the "cloud".
> 3. Subsequent to this, each process executes O(N) PMI gets (N is the
> number of procs in the job) in order to get all of the endpoint
> data for every process, regardless of whether or not the process
> communicates with that endpoint.
>
> "We" (MLNX et al.) undertook an in-depth scaling study of this and
> identified several poorly scaling pieces with the worst offenders
> being:
>
> 1. PMI Barrier scales worse than linear.
> 2. At scale, the PMI get phase starts to look quadratic.
>
> The proposed solution that "we" (OMPI + SLURM) have come up with is
> to modify OMPI to support PMI2 and to use SLURM 2.6 which has
> support for PMI2 and is (allegedly) much more scalable than PMI1.
> Several folks in the combined communities are working hard, as we
> speak, trying to get this functional to see if it indeed makes a
> difference. Stay tuned, Chris. Hopefully we will have some data by
> the end of the week.
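
[The wire-up pattern described above can be sketched as a back-of-envelope cost model. This is a minimal illustration, not the real PMI API; the function name and the puts-per-process count (five or six, per the message above) are assumptions:]

```python
# Hypothetical cost model of the PMI1 wire-up exchange described above:
# each rank does O(1) puts of its own endpoint data, then O(N) gets to
# fetch every other rank's data, so job-wide get traffic grows as O(N^2).

def pmi1_total_ops(nprocs, puts_per_proc=6):
    """Return (total_puts, total_gets) for a full-job endpoint exchange."""
    puts = nprocs * puts_per_proc   # O(N) total: constant puts per rank
    gets = nprocs * nprocs          # O(N^2) total: every rank gets all N entries
    return puts, gets

# Going from 1,000 to 10,000 ranks multiplies the get phase by 100x
# while the put phase only grows 10x:
small_puts, small_gets = pmi1_total_ops(1_000)
big_puts, big_gets = pmi1_total_ops(10_000)
assert big_puts // small_puts == 10     # linear growth in puts
assert big_gets // small_gets == 100    # quadratic growth in gets
```

[This is why the get phase "starts to look quadratic" at scale even though each individual rank only does linear work.]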

Wonderful, great to know that what we're seeing is actually real and
not just pilot error on our part! We're happy enough to tell users
to keep on using mpirun as they will be used to from our other Intel
systems and to only use srun if the code requires it (one or two
commercial apps that use Intel MPI).

Can I ask: if the PMI2 ideas work out, is that likely to get backported
to OMPI 1.6.x?

All the best,
Chris
--
 Christopher Samuel Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: samuel_at_[hidden] Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/ http://twitter.com/vlsci
