Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slowed than with mpirun
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-07-23 19:42:47


Not to the 1.6 series, but it is in the about-to-be-released 1.7.3 and will be there from that point onwards. We are still waiting to see whether it resolves the difference.

On Jul 23, 2013, at 4:28 PM, Christopher Samuel <samuel_at_[hidden]> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 23/07/13 19:34, Joshua Ladd wrote:
>
>> Hi, Chris
>
> Hi Joshua,
>
> I've quoted you in full as I don't think your message made it through
> to the slurm-dev list (at least I've not received it from there yet).
>
>> Funny you should mention this now. We identified and diagnosed the
>> issue some time ago as a combination of SLURM's PMI1
>> implementation and some of, what I'll call, OMPI's topology
>> requirements (probably not the right word.) Here's what is
>> happening, in a nutshell, when you launch with srun:
>>
>> 1. Each process pushes his endpoint data up to the PMI "cloud" via
>>    PMI put (I think it's about five or six puts, bottom line, O(1).)
>> 2. Then executes a PMI commit and PMI barrier to ensure all other
>>    processes have finished committing their data to the "cloud".
>> 3. Subsequent to this, each process executes O(N) (N is the number of
>>    procs in the job) PMI gets in order to get all of the endpoint
>>    data for every process regardless of whether or not the process
>>    communicates with that endpoint.
>>
>> "We" (MLNX et al.) undertook an in-depth scaling study of this and
>> identified several poorly scaling pieces with the worst offenders
>> being:
>>
>> 1. PMI Barrier scales worse than linear.
>> 2. At scale, the PMI get phase starts to look quadratic.
>>
>> The proposed solution that "we" (OMPI + SLURM) have come up with is
>> to modify OMPI to support PMI2 and to use SLURM 2.6 which has
>> support for PMI2 and is (allegedly) much more scalable than PMI1.
>> Several folks in the combined communities are working hard, as we
>> speak, trying to get this functional to see if it indeed makes a
>> difference. Stay tuned, Chris. Hopefully we will have some data by
>> the end of the week.
>
> Wonderful, great to know that what we're seeing is actually real and
> not just pilot error on our part! We're happy enough to tell users
> to keep on using mpirun as they will be used to from our other Intel
> systems and to only use srun if the code requires it (one or two
> commercial apps that use Intel MPI).
>
> Can I ask, if the PMI2 ideas work out is that likely to get backported
> to OMPI 1.6.x ?
>
> All the best,
> Chris
> - --
> Christopher Samuel Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: samuel_at_[hidden] Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/ http://twitter.com/vlsci
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.11 (GNU/Linux)
> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>
> iEYEARECAAYFAlHvEZIACgkQO2KABBYQAh9QogCeMuR/E4oPivdsX3r671+z7EWd
> Hv8An1N8csHMby7bouT/gC07i/J2PW+i
> =gZsB
> -----END PGP SIGNATURE-----
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel