Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested
From: Ralph Castain (rhc_at_[hidden])
Date: 2014-05-07 00:49:36


I should have looked more closely at the numbers you posted, Chris - those include the time for MPI wireup. So what you are seeing is that mpirun is much more efficient than PMI at exchanging the MPI endpoint info. I suspect that PMI2 is not much better, as the primary reason for the difference is that mpirun sends blobs, while PMI requires that everything be encoded into strings and sent in little pieces.

Hence, mpirun can exchange the endpoint info (the dreaded "modex" operation) much faster, and MPI_Init completes faster. The rest of the computation should be the same, so long-running compute apps will see the difference narrow considerably.
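
To make the blob-versus-strings point concrete, here is a minimal Python sketch. It is not Open MPI or PMI code: it assumes a key-value store that accepts only text values of limited length, and the function names, the base64 encoding, and the 256-character limit are all illustrative assumptions.

```python
import base64


def encode_for_kvs(key, blob, max_value_len):
    """Turn a binary blob into (key, value) string pairs.

    Hypothetical helper: the blob is base64-encoded so it survives a
    text-only store, then cut into pieces no longer than max_value_len.
    """
    text = base64.b64encode(blob).decode("ascii")
    return [
        (f"{key}-{i}", text[off:off + max_value_len])
        for i, off in enumerate(range(0, len(text), max_value_len))
    ]


def decode_from_kvs(pairs):
    """Reassemble the original blob from the ordered string pieces."""
    text = "".join(value for _, value in pairs)
    return base64.b64decode(text)


# Stand-in for one process's endpoint info (1024 bytes of arbitrary data).
endpoint_blob = bytes(range(256)) * 4

# Blob path: one message carrying the raw bytes.
blob_messages = 1
blob_payload = len(endpoint_blob)

# String path: encode, split, and publish each piece separately.
pairs = encode_for_kvs("rank0-modex", endpoint_blob, max_value_len=256)
string_messages = len(pairs)
string_payload = sum(len(value) for _, value in pairs)

print(f"blob:    {blob_messages} message,  {blob_payload} bytes")
print(f"strings: {string_messages} messages, {string_payload} characters")
```

In this toy setup the string path needs several store operations and carries about a third more data per process than the single blob. Real PMI limits and encodings differ, so treat the numbers as illustrative only.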

HTH
Ralph

On May 6, 2014, at 9:45 PM, Ralph Castain <rhc_at_[hidden]> wrote:

> Ah, interesting - my comments were with respect to startup time (specifically, MPI wireup)
>
> On May 6, 2014, at 8:49 PM, Christopher Samuel <samuel_at_[hidden]> wrote:
>
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> On 07/05/14 13:37, Moody, Adam T. wrote:
>>
>>> Hi Chris,
>>
>> Hi Adam,
>>
>>> I'm interested in SLURM / OpenMPI startup numbers, but I haven't
>>> done this testing myself. We're stuck with an older version of
>>> SLURM for various internal reasons, and I'm wondering whether it's
>>> worth the effort to back port the PMI2 support. Can you share some
>>> of the differences in times at different scales?
>>
>> We've not looked at startup times I'm afraid, this was time to
>> solution. We noticed it with Slurm when we first started using it on
>> x86-64 for our NAMD tests (this is from a posting to the list last year
>> when I raised the issue and was told PMI2 would be the solution):
>>
>>> Slurm 2.6.0, RHEL 6.4 (latest kernel), FDR IB.
>>>
>>> Here are some timings as reported as the WallClock time by NAMD
>>> itself (so not including startup/tear down overhead from Slurm).
>>>
>>> srun:
>>>
>>> run1/slurm-93744.out:WallClock: 695.079773 CPUTime: 695.079773
>>> run4/slurm-94011.out:WallClock: 723.907959 CPUTime: 723.907959
>>> run5/slurm-94013.out:WallClock: 726.156799 CPUTime: 726.156799
>>> run6/slurm-94017.out:WallClock: 724.828918 CPUTime: 724.828918
>>>
>>> Average of 692 seconds
>>>
>>> mpirun:
>>>
>>> run2/slurm-93746.out:WallClock: 559.311035 CPUTime: 559.311035
>>> run3/slurm-93910.out:WallClock: 544.116333 CPUTime: 544.116333
>>> run7/slurm-94019.out:WallClock: 586.072693 CPUTime: 586.072693
>>>
>>> Average of 563 seconds.
>>>
>>> So that's about 23% slower.
>>>
>>> Everything is identical (they're all symlinks to the same golden
>>> master) *except* for the srun / mpirun which is modified by
>>> copying the batch script and substituting mpirun for srun.
>>
>>
>>
>> - --
>> Christopher Samuel Senior Systems Administrator
>> VLSCI - Victorian Life Sciences Computation Initiative
>> Email: samuel_at_[hidden] Phone: +61 (0)3 903 55545
>> http://www.vlsci.org.au/ http://twitter.com/vlsci
>>
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1.4.14 (GNU/Linux)
>> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>>
>> iEYEARECAAYFAlNprUUACgkQO2KABBYQAh9rLACfcZc4HR/u6G0bJejM3C/my7Nw
>> 8b4AnRasOMvKZjpjpyKkbplc6/Iq9qBK
>> =pqH9
>> -----END PGP SIGNATURE-----
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/05/14694.php
>
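
As a quick check on the arithmetic in the quoted timings, this small Python sketch recomputes the averages from the WallClock values exactly as listed above. The variable names are illustrative. On the listed values the srun mean comes out near 717 s rather than 692 s. The quoted 692 s and 23% are consistent with each other, so the quoted average may reflect runs not shown in the excerpt.

```python
# WallClock seconds copied from the quoted NAMD output.
srun_times = [695.079773, 723.907959, 726.156799, 724.828918]
mpirun_times = [559.311035, 544.116333, 586.072693]

srun_mean = sum(srun_times) / len(srun_times)
mpirun_mean = sum(mpirun_times) / len(mpirun_times)

# How much slower srun is relative to mpirun, from the listed runs.
listed_slowdown_pct = (srun_mean / mpirun_mean - 1.0) * 100.0

# The same ratio using the averages stated in the message (692 and 563).
quoted_slowdown_pct = (692.0 / 563.0 - 1.0) * 100.0

print(f"srun mean:   {srun_mean:.2f} s")
print(f"mpirun mean: {mpirun_mean:.2f} s")
print(f"slowdown from listed runs:     {listed_slowdown_pct:.1f}%")
print(f"slowdown from quoted averages: {quoted_slowdown_pct:.1f}%")
```

Either way the gap is in the 20-30% range, which is the point the quoted message was making.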