
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested
From: Ralph Castain (rhc_at_[hidden])
Date: 2014-05-07 00:45:31


Ah, interesting - my comments were with respect to startup time (specifically, MPI wireup).
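
For anyone wanting to measure the two things separately: below is a
minimal sketch (not from this thread) of a C program that times
MPI_Init plus a first barrier apart from the application body, so the
launcher/wireup cost of srun vs mpirun (and PMI-1 vs PMI-2) can be
compared directly. The now() helper and the printed label are
illustrative, not part of any existing benchmark.

    #include <mpi.h>
    #include <stdio.h>
    #include <time.h>

    /* Wall-clock timer usable before MPI_Init (MPI_Wtime is not). */
    static double now(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    int main(int argc, char **argv)
    {
        double t0 = now();
        MPI_Init(&argc, &argv);
        MPI_Barrier(MPI_COMM_WORLD);   /* all ranks wired up past here */
        double t1 = now();

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0)
            printf("startup + wireup: %.3f s\n", t1 - t0);

        /* ... application work (the "WallClock" part NAMD reports) ... */

        MPI_Finalize();
        return 0;
    }

Launching the same binary both ways (srun ./a.out and mpirun ./a.out)
isolates the startup difference from the time-to-solution difference
discussed below.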

On May 6, 2014, at 8:49 PM, Christopher Samuel <samuel_at_[hidden]> wrote:

>
> On 07/05/14 13:37, Moody, Adam T. wrote:
>
>> Hi Chris,
>
> Hi Adam,
>
>> I'm interested in Slurm / Open MPI startup numbers, but I haven't
>> done this testing myself. We're stuck with an older version of
>> Slurm for various internal reasons, and I'm wondering whether it's
>> worth the effort to backport the PMI2 support. Can you share some
>> of the differences in times at different scales?
>
> We've not looked at startup times, I'm afraid; this was time to
> solution. We noticed it with Slurm when we first started using it on
> x86-64 for our NAMD tests (this is from a posting to the list last
> year, when I raised the issue and was told PMI2 would be the solution):
>
>> Slurm 2.6.0, RHEL 6.4 (latest kernel), FDR IB.
>>
>> Here are some timings as reported as the WallClock time by NAMD
>> itself (so not including startup/tear down overhead from Slurm).
>>
>> srun:
>>
>> run1/slurm-93744.out:WallClock: 695.079773 CPUTime: 695.079773
>> run4/slurm-94011.out:WallClock: 723.907959 CPUTime: 723.907959
>> run5/slurm-94013.out:WallClock: 726.156799 CPUTime: 726.156799
>> run6/slurm-94017.out:WallClock: 724.828918 CPUTime: 724.828918
>>
>> Average of 717 seconds.
>>
>> mpirun:
>>
>> run2/slurm-93746.out:WallClock: 559.311035 CPUTime: 559.311035
>> run3/slurm-93910.out:WallClock: 544.116333 CPUTime: 544.116333
>> run7/slurm-94019.out:WallClock: 586.072693 CPUTime: 586.072693
>>
>> Average of 563 seconds.
>>
>> So srun is about 27% slower (717 vs. 563 seconds).
>>
>> Everything is identical (they're all symlinks to the same golden
>> master) *except* for the launcher: the batch script is copied and
>> mpirun substituted for srun.
>
>
>
> --
> Christopher Samuel Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: samuel_at_[hidden] Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/ http://twitter.com/vlsci
>