Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested
From: Ralph Castain (rhc_at_[hidden])
Date: 2014-05-07 00:45:31


Ah, interesting - my comments were with respect to startup time (specifically, MPI wireup).

On May 6, 2014, at 8:49 PM, Christopher Samuel <samuel_at_[hidden]> wrote:

> On 07/05/14 13:37, Moody, Adam T. wrote:
>
>> Hi Chris,
>
> Hi Adam,
>
>> I'm interested in SLURM / OpenMPI startup numbers, but I haven't
>> done this testing myself. We're stuck with an older version of
>> SLURM for various internal reasons, and I'm wondering whether it's
>> worth the effort to backport the PMI2 support. Can you share some
>> of the differences in times at different scales?
>
> We've not looked at startup times I'm afraid; this was time to
> solution. We noticed it with Slurm when we first started using it
> on x86-64 for our NAMD tests (this is from a posting to the list
> last year, when I raised the issue and was told PMI2 would be the
> solution):
>
>> Slurm 2.6.0, RHEL 6.4 (latest kernel), FDR IB.
>>
>> Here are some timings, as reported by NAMD itself as the WallClock
>> time (so not including startup/teardown overhead from Slurm).
>>
>> srun:
>>
>> run1/slurm-93744.out:WallClock: 695.079773 CPUTime: 695.079773
>> run4/slurm-94011.out:WallClock: 723.907959 CPUTime: 723.907959
>> run5/slurm-94013.out:WallClock: 726.156799 CPUTime: 726.156799
>> run6/slurm-94017.out:WallClock: 724.828918 CPUTime: 724.828918
>>
>> Average of about 717 seconds.
>>
>> mpirun:
>>
>> run2/slurm-93746.out:WallClock: 559.311035 CPUTime: 559.311035
>> run3/slurm-93910.out:WallClock: 544.116333 CPUTime: 544.116333
>> run7/slurm-94019.out:WallClock: 586.072693 CPUTime: 586.072693
>>
>> Average of 563 seconds.
>>
>> So srun is about 27% slower than mpirun.
>>
>> Everything is identical (they're all symlinks to the same golden
>> master) *except* for the launcher: the mpirun runs use a copy of
>> the batch script with srun replaced by mpirun (see the sketch
>> after this message).
>
> --
> Christopher Samuel - Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: samuel_at_[hidden]   Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/   http://twitter.com/vlsci
>
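
Below is a minimal sketch of the kind of batch script being compared in the quoted message, assuming an MPI-linked NAMD binary named namd2, an input file named benchmark.namd, and a Slurm build with the PMI2 plugin available; the binary name, input file, and resource requests are illustrative assumptions, not details taken from the original scripts.

    #!/bin/bash
    #SBATCH --job-name=namd-bench    # illustrative job name
    #SBATCH --ntasks=256             # illustrative rank count
    #SBATCH --time=02:00:00

    # "srun" variant: launch the MPI ranks directly through Slurm,
    # asking for the PMI2 plugin to handle process wireup.
    srun --mpi=pmi2 namd2 benchmark.namd

    # "mpirun" variant: the same script with the launch line above
    # replaced by Open MPI's own launcher, as described in the quoted
    # message.
    # mpirun namd2 benchmark.namd

Submitted with sbatch, the two variants differ only in that single launch line, which is what makes the WallClock figures above a like-for-like comparison of the two launch paths.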