
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested
From: Ralph Castain (rhc_at_[hidden])
Date: 2014-05-06 23:48:30


FWIW: we see varying reports about the scalability of Slurm, especially at large cluster sizes. Last I saw/tested, there is a quadratic term that begins to dominate above 2k nodes. Others swear it is better <shrug>. Guess I'd be cautious and definitely test things before investing in a move - I'm not convinced.
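
To illustrate where a quadratic term can come from (a generic illustration, not a claim about where Slurm's time actually goes): in a full modex over PMI-1, every process puts one key and then gets every other process's key, so an N-process job issues O(N) puts but O(N^2) gets in total against the PMI server(s). A rough sketch against Slurm's stock pmi.h (the key/value names below are made up, and error checking is omitted):

/*
 * Hedged sketch of the PMI-1 key-value exchange a launcher typically
 * performs at startup.  Each of N processes publishes one key and then
 * reads N-1 keys, so the job as a whole does O(N) puts and O(N^2) gets.
 * Build roughly like: gcc demo.c -I$SLURM/include -L$SLURM/lib -lpmi
 */
#include <stdio.h>
#include <pmi.h>                       /* PMI-1 interface shipped with Slurm */

int main(void)
{
    int spawned, size, rank;
    char kvsname[256], key[64], value[256];

    PMI_Init(&spawned);                /* connect to the srun-provided PMI server */
    PMI_Get_size(&size);
    PMI_Get_rank(&rank);
    PMI_KVS_Get_my_name(kvsname, sizeof(kvsname));

    /* Publish this rank's (illustrative) endpoint information. */
    snprintf(key, sizeof(key), "endpoint-%d", rank);
    snprintf(value, sizeof(value), "host-data-for-rank-%d", rank);
    PMI_KVS_Put(kvsname, key, value);
    PMI_KVS_Commit(kvsname);
    PMI_Barrier();                     /* everyone's data is now visible */

    /* Full modex: every rank fetches every other rank's entry. */
    for (int peer = 0; peer < size; peer++) {
        if (peer == rank) continue;
        snprintf(key, sizeof(key), "endpoint-%d", peer);
        PMI_KVS_Get(kvsname, key, value, sizeof(value));
        /* ... use value to set up a connection to 'peer' ... */
    }

    PMI_Finalize();
    return 0;
}

PMI-2 swaps the put/commit/barrier/get calls for PMI2_KVS_Put/Fence/Get, but the exchange pattern is essentially the same, which is why the version you pick matters less than how well the server side handles the load.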

On May 6, 2014, at 8:37 PM, Moody, Adam T. <moody20_at_[hidden]> wrote:

> Hi Chris,
> I'm interested in SLURM / Open MPI startup numbers, but I haven't done this testing myself. We're stuck with an older version of SLURM for various internal reasons, and I'm wondering whether it's worth the effort to backport the PMI2 support. Can you share some of the differences in startup times at different scales?
> Thanks,
> -Adam
> ________________________________________
> From: devel [devel-bounces_at_[hidden]] on behalf of Christopher Samuel [samuel_at_[hidden]]
> Sent: Tuesday, May 06, 2014 8:32 PM
> To: devel_at_[hidden]
> Subject: Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested
>
> On 07/05/14 12:53, Ralph Castain wrote:
>
>> We have been seeing a lot of problems with the Slurm PMI-2 support
>> (not in OMPI - it's the code in Slurm that is having problems). At
>> this time, I'm unaware of any advantage in using PMI-2 over PMI-1
>> in Slurm - the scaling is equally poor, and PMI-2 does not support
>> any additional functionality.
>>
>> I know that Cray PMI-2 has a definite advantage, so I'm proposing
>> that we turn PMI-2 "off" when under Slurm unless the user
>> specifically requests we use it.
>
> Our local testing has shown that PMI-2 in OMPI 1.7.x gives a massive
> improvement in startup scaling when launching jobs with srun, compared
> with srun and OMPI 1.6.x, and now that OMPI 1.8.x is out we're planning
> to move to PMI-2 with OMPI and srun.
>
> Using mpirun gives good performance with OMPI 1.6.x, but Slurm then
> gets all of its memory statistics wrong, and if you run with
> CR_Core_Memory in Slurm there is a very high risk your job will be
> killed incorrectly.
>
> All the best,
> Chris
> --
> Christopher Samuel Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: samuel_at_[hidden] Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/ http://twitter.com/vlsci
>