I'm interested in SLURM / OpenMPI startup numbers, but I haven't done this testing myself. We're stuck with an older version of SLURM for various internal reasons, and I'm wondering whether it's worth the effort to backport the PMI2 support. Can you share some of the differences in startup times at different scales?
From: devel [devel-bounces_at_[hidden]] on behalf of Christopher Samuel [samuel_at_[hidden]]
Sent: Tuesday, May 06, 2014 8:32 PM
Subject: Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested
On 07/05/14 12:53, Ralph Castain wrote:
> We have been seeing a lot of problems with the Slurm PMI-2 support
> (not in OMPI - it's the code in Slurm that is having problems). At
> this time, I'm unaware of any advantage in using PMI-2 over PMI-1
> in Slurm - the scaling is equally poor, and PMI-2 does not support
> any additional functionality.
> I know that Cray PMI-2 has a definite advantage, so I'm proposing
> that we turn PMI-2 "off" when under Slurm unless the user
> specifically requests we use it.
Our local testing has shown that PMI-2 in OMPI 1.7.x gives a massive
improvement in scaling when starting jobs with srun, compared with
using srun with OMPI 1.6.x. Now that OMPI 1.8.x is out, we're planning
to move to PMI-2 with OMPI and srun.
Using mpirun gives good performance with OMPI 1.6.x, but Slurm then
gets all its memory stats wrong, and if you run with CR_Core_Memory in
Slurm there is a very high risk that your job will be killed incorrectly.
All the best,
Christopher Samuel Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel_at_[hidden] Phone: +61 (0)3 903 55545
Link to this post: http://www.open-mpi.org/community/lists/devel/2014/05/14691.php