Sorry for the delayed response. After much effort, the Open MPI 1.7 branch now supports PMI2 (in general, not just for ALPS) and has been tested and evaluated at small-ish scale (up to 512 ranks) with SLURM 2.6. We need to test this at larger scale and plan to do so in the coming weeks, but what we have observed thus far is the following:
1. The KVS Fence operation appears to scale worse than linearly. This issue resides solely on the SLURM side. Perhaps a better algorithm could be implemented; we have discussed recursive doubling and Bruck's algorithm as alternatives.
2. There are still O(N) calls to PMI2_get at the OMPI/ORTE level that don't appear to scale particularly well. Circumventing this remains an open challenge, though proposals have been tossed around, such as having a single node leader get all the data from the KVS space and put it into a shared segment that the other ranks on the host can read from. Unfortunately, this is still O(N), just with a reduced coefficient.
3. We observed that launch times are longer with SLURM 2.6 than they were with the 2.5.X series. However, anecdotally, scaling appears to be improved. From our (Mellanox's) point of view, getting something that doesn't "blow up" quadratically as N goes to 4K ranks and beyond is more important than the absolute performance in launching any one job size.
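To illustrate why recursive doubling came up in item 1, here is a minimal toy simulation of an allgather-style exchange in Python. It is only a sketch under stated assumptions: it is not SLURM's Fence implementation, the function name `recursive_doubling_allgather` is made up for this example, and it assumes a power-of-two rank count. The point is simply that every rank ends up with all contributions after log2(N) pairwise exchange rounds.

```python
def recursive_doubling_allgather(values):
    """Toy simulation (hypothetical helper, not SLURM code).

    values[r] is the contribution of rank r. Returns (known, rounds), where
    known[r] maps each rank to its contribution as seen by rank r, and rounds
    is the number of pairwise exchange rounds that were needed.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two rank count")

    # Each rank starts out knowing only its own contribution.
    known = [{rank: values[rank]} for rank in range(n)]
    rounds = 0
    distance = 1
    while distance < n:
        # Snapshot the state so all exchanges in a round happen "simultaneously".
        snapshot = [dict(k) for k in known]
        for rank in range(n):
            partner = rank ^ distance  # partner differs in exactly one bit
            known[rank].update(snapshot[partner])
        distance *= 2
        rounds += 1
    return known, rounds


if __name__ == "__main__":
    known, rounds = recursive_doubling_allgather([f"kv{r}" for r in range(512)])
    print(rounds)  # 9, since 512 == 2**9
```

At the 512 ranks we tested, that is 9 rounds. The amount of data moved per round doubles, so this reduces the number of communication steps rather than the total volume.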
From the data that I have seen, it appears that simply switching to SLURM 2.6 (along with the latest OMPI 1.7) will most likely not provide performance comparable to launching with mpirun. I'll be sure to keep you and the community apprised of the situation as more data on larger systems becomes available in the coming weeks.
Joshua S. Ladd, PhD
HPC Algorithms Engineer
Cell: +1 (865) 258 - 8898
From: devel [mailto:devel-bounces_at_[hidden]] On Behalf Of Christopher Samuel
Sent: Thursday, August 08, 2013 12:26 AM
Subject: Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slowed than with mpirun
On 23/07/13 19:34, Joshua Ladd wrote:
> The proposed solution that "we" (OMPI + SLURM) have come up with is to
> modify OMPI to support PMI2 and to use SLURM 2.6 which has support for
> PMI2 and is (allegedly) much more scalable than PMI1.
> Several folks in the combined communities are working hard, as we
> speak, trying to get this functional to see if it indeed makes a
> difference. Stay tuned, Chris. Hopefully we will have some data by the
> end of the week.
Is there any news on this?
We'd love to be able to test this out if we can as I currently see a 60% penalty with srun with my test NAMD job from our tame MM person.
Christopher Samuel Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel_at_[hidden] Phone: +61 (0)3 903 55545