
Subject: Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested
From: Christopher Samuel (samuel_at_[hidden])
Date: 2014-05-07 23:40:31

On 08/05/14 12:54, Ralph Castain wrote:

> I think there was one 2.6.x that was borked, and definitely
> problems in the 14.03.x line. Can't pinpoint it for you, though.

No worries, thanks.

> Sounds good. I'm going to have to dig deeper into those numbers,
> though, as they don't entirely add up to me. Once the job gets
> launched, the launch method itself should have no bearing on
> computational speed - IF all things are equal. In other words, if
> the process layout is the same, and the binding pattern is the
> same, then computational speed should be roughly equivalent
> regardless of how the procs were started.

Not sure if it's significant, but when mpirun was launching processes
it used srun to start orted, which then started the MPI ranks, whereas
with PMI/PMI2 srun appeared to start the ranks directly.
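
For concreteness, the two launch paths being compared look roughly
like this (just a sketch - the application name and rank count are
placeholders, and --mpi=pmi2 assumes Slurm was built with PMI2
support):

  # Indirect launch: mpirun uses srun to start one orted per node,
  # and each orted then starts the MPI ranks on its node.
  mpirun -np 1120 ./my_app

  # Direct launch: srun itself starts every rank, and the ranks
  # wire up through PMI/PMI2 rather than through orted.
  srun -n 1120 --mpi=pmi2 ./my_app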

> My guess is that your data might indicate a difference in the
> layout and/or binding pattern as opposed to PMI2 vs mpirun. At the
> scale you mention later in the thread (only 70 nodes x 16 ppn), the
> difference in launch timing would be zilch. So I'm betting you
> would find (upon further exploration) that (a) you might not have
> been binding processes when launching by mpirun, since we didn't
> bind by default until the 1.8 series, but were binding under direct
> srun launch, and (b) your process mapping would quite likely be
> different as we default to byslot mapping, and I believe srun
> defaults to bynode?

FWIW, all our environment modules that set up OMPI have:

setenv OMPI_MCA_orte_process_binding core
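
For the next comparison it might also be worth making the binding
visible on both sides; a sketch, assuming an OMPI 1.8-era mpirun and
a reasonably recent Slurm:

  # Open MPI: bind to cores and print the binding each rank gets
  mpirun --bind-to core --report-bindings ./my_app

  # Slurm direct launch: bind to cores and report what was applied
  srun --mpi=pmi2 --cpu_bind=verbose,cores ./my_app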

> Might be worth another comparison run when someone has time.

Yeah, I'll try to queue up some more tests - unfortunately the
cluster we ran those tests on is flat out at the moment, but I'll try
to sneak in a 64-core job using identical configs and compare mpirun,
srun on its own, and srun with PMI2.
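
Roughly, the plan would be (a sketch - the application is a
placeholder):

  # Same 64-core allocation, same build and configs, three launches:
  mpirun ./my_app                  # srun starts orteds, orteds start the ranks
  srun -n 64 ./my_app              # srun on its own (PMI-1)
  srun -n 64 --mpi=pmi2 ./my_app   # srun with PMI-2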

All the best,
--
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: samuel_at_[hidden]  Phone: +61 (0)3 903 55545
