Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] grpcomm component hier gone...
From: Ake Sandgren (ake.sandgren_at_[hidden])
Date: 2013-01-03 09:52:53


On Thu, 2013-01-03 at 06:18 -0800, Ralph Castain wrote:
> On Jan 3, 2013, at 3:01 AM, Ake Sandgren <ake.sandgren_at_[hidden]> wrote:
>
> > On Thu, 2013-01-03 at 11:54 +0100, Ake Sandgren wrote:
> >> On Thu, 2013-01-03 at 11:15 +0100, Ake Sandgren wrote:
> >>> Hi!
> >>>
> >>> The grpcomm component hier seems to have vanished between 1.6.1 and
> >>> 1.6.3.
> >>> Why?
> >>> It seems that the version of slurm we are using (not the latest at the
> >>> moment) is using it for startup.
>
> It should be using PMI if you are directly launching processes via srun, and should not be using hier any more.

Shouldn't the grpcomm pmi component be turned on by default then, if it
is needed?

> >>>
> >>
> >> Hmm it seems it is the ess_slurmd_module.c that is using grpcomm=hier.
>
> Yes - that is the *only* scenario (a direct launch of procs via srun) that should use hier

What I have in my submit file is:
#SBATCH -n x

srun some-mpi-binary

This fails since hier is missing.
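
For reference, a rough way to check which grpcomm components a given build actually provides is sketched below. It assumes ompi_info prints one "MCA grpcomm: <name> (...)" line per component; has_grpcomm is only a helper name made up for this example, not part of Open MPI.

```shell
#!/bin/sh
# has_grpcomm (hypothetical helper): reads `ompi_info` output on stdin and
# succeeds if the named grpcomm component is listed.
# Assumption: each component appears as "MCA grpcomm: <name> (...)".
has_grpcomm() {
    grep -q "MCA grpcomm: $1 "
}

# Only query the installation if ompi_info is actually on PATH.
if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | has_grpcomm hier; then
        echo "grpcomm hier is available"
    else
        echo "grpcomm hier is NOT available"
    fi
fi
```

Running the same check with "pmi" in place of "hier" would show whether the pmi component was built at all.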

The reason to use srun rather than mpirun is to get Slurm's cgroup
containment.

> >
> > orte/mca/plm/base/plm_base_rsh_support.c also tries to use the hier
> > grpcomm
>
> Something is very wrong if that is true. How was this configured, and how are you starting this job?

I'm not sure whether it actually tries to use hier at runtime; I just
noticed that the code contains a setenv of OMPI_MCA_grpcomm=hier.
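
To spell out why that setenv matters: an environment variable named OMPI_MCA_<param> sets the MCA parameter <param>, so OMPI_MCA_grpcomm=hier requests the hier component. A small sketch for seeing whether a process environment forces a grpcomm selection follows (show_forced_grpcomm is a made-up helper name, and "unset" is just the placeholder it prints):

```shell
#!/bin/sh
# show_forced_grpcomm (hypothetical helper): print the grpcomm component
# requested through the environment, or "unset" if none is requested.
show_forced_grpcomm() {
    printf '%s\n' "${OMPI_MCA_grpcomm:-unset}"
}

show_forced_grpcomm
```

Calling it from inside the job step (i.e. under srun) would show what the launched processes actually see.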

So what is the real problem here?

The configure line is:
./configure --enable-orterun-prefix-by-default --enable-cxx-exceptions