Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] grpcomm component hier gone...
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-01-03 10:00:31


On Jan 3, 2013, at 6:52 AM, Ake Sandgren <ake.sandgren_at_[hidden]> wrote:

> On Thu, 2013-01-03 at 06:18 -0800, Ralph Castain wrote:
>> On Jan 3, 2013, at 3:01 AM, Ake Sandgren <ake.sandgren_at_[hidden]> wrote:
>>
>>> On Thu, 2013-01-03 at 11:54 +0100, Ake Sandgren wrote:
>>>> On Thu, 2013-01-03 at 11:15 +0100, Ake Sandgren wrote:
>>>>> Hi!
>>>>>
>>>>> The grpcomm component hier seems to have vanished between 1.6.1 and
>>>>> 1.6.3.
>>>>> Why?
>>>>> It seems that the version of slurm we are using (not the latest at the
>>>>> moment) is using it for startup.
>>
>> It should be using PMI if you are directly launching processes via srun, and should not be using hier any more.
>
> Shouldn't the grpcomm pmi component be turned on by default then, if it
> is needed?

It should be

>
>>>>>
>>>>
>>>> Hmm it seems it is the ess_slurmd_module.c that is using grpcomm=hier.
>>
>> Yes - that is the *only* scenario (a direct launch of procs via srun) that should use hier
>
> What I have in my submit file is:
> #SBATCH -n x
>
> srun some-mpi-binary
>
> This fails since hier is missing.
>
> The reason one wants to use srun and not mpirun is to get Slurm's cgroup
> containment.
>
>>>
>>> orte/mca/plm/base/plm_base_rsh_support.c also tries to use the hier
>>> grpcomm
>>
>> Something is very wrong if that is true. How was this configured, and how are you starting this job?
>
> Not sure if it actually tries to use hier at runtime; I just noticed
> that it had a setenv OMPI_MCA_grpcomm=hier in the code.
>
> So what is the real problem here?

Do you have PMI installed and running on your system? I think that is the source of the trouble - if PMI isn't running, then this will fail.

>
> configure line is:
> ./configure --enable-orterun-prefix-by-default --enable-cxx-exceptions
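
One way to follow up on the PMI question in this thread is to check whether the installed Open MPI build has a grpcomm pmi component at all. The sketch below is only an illustration. The helper name has_component is hypothetical, and the match assumes that ompi_info prints one line per component in the form "MCA <framework>: <component> (...)". If the component is missing, the build was probably configured without PMI support. The configure line quoted in the thread does not request it.

```shell
#!/bin/sh
# has_component FRAMEWORK NAME   (hypothetical helper, not part of Open MPI)
# Reads ompi_info-style output on stdin and succeeds if a line of the
# assumed form "MCA FRAMEWORK: NAME (...)" is present.
has_component() {
    grep -Eq "MCA[[:space:]]+$1:[[:space:]]+$2([[:space:]]|\$)"
}

# Only query the local installation if ompi_info is actually on PATH.
if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | has_component grpcomm pmi; then
        echo "grpcomm pmi component is present"
    else
        echo "grpcomm pmi component is missing"
    fi
fi
```

The helper takes its input on stdin so that it can also be run against saved ompi_info output from another node, for example `has_component grpcomm pmi < ompi_info.txt`.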