On Jan 3, 2013, at 6:52 AM, Ake Sandgren <ake.sandgren_at_[hidden]> wrote:
> On Thu, 2013-01-03 at 06:18 -0800, Ralph Castain wrote:
>> On Jan 3, 2013, at 3:01 AM, Ake Sandgren <ake.sandgren_at_[hidden]> wrote:
>>> On Thu, 2013-01-03 at 11:54 +0100, Ake Sandgren wrote:
>>>> On Thu, 2013-01-03 at 11:15 +0100, Ake Sandgren wrote:
>>>>> The grpcomm component hier seems to have vanished between 1.6.1 and
>>>>> It seems that the version of Slurm we are using (not the latest at the
>>>>> moment) is using it for startup.
>> It should be using PMI if you are directly launching processes via srun, and should not be using hier any more.
> Shouldn't the grpcomm pmi component be turned on by default then, if it
> is needed?
It should be.
>>>> Hmm, it seems it is the ess_slurmd_module.c that is using grpcomm=hier.
>> Yes - that is the *only* scenario (a direct launch of procs via srun) that should use hier
> What I have in my submit file is:
> #SBATCH -n x
> srun some-mpi-binary
> This fails since hier is missing.
> The reason one wants to use srun and not mpirun is to get Slurm's cgroup
>>> orte/mca/plm/base/plm_base_rsh_support.c also tries to use the hier
>> Something is very wrong if that is true. How was this configured, and how are you starting this job?
> Not sure if it actually tries to use hier at runtime, I just noticed
> that it had a setenv OMPI_MCA_grpcomm=hier in the code.
> So what is the real problem here?
Do you have PMI installed and running on your system? I think that is the source of the trouble - if PMI isn't running, then this will fail.
> configure line is:
> ./configure --enable-orterun-prefix-by-default --enable-cxx-exceptions
> users mailing list