Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] SLURM environment variables at runtime
From: Ralph Castain (rhc_at_[hidden])
Date: 2011-02-23 11:07:05


Resource managers generally frown on the idea of any program passing
RM-managed envars from one node to another, and this is certainly true of
slurm. The reason is that the RM reserves those values for its own use when
managing remote nodes. For example, if you got an allocation and then used
mpirun to launch a job across only a portion of that allocation, and then
ran another mpirun instance in parallel on the remainder of the nodes, the
slurm envars for those two mpirun instances -need- to be quite different.
Having mpirun forward the values it sees would cause the system to become
very confused.

We learned the hard way never to cross that line :-(

You have two options:

(a) you could get your sys admin to configure slurm correctly to provide
your desired envars on the remote nodes. This is the way slurm and other
RMs recommend for getting what you requested. It is a simple
configuration option - if your admin needs help, they should contact the
slurm mailing list.

(b) you can ask mpirun to forward them, at your own risk. Specify each envar
with its own "-x FOO" argument, where FOO is the name of the variable. See
"man mpirun" for details. Keep an eye out for aberrant behavior.
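
As a rough sketch of option (b), something like the following could build the "-x" arguments from a list of variable names. The helper name build_x_args and the choice of variables are just for illustration, not anything mpirun provides. It only prints the command, and it skips names that are not set in the launching shell. Per-rank values such as SLURM_LOCALID would not be meaningful even if forwarded, since every rank would receive the same value that mpirun saw.

```shell
# Hypothetical helper (not part of Open MPI): emit one "-x NAME" pair for
# each named variable that is set in the current shell.
build_x_args() {
    args=""
    for name in "$@"; do
        # ${VAR+set} expands to "set" only when VAR is defined (even if empty).
        if eval "[ -n \"\${$name+set}\" ]"; then
            args="$args -x $name"
        fi
    done
    # Strip the single leading space before printing.
    printf '%s\n' "${args# }"
}

# Print the resulting command line instead of running it, so it can be
# inspected first. The variable list is an assumption for illustration.
echo mpirun $(build_x_args SLURM_NNODES SLURM_NPROCS SLURM_TASKS_PER_NODE) ./printenv.openmpi
```

Run inside the allocation (e.g. under salloc), this prints the mpirun line with one "-x" per variable that is actually defined there.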

Ralph

On Wed, Feb 23, 2011 at 8:38 AM, Henderson, Brent <brent.henderson_at_[hidden]> wrote:

> Hi Everyone, I have an OpenMPI/SLURM specific question,
>
>
>
> I’m using MPI as a launcher for another application I’m working on and it
> is dependent on the SLURM environment variables making their way into the
> a.out’s environment. This works as I need if I use HP-MPI/PMPI, but when I
> use OpenMPI, it appears that not all are set as I would like across all of
> the ranks.
>
>
>
> I have example output below from a simple a.out that just writes out the
> environment that it sees to a file whose name is based on the node name and
> rank number. Note that with OpenMPI, things like SLURM_NNODES and
> SLURM_TASKS_PER_NODE are not set the same for ranks on the different nodes,
> and things like SLURM_LOCALID are just missing entirely.
>
>
>
> So the question is, should the environment variables on the remote nodes
> (from the perspective of where the job is launched) have the full set of
> SLURM environment variables as seen on the launching node?
>
>
>
> Thanks,
>
>
>
> Brent Henderson
>
>
>
> [brent_at_node2 mpi]$ rm node*
>
> [brent_at_node2 mpi]$ mkdir openmpi hpmpi
>
> [brent_at_node2 mpi]$ salloc -N 2 -n 4 mpirun ./printenv.openmpi
>
> salloc: Granted job allocation 23
>
> Hello world! I'm 3 of 4 on node1
>
> Hello world! I'm 2 of 4 on node1
>
> Hello world! I'm 1 of 4 on node2
>
> Hello world! I'm 0 of 4 on node2
>
> salloc: Relinquishing job allocation 23
>
> [brent_at_node2 mpi]$ mv node* openmpi/
>
> [brent_at_node2 mpi]$ egrep 'NODEID|NNODES|LOCALID|NODELIST|NPROCS|PROCID|TASKS_PER' openmpi/node1.3.of.4
>
> SLURM_JOB_NODELIST=node[1-2]
>
> SLURM_NNODES=1
>
> SLURM_NODELIST=node[1-2]
>
> SLURM_TASKS_PER_NODE=1
>
> SLURM_NPROCS=1
>
> SLURM_STEP_NODELIST=node1
>
> SLURM_STEP_TASKS_PER_NODE=1
>
> SLURM_NODEID=0
>
> SLURM_PROCID=0
>
> SLURM_LOCALID=0
>
> [brent_at_node2 mpi]$ egrep 'NODEID|NNODES|LOCALID|NODELIST|NPROCS|PROCID|TASKS_PER' openmpi/node2.1.of.4
>
> SLURM_JOB_NODELIST=node[1-2]
>
> SLURM_NNODES=2
>
> SLURM_NODELIST=node[1-2]
>
> SLURM_TASKS_PER_NODE=2(x2)
>
> SLURM_NPROCS=4
>
> [brent_at_node2 mpi]$
>
>
>
>
>
> [brent_at_node2 mpi]$ /opt/hpmpi/bin/mpirun -srun -N 2 -n 4 ./printenv.hpmpi
>
> Hello world! I'm 2 of 4 on node2
>
> Hello world! I'm 3 of 4 on node2
>
> Hello world! I'm 0 of 4 on node1
>
> Hello world! I'm 1 of 4 on node1
>
> [brent_at_node2 mpi]$ mv node* hpmpi/
>
> [brent_at_node2 mpi]$ egrep 'NODEID|NNODES|LOCALID|NODELIST|NPROCS|PROCID|TASKS_PER' hpmpi/node1.1.of.4
>
> SLURM_NODELIST=node[1-2]
>
> SLURM_TASKS_PER_NODE=2(x2)
>
> SLURM_STEP_NODELIST=node[1-2]
>
> SLURM_STEP_TASKS_PER_NODE=2(x2)
>
> SLURM_NNODES=2
>
> SLURM_NPROCS=4
>
> SLURM_NODEID=0
>
> SLURM_PROCID=1
>
> SLURM_LOCALID=1
>
> [brent_at_node2 mpi]$ egrep 'NODEID|NNODES|LOCALID|NODELIST|NPROCS|PROCID|TASKS_PER' hpmpi/node2.3.of.4
>
> SLURM_NODELIST=node[1-2]
>
> SLURM_TASKS_PER_NODE=2(x2)
>
> SLURM_STEP_NODELIST=node[1-2]
>
> SLURM_STEP_TASKS_PER_NODE=2(x2)
>
> SLURM_NNODES=2
>
> SLURM_NPROCS=4
>
> SLURM_NODEID=1
>
> SLURM_PROCID=3
>
> SLURM_LOCALID=1
>
> [brent_at_node2 mpi]$
>