
Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] openmpi-1.7.5a1r30692 and slurm problems
From: Adrian Reber (adrian_at_[hidden])
Date: 2014-02-12 09:58:50


No, the system has only a few MOAB_* variables and many SLURM_*
variables:

$BASH $IFS $SECONDS $SLURM_PTY_PORT
$BASHOPTS $LINENO $SHELL $SLURM_PTY_WIN_COL
$BASHPID $LINES $SHELLOPTS $SLURM_PTY_WIN_ROW
$BASH_ALIASES $MACHTYPE $SHLVL $SLURM_SRUN_COMM_HOST
$BASH_ARGC $MAILCHECK $SLURMD_NODENAME $SLURM_SRUN_COMM_PORT
$BASH_ARGV $MOAB_CLASS $SLURM_CHECKPOINT_IMAGE_DIR $SLURM_STEPID
$BASH_CMDS $MOAB_GROUP $SLURM_CONF $SLURM_STEP_ID
$BASH_COMMAND $MOAB_JOBID $SLURM_CPUS_ON_NODE $SLURM_STEP_LAUNCHER_PORT
$BASH_LINENO $MOAB_NODECOUNT $SLURM_DISTRIBUTION $SLURM_STEP_NODELIST
$BASH_SOURCE $MOAB_PARTITION $SLURM_GTIDS $SLURM_STEP_NUM_NODES
$BASH_SUBSHELL $MOAB_PROCCOUNT $SLURM_JOBID $SLURM_STEP_NUM_TASKS
$BASH_VERSINFO $MOAB_SUBMITDIR $SLURM_JOB_CPUS_PER_NODE $SLURM_STEP_TASKS_PER_NODE
$BASH_VERSION $MOAB_USER $SLURM_JOB_ID $SLURM_SUBMIT_DIR
$COLUMNS $OPTERR $SLURM_JOB_NODELIST $SLURM_SUBMIT_HOST
$COMP_WORDBREAKS $OPTIND $SLURM_JOB_NUM_NODES $SLURM_TASKS_PER_NODE
$DIRSTACK $OSTYPE $SLURM_LAUNCH_NODE_IPADDR $SLURM_TASK_PID
$EUID $PATH $SLURM_LOCALID $SLURM_TOPOLOGY_ADDR
$GROUPS $POSIXLY_CORRECT $SLURM_NNODES $SLURM_TOPOLOGY_ADDR_PATTERN
$HISTCMD $PPID $SLURM_NODEID $SRUN_DEBUG
$HISTFILE $PS1 $SLURM_NODELIST $TERM
$HISTFILESIZE $PS2 $SLURM_NPROCS $TMPDIR
$HISTSIZE $PS4 $SLURM_NTASKS $UID
$HOSTNAME $PWD $SLURM_PRIO_PROCESS $_
$HOSTTYPE $RANDOM $SLURM_PROCID
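
(Since the list above also contains plain shell variables, here is a
minimal sketch of the same check restricted to the environment, plus a
check for the PBS_NODEFILE Ralph mentions below; bash assumed:)

  # list every resource-manager variable the session actually exports
  env | grep -E '^(SLURM_|SLURMD_|MOAB_|PBS_)' | sort

  # a Moab/TORQUE-managed job is expected to provide PBS_NODEFILE with
  # the allocated nodes; check whether it is set and what it contains
  if [ -n "$PBS_NODEFILE" ]; then
      sort "$PBS_NODEFILE" | uniq -c    # one line per node with slot count
  else
      echo "PBS_NODEFILE is not set"
  fi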

On Wed, Feb 12, 2014 at 06:12:45AM -0800, Ralph Castain wrote:
> Seems rather odd - since this is managed by Moab, you shouldn't be seeing SLURM envars at all. What you should see are PBS_* envars, including a PBS_NODEFILE that actually contains the allocation.
>
>
> On Feb 12, 2014, at 4:42 AM, Adrian Reber <adrian_at_[hidden]> wrote:
>
> > I tried the nightly snapshot (openmpi-1.7.5a1r30692.tar.gz) on a system
> > with slurm and moab. I requested an interactive session using:
> >
> > msub -I -l nodes=3:ppn=8
> >
> > and started a simple test case which fails:
> >
> > $ mpirun -np 2 ./mpi-test 1
> > --------------------------------------------------------------------------
> > There are not enough slots available in the system to satisfy the 2 slots
> > that were requested by the application:
> > ./mpi-test
> >
> > Either request fewer slots for your application, or make more slots available
> > for use.
> > --------------------------------------------------------------------------
> > srun: error: xxxx108: task 1: Exited with exit code 1
> > srun: Terminating job step 131823.4
> > srun: error: xxxx107: task 0: Exited with exit code 1
> > srun: Job step aborted
> > slurmd[xxxx108]: *** STEP 131823.4 KILLED AT 2014-02-12T13:30:32 WITH SIGNAL 9 ***
> >
> >
> > requesting only one core works:
> >
> > $ mpirun ./mpi-test 1
> > 4.4.7 20120313 (Red Hat 4.4.7-4):Process 0 on xxxx106 out of 1: 0.000000
> > 4.4.7 20120313 (Red Hat 4.4.7-4):Process 0 on xxxx106 out of 1: 0.000000
> >
> >
> > using openmpi-1.6.5 works with multiple cores:
> >
> > $ mpirun -np 24 ./mpi-test 2
> > 4.4.7 20120313 (Red Hat 4.4.7-4):Process 0 on xxxx106 out of 24: 0.000000
> > 4.4.7 20120313 (Red Hat 4.4.7-4):Process 12 on xxxx106 out of 24: 12.000000
> > 4.4.7 20120313 (Red Hat 4.4.7-4):Process 11 on xxxx108 out of 24: 11.000000
> > 4.4.7 20120313 (Red Hat 4.4.7-4):Process 18 on xxxx106 out of 24: 18.000000
> >
> > $ echo $SLURM_JOB_CPUS_PER_NODE
> > 8(x3)
> >
> > I have never used slurm before, so this could also be a user error on
> > my side. But since 1.6.5 works, it seems something has changed, and I
> > wanted to let you know in case it was not intentional.
> >
> > Adrian
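
As a further data point, mpirun's own view of the allocation can be shown
without starting the MPI program (a rough sketch; I am assuming the
--display-allocation/--display-map options behave in 1.7.5 as documented):

  # print the node/slot allocation Open MPI parses from the environment,
  # and the process-to-node map it would use, with a single non-MPI task
  mpirun --display-allocation --display-map -np 1 hostname

With SLURM_JOB_CPUS_PER_NODE=8(x3) I would expect 3 nodes with 8 slots
each, 24 in total, so anything smaller in that output would point at the
allocation parsing rather than at mpi-test itself.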

                Adrian

-- 
Adrian Reber <adrian_at_[hidden]>            http://lisas.de/~adrian/
"Let us all bask in television's warm glowing warming glow." -- Homer Simpson