Open MPI logo

Open MPI Development Mailing List Archives

  |   Home   |   Support   |   FAQ   |   all Development mailing list

Subject: Re: [OMPI devel] openmpi-1.7.5a1r30692 and slurm problems
From: Ralph Castain (rhc_at_[hidden])
Date: 2014-02-12 13:47:13


Interesting - good to know. Thanks

On Feb 12, 2014, at 10:38 AM, Adrian Reber <adrian_at_[hidden]> wrote:

> It seems this is indeed a Moab bug for interactive jobs. At least a bug
> was opened against moab. Using non-interactive jobs the variables have
> the correct values and mpirun has no problems detecting the correct
> number of cores.
>
> On Wed, Feb 12, 2014 at 07:50:40AM -0800, Ralph Castain wrote:
>> Another possibility to check - it is entirely possible that Moab is miscommunicating the values to Slurm. You might need to check it - I'll install a copy of 2.6.5 on my machines and see if I get similar issues when Slurm does the allocation itself.
>>
>> On Feb 12, 2014, at 7:47 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>>
>>>
>>> On Feb 12, 2014, at 7:32 AM, Adrian Reber <adrian_at_[hidden]> wrote:
>>>
>>>>
>>>> $ msub -I -l nodes=3:ppn=8
>>>> salloc: Job is in held state, pending scheduler release
>>>> salloc: Pending job allocation 131828
>>>> salloc: job 131828 queued and waiting for resources
>>>> salloc: job 131828 has been allocated resources
>>>> salloc: Granted job allocation 131828
>>>> sh-4.1$ echo $SLURM_TASKS_PER_NODE
>>>> 1
>>>> sh-4.1$ rpm -q slurm
>>>> slurm-2.6.5-1.el6.x86_64
>>>> sh-4.1$ echo $SLURM_NNODES
>>>> 1
>>>> sh-4.1$ echo $SLURM_JOB_NODELIST
>>>> xxxx[107-108,176]
>>>> sh-4.1$ echo $SLURM_JOB_CPUS_PER_NODE
>>>> 8(x3)
>>>> sh-4.1$ echo $SLURM_NODELIST
>>>> xxxx[107-108,176]
>>>> sh-4.1$ echo $SLURM_NPROCS
>>>> 1
>>>> sh-4.1$ echo $SLURM_NTASKS
>>>> 1
>>>> sh-4.1$ echo $SLURM_TASKS_PER_NODE
>>>> 1
>>>>
>>>> The information in *_NODELIST seems to make sense, but all the other
>>>> variables (PROCS, TASKS, NODES) report '1', which seems wrong.
>>>
>>> Indeed - and that's the problem. Slurm 2.6.5 is the most recent release, and my guess is that SchedMD once again has changed the @$!#%#@ meaning of their envars. Frankly, it is nearly impossible to track all the variants they have created over the years.
>>>
>>> Please check to see if someone did a little customizing on your end as sometimes people do that to Slurm. Could also be they did something in the Slurm config file that is causing the changed behavior.
>>>
>>> Meantime, I'll try to ponder a potential solution in case this really is the "latest" Slurm screwup.
>>>
>>>
>>>>
>>>>
>>>> On Wed, Feb 12, 2014 at 07:19:54AM -0800, Ralph Castain wrote:
>>>>> ...and your version of Slurm?
>>>>>
>>>>> On Feb 12, 2014, at 7:19 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>>>>>
>>>>>> What is your SLURM_TASKS_PER_NODE?
>>>>>>
>>>>>> On Feb 12, 2014, at 6:58 AM, Adrian Reber <adrian_at_[hidden]> wrote:
>>>>>>
>>>>>>> No, the system has only a few MOAB_* variables and many SLURM_*
>>>>>>> variables:
>>>>>>>
>>>>>>> $BASH $IFS $SECONDS $SLURM_PTY_PORT
>>>>>>> $BASHOPTS $LINENO $SHELL $SLURM_PTY_WIN_COL
>>>>>>> $BASHPID $LINES $SHELLOPTS $SLURM_PTY_WIN_ROW
>>>>>>> $BASH_ALIASES $MACHTYPE $SHLVL $SLURM_SRUN_COMM_HOST
>>>>>>> $BASH_ARGC $MAILCHECK $SLURMD_NODENAME $SLURM_SRUN_COMM_PORT
>>>>>>> $BASH_ARGV $MOAB_CLASS $SLURM_CHECKPOINT_IMAGE_DIR $SLURM_STEPID
>>>>>>> $BASH_CMDS $MOAB_GROUP $SLURM_CONF $SLURM_STEP_ID
>>>>>>> $BASH_COMMAND $MOAB_JOBID $SLURM_CPUS_ON_NODE $SLURM_STEP_LAUNCHER_PORT
>>>>>>> $BASH_LINENO $MOAB_NODECOUNT $SLURM_DISTRIBUTION $SLURM_STEP_NODELIST
>>>>>>> $BASH_SOURCE $MOAB_PARTITION $SLURM_GTIDS $SLURM_STEP_NUM_NODES
>>>>>>> $BASH_SUBSHELL $MOAB_PROCCOUNT $SLURM_JOBID $SLURM_STEP_NUM_TASKS
>>>>>>> $BASH_VERSINFO $MOAB_SUBMITDIR $SLURM_JOB_CPUS_PER_NODE $SLURM_STEP_TASKS_PER_NODE
>>>>>>> $BASH_VERSION $MOAB_USER $SLURM_JOB_ID $SLURM_SUBMIT_DIR
>>>>>>> $COLUMNS $OPTERR $SLURM_JOB_NODELIST $SLURM_SUBMIT_HOST
>>>>>>> $COMP_WORDBREAKS $OPTIND $SLURM_JOB_NUM_NODES $SLURM_TASKS_PER_NODE
>>>>>>> $DIRSTACK $OSTYPE $SLURM_LAUNCH_NODE_IPADDR $SLURM_TASK_PID
>>>>>>> $EUID $PATH $SLURM_LOCALID $SLURM_TOPOLOGY_ADDR
>>>>>>> $GROUPS $POSIXLY_CORRECT $SLURM_NNODES $SLURM_TOPOLOGY_ADDR_PATTERN
>>>>>>> $HISTCMD $PPID $SLURM_NODEID $SRUN_DEBUG
>>>>>>> $HISTFILE $PS1 $SLURM_NODELIST $TERM
>>>>>>> $HISTFILESIZE $PS2 $SLURM_NPROCS $TMPDIR
>>>>>>> $HISTSIZE $PS4 $SLURM_NTASKS $UID
>>>>>>> $HOSTNAME $PWD $SLURM_PRIO_PROCESS $_
>>>>>>> $HOSTTYPE $RANDOM $SLURM_PROCID
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Feb 12, 2014 at 06:12:45AM -0800, Ralph Castain wrote:
>>>>>>>> Seems rather odd - since this is managed by Moab, you shouldn't be seeing SLURM envars at all. What you should see are PBS_* envars, including a PBS_NODEFILE that actually contains the allocation.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Feb 12, 2014, at 4:42 AM, Adrian Reber <adrian_at_[hidden]> wrote:
>>>>>>>>
>>>>>>>>> I tried the nightly snapshot (openmpi-1.7.5a1r30692.tar.gz) on a system
>>>>>>>>> with slurm and moab. I requested an interactive session using:
>>>>>>>>>
>>>>>>>>> msub -I -l nodes=3:ppn=8
>>>>>>>>>
>>>>>>>>> and started a simple test case which fails:
>>>>>>>>>
>>>>>>>>> $ mpirun -np 2 ./mpi-test 1
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>> There are not enough slots available in the system to satisfy the 2 slots
>>>>>>>>> that were requested by the application:
>>>>>>>>> ./mpi-test
>>>>>>>>>
>>>>>>>>> Either request fewer slots for your application, or make more slots available
>>>>>>>>> for use.
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>> srun: error: xxxx108: task 1: Exited with exit code 1
>>>>>>>>> srun: Terminating job step 131823.4
>>>>>>>>> srun: error: xxxx107: task 0: Exited with exit code 1
>>>>>>>>> srun: Job step aborted
>>>>>>>>> slurmd[xxxx108]: *** STEP 131823.4 KILLED AT 2014-02-12T13:30:32 WITH SIGNAL 9 ***
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> requesting only one core works:
>>>>>>>>>
>>>>>>>>> $ mpirun ./mpi-test 1
>>>>>>>>> 4.4.7 20120313 (Red Hat 4.4.7-4):Process 0 on xxxx106 out of 1: 0.000000
>>>>>>>>> 4.4.7 20120313 (Red Hat 4.4.7-4):Process 0 on xxxx106 out of 1: 0.000000
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> using openmpi-1.6.5 works with multiple cores:
>>>>>>>>>
>>>>>>>>> $ mpirun -np 24 ./mpi-test 2
>>>>>>>>> 4.4.7 20120313 (Red Hat 4.4.7-4):Process 0 on xxxx106 out of 24: 0.000000
>>>>>>>>> 4.4.7 20120313 (Red Hat 4.4.7-4):Process 12 on xxxx106 out of 24: 12.000000
>>>>>>>>> 4.4.7 20120313 (Red Hat 4.4.7-4):Process 11 on xxxx108 out of 24: 11.000000
>>>>>>>>> 4.4.7 20120313 (Red Hat 4.4.7-4):Process 18 on xxxx106 out of 24: 18.000000
>>>>>>>>>
>>>>>>>>> $ echo $SLURM_JOB_CPUS_PER_NODE
>>>>>>>>> 8(x3)
>>>>>>>>>
>>>>>>>>> I never used slurm before so this could also be a user error on my side.
>>>>>>>>> But as 1.6.5 works it seems something has changed and wanted to let
>>>>>>>>> you know in case it was not intentionally.
>>>>>>>>>
>>>>>>>>> Adrian
>>>>>>>>> _______________________________________________
>>>>>>>>> devel mailing list
>>>>>>>>> devel_at_[hidden]
>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> devel mailing list
>>>>>>>> devel_at_[hidden]
>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>>
>>>>>>> Adrian
>>>>>>>
>>>>>>> --
>>>>>>> Adrian Reber <adrian_at_[hidden]> http://lisas.de/~adrian/
>>>>>>> "Let us all bask in television's warm glowing warming glow." -- Homer Simpson
>>>>>>> _______________________________________________
>>>>>>> devel mailing list
>>>>>>> devel_at_[hidden]
>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> devel_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>
>>>> Adrian
>>>>
>>>> --
>>>> Adrian Reber <adrian_at_[hidden]> http://lisas.de/~adrian/
>>>> There's got to be more to life than compile-and-go.
>>>> _______________________________________________
>>>> devel mailing list
>>>> devel_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
> Adrian
>
> --
> Adrian Reber <adrian_at_[hidden]> http://lisas.de/~adrian/
> Killing is wrong.
> -- Losira, "That Which Survives", stardate unknown
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel