Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] mpirun oddity w/ PBS on an SGI UV
From: Paul Hargrove (phhargrove_at_[hidden])
Date: 2014-02-06 12:47:26


Ralph,

It worked on my second try, when I spelled it "ras_tm_smp" :-)

Thanks,
-Paul
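
For anyone reproducing this, an MCA parameter can be passed to mpirun with its general "--mca <name> <value>" syntax. The following is only a sketch: the thread confirms the name "ras_tm_smp", but treating it as a boolean that "1" enables is an assumption, and the wrapper name mpirun_smp and the DRY_RUN variable are made up for illustration.

```shell
# Hypothetical wrapper: run mpirun with the Torque SMP-mode parameter set.
# Assumption: "ras_tm_smp" is a boolean MCA parameter and "1" turns it on.
# Set DRY_RUN to a non-empty value to print the command instead of running it.
mpirun_smp() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo mpirun --mca ras_tm_smp 1 "$@"
    else
        mpirun --mca ras_tm_smp 1 "$@"
    fi
}
```

Inside the batch job this would be called as: mpirun_smp -np 16 ./ring_c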

On Wed, Feb 5, 2014 at 11:59 AM, Paul Hargrove <phhargrove_at_[hidden]> wrote:

> Ralph,
>
> I will try to build tonight's trunk tarball and then test a run tomorrow.
> Please ping me if I don't post my results by Thu evening (PST).
>
> -Paul
>
>
> On Wed, Feb 5, 2014 at 7:52 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>
>> I added this to the trunk in r30568 - a new MCA param "ras_tm_smp_mode"
>> will tell us to use the PBS_PPN envar to get the number of slots allocated
>> per node. We then just use the PBS_NODEFILE to read the names of the nodes,
>> which I expect will be one for each partition.
>>
>> Let me know if this solves the problem - I scheduled it for 1.7.5
>>
>> Thanks!
>> Ralph
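
The slot accounting described in this message can be sketched roughly as follows. This is not the code from r30568, only an illustration of the idea: every node named in the nodefile is given the same number of slots, taken from PBS_PPN. The function name smp_mode_slots and the "slots=" output format are chosen here for illustration.

```shell
# Sketch of the SMP-mode idea (not the actual Open MPI implementation).
# Prints "<node> slots=<ppn>" for each non-empty line of the nodefile.
# Arguments: $1 = path to the nodefile, $2 = slots per node (e.g. $PBS_PPN).
smp_mode_slots() {
    awk -v ppn="$2" 'NF > 0 { print $1 " slots=" ppn }' "$1"
}
```

Inside a job it could be called as: smp_mode_slots "$PBS_NODEFILE" "$PBS_PPN"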
>>
>> On Jan 31, 2014, at 4:33 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>
>> No worries about PBS itself - better to allow you to just run this way.
>> Easy to add a switch for this purpose.
>>
>> For now, just add --oversubscribe to the command line
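
As a rough sketch of this workaround, a job script could add the flag only when it asks for more ranks than the slots mpirun would detect. The helper name mpirun_cmdline and its arguments are hypothetical; it only prints the command line and does not run anything.

```shell
# Hypothetical helper: print an mpirun command line, adding --oversubscribe
# only when more ranks are requested than slots are known to be available.
# Arguments: $1 = ranks wanted, $2 = slots detected, rest = program and args.
mpirun_cmdline() {
    ranks=$1
    slots=$2
    shift 2
    if [ "$ranks" -gt "$slots" ]; then
        echo mpirun --oversubscribe -np "$ranks" "$@"
    else
        echo mpirun -np "$ranks" "$@"
    fi
}
```

For the case in this thread (16 ranks, 1 detected slot) the call would be: mpirun_cmdline 16 1 ./ring_c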
>>
>> On Jan 31, 2014, at 3:32 PM, Paul Hargrove <phhargrove_at_[hidden]> wrote:
>>
>> Ralph,
>>
>> The mods may have been done by the staff at PSC rather than by SGI.
>> Note the "_psc" suffix:
>> $ which pbsnodes
>> /usr/local/packages/torque/2.3.13_psc/bin/pbsnodes
>>
>> Their sources appear to be available in the f/s too.
>> Using "tar -d" to compare that to the pristine torque-2.3.13 tarball shows
>> the following files were modified:
>> torque-2.3.13/src/resmom/job_func.c
>> torque-2.3.13/src/resmom/mom_main.c
>> torque-2.3.13/src/resmom/requests.c
>> torque-2.3.13/src/resmom/linux/mom_mach.h
>> torque-2.3.13/src/resmom/linux/mom_mach.c
>> torque-2.3.13/src/resmom/linux/cpuset.c
>> torque-2.3.13/src/resmom/start_exec.c
>> torque-2.3.13/src/scheduler.tcl/pbs_sched.c
>> torque-2.3.13/src/cmds/qalter.c
>> torque-2.3.13/src/cmds/qsub.c
>> torque-2.3.13/src/cmds/qstat.c
>> torque-2.3.13/src/server/resc_def_all.c
>> torque-2.3.13/src/server/req_quejob.c
>> torque-2.3.13/torque.spec
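
A comparison like the one above can be sketched as follows, assuming GNU tar, whose "-d" (--diff) mode prints one "<path>: <reason>" line per difference between the archive and the files on disk. The function name changed_files is made up, and the exact command used in the thread is not known beyond "tar -d".

```shell
# Sketch: list files under a directory that differ from the copies in an archive.
# Assumes GNU tar; "-d" prints "<path>: <reason>" for each difference found.
# Arguments: $1 = archive (e.g. the pristine tarball),
#            $2 = directory that contains the unpacked tree to compare against.
changed_files() {
    tar -d -f "$1" -C "$2" 2>/dev/null | sed 's/: .*$//' | sort -u
}
```

Files missing on disk are reported by tar on stderr, which this sketch discards, so it lists only files that exist but differ.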
>>
>> I'll provide what assistance I can in testing.
>> That includes providing (off-list) the actual diffs of PSC's torque
>> against the tarball, if desired.
>>
>> In the meantime, since -npernode didn't work, what is the right way to
>> say:
>> "I have 1 slot but I want to overcommit and run 16 mpi ranks".
>>
>> -Paul
>>
>>
>> On Fri, Jan 31, 2014 at 3:20 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>
>>>
>>> On Jan 31, 2014, at 3:13 PM, Paul Hargrove <phhargrove_at_[hidden]> wrote:
>>>
>>> Ralph,
>>>
>>> As I said this is NOT a cluster - it is a 4k-core shared memory machine.
>>>
>>>
>>> I understood - that wasn't the nature of my question
>>>
>>> TORQUE is allocating cpus (time-shared mode, IIRC), not nodes.
>>> So, there is always exactly one line in $PBS_NODEFILE.
>>>
>>>
>>> Interesting - because that isn't the standard way Torque behaves. It is
>>> supposed to put one line/slot in the nodefile, each line containing the
>>> name of the node. Clearly, SGI has reconfigured Torque to do something
>>> different.
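
For contrast, the conventional layout described here (one nodefile line per slot, each line holding a node name) can be turned into per-node slot counts by counting repeated lines. This is a minimal sketch, not Open MPI's own parsing code; the function name slots_per_node is hypothetical.

```shell
# Sketch: count slots per node from a nodefile that has one line per slot.
# Prints "<node> slots=<count>", sorted by node name; blank lines are ignored.
# Argument: $1 = path to the nodefile (e.g. "$PBS_NODEFILE").
slots_per_node() {
    awk 'NF { n[$1]++ } END { for (h in n) print h " slots=" n[h] }' "$1" | sort
}
```

On the system discussed in this thread the nodefile has a single line, so a count like this yields 1 slot regardless of how many cpus were requested.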
>>>
>>>
>>> The system runs as 2 partitions of 2k-cores each.
>>> So, the contents of $PBS_NODEFILE have exactly 2 possible values, each 1
>>> line.
>>>
>>> The values of PBS_PPN and PBS_NCPUS both reflect the size of the
>>> allocation.
>>>
>>> At a minimum, shouldn't Open MPI be multiplying the lines in
>>> $PBS_NODEFILE by the value of $PBS_PPN?
>>>
>>>
>>> No, as above, that isn't the way Torque generally behaves. It would
>>> appear that we need a "switch" here to handle SGI's modifications. Should
>>> be doable - just haven't had anyone using an SGI machine before :-)
>>>
>>>
>>> Additionally, when I try "mpirun -npernode 16 ./ring_c" I am still told
>>> there are not enough slots.
>>> Shouldn't that be working with 1 line in $PBS_NODEFILE?
>>>
>>> -Paul
>>>
>>>
>>>
>>>
>>> On Fri, Jan 31, 2014 at 2:47 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>>
>>>> We read the nodes from the PBS_NODEFILE, Paul - can you pass that along?
>>>>
>>>> On Jan 31, 2014, at 2:33 PM, Paul Hargrove <phhargrove_at_[hidden]> wrote:
>>>>
>>>> I am trying to test the trunk on an SGI UV (to validate Nathan's port
>>>> of btl:vader to SGI's variant of xpmem).
>>>>
>>>> At configure time, PBS's TM support was correctly located.
>>>>
>>>> My PBS batch script includes
>>>> #PBS -l ncpus=16
>>>> because that is what this installation requires (not nodes, mppnodes,
>>>> or anything like that).
>>>> One is allocating cpus on a large shared-memory machine, not a set of
>>>> nodes in a cluster.
>>>>
>>>> However, this appears to be causing mpirun to think I have just 1 slot:
>>>>
>>>> + mpirun -np 2 ./ring_c
>>>>
>>>> --------------------------------------------------------------------------
>>>> There are not enough slots available in the system to satisfy the 2 slots
>>>> that were requested by the application:
>>>> ./ring_c
>>>>
>>>> Either request fewer slots for your application, or make more slots available
>>>> for use.
>>>>
>>>> --------------------------------------------------------------------------
>>>>
>>>> In case they contain useful info, here are the PBS env vars in the job:
>>>>
>>>> PBS_HT_NCPUS=32
>>>> PBS_VERSION=TORQUE-2.3.13
>>>> PBS_JOBNAME=qs
>>>> PBS_ENVIRONMENT=PBS_BATCH
>>>> PBS_HOME=/var/spool/torque
>>>> PBS_O_WORKDIR=/usr/users/6/hargrove/SCRATCH/OMPI/openmpi-trunk-linux-x86_64-uv-trunk/BLD/examples
>>>> PBS_PPN=16
>>>> PBS_TASKNUM=1
>>>> PBS_O_HOME=/usr/users/6/hargrove
>>>> PBS_MOMPORT=15003
>>>> PBS_O_QUEUE=debug
>>>> PBS_O_LOGNAME=hargrove
>>>> PBS_O_LANG=en_US.UTF-8
>>>> PBS_JOBCOOKIE=9EEF5DF75FA705A241FEF66EDFE01C5B
>>>> PBS_NODENUM=0
>>>> PBS_O_SHELL=/usr/psc/shells/bash
>>>> PBS_SERVER=tg-login1.blacklight.psc.teragrid.org
>>>> PBS_JOBID=314827.tg-login1.blacklight.psc.teragrid.org
>>>> PBS_NCPUS=16
>>>> PBS_O_HOST=tg-login1.blacklight.psc.teragrid.org
>>>> PBS_VNODENUM=0
>>>> PBS_QUEUE=debug_r1
>>>> PBS_O_MAIL=/var/mail/hargrove
>>>> PBS_NODEFILE=/var/spool/torque/aux//314827.tg-login1.blacklight.psc.teragrid.org
>>>> PBS_O_PATH=[...removed...]
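
A quick way to spot this kind of mismatch from inside a job is to compare the number of nodefile lines with the cpu count the batch system reports. The sketch below assumes the two are expected to match on a conventionally configured system; the function name check_nodefile and its messages are made up for illustration.

```shell
# Sketch: report whether a nodefile has one line per allocated cpu.
# Arguments: $1 = path to the nodefile, $2 = cpus allocated (e.g. $PBS_NCPUS).
check_nodefile() {
    # Count non-empty lines; "n + 0" prints 0 for an empty file.
    lines=$(awk 'NF { n++ } END { print n + 0 }' "$1")
    if [ "$lines" -eq "$2" ]; then
        echo "ok: $lines lines for $2 cpus"
    else
        echo "mismatch: $lines lines for $2 cpus"
    fi
}
```

Inside a job it could be called as: check_nodefile "$PBS_NODEFILE" "$PBS_NCPUS"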
>>>>
>>>> If any additional info is needed to help make mpirun "just work",
>>>> please let me know.
>>>>
>>>> However, at this point I am mostly interested in any work-arounds that
>>>> will let me run something other than a singleton on this system.
>>>>
>>>> -Paul
>>>>
>>>> --
>>>> Paul H. Hargrove PHHargrove_at_[hidden]
>>>> Future Technologies Group
>>>> Computer and Data Sciences Department Tel: +1-510-495-2352
>>>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>>>> _______________________________________________
>>>> devel mailing list
>>>> devel_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
Paul H. Hargrove                          PHHargrove_at_[hidden]
Future Technologies Group
Computer and Data Sciences Department     Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900