Open MPI Development Mailing List Archives

Subject: [OMPI devel] mpirun oddity w/ PBS on an SGI UV
From: Paul Hargrove (phhargrove_at_[hidden])
Date: 2014-01-31 17:33:31


I am trying to test the trunk on an SGI UV (to validate Nathan's port of
btl:vader to SGI's variant of xpmem).

At configure time, PBS's TM support was correctly located.

My PBS batch script includes
  #PBS -l ncpus=16
because that is what this installation requires (not nodes, mppnodes, or
anything like that): the job allocates CPUs on a single large shared-memory
machine, not a set of nodes in a cluster.

However, this appears to be causing mpirun to think I have just 1 slot:

+ mpirun -np 2 ./ring_c
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 2 slots
that were requested by the application:
  ./ring_c

Either request fewer slots for your application, or make more slots available
for use.
--------------------------------------------------------------------------

In case they contain useful info, here are the PBS env vars in the job:

PBS_HT_NCPUS=32
PBS_VERSION=TORQUE-2.3.13
PBS_JOBNAME=qs
PBS_ENVIRONMENT=PBS_BATCH
PBS_HOME=/var/spool/torque
PBS_O_WORKDIR=/usr/users/6/hargrove/SCRATCH/OMPI/openmpi-trunk-linux-x86_64-uv-trunk/BLD/examples
PBS_PPN=16
PBS_TASKNUM=1
PBS_O_HOME=/usr/users/6/hargrove
PBS_MOMPORT=15003
PBS_O_QUEUE=debug
PBS_O_LOGNAME=hargrove
PBS_O_LANG=en_US.UTF-8
PBS_JOBCOOKIE=9EEF5DF75FA705A241FEF66EDFE01C5B
PBS_NODENUM=0
PBS_O_SHELL=/usr/psc/shells/bash
PBS_SERVER=tg-login1.blacklight.psc.teragrid.org
PBS_JOBID=314827.tg-login1.blacklight.psc.teragrid.org
PBS_NCPUS=16
PBS_O_HOST=tg-login1.blacklight.psc.teragrid.org
PBS_VNODENUM=0
PBS_QUEUE=debug_r1
PBS_O_MAIL=/var/mail/hargrove
PBS_NODEFILE=/var/spool/torque/aux//314827.tg-login1.blacklight.psc.teragrid.org
PBS_O_PATH=[...removed...]
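
One check that might narrow this down is to compare PBS_NCPUS with what the
node file actually lists. The sketch below assumes (not confirmed here) that
mpirun takes its slot count from the number of times each host appears in
$PBS_NODEFILE; count_slots is a hypothetical helper name, not part of PBS or
Open MPI.

```shell
#!/bin/sh
# Diagnostic sketch: compare the slots implied by the node file with PBS_NCPUS.
# Assumption: each line of the node file corresponds to one slot on that host.

# Print "<host> <count>" for every distinct host in the given node file.
count_slots() {
    sort "$1" | uniq -c | awk '{ print $2, $1 }'
}

# Report only when running inside a PBS job with a readable node file.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "PBS_NCPUS=${PBS_NCPUS:-unset}"
    count_slots "$PBS_NODEFILE"
fi
```

If the node file names the host only once while PBS_NCPUS is 16, that would
be consistent with mpirun seeing a single slot.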

If any additional info is needed to help make mpirun "just work", please
let me know.

However, at this point I am mostly interested in any workarounds that will
let me run something other than a singleton on this system.

-Paul

-- 
Paul H. Hargrove                          PHHargrove_at_[hidden]
Future Technologies Group
Computer and Data Sciences Department     Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900