Table of contents:
- How do I run jobs under Torque / PBS Pro?
- Does Open MPI support Open PBS?
- How does Open MPI get the list of hosts from Torque / PBS Pro?
- What happens if $PBS_NODEFILE is modified?
- Can I specify a hostfile or use the --host option to mpirun
when running in a Torque / PBS environment?
- How do I determine if Open MPI is configured for Torque/PBS Pro?
|1. How do I run jobs under Torque / PBS Pro?|
The short answer is just to use mpirun as normal.
When properly configured, Open MPI obtains both the list of hosts and how many
processes to start on each host from Torque / PBS Pro directly.
Hence, it is unnecessary to specify the --hostfile, --host, or
-np options to mpirun. Open MPI will use PBS/Torque-native
mechanisms to launch and kill processes (rsh and/or ssh are not
required). For example:
# Allocate a PBS job with 4 nodes
shell$ qsub -I -lnodes=4
# Now run an Open MPI job on all the nodes allocated by PBS/Torque
# (starting with Open MPI v1.2; you need to specify -np for the 1.0
# and 1.1 series).
shell$ mpirun my_mpi_application
This will run the 4 MPI processes on the nodes that were allocated by
PBS/Torque. Or, if submitting a script:
shell$ cat my_script.sh
shell$ qsub -l nodes=4 my_script.sh
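The contents of my_script.sh are not shown above. As a minimal sketch, assuming the job should start in the directory it was submitted from, the script could be created as follows. The cd line and the use of PBS_O_WORKDIR are assumptions about a typical site setup, not part of the original example; my_mpi_application is the same placeholder used earlier.

```shell
# Create a minimal batch script (illustrative contents, see assumptions above).
cat > my_script.sh <<'EOF'
#!/bin/sh
# Assumption: run from the directory where qsub was invoked.
cd "$PBS_O_WORKDIR"
# No --hostfile, --host, or -np needed; Open MPI asks Torque / PBS Pro.
mpirun my_mpi_application
EOF
```

The script is then submitted with the qsub command shown above.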
|2. Does Open MPI support Open PBS?|
As of this writing, Open PBS is so ancient that we are not
aware of any sites running it. As such, we have never tested Open MPI
with Open PBS and therefore do not know if it would work or not.
|3. How does Open MPI get the list of hosts from Torque / PBS Pro?|
Open MPI has changed how it obtains hosts from Torque / PBS
Pro over time:
- v1.0 and v1.1 series: The list of hosts allocated to a Torque /
PBS Pro job is obtained directly from the scheduler using the internal
TM API.
- v1.2 series: Due to scalability limitations in how the TM API
was used in the v1.0 and v1.1 series, Open MPI was modified to read
the $PBS_NODEFILE to obtain hostnames. Specifically, reading the
$PBS_NODEFILE is much faster at scale than how the v1.0 and v1.1
series used the TM API.
It is possible that future versions of Open MPI may switch back to
using the TM API in a more scalable fashion, but there isn't currently
a huge demand for it (reading the $PBS_NODEFILE works just fine).
Note that the TM API is used to launch processes in all versions of
Open MPI; the only thing that has changed over time is how Open MPI
obtains the list of hosts.
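To see what Open MPI reads in the v1.2 series, you can inspect $PBS_NODEFILE from inside a job. The sketch below assumes the common layout in which the file lists one hostname per line, repeated once for each slot allocated on that host; the helper name count_slots is hypothetical.

```shell
# count_slots FILE: print "<count> <hostname>" for each distinct host.
# Assumes one hostname per line, repeated once per allocated slot.
count_slots() {
    sort "$1" | uniq -c | awk '{ print $1, $2 }'
}

# Inside a Torque / PBS Pro job, PBS_NODEFILE is set by the scheduler.
if [ -n "${PBS_NODEFILE:-}" ]; then
    count_slots "$PBS_NODEFILE"
fi
```

This only reads the file; it does not change it.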
|4. What happens if $PBS_NODEFILE is modified?|
Bad Things will happen.
We've had reports from some sites that system administrators modify
the $PBS_NODEFILE in each job according to local policies. This will
currently cause Open MPI to behave in an unpredictable fashion. As
long as no new hosts are added to the hostfile, it usually means
that Open MPI will incorrectly map processes to hosts, but in some
cases it can cause Open MPI to fail to launch processes altogether.
The best course of action is to not modify the $PBS_NODEFILE.
|5. Can I specify a hostfile or use the --host option to mpirun
when running in a Torque / PBS environment?|
Prior to v1.3, no.
Open MPI versions before v1.3 will fail to launch processes properly when a
hostfile is specified on the mpirun command line, or if the --host
option is used.
As of v1.3, Open MPI can use the --hostfile and --host options in
conjunction with TM jobs.
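One possible use of --host with v1.3 or later is to run on a subset of the allocated nodes. The following is a minimal sketch, assuming $PBS_NODEFILE lists one hostname per line; first_hosts is a hypothetical helper, and my_mpi_application is the placeholder from the earlier example.

```shell
# first_hosts N FILE: print the first N distinct hostnames in FILE as a
# comma-separated list, in order of first appearance.
first_hosts() {
    awk -v n="$1" '
        !seen[$0]++ && c < n { printf "%s%s", (c++ ? "," : ""), $0 }
        END { print "" }
    ' "$2"
}

# Inside a job, run only on the first two allocated hosts.
if [ -n "${PBS_NODEFILE:-}" ]; then
    mpirun --host "$(first_hosts 2 "$PBS_NODEFILE")" my_mpi_application
fi
```

The helper reads $PBS_NODEFILE without changing it, so the file the scheduler wrote stays intact.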
|6. How do I determine if Open MPI is configured for Torque/PBS Pro?|
If you are configuring and installing Open MPI yourself, and you want
to ensure that you are building the components of Open MPI required for
Torque/PBS Pro support, include the --with-tm option on the configure
command line. Run ./configure --help for further information about this
option.
The ompi_info command can be used to determine whether or not an
installed Open MPI includes Torque/PBS Pro support:
shell$ ompi_info | grep ras
If the Open MPI installation includes support for Torque/PBS Pro, you
should see a line similar to that below. Note the MCA version information
varies depending on which version of Open MPI is installed.
MCA ras: tm (MCA v2.1.0, API v2.0.0, Component v3.0.0)
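The same check can be scripted. The sketch below looks for a tm entry in the ras framework lines of the ompi_info output; has_tm_support is a hypothetical helper name, and the pattern assumes the output format shown above.

```shell
# has_tm_support: read ompi_info output on stdin; succeed if a "tm"
# component is listed for the ras framework.
has_tm_support() {
    grep -Eq 'MCA ras: tm '
}

# Only run the check if ompi_info is on the PATH.
if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | has_tm_support; then
        echo "Torque/PBS Pro (tm) support found"
    else
        echo "No tm support found; consider rebuilding with --with-tm"
    fi
fi
```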