FAQ:
Running jobs under Torque / PBS Pro

Table of contents:

  1. How do I run jobs under Torque / PBS Pro?
  2. Does Open MPI support Open PBS?
  3. How does Open MPI get the list of hosts from Torque / PBS Pro?
  4. What happens if $PBS_NODEFILE is modified?
  5. Can I specify a hostfile or use the --host option to mpirun when running in a Torque / PBS environment?
  6. How do I determine if Open MPI is configured for Torque/PBS Pro?


1. How do I run jobs under Torque / PBS Pro?

The short answer is just to use mpirun as normal.

When properly configured, Open MPI obtains both the list of hosts and the number of processes to start on each host directly from Torque / PBS Pro. Hence, it is unnecessary to specify the --hostfile, --host, or -np options to mpirun. Open MPI uses PBS/Torque-native mechanisms to launch and kill processes, so neither rsh nor ssh is required.

For example:

# Allocate a PBS job with 4 nodes
shell$ qsub -I -lnodes=4
 
# Now run an Open MPI job on all the nodes allocated by PBS/Torque
# (starting with Open MPI v1.2; you need to specify -np for the 1.0
# and 1.1 series).
shell$ mpirun my_mpi_application

This runs 4 MPI processes on the nodes that were allocated by PBS/Torque. Or, if submitting a script:

shell$ cat my_script.sh
#!/bin/sh
mpirun my_mpi_application
shell$ qsub -l nodes=4 my_script.sh


2. Does Open MPI support Open PBS?

As of this writing, Open PBS is so ancient that we are not aware of any sites running it. As such, we have never tested Open MPI with Open PBS and therefore do not know if it would work or not.


3. How does Open MPI get the list of hosts from Torque / PBS Pro?

Open MPI has changed how it obtains hosts from Torque / PBS Pro over time:

  • v1.0 and v1.1 series: The list of hosts allocated to a Torque / PBS Pro job is obtained directly from the scheduler using the internal TM API.
  • v1.2 series: Due to scalability limitations in how the TM API was used in the v1.0 and v1.1 series, Open MPI was modified to read the $PBS_NODEFILE to obtain hostnames. Specifically, reading the $PBS_NODEFILE is much faster at scale than how the v1.0 and v1.1 series used the TM API.

It is possible that future versions of Open MPI may switch back to using the TM API in a more scalable fashion, but there isn't currently a huge demand for it (reading the $PBS_NODEFILE works just fine).

Note that the TM API is used to launch processes in all versions of Open MPI; the only thing that has changed over time is how Open MPI obtains hostnames.
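
To see what the v1.2 series and later will read, you can summarize the node file from inside a job. The sketch below assumes the common Torque layout of one line per allocated slot, so a host listed twice contributes two slots. The name summarize_nodefile is a hypothetical helper, not part of Open MPI or Torque.

```shell
# Print "hostname slot_count" for each host in a PBS node file.
# Assumption: the file has one line per allocated slot.
# Usage: summarize_nodefile [path]   (defaults to $PBS_NODEFILE)
summarize_nodefile() {
    nodefile="${1:-$PBS_NODEFILE}"
    # Group identical hostnames, count them, then print "host count".
    sort "$nodefile" | uniq -c | awk '{ print $2, $1 }'
}
```

Inside an interactive or batch job, running summarize_nodefile with no arguments reports the hosts and slot counts listed in $PBS_NODEFILE.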


4. What happens if $PBS_NODEFILE is modified?

Bad Things will happen.

We've had reports from some sites that system administrators modify the $PBS_NODEFILE in each job according to local policies. This will currently cause Open MPI to behave in an unpredictable fashion. As long as no new hosts are added to the hostfile, it usually means that Open MPI will incorrectly map processes to hosts, but in some cases it can cause Open MPI to fail to launch processes altogether.

The best course of action is to not modify the $PBS_NODEFILE.


5. Can I specify a hostfile or use the --host option to mpirun when running in a Torque / PBS environment?

Prior to v1.3, no.

Open MPI versions prior to v1.3 will fail to launch processes properly when a hostfile is specified on the mpirun command line, or if the mpirun --host option is used.

As of v1.3, Open MPI can use the --hostfile and --host options in conjunction with TM jobs.
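
As a rough illustration of combining --host with a TM job on v1.3 or later, the sketch below builds a comma-separated host list from the first few distinct hosts in the node file. The helper name first_hosts is hypothetical, and the sketch assumes $PBS_NODEFILE lists one hostname per line.

```shell
# Print the first N distinct hosts from a PBS node file as "h1,h2,...".
# Usage: first_hosts N [path]   (path defaults to $PBS_NODEFILE)
first_hosts() {
    count="$1"
    nodefile="${2:-$PBS_NODEFILE}"
    # Drop repeated hostnames while preserving order, keep the first N,
    # then join the remaining lines with commas.
    awk '!seen[$0]++' "$nodefile" | head -n "$count" | paste -sd, -
}
```

Inside a job, this could be used as: mpirun --host "$(first_hosts 2)" my_mpi_application. Because the hosts come from the node file itself, they are within the job's allocation.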


6. How do I determine if Open MPI is configured for Torque/PBS Pro?

If you are configuring and installing Open MPI yourself, and you want to ensure that you are building the components of Open MPI required for Torque/PBS Pro support, include the --with-tm option on the configure command line. Run ./configure --help for further information about this configure option.

The ompi_info command can be used to determine whether or not an installed Open MPI includes Torque/PBS Pro support:

shell$ ompi_info | grep ras

If the Open MPI installation includes support for Torque/PBS Pro, you should see a line similar to the one below. Note that the MCA version information varies depending on which version of Open MPI is installed.

      MCA ras: tm (MCA v2.1.0, API v2.0.0, Component v3.0.0)
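
For use in scripts, the same check can be wrapped in a small function that reads ompi_info output on standard input. This is a minimal sketch: has_tm_ras is a hypothetical name, and the pattern assumes the "MCA ras: tm" line format shown above.

```shell
# Succeed (exit status 0) if ompi_info-style output on stdin lists
# the tm component of the ras framework; fail otherwise.
has_tm_ras() {
    grep -Eq 'MCA[[:space:]]+ras:[[:space:]]+tm([[:space:]]|$)'
}
```

A typical invocation would be: ompi_info | has_tm_ras && echo "Torque/PBS Pro support found".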