This FAQ is for Open MPI v4.x and earlier.
If you are looking for documentation for Open MPI v5.x and later, please visit docs.open-mpi.org.
Table of contents:
- What prerequisites are necessary for running an Open MPI job?
- What ABI guarantees does Open MPI provide?
- Do I need a common filesystem on all my nodes?
- How do I add Open MPI to my PATH and LD_LIBRARY_PATH?
- What if I can't modify my PATH and/or LD_LIBRARY_PATH?
- How do I launch Open MPI parallel jobs?
- How do I run a simple SPMD MPI job?
- How do I run an MPMD MPI job?
- How do I specify the hosts on which my MPI job runs?
- I can run ompi_info and launch MPI jobs on a single host, but not across multiple hosts. Why?
- How can I diagnose problems when running across multiple hosts?
- When I build Open MPI with the Intel compilers, I get warnings about "orted" or my MPI application not finding libimf.so. What do I do?
- When I build Open MPI with the PGI compilers, I get warnings about "orted" or my MPI application not finding libpgc.so. What do I do?
- When I build Open MPI with the PathScale compilers, I get warnings about "orted" or my MPI application not finding libmv.so. What do I do?
- Can I run non-MPI programs with mpirun / mpiexec?
- Can I run GUI applications with Open MPI?
- Can I run ncurses-based / curses-based / applications with funky input schemes with Open MPI?
- What other options are available to mpirun?
- How do I use the --hostfile option to mpirun?
- How do I use the --host option to mpirun?
- How do I control how my processes are scheduled across nodes?
- I'm not using a hostfile. How are slots calculated?
- Can I run multiple parallel processes on a uniprocessor machine?
- Can I oversubscribe nodes (run more processes than processors)?
- Can I force Aggressive or Degraded performance modes?
- How do I run with the TotalView parallel debugger?
- How do I run with the DDT parallel debugger?
- What launchers are available?
- How do I specify to the rsh launcher to use rsh or ssh?
- How do I run with the Slurm and PBS/Torque launchers?
- Can I suspend and resume my MPI job?
- How do I run with LoadLeveler?
- How do I load libmpi at runtime?
- What MPI environmental variables exist?
- How do I get my MPI job to wire up its MPI connections right away?
- What kind of CUDA support exists in Open MPI?
- What are the Libfabric (OFI) components in Open MPI?
- How can Open MPI communicate with Intel Omni-Path Architecture (OPA) based devices?
1. What prerequisites are necessary for running an Open MPI job?
In general, Open MPI requires that its executables are in your
PATH on every node that you will run on and, if Open MPI was compiled
as dynamic libraries (which is the default), that the directory where its
libraries are located is in your LD_LIBRARY_PATH on every node.
Specifically, if Open MPI was installed with a prefix of /opt/openmpi,
then the following should be in your PATH and LD_LIBRARY_PATH:

```
PATH:            /opt/openmpi/bin
LD_LIBRARY_PATH: /opt/openmpi/lib
```

Depending on your environment, you may need to set these values in
your shell startup files (e.g., .profile, .cshrc, etc.).
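For a Bourne-style shell (sh or bash), a minimal sketch of the lines to put in a startup file might look like the following. It assumes the /opt/openmpi prefix from the example above, and OMPI_PREFIX is only an illustrative variable name. The `${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}` expansion adds a ":" separator only when LD_LIBRARY_PATH already has a value. This avoids leaving an empty entry, which the dynamic linker would treat as the current directory.

```shell
# Sketch for ~/.profile or ~/.bashrc; /opt/openmpi is an assumed prefix.
OMPI_PREFIX=/opt/openmpi
PATH="$OMPI_PREFIX/bin:$PATH"
# Keep the old value (after a ':' separator) only if it was non-empty.
LD_LIBRARY_PATH="$OMPI_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export PATH LD_LIBRARY_PATH
```

csh and tcsh users would make the equivalent settings with setenv in .cshrc (or .tcshrc) instead.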
NOTE: There are exceptions to this rule, notably the --prefix option to mpirun.
See this FAQ entry for more
details on how to add Open MPI to your PATH and LD_LIBRARY_PATH.
Additionally, Open MPI requires that jobs can be started on remote
nodes without any input from the keyboard. For example, if using
rsh or ssh as the remote agent, you must have your environment
set up to allow execution on remote nodes without entering a password
or passphrase.
2. What ABI guarantees does Open MPI provide?
Open MPI's versioning and ABI scheme is described
here; it is summarized in this FAQ entry for convenience.
Open MPI provided forward application binary interface (ABI)
compatibility for MPI applications starting with v1.3.2. Prior to
that version, no ABI guarantees were provided.
NOTE: Prior to v1.3.2, subtle
and strange failures are almost guaranteed to occur if applications
were compiled and linked against shared libraries from one version of
Open MPI and then run with another. The Open MPI team strongly
discourages making any ABI assumptions before v1.3.2.
NOTE: ABI for the "use mpi"
Fortran interface was inadvertently broken in the v1.6.3 release, and
was restored in the v1.6.4 release. Any Fortran applications that
utilize the "use mpi" MPI interface that were compiled and linked
against the v1.6.3 release will not be link-time compatible with other
releases in the 1.5.x / 1.6.x series. Such applications remain source
compatible, however, and can be recompiled/re-linked with other Open
MPI releases.
Starting with v1.3.2, Open MPI provides forward ABI compatibility —
with respect to the MPI API only — in all versions of a given feature
release series and its corresponding super stable
series. For example, on a single platform, an MPI application
linked against Open MPI v1.3.2 shared libraries can be pointed at the
shared libraries of any later v1.3.x or v1.4.x release (e.g., via the
LD_LIBRARY_PATH environment variable or another operating system
mechanism) and still work properly.
For the v1.5 series, this means that all releases of v1.5.x and v1.6.x
will be ABI compatible, per the above definition.
Open MPI reserves the right to break ABI compatibility at new feature
release series. For example, the same MPI application from above
(linked against Open MPI v1.3.2 shared libraries) will not work with
Open MPI v1.5 shared libraries. Similarly, MPI applications
compiled/linked against Open MPI 1.6.x will not be ABI compatible with
Open MPI 1.7.x.
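The rules above can be encoded as a small check. This is a sketch that covers only the v1.3.2 through v1.6.x releases discussed in this entry. It does not model the v1.6.3 "use mpi" Fortran exception, and it only accepts purely numeric version strings. All function names are hypothetical helpers, not part of Open MPI.

```shell
# Exit status of ompi_abi_forward_compatible BUILT RUN:
#   0 = forward compatible, 1 = no guarantee, 2 = not covered by this entry.

# True if dotted numeric version $1 >= $2 (missing fields count as 0).
_ver_ge() {
  _va=$1 _vb=$2
  while [ -n "$_va$_vb" ]; do
    _x=${_va%%.*} _y=${_vb%%.*}
    [ "${_x:-0}" -gt "${_y:-0}" ] && return 0
    [ "${_x:-0}" -lt "${_y:-0}" ] && return 1
    case $_va in *.*) _va=${_va#*.} ;; *) _va= ;; esac
    case $_vb in *.*) _vb=${_vb#*.} ;; *) _vb= ;; esac
  done
  return 0
}

# Map a version to its ABI series: a feature series plus its
# super stable series (1.3 + 1.4, 1.5 + 1.6).
_abi_series() {
  case $1 in
    1.3|1.3.*|1.4|1.4.*) echo 1.3 ;;
    1.5|1.5.*|1.6|1.6.*) echo 1.5 ;;
    *) echo unknown ;;
  esac
}

ompi_abi_forward_compatible() {
  _built=$1 _run=$2
  # No ABI guarantees were provided before v1.3.2.
  if ! _ver_ge "$_built" 1.3.2; then return 1; fi
  _series=$(_abi_series "$_built")
  if [ "$_series" = unknown ]; then return 2; fi
  # ABI may change at each new feature series (e.g., 1.6.x -> 1.7.x).
  if [ "$(_abi_series "$_run")" != "$_series" ]; then return 1; fi
  # Compatibility is forward only: the run-time library must not be older.
  _ver_ge "$_run" "$_built"
}
```

For example, `ompi_abi_forward_compatible 1.3.2 1.4.5` succeeds, while `ompi_abi_forward_compatible 1.6.5 1.7.0` returns 1.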
3. Do I need a common filesystem on all my nodes?
No, but it certainly makes life easier if you do.
A common environment to run Open MPI is in a "Beowulf"-class or
similar cluster (e.g., a bunch of 1U servers in a bunch of racks).
Simply stated, Open MPI can run on a group of servers or workstations
connected by a network. As mentioned above, there are several
prerequisites, however (for example, you typically must have an
account on all the machines, you can rsh or ssh between the nodes
without using a password, etc.).
Regardless of whether Open MPI is installed on a shared / networked
filesystem or independently on each node, it is usually easiest if
Open MPI is available in the same filesystem location on every node.
For example, if you install Open MPI to /opt/openmpi-5.0.5 on
one node, ensure that it is available in /opt/openmpi-5.0.5
on all nodes.
This FAQ entry
has a bunch more information about installation locations for Open
MPI.
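One way to confirm that an installation is visible at the same location on every node is to test for its mpirun over ssh. This is a sketch that assumes password-less ssh, which is one of the prerequisites described above. check_same_prefix is a hypothetical helper, and the prefix and host names are placeholders.

```shell
# Hypothetical helper: check that <prefix>/bin/mpirun exists and is
# executable on every listed host. Assumes password-less ssh.
check_same_prefix() {
  _prefix=$1
  shift
  _status=0
  for _host in "$@"; do
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    if ssh -o BatchMode=yes "$_host" test -x "$_prefix/bin/mpirun"; then
      echo "$_host: ok"
    else
      echo "$_host: $_prefix/bin/mpirun missing (or host unreachable)"
      _status=1
    fi
  done
  return "$_status"
}
```

Usage: `check_same_prefix /opt/openmpi-5.0.5 node1 node2` prints one line per host and returns non-zero if any host fails the check.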
4. How do I add Open MPI to my PATH and LD_LIBRARY_PATH?
Open MPI must be able to find its executables in your PATH
on every node (if Open MPI was compiled as dynamic libraries, then its
library path must appear in LD_LIBRARY_PATH as well). As such, your
configuration/initialization files need to add Open MPI to your PATH
/ LD_LIBRARY_PATH properly.
How to do this may be highly dependent upon your local configuration,
so you may need to consult with your local system administrator. Some
system administrators take care of these details for you, some don't.
YMMV. Some common examples are included below, however.
You must have at least a minimum understanding of how your shell works
to get Open MPI in your PATH / LD_LIBRARY_PATH properly. Note
that Open MPI must be added to your PATH and LD_LIBRARY_PATH in
two situations: (1) when you log in to an interactive shell,
and (2) when you log in to non-interactive shells on remote nodes.
- If (1) is not configured properly, executables like mpicc will
not be found, and it is typically obvious what is wrong. The Open MPI
executable directory can manually be added to the PATH, or the
user's startup files can be modified such that the Open MPI
executables are added to the PATH every login. This latter approach
is preferred.
All shells have some kind of script file that is executed at login
time to set things like PATH and LD_LIBRARY_PATH and perform other
environmental setup tasks. This startup file is the one that needs to
be edited to add Open MPI to the PATH and LD_LIBRARY_PATH. Consult
the manual page for your shell for specific details (some shells are
picky about the permissions of the startup file, for example). The
table below lists some common shells and the startup files that they
read/execute upon login:
| Shell | Interactive login startup file |
|---|---|
| sh (Bourne shell, or bash named "sh") | .profile |
| csh | .cshrc followed by .login |
| tcsh | .tcshrc if it exists, .cshrc if it does not, followed by .login |
| bash | .bash_profile if it exists, or .bash_login if it exists, or .profile if it exists (in that order). Note that some Linux distributions automatically come with .bash_profile scripts for users that automatically execute .bashrc as well. Consult the bash man page for more information. |
- If (2) is not configured properly, executables like mpirun will
not function properly, and it can be somewhat confusing to figure out
(particularly for bash users).
The startup files in question here are the ones that are
automatically executed for a non-interactive login on a remote node
(e.g., "rsh othernode ps"). Note that not all shells support
this, and that some shells use different files for this than those listed in
(1). Some shells also read their non-interactive startup file for
interactive logins; that is, fulfilling (2) may automatically fulfill (1).
The following table lists some
common shells and the startup file that is automatically executed,
either by Open MPI or by the shell itself:
| Shell | Non-interactive login startup file |
|---|---|
| sh (Bourne or bash named "sh") | This shell does not execute any file automatically, so Open MPI will execute the .profile script before invoking Open MPI executables on remote nodes |
| csh | .cshrc |
| tcsh | .tcshrc if it exists, or .cshrc if it does not |
| bash | .bashrc if it exists |
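For bash, one common pitfall is that many default .bashrc files return early for non-interactive shells. Debian-style files, for example, do this with `case $- in *i*) ;; *) return;; esac`. Open MPI settings placed below such a line are never reached by remote, non-interactive logins. The following minimal sketch assumes the /opt/openmpi prefix. prepend_path is a hypothetical helper that skips directories already present, so re-sourcing the file does not grow the variables.

```shell
# ~/.bashrc fragment (sketch). Place it ABOVE any early return for
# non-interactive shells, such as:  [ -z "$PS1" ] && return
# /opt/openmpi is an assumed prefix; prepend_path is a hypothetical helper.
prepend_path() {  # usage: prepend_path VARIABLE DIRECTORY
  eval "_cur=\${$1:-}"
  case ":$_cur:" in
    *":$2:"*) ;;  # already present: do nothing, so re-sourcing is harmless
    *) eval "$1=\"\$2\${_cur:+:\$_cur}\"" ;;
  esac
  export "$1"
}
prepend_path PATH /opt/openmpi/bin
prepend_path LD_LIBRARY_PATH /opt/openmpi/lib
```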
5. What if I can't modify my PATH and/or LD_LIBRARY_PATH?
There are some situations where you cannot modify the PATH or
LD_LIBRARY_PATH — e.g., some ISV applications prefer to hide all
parallelism from the user, and therefore do not want to make the user
modify their shell startup files. Another case is where you want a
single user to be able to launch multiple MPI jobs simultaneously,
each with a different MPI implementation. Hence, setting shell
startup files to point to one MPI implementation would be problematic.
In such cases, you have two options:
- Use mpirun's --prefix command line option (described
below).
- Modify the wrapper compilers so that the executables they build
include run-time search locations for the Open MPI libraries (see this FAQ entry).
mpirun's --prefix command line option takes as an argument the
top-level directory where Open MPI was installed. While relative
directory names are possible, they can become ambiguous depending on
the job launcher used; using absolute directory names is strongly
recommended.
For example, say that Open MPI was installed into
/opt/openmpi-5.0.5. You would use the --prefix option like
this:

```
shell$ mpirun --prefix /opt/openmpi-5.0.5 -np 4 a.out
```
This will prefix the PATH and LD_LIBRARY_PATH on both the local
and remote hosts with /opt/openmpi-5.0.5/bin and
/opt/openmpi-5.0.5/lib, respectively. This is usually
unnecessary when using resource managers to launch jobs (e.g., Slurm,
Torque, etc.) because they tend to copy the entire local environment,
including the PATH and LD_LIBRARY_PATH, to remote nodes
before execution. As such, if PATH and LD_LIBRARY_PATH are set
properly on the local node, the resource manager will automatically
propagate those values out to remote nodes. The --prefix option is
therefore usually most useful in rsh- or ssh-based environments (or
similar).
Beginning with the 1.2 series, it is possible to make this the default
behavior by passing the flag --enable-mpirun-prefix-by-default to
configure. This will make mpirun behave
exactly the same as "mpirun --prefix $prefix ...", where $prefix is
the value given to --prefix in configure.
Finally, note that specifying the absolute pathname to mpirun is
equivalent to using the --prefix argument. For example, the
following is equivalent to the above command line that uses --prefix:

```
shell$ /opt/openmpi-5.0.5/bin/mpirun -np 4 a.out
```
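Because the installation uses a `<prefix>/bin/mpirun` layout, the value for --prefix can be derived from an absolute path to mpirun. This is a minimal sketch. ompi_prefix_of is a hypothetical helper, and it rejects relative paths for the reason given above. It does not resolve symbolic links, so it assumes that the mpirun found in your PATH lives in the installation tree itself.

```shell
# Hypothetical helper: derive the --prefix value from an absolute path to
# mpirun, using the <prefix>/bin/mpirun layout described above.
ompi_prefix_of() {
  case $1 in
    /*) dirname "$(dirname "$1")" ;;
    *) echo "ompi_prefix_of: expected an absolute path, got '$1'" >&2
       return 1 ;;
  esac
}
```

Usage: `mpirun --prefix "$(ompi_prefix_of "$(command -v mpirun)")" -np 4 a.out`.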
6. How do I launch Open MPI parallel jobs?
Similar to many MPI implementations, Open MPI provides the
commands mpirun and mpiexec to launch MPI jobs. Several of the
questions in this FAQ category deal with using these commands.
Note that these two commands are exactly identical.
Specifically, they are symbolic links to a common back-end launcher
command named orterun (Open MPI's run-time environment interaction
layer is named the Open Run-Time Environment, or ORTE, hence
orterun).
As such, the rest of this FAQ usually refers only to mpirun, even
though the same discussions also apply to mpiexec and orterun
(because they are all, in fact, the same command).
7. How do I run a simple SPMD MPI job?
Open MPI provides both mpirun and mpiexec commands. A simple way
to start a single program, multiple data (SPMD) application in
parallel is:
```
shell$ mpirun -np 4 my_parallel_application
```

This starts a four-process parallel application, running four copies
of the executable named my_parallel_application.
The rsh launcher accepts the --hostfile (also known as
--machinefile) option to indicate which hosts to start the processes
on:

```
shell$ cat my_hostfile
host01.example.com
host02.example.com
shell$ mpirun --hostfile my_hostfile -np 2 my_parallel_application
```

This command will launch one copy of my_parallel_application on each
of host01.example.com and host02.example.com.
More information about the --hostfile option, and hostfiles in
general, is available in this FAQ
entry.
Note, however, that not all environments require a hostfile. For
example, Open MPI will automatically detect when it is running in
batch / scheduled environments (such as SGE, PBS/Torque, Slurm, and
LoadLeveler), and will use host information provided by those systems.
Also note that if using a launcher that requires a hostfile and no
hostfile is specified, all processes are launched on the local host.
8. How do I run an MPMD MPI job?
Both the mpirun and mpiexec commands support multiple
program, multiple data (MPMD) style launches, either from the command
line or from a file. For example:
```
shell$ mpirun -np 2 a.out : -np 2 b.out
```

This will launch a single parallel application, but the first two
processes will be instances of the a.out executable, and the second
two processes will be instances of the b.out executable. In MPI
terms, this will be a single MPI_COMM_WORLD, but the a.out
processes will be ranks 0 and 1 in MPI_COMM_WORLD, while the b.out
processes will be ranks 2 and 3 in MPI_COMM_WORLD.
mpirun (and mpiexec) can also accept a parallel application
specified in a file instead of on the command line. For example:

```
shell$ mpirun --app my_appfile
```

where the file my_appfile contains the following:

```
# Comments are supported; comments begin with #
# Application context files specify each sub-application in the
# parallel job, one per line. The first sub-application is the 2
# a.out processes:
-np 2 a.out
# The second sub-application is the 2 b.out processes:
-np 2 b.out
```
This will result in the same behavior as running a.out and b.out
from the command line.
Note that mpirun and mpiexec are identical in command-line options
and behavior; using the above command lines with mpiexec instead of
mpirun will result in the same behavior.
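The rank assignment described above is sequential: each sub-application takes the next block of MPI_COMM_WORLD ranks, in the order the sub-applications appear. The following sketch computes that mapping. mpmd_ranks is a hypothetical helper that takes "NP:EXECUTABLE" pairs in command-line (or appfile) order.

```shell
# Hypothetical helper: show which MPI_COMM_WORLD ranks each
# sub-application receives, given "NP:EXECUTABLE" pairs in the same
# order as on the mpirun command line (or in the appfile).
mpmd_ranks() {
  _next=0
  for _spec in "$@"; do
    _np=${_spec%%:*}
    _exe=${_spec#*:}
    _last=$((_next + _np - 1))
    echo "$_exe: ranks $_next-$_last"
    _next=$((_last + 1))
  done
}
```

For the example above, `mpmd_ranks 2:a.out 2:b.out` prints "a.out: ranks 0-1" followed by "b.out: ranks 2-3".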
9. How do I specify the hosts on which my MPI job runs?
There are three general mechanisms:
- The --hostfile option to mpirun. Use this option to specify
a list of hosts on which to run. Note that for compatibility with
other MPI implementations, --machinefile is a synonym for
--hostfile. See this FAQ entry for more information about the --hostfile option.
- The --host option to mpirun can be used to specify a list of
hosts on which to run on the command line. See this FAQ entry for more information
about the --host option.
- If you are running in a scheduled environment (e.g., in a Slurm,
Torque, or LSF job), Open MPI will automatically get the lists of
hosts from the scheduler.
NOTE: The specification
of hosts using any of the above methods has nothing to do with the
network interfaces that are used for MPI traffic. The list of hosts
is only used to specify on which hosts to launch
MPI processes.
10. I can run ompi_info and launch MPI jobs on a single host, but not across multiple hosts. Why?
(You should probably also see this FAQ entry.)
If you can run ompi_info and possibly even launch MPI
processes locally, but fail to launch MPI processes on remote hosts,
it is likely that you do not have your PATH and/or LD_LIBRARY_PATH
set up properly on the remote nodes.
Specifically, the Open MPI commands usually run properly even if
LD_LIBRARY_PATH is not set properly because they encode the
Open MPI library location in their executables and search there by
default. Hence, running ompi_info (and friends) usually works, even
in some improperly set up environments.
However, Open MPI's wrapper compilers do not encode the Open MPI
library locations in MPI executables by default (the wrappers only
specify a bare minimum of flags necessary to create MPI executables;
we consider any flags beyond this bare minimum set a local policy
decision). Hence, attempting to launch MPI executables in
environments where LD_LIBRARY_PATH is either not set or was set
improperly may result in messages about libmpi.so not being found.
You can
change Open MPI's wrapper compiler behavior to specify the run-time
location of Open MPI's libraries, if you wish.
Depending on how Open MPI was configured
and/or invoked, it may even be possible to run MPI applications in
environments where PATH and/or LD_LIBRARY_PATH is not set, or is
set improperly. This can be desirable for environments where multiple
MPI implementations are installed, such as multiple versions of Open
MPI.
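To see whether an executable's shared libraries (such as libmpi.so) resolve in the current environment, you can inspect the output of ldd. This is a minimal sketch that assumes a Linux system with glibc-style ldd output, where unresolved libraries appear as "NAME => not found". missing_libs is a hypothetical helper.

```shell
# Hypothetical helper: print the shared libraries that the dynamic linker
# cannot find for executable $1 in the current environment.
missing_libs() {
  ldd "$1" 2>/dev/null | awk '/=> not found/ { print $1 }'
}
```

Usage: `missing_libs ./mpi_hello` checks the local environment. Running ldd through ssh (for example, `ssh remotehost ldd $HOME/mpi_hello`) applies the same check in the non-interactive environment of a remote node.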
11. How can I diagnose problems when running across multiple hosts?
In addition to what is mentioned in this
FAQ entry, when you are able to run MPI jobs on a single host, but
fail to run them across multiple hosts, try the following:
- Ensure that your launcher is able to launch across multiple
hosts. For example, if you are using ssh, try to ssh to each
remote host and ensure that you are not prompted for a password.
For example:

```
shell$ ssh remotehost hostname
remotehost
```

If you are unable to launch across multiple hosts, check that your SSH
keys are set up properly. Or, if you are running in a managed
environment, such as in a Slurm, Torque, or other job launcher, check
that you have reserved enough hosts, are running in an allocated job,
etc.
- Ensure that your PATH and LD_LIBRARY_PATH are set correctly on
each remote host on which you are trying to run. For example, with
ssh:

```
shell$ ssh remotehost env | grep -i path
PATH=...path on the remote host...
LD_LIBRARY_PATH=...LD library path on the remote host...
```

If your PATH or LD_LIBRARY_PATH are not set properly, see this FAQ entry for the correct
values. Keep in mind that it is fine to have multiple Open MPI
installations on a machine; the first Open MPI
installation found by PATH and LD_LIBRARY_PATH is the one that
matters.
- Run a simple, non-MPI job across multiple hosts. This verifies
that the Open MPI run-time system is functioning properly across
multiple hosts. For example, try running the hostname command:

```
shell$ mpirun --host remotehost hostname
remotehost
shell$ mpirun --host remotehost,otherhost hostname
remotehost
otherhost
```

If you are unable to run non-MPI jobs across multiple hosts, check
for common problems such as:
  - Check your non-interactive shell setup on each remote host
  to ensure that it is setting up the PATH and LD_LIBRARY_PATH properly.
  - Check that Open MPI is finding and launching the correct version of Open MPI on the remote hosts.
  - Ensure that you have firewalling disabled between hosts (Open MPI
  opens random TCP and sometimes random UDP ports between hosts in a
  single MPI job).
  - Try running with the plm_base_verbose MCA parameter at level
  10, which will enable extra debugging output to see how Open MPI
  launches on remote hosts. For example: "mpirun --mca plm_base_verbose
  10 --host remotehost hostname".
- Now run a simple MPI job across multiple hosts that does not
involve MPI communications. The "hello_c" program in the
examples directory in the Open MPI distribution is a good choice. This
verifies that the MPI subsystem is able to initialize and terminate
properly. For example:

```
shell$ mpirun --host remotehost,otherhost hello_c
Hello, world, I am 0 of 2, (Open MPI v5.0.5, package: Open MPI jsquyres@builder.cisco.com Distribution, ident: 5.0.5, DATE)
Hello, world, I am 1 of 2, (Open MPI v5.0.5, package: Open MPI jsquyres@builder.cisco.com Distribution, ident: 5.0.5, DATE)
```
If you are unable to run simple, non-communication MPI jobs, this can
indicate that your Open MPI installation is unable to initialize
properly on remote hosts. Double check your non-interactive login
setup on remote hosts.
- Now run a simple MPI job across multiple hosts that does
some simple MPI communications. The "ring_c" program in the
examples directory in the Open MPI distribution is a good choice.
This verifies that the MPI subsystem is able to pass MPI traffic
across your network. For example:

```
shell$ mpirun --host remotehost,otherhost ring_c
Process 0 sending 10 to 0, tag 201 (1 processes in ring)
Process 0 sent to 0
Process 0 decremented value: 9
Process 0 decremented value: 8
Process 0 decremented value: 7
Process 0 decremented value: 6
Process 0 decremented value: 5
Process 0 decremented value: 4
Process 0 decremented value: 3
Process 0 decremented value: 2
Process 0 decremented value: 1
Process 0 decremented value: 0
Process 0 exiting
```
If you are unable to run simple MPI jobs across multiple hosts, this
may indicate a problem with the network(s) that Open MPI is trying to
use for MPI communications. Try limiting the networks that it uses,
and/or exploring the level 1 through 3 MCA parameters for the
communications module that you are using. For example, if you're
using the TCP BTL, see the output of "ompi_info --level 3 --param btl
tcp".
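The checklist above can be run as a script that stops at the first failing step. Step numbers follow the order of the bullets above. This is a sketch under several assumptions: ssh-based launching, a Bourne-style login shell on the remote hosts, and hello_c and ring_c built from the examples directory and reachable by name on every host. diagnose_hosts is a hypothetical helper, not an Open MPI tool.

```shell
# Hypothetical helper: run the multi-host checks in order and stop at the
# first failure. $1 is a comma-separated host list, as for mpirun --host.
diagnose_hosts() {
  _hosts=$1
  for _h in $(printf '%s\n' "$_hosts" | tr ',' ' '); do
    # Step 1: ssh must work without a password prompt (BatchMode=yes makes
    # ssh fail instead of asking).
    ssh -o BatchMode=yes "$_h" true ||
      { echo "step 1 failed: cannot ssh to $_h non-interactively"; return 1; }
    # Step 2: the non-interactive PATH on that host must find mpirun.
    ssh -o BatchMode=yes "$_h" command -v mpirun >/dev/null ||
      { echo "step 2 failed: mpirun not in non-interactive PATH on $_h"; return 1; }
  done
  # Step 3: a non-MPI program across all hosts.
  mpirun --host "$_hosts" hostname >/dev/null ||
    { echo "step 3 failed: cannot launch non-MPI jobs"; return 1; }
  # Step 4: MPI initialization and finalization without communication.
  mpirun --host "$_hosts" hello_c >/dev/null ||
    { echo "step 4 failed: MPI cannot initialize on all hosts"; return 1; }
  # Step 5: simple MPI traffic between hosts.
  mpirun --host "$_hosts" ring_c >/dev/null ||
    { echo "step 5 failed: MPI traffic does not flow between hosts"; return 1; }
  echo "all checks passed"
}
```

Usage: `diagnose_hosts remotehost,otherhost`.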
12. When I build Open MPI with the Intel compilers, I get warnings about "orted" or my MPI application not finding libimf.so. What do I do?
The problem is usually because the Intel libraries cannot be
found on the node where Open MPI is attempting to launch an MPI
executable. For example:
```
shell$ mpirun -np 1 --host node1.example.com mpi_hello
orted: error while loading shared libraries: libimf.so: cannot open shared object file: No such file or directory
--------------------------------------------------------------------------
A daemon (pid 11893) died unexpectedly with status 127 while
attempting to launch so we are aborting.
[...more error messages...]
```
Open MPI first attempts to launch a "helper" daemon
(orted) on node1.example.com, but this fails because one
of orted's dependent libraries cannot be found. This
particular library, libimf.so, is an Intel compiler library. As
such, it is likely that the user did not set up the Intel compiler
library in their environment properly on this node.
Double check that you have set up the Intel compiler environment on the
target node, for both interactive and non-interactive logins. A
common error is to set up the Intel compiler environment
properly for interactive logins, but not for
non-interactive logins. For example:
```
head_node$ cd $HOME
head_node$ mpicc mpi_hello.c -o mpi_hello
head_node$ ./mpi_hello
Hello world, I am 0 of 1.
head_node$ ssh node2.example.com
Welcome to node2.
node2$ ./mpi_hello
Hello world, I am 0 of 1.
node2$ exit
head_node$ ssh node2.example.com $HOME/mpi_hello
mpi_hello: error while loading shared libraries: libimf.so: cannot open shared object file: No such file or directory
```
The above example shows that running a trivial C program compiled by
the Intel compilers works fine on both the head node and node2 when
logging in interactively, but fails when run on node2
non-interactively. Check your shell script startup files and verify
that the Intel compiler environment is set up properly for
non-interactive logins.
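In practice, the fix is to load the compiler's environment script from a startup file that non-interactive logins also read, such as .bashrc for bash (see the tables in entry 4). The following is a sketch only. The script path and its "intel64" argument are assumptions based on older Intel compiler installations; use whatever environment script your installation provides.

```shell
# ~/.bashrc fragment (sketch): also set up the Intel compiler run-time
# environment for non-interactive logins. The path and the "intel64"
# argument are assumptions; adjust them for your installation.
# Place this above any early "return" for non-interactive shells.
intel_env_script=/opt/intel/bin/compilervars.sh
if [ -r "$intel_env_script" ]; then
  . "$intel_env_script" intel64
fi
```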
13. When I build Open MPI with the PGI compilers, I get warnings about "orted" or my MPI application not finding libpgc.so. What do I do?
The problem is usually because the PGI libraries cannot be
found on the node where Open MPI is attempting to launch an MPI
executable. For example:
```
shell$ mpirun -np 1 --host node1.example.com mpi_hello
orted: error while loading shared libraries: libpgc.so: cannot open shared object file: No such file or directory
--------------------------------------------------------------------------
A daemon (pid 11893) died unexpectedly with status 127 while
attempting to launch so we are aborting.
[...more error messages...]
```
Open MPI first attempts to launch a "helper" daemon
(orted) on node1.example.com, but this fails because one
of orted's dependent libraries cannot be found. This
particular library, libpgc.so, is a PGI compiler library. As
such, it is likely that the user did not set up the PGI compiler
library in their environment properly on this node.
Double check that you have set up the PGI compiler environment on the
target node, for both interactive and non-interactive logins. A
common error is to set up the PGI compiler environment
properly for interactive logins, but not for
non-interactive logins. For example:
```
head_node$ cd $HOME
head_node$ mpicc mpi_hello.c -o mpi_hello
head_node$ ./mpi_hello
Hello world, I am 0 of 1.
head_node$ ssh node2.example.com
Welcome to node2.
node2$ ./mpi_hello
Hello world, I am 0 of 1.
node2$ exit
head_node$ ssh node2.example.com $HOME/mpi_hello
mpi_hello: error while loading shared libraries: libpgc.so: cannot open shared object file: No such file or directory
```
The above example shows that running a trivial C program compiled by
the PGI compilers works fine on both the head node and node2 when
logging in interactively, but fails when run on node2
non-interactively. Check your shell script startup files and verify
that the PGI compiler environment is set up properly for
non-interactive logins.
14. When I build Open MPI with the PathScale compilers, I get warnings about "orted" or my MPI application not finding libmv.so. What do I do?
The problem is usually because the PathScale libraries cannot be
found on the node where Open MPI is attempting to launch an MPI
executable. For example:
```
shell$ mpirun -np 1 --host node1.example.com mpi_hello
orted: error while loading shared libraries: libmv.so: cannot open shared object file: No such file or directory
--------------------------------------------------------------------------
A daemon (pid 11893) died unexpectedly with status 127 while
attempting to launch so we are aborting.
[...more error messages...]
```
Open MPI first attempts to launch a "helper" daemon
(orted) on node1.example.com, but this fails because one
of orted's dependent libraries cannot be found. This
particular library, libmv.so, is a PathScale compiler library. As
such, it is likely that the user did not set up the PathScale compiler
library in their environment properly on this node.
Double check that you have set up the PathScale compiler environment on the
target node, for both interactive and non-interactive logins. A
common error is to set up the PathScale compiler environment
properly for interactive logins, but not for
non-interactive logins. For example:
```
head_node$ cd $HOME
head_node$ mpicc mpi_hello.c -o mpi_hello
head_node$ ./mpi_hello
Hello world, I am 0 of 1.
head_node$ ssh node2.example.com
Welcome to node2.
node2$ ./mpi_hello
Hello world, I am 0 of 1.
node2$ exit
head_node$ ssh node2.example.com $HOME/mpi_hello
mpi_hello: error while loading shared libraries: libmv.so: cannot open shared object file: No such file or directory
```
The above example shows that running a trivial C program compiled by
the PathScale compilers works fine on both the head node and node2 when
logging in interactively, but fails when run on node2
non-interactively. Check your shell script startup files and verify
that the PathScale compiler environment is set up properly for
non-interactive logins.
15. Can I run non-MPI programs with mpirun / mpiexec?
Yes.
Indeed, Open MPI's mpirun and mpiexec are actually synonyms for
our underlying launcher named orterun (i.e., the Open Run-Time
Environment layer in Open MPI, or ORTE). So you can use mpirun and
mpiexec to launch any application. For example:
```
shell$ mpirun -np 2 --host a,b uptime
```

This will launch a copy of the Unix command uptime on the hosts a
and b.
Other questions in the FAQ section deal with the specifics of the
mpirun command line interface; suffice it to say that it works
equally well for MPI and non-MPI applications.
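A non-MPI program can still tell which copy it is. Open MPI's launcher sets environment variables such as OMPI_COMM_WORLD_RANK and OMPI_COMM_WORLD_SIZE in each process it starts. The following is a minimal sketch of a plain shell script that reports them. The file name report_copy.sh and the function report_copy are hypothetical. When the script is run outside mpirun, the variables are unset and "?" is printed instead.

```shell
#!/bin/sh
# report_copy.sh (hypothetical name): a plain, non-MPI script that says
# which copy it is when launched by mpirun, using the OMPI_COMM_WORLD_RANK
# and OMPI_COMM_WORLD_SIZE variables set by Open MPI's launcher.
report_copy() {
  echo "copy ${OMPI_COMM_WORLD_RANK:-?} of ${OMPI_COMM_WORLD_SIZE:-?} on $(uname -n)"
}
report_copy
```

Usage: `mpirun -np 2 --host a,b sh ./report_copy.sh`, assuming the script is available at the same path on both hosts.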
16. Can I run GUI applications with Open MPI?
Yes, but it will depend on your local setup and may require
additional setup.
In short: you will need to have X forwarding enabled from the remote
processes to the display where you want output to appear. In a secure
environment, you can simply allow all X requests to be shown on the
target display and set the DISPLAY environment variable in all MPI
processes' environments to the target display, perhaps something like
this:
```
shell$ hostname
my_desktop.secure-cluster.example.com
shell$ xhost +
shell$ mpirun -np 4 -x DISPLAY=my_desktop.secure-cluster.example.com:0 a.out
```
However, this technique is not generally suitable for insecure
environments (because it allows anyone to read and write to your
display). A slightly more secure way is to only allow X connections
from the nodes where your application will be running:

```
shell$ hostname
my_desktop.secure-cluster.example.com
shell$ xhost +compute1 +compute2 +compute3 +compute4
compute1 being added to access control list
compute2 being added to access control list
compute3 being added to access control list
compute4 being added to access control list
shell$ mpirun -np 4 -x DISPLAY=my_desktop.secure-cluster.example.com:0 a.out
```

(assuming that the four nodes you are running on are compute1
through compute4).
Other methods are available, but they involve sophisticated X
forwarding through mpirun and are generally more complicated than
desirable.
17. Can I run ncurses-based / curses-based / applications with funky input schemes with Open MPI?
Maybe. But probably not.
Open MPI provides fairly sophisticated stdin / stdout / stderr
forwarding. However, it does not work well with curses, ncurses,
readline, or other sophisticated I/O packages that generally require
direct control of the terminal.
Every application and I/O library is different — you should try to
see if yours is supported. But chances are that it won't work.
Sorry. :-(
18. What other options are available to mpirun?
mpirun supports the "--help" option which provides a usage
message and a summary of the options that it supports. It should be
considered the definitive list of what options are provided.
Several notable options are:
- --hostfile: Specify a hostfile for launchers (such as the rsh
launcher) that need to be told on which hosts to start parallel
applications. Note that for compatibility with other MPI
implementations, --machinefile is a synonym for --hostfile.
- --host: Specify a host or list of hosts to run on (see this FAQ entry for more details).
- --np (or -np): Indicate the number of processes to start.
- --mca (or -mca): Set MCA parameters (see the Run-Time Tuning FAQ).
- --wdir <directory>: Set the working directory of the
started applications. If not supplied, the current working directory
is assumed (or $HOME, if the current working directory does not
exist on all nodes).
- -x <env-variable-name>: The name of an environment
variable to export to the parallel application. The -x option can
be specified multiple times to export multiple environment
variables to the parallel application.
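When a script launches mpirun (for example through Python's subprocess module), the options above can be assembled programmatically. The following is a minimal sketch under that assumption: build_mpirun_argv is a hypothetical helper that only builds the argument list and does not run anything; shlex.join needs Python 3.8 or later.

```python
import shlex


def build_mpirun_argv(program, np, hostfile=None, mca=None, wdir=None, export=()):
    """Assemble an mpirun argument list from the options described above.

    Hypothetical helper: it only builds the list; pass the result to
    subprocess.run() to actually launch the job.
    """
    argv = ["mpirun", "-np", str(np)]
    if hostfile is not None:
        argv += ["--hostfile", hostfile]
    # Each MCA parameter becomes its own "--mca <name> <value>" triple.
    for name, value in (mca or {}).items():
        argv += ["--mca", name, str(value)]
    if wdir is not None:
        argv += ["--wdir", wdir]
    # -x is repeated once per environment variable to export.
    for var in export:
        argv += ["-x", var]
    return argv + list(program)


argv = build_mpirun_argv(["a.out"], 4, hostfile="my_hostfile",
                         mca={"mpi_yield_when_idle": 1},
                         export=["DISPLAY", "LD_LIBRARY_PATH"])
print(shlex.join(argv))
# mpirun -np 4 --hostfile my_hostfile --mca mpi_yield_when_idle 1 -x DISPLAY -x LD_LIBRARY_PATH a.out
```

Keeping the command as a list (rather than one string) avoids shell quoting problems when it is handed to subprocess.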
19. How do I use the --hostfile option to mpirun?
The --hostfile option to mpirun takes a filename that
lists hosts on which to launch MPI processes.
NOTE: The hosts listed in
a hostfile have nothing to do with which network interfaces are used
for MPI communication. They are only used to specify on which hosts
to launch MPI processes.
Hostfiles are simple text files with hosts specified,
one per line. Each host can also specify a default and maximum number
of slots to be used on that host (i.e., the number of available
processors on that host). Comments are also supported, and blank
lines are ignored. For example:
```
# This is an example hostfile. Comments begin with #
#
# The following node is a single processor machine:
foo.example.com
# The following node is a dual-processor machine:
bar.example.com slots=2
# The following node is a quad-processor machine, and we absolutely
# want to disallow over-subscribing it:
yow.example.com slots=4 max-slots=4
```
The slots and max-slots keywords are discussed in more detail
in this FAQ entry.
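To make the format concrete, here is a minimal Python parser for the hostfile syntax shown above. It is a sketch, not Open MPI's own parser: it understands only bare host names plus the slots and max-slots keywords (it also accepts the max_slots spelling used later in this FAQ), uses 1 as the default slot count, and represents an unspecified max-slots as None (no limit). If a host appears on more than one line, the last line wins here, which may not match Open MPI's behavior.

```python
def parse_hostfile(text):
    """Parse hostfile text into {host: (slots, max_slots)} (sketch).

    Supports "host [slots=N] [max-slots=M]" lines; max_slots is None
    when not given, meaning "no limit".
    """
    hosts = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue  # blank or comment-only line
        name, *fields = line.split()
        slots, max_slots = 1, None  # default slot count is 1
        for field in fields:
            key, _, value = field.partition("=")
            if key == "slots":
                slots = int(value)
            elif key in ("max-slots", "max_slots"):
                max_slots = int(value)
            else:
                raise ValueError(f"unsupported keyword {key!r} in {raw!r}")
        hosts[name] = (slots, max_slots)
    return hosts


example = """\
# This is an example hostfile. Comments begin with #
foo.example.com
bar.example.com slots=2
yow.example.com slots=4 max-slots=4
"""
print(parse_hostfile(example))
# {'foo.example.com': (1, None), 'bar.example.com': (2, None), 'yow.example.com': (4, 4)}
```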
Hostfiles work in two different ways:
- Exclusionary: If a list of hosts to run on has been provided by
another source (e.g., by a hostfile or a batch scheduler such as
Slurm, PBS/Torque, SGE, etc.), the hosts provided by the hostfile must
be in the already-provided host list. If the hostfile-specified nodes
are not in the already-provided host list,
mpirun will abort
without launching anything.
In this case, hostfiles act like an exclusionary filter — they limit
the scope of where processes will be scheduled from the original list
of hosts to produce a final list of hosts.
For example, say that a scheduler job contains hosts node01 through
node04. If you run:
```shell
shell$ cat my_hosts
node03
shell$ mpirun -np 1 --hostfile my_hosts hostname
```
This will run a single copy of hostname on the host node03.
However, if you run:
```shell
shell$ cat my_hosts
node17
shell$ mpirun -np 1 --hostfile my_hosts hostname
```
This is an error (because node17 is not one of the hosts allocated
to the scheduler job); mpirun will abort.
Finally, note that in exclusionary mode, processes will only be
executed on the hostfile-specified hosts, even if it causes
oversubscription. For example:
```shell
shell$ cat my_hosts
node03
shell$ mpirun -np 4 --hostfile my_hosts hostname
```
This will launch 4 copies of hostname on host node03.
- Inclusionary: If a list of hosts has not been provided by
another source, then the hosts provided by the --hostfile option
will be used as the original and final host list.
In this case, --hostfile acts as an inclusionary agent; all
--hostfile-supplied hosts become available for scheduling processes.
For example (assume that you are not in a scheduling environment
where a list of nodes is being transparently supplied):
```shell
shell$ cat my_hosts
node01.example.com
node02.example.com
node03.example.com
shell$ mpirun -np 3 --hostfile my_hosts hostname
```
This will launch a single copy of hostname on each of the hosts
node01.example.com, node02.example.com, and node03.example.com.
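The two modes can be summarized in a few lines of Python. This is a simplified model for illustration only, not Open MPI's implementation: final_host_list is a hypothetical name, requested stands for the hosts named by --hostfile, and allocation stands for a host list already provided by another source (such as a scheduler), or None when there is none.

```python
def final_host_list(requested, allocation=None):
    """Return the hosts that processes may be scheduled on (model only).

    requested:  hosts named by --hostfile (or --host)
    allocation: hosts already provided by another source, or None
    """
    if allocation is None:
        # Inclusionary: the requested hosts are the original and final list.
        return list(requested)
    # Exclusionary: every requested host must already be allocated.
    missing = [h for h in requested if h not in allocation]
    if missing:
        raise RuntimeError(f"hosts not in allocation, mpirun would abort: {missing}")
    # Filter the original list down to the requested hosts.
    return [h for h in allocation if h in requested]


job = ["node01", "node02", "node03", "node04"]
print(final_host_list(["node03"], job))      # ['node03']
print(final_host_list(["node01.example.com", "node02.example.com"]))
```

Note that the model says nothing about how many processes land on each host; as the exclusionary example above shows, Open MPI will oversubscribe the filtered hosts if -np asks for more processes.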
Note, too, that --hostfile is essentially a per-application switch.
Hence, if you specify multiple applications (as in an MPMD job),
--hostfile can be specified multiple times:
```shell
shell$ cat hostfile_1
node01.example.com
shell$ cat hostfile_2
node02.example.com
shell$ mpirun -np 1 --hostfile hostfile_1 hostname : -np 1 --hostfile hostfile_2 uptime
node01.example.com
06:11:45 up 1 day, 2:32, 0 users, load average: 21.65, 20.85, 19.84
```
Notice that hostname was launched on node01.example.com and
uptime was launched on node02.example.com.
20. How do I use the --host option to mpirun?
The --host option to mpirun takes a comma-delimited list
of hosts on which to run. For example:
```shell
shell$ mpirun -np 3 --host a,b,c hostname
```
This will launch one copy of hostname on each of the hosts a, b, and c.
NOTE: The hosts specified
by the --host option have nothing to do with which network
interfaces are used for MPI communication. They are only used to
specify on which hosts to launch MPI processes.
--host works in two different ways:
- Exclusionary: If a list of hosts to run on has been provided by
another source (e.g., by a hostfile or a batch scheduler such as
Slurm, PBS/Torque, SGE, etc.), the hosts provided by the --host option
must be in the already-provided host list. If the --host-specified
nodes are not in the already-provided host list, mpirun will abort
without launching anything.
In this case, the --host option acts like an exclusionary filter —
it limits the scope of where processes will be scheduled from the
original list of hosts to produce a final list of hosts.
For example, say that the hostfile my_hosts contains the hosts
node1 through node4. If you run:
```shell
shell$ mpirun -np 1 --hostfile my_hosts --host node3 hostname
```
This will run a single copy of hostname on the host node3.
However, if you run:
```shell
shell$ mpirun -np 1 --hostfile my_hosts --host node17 hostname
```
This is an error (because node17 is not listed in my_hosts);
mpirun will abort.
Finally, note that in exclusionary mode, processes will only be
executed on the --host-specified hosts, even if it causes
oversubscription. For example:
```shell
shell$ mpirun -np 4 --host a uptime
```
This will launch 4 copies of uptime on host a.
- Inclusionary: If a list of hosts has not been provided by
another source, then the hosts provided by the --host option will be
used as the original and final host list.
In this case, --host acts as an inclusionary agent; all
--host-supplied hosts become available for scheduling processes.
For example (assume that you are not in a scheduling environment
where a list of nodes is being transparently supplied):
```shell
shell$ mpirun -np 3 --host a,b,c hostname
```
This will launch a single copy of hostname on each of the hosts a, b,
and c.
Note, too, that --host is essentially a per-application switch.
Hence, if you specify multiple applications (as in an MPMD job),
--host can be specified multiple times:
```shell
shell$ mpirun -np 1 --host a hostname : -np 1 --host b uptime
```
This will launch hostname on host a and uptime on host b.
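If a script generates MPMD command lines, each application context carries its own -np and --host values, and contexts are separated by a ":" argument. A minimal sketch; build_mpmd_argv is a hypothetical helper that reproduces the command shown above and does not launch anything.

```python
def build_mpmd_argv(apps):
    """Build an MPMD mpirun argument list (sketch).

    apps: list of (np, hosts, command) tuples, one per application
    context; hosts is a list of host names, command a list of words.
    """
    argv = ["mpirun"]
    for i, (np, hosts, command) in enumerate(apps):
        if i > 0:
            argv.append(":")  # separates application contexts
        argv += ["-np", str(np), "--host", ",".join(hosts)]
        argv += command
    return argv


argv = build_mpmd_argv([(1, ["a"], ["hostname"]),
                        (1, ["b"], ["uptime"])])
print(" ".join(argv))
# mpirun -np 1 --host a hostname : -np 1 --host b uptime
```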
21. How do I control how my processes are scheduled across nodes?
The short version is that if you are not oversubscribing your
nodes (i.e., trying to run more processes than you have told Open MPI
are available on that node), scheduling is pretty simple and occurs
either on a by-slot or by-node round robin schedule. If you're
oversubscribing, the issue gets much more complicated — keep reading.
The more complete answer is: Open MPI schedules processes to nodes by
asking two questions from each application on the mpirun command
line:
- How many processes should be launched?
- Where should those processes be launched?
The "how many" question is directly answered with the -np switch
to mpirun . The "where" question is a little more complicated, and
depends on three factors:
- The final node list (e.g., after --host exclusionary or
inclusionary processing)
- The scheduling policy (which applies to all applications in a
single job)
- The default and maximum number of slots on each host
As briefly mentioned in this FAQ
entry, slots are Open MPI's representation of how many
processors are available on a given host.
The default number of slots on any machine, if not explicitly
specified, is 1 (e.g., if a host is listed in a hostfile but has no
corresponding "slots" keyword). Schedulers (such as Slurm,
PBS/Torque, SGE, etc.) automatically provide an accurate default slot
count.
Max slot counts, however, are rarely specified by schedulers. The max
slot count for each node will default to "infinite" if it is not
provided (meaning that Open MPI will oversubscribe the node if you ask
it to — see more on oversubscribing in this FAQ entry).
Open MPI currently supports two scheduling policies: by slot and by
node:
- By slot: This is the default scheduling policy, but can also be
explicitly requested by using either the --byslot option to mpirun
or by setting the MCA parameter rmaps_base_schedule_policy to the
string "slot".
In this mode, Open MPI will schedule processes on a node until all of
its default slots are exhausted before proceeding to the next node.
In MPI terms, this means that Open MPI tries to maximize the number of
adjacent ranks in MPI_COMM_WORLD on the same host without
oversubscribing that host.
For example:
```shell
shell$ cat my-hosts
node0 slots=2 max_slots=20
node1 slots=2 max_slots=20
shell$ mpirun --hostfile my-hosts -np 8 --byslot hello | sort
Hello World I am rank 0 of 8 running on node0
Hello World I am rank 1 of 8 running on node0
Hello World I am rank 2 of 8 running on node1
Hello World I am rank 3 of 8 running on node1
Hello World I am rank 4 of 8 running on node0
Hello World I am rank 5 of 8 running on node0
Hello World I am rank 6 of 8 running on node1
Hello World I am rank 7 of 8 running on node1
```
- By node: This policy can be requested either by using the
--bynode option to mpirun or by setting the MCA parameter
rmaps_base_schedule_policy to the string "node".
In this mode, Open MPI will schedule a single process on each node in
a round-robin fashion (looping back to the beginning of the node list
as necessary) until all processes have been scheduled. Nodes are
skipped once their default slot counts are exhausted.
For example:
```shell
shell$ cat my-hosts
node0 slots=2 max_slots=20
node1 slots=2 max_slots=20
shell$ mpirun --hostfile my-hosts -np 8 --bynode hello | sort
Hello World I am rank 0 of 8 running on node0
Hello World I am rank 1 of 8 running on node1
Hello World I am rank 2 of 8 running on node0
Hello World I am rank 3 of 8 running on node1
Hello World I am rank 4 of 8 running on node0
Hello World I am rank 5 of 8 running on node1
Hello World I am rank 6 of 8 running on node0
Hello World I am rank 7 of 8 running on node1
```
In both policies, if the default slot count is exhausted on all nodes
while there are still processes to be scheduled, Open MPI will loop
through the list of nodes again and try to schedule one more process
to each node until all processes are scheduled. Nodes are skipped in
this process if their maximum slot count is exhausted. If the maximum
slot count is exhausted on all nodes while there are still processes
to be scheduled, Open MPI will abort without launching any processes.
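The rules in the last few paragraphs can be modeled in a short Python function. This is an illustrative sketch of the policies as the prose here describes them, not the code of Open MPI's RMAPS framework; schedule is a hypothetical name. In particular, it places oversubscribed processes one per node per pass, as the paragraph above states, so real placement in oversubscribed cases may differ.

```python
def schedule(hosts, nprocs, policy="slot"):
    """Map MPI_COMM_WORLD ranks to hosts following the rules above (model).

    hosts:  list of (name, slots, max_slots); max_slots=None means no limit
    policy: "slot" (fill each node's default slots first) or
            "node" (round robin, one process per node per pass)
    Returns a list whose index is the rank and whose value is the host.
    """
    placement = []
    used = {name: 0 for name, _, _ in hosts}

    def place(name):
        placement.append(name)
        used[name] += 1

    # Phase 1: schedule only into the default slots.
    if policy == "slot":
        for name, slots, _ in hosts:
            while used[name] < slots and len(placement) < nprocs:
                place(name)
    elif policy == "node":
        while len(placement) < nprocs and any(used[n] < s for n, s, _ in hosts):
            for name, slots, _ in hosts:
                if used[name] < slots and len(placement) < nprocs:
                    place(name)
    else:
        raise ValueError(f"unknown policy: {policy!r}")

    # Phase 2 (oversubscription): one more process per node per pass,
    # skipping nodes whose maximum slot count is exhausted.
    while len(placement) < nprocs:
        candidates = [n for n, _, m in hosts if m is None or used[n] < m]
        if not candidates:
            raise RuntimeError("max slots exhausted on all nodes; mpirun would abort")
        for name in candidates:
            if len(placement) < nprocs:
                place(name)
    return placement


hosts = [("node0", 2, 20), ("node1", 2, 20)]
print(schedule(hosts, 4, "slot"))  # ['node0', 'node0', 'node1', 'node1']
print(schedule(hosts, 4, "node"))  # ['node0', 'node1', 'node0', 'node1']
```

With 4 processes and 2 slots per node, the two policies differ only in rank order; both stay within the default slots, so neither oversubscribes.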
NOTE: This is the scheduling policy in Open MPI because of a long
historical precedent in LAM/MPI. However, the scheduling of processes
to processors is a component in the RMAPS framework in Open MPI; it
can be changed. If you don't like how this scheduling occurs, please
let us know.
22. I'm not using a hostfile. How are slots calculated?
If you are using a supported resource manager, Open MPI will
get the slot information directly from that entity. If you are using
the --host parameter to mpirun, be aware that each instance of a
hostname bumps up the internal slot count by one. For example:
```shell
shell$ mpirun --host node0,node0,node0,node0 ....
```
This tells Open MPI that host "node0" has a slot count of 4. This is
very different from, for example:
```shell
shell$ mpirun -np 4 --host node0 a.out
```
This tells Open MPI that host "node0" has a slot count of 1 but you
are running 4 processes on it. Specifically, Open MPI assumes that
you are oversubscribing the node.
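The counting rule can be expressed directly: every occurrence of a name in the comma-separated --host value adds one slot. A minimal Python sketch (slots_from_host_option is a hypothetical name, and the sketch ignores any other --host syntax your Open MPI version may accept). Running -np 4 against the second result below means 4 processes on 1 slot, i.e., an oversubscribed node.

```python
from collections import Counter


def slots_from_host_option(host_arg):
    """Slot count per host implied by a --host value (sketch).

    Each occurrence of a host name adds one slot, as described above.
    """
    return dict(Counter(name for name in host_arg.split(",") if name))


print(slots_from_host_option("node0,node0,node0,node0"))  # {'node0': 4}
print(slots_from_host_option("node0"))                    # {'node0': 1}
print(slots_from_host_option("a,b,a"))                    # {'a': 2, 'b': 1}
```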
23. Can I run multiple parallel processes on a uniprocessor machine?
Yes.
But be very careful to ensure that Open MPI knows that you are
oversubscribing your node! If Open MPI is unaware that you are
oversubscribing a node, severe performance degradation can result.
See this FAQ entry for more details on oversubscription.
24. Can I oversubscribe nodes (run more processes than processors)?
Yes.
However, it is critical that Open MPI knows that you are
oversubscribing the node, or severe performance degradation can result.
The short explanation is as follows: never
specify a number of slots that is more than the available number of
processors. For example, if you want to run 4 processes on a
uniprocessor, then indicate that you only have 1 slot but want to
run 4 processes:
```shell
shell$ cat my-hostfile
localhost
shell$ mpirun -np 4 --hostfile my-hostfile a.out
```
Specifically: do NOT have a hostfile that contains "slots=4"
(because there is only one available processor).
Here's the full explanation:
Open MPI basically runs its message passing progression engine in two
modes: aggressive and degraded.
- Degraded: When Open MPI thinks that it is in an oversubscribed
mode (i.e., more processes are running than there are processors
available), MPI processes will automatically run in degraded mode
and frequently yield the processor to its peers, thereby allowing all
processes to make progress (be sure to see this
FAQ entry that describes how degraded mode affects processor and
memory affinity).
- Aggressive: When Open MPI thinks that it is in an exactly- or
under-subscribed mode (i.e., the number of running processes is equal
to or less than the number of available processors), MPI processes
will automatically run in aggressive mode, meaning that they will
never voluntarily give up the processor to other processes. With some
network transports, this means that Open MPI will spin in tight loops
attempting to make message passing progress, effectively causing other
processes to not get any CPU cycles (and therefore never make any
progress).
For example, on a uniprocessor node:
```shell
shell$ cat my-hostfile
localhost slots=4
shell$ mpirun -np 4 --hostfile my-hostfile a.out
```
This would cause all 4 MPI processes to run in aggressive mode
because Open MPI thinks that there are 4 available processors
to use. This is actually a lie (there is only 1 processor — not 4),
and can cause extremely bad performance.
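The rule above boils down to comparing the number of processes on a node with the slot count Open MPI was told about, so the safe habit is to derive the slot count from the real processor count. A hedged Python sketch with hypothetical helper names: os.cpu_count() counts logical processors (including hardware threads) and may return None, in which case the sketch falls back to 1 slot; whether hardware threads should count as processors is a judgment call the sketch does not make for you.

```python
import os


def progress_mode(nprocs_on_node, slots):
    """Progress mode per the rule above: degraded when oversubscribed."""
    return "degraded" if nprocs_on_node > slots else "aggressive"


def honest_hostfile_line(host="localhost"):
    """Hostfile line whose slot count matches this machine's processors."""
    return f"{host} slots={os.cpu_count() or 1}"


print(progress_mode(4, 1))  # degraded: 4 processes, 1 declared slot
print(progress_mode(4, 4))  # aggressive: only safe if 4 processors exist
print(honest_hostfile_line())
```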
25. Can I force Aggressive or Degraded performance modes?
Yes.
The MCA parameter mpi_yield_when_idle controls whether an MPI
process runs in Aggressive or Degraded performance mode. Setting it
to zero forces Aggressive mode; any other value forces Degraded mode
(see this FAQ
entry to see how to set MCA parameters).
Note that this value only affects the behavior of MPI processes when
they are blocking in MPI library calls. It does not affect behavior
of non-MPI processes, nor does it affect the behavior of a process
that is not inside an MPI library call.
Open MPI normally sets this parameter automatically (see this FAQ entry for details). Users are
cautioned against setting this parameter unless they are really,
absolutely, positively sure of what they are doing.
26. How do I run with the TotalView parallel debugger?
Generally, you can run Open MPI processes with TotalView as
follows:
```shell
shell$ mpirun --debug ...mpirun arguments...
```
Assuming that TotalView is the first supported parallel debugger in
your path, Open MPI will automatically invoke the correct underlying
command to run your MPI process in the TotalView debugger. Be sure to
see this
FAQ entry for details about what versions of Open MPI and
TotalView are compatible.
For reference, this underlying command form is the following:
```shell
shell$ totalview mpirun -a ...mpirun arguments...
```
So if you wanted to run a 4-process MPI job of your a.out
executable, it would look like this:
```shell
shell$ totalview mpirun -a -np 4 a.out
```
Alternatively, Open MPI's mpirun offers the "-tv" convenience
option, which does the same thing as TotalView's "-a" syntax. For
example:
```shell
shell$ mpirun -tv -np 4 a.out
```
Note that by default, TotalView will stop deep in the machine code of
mpirun itself, which is not what most users want. It is possible
to get TotalView to recognize that mpirun is simply a "starter"
program and should be (effectively) ignored. Specifically, TotalView
can be configured to skip mpirun (and mpiexec and orterun) and
jump right into your MPI application. This can be accomplished by
placing some startup instructions in a TotalView-specific file named
$HOME/.tvdrc.
Open MPI includes a sample TotalView startup file that performs this
function (see etc/openmpi-totalview.tcl in Open MPI distribution
tarballs; it is also installed, by default, to
$prefix/etc/openmpi-totalview.tcl in the Open MPI installation).
This file can be either copied to $HOME/.tvdrc or sourced from the
$HOME/.tvdrc file. For example, placing the following line in your
$HOME/.tvdrc (replacing /path/to/openmpi/installation with the
proper directory name, of course) will use the Open MPI-provided
startup file:
```
source /path/to/openmpi/installation/etc/openmpi-totalview.tcl
```
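If you set this up on several machines or accounts, a small script can add that line only when it is missing. A hedged Python sketch: ensure_tvdrc_sources is a hypothetical helper, and /path/to/openmpi/installation remains a placeholder you must replace with your real prefix.

```python
from pathlib import Path


def ensure_tvdrc_sources(prefix, tvdrc=None):
    """Add 'source <prefix>/etc/openmpi-totalview.tcl' to a .tvdrc file.

    tvdrc defaults to $HOME/.tvdrc. Returns True if the line was added,
    False if it was already present (so repeated calls are harmless).
    """
    path = Path(tvdrc) if tvdrc is not None else Path.home() / ".tvdrc"
    line = f"source {prefix}/etc/openmpi-totalview.tcl"
    text = path.read_text() if path.exists() else ""
    if line in (existing.strip() for existing in text.splitlines()):
        return False
    if text and not text.endswith("\n"):
        text += "\n"  # keep the new line separate from the last one
    path.write_text(text + line + "\n")
    return True


# Usage (replace the placeholder with your real installation prefix):
# ensure_tvdrc_sources("/path/to/openmpi/installation")
```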
27. How do I run with the DDT parallel debugger?
As of August 2015, DDT has built-in startup for MPI
applications within its Allinea Forge GUI product. You can simply use
the built-in support to launch, monitor, and kill MPI jobs.
If you are using an older version of DDT that does not have this
built-in support, keep reading.
If you've used DDT at least once before (to use the
configuration wizard to set up support for Open MPI), you can start it
on the command line with:
```shell
shell$ mpirun --debug ...mpirun arguments...
```
Assuming that you are using Open MPI v1.2.4 or later, and assuming
that DDT is the first supported parallel debugger in your path, Open
MPI will automatically invoke the correct underlying command to run
your MPI process in the DDT debugger. For reference (or if you are
using an earlier version of Open MPI), this underlying command form is
the following:
```shell
shell$ ddt -n {nprocs} -start {exe-name}
```
Note that passing arbitrary arguments to Open MPI's mpirun is not
supported with the DDT debugger.
You can also attach to already-running processes with either of the
following two syntaxes:
```shell
shell$ ddt -attach {hostname1:pid} [{hostname2:pid} ...] {exec-name}
# Or
shell$ ddt -attach-file {filename of newline separated hostname:pid pairs} {exec-name}
```
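The file for -attach-file is just newline-separated hostname:pid pairs, so it is easy to produce from a script. A hedged Python sketch: the helper names are hypothetical, and how you gather one pair per rank (for example, having each process report this_process()) is left to you; check that the host names match what DDT expects on your cluster.

```python
import os
import socket


def write_attach_file(path, pairs):
    """Write (hostname, pid) pairs as newline-separated 'hostname:pid' lines."""
    with open(path, "w") as f:
        for host, pid in pairs:
            f.write(f"{host}:{pid}\n")


def this_process():
    """(hostname, pid) of the calling process, e.g. for each rank to report."""
    return socket.gethostname(), os.getpid()


# Hypothetical usage: collect one pair per rank, then
#   write_attach_file("ranks.txt", pairs)
# and run: ddt -attach-file ranks.txt a.out
```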
DDT can even be configured to operate with cluster/resource schedulers
such that it can run on a local workstation, submit your MPI job via
the scheduler, and then attach to the MPI job when it starts.
See the official DDT documentation for more details.
28. What launchers are available?
The documentation contained in the Open MPI tarball will have
the most up-to-date information, but as of v1.0, Open MPI supports:
- BProc versions 3 and 4 (discontinued starting with OMPI v1.3)
- Sun Grid Engine (SGE), and the open source Grid Engine (support first introduced in Open MPI v1.2)
- PBS Pro, Torque, and Open PBS
- LoadLeveler scheduler (full support since 1.1.1)
- rsh / ssh
- Slurm
- LSF
- XGrid (discontinued starting with OMPI 1.4)
- Yod (Cray XT-3 and XT-4)
29. How do I specify to the rsh launcher to use rsh or ssh?
See this FAQ entry.
30. How do I run with the Slurm and PBS/Torque launchers?
If support for these systems is included in your Open MPI
installation (which you can check with the ompi_info command; look
for components named "slurm" and/or "tm"), Open MPI will
automatically detect when it is running inside such jobs and will just
"do the Right Thing."
See this FAQ entry for
a description of how to run jobs in Slurm; see this FAQ entry for a description
of how to run jobs in PBS/Torque.
31. Can I suspend and resume my MPI job?
See this FAQ entry.
32. How do I run with LoadLeveler?
If support for LoadLeveler is included in your Open MPI
installation (which you can check with the ompi_info command; look
for components named "loadleveler"), Open MPI will
automatically detect when it is running inside such jobs and will just
"do the Right Thing."
Specifically, if you execute an mpirun command in a LoadLeveler job,
it will automatically determine what nodes and how many slots on each
node have been allocated to the current job. There is no need to
specify what nodes to run on. Open MPI will then attempt to launch the
job using whatever resource is available (on Linux rsh/ssh is used).
For example:
```shell
shell$ cat job
#@ output = job.out
#@ error = job.err
#@ job_type = parallel
#@ node = 3
#@ tasks_per_node = 4
mpirun a.out
shell$ llsubmit job
```
This will run 4 MPI processes per node on the 3 nodes that were allocated by
LoadLeveler for this job.
For users of the Open MPI 1.1
series: version 1.1.0 has a problem that prevents Open MPI from
determining which nodes are available to it if the job has more than
128 tasks. In the 1.1.x series, starting with version 1.1.1, this can
be worked around by passing "-mca ras_loadleveler_priority 110" to
mpirun. Version 1.2 and above work without any additional flags.
33. How do I load libmpi at runtime?
If you want to load the shared library libmpi explicitly
at runtime, either by using dlopen() from C/C++ or something like
the ctypes package from Python, some extra care is required. The
default configuration of Open MPI uses dlopen() internally to load
its support components. These components rely on symbols available in
libmpi. In order to make the symbols in libmpi available to the
components loaded by Open MPI at runtime, libmpi must be loaded with
the RTLD_GLOBAL option.
In C/C++, this option is specified as the second parameter to the
POSIX dlopen(3) function.
When using ctypes with Python, this can be done with the second
(optional) parameter to CDLL() . For example (shown below in Mac OS
X, where Open MPI's shared library name ends in ".dylib"; other
operating systems use other suffixes, such as ".so"):
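```python
from ctypes import CDLL, RTLD_GLOBAL

# Load libmpi with RTLD_GLOBAL so that the components Open MPI loads
# internally with dlopen() can resolve symbols from libmpi.
# (The library file name is platform- and version-specific.)
mpi = CDLL('libmpi.0.dylib', mode=RTLD_GLOBAL)

# MPI-2 and later permit MPI_Init(NULL, NULL), so the interpreter's
# argc/argv do not need to be passed in.
mpi.MPI_Init(None, None)
# Your MPI program here
mpi.MPI_Finalize()
```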
Other scripting languages should have similar options when dynamically
loading shared libraries.
34. What MPI environmental variables exist?
Beginning with the v1.3 release, Open MPI provides the following
environmental variables that will be defined on every
MPI process:
- OMPI_COMM_WORLD_SIZE - the number of processes in this process's
MPI_COMM_WORLD
- OMPI_COMM_WORLD_RANK - the MPI rank of this process in
MPI_COMM_WORLD
- OMPI_COMM_WORLD_LOCAL_RANK - the relative rank of this process
on this node within its job. For example, if four processes in a job
share a node, they will each be given a local rank ranging from 0 to
3.
- OMPI_UNIVERSE_SIZE - the number of process slots allocated to
this job. Note that this may be different than the number of processes
in the job.
- OMPI_COMM_WORLD_LOCAL_SIZE - the number of ranks from this job
that are running on this node.
- OMPI_COMM_WORLD_NODE_RANK - the relative rank of this process on
this node looking across ALL jobs.
Open MPI guarantees that these variables will remain stable throughout
future releases.
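Because these variables are set before the program starts, a script can read them without initializing MPI, e.g., to pick a per-rank log file. A hedged Python sketch: ompi_rank_info is a hypothetical name, and its fallback to a single-process view (rank 0 of 1) is an assumption for runs outside mpirun, not Open MPI behavior.

```python
import os


def ompi_rank_info(env=None):
    """Read the per-process variables Open MPI sets (v1.3 and later).

    Falls back to a single-process view (rank 0 of 1) when a variable is
    absent, e.g. when the script was not started by mpirun.
    """
    env = os.environ if env is None else env

    def get(name, default):
        return int(env.get(name, default))

    return {
        "rank": get("OMPI_COMM_WORLD_RANK", 0),
        "size": get("OMPI_COMM_WORLD_SIZE", 1),
        "local_rank": get("OMPI_COMM_WORLD_LOCAL_RANK", 0),
        "local_size": get("OMPI_COMM_WORLD_LOCAL_SIZE", 1),
    }


info = ompi_rank_info()
print(f"rank {info['rank']} of {info['size']} "
      f"(local rank {info['local_rank']} of {info['local_size']})")
```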
35. How do I get my MPI job to wire up its MPI connections right away?
By default, Open MPI opens MPI connections between processes
in a "lazy" fashion - i.e., the connections are only opened when the
MPI process actually attempts to send a message to another process for
the first time. This is done since (a) Open MPI has no idea what
connections an application process will really use, and (b) creating
the connections takes time. Once the connection is established, it
remains "connected" until one of the two connected processes
terminates, so the creation time cost is paid only once.
Applications that require a fully connected topology, however, can see
improved startup time if they automatically "pre-connect" all their
processes during MPI_Init. Accordingly, Open MPI provides the MCA
parameter "mpi_preconnect_mpi" which directs Open MPI to establish a
"mostly" connected topology during MPI_Init (note that this MCA
parameter used to be named "mpi_preconnect_all" prior to Open MPI
v1.5; in v1.5, it was deprecated and replaced with
"mpi_preconnect_mpi"). This is accomplished in a somewhat scalable
fashion to help minimize startup time.
Users can set this parameter in two ways:
- in the environment as OMPI_MCA_mpi_preconnect_mpi=1
- on the command line as mpirun -mca mpi_preconnect_mpi 1
See this FAQ entry
for more details on how to set MCA parameters.
36. What kind of CUDA support exists in Open MPI?
See these two FAQ categories:
37. What are the Libfabric (OFI) components in Open MPI?
Open MPI has two main components for Libfabric (a.k.a., OFI) communications:
- ofi MTL: Available since Open MPI v1.10, this component is used with the
cm PML and is used for two-sided MPI communication (e.g., MPI_SEND and MPI_RECV).
The ofi MTL requires that the Libfabric provider support reliable datagrams with
ordered tagged messaging (specifically: FI_EP_RDM endpoints, FI_TAGGED
capabilities, and FI_ORDER_SAS ordering).
- ofi BTL: Available since Open MPI v4.0.0, this component is used for
one-sided MPI communications (e.g., MPI_PUT). The ofi BTL requires that
the Libfabric provider support reliable datagrams, RMA and atomic operations,
and remote atomic completion notifications (specifically: FI_EP_RDM endpoints,
FI_RMA and FI_ATOMIC capabilities, and FI_DELIVERY_COMPLETE op flags).
See each Libfabric provider man page (e.g., fi_sockets(7)) to understand which
provider will work for each of the above-listed Open MPI components. Some
providers must be used together with one of the Libfabric utility providers;
for example, the verbs provider needs to be paired with the utility provider
ofi_rxm to provide reliable datagram endpoint support (verbs;ofi_rxm).
Both components have MCA parameters to specify the Libfabric provider(s) that
will be included/excluded in the selection process. For example:
```shell
shell$ mpirun --mca pml cm --mca mtl ofi --mca mtl_ofi_provider_include psm2 mpi_hello
```
In addition, each component has its own specific parameters; run
ompi_info with --param and --level 9 to see the full list. For
example:
```shell
shell$ ompi_info --param mtl ofi --level 9
```
For more information, refer to the
libfabric.org web site.
38. How can Open MPI communicate with Intel Omni-Path Architecture (OPA) based devices?
Currently, Open MPI supports the PSM2 MTL and the OFI MTL (using the PSM2 OFI
provider), either of which can be used to communicate with the
Intel Omni-Path (OPA) software stack.
For guidelines on tuning run-time characteristics when using OPA devices, please
refer to this FAQ entry.