Table of contents:
- What Myrinet-based components does Open MPI have?
- How do I specify to use the Myrinet GM network for MPI messages?
- How do I specify to use the Myrinet MX network for MPI messages?
- But wait -- I also have a TCP network. Do I need to explicitly disable the TCP BTL?
- How do I know what MCA parameters are available for tuning MPI performance?
- I'm experiencing a problem with Open MPI on my Myrinet-based network; how do I troubleshoot and get help?
- How do I adjust the MX first fragment size? Are there constraints?
|1. What Myrinet-based components does Open MPI have?|
Some versions of Open MPI support both GM and MX for MPI
communications.
| Open MPI series    | GM supported | MX supported      |
|                    |              | Yes (BTL and MTL) |
| v1.3 / v1.4 series |              | Yes (BTL and MTL) |
| v1.5 / v1.6 series |              | Yes (BTL and MTL) |
| v1.7 / v1.8 series |              | Yes (MTL only)    |
| v1.9 and beyond    |              |                   |
|2. How do I specify to use the Myrinet GM network for MPI messages?|
In general, you specify that the gm BTL component should be used.
However, note that you should also specify that the self BTL
component should be used. self is for loopback communication (i.e.,
when an MPI process sends to itself). This is technically a different
communication channel than Myrinet. For example:
shell$ mpirun --mca btl gm,self ...
Failure to specify the self BTL may result in Open MPI being unable
to complete send-to-self scenarios (meaning that your program will run
fine until a process tries to send to itself).
To use Open MPI's shared memory support for on-host communication
instead of GM's shared memory support, simply include the sm BTL
component. For example:
shell$ mpirun --mca btl gm,sm,self ...
Finally, note that if the gm component is available at run time,
Open MPI should automatically use it by default (ditto for sm).
Hence, it's usually unnecessary to specify these options on the
mpirun command line. They are typically only used when you want to be
absolutely positively definitely sure to use the specific BTL.
|3. How do I specify to use the Myrinet MX network for MPI messages?|
As of version 1.2, Open MPI has two different components to support
Myrinet MX, the mx BTL and the mx MTL, only one of which can be used
at a time. Prior versions only have the mx BTL.
If available, the mx BTL is used by default. However, to be sure it
is selected, you can specify it. Note that you should also specify the
self BTL component (for loopback communication) and the sm BTL
component (for on-host communication). For example:
shell$ mpirun --mca btl mx,sm,self ...
To use the mx MTL component, it must be specified explicitly. Also,
you must use the cm PML component. For example:
shell$ mpirun --mca mtl mx --mca pml cm ...
Note that one cannot use both the mx MTL and the mx BTL components
at once. Deciding which to use largely depends on the application being
run.
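As a rough illustration, the component choices described in this entry
and in the GM entry above can be wrapped in a small shell helper. This
is only a sketch: the function name mca_args, its transport labels
(gm, mx-btl, mx-mtl), and the program name ./my_mpi_app are made up for
this example and are not part of Open MPI. The script only prints a
command line; it does not launch anything.

```shell
#!/bin/sh
# mca_args TRANSPORT
# Prints the --mca arguments shown in this FAQ for one transport.
# The transport labels are hypothetical names chosen for this sketch.
mca_args() {
    case "$1" in
        gm)     echo "--mca btl gm,sm,self" ;;       # GM BTL + shared memory + loopback
        mx-btl) echo "--mca btl mx,sm,self" ;;       # MX BTL + shared memory + loopback
        mx-mtl) echo "--mca mtl mx --mca pml cm" ;;  # MX MTL requires the cm PML
        *)      echo "unknown transport: $1" >&2
                return 1 ;;
    esac
}

# Dry run: print the command instead of launching it.
# ./my_mpi_app is a placeholder program name.
echo "mpirun $(mca_args mx-mtl) -np 4 ./my_mpi_app"
```

Keeping the MTL and BTL choices in separate branches mirrors the rule
that the two mx components cannot be used at once.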
|4. But wait -- I also have a TCP network. Do I need to explicitly disable the TCP BTL?|
No. See this FAQ entry for more details.
|5. How do I know what MCA parameters are available for tuning MPI performance?|
The ompi_info command can display all the parameters available for
the gm and mx BTL components and the mx MTL component:
# Show the gm BTL parameters
shell$ ompi_info --param btl gm
# Show the mx BTL parameters
shell$ ompi_info --param btl mx
# Show the mx MTL parameters
shell$ ompi_info --param mtl mx
|6. I'm experiencing a problem with Open MPI on my Myrinet-based network; how do I troubleshoot and get help?|
In order for us to help you, it is most helpful if you can run a few
steps before sending an e-mail, both to perform some basic
troubleshooting and to provide us with enough information about your
environment to help you. Please include answers to the following
questions in your e-mail:
- Which Myricom software stack are you running: GM or MX? Which
version?
- Are you using "fma", the "gm_mapper", or the "mx_mapper"?
- If running GM, include the output from running the
from a known "good" node and a known "bad" node.
If running MX, include the output from running mx_info from a known
"good" node and a known "bad" node.
What are the contents of the file
- Is the "Map version" value from this output the same across all
nodes?
- NOTE: If the map version is not the same, ensure that you are not
running a mixture of FMA on some nodes and the mapper on others. Also
check the connectivity of nodes that seem to have an inconsistent map
version.
Gather up this information and see this page about how to submit a
help request to the user's mailing list.
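For MX users, collecting the mx_info output from a "good" and a "bad"
node could be scripted along the following lines. This is a sketch
under several assumptions that are not from this FAQ: the nodes are
reachable with ssh, mx_info is on the remote PATH, and the function
name collect_mx_info, the REMOTE_SHELL variable, and the node names in
the usage comment are all hypothetical.

```shell
#!/bin/sh
# collect_mx_info OUTDIR NODE...
# Runs mx_info on each node and saves the output to
# OUTDIR/mx_info.NODE.txt so it can be attached to a help request.
# REMOTE_SHELL is a hypothetical override; it defaults to ssh.
collect_mx_info() {
    outdir=$1
    shift
    mkdir -p "$outdir" || return 1
    for node in "$@"; do
        "${REMOTE_SHELL:-ssh}" "$node" mx_info \
            > "$outdir/mx_info.$node.txt" 2>&1 \
            || echo "warning: mx_info failed on $node" >&2
    done
}

# Example usage (node names are placeholders):
#   collect_mx_info ./myrinet-report good-node bad-node
```

Saving one file per node makes it easy to compare the output from the
two nodes side by side before sending it along.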
|7. How do I adjust the MX first fragment size? Are there constraints?|
The MX library limits the maximum message fragment size for
both on-node and off-node messages. As of MX v1.0.3, the inter-node
maximum fragment size is 32k, and the intra-node maximum fragment size
is 16k -- fragments sent larger than these sizes will fail.
Open MPI automatically fragments large messages; it currently limits
its first fragment size on MX networks to the lower of these two
values -- 16k. As such, increasing the value of the MCA parameter
btl_mx_first_frag_size larger than 16k may cause failures in
some cases (i.e., when using MX to send large messages to processes on
the same node); it will cause failures in all cases if it is set above
32k.
Note that this only affects the first fragment of messages; later
fragments do not have this size restriction. The MCA parameter
btl_mx_max_send_size can be used to vary the maximum size of
later fragments.
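The constraints on the first fragment size can be checked with a few
lines of shell before a value is passed to mpirun. This is a sketch
only: it assumes that "16k" and "32k" mean 16384 and 32768 bytes and
that btl_mx_first_frag_size is given in bytes, and the function name
classify_first_frag_size and its result labels are made up for this
example.

```shell
#!/bin/sh
# Limits quoted above for MX v1.0.3, ASSUMING "k" means 1024 bytes.
INTRA_NODE_MAX=$((16 * 1024))   # 16k: fragments between processes on one node
INTER_NODE_MAX=$((32 * 1024))   # 32k: fragments between different nodes

# classify_first_frag_size BYTES
# Prints "ok" (within both limits), "may-fail" (too large for same-node
# sends only), or "will-fail" (too large for every send).
classify_first_frag_size() {
    size=$1
    if [ "$size" -le "$INTRA_NODE_MAX" ]; then
        echo "ok"
    elif [ "$size" -le "$INTER_NODE_MAX" ]; then
        echo "may-fail"
    else
        echo "will-fail"
    fi
}

classify_first_frag_size 16384   # at the intra-node limit
classify_first_frag_size 24576   # between the two limits
classify_first_frag_size 65536   # above the inter-node limit
```

A value that is classified as ok could then be set on the command
line, again assuming the parameter takes a byte count:
shell$ mpirun --mca btl mx,sm,self --mca btl_mx_first_frag_size 16384 ...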