I can't speak for MVAPICH; you would need to ask them about this scenario. OMPI will automatically select whichever available transport can reach the intended process. This requires that each communicating pair of processes has access to at least one common transport.

So if a process that is on a node with only 1G-E wants to communicate with another process, then the node where that other process is running must also have access to a compatible Ethernet interface (1G can talk to 10G, so they can have different capabilities) on that subnet (or on a subnet that knows how to route to the other one). If both nodes have 10G-E as well as 1G-E interfaces, then OMPI will automatically take the 10G interface as it is the faster of the two.

Note that this means that if a process is on a node that has only IB and wants to communicate with a process on a node that has only 1G-E, then the two processes cannot communicate.
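
To make the rule concrete, here is a rough Python sketch of it. This is only a toy model, not OMPI's actual selection code: the transport names, the speed ranking (including ranking IB above Ethernet), and the helper functions are all made up for illustration, and it ignores the subnet/routing requirement mentioned above.

```python
# Toy model of the reachability rule; NOT Open MPI's actual selection logic.
# Transport names and the relative speed ranking are illustrative assumptions.
SPEED_RANK = {"ib": 3, "10ge": 2, "1ge": 1}

# 1G-E and 10G-E can talk to each other, so both belong to one "ethernet" family.
FAMILY = {"ib": "ib", "10ge": "ethernet", "1ge": "ethernet"}


def common_families(node_a, node_b):
    """Return the transport families present on both nodes."""
    return {FAMILY[t] for t in node_a} & {FAMILY[t] for t in node_b}


def can_communicate(node_a, node_b):
    """Two processes can talk only if their nodes share a transport family."""
    return bool(common_families(node_a, node_b))


def best_in_family(node, family):
    """Fastest interface on this node that belongs to the given family."""
    candidates = [t for t in node if FAMILY[t] == family]
    return max(candidates, key=SPEED_RANK.__getitem__)


def pick_interfaces(node_a, node_b):
    """Return (interface on a, interface on b), or None if there is no common transport."""
    shared = common_families(node_a, node_b)
    if not shared:
        return None
    pairs = [(best_in_family(node_a, f), best_in_family(node_b, f)) for f in shared]
    # Prefer the family whose slower endpoint is fastest.
    return max(pairs, key=lambda p: min(SPEED_RANK[p[0]], SPEED_RANK[p[1]]))
```

With this model, `pick_interfaces({"ib"}, {"1ge"})` yields `None` (the IB-only versus 1G-E-only case), while two nodes that each have `{"10ge", "1ge"}` end up paired on the 10G interfaces.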


On Jul 5, 2013, at 2:34 PM, Michael Thomadakis <drmichaelt7777@gmail.com> wrote:

Hello OpenMPI

We are seriously considering deploying OpenMPI 1.6.5 for production (and 1.7.2 for testing) on HPC clusters that consist of nodes with different types of networking interfaces.

1) Interface selection

We are using OpenMPI 1.6.5 and were wondering how one would select, at run time, which networking interface to use for MPI communications when IB, 10GigE and 1GigE are all present.

This issue arises in a cluster with nodes that are equipped with different types of interfaces:

Some have IB (QDR or FDR) as well as 10-GigE and 1-GigE. Others have only 10-GigE and 1-GigE, and others have only 1-GigE.

2) OpenMPI 1.6.5 level of support for Heterogeneous Fabric

Can OpenMPI support running an MPI application using a mix of nodes with all of the above networking interface combinations?

2.a) Can the same MPI code (SPMD or MPMD) have a subset of its ranks run on nodes with QDR IB and another subset on FDR IB simultaneously? These are Mellanox QDR and FDR HCAs.

Mellanox mentioned to us that they support both QDR and FDR HCAs attached to the same IB subnet. Do you think MVAPICH2 will have any issue with this?

2.b) Can the same MPI code (SPMD or MPMD) have a subset of its ranks run on nodes with IB and another subset over 10GigE simultaneously?

That is, imagine nodes I1, I2, ..., IN having, say, QDR HCAs and nodes G1, G2, ..., GM having only 10GigE interfaces. Could we have the same MPI application run across both types of nodes?

Or should there be, say, 2 communicators, with one of them explicitly overlaid on an IB-only subnet and the other on a 10GigE-only subnet?

Please let me know if any of the above is unclear.

Thank you much