Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] How to select specific out of multiple interfaces for communication and support for heterogeneous fabrics
From: Michael Thomadakis (drmichaelt7777_at_[hidden])
Date: 2013-07-05 22:37:21


Great ... thanks. We will try it out as soon as the common backbone IB is
in place.

cheers
Michael

On Fri, Jul 5, 2013 at 6:10 PM, Ralph Castain <rhc_at_[hidden]> wrote:

> As long as the IB interfaces can communicate with each other, you should
> be fine.
>
> On Jul 5, 2013, at 3:26 PM, Michael Thomadakis <drmichaelt7777_at_[hidden]>
> wrote:
>
> Sorry about the MVAPICH2 reference :)
>
> All nodes are attached over a common 1GigE network. We wish, of course,
> that if a node pair is connected via a higher-speed fabric *as well* (IB
> FDR or 10GigE), that fabric would be used instead of the common 1GigE.
>
> One question: suppose that we use nodes having either FDR or QDR IB
> interfaces, connected to one common IB fabric and all defined over a
> common IP subnet. Will OpenMPI have any problem with this? Can MPI
> communication take place over this type of hybrid IB fabric? We already
> have a sub-cluster with QDR HCAs, and we are attaching it to an IB fabric
> with an FDR "backbone" and to another cluster with FDR HCAs.
>
> Do you think there may be some issue with this? The HCAs are FDR and QDR
> Mellanox devices and the switching is also over FDR Mellanox fabric.
> Mellanox claims that at the IB level this is doable (i.e., FDR link pairs
> talk to each other at FDR speeds and QDR link pairs at QDR).
>
> I guess if we use the RC connection types then it does not matter to
> OpenMPI.
>
> thanks ....
> Michael
>
>
>
>
> On Fri, Jul 5, 2013 at 4:59 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>
>> I can't speak for MVAPICH; you probably need to ask them about this
>> scenario. OMPI will automatically select whichever available transport
>> can reach the intended process. This requires that each communicating
>> pair of processes has access to at least one common transport.
>>
>> So if a process that is on a node with only 1G-E wants to communicate
>> with another process, then the node where that other process is running
>> must also have access to a compatible Ethernet interface (1G can talk to
>> 10G, so they can have different capabilities) on that subnet (or on a
>> subnet that knows how to route to the other one). If both nodes have 10G-E
>> as well as 1G-E interfaces, then OMPI will automatically take the 10G
>> interface as it is the faster of the two.
>>
>> Note this means that if a process is on a node that only has IB and
>> wants to communicate with a process on a node that only has 1G-E, then
>> the two processes cannot communicate.
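
The pairing rule described above can be sketched in a few lines of Python. This is only an illustrative model, not Open MPI's actual selection code: the function and table names are hypothetical, the speeds are rough nominal 4x figures used purely for ranking, and capping a mixed QDR/FDR pair at the slower rate is an assumption.

```python
# Illustrative model only; NOT how Open MPI is implemented internally.
# Rough nominal link rates in Gb/s, used here only to rank transports.
NOMINAL_GBPS = {"1GigE": 1, "10GigE": 10, "IB-QDR": 40, "IB-FDR": 56}


def family(transport):
    """Group interfaces into families that can talk to each other."""
    return "ib" if transport.startswith("IB") else "ethernet"


def best_common_transport(node_a, node_b):
    """Return (family, gbps) for the fastest family both nodes share.

    Returns None when the two nodes have no transport family in common,
    in which case processes on them cannot communicate.
    """
    best = None
    for fam in ("ib", "ethernet"):
        a = [NOMINAL_GBPS[t] for t in node_a if family(t) == fam]
        b = [NOMINAL_GBPS[t] for t in node_b if family(t) == fam]
        if a and b:
            # Each side uses its fastest interface in this family;
            # the slower end is assumed to cap the pair's rate.
            rate = min(max(a), max(b))
            if best is None or rate > best[1]:
                best = (fam, rate)
    return best
```

For example, `best_common_transport({"IB-QDR"}, {"1GigE"})` returns `None`, matching the IB-only versus 1G-E-only case, while two nodes that each have 10GigE and 1GigE resolve to the 10G link.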
>>
>> HTH
>> Ralph
>>
>> On Jul 5, 2013, at 2:34 PM, Michael Thomadakis <drmichaelt7777_at_[hidden]>
>> wrote:
>>
>> Hello OpenMPI
>>
>> We are seriously considering deploying OpenMPI 1.6.5 for production (and
>> 1.7.2 for testing) on HPC clusters that consist of nodes with *different
>> types of networking interfaces*.
>>
>>
>> 1) Interface selection
>>
>> We are using OpenMPI 1.6.5 and were wondering how one would go about
>> selecting *at run time* which networking interface to use for MPI
>> communications when IB, 10GigE and 1GigE are all present.
>>
>> This issue arises in a cluster with nodes that are equipped with
>> different types of interfaces:
>>
>> *Some* have IB (QDR or FDR) as well as 10-GigE and 1-GigE. Others have
>> *only* 10-GigE and 1-GigE, and still others have only 1-GigE.
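
As a hedged illustration of run-time selection: Open MPI exposes MCA parameters such as `btl` (which byte-transfer layers to use) and `btl_tcp_if_include` (which IP interfaces the TCP BTL may use), and these can be passed to `mpirun` with `--mca`. The thread itself does not spell these out, so treat the sketch below as an assumption-laden example. The helper name, the interface name `eth2`, and the binary `./my_app` are hypothetical placeholders.

```python
def mpirun_command(num_procs, app, btls, tcp_if_include=None):
    """Build an mpirun argument list that restricts transports via MCA params.

    btls: BTL component names, e.g. ["tcp", "sm", "self"].
    tcp_if_include: optional list of IP interface names for the TCP BTL.
    """
    cmd = ["mpirun", "-np", str(num_procs), "--mca", "btl", ",".join(btls)]
    if tcp_if_include:
        cmd += ["--mca", "btl_tcp_if_include", ",".join(tcp_if_include)]
    cmd.append(app)
    return cmd


# Hypothetical example: force TCP over a 10GigE interface assumed to be "eth2".
print(" ".join(mpirun_command(16, "./my_app", ["tcp", "sm", "self"], ["eth2"])))
```

The interface names on a real cluster will differ, so check them on each node before restricting the TCP BTL this way.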
>>
>>
>> 2) OpenMPI 1.6.5 level of support for Heterogeneous Fabric
>>
>> Can OpenMPI support running an MPI application using a mix of nodes with
>> all of the above networking interface combinations?
>>
>> 2.a) Can the same MPI code (SPMD or MPMD) have a subset of its ranks
>> run on nodes with QDR IB and another subset on FDR IB simultaneously? These
>> are Mellanox QDR and FDR HCAs.
>>
>> Mellanox mentioned to us that they support both QDR and FDR HCAs attached
>> to the same IB subnet. Do you think MVAPICH2 will have any issue with this?
>>
>> 2.b) Can the same MPI code (SPMD or MPMD) have a subset of its ranks run
>> on nodes with IB and another subset over 10GigE simultaneously?
>>
>> That is, imagine nodes I1, I2, ..., IN having, say, QDR HCAs and nodes
>> G1, G2, ..., GM having only 10GigE interfaces. Could we have the same MPI
>> application run across both types of nodes?
>>
>> Or should there be, say, 2 communicators, with one of them explicitly
>> overlaid on an IB-only subnet and the other on a 10GigE-only subnet?
>>
>>
>> Please let me know if any of the above is unclear.
>>
>> Thank you much
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users