Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Question about '--mca btl tcp,self'
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2014-03-17 10:52:06


To add on to what Ralph said:

1. There are two different message passing paths in OMPI:
   - "OOB" (out of band): used for control messages
   - "BTL" (byte transfer layer): used for MPI traffic
   (there are actually others, but these seem to be the two relevant ones for your setup)

2. If you don't specify which OOB interfaces to use, OMPI will (basically) just pick one. It doesn't really matter which one it uses: the OOB channel doesn't use much bandwidth, and it is used mostly during startup and shutdown.

The one exception to this is stdout/stderr routing. If your MPI app writes to stdout/stderr, this also uses the OOB path. So if you output a LOT to stdout, then the OOB interface choice might matter.

3. If you don't specify which MPI interfaces to use, OMPI will basically find the "best" set of interfaces and use those. IP interfaces are always ranked lower than OS-bypass interfaces (e.g., verbs/IB).

Or, as you noticed, you can give a comma-delimited list of BTLs to use. OMPI will then use only BTLs from that list (not necessarily all of them), and definitely no others. Each BTL typically has one or more additional parameters that specify which interfaces of its network type it should use. For example, btl_tcp_if_include tells the TCP BTL which interface(s) to use.

Also, note that you seem to have missed a BTL: sm (shared memory). sm is the preferred BTL for same-server communication. It is much faster than both the TCP loopback device and the verbs (i.e., "openib") BTL for same-server communication. OMPI excludes the TCP loopback device by default, BTW, which is probably why you got reachability errors when you specified "--mca btl tcp,self".

4. If you don't specify anything, OMPI usually picks the best thing for you. In your case, it'll probably be equivalent to:

  mpirun --mca btl openib,sm,self ...

And the control messages will flow across one of your IP interfaces.

5. If you want to be specific about which one it uses, you can specify oob_tcp_if_include. For example:

  mpirun --mca oob_tcp_if_include eth0 ...
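
The settings from points 3 and 5 can be collected in one place. The Python sketch below is a hypothetical helper: mca_settings, as_mpirun_args and as_environment are illustrative names, not part of Open MPI, and only the parameter names btl, btl_tcp_if_include and oob_tcp_if_include come from this thread. It also renders the same settings as OMPI_MCA_<param> environment variables, which Open MPI reads as an alternative to passing --mca on the command line; ompi_info can list the parameters your installation actually supports.

```python
"""Hypothetical helper that spells out the MCA settings discussed above.

Only the parameter names (btl, btl_tcp_if_include, oob_tcp_if_include)
come from the thread; the helper functions themselves are illustrative.
"""


def mca_settings(btls=None, btl_tcp_if=None, oob_tcp_if=None):
    """Return the chosen MCA parameters as a name -> value dict."""
    params = {}
    if btls:
        params["btl"] = ",".join(btls)  # comma-delimited list of allowed BTLs
    if btl_tcp_if:
        params["btl_tcp_if_include"] = btl_tcp_if  # interface(s) for MPI traffic over TCP
    if oob_tcp_if:
        params["oob_tcp_if_include"] = oob_tcp_if  # interface for control (OOB) traffic
    return params


def as_mpirun_args(params):
    """Render the settings as '--mca <name> <value>' command-line options."""
    args = ["mpirun"]
    for name, value in params.items():
        args += ["--mca", name, value]
    return args


def as_environment(params):
    """Render the same settings as OMPI_MCA_* environment variables."""
    return {"OMPI_MCA_" + name: value for name, value in params.items()}


if __name__ == "__main__":
    p = mca_settings(["tcp", "sm", "self"], btl_tcp_if="ib0", oob_tcp_if="eth0")
    print(" ".join(as_mpirun_args(p)))
```

Running it prints `mpirun --mca btl tcp,sm,self --mca btl_tcp_if_include ib0 --mca oob_tcp_if_include eth0`, i.e. TCP plus shared memory for MPI traffic, with MPI traffic restricted to ib0 and control traffic to eth0.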

Make sense?

On Mar 15, 2014, at 1:18 AM, Jianyu Liu <jerry_leo_at_[hidden]> wrote:

>> On Mar 14, 2014, at 10:16:34 AM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
>>
>>> On Mar 14, 2014, at 10:11 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>>>
>>>> 1. If '--mca btl tcp,self' is specified, which interface will the application run on: the GigE adapter, or the OpenFabrics interface in IP over IB mode (just like a high-performance GigE adapter)?
>>>
>>> Both: IP over IB looks just like an Ethernet adapter.
>>
>>
>> To be clear: the TCP BTL will use all TCP interfaces (regardless of the underlying physical transport). Your GigE adapter and your IP over IB adapter both present IP interfaces to the OS, and both support TCP. So the TCP BTL will use them, because it just sees the TCP/IP interfaces.
>
> Thanks for your kind input.
>
> Please check whether I have understood correctly:
>
> Assume there are two networks:
> Gigabit Ethernet
>
> eth0-renamed : 192.168.[1-22].[1-14] / 255.255.192.0
>
> InfiniBand network
>
> ib0 : 172.20.[1-22].[1-4] / 255.255.0.0
>
>
> 1. If '--mca btl tcp,self' is specified:
>
> The control information (such as setup and teardown) is routed over Gigabit Ethernet in TCP/IP mode.
> The MPI messages are routed over the InfiniBand network in IP over IB mode.
> On the same machine, the TCP loopback device will be used for passing control and MPI messages.
>
> 2. If '--mca btl tcp,self --mca btl_tcp_if_include ib0' is specified:
>
> Both the control information (such as setup and teardown) and the MPI messages are routed over the InfiniBand network in IP over IB mode.
> On the same machine, the TCP loopback device will be used for passing control and MPI messages.
>
>
> 3. If '--mca btl openib,self' is specified:
>
> The control information (such as setup and teardown) is routed over the InfiniBand network in IP over IB mode.
> The MPI messages are routed over the InfiniBand network in RDMA mode.
> On the same machine, the TCP loopback device will be used for passing control and MPI messages.
>
>
> 4. If no 'mca btl' parameters are specified:
>
> The control information (such as setup and teardown) is routed over Gigabit Ethernet in TCP/IP mode.
> The MPI messages are routed over the InfiniBand network in RDMA mode.
> On the same machine, the shared memory (sm) BTL will be used for passing control and MPI messages.
>
>
> I appreciate your kind input.
>
> Jianyu
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
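
The BTL side of the four scenarios in the question can be checked against the reply with a toy model. The Python sketch below is a simplification, not Open MPI's real selection logic. Following the reply, it assumes that self reaches only the calling process, sm only same-server peers, openib both same-server and remote peers, and tcp only remote peers (because the loopback device is excluded by default). The RANKING list is an assumption encoding "sm preferred on one server, OS-bypass above IP". It models MPI (BTL) traffic only; control messages travel over the separate OOB path.

```python
"""Toy model of per-peer BTL choice, applied to the scenarios in the question.

Simplifying assumptions drawn from the reply (not Open MPI's actual code):
- self reaches only the calling process; sm reaches same-server peers;
- openib reaches same-server and other-server peers;
- tcp reaches only other servers, since the loopback device is excluded;
- among allowed BTLs that can reach a peer, the best-ranked one is used.
"""

RANKING = ["self", "sm", "openib", "tcp"]  # assumed preference, best first
REACHES = {
    "self": {"same process"},
    "sm": {"same server"},
    "openib": {"same server", "other server"},
    "tcp": {"other server"},
}
PEERS = ("same process", "same server", "other server")


def plan(btl_param):
    """Map each peer location to the BTL used for MPI traffic (None = unreachable)."""
    allowed = set(btl_param.split(","))
    choice = {}
    for peer in PEERS:
        # First BTL in ranking order that is both allowed and able to reach the peer.
        choice[peer] = next(
            (b for b in RANKING if b in allowed and peer in REACHES[b]), None)
    return choice


if __name__ == "__main__":
    scenarios = {
        "1 and 2: tcp,self": "tcp,self",
        "3: openib,self": "openib,self",
        "4: default, per the reply": "openib,sm,self",
        "tcp with sm added": "tcp,sm,self",
    }
    for label, value in scenarios.items():
        print(label, plan(value))
```

Under these assumptions, "tcp,self" leaves same-server peers without a BTL, which matches the reachability errors mentioned in the reply, while adding sm (or using the default set) gives every peer a path. The btl_tcp_if_include setting in scenario 2 only changes which interface tcp uses, not which BTL is chosen, so scenarios 1 and 2 share one entry.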

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/