Open MPI User's Mailing List Archives

From: Brian Barrett (brbarret_at_[hidden])
Date: 2006-03-13 08:49:04


On Mar 13, 2006, at 8:38 AM, Michael Kluskens wrote:

> On Mar 11, 2006, at 1:00 PM, Jayabrata Chakrabarty wrote:
>> Hi, I have been looking for information on how to use multiple
>> Gigabit Ethernet interfaces for MPI communication.
>>
>> So far, what I have found out is that I have to use mca_btl_tcp.
>>
>> But what I wish to know is what IP address to assign to each
>> network interface. I also wish to know if there will be any change
>> in the format of the "hostfile".
>>
>> I have two Gigabit Ethernet interfaces on a cluster of 5 nodes at
>> present.
> It seems to me that an easier approach would be to bond the Ethernet
> interfaces together at the Unix/Linux level; then you have only
> one Ethernet interface to worry about in MPI. Our Opteron-based
> cluster shipped with that setup in SUSE Linux. When I rebuilt it
> with Debian Linux I configured the Ethernet interface bonding myself
> using references I found via Google. My master node has three
> physical interfaces and two IP addresses; all the rest have two
> physical interfaces and one IP address.
>
> I have not tested throughput to see if I chose the best type of
> bonding, but the choices were clear enough.

That is one option, yes. However, channel bonding can result in much
lower performance than letting Open MPI do the striping and
fragmentation. This is true for a couple of reasons. First, channel
bonding requires that packet delivery be in order, so it cannot
round-robin short-message delivery. While we may have to queue a
message temporarily, we can effectively use both NICs for short
messages. Second, our effective bandwidth for large messages should
be nearly N times the effective bandwidth of one NIC. This is rarely
the case for channel bonding, again because of ordering issues. We
don't even have to queue long-message fragments internally in the
multi-NIC case, as we can immediately write that part of the message
into user space (even if it's after a fragment we haven't received
yet).
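For what it's worth, a minimal sketch of how one might expose both
interfaces to the TCP BTL follows; the interface names eth0/eth1, the
process count, the hostfile name, and the application name are
illustrative assumptions, not taken from this thread:

   # restrict Open MPI to the TCP and self BTLs, and let the TCP BTL
   # use (and stripe across) both GigE interfaces
   mpirun --mca btl tcp,self \
          --mca btl_tcp_if_include eth0,eth1 \
          -np 10 -hostfile my_hosts ./my_mpi_app

The hostfile itself can stay a plain list of host names; giving each
NIC an address on its own subnet is generally the simplest way to
keep the interfaces distinguishable.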

Of course, if you also need more bandwidth for NFS, or for MPI
implementations that don't support multi-NIC usage, you might not
have an option other than channel bonding.
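For completeness, the kind of bonding setup Michael describes can be
brought up by hand roughly as follows on a 2.6-era Linux box with the
ifenslave tool; the round-robin mode, address, and interface names
are assumptions, not from this thread:

   # load the bonding driver in round-robin mode (values are examples)
   modprobe bonding mode=balance-rr miimon=100
   # give the bonded interface an address, then enslave both NICs
   ifconfig bond0 192.168.0.10 netmask 255.255.255.0 up
   ifenslave bond0 eth0 eth1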

Brian

-- 
   Brian Barrett
   Open MPI developer
   http://www.open-mpi.org/