Message: 2
Date: Mon, 13 Mar 2006 08:42:59 -0500
From: Brian Barrett <brbarret@open-mpi.org>
Subject: Re: [OMPI users] Using Multiple Gigabit Ethernet Interface
To: Open MPI Users <users@open-mpi.org>
Message-ID: <8F91AC34-6393-4173-84EF-5E2AC59BE6A9@open-mpi.org>
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed

On Mar 11, 2006, at 1:00 PM, Jayabrata Chakrabarty wrote:

> Hi, I have been looking for information on how to use multiple
> Gigabit Ethernet interfaces for MPI communication.
>
> So far, what I have found is that I have to use mca_btl_tcp.
>
> What I wish to know is what IP address to assign to each
> network interface. I also wish to know whether there will be any
> change in the format of the "hostfile".
>
> I have two Gigabit Ethernet interfaces on a cluster of 5 nodes at
> present.
  

Open MPI will use all available (and active) ethernet devices for MPI  
communication by default.  It does a relatively simplistic netmask  
comparison to prefer connections in the same subnet (so if host A has  
addresses 192.168.1.1/24 and 192.168.2.1/24 and host B has addresses  
192.168.1.2/24 and 192.168.2.2/24, OMPI will make a connection  
between the two 192.168.1 addresses and another between the two  
192.168.2 addresses).  If you have two separate switches for your two
networks (which I would think would give the best performance), make
sure that the two networks use IP address ranges in different
subnets.  Other than that, Open MPI will do the rest.
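
As a rough illustration of the netmask comparison described above, here
is a minimal Python sketch.  It is not Open MPI's actual code: the
function name pair_by_subnet and the address lists are hypothetical,
chosen only to mirror the host A / host B example, and the real TCP BTL
may weigh other factors when choosing connections.

```python
import ipaddress


def pair_by_subnet(local_addrs, remote_addrs):
    """Pair local and remote interface addresses that share a subnet.

    Hypothetical helper for illustration only; it mimics a simple
    netmask comparison and is not taken from the Open MPI source.
    """
    pairs = []
    for local in map(ipaddress.ip_interface, local_addrs):
        for remote in map(ipaddress.ip_interface, remote_addrs):
            # Two addresses match when address and netmask together
            # yield the same network, e.g. 192.168.1.0/24.
            if local.network == remote.network:
                pairs.append((str(local.ip), str(remote.ip)))
    return pairs


# Addresses from the example above (host A and host B).
host_a = ["192.168.1.1/24", "192.168.2.1/24"]
host_b = ["192.168.1.2/24", "192.168.2.2/24"]

for local_ip, remote_ip in pair_by_subnet(host_a, host_b):
    print(local_ip, "<->", remote_ip)
```

If both networks shared one subnet (say, everything in 192.168.1.0/24),
every local address would match every remote address in this sketch,
which is why keeping the two networks in different subnets matters.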

In Open MPI, the hostfile is completely independent of the MPI  
communication network names, so no change is needed there.

I believe (but I could be wrong) that there was an issue with  
multiple TCP networks in 1.0.1.  I believe this might have been  
resolved in our upcoming 1.0.2 release.  You may want to try one of  
the 1.0.2 pre-releases if you run into trouble with the 1.0.1 release.

Hope this helps,

Brian


-- 
Brian Barrett
Open MPI developer
http://www.open-mpi.org/

------------------------------

Dear Brian,

I have the same setup as Mr. Chakrabarty, with 16 nodes, OSCAR 4.2.1
beta 4, and two Gigabit Ethernet cards, with two switches (one 16-port
and one 24-port), one smart and the other managed.

I use DHCP to assign the IP addresses for one Ethernet card (these
range from 192.168.1.1 to 192.168.1.16) and static IP addresses for the
other NIC (192.168.5.1 to 192.168.5.16).  On the first network, the MTU
is 9000 for both the NICs and the switch.  On the second, the MTU is
1500 for both the switch and the NICs, as that switch cannot go beyond
an MTU of 1500.

Using the -mca btl tcp switch, with the 192.168.1.x NICs included and
the 192.168.5.x NICs excluded by the switches
-mca btl_tcp_if_include eth1 (MTU=9000) and
-mca btl_tcp_if_exclude eth0 (MTU=1500), I get an HPL performance of
approximately 28.3 gigaflops with both Open MPI and MPICH2.

But, as you say above, if I include both gigabit cards with the switch
-mca btl_tcp_if_include eth0,eth1, using Open MPI 1.1 (beta) or 1.0.1,
the performance should increase.  Instead, for the same N and NB in
HPL, I get a slight performance decrease of about 0.5 to 1 gigaflop.

The hostfile is simply a1, a2 ... a16, using OSCAR's DNS to resolve the
domain names.

Why is there a performance decrease?

Regards,
Allan Menezes