Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Problem running an mpi application on nodes with more than one interface
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2012-02-17 11:19:49


+1

It is definitely bad Linux practice to have 2 ports on the same subnet.

If you still want that configuration, however (e.g., you have some conditions in your environment that make it workable), you can make Open MPI only use one or more of those interfaces via the btl_tcp_if_include (or btl_tcp_if_exclude) and oob_tcp_if_include (or oob_tcp_if_exclude) parameters.

For example:

   mpirun --mca btl_tcp_if_include eth21,eth23 \
     --mca oob_tcp_if_include eth21 ...

"BTL" is the plugin type for MPI communication, so providing multiple interfaces there is a good idea (i.e., OMPI will stripe large messages over both ports). "OOB" is OMPI's bootstrap system (i.e., it's used for startup and shutdown), so limiting it to 1 port is fine -- it doesn't have high bandwidth requirements.
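
As a sketch of an alternative to the command-line flags above: Open MPI also reads MCA parameters from environment variables named OMPI_MCA_<parameter>. The interface names below are simply the ones from the example, and the host file and binary in the commented launch line (host1, ./Sendrecv) are taken from the original post; treat all of them as placeholders for your own setup.

```shell
# Same settings as the mpirun example, expressed as environment variables.
# Interface names are copied from the example above; adjust them for your nodes.
export OMPI_MCA_btl_tcp_if_include=eth21,eth23   # MPI traffic: both ports
export OMPI_MCA_oob_tcp_if_include=eth21         # startup/shutdown: one port

# Then launch without the --mca flags, e.g. (placeholders from the original post):
# mpirun --hostfile host1 -np 4 ./Sendrecv
```

The same name = value pairs can also be placed in $HOME/.openmpi/mca-params.conf (for example, "btl_tcp_if_include = eth21,eth23") if you want them applied to every run.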

I suppose we could make Open MPI print a warning in the case where it detects multiple IP interfaces on the same subnet (because it may or may not work); I'll file a feature enhancement.
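
Until something like that exists, a rough way to check a node yourself is sketched below. This is not Open MPI code; shared_subnets is a hypothetical helper that reads "interface address/prefix" pairs and reports any IPv4 network with more than one interface on it. The /24 prefix lengths in the sample input are an assumption (the original post does not give netmasks).

```shell
# shared_subnets: hypothetical helper, not part of Open MPI.
# Reads "iface addr/prefix" lines on stdin and prints each IPv4 network
# that has more than one interface attached to it.
shared_subnets() {
    awk '
    NF < 2 { next }                       # skip blank or incomplete lines
    {
        split($2, cidr, "/")              # cidr[1] = address, cidr[2] = prefix length
        split(cidr[1], o, ".")            # the four octets
        ip = ((o[1] * 256 + o[2]) * 256 + o[3]) * 256 + o[4]
        block = 2 ^ (32 - cidr[2])        # number of addresses in the network
        net = ip - (ip % block)           # clear the host bits
        key = sprintf("%d.%d.%d.%d/%d", int(net / 16777216) % 256, int(net / 65536) % 256, int(net / 256) % 256, net % 256, cidr[2])
        members[key] = (key in members) ? (members[key] "," $1) : $1
        count[key]++
    }
    END {
        for (key in members)
            if (count[key] > 1)
                print key ": " members[key]
    }'
}

# Sample input modelled on the Denver node (prefix lengths assumed to be /24):
printf '%s\n' 'eth23 10.3.1.1/24' 'eth24 10.3.1.2/24' 'lo 127.0.0.1/8' | shared_subnets
# prints: 10.3.1.0/24: eth23,eth24

# On a live Linux node, the input can come from iproute2:
# ip -o -4 addr show | awk '{print $2, $4}' | shared_subnets
```

Any line of output means two or more interfaces share a subnet; those are candidates for btl_tcp_if_exclude or for moving to separate subnets.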

On Feb 17, 2012, at 9:34 AM, Rolf vandeVaart wrote:

> Open MPI cannot handle having two interfaces on a node on the same subnet. I believe it has to do with our matching code when we try to match up a connection.
> The result is a hang as you observe. I also believe it is not good practice to have two interfaces on the same subnet.
> If you put them on different subnets, things will work fine and communication will stripe over the two of them.
>
> Rolf
>
>
> From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On Behalf Of Richard Bardwell
> Sent: Friday, February 17, 2012 5:37 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] Problem running an mpi application on nodes with more than one interface
>
> I had exactly the same problem.
> Trying to run mpi between 2 separate machines, with each machine having
> 2 ethernet ports, causes really weird behaviour on the most basic code.
> I had to disable one of the ethernet ports on each of the machines
> and it worked just fine after that. No idea why, though!
>
> ----- Original Message -----
> From: Jingcha Joba
> To: users_at_[hidden]
> Sent: Thursday, February 16, 2012 8:43 PM
> Subject: [OMPI users] Problem running an mpi application on nodes with more than one interface
>
> Hello Everyone,
> This is my first post on the Open MPI forum.
> I am trying to run a simple program that does a Sendrecv between two nodes, each of which has two interface cards.
> Both nodes run RHEL 6 with Open MPI 1.4.4 on an 8-core Xeon processor.
> What I noticed is that when two or more interfaces are used on both nodes, MPI "hangs" while attempting to connect.
> These details might help:
> Node 1 - Denver has a single-port "A" card (eth21 - 25.192.xx.xx - which I use to ssh to that machine), and a double-port "B" card (eth23 - 10.3.1.1 & eth24 - 10.3.1.2).
> Node 2 - Chicago has the same single-port "A" card (eth19 - 25.192.xx.xx - again used for ssh) and a double-port "B" card (eth29 - 10.3.1.3 & eth30 - 10.3.1.4).
> My /etc/hosts looks like this:
> 25.192.xx.xx denver.xxx.com denver
> 10.3.1.1 denver.xxx.com denver
> 10.3.1.2 denver.xxx.com denver
> 25.192.xx.xx chicago.xxx.com chicago
> 10.3.1.3 chicago.xxx.com chicago
> 10.3.1.4 chicago.xxx.com chicago
> ...
> ...
> ...
> This is how I run it:
> mpirun --hostfile host1 --mca btl tcp,sm,self --mca btl_tcp_if_exclude eth21,eth19,lo,virbr0 --mca btl_base_verbose 30 -np 4 ./Sendrecv
> I get a bunch of output from both chicago and denver saying that it has found components such as tcp, sm, and self, and then it hangs at:
> [denver.xxx.com:21682] btl: tcp: attempting to connect() to address 10.3.1.3 on port 4
> [denver.xxx.com:21682] btl: tcp: attempting to connect() to address 10.3.1.4 on port 4
> However, if I run the same program by excluding eth29 or eth30, then it works fine. Something like this:
> mpirun --hostfile host1 --mca btl tcp,sm,self --mca btl_tcp_if_exclude eth21,eth19,eth29,lo,virbr0 --mca btl_base_verbose 30 -np 4 ./Sendrecv
> My hostfile looks like this:
> [sshuser_at_denver Sendrecv]$ cat host1
> denver slots=2
> chicago slots=2
> I am not sure whether I need to provide anything else. If I do, please feel free to ask.
> thanks,
> --
> Joba
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
jsquyres_at_[hidden]