Open MPI User's Mailing List Archives

Subject: [OMPI users] Problem running an mpi application on nodes with more than one interface
From: Jingcha Joba (pukkimonkey_at_[hidden])
Date: 2012-02-16 15:43:28


Hello everyone,
This is my first post on the Open MPI list.
I am trying to run a simple program that does a Sendrecv between two nodes,
each of which has two interface cards.
Both nodes run RHEL 6 with Open MPI 1.4.4 on an 8-core Xeon processor.
What I noticed is that when two or more interfaces are in use on both
nodes, MPI "hangs" while attempting to connect.
These details might help:
Node 1 - Denver has a single-port "A" card (*eth21* - 25.192.xx.xx, which
I use to ssh to that machine) and a dual-port "B" card (*eth23* -
10.3.1.1 and *eth24* - 10.3.1.2).
Node 2 - Chicago has the same single-port "A" card (*eth19* - 25.192.xx.xx,
again used for ssh) and a dual-port "B" card (*eth29* - 10.3.1.3 and
*eth30* - 10.3.1.4).
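
To make the addressing concrete, here is a minimal sketch that prints the network each B-card address would belong to. The /24 prefix length is an assumption, since the netmask is not stated above, and `prefix24` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: the /24 prefix length is an assumption, not something the post states.

# Print the first three octets of an IPv4 address, i.e. its network part under /24.
prefix24() {
    printf '%s\n' "$1" | cut -d. -f1-3
}

# B-card addresses as listed above (denver: eth23, eth24; chicago: eth29, eth30).
for addr in 10.3.1.1 10.3.1.2 10.3.1.3 10.3.1.4; do
    printf '%s is in %s.0/24\n' "$addr" "$(prefix24 "$addr")"
done
```

Under that assumption, all four B-card ports land in the same network, 10.3.1.0/24. With a different netmask the grouping could differ.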
My /etc/hosts looks like this:
25.192.xx.xx denver.xxx.com denver
10.3.1.1 denver.xxx.com denver
10.3.1.2 denver.xxx.com denver
25.192.xx.xx chicago.xxx.com chicago
10.3.1.3 chicago.xxx.com chicago
10.3.1.4 chicago.xxx.com chicago
...
...
...
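
The entries above map each hostname to three addresses. A small sketch that lists them per host follows; it works on a copy of the entries rather than the real file, and `addrs_for` is a hypothetical helper, not a system tool.

```shell
#!/bin/sh
# Entries copied from the listing above; the elided "xx.xx" octets are kept as-is.
hosts_sample='25.192.xx.xx denver.xxx.com denver
10.3.1.1 denver.xxx.com denver
10.3.1.2 denver.xxx.com denver
25.192.xx.xx chicago.xxx.com chicago
10.3.1.3 chicago.xxx.com chicago
10.3.1.4 chicago.xxx.com chicago'

# addrs_for NAME: print every address whose entry lists NAME (hypothetical helper).
addrs_for() {
    printf '%s\n' "$hosts_sample" |
        awk -v name="$1" '{ for (i = 2; i <= NF; i++) if ($i == name) { print $1; break } }'
}

addrs_for denver
addrs_for chicago
```

Each call prints three addresses: the ssh-facing one and the two B-card ports.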
This is how I run it:
mpirun --hostfile host1 --mca btl tcp,sm,self --mca btl_tcp_if_exclude eth21,eth19,lo,virbr0 --mca btl_base_verbose 30 -np 4 ./Sendrecv
I get a bunch of output from both chicago and denver saying that it has
found components such as tcp, sm, and self, and then it hangs at:
[denver.xxx.com:21682] btl: tcp: attempting to connect() to address 10.3.1.3 on port 4
[denver.xxx.com:21682] btl: tcp: attempting to connect() to address 10.3.1.4 on port 4
However, if I run the same program with eth29 or eth30 also excluded, it
works fine. For example, with eth29 added to the exclude list:
mpirun --hostfile host1 --mca btl tcp,sm,self --mca btl_tcp_if_exclude eth21,eth19,eth29,lo,virbr0 --mca btl_base_verbose 30 -np 4 ./Sendrecv
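
The two runs differ only in the exclude list. A minimal sketch that builds both command lines side by side is below; `build_cmd` is a hypothetical helper that only prints the command, and eth29 is appended at the end of the list here on the assumption that the order of entries does not matter.

```shell
#!/bin/sh
# Interfaces excluded in both runs (taken from the commands above).
base_exclude="eth21,eth19,lo,virbr0"

# build_cmd EXCLUDE_LIST: print the mpirun command line for a given exclude list.
# Hypothetical helper; it only prints the command and does not launch anything.
build_cmd() {
    printf 'mpirun --hostfile host1 --mca btl tcp,sm,self --mca btl_tcp_if_exclude %s --mca btl_base_verbose 30 -np 4 ./Sendrecv\n' "$1"
}

build_cmd "$base_exclude"          # the run reported to hang
build_cmd "$base_exclude,eth29"    # the run reported to work
```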
My hostfile looks like this:
[sshuser@denver Sendrecv]$ cat host1
denver slots=2
chicago slots=2
I am not sure whether I need to provide anything else. If I do, please
feel free to ask.
Thanks,

--
Joba