
Subject: Re: [OMPI users] How does authentication between nodes work without password? (Newbie alert on)
From: Gus Correa (gus_at_[hidden])
Date: 2011-02-14 18:46:56


Tena Sakai wrote:
> Hi Kevin,
>
> Thanks for your reply.
> Dasher is physically located under my desk and vixen is in a
> secure data center.
>
>> does dasher have any network interfaces that vixen does not?
>
> No, I don't think so.
> Here is more definitive info:
> [tsakai_at_dasher Rmpi]$ ifconfig
> eth0 Link encap:Ethernet HWaddr 00:1A:A0:E1:84:A9
> inet addr:172.16.0.116 Bcast:172.16.3.255 Mask:255.255.252.0
> inet6 addr: fe80::21a:a0ff:fee1:84a9/64 Scope:Link
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:2347 errors:0 dropped:0 overruns:0 frame:0
> TX packets:1005 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:100
> RX bytes:531809 (519.3 KiB) TX bytes:269872 (263.5 KiB)
> Memory:c2200000-c2220000
>
> lo Link encap:Local Loopback
> inet addr:127.0.0.1 Mask:255.0.0.0
> inet6 addr: ::1/128 Scope:Host
> UP LOOPBACK RUNNING MTU:16436 Metric:1
> RX packets:74 errors:0 dropped:0 overruns:0 frame:0
> TX packets:74 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:0
> RX bytes:7824 (7.6 KiB) TX bytes:7824 (7.6 KiB)
>
> [tsakai_at_dasher Rmpi]$
>
> However, vixen has two ethernet interfaces:
> [tsakai_at_vixen Rmpi]$ cat moo
> [root_at_vixen ec2]# /sbin/ifconfig
> eth0 Link encap:Ethernet HWaddr 00:1A:A0:1C:00:31
> inet addr:10.1.1.2 Bcast:192.168.255.255 Mask:255.0.0.0
> inet6 addr: fe80::21a:a0ff:fe1c:31/64 Scope:Link
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:61913135 errors:0 dropped:0 overruns:0 frame:0
> TX packets:61923635 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:47832124690 (44.5 GiB) TX bytes:54515478860 (50.7 GiB)
> Interrupt:185 Memory:ea000000-ea012100
>
> eth1 Link encap:Ethernet HWaddr 00:1A:A0:1C:00:33
> inet addr:172.16.1.107 Bcast:172.16.3.255 Mask:255.255.252.0
> inet6 addr: fe80::21a:a0ff:fe1c:33/64 Scope:Link
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:5204431112 errors:0 dropped:0 overruns:0 frame:0
> TX packets:8935796075 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:371123590892 (345.6 GiB) TX bytes:13424246629869 (12.2 TiB)
> Interrupt:193 Memory:ec000000-ec012100
>
> lo Link encap:Local Loopback
> inet addr:127.0.0.1 Mask:255.0.0.0
> inet6 addr: ::1/128 Scope:Host
> UP LOOPBACK RUNNING MTU:16436 Metric:1
> RX packets:244169216 errors:0 dropped:0 overruns:0 frame:0
> TX packets:244169216 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:0
> RX bytes:1190976360356 (1.0 TiB) TX bytes:1190976360356 (1.0 TiB)
>
> [root_at_vixen ec2]#
>
> Please see the mail posting that follows this, my reply to Ashley,
> who nailed the problem precisely.
>
> Regards,
>
> Tena
>
>
> On 2/14/11 1:35 PM, "Kevin.Buckley_at_[hidden]"
> <Kevin.Buckley_at_[hidden]> wrote:
>
>> This probably shows my lack of understanding as to how Open MPI
>> negotiates connectivity between nodes when given a choice
>> of interfaces, but anyway:
>>
>> does dasher have any network interfaces that vixen does not?
>>
>> The scenario I am imagining would be that you ssh into dasher
>> from vixen using a "network" that both share; similarly, when
>> you mpirun from vixen, the network that Open MPI uses is constrained
>> by the interfaces that can be seen from vixen, so you are fine.
>>
>> However, when you are on dasher, mpirun sees another interface which
>> it takes a liking to, and so tries to use it; but that interface
>> is not available to vixen, so the Open MPI processes spawned there
>> terminate when they can't find that interface to talk back
>> to dasher's controlling process.
>>
>> I know that you are no longer working with VMs, but it's along those
>> lines that I was thinking: extra network interfaces that you assume
>> won't be used but which are, and which could then be overcome by use
>> of an explicit
>>
>> --mca btl_tcp_if_exclude virbr0
>>
>> or some such construction (virbr0 used as an example here).
>>
>> Kevin
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

Hi Tena

They seem to be connected through the LAN 172.16.0.0/255.255.252.0,
with private IPs 172.16.0.116 (dasher, eth0) and
172.16.1.107 (vixen, eth1).
These addresses are probably what Open MPI is using.
Not quite a cluster, just machines on a LAN.
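
If Open MPI ever picks the wrong network on its own, you can pin it
to that LAN explicitly on the mpirun command line. A minimal sketch,
assuming an Open MPI recent enough to accept the CIDR form
(./a.out is just a placeholder program):

   mpirun --mca btl_tcp_if_include 172.16.0.0/22 \
          -np 2 -host vixen,dasher ./a.out

(The out-of-band channel has a matching oob_tcp_if_include parameter.)
With older versions that only take interface names, beware that the
names are matched on every node, so a list like eth0,eth1 would also
pull in vixen's 10.1.1.2 interface.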

Hence, I don't understand the lack of symmetry in the
firewall protection.
Either vixen's is too loose or dasher's is too tight, I'd venture to say.
Maybe dasher was installed later and just got whatever boilerplate
firewall comes with RedHat, CentOS, or Fedora.
If there is a gateway for this LAN somewhere with another firewall,
which is probably the case,
I'd guess it is OK to turn off dasher's firewall.
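
To see what dasher's firewall is actually doing, and to turn it off
for a quick test, the stock RedHat-family commands would be something
like (assuming the standard iptables service):

   [root_at_dasher ~]# /sbin/iptables -L -n         # list current rules
   [root_at_dasher ~]# /sbin/service iptables stop   # stop it for now
   [root_at_dasher ~]# /sbin/chkconfig iptables off  # keep it off on reboot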

Do you have Internet access from either machine?

Vixen has yet another private IP, 10.1.1.2 (eth0),
with a somewhat odd combination of broadcast address 192.168.255.255 (?)
and netmask 255.0.0.0.
Maybe vixen is/was part of another group of machines via this
other IP, a cluster perhaps?
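
Just to spell out why that combination looks wrong: the broadcast
address is the network address with all host bits set, so with
10.1.1.2 and mask 255.0.0.0 one would expect

   10.1.1.2  AND 255.0.0.0     -> 10.0.0.0         (network)
   10.0.0.0  OR  0.255.255.255 -> 10.255.255.255   (broadcast)

and 192.168.255.255 is not even inside 10.0.0.0/8, which suggests
it was set by hand at some point.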

What is in your ${TORQUE}/server_priv/nodes file?
IPs or names (vixen & dasher)?

Are they on a DNS server or do you resolve their names/IPs
via /etc/hosts?
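
If it is /etc/hosts, the entries on both machines would presumably
look something like this (addresses taken from your ifconfig output
above; adjust if I guessed wrong):

   172.16.0.116   dasher
   172.16.1.107   vixen

and "getent hosts vixen" will show what actually resolves on a
given machine.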

Hopefully vixen's name resolves to 172.16.1.107
("ping -R vixen" may tell).

Gus Correa