Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] How does authentication between nodes work without password? (Newbie alert on)
From: Tena Sakai (tsakai_at_[hidden])
Date: 2011-02-14 18:03:14


Hi Kevin,

Thanks for your reply.
Dasher is physically located under my desk and vixen is in a
secure data center.

> does dasher have any network interfaces that vixen does not?

No, I don't think so.
Here is more definitive info:
  [tsakai_at_dasher Rmpi]$ ifconfig
  eth0 Link encap:Ethernet HWaddr 00:1A:A0:E1:84:A9
            inet addr:172.16.0.116 Bcast:172.16.3.255 Mask:255.255.252.0
            inet6 addr: fe80::21a:a0ff:fee1:84a9/64 Scope:Link
            UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
            RX packets:2347 errors:0 dropped:0 overruns:0 frame:0
            TX packets:1005 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:100
            RX bytes:531809 (519.3 KiB) TX bytes:269872 (263.5 KiB)
            Memory:c2200000-c2220000

  lo Link encap:Local Loopback
            inet addr:127.0.0.1 Mask:255.0.0.0
            inet6 addr: ::1/128 Scope:Host
            UP LOOPBACK RUNNING MTU:16436 Metric:1
            RX packets:74 errors:0 dropped:0 overruns:0 frame:0
            TX packets:74 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:0
            RX bytes:7824 (7.6 KiB) TX bytes:7824 (7.6 KiB)

  [tsakai_at_dasher Rmpi]$

However, vixen has two ethernet interfaces:

  [tsakai_at_vixen Rmpi]$ cat moo
  [root_at_vixen ec2]# /sbin/ifconfig
  eth0 Link encap:Ethernet HWaddr 00:1A:A0:1C:00:31
            inet addr:10.1.1.2 Bcast:192.168.255.255 Mask:255.0.0.0
            inet6 addr: fe80::21a:a0ff:fe1c:31/64 Scope:Link
            UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
            RX packets:61913135 errors:0 dropped:0 overruns:0 frame:0
            TX packets:61923635 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:1000
            RX bytes:47832124690 (44.5 GiB) TX bytes:54515478860 (50.7 GiB)
            Interrupt:185 Memory:ea000000-ea012100

  eth1 Link encap:Ethernet HWaddr 00:1A:A0:1C:00:33
            inet addr:172.16.1.107 Bcast:172.16.3.255 Mask:255.255.252.0
            inet6 addr: fe80::21a:a0ff:fe1c:33/64 Scope:Link
            UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
            RX packets:5204431112 errors:0 dropped:0 overruns:0 frame:0
            TX packets:8935796075 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:1000
            RX bytes:371123590892 (345.6 GiB) TX bytes:13424246629869 (12.2 TiB)
            Interrupt:193 Memory:ec000000-ec012100

  lo Link encap:Local Loopback
            inet addr:127.0.0.1 Mask:255.0.0.0
            inet6 addr: ::1/128 Scope:Host
            UP LOOPBACK RUNNING MTU:16436 Metric:1
            RX packets:244169216 errors:0 dropped:0 overruns:0 frame:0
            TX packets:244169216 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:0
            RX bytes:1190976360356 (1.0 TiB) TX bytes:1190976360356 (1.0 TiB)

  [root_at_vixen ec2]#
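From the two ifconfig listings above one can check mechanically which interfaces actually share a subnet. A minimal bash sketch (the `same_subnet` helper is illustrative, not from this thread; the addresses and masks are copied from the output above):

```shell
#!/bin/bash
# Illustrative helper: do two IPv4 addresses fall in the same subnet
# for a given netmask? Compares each octet after AND-ing with the mask.
same_subnet() {
  local a1 a2 a3 a4 b1 b2 b3 b4 m1 m2 m3 m4
  IFS=. read -r a1 a2 a3 a4 <<< "$1"
  IFS=. read -r b1 b2 b3 b4 <<< "$2"
  IFS=. read -r m1 m2 m3 m4 <<< "$3"
  [ $((a1 & m1)) -eq $((b1 & m1)) ] && \
  [ $((a2 & m2)) -eq $((b2 & m2)) ] && \
  [ $((a3 & m3)) -eq $((b3 & m3)) ] && \
  [ $((a4 & m4)) -eq $((b4 & m4)) ]
}

# dasher eth0 vs vixen eth1, both masked 255.255.252.0: same /22 network
same_subnet 172.16.0.116 172.16.1.107 255.255.252.0 && echo "shared"

# dasher eth0 vs vixen eth0 (10.1.1.2): different networks
same_subnet 172.16.0.116 10.1.1.2 255.255.252.0 || echo "not shared"
```

So dasher's eth0 and vixen's eth1 sit on the same 172.16.0.0/22 network, while vixen's eth0 (10.1.1.2) is on a network dasher has no interface on.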

Please see the mail posting that follows this, my reply to Ashley,
who nailed the problem precisely.

Regards,

Tena

On 2/14/11 1:35 PM, "Kevin.Buckley_at_[hidden]"
<Kevin.Buckley_at_[hidden]> wrote:

>
> This probably shows my lack of understanding as to how OpenMPI
> negotiates the connectivity between nodes when given a choice
> of interfaces but anyway:
>
> does dasher have any network interfaces that vixen does not?
>
> The scenario I am imagining would be that you ssh into dasher
> from vixen using a "network" that both share and similarly, when
> you mpirun from vixen, the network that OpenMPI uses is constrained
> by the interfaces that can be seen from vixen, so you are fine.
>
> However when you are on dasher, mpirun sees another interface which
> it takes a liking to and so tries to use that, but that interface
> is not available to vixen so the OpenMPI processes spawned there
> terminate when they can't find that interface so as to talk back
> to dasher's controlling process.
>
> I know that you are no longer working with VMs but it's along those
> lines that I was thinking: extra network interfaces that you assume
> won't be used but which are and which could then be overcome by use
> of an explicit
>
> --mca btl_tcp_if_exclude virbr0
>
> or some such construction (virbr0 used as an example here).
>
> Kevin
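
[Kevin's suggestion above can be sketched concretely. The MCA parameters
`btl_tcp_if_include` and `btl_tcp_if_exclude` are real Open MPI knobs for
steering MPI traffic onto specific interfaces; the hostfile name and program
below are placeholders, and exact behavior (e.g. CIDR support) varies by
Open MPI version, so treat this as a hedged illustration rather than a
recipe for this cluster:]

```shell
# Pin the TCP BTL to the network both hosts share. Because the shared
# network is eth0 on dasher but eth1 on vixen, naming interfaces is
# fragile; a CIDR specification (where supported) sidesteps the mismatch.
mpirun --mca btl_tcp_if_include 172.16.0.0/22 \
       -np 2 --hostfile myhosts ./myprog

# Or exclude the interface the remote side cannot reach, as Kevin
# suggested (virbr0 is his example; loopback is excluded by default):
mpirun --mca btl_tcp_if_exclude lo,virbr0 \
       -np 2 --hostfile myhosts ./myprog
```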