Open MPI User's Mailing List Archives

From: Matteo Guglielmi (matteo.guglielmi_at_[hidden])
Date: 2007-02-12 12:54:21


This is the ifconfig output from the machine I use to submit the
parallel job:

### ifconfig output - master node ###

[root_at_lcbcpc02 ~]# ifconfig
eth0 Link encap:Ethernet HWaddr 00:15:17:10:53:C8
          inet addr:128.178.54.74 Bcast:128.178.54.255 Mask:255.255.255.0
          inet6 addr: fe80::215:17ff:fe10:53c8/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:11563938 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6670398 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:16562149093 (15.4 GiB) TX bytes:1312532185 (1.2 GiB)
          Base address:0x2020 Memory:c2820000-c2840000

eth1 Link encap:Ethernet HWaddr 00:15:17:10:53:C9
          inet addr:192.168.0.1 Bcast:192.168.0.255 Mask:255.255.255.0
          inet6 addr: fe80::215:17ff:fe10:53c9/64 Scope:Link
          UP BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
          Base address:0x2000 Memory:c2800000-c2820000

lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:468156 errors:0 dropped:0 overruns:0 frame:0
          TX packets:468156 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:500286061 (477.1 MiB) TX bytes:500286061 (477.1 MiB)

This is the ifconfig output from the "slave node":

### ifconfig output - slave node ###

[root_at_lcbcpc04 ~]# ifconfig
eth0 Link encap:Ethernet HWaddr 00:15:17:10:53:74
          inet addr:128.178.54.76 Bcast:128.178.54.255 Mask:255.255.255.0
          inet6 addr: fe80::215:17ff:fe10:5374/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:320264 errors:0 dropped:0 overruns:0 frame:0
          TX packets:151942 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:139280839 (132.8 MiB) TX bytes:82889237 (79.0 MiB)
          Base address:0x2020 Memory:c2820000-c2840000

eth1 Link encap:Ethernet HWaddr 00:15:17:10:53:75
          inet addr:192.168.0.1 Bcast:192.168.0.255 Mask:255.255.255.0
          inet6 addr: fe80::215:17ff:fe10:5375/64 Scope:Link
          UP BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
          Base address:0x2000 Memory:c2800000-c2820000

lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:2820 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2820 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2178053 (2.0 MiB) TX bytes:2178053 (2.0 MiB)
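
To make the two listings above easier to compare, here is a minimal sketch
that parses them and reports two things: non-loopback IPv4 addresses that
appear on both hosts, and interfaces that are not flagged RUNNING. It assumes
the older net-tools ifconfig layout shown above ("inet addr:" fields, indented
continuation lines). The helper names parse_ifconfig and shared_addresses are
hypothetical, and the sample strings are trimmed copies of the output above.
It only reports what the listings contain; it does not by itself establish the
cause of the errno=111 failures.

```python
import re

def parse_ifconfig(text):
    """Map interface name -> {"addr": str or None, "running": bool}."""
    interfaces = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # A non-indented line starts a new interface stanza.
            current = line.split()[0]
            interfaces[current] = {"addr": None, "running": False}
        if current is None:
            continue
        match = re.search(r"inet addr:(\d+\.\d+\.\d+\.\d+)", line)
        if match:
            interfaces[current]["addr"] = match.group(1)
        if "RUNNING" in line.split():
            interfaces[current]["running"] = True
    return interfaces

def shared_addresses(first, second):
    """Return non-loopback IPv4 addresses configured on both hosts."""
    def addrs(ifaces):
        return {i["addr"] for i in ifaces.values()
                if i["addr"] and not i["addr"].startswith("127.")}
    return sorted(addrs(first) & addrs(second))

# Trimmed copies of the two listings (master: lcbcpc02, slave: lcbcpc04).
MASTER = """\
eth0 Link encap:Ethernet HWaddr 00:15:17:10:53:C8
          inet addr:128.178.54.74 Bcast:128.178.54.255 Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth1 Link encap:Ethernet HWaddr 00:15:17:10:53:C9
          inet addr:192.168.0.1 Bcast:192.168.0.255 Mask:255.255.255.0
          UP BROADCAST MULTICAST MTU:1500 Metric:1
lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          UP LOOPBACK RUNNING MTU:16436 Metric:1
"""

SLAVE = """\
eth0 Link encap:Ethernet HWaddr 00:15:17:10:53:74
          inet addr:128.178.54.76 Bcast:128.178.54.255 Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth1 Link encap:Ethernet HWaddr 00:15:17:10:53:75
          inet addr:192.168.0.1 Bcast:192.168.0.255 Mask:255.255.255.0
          UP BROADCAST MULTICAST MTU:1500 Metric:1
lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          UP LOOPBACK RUNNING MTU:16436 Metric:1
"""

if __name__ == "__main__":
    master = parse_ifconfig(MASTER)
    slave = parse_ifconfig(SLAVE)
    print("addresses on both hosts:", shared_addresses(master, slave))
    for host, ifaces in (("master", master), ("slave", slave)):
        down = sorted(n for n, i in ifaces.items() if not i["running"])
        print(host, "interfaces not RUNNING:", down)
```

On the listings above, this reports 192.168.0.1 as configured on eth1 of both
machines, and eth1 as not RUNNING on either one.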

Thanks Jeff!!!

Jeff Squyres wrote:
> I'm assuming that these are Linux hosts. If so, errno 111 is
> "connection refused", possibly meaning that there is still some
> firewall active or that the wrong interface is being used to establish
> connections between these machines.
>
> Can you send the output of "ifconfig" (might be /sbin/ifconfig on
> your machine?) from both machines?
>
>
> On Feb 11, 2007, at 3:45 PM, matteo.guglielmi_at_[hidden] wrote:
>
>
>> Since I've installed openmpi I cannot submit any job that uses cpus
>> from different machines.
>>
>> ### hostfile ###
>> lcbcpc02.epfl.ch slots=4 max-slots=4
>> lcbcpc04.epfl.ch slots=4 max-slots=4
>> ################
>>
>> ### error message ###
>> [matteo_at_lcbcpc02 TEST]$ mpirun --hostfile ~matteo/hostfile -np 8
>> /home/matteo/Software/NWChem/5.0/bin/nwchem ./nwchem.nw
>> [0,1,5][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:
>> 572:mca_btl_tcp_endpoint_complete_connect]
>> [0,1,6][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:
>> 572:mca_btl_tcp_endpoint_complete_connect]
>> connect() failed with errno=111
>> 6: lcbcpc04.epfl.ch len=16
>> [0,1,4][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:
>> 572:mca_btl_tcp_endpoint_complete_connect]
>> connect() failed with errno=111
>> 4: lcbcpc04.epfl.ch len=16
>> [0,1,7][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:
>> 572:mca_btl_tcp_endpoint_complete_connect]
>> connect() failed with errno=111
>> 7: lcbcpc04.epfl.ch len=16
>> connect() failed with errno=111
>> 5: lcbcpc04.epfl.ch len=16
>> #####################
>>
>> I did disable the firewall on both machines but I still get that
>> error message.
>>
>> Thanks,
>> MG.