Open MPI User's Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

From: Matteo Guglielmi (matteo.guglielmi_at_[hidden])
Date: 2007-02-12 12:54:21


This is the ifconfig output from the machine I use to submit the
parallel job:

### ifconfig output - master node ###

[root_at_lcbcpc02 ~]# ifconfig
eth0 Link encap:Ethernet HWaddr 00:15:17:10:53:C8
          inet addr:128.178.54.74 Bcast:128.178.54.255 Mask:255.255.255.0
          inet6 addr: fe80::215:17ff:fe10:53c8/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:11563938 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6670398 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:16562149093 (15.4 GiB) TX bytes:1312532185 (1.2 GiB)
          Base address:0x2020 Memory:c2820000-c2840000

eth1 Link encap:Ethernet HWaddr 00:15:17:10:53:C9
          inet addr:192.168.0.1 Bcast:192.168.0.255 Mask:255.255.255.0
          inet6 addr: fe80::215:17ff:fe10:53c9/64 Scope:Link
          UP BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
          Base address:0x2000 Memory:c2800000-c2820000

lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:468156 errors:0 dropped:0 overruns:0 frame:0
          TX packets:468156 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:500286061 (477.1 MiB) TX bytes:500286061 (477.1 MiB)

This is the ifconfig output from the "slave node":

### ifconfig output - slave node ###

[root_at_lcbcpc04 ~]# ifconfig
eth0 Link encap:Ethernet HWaddr 00:15:17:10:53:74
          inet addr:128.178.54.76 Bcast:128.178.54.255 Mask:255.255.255.0
          inet6 addr: fe80::215:17ff:fe10:5374/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:320264 errors:0 dropped:0 overruns:0 frame:0
          TX packets:151942 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:139280839 (132.8 MiB) TX bytes:82889237 (79.0 MiB)
          Base address:0x2020 Memory:c2820000-c2840000

eth1 Link encap:Ethernet HWaddr 00:15:17:10:53:75
          inet addr:192.168.0.1 Bcast:192.168.0.255 Mask:255.255.255.0
          inet6 addr: fe80::215:17ff:fe10:5375/64 Scope:Link
          UP BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
          Base address:0x2000 Memory:c2800000-c2820000

lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:2820 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2820 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2178053 (2.0 MiB) TX bytes:2178053 (2.0 MiB)
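
The two listings above can be compared mechanically. The following is a minimal sketch, assuming the older net-tools `ifconfig` layout shown in this message ("inet addr:" fields); the function names `ipv4_by_interface` and `shared_addresses` are hypothetical helpers, and the embedded text is a trimmed excerpt of the output above. It only reports IPv4 addresses configured on both hosts; whether such an overlap explains the connection failure is not established in this message.

```python
import re

# Matches the first line of an interface stanza, e.g. "eth0 Link encap:Ethernet ..."
IFACE_RE = re.compile(r"^(\S+)\s+Link encap:")
# Matches the IPv4 field of the old net-tools format, e.g. "inet addr:128.178.54.74"
ADDR_RE = re.compile(r"inet addr:(\d+\.\d+\.\d+\.\d+)")


def ipv4_by_interface(ifconfig_text):
    """Map interface name -> IPv4 address for old-style ifconfig output."""
    result = {}
    current = None
    for line in ifconfig_text.splitlines():
        iface = IFACE_RE.match(line)
        if iface:
            current = iface.group(1)
        addr = ADDR_RE.search(line)
        if addr and current is not None:
            result[current] = addr.group(1)
    return result


def shared_addresses(host_a, host_b, ignore=("lo",)):
    """Return sorted (ip, iface_a, iface_b) tuples for addresses present on both hosts."""
    shared = []
    for iface_a, ip_a in host_a.items():
        if iface_a in ignore:
            continue
        for iface_b, ip_b in host_b.items():
            if iface_b not in ignore and ip_a == ip_b:
                shared.append((ip_a, iface_a, iface_b))
    return sorted(shared)


# Trimmed excerpts of the ifconfig output from this message.
MASTER = """\
eth0 Link encap:Ethernet HWaddr 00:15:17:10:53:C8
          inet addr:128.178.54.74 Bcast:128.178.54.255 Mask:255.255.255.0
eth1 Link encap:Ethernet HWaddr 00:15:17:10:53:C9
          inet addr:192.168.0.1 Bcast:192.168.0.255 Mask:255.255.255.0
lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
"""

SLAVE = """\
eth0 Link encap:Ethernet HWaddr 00:15:17:10:53:74
          inet addr:128.178.54.76 Bcast:128.178.54.255 Mask:255.255.255.0
eth1 Link encap:Ethernet HWaddr 00:15:17:10:53:75
          inet addr:192.168.0.1 Bcast:192.168.0.255 Mask:255.255.255.0
lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
"""

master = ipv4_by_interface(MASTER)
slave = ipv4_by_interface(SLAVE)
print(shared_addresses(master, slave))
```

On the excerpts above this prints `[('192.168.0.1', 'eth1', 'eth1')]`: eth1 carries the same address on both nodes, and in the full listings neither eth1 shows the RUNNING flag or any traffic. If that overlap turns out to matter, Open MPI's TCP BTL can be restricted to a single interface with the `btl_tcp_if_include` MCA parameter (for example `--mca btl_tcp_if_include eth0`); that this resolves the error reported here is an assumption, not something confirmed in this message.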

Thanks Jeff!!!

Jeff Squyres wrote:
> I'm assuming that these are Linux hosts. If so, errno 111 is
> "connection refused", possibly meaning that a firewall is still
> active or that the wrong interface is being used to establish
> connections between these machines.
>
> Can you send the output of "ifconfig" (might be /sbin/ifconfig on
> your machine?) from both machines?
>
>
> On Feb 11, 2007, at 3:45 PM, matteo.guglielmi_at_[hidden] wrote:
>
>
>> Since I installed Open MPI, I cannot submit any job that uses CPUs
>> from different machines.
>>
>> ### hostfile ###
>> lcbcpc02.epfl.ch slots=4 max-slots=4
>> lcbcpc04.epfl.ch slots=4 max-slots=4
>> ################
>>
>> ### error message ###
>> [matteo_at_lcbcpc02 TEST]$ mpirun --hostfile ~matteo/hostfile -np 8 /home/matteo/Software/NWChem/5.0/bin/nwchem ./nwchem.nw
>> [0,1,5][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect]
>> [0,1,6][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect]
>> connect() failed with errno=111
>> 6: lcbcpc04.epfl.ch len=16
>> [0,1,4][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect]
>> connect() failed with errno=111
>> 4: lcbcpc04.epfl.ch len=16
>> [0,1,7][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect]
>> connect() failed with errno=111
>> 7: lcbcpc04.epfl.ch len=16
>> connect() failed with errno=111
>> 5: lcbcpc04.epfl.ch len=16
>> #####################
>>
>> I did disable the firewall on both machines but I still get that
>> error message.
>>
>> Thanks,
>> MG.
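
For reference, the errno=111 value in the error output can be decoded and the failing step reproduced outside of MPI. This is a minimal sketch, assuming Linux hosts (where 111 is ECONNREFUSED, as Jeff notes); `describe_errno` and `try_connect` are hypothetical helper names, and the port to probe is not given anywhere in this thread, so any value passed in is the caller's assumption.

```python
import errno
import os
import socket


def describe_errno(code):
    """Return 'NAME: message' for an errno value, using the local platform's tables."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


def try_connect(host, port, timeout=2.0):
    """Attempt a plain TCP connect; return 0 on success, else the errno from connect().

    Name-resolution failures raise socket.gaierror instead of returning a code.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port))


# Decode the value reported by mca_btl_tcp_endpoint_complete_connect above.
# The numeric value 111 is Linux-specific; other platforms number errnos differently.
print(describe_errno(111))
```

Running `try_connect` from one node against an address and listening port on the other, once per interface address, gives a quick indication of which address refuses connections. A result equal to `errno.ECONNREFUSED` means the peer answered but nothing accepted the connection on that address and port.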