Open MPI User's Mailing List Archives

From: Charles Wright (charles_at_[hidden])
Date: 2006-03-16 16:32:53


Thanks for the tip.

I see that both number 1 and 2 are true.
Open MPI is insisting on using my eth0 (I know this from watching the
firewall log on the node it is trying to reach).

This is despite the fact that the first DNS entry points to eth1;
normally that is all PBS needs to do the right thing and use the
network I prefer.

OK, so I see there are some options to include or exclude interfaces.

However, mpiexec is ignoring my requests.
I tried it two ways. Neither worked. The firewall rejects traffic coming
into the 1.0.x.x network in both cases.

/opt/asn/apps/openmpi-1.0.1/bin/mpiexec --gmca btl_tcp_if_include eth1 \
    -n 2 $XD1LAUNCHER ./mpimeasure
/opt/asn/apps/openmpi-1.0.1/bin/mpiexec --gmca btl_tcp_if_exclude eth0 \
    -n 2 $XD1LAUNCHER ./mpimeasure
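
For reference, Open MPI also reads MCA parameters from environment variables named OMPI_MCA_<parameter>, so the same interface restriction can be expressed without a command-line flag. The snippet below is only a sketch of building such an environment; the helper name ompi_mca_env is hypothetical, and whether this form changes which interface is actually used here has not been tested.

```python
import os


def ompi_mca_env(params, base=None):
    """Return a copy of an environment with OMPI_MCA_<name>=<value> added.

    `params` maps MCA parameter names to values. `base` defaults to the
    current process environment. (Helper name is illustrative only.)
    """
    env = dict(os.environ if base is None else base)
    for name, value in params.items():
        env["OMPI_MCA_" + name] = str(value)
    return env


# Restrict the TCP BTL to eth1, starting from an empty environment
# so the result is easy to inspect.
env = ompi_mca_env({"btl_tcp_if_include": "eth1"}, base={})
print(env)
```

The resulting dictionary could be passed as the env argument of subprocess.run when launching mpiexec from a script.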

(Note that DNS works, and it resolves the peer to the eth1 network, not eth0.)
uahrcw_at_c344-6:~/mpi-benchmarks> /sbin/ifconfig
eth0 Link encap:Ethernet HWaddr 00:0E:AB:01:58:60
          inet addr:1.0.21.134 Bcast:1.127.255.255 Mask:255.128.0.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:6596091 errors:0 dropped:0 overruns:0 frame:0
          TX packets:316165 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:560395541 (534.4 Mb) TX bytes:34367848 (32.7 Mb)
          Interrupt:16

eth1 Link encap:Ethernet HWaddr 00:0E:AB:01:58:61
          inet addr:1.128.21.134 Mask:255.128.0.0
          UP RUNNING NOARP MTU:1500 Metric:1
          RX packets:5600487 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4863441 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:6203028277 (5915.6 Mb) TX bytes:566471561 (540.2 Mb)
          Interrupt:25

eth2 Link encap:Ethernet HWaddr 00:0E:AB:01:58:62
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:829064 errors:0 dropped:0 overruns:0 frame:0
          TX packets:181572 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:61216408 (58.3 Mb) TX bytes:19079579 (18.1 Mb)
          Base address:0x2000 Memory:fea80000-feaa0000

eth2:2 Link encap:Ethernet HWaddr 00:0E:AB:01:58:62
          inet addr:129.66.9.146 Bcast:129.66.9.255 Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          Base address:0x2000 Memory:fea80000-feaa0000

lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:14259 errors:0 dropped:0 overruns:0 frame:0
          TX packets:14259 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:879631 (859.0 Kb) TX bytes:879631 (859.0 Kb)

uahrcw_at_c344-6:~/mpi-benchmarks> ping c344-5
PING c344-5.x.asc.edu (1.128.21.133) 56(84) bytes of data.
64 bytes from c344-5.x.asc.edu (1.128.21.133): icmp_seq=1 ttl=64 time=0.067 ms
64 bytes from c344-5.x.asc.edu (1.128.21.133): icmp_seq=2 ttl=64 time=0.037 ms
64 bytes from c344-5.x.asc.edu (1.128.21.133): icmp_seq=3 ttl=64 time=0.022 ms

--- c344-5.x.asc.edu ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.022/0.042/0.067/0.018 ms
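
The subnet arithmetic behind the output above can be checked with a short sketch. It uses only the addresses and netmasks shown in the ifconfig and ping output; the function name matching_interfaces is made up for illustration. It shows that the address DNS returns for c344-5 falls inside eth1's network and outside eth0's.

```python
import ipaddress

# Addresses and netmasks copied from the ifconfig output above.
interfaces = {
    "eth0": ipaddress.ip_interface("1.0.21.134/255.128.0.0"),
    "eth1": ipaddress.ip_interface("1.128.21.134/255.128.0.0"),
}

# Address that c344-5 resolved to in the ping output.
peer = ipaddress.ip_address("1.128.21.133")


def matching_interfaces(addr, ifaces):
    """Return the names of interfaces whose network contains addr."""
    return [name for name, iface in ifaces.items() if addr in iface.network]


print(interfaces["eth0"].network)             # 1.0.0.0/9
print(interfaces["eth1"].network)             # 1.128.0.0/9
print(matching_interfaces(peer, interfaces))  # ['eth1']
```

So name resolution already points at the eth1 network; traffic arriving on 1.0.x.x must be coming from addresses selected some other way.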

George Bosilca wrote:
>I see only 2 possibilities:
>1. you're trying to run Open MPI on nodes having multiple IP
>addresses.
>2. your nodes are behind firewalls and Open MPI is unable to pass through.
>
>Please check the FAQ on http://www.open-mpi.org/faq/ to find out the full
>answer to your question.
>
> Thanks,
> george.
>
>On Thu, 16 Mar 2006, Charles Wright wrote:
>
>
>>Hello,
>> I just compiled Open MPI and tried to run my code, which just
>>measures bandwidth from one node to another. (The code compiles fine and
>>runs under other MPI implementations.)
>>
>>When I did, I got this:
>>
>>uahrcw_at_c275-6:~/mpi-benchmarks> cat openmpitcp.o15380
>>c317-6
>>c317-5
>>[c317-5:24979] [0,0,2]-[0,0,0] mca_oob_tcp_peer_complete_connect:
>>connection failed (errno=110) - retrying (pid=24979)
>>[c317-5:24979] mca_oob_tcp_peer_timer_handler
>>[c317-5:24997] [0,1,1]-[0,0,0] mca_oob_tcp_peer_complete_connect:
>>connection failed (errno=110) - retrying (pid=24997)
>>[c317-5:24997] mca_oob_tcp_peer_timer_handler
>>
>>[0,1,1][btl_tcp_endpoint.c:559:mca_btl_tcp_endpoint_complete_connect]
>>connect() failed with errno=110
>>
>>
>>I compiled open-mpi with Pbspro 5.4-4 and I'm guessing that has
>>something to do with it.
>>
>>I've attached my config.log
>>
>>Any help with this would be appreciated.
>>
>>uahrcw_at_c275-6:~/mpi-benchmarks> ompi_info
>> Open MPI: 1.0.1r8453
>> Open MPI SVN revision: r8453
>> Open RTE: 1.0.1r8453
>> Open RTE SVN revision: r8453
>> OPAL: 1.0.1r8453
>> OPAL SVN revision: r8453
>> Prefix: /opt/asn/apps/openmpi-1.0.1
>>Configured architecture: x86_64-unknown-linux-gnu
>> Configured by: asnrcw
>> Configured on: Fri Feb 24 15:19:37 CST 2006
>> Configure host: c275-6
>> Built by: asnrcw
>> Built on: Fri Feb 24 15:40:09 CST 2006
>> Built host: c275-6
>> C bindings: yes
>> C++ bindings: yes
>> Fortran77 bindings: yes (all)
>> Fortran90 bindings: no
>> C compiler: gcc
>> C compiler absolute: /usr/bin/gcc
>> C++ compiler: g++
>> C++ compiler absolute: /usr/bin/g++
>> Fortran77 compiler: g77
>> Fortran77 compiler abs: /usr/bin/g77
>> Fortran90 compiler: ifort
>> Fortran90 compiler abs: /opt/asn/intel/fce/9.0/bin/ifort
>> C profiling: yes
>> C++ profiling: yes
>> Fortran77 profiling: yes
>> Fortran90 profiling: no
>> C++ exceptions: no
>> Thread support: posix (mpi: no, progress: no)
>> Internal debug support: no
>> MPI parameter check: runtime
>>Memory profiling support: no
>>Memory debugging support: no
>> libltdl support: 1
>> MCA memory: malloc_hooks (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA timer: linux (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
>> MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
>> MCA coll: basic (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA coll: self (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA coll: sm (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA io: romio (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA mpool: sm (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA pml: teg (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA ptl: self (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA ptl: sm (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA ptl: tcp (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA btl: self (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA btl: sm (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
>> MCA topo: unity (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA gpr: null (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA gpr: replica (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA iof: proxy (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA iof: svc (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA ns: proxy (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA ns: replica (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
>> MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA ras: localhost (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA ras: slurm (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA ras: tm (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA rds: resfile (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA rml: oob (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA pls: daemon (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA pls: proxy (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA pls: fork (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA pls: rsh (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA pls: slurm (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA pls: tm (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA sds: env (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA sds: seed (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA sds: singleton (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA sds: slurm (MCA v1.0, API v1.0, Component v1.0.1)
>> MCA sds: pipe (MCA v1.0, API v1.0, Component v1.0.1)
>>uahrcw_at_c275-6:~/mpi-benchmarks>
>>
>>
>>
>>
>
>"We must accept finite disappointment, but we must never lose infinite
>hope."
> Martin Luther King
>

-- 
Charles Wright, HPC Systems Administrator
Alabama Research and Education Network
Computer Sciences Corporation