
Open MPI User's Mailing List Archives

From: Charles Wright (charles_at_[hidden])
Date: 2006-03-16 16:58:32


That works!!
Thanks!!

George Bosilca wrote:
>Sorry I wasn't clear enough in my previous post. The error messages you
>are getting come from the OOB, which is the framework we use to set up
>the MPI run. The option you used (btl_tcp_if_include) only applies to
>MPI communications. Please add "--mca oob_tcp_include eth1" to force the
>OOB framework to use eth1. To avoid typing all of these options every
>time, you can add them to the $HOME/.openmpi/mca-params.conf file. A file
>containing:
>
>oob_tcp_include=eth1
>btl_tcp_if_include=eth1
>
>should solve your problem, provided the firewall is open on eth1 between
>these nodes.
>
> Thanks,
> george.
>
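A quick way to confirm that values placed in $HOME/.openmpi/mca-params.conf
were actually picked up is ompi_info, which lists MCA parameters together
with their current values. This is only a sketch, assuming the 1.0.x
ompi_info used in this thread supports the --param <framework> <component>
form; the parameter names are the ones George mentions above:

   # show the TCP parameters of the OOB and BTL frameworks, including
   # any values read from mca-params.conf
   ompi_info --param oob tcp
   ompi_info --param btl tcp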
>On Thu, 16 Mar 2006, Charles Wright wrote:
>
>
>>Thanks for the tip.
>>
>>I see that both number 1 and 2 are true.
>>Open MPI is insisting on using eth0 (I know this from watching the
>>firewall log on the node it is trying to reach).
>>
>>This is despite the fact that the first DNS entry points to eth1;
>>normally that is all PBS needs to do the right thing and use the
>>network I prefer.
>>
>>OK, so I see there are options to include/exclude interfaces.
>>
>>However, mpiexec is ignoring my requests.
>>I tried it two ways; neither worked. The firewall rejects traffic coming
>>into the 1.0.x.x network in both cases.
>>
>>/opt/asn/apps/openmpi-1.0.1/bin/mpiexec --gmca btl_tcp_if_include eth1
>>-n 2 $XD1LAUNCHER ./mpimeasure
>>/opt/asn/apps/openmpi-1.0.1/bin/mpiexec --gmca btl_tcp_if_exclude eth0
>>-n 2 $XD1LAUNCHER ./mpimeasure
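These two attempts only constrain the BTL, i.e. the MPI traffic itself; the
connection failures quoted further down come from the OOB, which has its own
include parameter (see George's reply above). A minimal sketch of the
corrected invocation, reusing the paths and launcher variable from the
commands above:

   # --gmca applies the parameter to all application contexts;
   # pin both the OOB and the TCP BTL to eth1
   /opt/asn/apps/openmpi-1.0.1/bin/mpiexec --gmca oob_tcp_include eth1 \
       --gmca btl_tcp_if_include eth1 -n 2 $XD1LAUNCHER ./mpimeasure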
>>
>>(See below: DNS resolves to the eth1 network, not to eth0.)
>>uahrcw_at_c344-6:~/mpi-benchmarks> /sbin/ifconfig
>>eth0 Link encap:Ethernet HWaddr 00:0E:AB:01:58:60
>> inet addr:1.0.21.134 Bcast:1.127.255.255 Mask:255.128.0.0
>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>> RX packets:6596091 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:316165 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:1000
>> RX bytes:560395541 (534.4 Mb) TX bytes:34367848 (32.7 Mb)
>> Interrupt:16
>>
>>eth1 Link encap:Ethernet HWaddr 00:0E:AB:01:58:61
>> inet addr:1.128.21.134 Mask:255.128.0.0
>> UP RUNNING NOARP MTU:1500 Metric:1
>> RX packets:5600487 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:4863441 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:1000
>> RX bytes:6203028277 (5915.6 Mb) TX bytes:566471561 (540.2 Mb)
>> Interrupt:25
>>
>>eth2 Link encap:Ethernet HWaddr 00:0E:AB:01:58:62
>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>> RX packets:829064 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:181572 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:1000
>> RX bytes:61216408 (58.3 Mb) TX bytes:19079579 (18.1 Mb)
>> Base address:0x2000 Memory:fea80000-feaa0000
>>
>>eth2:2 Link encap:Ethernet HWaddr 00:0E:AB:01:58:62
>> inet addr:129.66.9.146 Bcast:129.66.9.255 Mask:255.255.255.0
>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>> Base address:0x2000 Memory:fea80000-feaa0000
>>
>>lo Link encap:Local Loopback
>> inet addr:127.0.0.1 Mask:255.0.0.0
>> UP LOOPBACK RUNNING MTU:16436 Metric:1
>> RX packets:14259 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:14259 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:0
>> RX bytes:879631 (859.0 Kb) TX bytes:879631 (859.0 Kb)
>>
>>uahrcw_at_c344-6:~/mpi-benchmarks> ping c344-5
>>PING c344-5.x.asc.edu (1.128.21.133) 56(84) bytes of data.
>>64 bytes from c344-5.x.asc.edu (1.128.21.133): icmp_seq=1 ttl=64
>>time=0.067 ms
>>64 bytes from c344-5.x.asc.edu (1.128.21.133): icmp_seq=2 ttl=64
>>time=0.037 ms
>>64 bytes from c344-5.x.asc.edu (1.128.21.133): icmp_seq=3 ttl=64
>>time=0.022 ms
>>
>>--- c344-5.x.asc.edu ping statistics ---
>>3 packets transmitted, 3 received, 0% packet loss, time 1999ms
>>rtt min/avg/max/mdev = 0.022/0.042/0.067/0.018 ms
>>
>>
>>
>>George Bosilca wrote:
>>
>>>I see only two possibilities:
>>>1. You're trying to run Open MPI on nodes that have multiple IP
>>>addresses.
>>>2. Your nodes are behind firewalls and Open MPI is unable to pass through.
>>>
>>>Please check the FAQ on http://www.open-mpi.org/faq/ to find out the full
>>>answer to your question.
>>>
>>> Thanks,
>>> george.
>>>
>>>On Thu, 16 Mar 2006, Charles Wright wrote:
>>>
>>>
>>>
>>>>Hello,
>>>> I've just compiled Open MPI and tried to run my code, which just
>>>>measures bandwidth from one node to another. (The code compiles fine and
>>>>runs under other MPI implementations.)
>>>>
>>>>When I did, I got this:
>>>>
>>>>uahrcw_at_c275-6:~/mpi-benchmarks> cat openmpitcp.o15380
>>>>c317-6
>>>>c317-5
>>>>[c317-5:24979] [0,0,2]-[0,0,0] mca_oob_tcp_peer_complete_connect:
>>>>connection failed (errno=110) - retrying (pid=24979)
>>>>[c317-5:24979] mca_oob_tcp_peer_timer_handler
>>>>[c317-5:24997] [0,1,1]-[0,0,0] mca_oob_tcp_peer_complete_connect:
>>>>connection failed (errno=110) - retrying (pid=24997)
>>>>[c317-5:24997] mca_oob_tcp_peer_timer_handler
>>>>
>>>>[0,1,1][btl_tcp_endpoint.c:559:mca_btl_tcp_endpoint_complete_connect]
>>>>connect() failed with errno=110
>>>>
>>>>
>>>>I compiled Open MPI with PBS Pro 5.4-4, and I'm guessing that has
>>>>something to do with it.
>>>>
>>>>I've attached my config.log
>>>>
>>>>Any help with this would be appreciated.
>>>>
>>>>uahrcw_at_c275-6:~/mpi-benchmarks> ompi_info
>>>> Open MPI: 1.0.1r8453
>>>> Open MPI SVN revision: r8453
>>>> Open RTE: 1.0.1r8453
>>>> Open RTE SVN revision: r8453
>>>> OPAL: 1.0.1r8453
>>>> OPAL SVN revision: r8453
>>>> Prefix: /opt/asn/apps/openmpi-1.0.1
>>>>Configured architecture: x86_64-unknown-linux-gnu
>>>> Configured by: asnrcw
>>>> Configured on: Fri Feb 24 15:19:37 CST 2006
>>>> Configure host: c275-6
>>>> Built by: asnrcw
>>>> Built on: Fri Feb 24 15:40:09 CST 2006
>>>> Built host: c275-6
>>>> C bindings: yes
>>>> C++ bindings: yes
>>>> Fortran77 bindings: yes (all)
>>>> Fortran90 bindings: no
>>>> C compiler: gcc
>>>> C compiler absolute: /usr/bin/gcc
>>>> C++ compiler: g++
>>>> C++ compiler absolute: /usr/bin/g++
>>>> Fortran77 compiler: g77
>>>>Fortran77 compiler abs: /usr/bin/g77
>>>> Fortran90 compiler: ifort
>>>>Fortran90 compiler abs: /opt/asn/intel/fce/9.0/bin/ifort
>>>> C profiling: yes
>>>> C++ profiling: yes
>>>> Fortran77 profiling: yes
>>>> Fortran90 profiling: no
>>>> C++ exceptions: no
>>>> Thread support: posix (mpi: no, progress: no)
>>>>Internal debug support: no
>>>> MPI parameter check: runtime
>>>>Memory profiling support: no
>>>>Memory debugging support: no
>>>> libltdl support: 1
>>>> MCA memory: malloc_hooks (MCA v1.0, API v1.0, Component
>>>>v1.0.1)
>>>> MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA timer: linux (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
>>>> MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
>>>> MCA coll: basic (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA coll: self (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA coll: sm (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA io: romio (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA mpool: sm (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA pml: teg (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA ptl: self (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA ptl: sm (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA ptl: tcp (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA btl: self (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA btl: sm (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
>>>> MCA topo: unity (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA gpr: null (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA gpr: replica (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA iof: proxy (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA iof: svc (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA ns: proxy (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA ns: replica (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
>>>> MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA ras: localhost (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA ras: slurm (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA ras: tm (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA rds: resfile (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA rml: oob (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA pls: daemon (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA pls: proxy (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA pls: fork (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA pls: rsh (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA pls: slurm (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA pls: tm (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA sds: env (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA sds: seed (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA sds: singleton (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA sds: slurm (MCA v1.0, API v1.0, Component v1.0.1)
>>>> MCA sds: pipe (MCA v1.0, API v1.0, Component v1.0.1)
>>>>uahrcw_at_c275-6:~/mpi-benchmarks>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>"We must accept finite disappointment, but we must never lose infinite
>>>hope."
>>> Martin Luther King
>>>
>>>_______________________________________________
>>>users mailing list
>>>users_at_[hidden]
>>>http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>>
>>>
>>
>>
>
>"We must accept finite disappointment, but we must never lose infinite
>hope."
> Martin Luther King
>
>_______________________________________________
>users mailing list
>users_at_[hidden]
>http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>

-- 
Charles Wright, HPC Systems Administrator
Alabama Research and Education Network
Computer Sciences Corporation