Subject: Re: [OMPI users] unknown interface on openmpi-1.8.2a1r31742
From: Siegmar Gross (Siegmar.Gross_at_[hidden])
Date: 2014-05-14 14:44:52


Hi Ralph,

> Hmmm...well, that's an interesting naming scheme :-)
>
> Try adding "-mca oob_base_verbose 10 --report-uri -" on your cmd line
> and let's see what it thinks is happening

tyr fd1026 105 mpiexec -np 3 --host tyr,sunpc1,linpc1 --mca oob_base_verbose 10 --report-uri - hostname
[tyr.informatik.hs-fulda.de:06877] mca: base: components_register: registering oob components
[tyr.informatik.hs-fulda.de:06877] mca: base: components_register: found loaded component tcp
[tyr.informatik.hs-fulda.de:06877] mca: base: components_register: component tcp register function successful
[tyr.informatik.hs-fulda.de:06877] mca: base: components_open: opening oob components
[tyr.informatik.hs-fulda.de:06877] mca: base: components_open: found loaded component tcp
[tyr.informatik.hs-fulda.de:06877] mca: base: components_open: component tcp open function successful
[tyr.informatik.hs-fulda.de:06877] mca:oob:select: checking available component tcp
[tyr.informatik.hs-fulda.de:06877] mca:oob:select: Querying component [tcp]
[tyr.informatik.hs-fulda.de:06877] oob:tcp: component_available called
[tyr.informatik.hs-fulda.de:06877] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
[tyr.informatik.hs-fulda.de:06877] WORKING INTERFACE 2 KERNEL INDEX 2 FAMILY: V4
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] oob:tcp:init creating module for V4 address on interface bge0
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] creating OOB-TCP module for interface bge0
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] oob:tcp:init adding 193.174.24.39 to our list of V4 connections
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] TCP STARTUP
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] attempting to bind to IPv4 port 0
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] assigned IPv4 port 55567
[tyr.informatik.hs-fulda.de:06877] mca:oob:select: Adding component to end
[tyr.informatik.hs-fulda.de:06877] mca:oob:select: Found 1 active transports
3170566144.0;tcp://193.174.24.39:55567
[sunpc1:07690] mca: base: components_register: registering oob components
[sunpc1:07690] mca: base: components_register: found loaded component tcp
[sunpc1:07690] mca: base: components_register: component tcp register function successful
[sunpc1:07690] mca: base: components_open: opening oob components
[sunpc1:07690] mca: base: components_open: found loaded component tcp
[sunpc1:07690] mca: base: components_open: component tcp open function successful
[sunpc1:07690] mca:oob:select: checking available component tcp
[sunpc1:07690] mca:oob:select: Querying component [tcp]
[sunpc1:07690] oob:tcp: component_available called
[sunpc1:07690] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
[sunpc1:07690] WORKING INTERFACE 2 KERNEL INDEX 2 FAMILY: V4
[sunpc1:07690] [[48379,0],1] oob:tcp:init creating module for V4 address on interface nge0
[sunpc1:07690] [[48379,0],1] creating OOB-TCP module for interface nge0
[sunpc1:07690] [[48379,0],1] oob:tcp:init adding 193.174.26.210 to our list of V4 connections
[sunpc1:07690] [[48379,0],1] TCP STARTUP
[sunpc1:07690] [[48379,0],1] attempting to bind to IPv4 port 0
[sunpc1:07690] [[48379,0],1] assigned IPv4 port 39616
[sunpc1:07690] mca:oob:select: Adding component to end
[sunpc1:07690] mca:oob:select: Found 1 active transports
[sunpc1:07690] [[48379,0],1]: set_addr to uri 3170566144.0;tcp://193.174.24.39:55567
[sunpc1:07690] [[48379,0],1]:set_addr checking if peer [[48379,0],0] is reachable via component tcp
[sunpc1:07690] [[48379,0],1] oob:tcp: working peer [[48379,0],0] address tcp://193.174.24.39:55567
[sunpc1:07690] [[48379,0],1] UNFOUND KERNEL INDEX -13 FOR ADDRESS 193.174.24.39
[sunpc1:07690] [[48379,0],1] PEER [[48379,0],0] MAY BE REACHABLE BY ROUTING - ASSIGNING MODULE AT KINDEX 2 INTERFACE nge0
[sunpc1:07690] [[48379,0],1] PASSING ADDR 193.174.24.39 TO INTERFACE nge0 AT KERNEL INDEX 2
[sunpc1:07690] [[48379,0],1]:tcp set addr for peer [[48379,0],0]
[sunpc1:07690] [[48379,0],1]: peer [[48379,0],0] is reachable via component tcp
[sunpc1:07690] [[48379,0],1] OOB_SEND: ../../../../../openmpi-1.8.2a1r31742/orte/mca/rml/oob/rml_oob_send.c:199
[sunpc1:07690] [[48379,0],1]:tcp:processing set_peer cmd for interface nge0
[sunpc1:07690] [[48379,0],1] oob:base:send to target [[48379,0],0]
[sunpc1:07690] [[48379,0],1] oob:tcp:send_nb to peer [[48379,0],0]:10
[sunpc1:07690] [[48379,0],1] tcp:send_nb to peer [[48379,0],0]
[sunpc1:07690] [[48379,0],1]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:508] post send to [[48379,0],0]
[sunpc1:07690] [[48379,0],1]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:442] processing send to peer [[48379,0],0]:10
[sunpc1:07690] [[48379,0],1]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:476] queue pending to [[48379,0],0]
[sunpc1:07690] [[48379,0],1] tcp:send_nb: initiating connection to [[48379,0],0]
[sunpc1:07690] [[48379,0],1]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:490] connect to [[48379,0],0]
[sunpc1:07690] [[48379,0],1] orte_tcp_peer_try_connect: attempting to connect to proc [[48379,0],0] via interface nge0
[sunpc1:07690] [[48379,0],1] oob:tcp:peer creating socket to [[48379,0],0]
[sunpc1:07690] [[48379,0],1] orte_tcp_peer_try_connect: attempting to connect to proc [[48379,0],0] via interface nge0 on socket 10
[sunpc1:07690] [[48379,0],1] orte_tcp_peer_try_connect: attempting to connect to proc [[48379,0],0] on 193.174.24.39:55567 - 0 retries
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] mca_oob_tcp_listen_thread: new connection: (15, 0) 193.174.26.210:39617
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] connection_handler: working connection (15, 11) 193.174.26.210:39617
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] CONNECTION REQUEST ON UNKNOWN INTERFACE
[sunpc1:07690] [[48379,0],1] waiting for connect completion to [[48379,0],0] - activating send event
[sunpc1:07690] [[48379,0],1] tcp:send_handler called to send to peer [[48379,0],0]
[sunpc1:07690] [[48379,0],1] tcp:send_handler CONNECTING
[sunpc1:07690] [[48379,0],1]:tcp:complete_connect called for peer [[48379,0],0] on socket 10
[sunpc1:07690] [[48379,0],1] tcp_peer_complete_connect: sending ack to [[48379,0],0]
[sunpc1:07690] [[48379,0],1] SEND CONNECT ACK
[sunpc1:07690] [[48379,0],1] send blocking of 48 bytes to socket 10
[sunpc1:07690] [[48379,0],1] connect-ack sent to socket 10
[sunpc1:07690] [[48379,0],1] tcp_peer_complete_connect: setting read event on connection to [[48379,0],0]
[linpc1:21511] mca: base: components_register: registering oob components
[linpc1:21511] mca: base: components_register: found loaded component tcp
[linpc1:21511] mca: base: components_register: component tcp register function successful
[linpc1:21511] mca: base: components_open: opening oob components
[linpc1:21511] mca: base: components_open: found loaded component tcp
[linpc1:21511] mca: base: components_open: component tcp open function successful
[linpc1:21511] mca:oob:select: checking available component tcp
[linpc1:21511] mca:oob:select: Querying component [tcp]
[linpc1:21511] oob:tcp: component_available called
[linpc1:21511] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
[linpc1:21511] WORKING INTERFACE 2 KERNEL INDEX 2 FAMILY: V4
[linpc1:21511] [[48379,0],2] oob:tcp:init creating module for V4 address on interface eth0
[linpc1:21511] [[48379,0],2] creating OOB-TCP module for interface eth0
[linpc1:21511] [[48379,0],2] oob:tcp:init adding 193.174.26.208 to our list of V4 connections
[linpc1:21511] [[48379,0],2] TCP STARTUP
[linpc1:21511] [[48379,0],2] attempting to bind to IPv4 port 0
[linpc1:21511] [[48379,0],2] assigned IPv4 port 39724
[linpc1:21511] mca:oob:select: Adding component to end
[linpc1:21511] mca:oob:select: Found 1 active transports
[linpc1:21511] [[48379,0],2]: set_addr to uri 3170566144.0;tcp://193.174.24.39:55567
[linpc1:21511] [[48379,0],2]:set_addr checking if peer [[48379,0],0] is reachable via component tcp
[linpc1:21511] [[48379,0],2] oob:tcp: working peer [[48379,0],0] address tcp://193.174.24.39:55567
[linpc1:21511] [[48379,0],2] UNFOUND KERNEL INDEX -13 FOR ADDRESS 193.174.24.39
[linpc1:21511] [[48379,0],2] PEER [[48379,0],0] MAY BE REACHABLE BY ROUTING - ASSIGNING MODULE AT KINDEX 2 INTERFACE eth0
[linpc1:21511] [[48379,0],2] PASSING ADDR 193.174.24.39 TO INTERFACE eth0 AT KERNEL INDEX 2
[linpc1:21511] [[48379,0],2]:tcp set addr for peer [[48379,0],0]
[linpc1:21511] [[48379,0],2]: peer [[48379,0],0] is reachable via component tcp
[linpc1:21511] [[48379,0],2] OOB_SEND: ../../../../../openmpi-1.8.2a1r31742/orte/mca/rml/oob/rml_oob_send.c:199
[linpc1:21511] [[48379,0],2]:tcp:processing set_peer cmd for interface eth0
[linpc1:21511] [[48379,0],2] oob:base:send to target [[48379,0],0]
[linpc1:21511] [[48379,0],2] oob:tcp:send_nb to peer [[48379,0],0]:10
[linpc1:21511] [[48379,0],2] tcp:send_nb to peer [[48379,0],0]
[linpc1:21511] [[48379,0],2]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:508] post send to [[48379,0],0]
[linpc1:21511] [[48379,0],2]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:442] processing send to peer [[48379,0],0]:10
[linpc1:21511] [[48379,0],2]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:476] queue pending to [[48379,0],0]
[linpc1:21511] [[48379,0],2] tcp:send_nb: initiating connection to [[48379,0],0]
[linpc1:21511] [[48379,0],2]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:490] connect to [[48379,0],0]
[linpc1:21511] [[48379,0],2] orte_tcp_peer_try_connect: attempting to connect to proc [[48379,0],0] via interface eth0
[linpc1:21511] [[48379,0],2] oob:tcp:peer creating socket to [[48379,0],0]
[linpc1:21511] [[48379,0],2] orte_tcp_peer_try_connect: attempting to connect to proc [[48379,0],0] via interface eth0 on socket 9
[linpc1:21511] [[48379,0],2] orte_tcp_peer_try_connect: attempting to connect to proc [[48379,0],0] on 193.174.24.39:55567 - 0 retries
[linpc1:21511] [[48379,0],2] waiting for connect completion to [[48379,0],0] - activating send event
[linpc1:21511] [[48379,0],2] tcp:send_handler called to send to peer [[48379,0],0]
[linpc1:21511] [[48379,0],2] tcp:send_handler CONNECTING
[linpc1:21511] [[48379,0],2]:tcp:complete_connect called for peer [[48379,0],0] on socket 9
[linpc1:21511] [[48379,0],2] tcp_peer_complete_connect: sending ack to [[48379,0],0]
[linpc1:21511] [[48379,0],2] SEND CONNECT ACK
[linpc1:21511] [[48379,0],2] send blocking of 48 bytes to socket 9
[linpc1:21511] [[48379,0],2] connect-ack sent to socket 9
[linpc1:21511] [[48379,0],2] tcp_peer_complete_connect: setting read event on connection to [[48379,0],0]
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] mca_oob_tcp_listen_thread: new connection: (16, 11) 193.174.26.208:53741
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] connection_handler: working connection (16, 11) 193.174.26.208:53741
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] CONNECTION REQUEST ON UNKNOWN INTERFACE
^CKilled by signal 2.
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] OOB_SEND: ../../../../../openmpi-1.8.2a1r31742/orte/mca/rml/oob/rml_oob_send.c:199
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] OOB_SEND: ../../../../../openmpi-1.8.2a1r31742/orte/mca/rml/oob/rml_oob_send.c:199
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] oob:base:send to target [[48379,0],1]
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] oob:base:send unknown peer [[48379,0],1]
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] is NOT reachable by TCP
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] oob:base:send to target [[48379,0],2]
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] oob:base:send unknown peer [[48379,0],2]
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] is NOT reachable by TCP
Killed by signal 2.
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] TCP SHUTDOWN
[tyr.informatik.hs-fulda.de:06877] mca: base: close: component tcp closed
[tyr.informatik.hs-fulda.de:06877] mca: base: close: unloading component tcp
tyr fd1026 106

Thank you very much in advance for your help. Do you need anything else?

Kind regards

Siegmar

> On May 14, 2014, at 9:06 AM, Siegmar Gross <Siegmar.Gross_at_[hidden]> wrote:
>
> > Hi Ralph,
> >
> >> What are the interfaces on these machines?
> >
> > tyr fd1026 111 ifconfig -a
> > lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
> > inet 127.0.0.1 netmask ff000000
> > bge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
> > inet 193.174.24.39 netmask ffffffe0 broadcast 193.174.24.63
> > tyr fd1026 112
> >
> >
> > tyr fd1026 112 ssh sunpc1 ifconfig -a
> > lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
> > inet 127.0.0.1 netmask ff000000
> > nge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
> > inet 193.174.26.210 netmask ffffffc0 broadcast 193.174.26.255
> > tyr fd1026 113
> >
> >
> > tyr fd1026 113 ssh linpc1 /sbin/ifconfig -a
> > eth0 Link encap:Ethernet HWaddr 00:14:4F:23:FD:A8
> > inet addr:193.174.26.208 Bcast:193.174.26.255 Mask:255.255.255.192
> > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> > RX packets:18052524 errors:127 dropped:0 overruns:0 frame:127
> > TX packets:15917888 errors:0 dropped:0 overruns:0 carrier:0
> > collisions:0 txqueuelen:1000
> > RX bytes:4158294157 (3965.6 Mb) TX bytes:12060556809 (11501.8 Mb)
> > Interrupt:23 Base address:0x4000
> >
> > eth1 Link encap:Ethernet HWaddr 00:14:4F:23:FD:A9
> > BROADCAST MULTICAST MTU:1500 Metric:1
> > RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> > TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> > collisions:0 txqueuelen:1000
> > RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
> > Interrupt:45 Base address:0xa000
> >
> > lo Link encap:Local Loopback
> > inet addr:127.0.0.1 Mask:255.0.0.0
> > UP LOOPBACK RUNNING MTU:16436 Metric:1
> > RX packets:1083 errors:0 dropped:0 overruns:0 frame:0
> > TX packets:1083 errors:0 dropped:0 overruns:0 carrier:0
> > collisions:0 txqueuelen:0
> > RX bytes:329323 (321.6 Kb) TX bytes:329323 (321.6 Kb)
> >
> > tyr fd1026 114
> >
> >
> > Do you need something else?
> >
> >
> > Kind regards
> >
> > Siegmar
> >
> >
> >
> >
> >> On May 14, 2014, at 7:45 AM, Siegmar Gross <Siegmar.Gross_at_[hidden]> wrote:
> >>
> >>> Hi,
> >>>
> >>> I just installed openmpi-1.8.2a1r31742 on my machines (Solaris 10
> >>> Sparc, Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64) with
> >>> Sun C5.12 and still have the following problem.
> >>>
> >>> tyr fd1026 102 which mpiexec
> >>> /usr/local/openmpi-1.8.2_64_cc/bin/mpiexec
> >>> tyr fd1026 103 mpiexec -np 3 --host tyr,sunpc1,linpc1 hostname
> >>> [tyr.informatik.hs-fulda.de:12827] [[37949,0],0] CONNECTION REQUEST ON UNKNOWN INTERFACE
> >>> [tyr.informatik.hs-fulda.de:12827] [[37949,0],0] CONNECTION REQUEST ON UNKNOWN INTERFACE
> >>> ^CKilled by signal 2.
> >>> Killed by signal 2.
> >>> tyr fd1026 104
> >>>
> >>>
> >>> The command works fine with openmpi-1.6.6rc1.
> >>>
> >>> tyr fd1026 102 which mpiexec
> >>> /usr/local/openmpi-1.6.6_64_cc/bin/mpiexec
> >>> tyr fd1026 103 mpiexec -np 3 --host tyr,sunpc1,linpc1 hostname
> >>> tyr.informatik.hs-fulda.de
> >>> linpc1
> >>> sunpc1
> >>> tyr fd1026 104
> >>>
> >>>
> >>> I have reported this problem before and would be grateful if
> >>> somebody could solve it. Please let me know if I can provide any
> >>> other information.
> >>>
> >>>
> >>> Kind regards
> >>>
> >>> Siegmar