Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] unknown interface on openmpi-1.8.2a1r31742
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2014-05-15 15:04:07


This bug should be fixed in tonight's tarball, BTW.

On May 15, 2014, at 9:19 AM, Ralph Castain <rhc_at_[hidden]> wrote:

> It is an unrelated bug introduced by a different commit - causing mpirun to segfault upon termination. The fact that you got the hostname to run indicates that this original fix works, so at least we know the connection logic is now okay.
>
> Thanks
> Ralph
>
>
> On May 15, 2014, at 3:40 AM, Siegmar Gross <Siegmar.Gross_at_[hidden]> wrote:
>
>> Hi Ralph,
>>
>>> Just committed a potential fix to the trunk - please let me know
>>> if it worked for you
>>
>> Now I get the hostnames but also a segmentation fault.
>>
>> tyr fd1026 101 which mpiexec
>> /usr/local/openmpi-1.9_64_cc/bin/mpiexec
>> tyr fd1026 102 mpiexec -np 3 --host tyr,sunpc1,linpc1 hostname
>> tyr.informatik.hs-fulda.de
>> linpc1
>> sunpc1
>> [tyr:22835] *** Process received signal ***
>> [tyr:22835] Signal: Segmentation Fault (11)
>> [tyr:22835] Signal code: Address not mapped (1)
>> [tyr:22835] Failing at address: ffffffff7bf16de0
>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:opal_backtrace_print+0x1c
>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:0x183960
>> /lib/sparcv9/libc.so.1:0xd8b98
>> /lib/sparcv9/libc.so.1:0xcc70c
>> /lib/sparcv9/libc.so.1:0xcc918
>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:0x1ce0e8 [ Signal 2125151224 (?)]
>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:0x1ccde4
>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:opal_libevent2021_event_del+0x88
>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:opal_libevent2021_event_base_free+0x154
>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:0x1bb9e8
>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:mca_base_framework_close+0x1a0
>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:opal_finalize+0xcc
>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:orte_finalize+0x168
>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/bin/orterun:orterun+0x23e0
>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/bin/orterun:main+0x24
>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/bin/orterun:_start+0x12c
>> [tyr:22835] *** End of error message ***
>> Segmentation fault
>> tyr fd1026 103 ompi_info | grep "revision:"
>> Open MPI repo revision: r31769
>> Open RTE repo revision: r31769
>> OPAL repo revision: r31769
>> tyr fd1026 104
>>
>>
>>
>> I get the following output in "dbx".
>>
>> tyr fd1026 104 /opt/solstudio12.3/bin/sparcv9/dbx /usr/local/openmpi-1.9_64_cc/bin/mpiexec
>> For information about new features see `help changes'
>> To remove this message, put `dbxenv suppress_startup_message 7.9' in your .dbxrc
>> Reading mpiexec
>> Reading ld.so.1
>> Reading libopen-rte.so.0.0.0
>> Reading libopen-pal.so.0.0.0
>> Reading libsendfile.so.1
>> Reading libpicl.so.1
>> Reading libkstat.so.1
>> Reading liblgrp.so.1
>> Reading libsocket.so.1
>> Reading libnsl.so.1
>> Reading librt.so.1
>> Reading libm.so.2
>> Reading libthread.so.1
>> Reading libc.so.1
>> Reading libdoor.so.1
>> Reading libaio.so.1
>> Reading libmd.so.1
>> (dbx) run -np 3 --host tyr,sunpc1,linpc1 hostname
>> Running: mpiexec -np 3 --host tyr,sunpc1,linpc1 hostname
>> (process id 23328)
>> Reading libc_psr.so.1
>> Reading mca_shmem_mmap.so
>> Reading libmp.so.2
>> Reading libscf.so.1
>> Reading libuutil.so.1
>> Reading libgen.so.1
>> Reading mca_shmem_posix.so
>> Reading mca_shmem_sysv.so
>> Reading mca_sec_basic.so
>> Reading mca_ess_env.so
>> Reading mca_ess_hnp.so
>> Reading mca_ess_singleton.so
>> Reading mca_ess_tool.so
>> Reading mca_pstat_test.so
>> Reading mca_state_app.so
>> Reading mca_state_hnp.so
>> Reading mca_state_novm.so
>> Reading mca_state_orted.so
>> Reading mca_state_staged_hnp.so
>> Reading mca_state_staged_orted.so
>> Reading mca_state_tool.so
>> Reading mca_errmgr_default_app.so
>> Reading mca_errmgr_default_hnp.so
>> Reading mca_errmgr_default_orted.so
>> Reading mca_errmgr_default_tool.so
>> Reading mca_plm_isolated.so
>> Reading mca_plm_rsh.so
>> Reading mca_oob_tcp.so
>> Reading mca_rml_oob.so
>> Reading mca_routed_binomial.so
>> Reading mca_routed_debruijn.so
>> Reading mca_routed_direct.so
>> Reading mca_routed_radix.so
>> Reading mca_dstore_hash.so
>> Reading mca_grpcomm_bad.so
>> Reading mca_ras_simulator.so
>> Reading mca_rmaps_lama.so
>> Reading mca_rmaps_mindist.so
>> Reading mca_rmaps_ppr.so
>> Reading mca_rmaps_rank_file.so
>> Reading mca_rmaps_resilient.so
>> Reading mca_rmaps_round_robin.so
>> Reading mca_rmaps_seq.so
>> Reading mca_rmaps_staged.so
>> Reading mca_odls_default.so
>> Reading mca_rtc_hwloc.so
>> Reading mca_iof_hnp.so
>> Reading mca_iof_mr_hnp.so
>> Reading mca_iof_mr_orted.so
>> Reading mca_iof_orted.so
>> Reading mca_iof_tool.so
>> Reading mca_filem_raw.so
>> Reading mca_dfs_app.so
>> Reading mca_dfs_orted.so
>> Reading mca_dfs_test.so
>> tyr.informatik.hs-fulda.de
>> linpc1
>> sunpc1
>> t_at_1 (l_at_1) signal SEGV (no mapping at the fault address) in event_queue_remove at 0xffffffff7e9ce0e8
>> 0xffffffff7e9ce0e8: event_queue_remove+0x01a8: stx %l0, [%l3 + 24]
>> Current function is opal_event_base_close
>> 62 opal_event_base_free (opal_event_base);
>>
>> (dbx) check -all
>> dbx: warning: check -all will be turned on in the next run of the process
>> access checking - OFF
>> memuse checking - OFF
>>
>> (dbx) run -np 3 --host tyr,sunpc1,linpc1 hostname
>> Running: mpiexec -np 3 --host tyr,sunpc1,linpc1 hostname
>> (process id 23337)
>> Reading rtcapihook.so
>> Reading libdl.so.1
>> Reading rtcaudit.so
>> Reading libmapmalloc.so.1
>> Reading rtcboot.so
>> Reading librtc.so
>> Reading libmd_psr.so.1
>> RTC: Enabling Error Checking...
>> RTC: Using UltraSparc trap mechanism
>> RTC: See `help rtc showmap' and `help rtc limitations' for details.
>> RTC: Running program...
>> Write to unallocated (wua) on thread 1:
>> Attempting to write 1 byte at address 0xffffffff79f04000
>> t_at_1 (l_at_1) stopped in _readdir at 0xffffffff56574da0
>> 0xffffffff56574da0: _readdir+0x0064: call _PROCEDURE_LINKAGE_TABLE_+0x2380 [PLT] ! 0xffffffff56742a80
>> Current function is find_dyn_components
>> 393 if (0 != lt_dlforeachfile(dir, save_filename, NULL)) {
>> (dbx)
>>
>>
>>
>> Do you need anything else?
>>
>>
>> Kind regards
>>
>> Siegmar
>>
>>
>>
>>
>> On May 14, 2014, at 11:44 AM, Siegmar Gross <Siegmar.Gross_at_[hidden]> wrote:
>>>
>>>> Hi Ralph,
>>>>
>>>>> Hmmm...well, that's an interesting naming scheme :-)
>>>>>
>>>>> Try adding "-mca oob_base_verbose 10 --report-uri -" on your cmd line
>>>>> and let's see what it thinks is happening
>>>>
>>>>
>>>> tyr fd1026 105 mpiexec -np 3 --host tyr,sunpc1,linpc1 --mca oob_base_verbose 10 --report-uri - hostname
>>>> [tyr.informatik.hs-fulda.de:06877] mca: base: components_register: registering oob components
>>>> [tyr.informatik.hs-fulda.de:06877] mca: base: components_register: found loaded component tcp
>>>> [tyr.informatik.hs-fulda.de:06877] mca: base: components_register: component tcp register function successful
>>>> [tyr.informatik.hs-fulda.de:06877] mca: base: components_open: opening oob components
>>>> [tyr.informatik.hs-fulda.de:06877] mca: base: components_open: found loaded component tcp
>>>> [tyr.informatik.hs-fulda.de:06877] mca: base: components_open: component tcp open function successful
>>>> [tyr.informatik.hs-fulda.de:06877] mca:oob:select: checking available component tcp
>>>> [tyr.informatik.hs-fulda.de:06877] mca:oob:select: Querying component [tcp]
>>>> [tyr.informatik.hs-fulda.de:06877] oob:tcp: component_available called
>>>> [tyr.informatik.hs-fulda.de:06877] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
>>>> [tyr.informatik.hs-fulda.de:06877] WORKING INTERFACE 2 KERNEL INDEX 2 FAMILY: V4
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] oob:tcp:init creating module for V4 address on interface bge0
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] creating OOB-TCP module for interface bge0
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] oob:tcp:init adding 193.174.24.39 to our list of V4 connections
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] TCP STARTUP
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] attempting to bind to IPv4 port 0
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] assigned IPv4 port 55567
>>>> [tyr.informatik.hs-fulda.de:06877] mca:oob:select: Adding component to end
>>>> [tyr.informatik.hs-fulda.de:06877] mca:oob:select: Found 1 active transports
>>>> 3170566144.0;tcp://193.174.24.39:55567
>>>> [sunpc1:07690] mca: base: components_register: registering oob components
>>>> [sunpc1:07690] mca: base: components_register: found loaded component tcp
>>>> [sunpc1:07690] mca: base: components_register: component tcp register function successful
>>>> [sunpc1:07690] mca: base: components_open: opening oob components
>>>> [sunpc1:07690] mca: base: components_open: found loaded component tcp
>>>> [sunpc1:07690] mca: base: components_open: component tcp open function successful
>>>> [sunpc1:07690] mca:oob:select: checking available component tcp
>>>> [sunpc1:07690] mca:oob:select: Querying component [tcp]
>>>> [sunpc1:07690] oob:tcp: component_available called
>>>> [sunpc1:07690] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
>>>> [sunpc1:07690] WORKING INTERFACE 2 KERNEL INDEX 2 FAMILY: V4
>>>> [sunpc1:07690] [[48379,0],1] oob:tcp:init creating module for V4 address on interface nge0
>>>> [sunpc1:07690] [[48379,0],1] creating OOB-TCP module for interface nge0
>>>> [sunpc1:07690] [[48379,0],1] oob:tcp:init adding 193.174.26.210 to our list of V4 connections
>>>> [sunpc1:07690] [[48379,0],1] TCP STARTUP
>>>> [sunpc1:07690] [[48379,0],1] attempting to bind to IPv4 port 0
>>>> [sunpc1:07690] [[48379,0],1] assigned IPv4 port 39616
>>>> [sunpc1:07690] mca:oob:select: Adding component to end
>>>> [sunpc1:07690] mca:oob:select: Found 1 active transports
>>>> [sunpc1:07690] [[48379,0],1]: set_addr to uri 3170566144.0;tcp://193.174.24.39:55567
>>>> [sunpc1:07690] [[48379,0],1]:set_addr checking if peer [[48379,0],0] is reachable via component tcp
>>>> [sunpc1:07690] [[48379,0],1] oob:tcp: working peer [[48379,0],0] address tcp://193.174.24.39:55567
>>>> [sunpc1:07690] [[48379,0],1] UNFOUND KERNEL INDEX -13 FOR ADDRESS 193.174.24.39
>>>> [sunpc1:07690] [[48379,0],1] PEER [[48379,0],0] MAY BE REACHABLE BY ROUTING - ASSIGNING MODULE AT KINDEX 2 INTERFACE nge0
>>>> [sunpc1:07690] [[48379,0],1] PASSING ADDR 193.174.24.39 TO INTERFACE nge0 AT KERNEL INDEX 2
>>>> [sunpc1:07690] [[48379,0],1]:tcp set addr for peer [[48379,0],0]
>>>> [sunpc1:07690] [[48379,0],1]: peer [[48379,0],0] is reachable via component tcp
>>>> [sunpc1:07690] [[48379,0],1] OOB_SEND: ../../../../../openmpi-1.8.2a1r31742/orte/mca/rml/oob/rml_oob_send.c:199
>>>> [sunpc1:07690] [[48379,0],1]:tcp:processing set_peer cmd for interface nge0
>>>> [sunpc1:07690] [[48379,0],1] oob:base:send to target [[48379,0],0]
>>>> [sunpc1:07690] [[48379,0],1] oob:tcp:send_nb to peer [[48379,0],0]:10
>>>> [sunpc1:07690] [[48379,0],1] tcp:send_nb to peer [[48379,0],0]
>>>> [sunpc1:07690] [[48379,0],1]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:508] post send to [[48379,0],0]
>>>> [sunpc1:07690] [[48379,0],1]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:442] processing send to peer [[48379,0],0]:10
>>>> [sunpc1:07690] [[48379,0],1]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:476] queue pending to [[48379,0],0]
>>>> [sunpc1:07690] [[48379,0],1] tcp:send_nb: initiating connection to [[48379,0],0]
>>>> [sunpc1:07690] [[48379,0],1]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:490] connect to [[48379,0],0]
>>>> [sunpc1:07690] [[48379,0],1] orte_tcp_peer_try_connect: attempting to connect to proc [[48379,0],0] via interface nge0
>>>> [sunpc1:07690] [[48379,0],1] oob:tcp:peer creating socket to [[48379,0],0]
>>>> [sunpc1:07690] [[48379,0],1] orte_tcp_peer_try_connect: attempting to connect to proc [[48379,0],0] via interface nge0 on socket 10
>>>> [sunpc1:07690] [[48379,0],1] orte_tcp_peer_try_connect: attempting to connect to proc [[48379,0],0] on 193.174.24.39:55567 - 0 retries
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] mca_oob_tcp_listen_thread: new connection: (15, 0) 193.174.26.210:39617
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] connection_handler: working connection (15, 11) 193.174.26.210:39617
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] CONNECTION REQUEST ON UNKNOWN INTERFACE
>>>> [sunpc1:07690] [[48379,0],1] waiting for connect completion to [[48379,0],0] - activating send event
>>>> [sunpc1:07690] [[48379,0],1] tcp:send_handler called to send to peer [[48379,0],0]
>>>> [sunpc1:07690] [[48379,0],1] tcp:send_handler CONNECTING
>>>> [sunpc1:07690] [[48379,0],1]:tcp:complete_connect called for peer [[48379,0],0] on socket 10
>>>> [sunpc1:07690] [[48379,0],1] tcp_peer_complete_connect: sending ack to [[48379,0],0]
>>>> [sunpc1:07690] [[48379,0],1] SEND CONNECT ACK
>>>> [sunpc1:07690] [[48379,0],1] send blocking of 48 bytes to socket 10
>>>> [sunpc1:07690] [[48379,0],1] connect-ack sent to socket 10
>>>> [sunpc1:07690] [[48379,0],1] tcp_peer_complete_connect: setting read event on connection to [[48379,0],0]
>>>> [linpc1:21511] mca: base: components_register: registering oob components
>>>> [linpc1:21511] mca: base: components_register: found loaded component tcp
>>>> [linpc1:21511] mca: base: components_register: component tcp register function successful
>>>> [linpc1:21511] mca: base: components_open: opening oob components
>>>> [linpc1:21511] mca: base: components_open: found loaded component tcp
>>>> [linpc1:21511] mca: base: components_open: component tcp open function successful
>>>> [linpc1:21511] mca:oob:select: checking available component tcp
>>>> [linpc1:21511] mca:oob:select: Querying component [tcp]
>>>> [linpc1:21511] oob:tcp: component_available called
>>>>
>>>> [linpc1:21511] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
>>>> [linpc1:21511] WORKING INTERFACE 2 KERNEL INDEX 2 FAMILY: V4
>>>> [linpc1:21511] [[48379,0],2] oob:tcp:init creating module for V4 address on interface eth0
>>>> [linpc1:21511] [[48379,0],2] creating OOB-TCP module for interface eth0
>>>> [linpc1:21511] [[48379,0],2] oob:tcp:init adding 193.174.26.208 to our list of V4 connections
>>>> [linpc1:21511] [[48379,0],2] TCP STARTUP
>>>> [linpc1:21511] [[48379,0],2] attempting to bind to IPv4 port 0
>>>> [linpc1:21511] [[48379,0],2] assigned IPv4 port 39724
>>>> [linpc1:21511] mca:oob:select: Adding component to end
>>>> [linpc1:21511] mca:oob:select: Found 1 active transports
>>>> [linpc1:21511] [[48379,0],2]: set_addr to uri 3170566144.0;tcp://193.174.24.39:55567
>>>> [linpc1:21511] [[48379,0],2]:set_addr checking if peer [[48379,0],0] is reachable via component tcp
>>>> [linpc1:21511] [[48379,0],2] oob:tcp: working peer [[48379,0],0] address tcp://193.174.24.39:55567
>>>> [linpc1:21511] [[48379,0],2] UNFOUND KERNEL INDEX -13 FOR ADDRESS 193.174.24.39
>>>> [linpc1:21511] [[48379,0],2] PEER [[48379,0],0] MAY BE REACHABLE BY ROUTING - ASSIGNING MODULE AT KINDEX 2 INTERFACE eth0
>>>> [linpc1:21511] [[48379,0],2] PASSING ADDR 193.174.24.39 TO INTERFACE eth0 AT KERNEL INDEX 2
>>>> [linpc1:21511] [[48379,0],2]:tcp set addr for peer [[48379,0],0]
>>>> [linpc1:21511] [[48379,0],2]: peer [[48379,0],0] is reachable via component tcp
>>>> [linpc1:21511] [[48379,0],2] OOB_SEND: ../../../../../openmpi-1.8.2a1r31742/orte/mca/rml/oob/rml_oob_send.c:199
>>>> [linpc1:21511] [[48379,0],2]:tcp:processing set_peer cmd for interface eth0
>>>> [linpc1:21511] [[48379,0],2] oob:base:send to target [[48379,0],0]
>>>> [linpc1:21511] [[48379,0],2] oob:tcp:send_nb to peer [[48379,0],0]:10
>>>> [linpc1:21511] [[48379,0],2] tcp:send_nb to peer [[48379,0],0]
>>>> [linpc1:21511] [[48379,0],2]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:508] post send to [[48379,0],0]
>>>> [linpc1:21511] [[48379,0],2]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:442] processing send to peer [[48379,0],0]:10
>>>> [linpc1:21511] [[48379,0],2]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:476] queue pending to [[48379,0],0]
>>>> [linpc1:21511] [[48379,0],2] tcp:send_nb: initiating connection to [[48379,0],0]
>>>> [linpc1:21511] [[48379,0],2]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:490] connect to [[48379,0],0]
>>>> [linpc1:21511] [[48379,0],2] orte_tcp_peer_try_connect: attempting to connect to proc [[48379,0],0] via interface eth0
>>>> [linpc1:21511] [[48379,0],2] oob:tcp:peer creating socket to [[48379,0],0]
>>>> [linpc1:21511] [[48379,0],2] orte_tcp_peer_try_connect: attempting to connect to proc [[48379,0],0] via interface eth0 on socket 9
>>>> [linpc1:21511] [[48379,0],2] orte_tcp_peer_try_connect: attempting to connect to proc [[48379,0],0] on 193.174.24.39:55567 - 0 retries
>>>> [linpc1:21511] [[48379,0],2] waiting for connect completion to [[48379,0],0] - activating send event
>>>> [linpc1:21511] [[48379,0],2] tcp:send_handler called to send to peer [[48379,0],0]
>>>> [linpc1:21511] [[48379,0],2] tcp:send_handler CONNECTING
>>>> [linpc1:21511] [[48379,0],2]:tcp:complete_connect called for peer [[48379,0],0] on socket 9
>>>> [linpc1:21511] [[48379,0],2] tcp_peer_complete_connect: sending ack to [[48379,0],0]
>>>> [linpc1:21511] [[48379,0],2] SEND CONNECT ACK
>>>> [linpc1:21511] [[48379,0],2] send blocking of 48 bytes to socket 9
>>>> [linpc1:21511] [[48379,0],2] connect-ack sent to socket 9
>>>> [linpc1:21511] [[48379,0],2] tcp_peer_complete_connect: setting read event on connection to [[48379,0],0]
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] mca_oob_tcp_listen_thread: new connection: (16, 11) 193.174.26.208:53741
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] connection_handler: working connection (16, 11) 193.174.26.208:53741
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] CONNECTION REQUEST ON UNKNOWN INTERFACE
>>>> ^CKilled by signal 2.
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] OOB_SEND: ../../../../../openmpi-1.8.2a1r31742/orte/mca/rml/oob/rml_oob_send.c:199
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] OOB_SEND: ../../../../../openmpi-1.8.2a1r31742/orte/mca/rml/oob/rml_oob_send.c:199
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] oob:base:send to target [[48379,0],1]
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] oob:base:send unknown peer [[48379,0],1]
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] is NOT reachable by TCP
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] oob:base:send to target [[48379,0],2]
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] oob:base:send unknown peer [[48379,0],2]
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] is NOT reachable by TCP
>>>> Killed by signal 2.
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] TCP SHUTDOWN
>>>> [tyr.informatik.hs-fulda.de:06877] mca: base: close: component tcp closed
>>>> [tyr.informatik.hs-fulda.de:06877] mca: base: close: unloading component tcp
>>>> tyr fd1026 106
>>>>
>>>>
>>>> Thank you very much for your help in advance. Do you need anything else?
>>>>
>>>>
>>>> Kind regards
>>>>
>>>> Siegmar
>>>>
>>>>
>>>>
>>>>> On May 14, 2014, at 9:06 AM, Siegmar Gross <Siegmar.Gross_at_[hidden]> wrote:
>>>>>
>>>>>> Hi Ralph,
>>>>>>
>>>>>>> What are the interfaces on these machines?
>>>>>>
>>>>>> tyr fd1026 111 ifconfig -a
>>>>>> lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
>>>>>> inet 127.0.0.1 netmask ff000000
>>>>>> bge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
>>>>>> inet 193.174.24.39 netmask ffffffe0 broadcast 193.174.24.63
>>>>>> tyr fd1026 112
>>>>>>
>>>>>>
>>>>>> tyr fd1026 112 ssh sunpc1 ifconfig -a
>>>>>> lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
>>>>>> inet 127.0.0.1 netmask ff000000
>>>>>> nge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
>>>>>> inet 193.174.26.210 netmask ffffffc0 broadcast 193.174.26.255
>>>>>> tyr fd1026 113
>>>>>>
>>>>>>
>>>>>> tyr fd1026 113 ssh linpc1 /sbin/ifconfig -a
>>>>>> eth0 Link encap:Ethernet HWaddr 00:14:4F:23:FD:A8
>>>>>> inet addr:193.174.26.208 Bcast:193.174.26.255 Mask:255.255.255.192
>>>>>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>>>>>> RX packets:18052524 errors:127 dropped:0 overruns:0 frame:127
>>>>>> TX packets:15917888 errors:0 dropped:0 overruns:0 carrier:0
>>>>>> collisions:0 txqueuelen:1000
>>>>>> RX bytes:4158294157 (3965.6 Mb) TX bytes:12060556809 (11501.8 Mb)
>>>>>> Interrupt:23 Base address:0x4000
>>>>>>
>>>>>> eth1 Link encap:Ethernet HWaddr 00:14:4F:23:FD:A9
>>>>>> BROADCAST MULTICAST MTU:1500 Metric:1
>>>>>> RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>>>> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>>>> collisions:0 txqueuelen:1000
>>>>>> RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
>>>>>> Interrupt:45 Base address:0xa000
>>>>>>
>>>>>> lo Link encap:Local Loopback
>>>>>> inet addr:127.0.0.1 Mask:255.0.0.0
>>>>>> UP LOOPBACK RUNNING MTU:16436 Metric:1
>>>>>> RX packets:1083 errors:0 dropped:0 overruns:0 frame:0
>>>>>> TX packets:1083 errors:0 dropped:0 overruns:0 carrier:0
>>>>>> collisions:0 txqueuelen:0
>>>>>> RX bytes:329323 (321.6 Kb) TX bytes:329323 (321.6 Kb)
>>>>>>
>>>>>> tyr fd1026 114
>>>>>>
>>>>>>
>>>>>> Do you need something else?
>>>>>>
>>>>>>
>>>>>> Kind regards
>>>>>>
>>>>>> Siegmar
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On May 14, 2014, at 7:45 AM, Siegmar Gross <Siegmar.Gross_at_[hidden]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I just installed openmpi-1.8.2a1r31742 on my machines (Solaris 10
>>>>>>>> Sparc, Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64) with
>>>>>>>> Sun C5.12 and still have the following problem.
>>>>>>>>
>>>>>>>> tyr fd1026 102 which mpiexec
>>>>>>>> /usr/local/openmpi-1.8.2_64_cc/bin/mpiexec
>>>>>>>> tyr fd1026 103 mpiexec -np 3 --host tyr,sunpc1,linpc1 hostname
>>>>>>>> [tyr.informatik.hs-fulda.de:12827] [[37949,0],0] CONNECTION
>>>>>>>> REQUEST ON UNKNOWN INTERFACE
>>>>>>>> [tyr.informatik.hs-fulda.de:12827] [[37949,0],0] CONNECTION
>>>>>>>> REQUEST ON UNKNOWN INTERFACE
>>>>>>>> ^CKilled by signal 2.
>>>>>>>> Killed by signal 2.
>>>>>>>> tyr fd1026 104
>>>>>>>>
>>>>>>>>
>>>>>>>> The command works fine with openmpi-1.6.6rc1.
>>>>>>>>
>>>>>>>> tyr fd1026 102 which mpiexec
>>>>>>>> /usr/local/openmpi-1.6.6_64_cc/bin/mpiexec
>>>>>>>> tyr fd1026 103 mpiexec -np 3 --host tyr,sunpc1,linpc1 hostname
>>>>>>>> tyr.informatik.hs-fulda.de
>>>>>>>> linpc1
>>>>>>>> sunpc1
>>>>>>>> tyr fd1026 104
>>>>>>>>
>>>>>>>>
>>>>>>>> I have reported the problem before and I would be grateful if
>>>>>>>> somebody could solve it. Please let me know if I can provide any
>>>>>>>> other information.
>>>>>>>>
>>>>>>>>
>>>>>>>> Kind regards
>>>>>>>>
>>>>>>>> Siegmar
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> users mailing list
>>>>>>>> users_at_[hidden]
>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/