
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] open-mpi on Mac OS 10.9 (Mavericks)
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-12-03 08:16:59


Best guess I can offer is that they are blocking loopback on those networks
- i.e., they are configured such that you can use them to connect to a
remote machine, but not to a process on your local machine. I'll take a
look at the connection logic and see if I can get it to fail over to the
loopback device in that case. I believe we disable use of the loopback if
an active TCP network is available, since we expect that network to include
loopback capability.
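If the non-loopback interfaces really are the problem, one possible
workaround to try in the meantime (just a sketch, not something I have
verified on Mavericks; lo0 is the usual loopback device name on OS X) is
to pin both the bootstrap (OOB) and TCP BTL traffic to the loopback
interface via the oob_tcp_if_include and btl_tcp_if_include MCA parameters:

```shell
# Force Open MPI's out-of-band wireup and the TCP BTL onto loopback only.
# Suitable for single-node runs only: remote nodes are unreachable via lo0.
mpirun -np 2 \
    --mca oob_tcp_if_include lo0 \
    --mca btl_tcp_if_include lo0 \
    hello_c
```

Note that btl_tcp_if_include is mutually exclusive with btl_tcp_if_exclude,
whose default value (127.0.0.1/8,sppp) is what normally keeps the TCP BTL
off loopback, so setting the include list should be enough.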

Meantime, you might want to talk to your IT folks and see if that is
correct and intentional - and if so, why.
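You can also check the blocked-loopback hypothesis outside of Open MPI
entirely. The sketch below (plain Python 3; the function name is mine,
nothing Open MPI specific) attempts a one-byte TCP round trip through
127.0.0.1. If it fails or times out while the corporate network is
attached, that would point at the network configuration rather than at
Open MPI. To mimic the failing case more closely, pass the machine's en0
address instead of the default.

```python
import socket

def loopback_tcp_ok(host="127.0.0.1"):
    """Try a local TCP round trip: listen, connect, echo one byte."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        srv.bind((host, 0))          # port 0: let the OS pick a free port
        srv.listen(1)
        port = srv.getsockname()[1]
        cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        cli.settimeout(5)
        try:
            cli.connect((host, port))
            conn, _ = srv.accept()
            cli.sendall(b"x")
            ok = conn.recv(1) == b"x"  # did the byte make the round trip?
            conn.close()
            return ok
        except OSError:
            return False
        finally:
            cli.close()
    finally:
        srv.close()

if __name__ == "__main__":
    print("loopback ok:", loopback_tcp_ok())
```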

On Tue, Dec 3, 2013 at 5:04 AM, Meredith, Karl
<karl.meredith_at_[hidden]> wrote:

> I disconnected from our corporate network (ethernet connection) and tried
> running again: same result, it stalls.
>
> Then, I also disconnected from our local wifi network and tried running
> again: it worked!
>
> bash-4.2$ mpirun -np 2 --mca btl sm,self hello_c
> Hello, world, I am 0 of 2, (Open MPI v1.7.4a1, package: Open MPI
> meredithk_at_[hidden] Distribution, ident: 1.7.4a1r29784, repo rev:
> r29784, Dec 02, 2013 (nightly snapshot tarball), 173)
> Hello, world, I am 1 of 2, (Open MPI v1.7.4a1, package: Open MPI
> meredithk_at_[hidden] Distribution, ident: 1.7.4a1r29784, repo rev:
> r29784, Dec 02, 2013 (nightly snapshot tarball), 173)
> bash-4.2$ mpirun -np 2 hello_c
> Hello, world, I am 0 of 2, (Open MPI v1.7.4a1, package: Open MPI
> meredithk_at_[hidden] Distribution, ident: 1.7.4a1r29784, repo rev:
> r29784, Dec 02, 2013 (nightly snapshot tarball), 173)
> Hello, world, I am 1 of 2, (Open MPI v1.7.4a1, package: Open MPI
> meredithk_at_[hidden] Distribution, ident: 1.7.4a1r29784, repo rev:
> r29784, Dec 02, 2013 (nightly snapshot tarball), 173)
>
> Why? What would cause the network to interfere with mpirun? Do
> you have any insight?
>
> Karl
>
>
>
> On Dec 3, 2013, at 7:54 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>
> Hmmm...are you connected to a network, or at least have a network active,
> when you do this? It looks a little like the system is trying to open a
> port between the process and mpirun, but is failing to do so.
>
>
>
> On Tue, Dec 3, 2013 at 4:51 AM, Meredith, Karl
> <karl.meredith_at_[hidden]> wrote:
> Using openmpi-1.7.4, no macports, only apple compilers/tools:
>
> mpirun -np 2 --mca btl sm,self hello_c
>
> This hangs, also in MPI_Init().
>
> Here’s the back trace from the debugger:
>
> bash-4.2$ lldb -p 4517
> Attaching to process with:
> process attach -p 4517
> Process 4517 stopped
> Executable module set to
> "/Users/meredithk/tools/openmpi-1.7.4a1r29784/examples/hello_c".
> Architecture set to: x86_64-apple-macosx.
> (lldb) bt
> * thread #1: tid = 0x57efb, 0x00007fff8c991a3a
> libsystem_kernel.dylib`__semwait_signal + 10, queue =
> 'com.apple.main-thread', stop reason = signal SIGSTOP
> frame #0: 0x00007fff8c991a3a libsystem_kernel.dylib`__semwait_signal +
> 10
> frame #1: 0x00007fff8ade4e60 libsystem_c.dylib`nanosleep + 200
> frame #2: 0x0000000108d668e3
> libopen-rte.6.dylib`orte_routed_base_register_sync(setup=true) + 2435 at
> routed_base_fns.c:344
> frame #3: 0x000000010904e3a7
> mca_routed_binomial.so`init_routes(job=1294401537, ndat=0x0000000000000000)
> + 2759 at routed_binomial.c:708
> frame #4: 0x0000000108d1b84d
> libopen-rte.6.dylib`orte_ess_base_app_setup(db_restrict_local=true) + 2109
> at ess_base_std_app.c:233
> frame #5: 0x0000000108fbc442 mca_ess_env.so`rte_init + 418 at
> ess_env_module.c:146
> frame #6: 0x0000000108cd6cfe
> libopen-rte.6.dylib`orte_init(pargc=0x0000000000000000,
> pargv=0x0000000000000000, flags=32) + 718 at orte_init.c:158
> frame #7: 0x0000000108a3b3c8 libmpi.1.dylib`ompi_mpi_init(argc=1,
> argv=0x00007fff57200508, requested=0, provided=0x00007fff57200360) + 616 at
> ompi_mpi_init.c:451
> frame #8: 0x0000000108a895a0
> libmpi.1.dylib`MPI_Init(argc=0x00007fff572004d0, argv=0x00007fff572004c8) +
> 480 at init.c:84
> frame #9: 0x00000001089ffe4a hello_c`main(argc=1,
> argv=0x00007fff57200508) + 58 at hello_c.c:18
> frame #10: 0x00007fff8d5df5fd libdyld.dylib`start + 1
> frame #11: 0x00007fff8d5df5fd libdyld.dylib`start + 1
>
>
> On Dec 2, 2013, at 2:11 PM, Jeff Squyres (jsquyres)
> <jsquyres_at_[hidden]> wrote:
>
> > Karl --
> >
> > Can you force the use of just the shared memory transport -- i.e.,
> disable the TCP BTL? For example:
> >
> > mpirun -np 2 --mca btl sm,self hello_c
> >
> > If that also hangs, can you attach a debugger and see *where* it is
> hanging inside MPI_Init()? (In OMPI, MPI::Init() simply invokes MPI_Init())
> >
> >
> > On Nov 27, 2013, at 2:56 PM, "Meredith, Karl"
> <karl.meredith_at_[hidden]> wrote:
> >
> >> /opt/trunk/apple-only/bin/ompi_info --param oob tcp --level 9
> >> MCA oob: parameter "oob_tcp_verbose" (current value:
> "0", data source: default, level: 9 dev/all, type: int)
> >> Verbose level for the OOB tcp component
> >> MCA oob: parameter "oob_tcp_peer_limit" (current value:
> "-1", data source: default, level: 9 dev/all, type: int)
> >> Maximum number of peer connections to
> simultaneously maintain (-1 = infinite)
> >> MCA oob: parameter "oob_tcp_peer_retries" (current
> value: "60", data source: default, level: 9 dev/all, type: int)
> >> Number of times to try shutting down a
> connection before giving up
> >> MCA oob: parameter "oob_tcp_debug" (current value: "0",
> data source: default, level: 9 dev/all, type: int)
> >> Enable (1) / disable (0) debugging output for
> this component
> >> MCA oob: parameter "oob_tcp_sndbuf" (current value:
> "131072", data source: default, level: 9 dev/all, type: int)
> >> TCP socket send buffering size (in bytes)
> >> MCA oob: parameter "oob_tcp_rcvbuf" (current value:
> "131072", data source: default, level: 9 dev/all, type: int)
> >> TCP socket receive buffering size (in bytes)
> >> MCA oob: parameter "oob_tcp_if_include" (current value:
> "", data source: default, level: 9 dev/all, type: string, synonyms:
> oob_tcp_include)
> >> Comma-delimited list of devices and/or CIDR
> notation of networks to use for Open MPI bootstrap communication (e.g.,
> "eth0,192.168.0.0/16"). Mutually exclusive with
> oob_tcp_if_exclude.
> >> MCA oob: parameter "oob_tcp_if_exclude" (current value:
> "", data source: default, level: 9 dev/all, type: string, synonyms:
> oob_tcp_exclude)
> >> Comma-delimited list of devices and/or CIDR
> notation of networks to NOT use for Open MPI bootstrap communication -- all
> devices not matching these specifications will be used (e.g., "eth0,
> 192.168.0.0/16"). If set to a non-default value,
> it is mutually exclusive with oob_tcp_if_include.
> >> MCA oob: parameter "oob_tcp_connect_sleep" (current
> value: "1", data source: default, level: 9 dev/all, type: int)
> >> Enable (1) / disable (0) random sleep for
> connection wireup.
> >> MCA oob: parameter "oob_tcp_listen_mode" (current value:
> "event", data source: default, level: 9 dev/all, type: int)
> >> Mode for HNP to accept incoming connections:
> event, listen_thread.
> >> Valid values: 0:"event", 1:"listen_thread"
> >> MCA oob: parameter "oob_tcp_listen_thread_max_queue"
> (current value: "10", data source: default, level: 9 dev/all, type: int)
> >> High water mark for queued accepted socket list
> size. Used only when listen_mode is listen_thread.
> >> MCA oob: parameter "oob_tcp_listen_thread_wait_time"
> (current value: "10", data source: default, level: 9 dev/all, type: int)
> >> Time in milliseconds to wait before actively
> checking for new connections when listen_mode is listen_thread.
> >> MCA oob: parameter "oob_tcp_static_ports" (current
> value: "", data source: default, level: 9 dev/all, type: string)
> >> Static ports for daemons and procs (IPv4)
> >> MCA oob: parameter "oob_tcp_dynamic_ports" (current
> value: "", data source: default, level: 9 dev/all, type: string)
> >> Range of ports to be dynamically used by
> daemons and procs (IPv4)
> >> MCA oob: parameter "oob_tcp_disable_family" (current
> value: "none", data source: default, level: 9 dev/all, type: int)
> >> Disable IPv4 (4) or IPv6 (6)
> >> Valid values: 0:"none", 4:"IPv4", 6:"IPv6"
> >>
> >> /opt/trunk/apple-only/bin/ompi_info --param btl tcp --level 9
> >> MCA btl: parameter "btl_tcp_links" (current value: "1",
> data source: default, level: 4 tuner/basic, type: unsigned)
> >> MCA btl: parameter "btl_tcp_if_include" (current value:
> "", data source: default, level: 1 user/basic, type: string)
> >> Comma-delimited list of devices and/or CIDR
> notation of networks to use for MPI communication (e.g., "eth0,
> 192.168.0.0/16"). Mutually exclusive with
> btl_tcp_if_exclude.
> >> MCA btl: parameter "btl_tcp_if_exclude" (current value: "
> 127.0.0.1/8,sppp", data source: default, level:
> 1 user/basic, type: string)
> >> Comma-delimited list of devices and/or CIDR
> notation of networks to NOT use for MPI communication -- all devices not
> matching these specifications will be used (e.g., "eth0,192.168.0.0/16").
> If set to a non-default value, it is mutually
> exclusive with btl_tcp_if_include.
> >> MCA btl: parameter "btl_tcp_free_list_num" (current
> value: "8", data source: default, level: 5 tuner/detail, type: int)
> >> MCA btl: parameter "btl_tcp_free_list_max" (current
> value: "-1", data source: default, level: 5 tuner/detail, type: int)
> >> MCA btl: parameter "btl_tcp_free_list_inc" (current
> value: "32", data source: default, level: 5 tuner/detail, type: int)
> >> MCA btl: parameter "btl_tcp_sndbuf" (current value:
> "131072", data source: default, level: 4 tuner/basic, type: int)
> >> MCA btl: parameter "btl_tcp_rcvbuf" (current value:
> "131072", data source: default, level: 4 tuner/basic, type: int)
> >> MCA btl: parameter "btl_tcp_endpoint_cache" (current
> value: "30720", data source: default, level: 4 tuner/basic, type: int)
> >> The size of the internal cache for each TCP
> connection. This cache is used to reduce the number of syscalls, by
> replacing them with memcpy. Every read will read the expected data plus the
> amount of the endpoint_cache
> >> MCA btl: parameter "btl_tcp_use_nagle" (current value:
> "0", data source: default, level: 4 tuner/basic, type: int)
> >> Whether to use Nagle's algorithm or not (using
> Nagle's algorithm may increase short message latency)
> >> MCA btl: parameter "btl_tcp_port_min_v4" (current value:
> "1024", data source: default, level: 2 user/detail, type: int)
> >> The minimum port where the TCP BTL will try to
> bind (default 1024)
> >> MCA btl: parameter "btl_tcp_port_range_v4" (current
> value: "64511", data source: default, level: 2 user/detail, type: int)
> >> The number of ports where the TCP BTL will try
> to bind (default 64511). This parameter together with the port min, define
> a range of ports where Open MPI will open sockets.
> >> MCA btl: parameter "btl_tcp_exclusivity" (current value:
> "100", data source: default, level: 7 dev/basic, type: unsigned)
> >> BTL exclusivity (must be >= 0)
> >> MCA btl: parameter "btl_tcp_flags" (current value:
> "314", data source: default, level: 5 tuner/detail, type: unsigned)
> >> BTL bit flags (general flags: SEND=1, PUT=2,
> GET=4, SEND_INPLACE=8, RDMA_MATCHED=64, HETEROGENEOUS_RDMA=256; flags only
> used by the "dr" PML (ignored by others): ACK=16, CHECKSUM=32,
> RDMA_COMPLETION=128; flags only used by the "bfo" PML (ignored by others):
> FAILOVER_SUPPORT=512)
> >> MCA btl: parameter "btl_tcp_rndv_eager_limit" (current
> value: "65536", data source: default, level: 4 tuner/basic, type: size_t)
> >> Size (in bytes, including header) of "phase 1"
> fragment sent for all large messages (must be >= 0 and <= eager_limit)
> >> MCA btl: parameter "btl_tcp_eager_limit" (current value:
> "65536", data source: default, level: 4 tuner/basic, type: size_t)
> >> Maximum size (in bytes, including header) of
> "short" messages (must be >= 1).
> >> MCA btl: parameter "btl_tcp_max_send_size" (current
> value: "131072", data source: default, level: 4 tuner/basic, type: size_t)
> >> Maximum size (in bytes) of a single "phase 2"
> fragment of a long message when using the pipeline protocol (must be >= 1)
> >> MCA btl: parameter "btl_tcp_rdma_pipeline_send_length"
> (current value: "131072", data source: default, level: 4 tuner/basic, type:
> size_t)
> >> Length of the "phase 2" portion of a large
> message (in bytes) when using the pipeline protocol. This part of the
> message will be split into fragments of size max_send_size and sent using
> send/receive semantics (must be >= 0; only relevant when the PUT flag is
> set)
> >> MCA btl: parameter "btl_tcp_rdma_pipeline_frag_size"
> (current value: "2147483647", data source: default, level: 4 tuner/basic,
> type: size_t)
> >> Maximum size (in bytes) of a single "phase 3"
> fragment from a long message when using the pipeline protocol. These
> fragments will be sent using RDMA semantics (must be >= 1; only relevant
> when the PUT flag is set)
> >> MCA btl: parameter "btl_tcp_min_rdma_pipeline_size"
> (current value: "196608", data source: default, level: 4 tuner/basic, type:
> size_t)
> >> Messages smaller than this size (in bytes) will
> not use the RDMA pipeline protocol. Instead, they will be split into
> fragments of max_send_size and sent using send/receive semantics (must be
> >=0, and is automatically adjusted up to at least
> (eager_limit+btl_rdma_pipeline_send_length); only relevant when the PUT
> flag is set)
> >> MCA btl: parameter "btl_tcp_bandwidth" (current value:
> "100", data source: default, level: 5 tuner/detail, type: unsigned)
> >> Approximate maximum bandwidth of interconnect
> (0 = auto-detect value at run-time [not supported in all BTL modules], >= 1
> = bandwidth in Mbps)
> >> MCA btl: parameter "btl_tcp_disable_family" (current
> value: "0", data source: default, level: 2 user/detail, type: int)
> >> MCA btl: parameter "btl_tcp_if_seq" (current value: "",
> data source: default, level: 9 dev/all, type: string)
> >> If specified, a comma-delimited list of TCP
> interfaces. Interfaces will be assigned, one to each MPI process, in a
> round-robin fashion on each server. For example, if the list is
> "eth0,eth1" and four MPI processes are run on a single server, then local
> ranks 0 and 2 will use eth0 and local ranks 1 and 3 will use eth1.
> >>
> >>
> >> On Nov 27, 2013, at 2:41 PM, George Bosilca <bosilca_at_[hidden]> wrote:
> >>
> >> ompi_info --param oob tcp --level 9
> >> ompi_info --param btl tcp --level 9
> >>
> >> _______________________________________________
> >> users mailing list
> >> users_at_[hidden]
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
> > --
> > Jeff Squyres
> > jsquyres_at_[hidden]
> > For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> >