
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] open-mpi on Mac OS 10.9 (Mavericks)
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-12-03 07:54:54


Hmmm...are you connected to a network, or at least have a network active,
when you do this? It looks a little like the system is trying to open a
port between the process and mpirun, but is failing to do so.

On Tue, Dec 3, 2013 at 4:51 AM, Meredith, Karl
<karl.meredith_at_[hidden]>wrote:

> Using openmpi-1.7.4, no macports, only apple compilers/tools:
>
> mpirun -np 2 --mca btl sm,self hello_c
>
> This hangs, also in MPI_Init().
>
> Here’s the back trace from the debugger:
>
> bash-4.2$ lldb -p 4517
> Attaching to process with:
> process attach -p 4517
> Process 4517 stopped
> Executable module set to
> "/Users/meredithk/tools/openmpi-1.7.4a1r29784/examples/hello_c".
> Architecture set to: x86_64-apple-macosx.
> (lldb) bt
> * thread #1: tid = 0x57efb, 0x00007fff8c991a3a
> libsystem_kernel.dylib`__semwait_signal + 10, queue =
> 'com.apple.main-thread', stop reason = signal SIGSTOP
> frame #0: 0x00007fff8c991a3a libsystem_kernel.dylib`__semwait_signal +
> 10
> frame #1: 0x00007fff8ade4e60 libsystem_c.dylib`nanosleep + 200
> frame #2: 0x0000000108d668e3
> libopen-rte.6.dylib`orte_routed_base_register_sync(setup=true) + 2435 at
> routed_base_fns.c:344
> frame #3: 0x000000010904e3a7
> mca_routed_binomial.so`init_routes(job=1294401537, ndat=0x0000000000000000)
> + 2759 at routed_binomial.c:708
> frame #4: 0x0000000108d1b84d
> libopen-rte.6.dylib`orte_ess_base_app_setup(db_restrict_local=true) + 2109
> at ess_base_std_app.c:233
> frame #5: 0x0000000108fbc442 mca_ess_env.so`rte_init + 418 at
> ess_env_module.c:146
> frame #6: 0x0000000108cd6cfe
> libopen-rte.6.dylib`orte_init(pargc=0x0000000000000000,
> pargv=0x0000000000000000, flags=32) + 718 at orte_init.c:158
> frame #7: 0x0000000108a3b3c8 libmpi.1.dylib`ompi_mpi_init(argc=1,
> argv=0x00007fff57200508, requested=0, provided=0x00007fff57200360) + 616 at
> ompi_mpi_init.c:451
> frame #8: 0x0000000108a895a0
> libmpi.1.dylib`MPI_Init(argc=0x00007fff572004d0, argv=0x00007fff572004c8) +
> 480 at init.c:84
> frame #9: 0x00000001089ffe4a hello_c`main(argc=1,
> argv=0x00007fff57200508) + 58 at hello_c.c:18
> frame #10: 0x00007fff8d5df5fd libdyld.dylib`start + 1
> frame #11: 0x00007fff8d5df5fd libdyld.dylib`start + 1
>
>
> On Dec 2, 2013, at 2:11 PM, Jeff Squyres (jsquyres) <jsquyres_at_[hidden]>
> wrote:
>
> > Karl --
> >
> > Can you force the use of just the shared memory transport -- i.e.,
> disable the TCP BTL? For example:
> >
> > mpirun -np 2 --mca btl sm,self hello_c
> >
> > If that also hangs, can you attach a debugger and see *where* it is
> hanging inside MPI_Init()? (In OMPI, MPI::Init() simply invokes MPI_Init())
> >
> >
> > On Nov 27, 2013, at 2:56 PM, "Meredith, Karl" <
> karl.meredith_at_[hidden]> wrote:
> >
> >> /opt/trunk/apple-only/bin/ompi_info --param oob tcp --level 9
> >> MCA oob: parameter "oob_tcp_verbose" (current value:
> "0", data source: default, level: 9 dev/all, type: int)
> >> Verbose level for the OOB tcp component
> >> MCA oob: parameter "oob_tcp_peer_limit" (current value:
> "-1", data source: default, level: 9 dev/all, type: int)
> >> Maximum number of peer connections to
> simultaneously maintain (-1 = infinite)
> >> MCA oob: parameter "oob_tcp_peer_retries" (current
> value: "60", data source: default, level: 9 dev/all, type: int)
> >> Number of times to try shutting down a
> connection before giving up
> >> MCA oob: parameter "oob_tcp_debug" (current value: "0",
> data source: default, level: 9 dev/all, type: int)
> >> Enable (1) / disable (0) debugging output for
> this component
> >> MCA oob: parameter "oob_tcp_sndbuf" (current value:
> "131072", data source: default, level: 9 dev/all, type: int)
> >> TCP socket send buffering size (in bytes)
> >> MCA oob: parameter "oob_tcp_rcvbuf" (current value:
> "131072", data source: default, level: 9 dev/all, type: int)
> >> TCP socket receive buffering size (in bytes)
> >> MCA oob: parameter "oob_tcp_if_include" (current value:
> "", data source: default, level: 9 dev/all, type: string, synonyms:
> oob_tcp_include)
> >> Comma-delimited list of devices and/or CIDR
> notation of networks to use for Open MPI bootstrap communication (e.g.,
> "eth0,192.168.0.0/16"). Mutually exclusive with oob_tcp_if_exclude.
> >> MCA oob: parameter "oob_tcp_if_exclude" (current value:
> "", data source: default, level: 9 dev/all, type: string, synonyms:
> oob_tcp_exclude)
> >> Comma-delimited list of devices and/or CIDR
> notation of networks to NOT use for Open MPI bootstrap communication -- all
> devices not matching these specifications will be used (e.g., "eth0,
> 192.168.0.0/16"). If set to a non-default value, it is mutually
> exclusive with oob_tcp_if_include.
> >> MCA oob: parameter "oob_tcp_connect_sleep" (current
> value: "1", data source: default, level: 9 dev/all, type: int)
> >> Enable (1) / disable (0) random sleep for
> connection wireup.
> >> MCA oob: parameter "oob_tcp_listen_mode" (current value:
> "event", data source: default, level: 9 dev/all, type: int)
> >> Mode for HNP to accept incoming connections:
> event, listen_thread.
> >> Valid values: 0:"event", 1:"listen_thread"
> >> MCA oob: parameter "oob_tcp_listen_thread_max_queue"
> (current value: "10", data source: default, level: 9 dev/all, type: int)
> >> High water mark for queued accepted socket list
> size. Used only when listen_mode is listen_thread.
> >> MCA oob: parameter "oob_tcp_listen_thread_wait_time"
> (current value: "10", data source: default, level: 9 dev/all, type: int)
> >> Time in milliseconds to wait before actively
> checking for new connections when listen_mode is listen_thread.
> >> MCA oob: parameter "oob_tcp_static_ports" (current
> value: "", data source: default, level: 9 dev/all, type: string)
> >> Static ports for daemons and procs (IPv4)
> >> MCA oob: parameter "oob_tcp_dynamic_ports" (current
> value: "", data source: default, level: 9 dev/all, type: string)
> >> Range of ports to be dynamically used by
> daemons and procs (IPv4)
> >> MCA oob: parameter "oob_tcp_disable_family" (current
> value: "none", data source: default, level: 9 dev/all, type: int)
> >> Disable IPv4 (4) or IPv6 (6)
> >> Valid values: 0:"none", 4:"IPv4", 6:"IPv6"
> >>
> >> /opt/trunk/apple-only/bin/ompi_info --param btl tcp --level 9
> >> MCA btl: parameter "btl_tcp_links" (current value: "1",
> data source: default, level: 4 tuner/basic, type: unsigned)
> >> MCA btl: parameter "btl_tcp_if_include" (current value:
> "", data source: default, level: 1 user/basic, type: string)
> >> Comma-delimited list of devices and/or CIDR
> notation of networks to use for MPI communication (e.g., "eth0,
> 192.168.0.0/16"). Mutually exclusive with btl_tcp_if_exclude.
> >> MCA btl: parameter "btl_tcp_if_exclude" (current value: "
> 127.0.0.1/8,sppp", data source: default, level: 1 user/basic, type:
> string)
> >> Comma-delimited list of devices and/or CIDR
> notation of networks to NOT use for MPI communication -- all devices not
> matching these specifications will be used (e.g., "eth0,192.168.0.0/16").
> If set to a non-default value, it is mutually exclusive with
> btl_tcp_if_include.
> >> MCA btl: parameter "btl_tcp_free_list_num" (current
> value: "8", data source: default, level: 5 tuner/detail, type: int)
> >> MCA btl: parameter "btl_tcp_free_list_max" (current
> value: "-1", data source: default, level: 5 tuner/detail, type: int)
> >> MCA btl: parameter "btl_tcp_free_list_inc" (current
> value: "32", data source: default, level: 5 tuner/detail, type: int)
> >> MCA btl: parameter "btl_tcp_sndbuf" (current value:
> "131072", data source: default, level: 4 tuner/basic, type: int)
> >> MCA btl: parameter "btl_tcp_rcvbuf" (current value:
> "131072", data source: default, level: 4 tuner/basic, type: int)
> >> MCA btl: parameter "btl_tcp_endpoint_cache" (current
> value: "30720", data source: default, level: 4 tuner/basic, type: int)
> >> The size of the internal cache for each TCP
> connection. This cache is used to reduce the number of syscalls, by
> replacing them with memcpy. Every read will read the expected data plus the
> amount of the endpoint_cache
> >> MCA btl: parameter "btl_tcp_use_nagle" (current value:
> "0", data source: default, level: 4 tuner/basic, type: int)
> >> Whether to use Nagle's algorithm or not (using
> Nagle's algorithm may increase short message latency)
> >> MCA btl: parameter "btl_tcp_port_min_v4" (current value:
> "1024", data source: default, level: 2 user/detail, type: int)
> >> The minimum port where the TCP BTL will try to
> bind (default 1024)
> >> MCA btl: parameter "btl_tcp_port_range_v4" (current
> value: "64511", data source: default, level: 2 user/detail, type: int)
> >> The number of ports where the TCP BTL will try
> to bind (default 64511). This parameter together with the port min, define
> a range of ports where Open MPI will open sockets.
> >> MCA btl: parameter "btl_tcp_exclusivity" (current value:
> "100", data source: default, level: 7 dev/basic, type: unsigned)
> >> BTL exclusivity (must be >= 0)
> >> MCA btl: parameter "btl_tcp_flags" (current value:
> "314", data source: default, level: 5 tuner/detail, type: unsigned)
> >> BTL bit flags (general flags: SEND=1, PUT=2,
> GET=4, SEND_INPLACE=8, RDMA_MATCHED=64, HETEROGENEOUS_RDMA=256; flags only
> used by the "dr" PML (ignored by others): ACK=16, CHECKSUM=32,
> RDMA_COMPLETION=128; flags only used by the "bfo" PML (ignored by others):
> FAILOVER_SUPPORT=512)
> >> MCA btl: parameter "btl_tcp_rndv_eager_limit" (current
> value: "65536", data source: default, level: 4 tuner/basic, type: size_t)
> >> Size (in bytes, including header) of "phase 1"
> fragment sent for all large messages (must be >= 0 and <= eager_limit)
> >> MCA btl: parameter "btl_tcp_eager_limit" (current value:
> "65536", data source: default, level: 4 tuner/basic, type: size_t)
> >> Maximum size (in bytes, including header) of
> "short" messages (must be >= 1).
> >> MCA btl: parameter "btl_tcp_max_send_size" (current
> value: "131072", data source: default, level: 4 tuner/basic, type: size_t)
> >> Maximum size (in bytes) of a single "phase 2"
> fragment of a long message when using the pipeline protocol (must be >= 1)
> >> MCA btl: parameter "btl_tcp_rdma_pipeline_send_length"
> (current value: "131072", data source: default, level: 4 tuner/basic, type:
> size_t)
> >> Length of the "phase 2" portion of a large
> message (in bytes) when using the pipeline protocol. This part of the
> message will be split into fragments of size max_send_size and sent using
> send/receive semantics (must be >= 0; only relevant when the PUT flag is
> set)
> >> MCA btl: parameter "btl_tcp_rdma_pipeline_frag_size"
> (current value: "2147483647", data source: default, level: 4 tuner/basic,
> type: size_t)
> >> Maximum size (in bytes) of a single "phase 3"
> fragment from a long message when using the pipeline protocol. These
> fragments will be sent using RDMA semantics (must be >= 1; only relevant
> when the PUT flag is set)
> >> MCA btl: parameter "btl_tcp_min_rdma_pipeline_size"
> (current value: "196608", data source: default, level: 4 tuner/basic, type:
> size_t)
> >> Messages smaller than this size (in bytes) will
> not use the RDMA pipeline protocol. Instead, they will be split into
> fragments of max_send_size and sent using send/receive semantics (must be
> >=0, and is automatically adjusted up to at least
> (eager_limit+btl_rdma_pipeline_send_length); only relevant when the PUT
> flag is set)
> >> MCA btl: parameter "btl_tcp_bandwidth" (current value:
> "100", data source: default, level: 5 tuner/detail, type: unsigned)
> >> Approximate maximum bandwidth of interconnect
> (0 = auto-detect value at run-time [not supported in all BTL modules], >= 1
> = bandwidth in Mbps)
> >> MCA btl: parameter "btl_tcp_disable_family" (current
> value: "0", data source: default, level: 2 user/detail, type: int)
> >> MCA btl: parameter "btl_tcp_if_seq" (current value: "",
> data source: default, level: 9 dev/all, type: string)
> >> If specified, a comma-delimited list of TCP
> interfaces. Interfaces will be assigned, one to each MPI process, in a
> round-robin fashion on each server. For example, if the list is
> "eth0,eth1" and four MPI processes are run on a single server, then local
> ranks 0 and 2 will use eth0 and local ranks 1 and 3 will use eth1.
> >>
> >>
> >> On Nov 27, 2013, at 2:41 PM, George Bosilca <bosilca_at_[hidden]
> <mailto:bosilca_at_[hidden]>> wrote:
> >>
> >> ompi_info --param oob tcp --level 9
> >> ompi_info --param btl tcp --level 9
> >>
> >> _______________________________________________
> >> users mailing list
> >> users_at_[hidden]
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
> > --
> > Jeff Squyres
> > jsquyres_at_[hidden]
> > For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> >
>