Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] open-mpi on Mac OS 10.9 (Mavericks)
From: Meredith, Karl (karl.meredith_at_[hidden])
Date: 2013-12-04 07:25:29


I’ve been able to find out a little more information. The problem persists regardless of which network I am connected to: I’ve tried our corporate Ethernet network, our corporate wifi network, a local hotel wifi network, and my home network. The problem persists over all of these connections.

I tried turning off the third-party firewall on my machine (DoorStop X, required by my corporate IT department), and then everything works fine. Does the firewall need to be disabled in order to use Open MPI? That seems like an odd requirement. The other odd thing is that Mac OS 10.8 didn’t show this problem, whereas Mac OS 10.9 does.
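One way to see which ports are actually involved while a run is hung (just a diagnostic sketch; the PID 4517 below is a placeholder from an earlier run, substitute the real PID of mpirun or the hung process):

```shell
# List the TCP sockets the hung process has open, with numeric
# addresses/ports (-nP), restricted to that one PID (-a ANDs the filters).
# The ports shown here are what the firewall would need to pass.
sudo lsof -nP -iTCP -a -p 4517
```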

Before turning off my firewall, I have these rules

$ sudo ipfw list
Password:
05000 allow ip from any to any via lo*
05005 allow log tcp from any to any dst-port 22 in setup
05006 allow udp from any to any dst-port 22 in
05007 allow log tcp from any to any dst-port 80 in setup
05008 allow udp from any to any dst-port 80 in
05009 deny log tcp from any to any dst-port 548 in setup
05010 deny udp from any to any dst-port 548 in
05011 allow log tcp from any to any dst-port 3031 in setup
05012 allow udp from any to any dst-port 3031 in
05013 deny log tcp from any to any dst-port 3689 in setup
05014 deny udp from any to any dst-port 3689 in
05015 deny log tcp from any to any dst-port 5298 in setup
05016 deny udp from any to any dst-port 5298 in
05017 deny log tcp from any to any dst-port 8770 in setup
05018 deny udp from any to any dst-port 8770 in
05019 deny log tcp from any to any dst-port 515 in setup
05020 deny udp from any to any dst-port 515 in
05021 deny log tcp from any to any dst-port 631 in setup
05022 deny udp from any to any dst-port 631 in
05023 allow log tcp from any to any dst-port 139 in setup
05024 allow udp from any to any dst-port 139 in
05025 allow log tcp from any to any dst-port 445 in setup
05026 allow udp from any to any dst-port 445 in
05027 allow log tcp from any to any dst-port 3283 in setup
05028 allow udp from any to any dst-port 3283 in
05029 allow log tcp from any to any dst-port 5900 in setup
05030 allow udp from any to any dst-port 5900 in
05031 deny log tcp from any to any dst-port 5060 in setup
05032 deny udp from any to any dst-port 5060 in
05033 deny log tcp from any to any dst-port 5297 in setup
05034 deny udp from any to any dst-port 5297 in
05035 deny log tcp from any to any dst-port 16384-16403 in setup
05036 deny udp from any to any dst-port 16384-16403 in
05037 allow log tcp from any to any dst-port 53 in setup
05038 allow udp from any to any dst-port 53 in
05039 allow log tcp from any to any dst-port 67-68 in setup
05040 allow udp from any to any dst-port 67-68 in
05041 allow log tcp from any to any dst-port 123 in setup
05042 allow udp from any to any dst-port 123 in
05043 allow log tcp from any to any dst-port 5353 in setup
05044 allow udp from any to any dst-port 5353 in
64000 deny log tcp from any to any in setup
64001 deny udp from any to any dst-port 1-1023 in
65535 allow ip from any to any

After turning off the firewall, I have these rules:

$ sudo ipfw list
Password:
65535 allow ip from any to any

And the test examples run fine:
$ mpirun -np 2 hello_cxx
Hello, world! I am 0 of 2 (Open MPI v1.7.3, package: Open MPI meredithk_at_[hidden] Distribution, ident: 1.7.3, Oct 17, 2013, 118)
Hello, world! I am 1 of 2 (Open MPI v1.7.3, package: Open MPI meredithk_at_[hidden] Distribution, ident: 1.7.3, Oct 17, 2013, 118)

Our local IT expert believes this problem is related to a bug reported way back in Open MPI 1.2.3, but it seems the patch was never implemented:
http://www.open-mpi.org/community/lists/users/2007/05/3344.php

I’m not good enough with networking to be able to tell whether it is related or not.
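In case it helps anyone else hitting this, here is an untested workaround sketch that avoids disabling the firewall outright. It uses MCA parameters from the `ompi_info` dumps earlier in this thread; the `lo0` device name and the 50000-50099 port range are my assumptions, not anything verified here:

```shell
# Option 1: pin Open MPI's bootstrap (oob) and MPI (btl) traffic to the
# loopback interface, so inbound "setup" packets never reach the
# firewall's catch-all deny rule. lo0 is OS X's loopback device.
mpirun -np 2 --mca oob_tcp_if_include lo0 --mca btl_tcp_if_include lo0 hello_cxx

# Option 2: confine Open MPI to a fixed port range, then add a matching
# ipfw allow rule ahead of the 64000 deny (50000-50099 is arbitrary).
sudo ipfw add 04999 allow tcp from any to any dst-port 50000-50099 in setup
mpirun -np 2 --mca oob_tcp_dynamic_ports 50000-50099 \
       --mca btl_tcp_port_min_v4 50000 --mca btl_tcp_port_range_v4 100 hello_cxx
```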

Karl

On Dec 3, 2013, at 8:16 AM, Ralph Castain <rhc_at_[hidden]> wrote:

Best guess I can offer is that they are blocking loopback on those networks - i.e., they are configured such that you can use them to connect to a remote machine, but not to a process on your local machine. I'll take a look at the connection logic and see if I can get it to fail over to the loopback device in that case. I believe we disable use of the loopback device if an active TCP network is available, as we expect the active network to include loopback capability.

Meantime, you might want to talk to your IT folks and see if that is correct and intentional - and if so, why.

On Tue, Dec 3, 2013 at 5:04 AM, Meredith, Karl <karl.meredith_at_[hidden]> wrote:
I disconnected from our corporate network (Ethernet connection) and tried running again: same result, it stalls.

Then, I also disconnected from our local wifi network and tried running again: it worked!

bash-4.2$ mpirun -np 2 --mca btl sm,self hello_c
Hello, world, I am 0 of 2, (Open MPI v1.7.4a1, package: Open MPI meredithk_at_[hidden] Distribution, ident: 1.7.4a1r29784, repo rev: r29784, Dec 02, 2013 (nightly snapshot tarball), 173)
Hello, world, I am 1 of 2, (Open MPI v1.7.4a1, package: Open MPI meredithk_at_[hidden] Distribution, ident: 1.7.4a1r29784, repo rev: r29784, Dec 02, 2013 (nightly snapshot tarball), 173)
bash-4.2$ mpirun -np 2 hello_c
Hello, world, I am 0 of 2, (Open MPI v1.7.4a1, package: Open MPI meredithk_at_[hidden] Distribution, ident: 1.7.4a1r29784, repo rev: r29784, Dec 02, 2013 (nightly snapshot tarball), 173)
Hello, world, I am 1 of 2, (Open MPI v1.7.4a1, package: Open MPI meredithk_at_[hidden] Distribution, ident: 1.7.4a1r29784, repo rev: r29784, Dec 02, 2013 (nightly snapshot tarball), 173)

Why? What would be causing the network to be interfering with mpirun? Do you have any insight?

Karl

On Dec 3, 2013, at 7:54 AM, Ralph Castain <rhc_at_[hidden]> wrote:

Hmmm...are you connected to a network, or at least have a network active, when you do this? It looks a little like the system is trying to open a port between the process and mpirun, but is failing to do so.

On Tue, Dec 3, 2013 at 4:51 AM, Meredith, Karl <karl.meredith_at_[hidden]> wrote:
Using openmpi-1.7.4, no MacPorts, only Apple compilers/tools:

mpirun -np 2 --mca btl sm,self hello_c

This hangs, also in MPI_Init().

Here’s the backtrace from the debugger:

bash-4.2$ lldb -p 4517
Attaching to process with:
    process attach -p 4517
Process 4517 stopped
Executable module set to "/Users/meredithk/tools/openmpi-1.7.4a1r29784/examples/hello_c".
Architecture set to: x86_64-apple-macosx.
(lldb) bt
* thread #1: tid = 0x57efb, 0x00007fff8c991a3a libsystem_kernel.dylib`__semwait_signal + 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
    frame #0: 0x00007fff8c991a3a libsystem_kernel.dylib`__semwait_signal + 10
    frame #1: 0x00007fff8ade4e60 libsystem_c.dylib`nanosleep + 200
    frame #2: 0x0000000108d668e3 libopen-rte.6.dylib`orte_routed_base_register_sync(setup=true) + 2435 at routed_base_fns.c:344
    frame #3: 0x000000010904e3a7 mca_routed_binomial.so`init_routes(job=1294401537, ndat=0x0000000000000000) + 2759 at routed_binomial.c:708
    frame #4: 0x0000000108d1b84d libopen-rte.6.dylib`orte_ess_base_app_setup(db_restrict_local=true) + 2109 at ess_base_std_app.c:233
    frame #5: 0x0000000108fbc442 mca_ess_env.so`rte_init + 418 at ess_env_module.c:146
    frame #6: 0x0000000108cd6cfe libopen-rte.6.dylib`orte_init(pargc=0x0000000000000000, pargv=0x0000000000000000, flags=32) + 718 at orte_init.c:158
    frame #7: 0x0000000108a3b3c8 libmpi.1.dylib`ompi_mpi_init(argc=1, argv=0x00007fff57200508, requested=0, provided=0x00007fff57200360) + 616 at ompi_mpi_init.c:451
    frame #8: 0x0000000108a895a0 libmpi.1.dylib`MPI_Init(argc=0x00007fff572004d0, argv=0x00007fff572004c8) + 480 at init.c:84
    frame #9: 0x00000001089ffe4a hello_c`main(argc=1, argv=0x00007fff57200508) + 58 at hello_c.c:18
    frame #10: 0x00007fff8d5df5fd libdyld.dylib`start + 1
    frame #11: 0x00007fff8d5df5fd libdyld.dylib`start + 1

On Dec 2, 2013, at 2:11 PM, Jeff Squyres (jsquyres) <jsquyres_at_[hidden]> wrote:

> Karl --
>
> Can you force the use of just the shared memory transport -- i.e., disable the TCP BTL? For example:
>
> mpirun -np 2 --mca btl sm,self hello_c
>
> If that also hangs, can you attach a debugger and see *where* it is hanging inside MPI_Init()? (In OMPI, MPI::Init() simply invokes MPI_Init())
>
>
> On Nov 27, 2013, at 2:56 PM, "Meredith, Karl" <karl.meredith_at_[hidden]> wrote:
>
>> /opt/trunk/apple-only/bin/ompi_info --param oob tcp --level 9
>> MCA oob: parameter "oob_tcp_verbose" (current value: "0", data source: default, level: 9 dev/all, type: int)
>> Verbose level for the OOB tcp component
>> MCA oob: parameter "oob_tcp_peer_limit" (current value: "-1", data source: default, level: 9 dev/all, type: int)
>> Maximum number of peer connections to simultaneously maintain (-1 = infinite)
>> MCA oob: parameter "oob_tcp_peer_retries" (current value: "60", data source: default, level: 9 dev/all, type: int)
>> Number of times to try shutting down a connection before giving up
>> MCA oob: parameter "oob_tcp_debug" (current value: "0", data source: default, level: 9 dev/all, type: int)
>> Enable (1) / disable (0) debugging output for this component
>> MCA oob: parameter "oob_tcp_sndbuf" (current value: "131072", data source: default, level: 9 dev/all, type: int)
>> TCP socket send buffering size (in bytes)
>> MCA oob: parameter "oob_tcp_rcvbuf" (current value: "131072", data source: default, level: 9 dev/all, type: int)
>> TCP socket receive buffering size (in bytes)
>> MCA oob: parameter "oob_tcp_if_include" (current value: "", data source: default, level: 9 dev/all, type: string, synonyms: oob_tcp_include)
>> Comma-delimited list of devices and/or CIDR notation of networks to use for Open MPI bootstrap communication (e.g., "eth0,192.168.0.0/16"). Mutually exclusive with oob_tcp_if_exclude.
>> MCA oob: parameter "oob_tcp_if_exclude" (current value: "", data source: default, level: 9 dev/all, type: string, synonyms: oob_tcp_exclude)
>> Comma-delimited list of devices and/or CIDR notation of networks to NOT use for Open MPI bootstrap communication -- all devices not matching these specifications will be used (e.g., "eth0,192.168.0.0/16"). If set to a non-default value, it is mutually exclusive with oob_tcp_if_include.
>> MCA oob: parameter "oob_tcp_connect_sleep" (current value: "1", data source: default, level: 9 dev/all, type: int)
>> Enable (1) / disable (0) random sleep for connection wireup.
>> MCA oob: parameter "oob_tcp_listen_mode" (current value: "event", data source: default, level: 9 dev/all, type: int)
>> Mode for HNP to accept incoming connections: event, listen_thread.
>> Valid values: 0:"event", 1:"listen_thread"
>> MCA oob: parameter "oob_tcp_listen_thread_max_queue" (current value: "10", data source: default, level: 9 dev/all, type: int)
>> High water mark for queued accepted socket list size. Used only when listen_mode is listen_thread.
>> MCA oob: parameter "oob_tcp_listen_thread_wait_time" (current value: "10", data source: default, level: 9 dev/all, type: int)
>> Time in milliseconds to wait before actively checking for new connections when listen_mode is listen_thread.
>> MCA oob: parameter "oob_tcp_static_ports" (current value: "", data source: default, level: 9 dev/all, type: string)
>> Static ports for daemons and procs (IPv4)
>> MCA oob: parameter "oob_tcp_dynamic_ports" (current value: "", data source: default, level: 9 dev/all, type: string)
>> Range of ports to be dynamically used by daemons and procs (IPv4)
>> MCA oob: parameter "oob_tcp_disable_family" (current value: "none", data source: default, level: 9 dev/all, type: int)
>> Disable IPv4 (4) or IPv6 (6)
>> Valid values: 0:"none", 4:"IPv4", 6:"IPv6"
>>
>> /opt/trunk/apple-only/bin/ompi_info --param btl tcp --level 9
>> MCA btl: parameter "btl_tcp_links" (current value: "1", data source: default, level: 4 tuner/basic, type: unsigned)
>> MCA btl: parameter "btl_tcp_if_include" (current value: "", data source: default, level: 1 user/basic, type: string)
>> Comma-delimited list of devices and/or CIDR notation of networks to use for MPI communication (e.g., "eth0,192.168.0.0/16"). Mutually exclusive with btl_tcp_if_exclude.
>> MCA btl: parameter "btl_tcp_if_exclude" (current value: "127.0.0.1/8,sppp", data source: default, level: 1 user/basic, type: string)
>> Comma-delimited list of devices and/or CIDR notation of networks to NOT use for MPI communication -- all devices not matching these specifications will be used (e.g., "eth0,192.168.0.0/16"). If set to a non-default value, it is mutually exclusive with btl_tcp_if_include.
>> MCA btl: parameter "btl_tcp_free_list_num" (current value: "8", data source: default, level: 5 tuner/detail, type: int)
>> MCA btl: parameter "btl_tcp_free_list_max" (current value: "-1", data source: default, level: 5 tuner/detail, type: int)
>> MCA btl: parameter "btl_tcp_free_list_inc" (current value: "32", data source: default, level: 5 tuner/detail, type: int)
>> MCA btl: parameter "btl_tcp_sndbuf" (current value: "131072", data source: default, level: 4 tuner/basic, type: int)
>> MCA btl: parameter "btl_tcp_rcvbuf" (current value: "131072", data source: default, level: 4 tuner/basic, type: int)
>> MCA btl: parameter "btl_tcp_endpoint_cache" (current value: "30720", data source: default, level: 4 tuner/basic, type: int)
>> The size of the internal cache for each TCP connection. This cache is used to reduce the number of syscalls, by replacing them with memcpy. Every read will read the expected data plus the amount of the endpoint_cache
>> MCA btl: parameter "btl_tcp_use_nagle" (current value: "0", data source: default, level: 4 tuner/basic, type: int)
>> Whether to use Nagle's algorithm or not (using Nagle's algorithm may increase short message latency)
>> MCA btl: parameter "btl_tcp_port_min_v4" (current value: "1024", data source: default, level: 2 user/detail, type: int)
>> The minimum port where the TCP BTL will try to bind (default 1024)
>> MCA btl: parameter "btl_tcp_port_range_v4" (current value: "64511", data source: default, level: 2 user/detail, type: int)
>> The number of ports where the TCP BTL will try to bind (default 64511). This parameter together with the port min, define a range of ports where Open MPI will open sockets.
>> MCA btl: parameter "btl_tcp_exclusivity" (current value: "100", data source: default, level: 7 dev/basic, type: unsigned)
>> BTL exclusivity (must be >= 0)
>> MCA btl: parameter "btl_tcp_flags" (current value: "314", data source: default, level: 5 tuner/detail, type: unsigned)
>> BTL bit flags (general flags: SEND=1, PUT=2, GET=4, SEND_INPLACE=8, RDMA_MATCHED=64, HETEROGENEOUS_RDMA=256; flags only used by the "dr" PML (ignored by others): ACK=16, CHECKSUM=32, RDMA_COMPLETION=128; flags only used by the "bfo" PML (ignored by others): FAILOVER_SUPPORT=512)
>> MCA btl: parameter "btl_tcp_rndv_eager_limit" (current value: "65536", data source: default, level: 4 tuner/basic, type: size_t)
>> Size (in bytes, including header) of "phase 1" fragment sent for all large messages (must be >= 0 and <= eager_limit)
>> MCA btl: parameter "btl_tcp_eager_limit" (current value: "65536", data source: default, level: 4 tuner/basic, type: size_t)
>> Maximum size (in bytes, including header) of "short" messages (must be >= 1).
>> MCA btl: parameter "btl_tcp_max_send_size" (current value: "131072", data source: default, level: 4 tuner/basic, type: size_t)
>> Maximum size (in bytes) of a single "phase 2" fragment of a long message when using the pipeline protocol (must be >= 1)
>> MCA btl: parameter "btl_tcp_rdma_pipeline_send_length" (current value: "131072", data source: default, level: 4 tuner/basic, type: size_t)
>> Length of the "phase 2" portion of a large message (in bytes) when using the pipeline protocol. This part of the message will be split into fragments of size max_send_size and sent using send/receive semantics (must be >= 0; only relevant when the PUT flag is set)
>> MCA btl: parameter "btl_tcp_rdma_pipeline_frag_size" (current value: "2147483647", data source: default, level: 4 tuner/basic, type: size_t)
>> Maximum size (in bytes) of a single "phase 3" fragment from a long message when using the pipeline protocol. These fragments will be sent using RDMA semantics (must be >= 1; only relevant when the PUT flag is set)
>> MCA btl: parameter "btl_tcp_min_rdma_pipeline_size" (current value: "196608", data source: default, level: 4 tuner/basic, type: size_t)
>> Messages smaller than this size (in bytes) will not use the RDMA pipeline protocol. Instead, they will be split into fragments of max_send_size and sent using send/receive semantics (must be >=0, and is automatically adjusted up to at least (eager_limit+btl_rdma_pipeline_send_length); only relevant when the PUT flag is set)
>> MCA btl: parameter "btl_tcp_bandwidth" (current value: "100", data source: default, level: 5 tuner/detail, type: unsigned)
>> Approximate maximum bandwidth of interconnect (0 = auto-detect value at run-time [not supported in all BTL modules], >= 1 = bandwidth in Mbps)
>> MCA btl: parameter "btl_tcp_disable_family" (current value: "0", data source: default, level: 2 user/detail, type: int)
>> MCA btl: parameter "btl_tcp_if_seq" (current value: "", data source: default, level: 9 dev/all, type: string)
>> If specified, a comma-delimited list of TCP interfaces. Interfaces will be assigned, one to each MPI process, in a round-robin fashion on each server. For example, if the list is "eth0,eth1" and four MPI processes are run on a single server, then local ranks 0 and 2 will use eth0 and local ranks 1 and 3 will use eth1.
>>
>>
>> On Nov 27, 2013, at 2:41 PM, George Bosilca <bosilca_at_[hidden]> wrote:
>>
>> ompi_info --param oob tcp --level 9
>> ompi_info --param btl tcp --level 9
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/