We have run into the following problem:
- start up Open MPI application on a laptop
- disconnect from network
- application hangs
I believe that the problem is that all sockets created by Open MPI are bound to the external network interface.
For example, when I start a 2-process MPI job on my Mac (no hosts specified), I see the following TCP
connections (192.168.5.2 is my machine's address on the LAN):
tcp4 0 0 192.168.5.2.49459 192.168.5.2.49463 ESTABLISHED
tcp4 0 0 192.168.5.2.49463 192.168.5.2.49459 ESTABLISHED
tcp4 0 0 192.168.5.2.49456 192.168.5.2.49462 ESTABLISHED
tcp4 0 0 192.168.5.2.49462 192.168.5.2.49456 ESTABLISHED
tcp4 0 0 192.168.5.2.49456 192.168.5.2.49460 ESTABLISHED
tcp4 0 0 192.168.5.2.49460 192.168.5.2.49456 ESTABLISHED
tcp4 0 0 192.168.5.2.49456 192.168.5.2.49458 ESTABLISHED
tcp4 0 0 192.168.5.2.49458 192.168.5.2.49456 ESTABLISHED
Since this application is confined to a single machine, I would like it to use 127.0.0.1,
which will remain available as the laptop moves around. I am unable to force it to bind
sockets to this address, however.
Some of the things I've tried are:
- explicitly setting the hostname to 127.0.0.1 (--host 127.0.0.1)
- turning off the tcp btl (--mca btl ^tcp) and other variations (--mca btl self,sm)
- using --mca oob_tcp_include lo0
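Spelled out, the invocations were along these lines (./hello stands in for my actual test program):

```shell
# 1. point the host explicitly at loopback
mpirun -np 2 --host 127.0.0.1 ./hello

# 2. disable the tcp BTL, and a variant selecting only self/sm
mpirun -np 2 --mca btl ^tcp ./hello
mpirun -np 2 --mca btl self,sm ./hello

# 3. restrict the out-of-band channel to the loopback interface
mpirun -np 2 --mca oob_tcp_include lo0 ./hello
```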
The first two have no effect. The last one results in an error message of:
[myhost.locall:05830] [0,0,0] mca_oob_tcp_init: invalid address '' returned for selected oob interfaces.
Is there any way to force Open MPI to bind all sockets to 127.0.0.1?
As a side question -- I'm curious what all of these TCP connections are used for. As I increase the number
of processes, there appear to be 4 sockets created per MPI process, even though the tcp btl is not in use.
Perhaps stdin/stdout/stderr plus a control channel?